Category: BGP

Hot Potato Routing

What is it?

  1. A technique that sends transit traffic out of an AS over the shortest internal (IGP) path, with the goal of reducing how much transit traffic the AS has to carry

How is it done?

  1. MED and hot potato routing are normally mutually exclusive. Reset the MED received from the other AS, and let the IGP best path decide where traffic exits toward that AS
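
A minimal sketch of the difference, assuming a simplified decision process that looks only at MED and IGP cost (a real BGP best-path algorithm has more steps before these). The router names, costs, and MED values are illustrative, and `ExitPath`, `hot_potato_exit`, and `cold_potato_exit` are hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExitPath:
    egress_router: str  # border router where traffic would leave the AS
    igp_cost: int       # IGP metric from this router to that border router
    med: int            # MED received from the neighboring AS


def hot_potato_exit(paths):
    """Pick the exit with the lowest IGP cost; MED is deliberately ignored."""
    return min(paths, key=lambda p: p.igp_cost)


def cold_potato_exit(paths):
    """Honor the neighbor's MED first; IGP cost is only a tie-breaker."""
    return min(paths, key=lambda p: (p.med, p.igp_cost))


paths = [
    ExitPath("ASBR-West", igp_cost=10, med=200),
    ExitPath("ASBR-East", igp_cost=50, med=50),
]

print(hot_potato_exit(paths).egress_router)   # ASBR-West
print(cold_potato_exit(paths).egress_router)  # ASBR-East
```

With MED honored, traffic is carried across the AS to the exit the neighbor prefers. With MED reset, the closest exit wins.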

RR and Hot Potato Routing

  1. When an RR is present, hot potato routing breaks, because the RR reflects only one BEST path, chosen from its own perspective. That may not be the best path from the PE's perspective.
  2. The solution is to use a UNIQUE RD on each PE for MPLS VPN, or ADDITIONAL PATH (ADD-PATH)
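
The problem can be illustrated with a small sketch. It assumes that IGP cost to the egress PE is the only tie-breaker left in best-path selection. The topology, the costs in `IGP_COST`, and the helper `best_egress` are hypothetical.

```python
# IGP cost from each router to each egress PE (illustrative values).
IGP_COST = {
    "RR":  {"PE1": 10, "PE2": 30},
    "PE3": {"PE1": 40, "PE2": 5},
}


def best_egress(router, candidates):
    """Return the egress PE that is closest to `router` by IGP cost."""
    return min(candidates, key=lambda egress: IGP_COST[router][egress])


candidates = ["PE1", "PE2"]

# Plain reflection: the RR picks one best path from its own position
# and reflects only that path to its clients.
reflected = [best_egress("RR", candidates)]

# With ADD-PATH (or unique RDs) both paths reach the client.
all_paths = candidates

print(best_egress("PE3", reflected))  # PE1
print(best_egress("PE3", all_paths))  # PE2
```

PE3 is closest to PE2, but it only learns about PE1 unless the RR passes along more than its own best path.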

What features do we need to enable for ADDITIONAL PATH?

  1. Enable BGP multipath on the ingress PE so the PE can store multiple paths in its RIB
  2. That is all you need if you use an iBGP full mesh.
  3. If you have MPLS VPNv4, you can use a unique RD on each PE
  4. However, if you have an RR, the RR needs to have the ADD-PATH feature enabled
  5. If you modify local preference to make one PE the preferred egress, you need to enable “BGP BEST EXTERNAL” so that the additional external route is still advertised via iBGP
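
The effect of best-external in item 5 can be sketched as follows. This is a simplified model, assuming a PE without any RR role that compares paths by local preference only. `Path`, `best_path`, and `advertise_to_ibgp` are hypothetical names, not a vendor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str
    local_pref: int
    external: bool  # True if the path was learned over eBGP


def best_path(paths):
    """Simplified best-path selection: highest local preference wins."""
    return max(paths, key=lambda p: p.local_pref)


def advertise_to_ibgp(paths, best_external=False):
    """Return the paths this PE would send to its iBGP peers."""
    best = best_path(paths)
    if best.external:
        return [best]  # an eBGP-learned best path is advertised as usual
    if best_external:
        externals = [p for p in paths if p.external]
        if externals:
            return [best_path(externals)]  # advertise the best eBGP path anyway
    return []  # an iBGP-learned best path is not re-advertised to iBGP peers


# Backup PE: its own eBGP path loses to the path via the preferred PE.
paths_on_backup_pe = [
    Path(next_hop="CE", local_pref=100, external=True),
    Path(next_hop="PE1", local_pref=200, external=False),
]

print(advertise_to_ibgp(paths_on_backup_pe))                      # []
print(advertise_to_ibgp(paths_on_backup_pe, best_external=True))  # the path via CE
```

Without best-external the backup PE stays silent, so no other router holds a backup path for the prefix.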

Graceful Restart, NSF, NSR and LFA

Key Concepts

  1. Graceful Restart and Non-Stop Forwarding refer to the same technology
  2. Non-Stop Routing is NOT the same as GR (NSF)
  3. When you have both LFA and GR, which one is preferred? LFA. If there is a pre-calculated alternate path, we can switch to it instead of continuing to forward through the current interface.
  4. GR tries to PREVENT convergence by keeping the FIB intact. That is completely different from LFA, which tries to converge as FAST as possible using a pre-calculated alternate, normally within 50 ms.
  5. ISSU requires GR and SSO. GR preserves routing protocol state, while SSO preserves everything else (such as router config sync and CEF tables).
  6. GR (NSF) is NOT supported on SHAM-LINKS and virtual-links (OSPF)
  7. GR requires SSO support, so GR is only for platforms with dual RPs.
  8. Graceful Restart restarts the routing process, while NSR refreshes the routing process
  9. Graceful Restart is an IETF standard
  10. Graceful Restart MUST be supported on both peers, while NSR needs no support from the peer
  11. GR is supported for OSPF, IS-IS, EIGRP, BGP, and LDP. NSR is supported for IS-IS and BGP
  12. GR uses less memory than NSR, because NSR needs to transfer all state to the standby RP, while GR actually restarts the routing process
  13. Both GR and NSR achieve the same result, but GR is preferred because it uses less memory. However, if the peer does not support GR, use NSR.
  14. Use GR toward devices you control, for example between RR and PE. Use NSR toward devices you do not control, for example PE to CE.
  15. Think carefully before tuning IGP fast hellos in conjunction with GR/NSF/SSO/ISSU. If the dead timer is shorter than the time it takes to perform a stateful failover, GR is not as useful, because the IGP has already detected the link change and triggered an LSA/LSP update. That negates the reason for using GR. The purpose of GR is to avoid letting the routing protocols know there was a change on the link, so that no SPF calculation is triggered.
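
The timer trade-off in the last item comes down to one comparison. A minimal sketch, assuming the switchover time is known from testing on the platform in question; the 5-second switchover below is a made-up illustrative figure, and `gr_masks_restart` is a hypothetical helper.

```python
def gr_masks_restart(dead_interval_s, switchover_time_s):
    """Return True if the neighbor's dead timer outlasts the RP switchover.

    If the dead timer expires first, the neighbor declares the adjacency
    down and floods an update, which is what GR was meant to prevent.
    """
    return dead_interval_s > switchover_time_s


ASSUMED_SWITCHOVER_S = 5.0  # illustrative value, not a measured figure

print(gr_masks_restart(40.0, ASSUMED_SWITCHOVER_S))  # True: a 40 s dead timer survives
print(gr_masks_restart(1.0, ASSUMED_SWITCHOVER_S))   # False: a 1 s fast-hello dead timer does not
```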


Enable Backup Path on BGP for Fast Convergence

Backup Path

In order to have fast convergence, we need to have a backup path already installed in the local BGP RIB.

An iBGP session only advertises the BEST path to its peers. As a result, when there are multiple paths to a prefix, only one best path is advertised by the RR.

There are a few solutions to this issue.

  1. ROUTE DISTINGUISHER: In an MPLS VPN-IPv4 network, configure a different Route Distinguisher on each PE. The RR then advertises both paths to the prefix, because the VPN-IPv4 prefixes are different.
  2. BGP BEST EXTERNAL: For a small network, you can choose not to use an RR. Instead, run iBGP directly between the PEs and enable the BGP Best-External feature. A PE then advertises its external path even though that path is not its best path, because another PE advertises the prefix with a higher local preference.
  3. RR: Use two sets of RRs. One set reflects the primary route, and the other set reflects the secondary route.
  4. ADDITIONAL PATH: Enable iBGP to advertise more than one path, using the “ADD PATH” feature. This is NOT classic BGP behavior; it started as an Internet draft and is now specified in RFC 7911. It requires support on all RRs and PEs, so it may not be practical in a large network because of the expense of upgrading every PE. The RIB and FIB also grow, so pay attention to memory requirements. However, the most important reason this is not a preferred solution is that it is a BIG change in how iBGP works.
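
To see why the unique RD approach in item 1 works, here is a small sketch. It assumes the RR keeps exactly one path per NLRI and treats RD plus IPv4 prefix as the key. The RD values, the prefix, and the function `reflect` are illustrative only, and "first path wins" stands in for real best-path selection.

```python
def reflect(routes):
    """Keep one path per NLRI key, as an RR without ADD-PATH would."""
    best = {}
    for rd, prefix, next_hop in routes:
        key = (rd, prefix)              # VPN-IPv4 NLRI = RD + IPv4 prefix
        best.setdefault(key, next_hop)  # first path wins (stand-in for best path)
    return best


same_rd = [
    ("65000:1", "10.1.0.0/16", "PE1"),
    ("65000:1", "10.1.0.0/16", "PE2"),
]
unique_rd = [
    ("65000:101", "10.1.0.0/16", "PE1"),
    ("65000:102", "10.1.0.0/16", "PE2"),
]

print(len(reflect(same_rd)))    # 1
print(len(reflect(unique_rd)))  # 2
```

With a shared RD the two paths compete for one slot. With unique RDs they are different prefixes from the RR's point of view, so both next hops reach the remote PEs.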


There are three types of failure scenarios in an SP network:

  1. P router failure
  2. PE router failure (or link to PE failure)
  3. CE failure (or PE to CE link failure)

P and PE failures can be detected by the IGP. Tuned OSPF and IS-IS can both converge within 1 second, and both support FRR LFA, which enables local repair within 50 ms.

The PE-CE link is typically not carried in the SP IGP, so convergence depends on MP-BGP. MP-BGP convergence is slow, and the convergence time grows as the number of prefixes increases.

One of the reasons BGP is slow is that router vendors build the FIB to associate each BGP prefix directly with an outgoing interface. That is what CEF does. CEF’s job is to speed up route lookup by resolving the recursive lookup for each BGP prefix in advance and storing the directly connected next hop in the FIB. This, however, creates a problem when there are many BGP prefixes: if there is a failure and the IGP converges, CEF needs to update the connected next hop for ALL affected BGP prefixes.

The solution is BGP PIC. BGP PIC enables the router to update the BGP next hop in the FIB by using a hierarchical FIB structure. It is a very simple idea: all BGP prefixes that share the same connected next hop point to ONE shared next-hop entry. So instead of changing thousands of connected next hops, we change just ONE.

The end result is that the BGP FIB update time is independent of the number of prefixes, so BGP convergence time stays the same regardless of how many prefixes the router carries.
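
The difference between the two FIB layouts can be sketched with plain dictionaries. This is only a model of the idea, not how CEF is implemented; the interface names, the BGP next hop `192.0.2.1`, the prefix count, and both function names are assumptions made for illustration.

```python
def flat_fib_updates(fib, failed_hop, new_hop):
    """Flat FIB: every prefix stores its own resolved connected next hop."""
    updates = 0
    for prefix, hop in fib.items():
        if hop == failed_hop:
            fib[prefix] = new_hop
            updates += 1
    return updates


def hierarchical_fib_updates(shared_entries, bgp_next_hop, new_hop):
    """Hierarchical FIB: prefixes point at one shared entry per BGP next hop."""
    shared_entries[bgp_next_hop] = new_hop
    return 1


prefixes = [f"10.{i // 256}.{i % 256}.0/24" for i in range(5000)]

flat = {p: "Gi0/1" for p in prefixes}
prefix_to_bgp_nh = {p: "192.0.2.1" for p in prefixes}
shared_entries = {"192.0.2.1": "Gi0/1"}

print(flat_fib_updates(flat, "Gi0/1", "Gi0/2"))                       # 5000
print(hierarchical_fib_updates(shared_entries, "192.0.2.1", "Gi0/2"))  # 1
print(shared_entries[prefix_to_bgp_nh[prefixes[0]]])                   # Gi0/2
```

The flat layout touches every prefix, while the hierarchical layout changes one shared entry that all 5000 prefixes resolve through.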


The BGP PIC Edge for IP and MPLS-VPN feature improves BGP convergence after a network failure. This convergence is applicable to both core and edge failures and can be used in both IP and MPLS networks. The BGP PIC Edge for IP and MPLS-VPN feature creates and stores a backup/alternate path in the routing information base (RIB), forwarding information base (FIB), and Cisco Express Forwarding and LFIB so that when a failure is detected, the backup/alternate path can immediately take over, thus enabling fast failover.

BGP PIC is essentially the BGP equivalent of FRR, plus RIB/FIB/LFIB optimization using a hierarchy on the next hop.
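
A rough sketch of the FRR-like part: the backup is installed before any failure, and every prefix behind the same primary PE shares one entry, so failover is a single change. `SharedPathEntry` and the PE names are hypothetical, and failure detection itself is outside the sketch.

```python
class SharedPathEntry:
    """One forwarding entry shared by every prefix behind the same primary PE."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup  # installed ahead of time, before any failure
        self.primary_up = True

    def active_next_hop(self):
        return self.primary if self.primary_up else self.backup


entry = SharedPathEntry(primary="PE1", backup="PE2")
fib = {f"10.0.{i}.0/24": entry for i in range(200)}

print(fib["10.0.7.0/24"].active_next_hop())  # PE1

entry.primary_up = False  # failure of PE1 detected: one change covers all prefixes
print(fib["10.0.7.0/24"].active_next_hop())  # PE2
```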

Benefits of the BGP PIC Edge for IP and MPLS-VPN Feature

  • An additional path for failover allows faster restoration of connectivity if a primary path is invalid or withdrawn.
  • Reduction of traffic loss.
  • Constant convergence time so that the switching time is the same for all prefixes.

Routing Protocol Key Concepts

These are the key concepts I need to understand for routing protocols:

  1. How neighbors are formed and maintained
  2. How the best path is calculated
  3. How aggregation is configured and deployed
  4. How external routing information is handled
  5. How protocols interact