In our article on Simple BGP troubleshooting we addressed some basic BGP troubleshooting skills:
- How to identify whether a routing problem is a BGP problem,
- How to troubleshoot BGP sessions,
- How to troubleshoot IP route origination and propagation.
In this article, we focus on a more advanced scenario: transit Internet service provider (ISP) networks (see the next diagram).
NOTE: Before reading this article, make sure you're familiar with basic BGP technology as well as simple BGP troubleshooting.
To establish end-to-end connectivity across an ISP network, the ISP has to receive customers' IP prefixes via BGP and announce them to other ISPs. The same process has to happen in the reverse direction (or at least the default route has to be announced to the customer). Network-wide BGP troubleshooting thus consists of three steps:
- Have we received the prefix?
- Is the prefix propagated across our network?
- Is the prefix sent to external BGP neighbours at the other edge of the network?
Have we received the prefix?
Troubleshooting inbound BGP problems is the toughest part of BGP troubleshooting you'll encounter. There are two potential reasons that an IP prefix is not in your BGP table as you would expect it to be:
- The neighbour is not sending the prefix.
- Your inbound filters are blocking the prefix.
The only tool that can help you identify the problem is the debugging facility on your edge router (as you normally don't have access to the other BGP neighbour). When doing BGP debugging, be aware that a BGP neighbour can send you several hundred thousand routes, so you have to ensure that the debugging output produced by the troubleshooting session does not overwhelm the router. Furthermore, the BGP prefixes are sent only when they change, not on a periodic basis (like RIP updates or OSPF LSA floods). Your debugging tool will thus not show you an IP prefix until it has actually changed (or you've cleared the BGP session with your neighbour).
Some BGP routers can store a separate copy of all routes sent by a neighbour in a parallel BGP table. (To enable this functionality on Cisco IOS, you have to configure soft-reconfiguration inbound for a BGP neighbour.) With the parallel per-neighbour table, you can pinpoint exactly what the neighbour has sent you (the contents of the parallel table) and which routes have passed your input filters (the contents of the main BGP table). The price is memory: the parallel per-neighbour table holds a second copy of every route the neighbour sends.
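The comparison between the two tables can be sketched in a few lines of Python. This is only an illustration of the reasoning, not a router feature: the function name and the idea of exporting both tables as sets of prefix strings are assumptions made for the example, and the main-table set is assumed to hold only routes learned from the neighbour in question.

```python
def classify_prefix(prefix, per_neighbour_table, main_table):
    """Explain the state of one prefix learned (or not) from one neighbour.

    per_neighbour_table: prefixes the neighbour sent (the parallel table)
    main_table: prefixes from that neighbour that passed the inbound filters
    Both arguments are hypothetical exports, modelled as sets of strings.
    """
    if prefix in main_table:
        return "accepted"
    if prefix in per_neighbour_table:
        # The neighbour sent it, but it never reached the main BGP table.
        return "blocked by inbound filter"
    # The prefix is in neither table.
    return "not sent by neighbour"


sent = {"198.51.100.0/24", "203.0.113.0/24"}
accepted = {"198.51.100.0/24"}
print(classify_prefix("203.0.113.0/24", sent, accepted))
print(classify_prefix("192.0.2.0/24", sent, accepted))
```

The two calls at the end cover both failure cases from the list above: a prefix that was sent but filtered, and a prefix that was never sent.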
Is the prefix propagated across our network?
Even when an edge router receives an IP prefix via BGP, it may not be propagated to the other end of your network. To start with, internal BGP (BGP within a single autonomous system) requires a full mesh of BGP sessions among all BGP routers. As every router between every pair of edge routers has to run BGP (otherwise the traffic could be dropped inside your network), the number of BGP sessions could become excessively large. (The next diagram illustrates the BGP sessions needed in a small four-router network.)
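To see how quickly a full mesh grows, the session count can be computed directly: each of the n routers peers with the other n - 1, and every session is shared by two routers. The helper below is a small sketch; its name is chosen for this example only.

```python
def full_mesh_sessions(routers: int) -> int:
    """Number of internal BGP sessions in a full mesh of `routers` routers."""
    if routers < 0:
        raise ValueError("router count cannot be negative")
    # Every router peers with every other one; each session is counted once.
    return routers * (routers - 1) // 2


for n in (4, 10, 100):
    print(n, "routers need", full_mesh_sessions(n), "sessions")
```

A four-router network needs 6 sessions, but 100 routers already need 4,950.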
There are two tools (BGP route reflectors and BGP confederations) that can help you keep the number of BGP sessions to a sensible level, with BGP route reflectors being the more commonly used.
The BGP route reflector rules are quite simple:
- Whatever is received from a route-reflector client or an external BGP peer will be sent to every other BGP peer.
- Whatever is received from an internal BGP peer that is not a route-reflector client will be sent only to clients and external BGP peers.
With these rules in hand, you have to step through the graph of BGP sessions in your network, checking every BGP router on the way and ensuring that the route reflector rules are not violated (and that, using the rules, the BGP prefixes get from every edge router to all other routers).
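Stepping through the session graph by hand gets tedious, so here is a rough Python sketch of the same walk. It makes several simplifying assumptions that are not part of the article: the session list and the client map are hypothetical inputs you would build from your own configurations, each router keeps only the first copy of the prefix it hears (a stand-in for best-path selection), and real loop-prevention attributes are ignored. An ordinary internal BGP router is modelled as a reflector with no clients, which makes the second rule cover it as well.

```python
from collections import deque


def routers_learning_prefix(ibgp_sessions, clients_of, ingress):
    """Return the routers that learn a prefix received externally at `ingress`.

    ibgp_sessions: iterable of (router_a, router_b) internal BGP sessions
    clients_of: dict mapping a route reflector to the set of its clients
    """
    neighbours = {}
    for a, b in ibgp_sessions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # None marks "learned from an external BGP peer".
    learned_from = {ingress: None}
    queue = deque([ingress])
    while queue:
        router = queue.popleft()
        sender = learned_from[router]
        my_clients = clients_of.get(router, set())
        peers = neighbours.get(router, set())
        if sender is None or sender in my_clients:
            # Rule 1: from a client or an external peer, send to every other peer.
            targets = peers - {sender}
        else:
            # Rule 2: from a non-client internal peer, send only to clients.
            targets = peers & my_clients
        for peer in sorted(targets):
            if peer not in learned_from:
                learned_from[peer] = router
                queue.append(peer)
    return set(learned_from)


sessions = [("A", "RR"), ("B", "RR"), ("C", "RR")]
print(sorted(routers_learning_prefix(sessions, {"RR": {"A", "B", "C"}}, "A")))
print(sorted(routers_learning_prefix(sessions, {}, "A")))
```

In the second call the client configuration is missing, so the central router treats A as a non-client, has no clients to reflect to, and the prefix never reaches B or C. Comparing the returned set with your full router list shows where the propagation stops.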
There is another common reason an IP prefix is not propagated across your network: The external subnets on the edge of your network are not advertised to your core routers.
The IP address of the next-hop router is not changed when an IP prefix is sent to an internal BGP neighbour. The IP next-hop of an external route is thus always the IP address of a router one hop beyond the edge of your autonomous system. The IP subnets connecting your edge routers to their external neighbours thus have to be inserted into your internal routing protocol (for example, OSPF or IS-IS), otherwise some internal BGP router will decide that the BGP next-hop is not reachable and ignore the IP prefix. (It will appear in the BGP table but will not be used or propagated to other BGP peers.)
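The reachability test that an internal BGP router applies can be approximated with Python's standard ipaddress module. This is a sketch under stated assumptions: the function name is made up for the example, and the list of prefixes is assumed to be an export of what your internal routing protocol carries.

```python
import ipaddress


def next_hop_reachable(next_hop, igp_prefixes):
    """Return True if `next_hop` falls inside any prefix known to the IGP.

    igp_prefixes: hypothetical list of prefixes (strings) from OSPF or IS-IS
    """
    hop = ipaddress.ip_address(next_hop)
    return any(hop in ipaddress.ip_network(prefix) for prefix in igp_prefixes)


core_only = ["10.0.0.0/8"]
with_edge_subnet = ["10.0.0.0/8", "192.0.2.0/30"]
print(next_hop_reachable("192.0.2.1", core_only))
print(next_hop_reachable("192.0.2.1", with_edge_subnet))
```

With only the core address space in the internal routing protocol, the external next-hop 192.0.2.1 is unreachable. Once the edge subnet 192.0.2.0/30 is advertised, the check succeeds.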
Is the prefix sent to external neighbours?
As the last step in troubleshooting BGP route propagation, you have to check whether the IP prefixes transported across your network are announced to your external BGP peers. The techniques for troubleshooting outbound BGP route propagation are explained in the Simple BGP troubleshooting article.
Is the traffic traversing the network?
Even if your BGP route propagation works flawlessly, the IP packets may not be able to traverse your network. (Remember, we're talking about pure IP networks here; things change a bit if you add MPLS to the mix.) The most common cause of a 'black hole' in your network is a router in the transit path that does not run BGP and consequently has no idea how to route the received IP packet toward the destination network.
NOTE: IP routing works hop by hop. Even though the ingress edge router knows exactly which egress edge router to use and how to get there, it cannot pass that information to the intermediate routers. All of them must therefore run BGP as well.
To identify a black hole in your network, perform a traceroute from your customer's network to a destination on the Internet. The last router responding to the traceroute is one hop before the black hole.
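If you collect traceroute results in a script, finding that last responding router is a short loop. The sketch below assumes the trace has already been parsed into a list with one entry per hop, using None for hops that timed out. Both the function name and that input format are assumptions made for this example.

```python
def hop_before_black_hole(hops, destination):
    """Return (hop_number, address) of the last responding router.

    hops: responder address per hop in order, or None for a timed-out hop
    Returns None if nothing answered or if the trace reached `destination`.
    """
    responders = [
        (number, address)
        for number, address in enumerate(hops, start=1)
        if address is not None
    ]
    if not responders:
        return None
    if responders[-1][1] == destination:
        # The packet arrived, so this trace does not show a black hole.
        return None
    return responders[-1]


trace = ["10.0.0.1", "10.0.1.1", None, None, None]
print(hop_before_black_hole(trace, "203.0.113.10"))
```

Here the router at hop 2 is the last one to answer, so the router one hop beyond it is the first place to check for a missing BGP configuration.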
Even though all core routers in your network have to run BGP, the internal BGP sessions don't have to follow the physical structure of the network. For example, you could have a few central routers acting as BGP route reflectors for all BGP routers in your network.
About the author: Ivan Pepelnjak, CCIE No. 1354, is a 25-year veteran of the networking industry. He has more than 10 years of experience in designing, installing, troubleshooting, and operating large service provider and enterprise WAN and LAN networks and is currently chief technology advisor at NIL Data Communications, focusing on advanced IP-based networks and web technologies. His books published by Cisco Press include EIGRP Network Design. You can read his blog here: http://ioshints.blogspot.com/index.html
This was first published in July 2007