Context : FaceBook

Suffered the biggest outage in its history on 4 October 2021

JasonCrawford's thread :

Facebook is down this morning, apparently due to a BGP problem.

What's BGP? It's an absolutely essential but fairly obscure internet protocol. I have a CS degree, but I only know about it because I did a summer internship with @Akamai a very long time ago.

A brief explainer:

One of the more mind-blowing facts about the Internet is that no one owns or manages all of it, and there is no central authority keeping track of all of its parts. Authority and responsibility are distributed among a large number of ISPs who manage independent networks.

Each ISP has a map of its own network, so its routing computers can route packets of information internally. But how does information go beyond the confines of one ISP? How does a browser on Comcast talk to a website on AT&T?

For this to happen, each ISP has to have some way of talking to the other ISPs and exchanging some sort of network information.

Enter BGP: the Border Gateway Protocol.

BGP organizes the Internet into 'autonomous systems' (ASes). Each ISP's network is an AS.

The ASes talk to each other at 'peering points', each of which is a bridge or gateway from one AS to another.

BGP is the protocol that routers at peering points speak to each other, to help each other understand the others' network.

They can't exchange complete maps of their internal networks—that would be too much info, and it changes too fast.

What they exchange is just information about which internet addresses (IP addresses) they contain, and which other ASes they peer with.

From that information, any router can form a high-level picture of what ASes they need to traverse to get to any other point on the Internet.
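To make the exchange concrete, here is a minimal Python sketch of what a router could do with such announcements. It is a toy model, not real BGP: the `Announcement` class, the AS numbers (taken from the range reserved for documentation) and the prefixes (documentation-only address blocks) are all assumptions made up for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Announcement:
    """Hypothetical, simplified stand-in for what one AS tells its peers."""
    asn: int                # the announcing AS number
    prefixes: list          # address blocks this AS contains, as CIDR strings
    peers: list             # AS numbers this AS peers with


def build_view(announcements):
    """Build the high-level picture: who owns which block, and who peers with whom."""
    origin = {}       # ip_network -> AS number that announced it
    adjacency = {}    # AS number -> set of neighbouring AS numbers
    for a in announcements:
        for prefix in a.prefixes:
            origin[ipaddress.ip_network(prefix)] = a.asn
        adjacency.setdefault(a.asn, set()).update(a.peers)
        for peer in a.peers:
            adjacency.setdefault(peer, set()).add(a.asn)
    return origin, adjacency


def origin_as(origin, address):
    """Return the AS containing this address (most specific block wins), or None."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in origin if ip in net]
    if not matches:
        return None
    return origin[max(matches, key=lambda net: net.prefixlen)]


# Illustrative data only: documentation ASNs and documentation prefixes.
ANNOUNCEMENTS = [
    Announcement(64496, ["192.0.2.0/24"], [64497]),
    Announcement(64497, ["198.51.100.0/24"], [64496, 64498]),
    Announcement(64498, ["203.0.113.0/24"], [64497]),
]

origin, adjacency = build_view(ANNOUNCEMENTS)
print(origin_as(origin, "203.0.113.7"))   # which AS holds this address
print(sorted(adjacency[64497]))           # who the middle AS peers with
```

Note that no internal topology appears anywhere in the data: only address blocks and peerings, which is the point the thread is making.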

For instance, suppose my browser, on my local Comcast network, is trying to reach a website in Japan on a local ISP there. And suppose those networks don't peer with each other directly, but AT&T peers with both.

My laptop only knows to send the request to a router on the Comcast network. But once it's in the middle of that network, the routers need to figure out which way to send it. They consult their AS map derived from BGP. This says that the shortest AS path is through AT&T.

So the Comcast routers say, aha, what I have to do is get this packet to AT&T, and they'll take care of it. That they know how to do because of their internal map of the Comcast network.

The AT&T routers, again because of BGP, know that they are directly connected to the Japanese ISP. So they consult their internal map to find where that peering point is, and route the packet there.

Finally it's dropped off at the Japanese peering point, and can then be routed internally to the website.
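The walk-through above can be sketched as a shortest-path search over an AS-level map. This is a deliberately simplified model: real BGP routers do not run a search over a global graph, they compare paths advertised by neighbours and apply local policy. The peering table below, including the two extra transit networks, is invented for illustration.

```python
from collections import deque

# Hypothetical AS-level peering map. Only the Comcast / AT&T / Japanese ISP
# relationship comes from the example; the other two networks are made up
# so that a longer alternative path exists.
PEERINGS = {
    "Comcast": {"AT&T", "OtherTransit"},
    "AT&T": {"Comcast", "JapanISP"},
    "OtherTransit": {"Comcast", "AnotherTransit"},
    "AnotherTransit": {"OtherTransit", "JapanISP"},
    "JapanISP": {"AT&T", "AnotherTransit"},
}


def shortest_as_path(peerings, src, dst):
    """Breadth-first search: returns the path crossing the fewest ASes, or None."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        # sorted() only makes the traversal order deterministic
        for neighbour in sorted(peerings.get(path[-1], ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


path = shortest_as_path(PEERINGS, "Comcast", "JapanISP")
print(" -> ".join(path))   # Comcast -> AT&T -> JapanISP
```

Each hop in the resulting path is a whole network; how a packet crosses Comcast or AT&T internally is left to each network's own internal map, as the thread describes.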

But all of that is possible only because of BGP.

You can imagine that if this gets misconfigured, it could be Very Bad™️. An entire AS could just disappear, getting sucked into an internet black hole. Apparently that's what happened to Facebook today.
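A rough way to picture an AS "disappearing" is a route table from which its address blocks have been withdrawn. The sketch below is a toy model and not a reconstruction of the actual incident: the `RouteTable` class, the AS names and the prefix (a documentation-only block) are assumptions for illustration.

```python
import ipaddress


class RouteTable:
    """Toy route table as seen by some other network: prefix -> AS path."""

    def __init__(self):
        self.routes = {}

    def announce(self, prefix, as_path):
        """Record that addresses in this block are reachable via this AS path."""
        self.routes[ipaddress.ip_network(prefix)] = list(as_path)

    def withdraw(self, prefix):
        """Forget the route for this block; no error if it was never announced."""
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the AS path for the most specific matching block, or None."""
        ip = ipaddress.ip_address(address)
        matches = [net for net in self.routes if ip in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = RouteTable()
table.announce("203.0.113.0/24", ["TransitAS", "ExampleAS"])  # hypothetical names
before = table.lookup("203.0.113.10")

table.withdraw("203.0.113.0/24")
after = table.lookup("203.0.113.10")

print(before)   # ['TransitAS', 'ExampleAS']
print(after)    # None: no network knows where to send the packet any more
```

The servers behind the withdrawn block may be perfectly healthy; from the outside they are unreachable simply because nobody holds a route to them.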

Disclaimer: this is a simplified, high-level, layman's summary, and it's based on 20-year-old knowledge that may have rusted, or things may be different now. No warranty express or implied. Read more:

PS: How did BGP, specifically, mess up Facebook? Here's a brief thread about it. Facebook seems to have put out an empty/null BGP message, causing all network hell to break loose:

The above explanation reminds me of MiddleSpace and the 2008Crash ... in a sense, at the highest level, "decentralized networks" are organized around a number of super-hubs that talk to each other through their own protocols.

In 2008, the clearing system between banks froze because banks stopped trusting each other. This is another case of an "elite" protocol breaking down.

Although we think of the internet as "flat" or "fractal", there are in fact "elite level" networks that may not follow the same rules or protocols as everyone else (e.g. BGP).

I guess this echoes ClayShirky's point in AGroupIsItsOwnWorstEnemy: that there's an inner circle whose interests trump everyone else's. So it's a conservative notion of networking.
