The outage of Facebook's services yesterday (the 4th) caused immense inconvenience for users, companies and society in general. There was much speculation during the event, mainly because the social network had not commented in detail on what had caused the crash. Now, however, engineers at Mark Zuckerberg’s company have started to publish preliminary analyses that help explain what actually caused the outage, which lasted about six hours.

According to Facebook, the problem affected the settings on the backbone routers that coordinate network traffic between data centers, which disrupted communication. This traffic failure had a ripple effect on the way data centers “talk” to each other, and this in turn caused the services to go down.

Facebook's data centers failed to communicate with each other, which caused the outage (Image: geralt-2021/Pixabay)

Indeed, the theory of a DNS failure was correct, but a succession of events followed that triggered an even bigger problem. Any error related to DNS servers tends to take time to fix because of a concept called “propagation time”, which can take hours: even if the routes are fixed quickly, servers take time to pick up the change and start replicating it.
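One common reason behind “propagation time” is that DNS resolvers cache answers for a set period (the record's TTL) before asking again. The sketch below is only a toy illustration of that idea, not a description of Facebook's systems: the `CachingResolver` class, the `example.test` name, the addresses and the one-hour TTL are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    address: str
    expires_at: float  # time (in seconds) after which the entry must be refreshed


class CachingResolver:
    """Toy resolver (hypothetical) that keeps an answer until its TTL expires."""

    def __init__(self, authoritative: dict, ttl: float):
        self.authoritative = authoritative  # stands in for the authoritative DNS data
        self.ttl = ttl
        self.cache = {}

    def resolve(self, name: str, now: float) -> str:
        entry = self.cache.get(name)
        if entry is not None and now < entry.expires_at:
            return entry.address  # cached answer is still considered valid
        # Cache miss or expired entry: ask the authoritative source again.
        address = self.authoritative[name]
        self.cache[name] = CachedRecord(address, now + self.ttl)
        return address


# Assumed values for illustration only.
authoritative = {"example.test": "192.0.2.1"}
resolver = CachingResolver(authoritative, ttl=3600)

first = resolver.resolve("example.test", now=0)      # fetched and cached
authoritative["example.test"] = "192.0.2.2"          # the record is corrected at the source
stale = resolver.resolve("example.test", now=1800)   # cache still serves the old answer
fresh = resolver.resolve("example.test", now=3600)   # TTL expired, new answer is fetched
```

In this sketch the corrected record only becomes visible once the cached entry expires, which is why a fix made at the source can take a while to reach every user.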

Impact on internal services

Because related services also sit within this infrastructure, many tools and even the company’s internal systems went down. This was the case, for example, with the turnstiles at the entrance to the Facebook building: they stopped working, and employees were unable to enter rooms and buildings to solve the problems. In some cases, according to the workers themselves, it was necessary to cut through doors and other physical barriers to enter the premises.

Facebook employees can’t enter the headquarters because their badges don’t work, and those already inside can’t enter various rooms because access is linked through the Internet of Things (IoT) and goes through the same DNS routes that no longer exist: #FacebookDown pic.twitter.com/8hAea9ZG4l

