Amazon reveals cause of AWS outage that took everything from banks to smart beds offline

Platforms including Signal, Snapchat, Roblox, Duolingo, as well as services such as banking sites and the Ring doorbell company, were some of the 2,000 companies affected by the Amazon Web Services outage on Monday. Photo: AP/Kiichiro Sato
Amazon has revealed the cause of this week’s hours-long Amazon Web Services (AWS) outage, which took everything from Signal to smart beds offline, was a bug in automation software that had widespread consequences.
In a lengthy outline of the cause of the outage, AWS revealed a cascading set of events brought down thousands of websites and apps that host their services with the company.
AWS said customers were unable to connect to DynamoDB, its database system where AWS customers store their data, due to “a latent defect within the service’s automated DNS [domain name system] management system”.
DynamoDB maintains hundreds of thousands of DNS records. It uses automation to monitor the system and update those records frequently, so that additional capacity is added as required, hardware failures are handled, and traffic is distributed efficiently.
The automation failed to repair the resulting fault on its own, and manual operator intervention was required to correct it.
Platforms including Signal, Snapchat, Roblox and Duolingo, as well as services such as banking sites and the Ring doorbell company, were among the 2,000 companies affected by the outage, according to Downdetector, a site that monitors internet outages, which received more than 8.1 million reports of problems from users across the world.
While services were restored in a matter of hours, the impact of the outage was felt widely.
Customers of Eight Sleep, a smart bed company whose beds connect to the internet so that users can control temperature and incline, found they could not adjust their beds during the outage because the phone app was unable to connect to them.
Dr Suelette Dreyfus, a computing and information systems lecturer at the University of Melbourne, said the outages showed how dependent the world was on single points of failure on the internet.
“That single point isn’t just AWS — they’re the biggest cloud provider with 30% or so of the market — but rather the cloud as a whole, which is basically just three companies,” she said.
“The internet was designed to be resilient; many other channels existed for routing around problems or attacks, but we’ve lost some of that resilience by becoming so dependent on a handful of giant tech companies to provide not just data storage but also house data services.”