Q&A: Why did Amazon’s cloud services go down on Tuesday?

A major outage disrupted Amazon’s cloud services on Tuesday, temporarily knocking out streaming platforms, apps such as Tinder and e-commerce services.

What exactly happened?

On Tuesday, at around 3.45pm Irish time, a problem with some AWS servers caused internet services around the world to slow down or fail to load. The outage temporarily knocked out everything from streaming video services and apps to Amazon’s own services.

An Amazon insider said the company was blaming an "as yet unknown source" for the outage, with an internal analysis pointing to traffic congestion across multiple network devices in the northern Virginia region.

Which services were affected?

It would probably be easier to say what wasn't; the effect was wide-ranging. Robot vacuums went silent as iRobot was hit, Tinder daters couldn't swipe right, and Disney+ addicts had to step back from their streaming. Trading app Robinhood was also hit, and Netflix, which runs almost all its infrastructure on AWS, saw its traffic suffer. Fans of Adele who were trying to log on to presales for her Las Vegas residency were left hanging as the outage caused the ticket sales to be pushed back a day.

Of course, Amazon’s own services were also hit, with Ring security cameras experiencing problems and its Prime Video streaming service suffering outages.

How long did it last?

About seven hours later, at 6pm eastern time (11pm Irish time), Amazon said it had largely solved the problem. However, the ripple effect took some time to lift, meaning some services were still experiencing problems even after Amazon fixed the issue at its end.

Why were so many services affected?

Cloud services have become a vital part of our internet infrastructure, allowing companies to grow their services quickly and relatively inexpensively. Instead of having to invest in their own data centres and servers, they can buy in the service from a provider such as AWS or Microsoft.

The problem, though, is that when one of these providers experiences problems, it affects a wide range of apps and services.

Whether we should be quite so dependent on a small number of providers is another debate; as we have seen, when a crucial part of internet infrastructure suffers a problem, the ripple effect means we all suffer. The technology that was meant to democratise the internet has also ended up being the very thing that has choked it on occasion.

Hasn’t this happened before?

Yes. Last November, AWS had an outage that lasted several hours, affecting sites such as Coinbase, Adobe, iRobot and the Washington Post. In that case, it was down to a typo.

Earlier this year, websites including The Irish Times, the Guardian, the New York Times, Reddit, Amazon, PayPal, Spotify and others were affected by intermittent outages when US-based content delivery network provider Fastly experienced issues. The company later blamed a bug in its software that was triggered when one of its customers changed their settings.