Hello… can you hear me?

Major AWS outage hits major sites from Adele tickets to Disney+

From Black Friday to Black Tuesday, from the excitement of thousands of transactions a second to, well… nothing.

Tuesday (7th) witnessed one of the largest outages of recent times as AWS (Amazon Web Services) suffered major problems with its US-East-1 Region infrastructure. As yet the cause has not been identified, with problems related to an API, its real-time auction software and overloaded firewalls all being offered as possible reasons.

Whatever the cause, the incident certainly did not go unnoticed.

Sites affected included UPlay, Disney+, Tinder and Capital One. AWS claims that over 90% of the world’s biggest game companies use its services, including those behind League of Legends, Valorant and Clash of Clans, so it is fair to say that more than a few gamers were not best pleased. Customers queuing in McDonald’s couldn’t use the menu screens, homes remained dark, and people’s pets went unfed because their web-connected auto feeders failed. Add to that the 24,000+ reports of problems with AWS logged by Downdetector, which tracks outages by collating status reports from several sources, and it was a highly visible ‘hiccup’.

And if you waited five years to see Adele perform again, a few more hours won’t have mattered too much, with Ticketmaster tweeting: “Due to an Amazon Web Services (AWS) outage impacting companies globally, all Adele Verified Fan Presales scheduled for today have been moved to tomorrow to ensure a better experience.”

What the outage did surface is how reliant many of the world’s apps and services are on cloud computing infrastructure provided by a relatively small number of global players. According to Gartner, AWS accounts for 41% of the total cloud computing market, with Microsoft (recording 60% growth in 2020 alone), Alibaba, Google and Huawei accounting for a good chunk of the rest.

Whilst this week’s incident caught the world’s attention (making the BBC news website is usually a good barometer of an outage’s impact), it was in fact the 27th such AWS outage over the past 12 months according to Tooltester.

So, should we be worried about the cloud? Is it time to consider alternatives?

Quite simply, no.

No technology is infallible; failures will happen. The key is ensuring that plans are in place to mitigate the impact. For example, it might be a sensible strategy to host static web and app content in a second cloud, enabling failover and at least presenting users with some level of service. Apply network intelligence to ensure, for example, that LDAP services are multi-region. And test and back up! Test and back up your systems at every opportunity.
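
To make the fallback idea concrete, here is a minimal Python sketch of trying a primary site first and falling back to a static copy hosted elsewhere. It is an illustration only, not a production design: the function names (`http_fetch`, `fetch_with_failover`) and the example hostnames are assumptions made up for this sketch, and a real deployment would more likely handle failover at the DNS, CDN or load balancer level.

```python
import urllib.request
from typing import Callable, Sequence, Tuple


def http_fetch(url: str, timeout: float = 3.0) -> bytes:
    """Fetch a URL with a short timeout so a dead origin fails fast."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_failover(
    origins: Sequence[str],
    path: str,
    fetch: Callable[[str], bytes] = http_fetch,
) -> Tuple[str, bytes]:
    """Try each origin in order and return (origin, content) from the first that answers.

    `origins` is an ordered list of base URLs (hypothetical here), e.g. the
    primary site followed by a static copy hosted with a second provider.
    """
    last_error = None
    for origin in origins:
        try:
            return origin, fetch(origin.rstrip("/") + path)
        except OSError as exc:
            # URLError, HTTPError, timeouts and connection errors are all
            # OSError subclasses; remember the error and try the next origin.
            last_error = exc
    raise RuntimeError(f"all origins failed for {path}") from last_error


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    origins = ["https://www.example.com", "https://static-backup.example.org"]
    try:
        origin, body = fetch_with_failover(origins, "/index.html")
        print(f"served {len(body)} bytes from {origin}")
    except RuntimeError as err:
        print(f"no origin available: {err}")
```

The `fetch` parameter is injectable so the fallback path can be exercised in a test without waiting for a real outage, which fits the advice to test at every opportunity.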

Disaster recovery environments are costly and, much like household insurance, you often resent spending money on services you hope you never need, but… the risk is too high not to. Just ask AWS. One of the biggest companies impacted by this outage was Amazon itself. It is estimated that this outage has cost it millions and millions of dollars with the downtime of Prime Video and its inability to fulfil its own orders.

As with any tech, never judge its performance only on how it behaves when things work well. Always plan for failure.

Remember, if you need any advice on hosting, backup or disaster recovery, we’re only a call away and would be happy to help.