The AWS Outage of 2025: When the World’s Biggest Cloud Blinked

On Monday morning, October 20, 2025, the internet hit a massive speed bump. Amazon Web Services (AWS), the world’s biggest cloud computing provider, suffered a major outage that affected more than 1,000 apps, websites, and workflows.

Major platforms such as ChatGPT, Perplexity, Canva, HBO Max, Duolingo, and many others across the globe were affected. In this blog, let’s decode the outage that froze much of the internet for a while and explore it in detail.

A holistic view of the AWS Outage of 2025

The outage began with a DNS issue that prevented AWS services from discovering and connecting to each other.
The DNS issue made DynamoDB unreachable, and the many AWS services and customer applications that depend on DynamoDB failed along with it.
As a result, AWS EC2 instances and Lambda functions also ran into connectivity problems.
In total, around 113 services were affected.
The official AWS Service Health Dashboard last reported that all AWS services had returned to normal operations by 3:01 PM PDT on October 20.
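
The cascade described above can be sketched as a small dependency graph. This is only an illustration built from the relationships named in this article (DNS, DynamoDB, EC2, Lambda, customer apps). The DEPENDS_ON map and the impacted_by helper are hypothetical names, and the real AWS dependency graph is far larger and not public.

```python
from collections import deque

# Simplified dependency map taken from the article's description.
# Each key depends on the components listed as its value (illustrative only).
DEPENDS_ON = {
    "DynamoDB": ["DNS"],
    "EC2": ["DynamoDB"],
    "Lambda": ["DynamoDB"],
    "Customer apps": ["DynamoDB", "EC2", "Lambda"],
}

def impacted_by(failed, depends_on=DEPENDS_ON):
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from a failed component to its dependents.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    seen, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in seen:
                seen.add(service)
                queue.append(service)
    return sorted(seen)

print(impacted_by("DNS"))  # ['Customer apps', 'DynamoDB', 'EC2', 'Lambda']
```

In this toy model, a failure at the bottom of the graph reaches everything above it, which is the pattern the outage followed.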

Key timeline of the outage:

How it all started:
Oct 19, 11:49 PM PDT – AWS services in the US-EAST-1 region started experiencing increased error rates and latency. DynamoDB and services that rely on it, such as IAM and DynamoDB Global Tables, were particularly affected.

Root-cause Identified:
Oct 20, 2:01 AM PDT – The AWS team identified a potential root cause for the error rates on the DynamoDB APIs in the US-EAST-1 Region. The issue was related to DNS resolution of the DynamoDB API endpoint in US-EAST-1.

Early Recovery:
Oct 20, 2:24 AM PDT – The DNS issue was resolved and AWS services started recovering, but users still had trouble launching EC2 instances because of EC2’s dependency on DynamoDB.

Handling Backlogs:
Oct 20, 3:35 AM – 5:10 AM PDT – The AWS team worked through backlogs for Lambda, SQS queues, and other dependent services. Early signs of recovery appeared for EC2 instance launches in some Availability Zones.

Network and EC2 Recovery:
Oct 20, 6:42 – 10:38 AM PDT – AWS applied mitigations to network load balancers and EC2 subsystems. This gradually improved connectivity. At the same time, throttling for EC2 instance launches was carefully reduced, allowing more users to launch instances successfully.

Continued Recovery:
Oct 20, 10:38 AM – 3:01 PM PDT – AWS progressively restored all services. Lambda, Redshift, and Connect worked through backlogs. By 3:01 PM PDT, all services returned to normal operations, though some analytics and reporting data were still being processed.
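
To put the timeline in perspective, the elapsed time between its milestones can be worked out with a few lines of Python. The timestamps below are copied from the timeline above. The event labels and the elapsed helper are made-up names for this sketch.

```python
from datetime import datetime

# Timestamps copied from the timeline above (all PDT, so no timezone math is needed).
EVENTS = {
    "first_errors": datetime(2025, 10, 19, 23, 49),
    "root_cause_identified": datetime(2025, 10, 20, 2, 1),
    "dns_resolved": datetime(2025, 10, 20, 2, 24),
    "all_services_normal": datetime(2025, 10, 20, 15, 1),
}

def elapsed(start, end, events=EVENTS):
    """Return the time between two named events as an 'Xh YYm' string."""
    minutes = int((events[end] - events[start]).total_seconds() // 60)
    return f"{minutes // 60}h {minutes % 60:02d}m"

print(elapsed("first_errors", "dns_resolved"))         # 2h 35m
print(elapsed("dns_resolved", "all_services_normal"))  # 12h 37m
print(elapsed("first_errors", "all_services_normal"))  # 15h 12m
```

By this timeline, the DNS problem itself took about two and a half hours to resolve, while the knock-on recovery took more than twelve hours after that.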

When Apps Stopped Working: Snapchat, Netflix, Alexa, and More Affected

During the outage, which lasted more than 12 hours, several popular websites and apps experienced disruptions. Among the services impacted were WhatsApp, the UK government’s website and tax services, the cryptocurrency exchange Coinbase, The New York Times’ games, and The Wall Street Journal’s paywall. Many other businesses and platforms, including Amazon, Hulu, Snapchat, McDonald’s, and Fortnite, were also disrupted.

According to Downdetector, a website where users report outages and service health issues, more than 11 million reports were submitted across all services during the AWS outage, including 3 million in the U.S. alone.

Unpacking the Core Issue: A DNS Breakdown in DynamoDB

The outage started with an issue in DynamoDB’s DNS system that led to the failure of several related AWS services. Let’s take a deep dive into this and attempt a root cause analysis.

What is DynamoDB, and why is it important?

Data is commonly grouped into three types: structured, unstructured, and semi-structured. For unstructured data we have AWS S3 buckets, for structured data we use the AWS RDS database services, and for semi-structured data with no fixed structure we use AWS DynamoDB, a fully managed NoSQL database offered by Amazon Web Services.
DynamoDB is serverless, which means you do not have to manage or worry about the underlying server infrastructure, and that makes it one of the most important AWS services. It offers high availability backed by a 99.99% SLA (Service Level Agreement), with data replicated across multiple Availability Zones. It also has low latency and is easy to use.
All these features make DynamoDB a strong choice for storing semi-structured data. In my own experience it is also cheaper than comparable offerings from other providers, which may be one of the reasons so many big companies rely on it.

In this outage

The DNS record for DynamoDB in the US-EAST-1 region failed, and DynamoDB effectively went down with it.
So, when systems tried to reach dynamodb.us-east-1.amazonaws.com, they got nothing. To all apps and services, whether running inside AWS or outside it, it looked like DynamoDB had completely disappeared from the internet.
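
From a client’s point of view, “getting nothing” usually means the hostname lookup fails before any request is sent. The sketch below shows what that looks like using only Python’s standard library. The resolve_endpoint function and its resolver parameter are hypothetical names added here so the lookup can be swapped out in tests. This is not how the AWS SDKs handle resolution internally.

```python
import socket

def resolve_endpoint(hostname, port=443, resolver=socket.getaddrinfo):
    """Return the sorted unique IP addresses for hostname, or [] if DNS resolution fails."""
    try:
        infos = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # No usable DNS answer: to the caller, the service has vanished.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling resolve_endpoint("dynamodb.us-east-1.amazonaws.com") on a healthy day should return one or more IP addresses. An empty list is what a client would effectively have seen during the incident, even if the database servers behind the name were running.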

DynamoDB Was Down for Three Hours

This outage lasted about three hours, during which DynamoDB in the US-EAST-1 region was completely inaccessible. The failure at this single point spread through the AWS ecosystem like a row of dominoes, impacting not only EC2 instances but also Lambda functions, AWS IAM, and a large number of customer applications that depend on DynamoDB for semi-structured data storage. Even after the DNS issue was resolved, AWS teams worked through the backlogs and restored the dependent services gradually and carefully, so certain applications experienced delays before full functionality returned.

And then AWS EC2 went down for 12 hours

As soon as the DynamoDB issue was resolved, the AWS EC2 service went down and the situation became even worse. So what happened?

There’s something called DropletWorkflow Manager (DWFM) that EC2 uses to keep track of all its servers. Think of it as a scheduler that knows which server is free and which is in use. Every few minutes, it checks the status of each server and stores that information in DynamoDB.

DWFM tracks a lease for every server to know whether the server is occupied, so that a free server can be allocated to another EC2 customer.

When DynamoDB was down:

Because the status info couldn’t be updated, the system thought many servers were unavailable.
Anyone trying to launch a server got error messages saying “no capacity,” even though servers were actually free.
Once DynamoDB came back, there were too many servers to update at once. The system got “overloaded” and couldn’t catch up quickly.
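
The three points above can be illustrated with a toy simulation. It is a deliberately simplified model and not AWS’s actual DWFM implementation. The ToyLeaseManager class, the LEASE_TTL value, and the renewal rate are all assumptions made up for this sketch. A host counts as launchable only while its lease is fresh, leases cannot be renewed while the state store is down, and after recovery only a limited number of renewals can be processed per tick.

```python
from dataclasses import dataclass, field

LEASE_TTL = 3  # ticks a lease stays valid without renewal (illustrative value)

@dataclass
class ToyLeaseManager:
    """Toy stand-in for a lease tracker; not AWS's DWFM implementation."""
    hosts: list
    renewals_per_tick: int  # how many leases can be renewed in one tick
    last_renewed: dict = field(default_factory=dict)

    def __post_init__(self):
        # Every host starts with a lease renewed at tick 0.
        self.last_renewed = {host: 0 for host in self.hosts}

    def tick(self, now, store_available):
        """Renew the stalest leases first, limited by renewals_per_tick."""
        if not store_available:
            return  # status cannot be written, so no lease is renewed
        stalest_first = sorted(self.hosts, key=lambda h: self.last_renewed[h])
        for host in stalest_first[: self.renewals_per_tick]:
            self.last_renewed[host] = now

    def launchable(self, now):
        """Hosts whose lease is still fresh and can accept new instances."""
        return [h for h in self.hosts if now - self.last_renewed[h] <= LEASE_TTL]

def simulate(num_hosts=10, renewals_per_tick=4, outage=range(1, 7), total_ticks=10):
    """Return the number of launchable hosts at each tick."""
    manager = ToyLeaseManager([f"host-{i}" for i in range(num_hosts)], renewals_per_tick)
    history = []
    for now in range(1, total_ticks + 1):
        manager.tick(now, store_available=now not in outage)
        history.append(len(manager.launchable(now)))
    return history

print(simulate())  # [10, 10, 10, 0, 0, 0, 4, 8, 10, 10]
```

In this run the store is down for ticks 1 to 6. Capacity looks fine until the leases expire, then drops to zero even though no host has actually failed, and it comes back only gradually once renewals resume. The real system was far larger, which is part of why the catch-up took hours rather than a few ticks.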

It took AWS engineers hours of careful work to fix the problem. In total, EC2 users faced around 12 hours of disruption, showing how one small problem can ripple across the cloud.

So, that was the whole story. AWS has also posted a summary of this outage on its official page.

Too Big to Fail? Rethinking Our Dependence on AWS

The AWS outage of 2025 raises a major question: are we relying too much on just one cloud provider? Millions of applications, websites, and companies use AWS for services ranging from databases to hosting. When a single service like DynamoDB fails, the consequences are felt all over the world, leaving users frustrated while companies scramble to fix the problem.

AWS has long been associated with reliability and high uptime, but this event has demonstrated that any system, anywhere, is a possible point of failure. It is therefore wise to adopt redundancy, multi-cloud strategies, and disaster recovery plans so that a single outage does not freeze entire operations.

Preparing for the Next Blackout: Lessons for Businesses

This AWS outage is an alarm for companies of all sizes. Relying on a single provider, no matter how reputable, carries risk. Businesses can reduce exposure by diversifying cloud providers, setting up multi-region deployments, and creating fallback systems. Outages are inevitable in technology, but the pain they cause can be minimized with foresight, planning, and resilient architecture.
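
As a concrete illustration of a fallback system, here is a minimal sketch of region failover at the application level. The call_with_fallback function and the choice of us-west-2 as the secondary region are assumptions made for this example. It also assumes the data is already replicated to the second region (for instance with DynamoDB Global Tables), which is the harder part in practice.

```python
def call_with_fallback(operation, regions, retriable=(ConnectionError, TimeoutError)):
    """Try operation(region) for each region in order and return (region, result).

    `operation` is any callable that takes a region name, e.g. a function that
    builds a client for that region and performs one read. Hypothetical helper.
    """
    errors = {}
    for region in regions:
        try:
            return region, operation(region)
        except retriable as exc:
            # Remember why this region failed, then move on to the next one.
            errors[region] = exc
    raise RuntimeError(f"all regions failed: {errors}")
```

A caller might use call_with_fallback(read_profile, ["us-east-1", "us-west-2"]), where read_profile is the application’s own function. Only connection-style errors trigger failover here, so genuine application errors still surface immediately instead of being retried elsewhere.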
