AWS admits more bits of its cloud broke as it recovered from DynamoDB debacle

Amazon Web Services has revealed that its efforts to recover from the massive mess at its US-EAST-1 region caused other services to fail.

The most recent update to the cloud giant’s service health page opens by recounting how a DNS mess meant services could not reach the DynamoDB API endpoint, which led to widespread outages.
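For a sense of what that failure looks like from the client side, here is a minimal sketch (an illustration, not AWS's own tooling) that resolves the regional DynamoDB endpoint the SDKs contact before every request; when that name stops resolving, calls fail before they even leave the caller.

```python
import socket

# Regional DynamoDB endpoint that AWS SDKs resolve before every request.
# If this lookup fails, no DynamoDB API call in the region can succeed.
ENDPOINT = "dynamodb.us-east-1.amazonaws.com"

try:
    addresses = socket.getaddrinfo(ENDPOINT, 443, proto=socket.IPPROTO_TCP)
    for *_, sockaddr in addresses:
        print(f"{ENDPOINT} resolves to {sockaddr[0]}")
except socket.gaierror as exc:
    # Roughly what clients saw during the outage: the name would not
    # resolve, so requests failed without ever reaching DynamoDB.
    print(f"DNS resolution failed for {ENDPOINT}: {exc}")
```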


AWS got that sorted at 02:24 AM PDT on October 20th.

But then things went pear-shaped in other ways.

“After resolving the DynamoDB DNS issue, services began recovering but we had a subsequent impairment in the internal subsystem of EC2 that is responsible for launching EC2 instances due to its dependency on DynamoDB,” the status page explains. Not being able to launch EC2 instances meant Amazon’s foundational rent-a-server offering was degraded, a significant issue because many users rely on the ability to automatically create servers as and when needed.
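By way of illustration, the kind of control-plane call that was impaired looks roughly like the sketch below. It assumes boto3 and uses a placeholder AMI ID; the RunInstances action is what auto-scaling and deployment tooling lean on to create servers on demand.

```python
import boto3
from botocore.exceptions import ClientError

# Hypothetical example of the EC2 RunInstances call that auto-scaling and
# deployment tooling depend on; the AMI ID below is a placeholder.
ec2 = boto3.client("ec2", region_name="us-east-1")

try:
    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder AMI, not a real image
        InstanceType="t3.micro",
        MinCount=1,
        MaxCount=1,
    )
    print("Launched:", response["Instances"][0]["InstanceId"])
except ClientError as exc:
    # During the incident, new launch requests failed or were throttled
    # even though already-running instances kept working.
    print("Launch failed:", exc.response["Error"]["Code"])
```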

While Amazonian engineers tried to get EC2 working properly again, “Network Load Balancer health checks also became impaired, resulting in network connectivity issues in multiple services such as Lambda, DynamoDB, and CloudWatch.”

AWS recovered Network Load Balancer health checks at 9:38 AM, but “temporarily throttled some operations such as EC2 instance launches, processing of SQS queues via Lambda Event Source Mappings, and asynchronous Lambda invocations.”

The cloud colossus said it throttled those services to aid its recovery efforts which, The Register expects, means it chose not to honor every request for resources because a flood of jobs would have overwhelmed its still-recovering systems.
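On the receiving end, that throttling shows up as errors such as ThrottlingException or RequestLimitExceeded, and the standard defence is to retry with exponential backoff and jitter. The sketch below is illustrative only, assuming boto3 and a hypothetical helper named call_with_backoff.

```python
import random
import time

import boto3
from botocore.exceptions import ClientError

# Error codes AWS services commonly return when requests are being throttled.
THROTTLE_CODES = {"ThrottlingException", "RequestLimitExceeded", "TooManyRequestsException"}

def call_with_backoff(fn, max_attempts=5, base_delay=0.5, **kwargs):
    """Call an AWS API, retrying throttled requests with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(**kwargs)
        except ClientError as exc:
            code = exc.response["Error"]["Code"]
            if code not in THROTTLE_CODES or attempt == max_attempts:
                raise
            # Double the wait each attempt, with jitter to avoid synchronized retries.
            delay = base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5)
            time.sleep(delay)

# Example: list Lambda functions, backing off if the control plane throttles us.
lam = boto3.client("lambda", region_name="us-east-1")
result = call_with_backoff(lam.list_functions, MaxItems=10)
print(len(result.get("Functions", [])), "functions returned")
```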

“Over time we reduced throttling of operations and worked in parallel to resolve network connectivity issues until the services fully recovered,” the post states.

By 3:01 PM, all AWS services had returned to normal operations, meaning problems persisted for more than a dozen hours after the DynamoDB debacle was resolved.

AWS also warned that the incident is not completely over, as “Some services such as AWS Config, Redshift, and Connect continue to have a backlog of messages that they will finish processing over the next few hours.”

The post ends with a promise to “share a detailed AWS post-event summary.”

Grab some popcorn. Unless you have an internet-connected popcorn machine, which recent history tells us may be one of a horrifyingly large number of devices that stops working when major clouds go down. ®

