AWS ‘power event’ disrupts largest U.S. region
Mon 4 Jun 2018
On Thursday, a power issue at an AWS data centre in Northern Virginia disrupted services throughout the US-EAST-1 region, the largest in the United States.
At 3:13 PM, the company's Services Dashboard said that it was investigating connectivity issues in the region, and later confirmed that a data centre issue affecting physical servers and networking devices was impacting connectivity in one of the US-EAST-1 Availability Zones. At the time, Amazon said that customers with EC2 instances might experience connectivity problems.
Most services were brought back online within an hour. However, Amazon warned that the data centre issue had caused hardware failures and that some instances not backed up to a separate availability zone might not be recoverable.
A message on the AWS Services Dashboard at 5:36 PM, approximately two hours after the power event, noted: “We have been working to recover the remaining instances and volumes. The small number of remaining instances and volumes are hosted on hardware which was adversely affected by the loss of power. While we will continue to work to recover all affected instances and volumes, for immediate recovery, we recommend replacing any remaining affected instances or volumes if possible.”
Affected services included EC2, EBS, RDS, Redshift and WorkSpaces. Although the power event was largely resolved, customer connectivity was also affected by a disruption in the US-EAST-2 region the same day.
AWS US-East’s historical problems
Power issues were also behind the AWS outage in March 2018, which affected the same US-EAST-1 region. At that time, customers experienced connectivity issues and packet loss while Amazon worked to reroute traffic to unaffected network peering facilities.
The March outage caused disruption for services including Slack, Twilio, and Atlassian as well as Alexa and AWS Direct Connect.
As the largest and oldest Amazon Web Services region, US-EAST-1 is also noted for having the largest number of high-profile outages and issues. These include the March 2018 outage, major S3 storage issues in February 2017, and EC2 issues going all the way back to 2011, when traffic was routed incorrectly, resulting in several days of errors and issues for AWS customers.