Google suffers European cloud outage, human error to blame
Mon 30 Nov 2015
Google’s Compute Engine cloud platform suffered an outage in Europe last week, following an accidental denial of service caused by a manually activated peering link from an unnamed network carrier.
The outage occurred in Google’s Europe1-West zone and sent the service offline in the region for around 70 minutes. According to a Google update, its engineering team manually activated a peering advertisement link from the network, skirting the automatic validation checks.
The region’s traffic dropped by 13% during the accidental outage, with most affected users residing in eastern Europe and the Middle East.
The incident, which occurred last Monday, has only today been explained by Google. It described that during the activation of the link, the engineers had not anticipated that it would advertise far more capacity than was available at the time. ‘Google’s network responded accordingly by routing a large volume of traffic to the link. At 11:55, the link saturated and began dropping the majority of its traffic,’ the blog post read.
The company assured that automated safety checks would have typically detected and corrected the issue.
‘In this case, the automation was not operational due to an unrelated failure, and the link was brought online manually, so the automation’s safety checks did not occur,’ the firm continued in the statement. Amending its operations policy, Google has now taken measures to ensure that manual link activation cannot be actioned. It now stipulates that links should not be set live unless processed by automated mechanisms, including safety checks before and after activation, and monitoring immediately after activation – providing ‘redundant error detection.’
In August, Google suffered a further outage in Europe when a lightning storm in Belgium hit one of its data centres causing power loss and damage to disk storage, leaving customers without access to data.
An additional two outages hit Google in March at the hands of a faulty patch which affected some virtual machines (VMs) in an abnormal manner.