A lightning strike in Dublin on Sunday caused a power failure in data centers belonging to Amazon and Microsoft, knocking the companies' cloud services offline.
Lightning struck a transformer, sparking an explosion and fire that caused the power outage at 10:41 AM PDT, according to preliminary information Amazon posted on its Service Health Dashboard. Under normal circumstances, backup generators would seamlessly kick in, but the explosion also knocked out some of those generators.
By 1:56 PM PDT, power to the majority of network devices had been restored, allowing Amazon to focus on bringing EC2 (Elastic Compute Cloud) instances and EBS (Elastic Block Store) volumes back online. But progress was slower than expected, Amazon said a couple of hours later.
“We know many of you are anxiously waiting for your instances and volumes to become available, and we want to give you more detail on why the recovery of the remaining instances and volumes is taking so long,” the company wrote at 11:04 PM PDT. “Due to the scale of the power disruption, a large number of EBS servers lost power and require manual operations before volumes can be restored … While many volumes will be restored over the next several hours, we anticipate that it will take 24-48 hours until the process is completed.”
To speed up the recovery process, Amazon started adding more EBS capacity.

European customers of Microsoft's Business Productivity Online Standard Suite were also affected by the power outage. But services were restored to all customers by 5:45 PM PDT, a Microsoft spokesman said via e-mail.
Dutch company Layar, whose augmented reality platform has been running on Amazon’s cloud for the last 18 months, was one of the affected companies.
"I've been continuously following #AWS on Twitter and looking at our AWS dashboards to see the progress. It's frustrating. There's nothing you can do except wait," Dirk Groten, CTO at Layar, said in a blog post published at about midnight between Sunday and Monday, local time.
In an interview on Monday morning local time, Groten said his company’s service was up and running again and that he is still a believer in cloud services. Any data center can experience an outage following a power failure, he said.
It was clear that Amazon was working hard to restore services, but the information the company provided didn't always match what Layar was seeing at its end, according to Groten. Layar is now waiting for Amazon to publish a final report on the incident, which should include what the company plans to do to prevent something similar from happening again, Groten said.