Major Cloud Outages of 2012 to Learn From
According to a recent report by the International Working Group on Cloud Computing Resiliency, a cloud computing service is down for an average of 7.5 hours each year. Companies that run all or part of their operations in the cloud were severely affected this year. Here are some of the biggest outages cloud users suffered in 2012:
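To put the 7.5-hour figure in the terms SLAs usually use, the short sketch below converts yearly downtime into an availability percentage and back. It assumes a 365-day year (8,760 hours), and the function names are illustrative, not taken from the report.

```python
# Assumption: a non-leap year of 365 days is used as the measurement period.
HOURS_PER_YEAR = 365 * 24  # 8760


def availability_percent(downtime_hours: float,
                         period_hours: float = HOURS_PER_YEAR) -> float:
    """Return availability as a percentage for the given downtime."""
    return 100.0 * (1.0 - downtime_hours / period_hours)


def allowed_downtime_hours(availability_pct: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime budget (in hours) implied by an availability target."""
    return period_hours * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    # 7.5 hours of downtime per year, as cited in the report
    print(f"{availability_percent(7.5):.3f}% available")
    # Downtime budget for a "three nines" (99.9%) target
    print(f"{allowed_downtime_hours(99.9):.2f} hours per year")
```

Under these assumptions, 7.5 hours of downtime works out to roughly 99.91% availability, which only just meets a 99.9% ("three nines") target with its budget of about 8.76 hours per year.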
Microsoft Windows Azure
Microsoft Windows Azure suffered an extensive, worldwide outage in February that wasn’t fully resolved for more than 24 hours. The outage affected multiple geographic regions, including Western Europe, Northern Europe, East Asia, and the U.S. Microsoft said the outage was caused by a software bug related to a “time calculation that was incorrect for the leap year.” The outage drew an angry reaction from customers, who expected more communication about the issue from Microsoft. In July, the Windows Azure cloud computing service went down again in Western Europe for about 2.5 hours. The incident was caused by a “misconfigured network device,” and the interruption resulted in connectivity issues for customers. As recently as this fall, another outage, this time with Office 365, left millions of outsourced mailboxes without service.
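Microsoft's statement does not spell out the faulty calculation, so the sketch below only illustrates one common class of leap-year bug: computing "one year from today" by incrementing the year field. It is a hypothetical example, not Azure's code, and the function names are made up for illustration.

```python
from datetime import date


def naive_one_year_later(d: date) -> date:
    """Increment the year directly.

    Raises ValueError for Feb 29, because the following year has no Feb 29.
    """
    return d.replace(year=d.year + 1)


def safe_one_year_later(d: date) -> date:
    """Increment the year, falling back to Feb 28 when Feb 29 does not exist."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        # Only reachable when d is Feb 29 of a leap year.
        return d.replace(year=d.year + 1, day=28)


if __name__ == "__main__":
    leap_day = date(2012, 2, 29)
    try:
        naive_one_year_later(leap_day)
    except ValueError as exc:
        print("naive version failed:", exc)
    print("safe version:", safe_one_year_later(leap_day))
```

Falling back to Feb 28 is one design choice; rolling forward to Mar 1 is equally defensible. What matters is that the leap-day case is handled explicitly and covered by a test.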
Amazon Web Services
An Amazon Web Services power outage in June cut services to customers for about six hours. The Amazon services affected included Amazon Elastic Compute Cloud, Amazon Relational Database Service, and AWS Elastic Beanstalk, which are run from Amazon’s U.S. East region datacenters in Virginia. Among those affected were cloud managed services and platform providers, including Stratalux, Digitaria, and Heroku, the cloud platform-as-a-service provider owned by Salesforce.com. Well-known sites such as Netflix, Pinterest, Reddit, Foursquare, and Instagram were also among those affected. Less than a month later, a second outage hit Amazon Web Services. One client publicly announced that it would stop using Amazon’s cloud services and switch to an alternative provider.
Apple iCloud
In September, a number of users of Apple’s iCloud service found they could not access their e-mail. Because the problem was in the central iCloud service, users were affected on their Macs, on iOS devices, and in Apple’s Webmail interface on iCloud.com.
Google Gmail
Gmail suffered multiple outages this year. The first happened in April and lasted for one hour. It affected fewer than 10% of Gmail users, and the root cause was a misconfiguration that occurred during a routine upgrade. The second happened in June and affected fewer than 1.5% of users.
Despite the many precautions taken by cloud service providers, outages happen on a regular basis because of factors such as human error, technical glitches, and natural disasters. The impact of these factors can be limited by developing a comprehensive disaster recovery and failover plan. Follow my future articles as I share best practices on building toward a 100% SLA.
By Rick Blaisdell