Lessons In Cloud Fail Damage Control
Detractors of cloud computing received unexpected ammunition for their arguments this past weekend, as a pair of noteworthy power failures sent the Internet reeling.
Friday night (June 29), a lightning storm temporarily knocked out a sizable section of Amazon Web Services' cloud computing platform. Sites run by AWS' enormous clientele, of which Netflix, Pinterest, and Instagram shine most brightly, were rendered unavailable for hours on end. Customers using websites and resources powered by AWS were neither provided with sufficient information to understand the sudden outage nor reassured that their then-inaccessible data would remain protected.
Across the pond, customers of the Royal Bank of Scotland (RBS) experienced similar panic, induced by a failure in late June. RBS, alongside NatWest, completely shut down its capacity to take inbound payments from customers, a failure which has continued to linger through to this week. Many of those banking with RBS have been left unable to access the funds needed for expenses like bills and food. Technological errors in RBS' systems, stemming from cloud-associated infrastructure, instigated the bank's dilemma. Mirroring the AWS debacle, customers felt deserted by a lack of transparency. The distraught mood was intense enough to arouse the BBC's interest in tracking the carnage.
Rivals of AWS have due cause to hurl the company's cloud gaffe in its face as a deserved brickbat. And RBS customers are justified in their ire against a bank that ostensibly values data security over transparency with those who fund it. Yet pausing the anger allows some rational realizations to surface.
In the case of Amazon Web Services, it's unlikely that its competitors in cloud storage outsourcing would have completely circumvented the crisis. AWS has only one figure at its apex: Jeff Bezos, to whom all staff at the company must answer. Other companies feature a group of leaders, and in the event of a cloud-affecting power crisis, that can result in too many chefs in the kitchen.
As for RBS, a less convoluted and more apologetic mea culpa was in order. Blighted customers instead received a vapid press release. "The need to first establish at what point processing had stopped delayed subsequent batches and created a substantial backlog," read one of the release's clearer statements.
Prospects for preventing such incidents in the future remain grim. Interviewed by GigaOM, engineer par excellence Geoff Arnold had this to say about whether the IT community could hope to better manage large distributed systems: "I used to think so, but I'm getting more cynical now."
By Jeff Norman