CloudTweaks | Backups, Storage, and challenge-response mechanisms

On A Clear Day: Downtime In Cloud Computing

As most know, and as the recent Azure outage kindly reminded those who don’t, cloud storage isn’t infallible. Power outages happen. Software is prone to bugs. Human error can never be ruled out. Service providers openly acknowledge the possibility of failure. Some, like Amazon, offer to spread data across eight regional data centers around the world for an extra fee, while others offer discounts if the guaranteed uptime is not met.

The good news is that power outages don’t happen often, and they rarely affect the whole service. It’s still a disconcerting thought, however, that downtime or, in the worst-case scenario, data loss can disrupt your own app, resulting in huge losses and angry customers. Furthermore, if entire data centers go down, the apps and websites hosted there can be unavailable for hours at a time.

Notable outages in 2014

There have been outages both big and small in 2014. For what it’s worth, Netflix was down for two hours on January 3rd, and Dropbox was unavailable for three hours a week later. Azure’s recent, though only partial, outage was perhaps the biggest this year, at least as far as headlines are concerned. However you look at it, that doesn’t amount to much overall downtime.

That being said, a 2013 research paper estimates the 2007-2012 losses caused by cloud service providers at $273m, which might not be an astonishing figure given the size of the industry. It’s certainly an enlightening one, however. The bottom line is that no service can realistically guarantee even near-constant uptime. A guarantee of 99.9% uptime still allows roughly 43 minutes of downtime in a 30-day month, and a few unlucky businesses can very well be hurt by that. Certain types of businesses, such as brokerage firms, can suffer immensely from any downtime at all.
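The downtime figure is simple arithmetic, and it is worth running for whatever uptime percentage a provider promises. Here is a minimal sketch; the function name and the 30-day month are assumptions made for illustration, not part of any provider's terms.

```python
def allowed_downtime_minutes(uptime_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted over `days` days at the given uptime guarantee."""
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    # Compare a few common uptime guarantees over an assumed 30-day month.
    for sla in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(sla)
        print(f"{sla}% uptime allows about {minutes:.1f} minutes of downtime")
```

Each extra "nine" cuts the allowance by a factor of ten, which is why the difference between 99.9% and 99.99% matters so much to businesses that cannot tolerate interruptions.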

Backups, spread storage, and challenge-response mechanisms

Giants like Netflix can afford to spread their storage across the globe, protecting their service against power outages and data center failures, though not against bugs in their own code. Those who can’t afford this can often still afford backups, and can implement them in different ways, for example by hosting a read-only version of the website in a separate data center. In the end, that amounts to storing twice as much just for extra safety, a practice that doesn’t scale well for websites hosting terabytes of user-generated content.

Regularly reviewing the data by hand is also cumbersome, though possibly cheaper. A third way is to implement challenge-response mechanisms that automatically check the integrity of the data. The gist is that a protocol lets an agent (e.g., a separate instance) send the server a challenge that can only be answered correctly if the data is intact and unaltered. This can be cheaper, but ultimately it only protects your data, not your uptime, which remains in the hands, and at the goodwill, of the cloud service provider.
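To make the idea concrete, here is a minimal sketch of one possible challenge-response check. It is an illustrative assumption, not a scheme described by any provider: before uploading, the data owner precomputes a few random challenges (nonces) together with the expected answers, and later asks the server to compute an HMAC over the stored data keyed with one of those nonces. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def respond(nonce: bytes, data: bytes) -> str:
    """Server side: answer a challenge by keying an HMAC-SHA256 over the data with the nonce."""
    return hmac.new(nonce, data, hashlib.sha256).hexdigest()


def precompute_challenges(data: bytes, count: int = 3) -> list[tuple[bytes, str]]:
    """Owner side: before upload, store (nonce, expected answer) pairs for later checks."""
    challenges = []
    for _ in range(count):
        nonce = secrets.token_bytes(16)
        challenges.append((nonce, respond(nonce, data)))
    return challenges


def verify(expected: str, answer: str) -> bool:
    """Owner side: compare the server's answer with the stored one in constant time."""
    return hmac.compare_digest(expected, answer)


if __name__ == "__main__":
    original = b"user-generated content"
    nonce, expected = precompute_challenges(original)[0]

    # An intact copy produces the expected answer; a corrupted copy does not.
    print(verify(expected, respond(nonce, original)))
    print(verify(expected, respond(nonce, b"user-generated c0ntent")))
```

Because each nonce is random and used only once, the server cannot cache an old answer. It has to read the whole data set to respond, which is also why this simple version gets expensive for very large data.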

And what of the layman entrepreneur? It seems that adopting cloud computing is ultimately a trade-off: you give up some control over your data in exchange for the innumerable benefits of the cloud. Instead of being tempted to drop the cloud, or to abort a move to it, over safety concerns, businesses should strive to adapt their apps so that they don’t go haywire when there’s downtime… and hope for the best.
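One way an app can avoid going haywire is to fall back to a degraded but working mode, such as a read-only copy, when the primary service cannot be reached. The sketch below shows the pattern under stated assumptions: `fetch_with_fallback` and the two sources passed to it are hypothetical names, not part of any real service.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_fallback(primary: Callable[[], T], fallback: Callable[[], T]) -> tuple[T, bool]:
    """Try the primary source; on a connection failure, use the fallback.

    Returns the value and a flag telling the caller whether it is degraded,
    so the app can, for example, switch its interface to read-only.
    """
    try:
        return primary(), False
    except (ConnectionError, TimeoutError):
        return fallback(), True


if __name__ == "__main__":
    def live_profile() -> str:
        # Stand-in for a call to a cloud service that is currently down.
        raise ConnectionError("primary data center unreachable")

    def cached_profile() -> str:
        # Stand-in for a read-only copy kept in a separate data center.
        return "profile (read-only copy)"

    value, degraded = fetch_with_fallback(live_profile, cached_profile)
    print(value, "| degraded mode:", degraded)
```

Returning the flag alongside the value keeps the decision about what to show the user in the app itself, rather than hiding the outage completely.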

By Lauris Veips