CloudTweaks

Cloud Availability

I’m a firm believer in having control over anything that can get me fired. So, while the cloud is wonderful for solving all sorts of IT issues, only the bold, the brave or the career suicidal place business-critical applications so completely out of their own control.

My company began pushing applications to the cloud around 2004. Today the majority of our applications are cloud-based. Our most important applications, however, stay in-house and run on fault-tolerant servers. I know everything about them … where they are, what platform they are running on, when and how they are maintained, where data are stored, and what the current revision levels are for everything that touches them. More importantly, I know what is being done and by whom if the server goes down, which hasn’t happened in years. Thanks to how my platform is architected, I can be reasonably sure when applications will be back up and running. And the problem’s root cause will not be lost to the ether. This is how I sleep well at night.

On the other hand, having a critical application go offline in the cloud is a CIO’s nightmare. The vendor is as vague about the problem as it is about the estimated recovery time, saying (or posting to Twitter) only that it is looking into it. Of the thousands or millions of clients the vendor has (think Go Daddy), whose applications come back first and whose come back last? No matter how cleverly you phrase your response when the executive office calls for a status update, the answer still comes across as, “I have no idea what’s going on.”

No worries, you have a failover plan to switch to another location or backup provider. But because this is the first time you are actually doing it for real, critical dependencies or configuration errors surface that were missed in testing. All this also adds cost and complexity to a solution that was supposed to yield the opposite result.

Why this is important

Getting sacked notwithstanding, losing critical applications to downtime is extremely costly, whether they reside in the cloud or in an internal data center. Many may think this is stating the obvious. Yet in our experience, corroborated by ample industry research, more than half of all companies make no effort to measure downtime costs. Those that do usually underestimate by a wide margin.

Cost-of-downtime estimates provided by a number of reputable research firms exceed $100,000 per hour for the average company. The biggest cost culprits, of course, are the applications your company relies on most and would want up and running first after an outage. The thought of ceding responsibility for keeping these applications available 24/7 to a third party … whose operations you have no control over, whose key success metric is the lowest possible cost per compute cycle, whose SLAs leave mission-critical applications hanging over the precipice … is anathema.

This is not an indictment of cloud service providers. It is only the current reality, which will improve with time. Today’s reality is completely acceptable for more enterprise applications than not, as it is in my company. Regrettably, for some companies it’s even acceptable for critical workloads.

At a recent CIO conference my conversation with a peer from a very recognizable telecom and electronics company turned to application availability. I was confounded to hear him declare how thrilled he’d be with 99.9% uptime for critical applications, which I believe is the level most cloud providers aspire to and ordinary servers are capable of. That level of uptime still allows nearly nine hours of downtime a year. If analysts’ downtime cost estimates are anywhere close to reality, 99.9% uptime translates into about $875,000 in cost per year for the average company. This was a Fortune 500 firm.
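The arithmetic behind that figure can be sketched in a few lines of Python. This is only a rough illustration, assuming the $100,000-per-hour estimate cited earlier and a 365-day year. The function names are hypothetical, and the extra uptime levels are included purely for comparison.

```python
# Rough sketch: annual downtime cost implied by an uptime percentage.
# Assumptions (illustrative only): a 365-day year and a flat hourly
# downtime cost of $100,000, the analyst estimate cited in the article.

HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def annual_downtime_hours(uptime_percent: float) -> float:
    """Hours per year that a system at the given uptime is unavailable."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


def annual_downtime_cost(uptime_percent: float,
                         cost_per_hour: float = 100_000) -> float:
    """Yearly downtime cost in dollars at a flat hourly cost."""
    return annual_downtime_hours(uptime_percent) * cost_per_hour


if __name__ == "__main__":
    # Compare a few common availability levels.
    for uptime in (99.9, 99.99, 99.999):
        hours = annual_downtime_hours(uptime)
        cost = annual_downtime_cost(uptime)
        print(f"{uptime}% uptime: {hours:.2f} hours down, ${cost:,.0f} per year")
```

At 99.9% uptime the sketch gives 8.76 hours of downtime, or roughly $876,000 a year, in line with the figure above. A real estimate would replace the flat hourly rate with your own measured hard and soft costs.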

Determining the total of hard and soft downtime costs is not easy, which is why it’s often not done well if at all. For example, downtime impact can ripple to departments and business functions beyond the core area. There may be contractual penalties. News headlines may be written.

Making technology choices without knowing your complete downtime costs is a crapshoot. Making informed ROI decisions is impossible. You may even find that savings from moving not-so-critical applications to the cloud are inconsequential, as I did with our company’s email system. That will stay in-house. And I will continue to sleep soundly.

By Joe Graves – CIO of Stratus Technologies

Joe was named CIO of Stratus Technologies in 2002. During his tenure, Joe has recreated the Stratus IT environment using innovative approaches such as virtualization and Software-as-a-Service (SaaS). Prior to becoming CIO, he was responsible for managing IS operations followed by IT application development. Prior to Stratus, Joe held various software engineering positions with Sequoia Systems and Data General.