AWS Outage – Ground-Hog Day Meets Murphy’s Law; You Guys Should Get A Room!

So, here we go again – I’ve said it before, and I’ll say it again. It gives me no pleasure to write another blog post about AWS suffering another outage in their Northern Virginia region. This is for a couple of reasons. First, the publicity – industry analysts and commentators are divided into two camps, some taking the view that AWS is not quite fit for purpose (Barb Darrow from Gigaom: “Cloud outage raises more questions about Amazon Cloud”) and others taking the more pragmatic view that Instagram, Netflix, etc. (AWS customers) could have been more proactive in protecting themselves against their host going offline (Ingrid Lunden from TechCrunch: “Could Instagram and other sites avoid going down with Amazons Ship”).

The Twitter feed kicked in on Saturday morning after the outage on Friday night, and when I saw the first few tweets coming in, I thought it was just people catching up and re-posting about the previous AWS issue only two weeks before. But no; it was Groundhog Day all over again – storm hits; power cuts out; generators don’t work; elastic cloud falls over; sites go down!

Checking the hashtags for #Netflix, #Instagram, #AWS and #AWSoutage, I saw all the expected reactions – AWS customers posting stuff like this:

Nearly 28,000 re-tweets, and similar for Netflix, Pinterest, and Heroku.

The publicity for all of these companies is clearly not good – consumers don’t care or even know what a “host” is. Unless you work in IT, why would you care or want to know? To a consumer, the service they either pay for (Netflix) or use on an hourly basis (Instagram) just doesn’t work, and that type of damage is difficult to undo.

The second reason is that it just gives more fuel to the “I told you so” cloud naysayers. You can just hear old-school CIOs whispering to fearful CEOs all over the world, “the cloud is not ready for us, and we’re certainly not ready for it!”

But I feel I’m repeating myself a bit from my last post, so let’s move on and take an alternate view, one which I subscribe to, and one that infrastructure teams at Netflix et al. would do well to explore.

Putting all your eggs in one basket is clearly a strategy that is both good and bad. Good, because you get to be a big customer of a provider: you get economies of scale, better pricing, and someone should pick up the phone when you call. Bad, because you give away some control. When AWS went down, it became clear that many infrastructure teams at customer sites, who may have engineered their applications to be redundant inside their host, hadn’t taken into account the unthinkable – what happens if the host goes down?

As Michael Lee from ZDNet pointed out in a post on 2 July, quoting Intelligent Business Research Services advisor Jorn Bettin, the blame for the outage may have lain with providers failing to utilise cloud services as they should.

He said that the real issue wasn’t that a cloud-services giant such as Amazon had stumbled over a storm, but that the affected customers – Instagram, Pinterest, Pocket and Netflix, which all suffered from Amazon’s outage at the weekend – hadn’t used the ability of the cloud to create geographically redundant links.

“They could operate at a higher level of redundancy, so that these sort of outages would only have a minimal impact on them. It’s a matter of cost,” Bettin said.

This is the most sensible article I’ve read about the AWS outage thus far. Having one provider manage your entire infrastructure without a DR/backup strategy with another cloud provider is just commercial madness.

Now, I understand there is a cost element here – replicating some or all of your infrastructure to spin up when a disaster happens is expensive, isn’t it?

Well, yes and no.

Yes, it’s going to add some level of cost, but what you gain from that is control. You, the System Admin from Pintflixogram, get control to the extent that if your primary host goes down, you get to fire up another, secondary host and maintain your service. Let’s remember AWS is not the only hosting company on the planet. Many may perceive them as such, but in fact there are plenty of regional outfits in the market. They are not as cheap as AWS but, guess what, they don’t go down.

On the other hand, if you weigh that cost against the reputational risk, the customer support calls you have to field, the tickets raised, the PR damage limitation exercise and, finally, the churn as your customer base leaves for your competitor, then no, it’s not expensive.

Companies seem to forget that the quality of the hosting service you use becomes the public perception of your company. You can have the coolest website, the best marketing machine, an awesome product or service, but it all counts for nothing when your customer sees this:

I can only imagine the frustration and sense of helplessness that the PR folks and the system admins felt, as there was literally nothing they could do to get their service back online until their host told them it was back up.

But if they had explored a strategy whereby the client had the control instead of the host, then it could have been service as normal.
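As a rough illustration of what "the client having the control" could look like, here is a minimal Python sketch of the decision at the heart of any failover plan: try your hosts in priority order and use the first one that answers a health check. Everything in it is an assumption made for illustration; the function names, the health-check URLs and the two-second timeout are hypothetical, and none of it describes how Netflix, Instagram or AWS actually operate. In practice the switch itself would be made through DNS or a load balancer, which this sketch leaves out.

```python
from typing import Callable, Iterable, Optional
import urllib.error
import urllib.request


def http_health_check(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, HTTP error or malformed URL:
        # all of these count as "host is not usable right now".
        return False


def choose_active_host(
    hosts: Iterable[str],
    is_healthy: Callable[[str], bool] = http_health_check,
) -> Optional[str]:
    """Return the first healthy host in priority order, or None if all are down."""
    for host in hosts:
        if is_healthy(host):
            return host
    return None


# Hypothetical health endpoints: a primary host and a secondary at another provider.
HOSTS = [
    "https://primary.example.com/health",
    "https://secondary.example.net/health",
]

# Simulate the primary being offline with a stub check, so no network is needed.
simulated_down = {"https://primary.example.com/health"}
active = choose_active_host(HOSTS, is_healthy=lambda h: h not in simulated_down)
```

With the primary marked as down, `active` ends up pointing at the secondary. The health check is passed in as a parameter so that the decision logic can be rehearsed without touching a real network, which is roughly what a disaster-recovery drill does.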

AWS are getting hammered, which is understandable from a certain perspective – clients frustrated that their site has gone down; everyone in the space commenting that this shouldn’t happen – but really, the larger clients of AWS who could and should have explored “Redundancy across Regions” (RaR) strategies only have themselves to blame. There is not an industry on the planet that does not have some kind of back-up plan to maintain their core business in the event of a natural disaster, be it as simple as work from home, or a complete replication of their business environment somewhere else.

It’s clear here that some companies had no such plan and just blamed their host, when in fact, if you look at the big picture, it was their own fault. Murphy’s Law exists for a reason, and there are lessons to be learned here.

It’s very simple: You pay for what you get; you pay for greater control and security so that in the event of something bad happening, you don’t have a ground-hog day, and you also beat Murphy at his own game.

By Jason Currill

Jason Currill is a seasoned executive with over 20 years’ international sales and sales leadership experience in investment banking and information technology. In 2011 he founded Ospero, a global Infrastructure as a Service (IaaS) company. Prior to founding Ospero, he held leadership positions at Cisco Systems, Business Objects (an SAP company) and NetSuite, running both EMEA and NA theaters. In addition, Jason spent 10 years as a leading Futures Trader in the London International Financial Futures Exchange (LIFFE) for SG Warburg, Nomura and ING Bank.
