AWS Outage – Ground-Hog Day Meets Murphy’s Law; You Guys Should Get A Room!

So, here we go again – I’ve said it before, and I’ll say it again. It gives me no pleasure to write another blog post about AWS suffering another outage in their Northern Virginia region. There are a couple of reasons for this. First, the publicity – industry analysts and commentators are divided into two camps, some taking the view that AWS is slightly unfit for purpose (Barb Darrow from Gigaom: “Cloud outage raises more questions about Amazon Cloud”) and others taking the more pragmatic view that Instagram, Netflix, etc. (AWS customers) could have been more proactive in protecting themselves against their host going offline (Ingrid Lunden from TechCrunch: “Could Instagram and other sites avoid going down with Amazons Ship”).

The Twitter feed kicked in on Saturday morning after the outage on Friday night, and when I saw the first few tweets coming in, I thought it was just people catching up and re-posting about the previous AWS issue only two weeks before. But no; it was groundhog day all over again – storm hits; power cuts out; generators don’t work; elastic cloud falls over; sites go down!

Checking the hashtags #Netflix, #Instagram, #AWS and #AWSoutage, I saw all the expected reactions – AWS customers posting stuff like this:

Nearly 28,000 re-tweets, and similar for Netflix, Pinterest, and Heroku.

The publicity for all of these companies is clearly not good – consumers don’t care or even know what a “host” is. Unless you work in IT, why would you care or want to know? To a consumer, the service they either pay for (Netflix) or use on an hourly basis (Instagram) just doesn’t work, and that type of damage is difficult to undo.

The second reason is that it just gives more fuel to the “I told you so” cloud naysayers. You can just hear old-school CIOs whispering to fearful CEOs all over the world, “the cloud is not ready for us, and we’re certainly not ready for it!”

But I feel I’m repeating myself a bit from my last post, so let’s move on and take an alternate view, one which I subscribe to, and one that infrastructure teams at Netflix et al. would do well to explore.

Putting all your eggs in one basket is a strategy that is both good and bad. Good, because you get to be a big customer of a provider: you get economies of scale, better pricing, and someone should pick up the phone when you call. Bad, because you give away some control. When AWS went down, it became clear that many infrastructure teams at customer sites, who may have engineered their applications to be redundant inside their host, hadn’t taken into account the unthinkable – what happens if the host itself goes down?

As Michael Lee from ZDNet pointed out in a post on 2 July, quoting Intelligent Business Research Services advisor Jorn Bettin, the blame for the outage may have lain with companies failing to utilise cloud services as they should.

He said that the real issue wasn’t that a cloud-services giant such as Amazon had stumbled over a storm, but that the affected customers – Instagram, Pinterest, Pocket and Netflix, which all suffered from Amazon’s outage at the weekend – hadn’t used the ability of the cloud to create geographically redundant links.

“They could operate at a higher level of redundancy, so that these sort of outages would only have a minimal impact on them. It’s a matter of cost,” Bettin said.

This is the most sensible article I’ve read about the AWS outage issue thus far. Having one provider manage your entire infrastructure without a DR/Back-up strategy with another cloud provider is just commercial madness.

Now, I understand there is a cost element here – replicating some or all of your infrastructure so you can spin it up when a disaster happens is expensive, isn’t it?

Well, yes and no.

Yes, it’s going to add some level of cost, but what you gain from that is control. You, the System Admin from Pintflixogram, get control to the extent that if your primary host goes down, you get to fire up another, secondary host and maintain your service. Let’s remember AWS is not the only hosting company on the planet. Although many may perceive them as such, there are in fact plenty of regional outfits in the market that are not as cheap as AWS. But guess what – they don’t go down.

On the other hand, if you balance the reputational risk, the customer support calls you have to field, the tickets raised, the PR damage limitation exercise and, finally, the churn as your customer base leaves for your competitor, then no, it’s not expensive.
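To make that balance concrete, here is a rough back-of-the-envelope sketch. It is only an illustration: the function names and every figure in it are hypothetical placeholders, not numbers from any of the companies mentioned, and `loss_per_hour` is assumed to lump together lost revenue, support effort, PR clean-up and churn.

```python
def expected_outage_cost(outages_per_year: float,
                         hours_per_outage: float,
                         loss_per_hour: float) -> float:
    """Rough yearly cost of downtime if you have no secondary host.

    loss_per_hour is an assumed all-in figure (revenue, support calls,
    PR damage limitation, churn); plug in your own estimate.
    """
    return outages_per_year * hours_per_outage * loss_per_hour


def standby_pays_off(standby_cost_per_year: float,
                     outages_per_year: float,
                     hours_per_outage: float,
                     loss_per_hour: float) -> bool:
    """True when the expected downtime cost exceeds the cost of a standby."""
    downtime = expected_outage_cost(outages_per_year, hours_per_outage, loss_per_hour)
    return downtime > standby_cost_per_year


# Placeholder inputs only: two outages a year, six hours each,
# 10,000 lost per hour, against a standby that costs 50,000 a year.
print(standby_pays_off(50_000, 2, 6, 10_000))
```

With your own estimates in place of the placeholders, the comparison shows quickly whether the standby is the cheaper option.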

Companies seem to forget that the quality of the hosting service you use shapes the public perception of your company. You can have the coolest website, the best marketing machine, an awesome product or service, but it all counts for nothing when your customer sees this:

I can only imagine the frustration and sense of helplessness that the PR folks and the system admins felt, as there was literally nothing they could do to get their service back online until their host told them it was back up.

But if they had explored a strategy whereby the client had the control instead of the host, then it could have been service as normal.
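As a minimal sketch of what client-side control could look like, the snippet below picks the first healthy host from a priority list. It is not how any of the companies mentioned run their infrastructure: the host names are made up, and the health check is passed in as a plain function so the example stays self-contained (in practice it might be an HTTP probe or a monitoring query).

```python
from typing import Callable, Optional, Sequence


def pick_active_host(hosts: Sequence[str],
                     is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first host, in priority order, that passes its health check."""
    for host in hosts:
        try:
            if is_healthy(host):
                return host
        except Exception:
            # A health check that raises is treated the same as a failed check.
            continue
    return None  # every host is down


# Hypothetical endpoints: a primary host and a secondary with another provider.
HOSTS = ["primary.example.com", "secondary.example.net"]


def simulated_check(down: set) -> Callable[[str], bool]:
    """Build a fake health check that reports the hosts in `down` as failed."""
    return lambda host: host not in down


# Primary is down, so traffic should go to the secondary.
print(pick_active_host(HOSTS, simulated_check({"primary.example.com"})))
```

The point of the design is that the decision to switch sits with the customer, not the host: whatever updates DNS or the load balancer only needs the answer from `pick_active_host`.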

AWS are getting hammered, which is understandable from a certain perspective – clients frustrated that their site has gone down; everyone in the space commenting that this shouldn’t happen – but really, the larger clients of AWS who could and should have explored “Redundancy across Regions” (RaR) strategies only have themselves to blame. There is not an industry on the planet that does not have some kind of back-up plan to maintain their core business in the event of a natural disaster, be it as simple as work from home, or a complete replication of their business environment somewhere else.

It’s clear here that some companies had no such plan and just blamed their host, when in fact, if you look at the big picture, it was their own fault. Murphy’s Law exists for a reason, and there are lessons to be learned here.

It’s very simple: you get what you pay for. You pay for greater control and security so that in the event of something bad happening, you don’t have a ground-hog day, and you also beat Murphy at his own game.

By Jason Currill

Jason Currill is a seasoned executive with over 20 years’ international sales and sales leadership experience in investment banking and information technology. In 2011 he founded Ospero, a global Infrastructure as a Service (IaaS) company. Prior to founding Ospero, he held leadership positions at Cisco Systems, Business Objects (an SAP company) and NetSuite, running both EMEA and NA theaters. In addition, Jason spent 10 years as a leading Futures Trader in the London International Financial Futures Exchange (LIFFE) for SG Warburg, Nomura and ING Bank.
