Is Your Company Ready For A Cloud Service Outage?

Is Your Company Ready For A Cloud Service Outage?

If you are using one of the major CSPs (cloud service providers) you may already be used to major service outages. Amazon EC2, Rackspace, Google Apps and Microsoft Azure have all had their fair share of outages in the past 18 months, and some of them have been huge failures (Amazon’s April 2011 outage lasted 47 hours for some customers), which have brought down sites such as Reddit. However, companies such as Netflix have been able to survive the outages well. In this post, I will cover details on how you can effectively manage a service outage by taking note of best practices from Netflix and other companies that have successfully weathered the storm.

Create a disaster management fire-drill

Fire drills should be an essential part of cloud management. Once a month, have a fire drill to simulate failures in different parts of the system to see how your systems hold up in case of failures. This includes preparing your PR and customer service personnel, instituting quality control processes and executing an executive-level contingency plan to prevent panic from gripping the company.

Have all your data backed up securely

Periodically back up data and store it away from your primary CSP. For example, you could have an Amazon S3 instance to back up your Rackspace cloud installation. This will mitigate against a single point of failure.

Keep another service provider ready

Have another CSP ready to run an instance of your server at short notice, if needed. Even if it doesn’t provide full features for the site, this plan B should provide a minimum working subset. (In an email application, for instance, the service must allow you to send and receive email, even if contracts and archive access is not restored.)

Create stateless systems

One of the lessons Netflix offered was to build stateless systems where possible. That means a new request from the client can be served by any of the available servers, even if the original server to which the client made the request is down. This requires very careful planning during development.

Work on graceful degradation

Your system has a graceful degradation when a certain percentage of failure causes only an equivalent percentage drop in performance instead of bringing down the whole system. To enable graceful degradation, you must detect failures quickly (set  quick timeouts when the system recognizes a failure) and shut down all non-essential features of the system (to save precious resources for critical features).

Create a communication plan to keep customers in the loop

If there is an outage at your CSP, your customers will also be affected. Imagine you are running an online store and the outage has prevented you from shipping. Even if none of the previous disaster management steps worked, you could save some of the bad press by keeping the customers and other stakeholders regularly informed of the status. Identify proper communication channels and create a plan to keep all of them in the loop. This requires having a backup of customer contact data, writing FAQs and preparing your employees to handle questions appropriately.

Having a proper service outage plan is essential to the survival of your business in the long-term. It could save a lot of headaches, not to mention your brand value, when failures do happen.

By Balaji Viswanathan

Sorry, comments are closed for this post.

Comic
The Lighter Side Of The Cloud – Data Merge

The Lighter Side Of The Cloud – Data Merge

By Christian Mirra Please feel free to share our comics via social media networks such as Twitter, Facebook, LinkedIn, Instagram, Pinterest. Clear attribution (Twitter example: via @cloudtweaks) to our original comic sources is greatly appreciated.

The Rise Of Threat Intelligence Sharing

The Rise Of Threat Intelligence Sharing

Threat Intelligence Sharing  Security has been discussed often on CloudTweaks and for good reason. It is one of the most sought after topics of information in the technology industry.  It is virtually impossible to wake up and not read a headline that involves the words “Breached, Hacked, Compromised or Extorted (Ransomware)“. Included (below) is an…

Moving Your Email To The Cloud? Beware Of Unintentional Data Spoliation!

Moving Your Email To The Cloud? Beware Of Unintentional Data Spoliation!

Cloud Email Migration In today’s litigious society, preserving your company’s data is a must if you (and your legal team) want to avoid hefty fines for data spoliation. But what about when you move to the cloud? Of course, you’ve probably thought of this already. You’ll have a migration strategy in place and you’ll carefully…

Higher Education Institutions Increasing Cloud Use In Next 5 Years

Higher Education Institutions Increasing Cloud Use In Next 5 Years

Cloud Computing Advancing Edtech In a new research study by ResearchMoz it’s predicted that the global cloud computing market in higher education will grow steadily at a CAGR of 24.57% over the period 2016 to 2020. Making use of computing resources connected by either public or private networks provides the benefits of scalable infrastructure, greater…

Big Data and AI Hold Greatest Promise For Healthcare Technologies

Big Data and AI Hold Greatest Promise For Healthcare Technologies

Digital Healthcare Executives and Investors Addressed Opportunities and Challenges Facing the Industry New York City – September 21, 2016 – According to a survey of 122 founders, executives and investors in health-tech companies released today by Silicon Valley Bank, big data and artificial intelligence will have the greatest impact on the industry in the year ahead. Healthcare…

How Formal Verification Can Thwart Change-Induced Network Outages and Breaches

How Formal Verification Can Thwart Change-Induced Network Outages and Breaches

How Formal Verification Can Thwart  Breaches Formal verification is not a new concept. In a nutshell, the process uses sophisticated math to prove or disprove whether a system achieves its desired functional specifications. It is employed by organizations that build products that absolutely cannot fail. One of the reasons NASA rovers are still roaming Mars…

Ending The Great Enterprise Disconnect

Ending The Great Enterprise Disconnect

Five Requirements for Supporting a Connected Workforce It used to be that enterprises dictated how workers spent their day: stuck in a cubicle, tied to an enterprise-mandated computer, an enterprise-mandated desk phone with mysterious buttons, and perhaps an enterprise-mandated mobile phone if they traveled. All that is history. Today, a modern workforce is dictating how…

Three Ways To Secure The Enterprise Cloud

Three Ways To Secure The Enterprise Cloud

Secure The Enterprise Cloud Data is moving to the cloud. It is moving quickly and in enormous volumes. As this trend continues, more enterprise data will reside in the cloud and organizations will be faced with the challenge of entrusting even their most sensitive and critical data to a different security environment that comes with using…

The Future Of Work: What Cloud Technology Has Allowed Us To Do Better

The Future Of Work: What Cloud Technology Has Allowed Us To Do Better

What Cloud Technology Has Allowed Us to Do Better The cloud has made our working lives easier, with everything from virtually unlimited email storage to access-from-anywhere enterprise resource planning (ERP) systems. It’s no wonder the 2013 cloud computing research IDG survey revealed at least 84 percent of the companies surveyed run at least one cloud-based application.…

New Report Finds 1 Out Of 3 Sites Are Vulnerable To Malware

New Report Finds 1 Out Of 3 Sites Are Vulnerable To Malware

1 Out Of 3 Sites Are Vulnerable To Malware A new report published this morning by Menlo Security has alarmingly suggested that at least a third of the top 1,000,000 websites in the world are at risk of being infected by malware. While it’s worth prefacing the findings with the fact Menlo used Alexa to…

The Big Data Movement Gets Bigger

The Big Data Movement Gets Bigger

The Big Data Movement In recent years, Big Data and Cloud relations have been growing steadily. And while there have been many questions raised around how best to use the information being gathered, there is no question that there is a real future between the two. The growing importance of Big Data Scientists and the…

Surprising Facts and Stats About The Big Data Industry

Surprising Facts and Stats About The Big Data Industry

Facts and Stats About The Big Data Industry If you start talking about big data to someone who is not in the industry, they immediately conjure up images of giant warehouses full of servers, staff poring over page after page of numbers and statistics, and some big brother-esque official sat in a huge government building…

Cloud Computing – The Real Story Is About Business Strategy, Not Technology

Cloud Computing – The Real Story Is About Business Strategy, Not Technology

Enabling Business Strategies The cloud is not really the final destination: It’s mid-2015, and it’s clear that the cloud paradigm is here to stay. Its services are growing exponentially and, at this time, it’s a fluid model with no steady state on the horizon. As such, adopting cloud computing has been surprisingly slow and seen more…

The Internet of Things – Redefining The Digital World As We Know It

The Internet of Things – Redefining The Digital World As We Know It

Redefining The Digital World According to Internet World Stats (June 30th, 2015), no fewer than 3.2 billion people across the world now use the internet in one way or another. This means an incredible amount of data sharing through the utilization of API’s, Cloud platforms and inevitably the world of connected Things. The Internet of Things is a…