Is Your Company Ready For A Cloud Service Outage?

Is Your Company Ready For A Cloud Service Outage?

If you are using one of the major CSPs (cloud service providers) you may already be used to major service outages. Amazon EC2, Rackspace, Google Apps and Microsoft Azure have all had their fair share of outages in the past 18 months, and some of them have been huge failures (Amazon’s April 2011 outage lasted 47 hours for some customers), which have brought down sites such as Reddit. However, companies such as Netflix have been able to survive the outages well. In this post, I will cover details on how you can effectively manage a service outage by taking note of best practices from Netflix and other companies that have successfully weathered the storm.

Create a disaster management fire-drill

Fire drills should be an essential part of cloud management. Once a month, have a fire drill to simulate failures in different parts of the system to see how your systems hold up in case of failures. This includes preparing your PR and customer service personnel, instituting quality control processes and executing an executive-level contingency plan to prevent panic from gripping the company.

Have all your data backed up securely

Periodically back up data and store it away from your primary CSP. For example, you could have an Amazon S3 instance to back up your Rackspace cloud installation. This will mitigate against a single point of failure.

Keep another service provider ready

Have another CSP ready to run an instance of your server at short notice, if needed. Even if it doesn’t provide full features for the site, this plan B should provide a minimum working subset. (In an email application, for instance, the service must allow you to send and receive email, even if contracts and archive access is not restored.)

Create stateless systems

One of the lessons Netflix offered was to build stateless systems where possible. That means a new request from the client can be served by any of the available servers, even if the original server to which the client made the request is down. This requires very careful planning during development.

Work on graceful degradation

Your system has a graceful degradation when a certain percentage of failure causes only an equivalent percentage drop in performance instead of bringing down the whole system. To enable graceful degradation, you must detect failures quickly (set  quick timeouts when the system recognizes a failure) and shut down all non-essential features of the system (to save precious resources for critical features).

Create a communication plan to keep customers in the loop

If there is an outage at your CSP, your customers will also be affected. Imagine you are running an online store and the outage has prevented you from shipping. Even if none of the previous disaster management steps worked, you could save some of the bad press by keeping the customers and other stakeholders regularly informed of the status. Identify proper communication channels and create a plan to keep all of them in the loop. This requires having a backup of customer contact data, writing FAQs and preparing your employees to handle questions appropriately.

Having a proper service outage plan is essential to the survival of your business in the long-term. It could save a lot of headaches, not to mention your brand value, when failures do happen.

By Balaji Viswanathan

Sorry, comments are closed for this post.

5 Things To Consider About Your Next Enterprise Sharing Solution

5 Things To Consider About Your Next Enterprise Sharing Solution

Enterprise File Sharing Solution Businesses have varying file sharing needs. Large, multi-regional businesses need to synchronize folders across a large number of sites, whereas small businesses may only need to support a handful of users in a single site. Construction or advertising firms require sharing and collaboration with very large (several Gigabytes) files. Financial services…

The Cancer Moonshot: Collaboration Is Key

The Cancer Moonshot: Collaboration Is Key

Cancer Moonshot In his final State of the Union address in January 2016, President Obama announced a new American “moonshot” effort: finding a cure for cancer. The term “moonshot” comes from one of America’s greatest achievements, the moon landing. If the scientific community can achieve that kind of feat, then surely it can rally around…

Beacons Flopped, But They’re About to Flourish in the Future

Beacons Flopped, But They’re About to Flourish in the Future

Cloud Beacons Flying High When Apple debuted cloud beacons in 2013, analysts predicted 250 million devices capable of serving as iBeacons would be found in the wild within weeks. A few months later, estimates put the figure at just 64,000, with 15 percent confined to Apple stores. Beacons didn’t proliferate as expected, but a few…

Despite Record Breaches, Secure Third Party Access Still Not An IT Priority

Despite Record Breaches, Secure Third Party Access Still Not An IT Priority

Secure Third Party Access Still Not An IT Priority Research has revealed that third parties cause 63 percent of all data breaches. From HVAC contractors, to IT consultants, to supply chain analysts and beyond, the threats posed by third parties are real and growing. Deloitte, in its Global Survey 2016 of third party risk, reported…

Three Reasons Cloud Adoption Can Close The Federal Government’s Tech Gap

Three Reasons Cloud Adoption Can Close The Federal Government’s Tech Gap

Federal Government Cloud Adoption No one has ever accused the U.S. government of being technologically savvy. Aging software, systems and processes, internal politics, restricted budgets and a cultural resistance to change have set the federal sector years behind its private sector counterparts. Data and information security concerns have also been a major contributing factor inhibiting the…

Staying on Top of Your Infrastructure-as-a-Service Security Responsibilities

Staying on Top of Your Infrastructure-as-a-Service Security Responsibilities

Infrastructure-as-a-Service Security It’s no secret many organizations rely on popular cloud providers like Amazon and Microsoft for access to computing infrastructure. The many perks of cloud services, such as the ability to quickly scale resources without the upfront cost of buying physical servers, have helped build a multibillion-dollar cloud industry that continues to grow each…

How The CFAA Ruling Affects Individuals And Password-Sharing

How The CFAA Ruling Affects Individuals And Password-Sharing

Individuals and Password-Sharing With the 1980s came the explosion of computing. In 1980, the Commodore ushered in the advent of home computing. Time magazine declared 1982 was “The Year of the Computer.” By 1983, there were an estimated 10 million personal computers in the United States alone. As soon as computers became popular, the federal government…

How To Overcome Data Insecurity In The Cloud

How To Overcome Data Insecurity In The Cloud

Data Insecurity In The Cloud Today’s escalating attacks, vulnerabilities, breaches, and losses have cut deeply across organizations and captured the attention of, regulators, investors and most importantly customers. In many cases such incidents have completely eroded customer trust in a company, its services and its employees. The challenge of ensuring data security is far more…