Is Your Company Ready For A Cloud Service Outage?

Is Your Company Ready For A Cloud Service Outage?

If you are using one of the major CSPs (cloud service providers) you may already be used to major service outages. Amazon EC2, Rackspace, Google Apps and Microsoft Azure have all had their fair share of outages in the past 18 months, and some of them have been huge failures (Amazon’s April 2011 outage lasted 47 hours for some customers), which have brought down sites such as Reddit. However, companies such as Netflix have been able to survive the outages well. In this post, I will cover details on how you can effectively manage a service outage by taking note of best practices from Netflix and other companies that have successfully weathered the storm.

Create a disaster management fire-drill

Fire drills should be an essential part of cloud management. Once a month, have a fire drill to simulate failures in different parts of the system to see how your systems hold up in case of failures. This includes preparing your PR and customer service personnel, instituting quality control processes and executing an executive-level contingency plan to prevent panic from gripping the company.

Have all your data backed up securely

Periodically back up data and store it away from your primary CSP. For example, you could have an Amazon S3 instance to back up your Rackspace cloud installation. This will mitigate against a single point of failure.

Keep another service provider ready

Have another CSP ready to run an instance of your server at short notice, if needed. Even if it doesn’t provide full features for the site, this plan B should provide a minimum working subset. (In an email application, for instance, the service must allow you to send and receive email, even if contracts and archive access is not restored.)

Create stateless systems

One of the lessons Netflix offered was to build stateless systems where possible. That means a new request from the client can be served by any of the available servers, even if the original server to which the client made the request is down. This requires very careful planning during development.

Work on graceful degradation

Your system has a graceful degradation when a certain percentage of failure causes only an equivalent percentage drop in performance instead of bringing down the whole system. To enable graceful degradation, you must detect failures quickly (set  quick timeouts when the system recognizes a failure) and shut down all non-essential features of the system (to save precious resources for critical features).

Create a communication plan to keep customers in the loop

If there is an outage at your CSP, your customers will also be affected. Imagine you are running an online store and the outage has prevented you from shipping. Even if none of the previous disaster management steps worked, you could save some of the bad press by keeping the customers and other stakeholders regularly informed of the status. Identify proper communication channels and create a plan to keep all of them in the loop. This requires having a backup of customer contact data, writing FAQs and preparing your employees to handle questions appropriately.

Having a proper service outage plan is essential to the survival of your business in the long-term. It could save a lot of headaches, not to mention your brand value, when failures do happen.

By Balaji Viswanathan

Sorry, comments are closed for this post.

Comic
When Sci-Fi Predictions Come To Fruition

When Sci-Fi Predictions Come To Fruition

Evolution of Technologies To paraphrase science fiction author Arthur C. Clark, those who make predictions about the future are either “considered conservative now and mocked later, or mocked now and proved right when they are no longer around to enjoy the acclaim.” The one thing we can be sure about, Clark ventured, is that “[the…

Facebook Hopes To Extend Internet Connectivity With Solar-Powered Drones

Facebook Hopes To Extend Internet Connectivity With Solar-Powered Drones

Facebook Inc (FB.O) said on Thursday it had completed a successful test flight of a solar-powered drone that it hopes will help it extend internet connectivity to every corner of the planet. Aquila, Facebook’s lightweight, high-altitude aircraft, flew at a few thousand feet for 96 minutes in Yuma, Arizona, Chief Executive Mark Zuckerberg wrote in…

When Will Women In Tech Become The Norm?

When Will Women In Tech Become The Norm?

Tech Diversity It is well known that the technology industry has been dominated by men, but it is also clear that the industry is working to change that. Diversity in the tech industry, especially where it applies to women in tech, has been a topic of discussion for years. Recently the Washington Technology Industry Association…

Four Keys For Telecoms Competing In A Digital World

Four Keys For Telecoms Competing In A Digital World

Competing in a Digital World Telecoms, otherwise largely known as Communications Service Providers (CSPs), have traditionally made the lion’s share of their revenue from providing pipes and infrastructure. Now CSPs face increased competition, not so much from each other, but with digital service providers (DSPs) like Netflix, Google, Amazon, Facebook, and Apple, all of whom…

Edtech and Virtual Reality – Exciting Learning Environment

Edtech and Virtual Reality – Exciting Learning Environment

Customizing Edutech Customized edtech learning solutions are becoming more commonplace as the education industry recognises their potential and begins transforming the traditional structures so as to incorporate innovative developments. From textbooks to tablets, chalkboards to virtual reality, edtech promises not only dynamic and exciting learning environments but better learning strategies and solutions. Virtual Reality and…

Are CEO’s Missing Out On Big Data’s Big Picture?

Are CEO’s Missing Out On Big Data’s Big Picture?

Big Data’s Big Picture Big data allows marketing and production strategists to see where their efforts are succeeding and where they need some work. With big data analytics, every move you make for your company can be backed by data and analytics. While every business venture involves some level of risk, with big data, that risk…

Which Is Better For Your Company: Cloud-Based or On-Premise ERP Deployment?

Which Is Better For Your Company: Cloud-Based or On-Premise ERP Deployment?

Cloud-Based or On-Premise ERP Deployment? You know how enterprise resource management (ERP) can improve processes within your supply chain, and the things to keep in mind when implementing an ERP system. But do you know if cloud-based or on-premise ERP deployment is better for your company or industry? While cloud computing is becoming more and…

Why Security Practitioners Need To Apply The 80-20 Rules To Data Security

Why Security Practitioners Need To Apply The 80-20 Rules To Data Security

The 80-20 Rule For Security Practitioners  Everyday we learn about yet another egregious data security breach, exposure of customer data or misuse of data. It begs the question why in this 21st century, as a security industry we cannot seem to secure our most valuable data assets when technology has surpassed our expectations in other regards.…

The Cloud Is Not Enough! Why Businesses Need Hybrid Solutions

The Cloud Is Not Enough! Why Businesses Need Hybrid Solutions

Why Businesses Need Hybrid Solutions Running a cloud server is no longer the novel trend it once was. Now, the cloud is a necessary data tier that allows employees to access vital company data and maintain productivity from anywhere in the world. But it isn’t a perfect system — security and performance issues can quickly…

Data Breaches: Incident Response Planning – Part 1

Data Breaches: Incident Response Planning – Part 1

Incident Response Planning – Part 1 The topic of cybersecurity has become part of the boardroom agendas in the last couple of years, and not surprisingly — these days, it’s almost impossible to read news headlines without noticing yet another story about a data breach. As cybersecurity shifts from being a strictly IT issue to…

Is The Fintech Industry The Next Tech Bubble?

Is The Fintech Industry The Next Tech Bubble?

The Fintech Industry Banks offered a wide variety of services such as payments, money transfers, wealth management, selling insurance, etc. over the years. While banks have expanded the number of services they offer, their core still remains credit and interest. Many experts believe that since banks offered such a wide multitude of services, they have…

Disaster Recovery And The Cloud

Disaster Recovery And The Cloud

Disaster Recovery And The Cloud One of the least considered benefits of cloud computing in the average small or mid-sized business manager’s mind is the aspect of disaster recovery. Part of the reason for this is that so few small and mid-size businesses have ever contemplated the impact of a major disaster on their IT…

Cloud Security: The Top 8 Risks According To ENISA

Cloud Security: The Top 8 Risks According To ENISA

Cloud Security Risks Does security in the cloud ever bother you? It would be weird if it didn’t. Cloud computing has a lot of benefits, but also a lot of risks if done in the wrong way. So what are the most important risks? The European Network Information Security Agency did extensive research on that,…

Containerization: The Bold Face Of The Cloud In 2016

Containerization: The Bold Face Of The Cloud In 2016

Containerization And The Cloud “Right now, the biggest technology shift in the cloud is a rapid evolution from simple virtual machine (VM) hosting toward containerization’’ says the CTO of Microsoft Azure, Mark Russinovitch, a man who deals with the evolving cloud infrastructure every day. In his words, containerization is “an incredibly efficient, portable, and lightweight…

5 Reasons Why Your Startup Will Grow Faster In The Cloud

5 Reasons Why Your Startup Will Grow Faster In The Cloud

Cloud Startup Fast-tracking Start-ups face many challenges, the biggest of which is usually managing growth. A start-up that does not grow is at constant risk of failure, whereas a new business that grows faster than expected may be hindered by operational constraints, such as a lack of staff, workspace and networks. It is an unfortunate…

How Your Startup Can Benefit From Cloud Computing And Growth Hacking

How Your Startup Can Benefit From Cloud Computing And Growth Hacking

Ambitious Startups An oft-quoted statistic, 50% of new businesses fail within five years. And the culling of startups is even more dramatic, with an estimated nine out of ten folding. But to quote Steve Jobs, “I’m convinced that about half of what separates the successful entrepreneurs from the non-successful ones is pure perseverance.” So while…