BBC Tech

Are electric cars as ‘green’ as you think?

It's predicted that by 2030, over 125 million electric vehicles will be owned by people worldwide. But where's the lithium that powers their batteries coming from? Most of the world's stocks of this lightest of metals are found in brine deep beneath salt flats, high
/
Tech Crunch

A set of new tools can decrypt files locked by Stop, a highly active ransomware

Thousands of ransomware victims may finally get some long-awaited relief. New Zealand-based security company Emsisoft has built a set of decryption tools for Stop, a family of ransomware that includes Djvu and Puma, which they say could help victims recover some of their files. Stop
/
Balaji Viswanathan

Best Practices For A Cloud Service Outage?

Cloud Service Outage

If you are using one of the major CSPs (cloud service providers) you may already be used to major service outages. Amazon EC2, Rackspace, Google Apps and Microsoft Azure have all had their fair share of outages in the past 18 months, and some of them have been huge failures (Amazon’s April 2011 outage lasted 47 hours for some customers), which have brought down sites such as Reddit. However, companies such as Netflix have been able to survive the outages well. In this post, I will cover details on how you can effectively manage a service outage by taking note of best practices from Netflix and other companies that have successfully weathered the storm.

Create a disaster management fire-drill

Fire drills should be an essential part of cloud management. Once a month, have a fire drill to simulate failures in different parts of the system to see how your systems hold up in case of failures. This includes preparing your PR and customer service personnel, instituting quality control processes and executing an executive-level contingency plan to prevent panic from gripping the company.

Have all your data backed up securely

Periodically back up data and store it away from your primary CSP. For example, you could have an Amazon S3 instance to back up your Rackspace cloud installation. This will mitigate against a single point of failure.

Keep another service provider ready

Have another CSP ready to run an instance of your server at short notice, if needed. Even if it doesn’t provide full features for the site, this plan B should provide a minimum working subset. (In an email application, for instance, the service must allow you to send and receive email, even if contracts and archive access is not restored.)

Create stateless systems

One of the lessons Netflix offered was to build stateless systems where possible. That means a new request from the client can be served by any of the available servers, even if the original server to which the client made the request is down. This requires very careful planning during development.

Work on graceful degradation

Your system has a graceful degradation when a certain percentage of failure causes only an equivalent percentage drop in performance instead of bringing down the whole system. To enable graceful degradation, you must detect failures quickly (set  quick timeouts when the system recognizes a failure) and shut down all non-essential features of the system (to save precious resources for critical features).

Create a communication plan to keep customers in the loop

If there is an outage at your CSP, your customers will also be affected. Imagine you are running an online store and the outage has prevented you from shipping. Even if none of the previous disaster management steps worked, you could save some of the bad press by keeping the customers and other stakeholders regularly informed of the status. Identify proper communication channels and create a plan to keep all of them in the loop. This requires having a backup of customer contact data, writing FAQs and preparing your employees to handle questions appropriately.

Having a proper service outage plan is essential to the survival of your business in the long-term. It could save a lot of headaches, not to mention your brand value, when failures do happen.

By Balaji Viswanathan

Balaji Viswanathan Contributor
CEO of Invento Robots
Balaji is the CEO of Invento Robots. Balaji has been writing about technology for several years and is currently featured in many publications around the globe.
Understanding Data Governance - Don't Be Intimidated By It

Understanding Data Governance – Don’t Be Intimidated By It

Understanding Data Governance Data governance, the understanding of the raw data of an organization is an area IT departments have historically viewed as a lose-lose ...
BI Data

The Rise Of BI Data And How To Use It Effectively

The Rise of BI Data Every few years, a new concept or technological development is introduced that drastically improves the business world as a whole ...
Daren Glenister

What’s Next In Cloud And Data Security?

Cloud and Data Security It has been a tumultuous year in data privacy to say the least – we’ve had a huge increase in data ...
The Cloudification of Healthcare: Benefits and Risks

The Cloudification of Healthcare: Benefits and Risks

Cloud Healthcare: Benefits and Risks Many organizations are moving most of their business-critical applications and workloads to the cloud. The healthcare industry is no exception ...
Kris Lahri

What the Dyn DDoS Attacks Taught Us About Cloud-Only EFSS

DDoS Attacks October 21st, 2016 went into the annals of Internet history for the large scale Distributed Denial of Service (DDoS) attacks that made popular Internet ...
It Programs Compressor