Redundancy in the Cloud
Dunce, dumb, reductive: these words would appear to have much in common with redundancy, if only in the similar sounds they produce when spoken. The idea of being redundant repels those who read books, follow current events, and endeavor to consistently bring original, fresh information to the table. Yet in the conversation about cloud computing, redundancy brings peace of mind to those who cringe at the cloud's ongoing kinks in data protection and security.
Redundancy in cloud computing can be defined as the provision of duplicate copies of data, equipment, or systems, to be used in the event that part of one's cloud computing system fails or cannot be accessed. This redundancy is achieved by replicating data in full, several times over, on multiple computers or units within the same data center.
With the cloud, there is no longer a need to construct a pricey “highly available redundant system,” as one would with a traditional IT operation, because a fail-first mentality has been built into the structure of cloud computing. The cloud was designed on the understanding that certain components in the system will give out at some point.
The components most likely to fail include physical disks, power equipment, and memory units. By first ensuring that every file stored on such physical components has been copied three times, most cloud operating systems protect themselves against the possibility that the entire system itself might fall by the wayside.
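To make the three-copies idea concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular cloud provider implements replication: the `Node` and `ReplicatedStore` classes, the checksum-based placement rule, and the file name used below are all hypothetical, and the sketch assumes every node is healthy at the moment a file is written.

```python
import zlib


class Node:
    """A hypothetical storage node (a disk or machine) that can fail."""

    def __init__(self, name):
        self.name = name
        self.files = {}    # filename -> bytes held on this node
        self.alive = True  # set to False to simulate a failure


class ReplicatedStore:
    """Keeps `copies` replicas of every file on distinct nodes."""

    def __init__(self, nodes, copies=3):
        if len(nodes) < copies:
            raise ValueError("need at least as many nodes as copies")
        self.nodes = nodes
        self.copies = copies

    def _placement(self, filename):
        # Pick a starting node from a stable checksum of the name,
        # then use the next `copies` nodes in order (wrapping around).
        start = zlib.crc32(filename.encode("utf-8")) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.copies)]

    def write(self, filename, data):
        # Simplification: assumes all chosen nodes are healthy right now.
        for node in self._placement(filename):
            node.files[filename] = data

    def read(self, filename):
        # Return the first copy found on a node that is still alive.
        for node in self._placement(filename):
            if node.alive and filename in node.files:
                return node.files[filename]
        raise OSError(f"no live replica holds {filename!r}")
```

With five nodes and three copies, a file written through `store.write("report.txt", b"hello")` can still be read after any two of the nodes holding it are marked as failed; only when all three replicas are down does `read` raise an error.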
Amazon's cloud collapse of last year serves as a cogent case study in the importance of redundancy, and the pitfalls of ignoring it. Amazon Web Services had initially seemed watertight and completely reliable upon opening to the public in 2006. Yet in April 2011, the company's data center in Virginia suffered a devastating bout of connectivity and latency problems. Companies such as HootSuite and Reddit felt the impact of these troubles, and Amazon's redundancy strategy could not prevent a zone-wide outage.
The problem with Amazon's cloud lay in its Availability Zones, which were not properly synchronized with one another. The result: the Zones were not built to fail independently of one another, which led to a nearly complete system shutdown. The scope of redundancy is, ironically, quite complex. But ultimately, every cloud enterprise, and startups in particular, needs to value this concept, lest disaster strike and the operation topple with it.
By Jeff Norman