Cloud Growing Pains – Failure Is Inevitable
Growing up with floppy disks as the standard for storage was not pretty. My school days were filled with corrupt assignments and missing files, because the disks were fragile and tended to fail at the slightest exposure to a magnetic field. There was no point in bringing a backup floppy either, because chances were it would fail too. Add random crashes on very slow computers, and you quickly learned to save fast, save often, and keep lots and lots of backups. Fast forward a few decades, and we have cheaper and faster computers, reliable flash storage, the internet, and even the Cloud. We no longer even need to carry our data with us physically, thanks to online file storage services.
But some things never change, and failure in all its forms is one of them. It seems to loom around every corner, and no technology escapes it. So the core best practice for any business trying to make it in the Cloud is to expect failure and plan for it. After all, each node, whether a server, a hard drive, or a piece of networking equipment, is built from mass-produced commodity hardware that may or may not last for years. That is why Cloud service providers architect and design their systems so that when one or several pieces of equipment fail, the system or environment can recover automatically.
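One common way to plan for failure at the application level is to retry transient errors with exponential backoff instead of giving up on the first hiccup. The Python sketch below is illustrative rather than any provider's API: `call_with_retries` and its parameters are hypothetical, it assumes transient faults surface as `OSError` (which includes `ConnectionError`), and it randomizes each delay ("full jitter") so that many clients do not retry in lockstep.

```python
import random
import time


def call_with_retries(operation, max_attempts=5, base_delay=0.1, max_delay=2.0,
                      retry_on=(OSError,), sleep=time.sleep):
    """Call `operation` until it succeeds, retrying only on `retry_on` errors.

    Waits an exponentially growing, randomly jittered delay between attempts
    and re-raises the last error once `max_attempts` attempts have failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Double the delay each attempt, cap it, then pick a random point below it
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))


# Usage: a hypothetical read that fails twice before succeeding
attempts = {"count": 0}


def flaky_read():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "data"


print(call_with_retries(flaky_read, sleep=lambda seconds: None))  # data
```

Errors outside `retry_on` (a bad argument, say) propagate immediately, since retrying them would only repeat the same failure.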
Elasticity and fault tolerance go hand in hand, because both rely on bootstrapping. A new box may be bootstrapped to meet additional demand (elasticity) or to replace a box that is having problems (fault tolerance). Either way, when a new box is required, it boots, receives its orders, and then finds and installs the resources it needs to become whatever piece of the system it is meant to be. This matters most when other boxes start failing.
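To make the bootstrapping idea concrete, here is a minimal Python sketch of a single reconcile step that serves both purposes. Everything in it is hypothetical: the `ROLE_PACKAGES` table stands in for the box's "orders", `bootstrap` only records package names instead of installing anything, and health is a simple flag rather than a real health check.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical "orders": what a fresh box must install to play a given role
ROLE_PACKAGES = {
    "web": ["nginx", "app-server"],
    "worker": ["queue-client", "batch-jobs"],
}

_ids = count(1)


@dataclass
class Node:
    role: str
    node_id: int = field(default_factory=lambda: next(_ids))
    installed: list = field(default_factory=list)
    healthy: bool = True


def bootstrap(role):
    """Boot a blank box, hand it its orders, and let it install what it needs."""
    node = Node(role=role)
    for package in ROLE_PACKAGES[role]:
        node.installed.append(package)  # stand-in for a real package install
    return node


def reconcile(nodes, role, desired):
    """Return a fleet of `desired` healthy nodes, given the current fleet for `role`.

    Unhealthy nodes are dropped and replaced (fault tolerance), and the fleet is
    grown or trimmed to match demand (elasticity). Both paths reuse bootstrap().
    """
    fleet = [n for n in nodes if n.healthy]
    while len(fleet) < desired:
        fleet.append(bootstrap(role))
    return fleet[:desired]


# Usage: one of three web boxes fails while demand rises to five boxes
fleet = [bootstrap("web") for _ in range(3)]
fleet[1].healthy = False
fleet = reconcile(fleet, "web", desired=5)
print(len(fleet), all(n.healthy for n in fleet))  # 5 True
```

Because replacing a failed box and adding capacity both go through `bootstrap()`, the same code path is exercised whether the trigger is demand or failure.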
Just because the Cloud provider has architected its systems to be elastic and fault tolerant does not mean that the existing software you bring to the Cloud becomes elastic and fault tolerant as well. To move to the Cloud, you have to architect your application to take full advantage of the fault tolerance and elasticity the Cloud provides. Moving unoptimized spaghetti code to the Cloud is just asking for trouble, as is migrating a set of tightly coupled objects. The Cloud is forcing developers to architect their applications to work with, and take advantage of, its elastic and fault-tolerant nature. And you should plan not just for failure within the Cloud, but for failure of the Cloud itself, because it is a fool’s errand to count on a single provider. Consider using several different providers as elements in your vast enterprise. This also supports elasticity and fault tolerance by removing the provider as a single point of failure.
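The multi-provider idea can be sketched as a simple failover loop: try each provider in turn and move on when one is unavailable. This is a hypothetical Python illustration; the provider names, `ProviderError`, and the `put` functions are stand-ins for real SDK calls, and a production setup would also need to replicate data across providers so that a failover target actually holds what you need.

```python
class ProviderError(Exception):
    """Hypothetical error a provider client raises when its service is unavailable."""


def store_with_failover(providers, key, data):
    """Try each provider in order and return the name of the first that succeeds.

    `providers` is a list of (name, put_function) pairs; the put functions are
    hypothetical stand-ins for real provider SDK calls.
    """
    errors = {}
    for name, put in providers:
        try:
            put(key, data)
            return name
        except ProviderError as exc:
            errors[name] = exc  # remember why this provider failed, then try the next
    raise ProviderError(f"all providers failed: {errors}")


# Usage: the first provider is suffering an outage, the second is healthy
def provider_a_put(key, data):
    raise ProviderError("region outage")


backup_store = {}


def provider_b_put(key, data):
    backup_store[key] = data


used = store_with_failover(
    [("provider-a", provider_a_put), ("provider-b", provider_b_put)],
    "report.txt",
    b"hello",
)
print(used)  # provider-b
```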
By Abdul Salam