Gmail Outage – Is Cloud Computing To Blame?

“There are no secrets to success. It is the result of preparation, hard work, and learning from failure.”
– Colin Powell, former US Secretary of State and Chairman of US Joint Chiefs of Staff.

Unless you have been living under a rock, you must have come across the story of the Gmail outage a fortnight back. The question several observers are asking is, “Is cloud computing to blame?”

First, here are the facts. On Sunday, 27 February, Gmail became inaccessible for some hours. When it did come back, several users were surprised to see their inboxes empty, other than two welcome messages from Google. Needless to say, they weren’t very pleased with it.

What added to their consternation was Google’s response, or rather the lack of it. While it did put up a blog post assuring Gmail users that only 0.02% of users had been affected and that data recovery efforts were in progress, it did not give a timeline for the restoration to be completed. And there was good reason for Google’s reticence in declaring a time frame, considering it took the company more than three days to solve the problem.

On Tuesday, 1 March, there were two updates at an interval of four hours. The first one said that service “has already been restored for some users, and we expect a resolution for all users in the near future. Please note this time frame is an estimate and may change.” This was followed by the update that “data for the remaining 0.012% of affected users has been successfully restored from tapes and is now being processed. We plan to begin moving data into mailboxes in 2 hours, and in the hours that follow users will regain access to their data. Accounts with more mail will take more time.”

Finally, on Wednesday night (2 March), Google reported, “The problem with Google Mail should be resolved. We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better.”

Now, Gmail is perhaps the most widely used cloud computing-based service in the world today. Therefore, when it fails, even if only for three days, it does give rise to doubts about the dependability of the technology itself, especially since Google had to resort to good old tapes to resolve the problem.

As for my opinion, I don’t believe that cloud computing as a whole should be blamed here. Yes, there were issues, but to use those as an excuse to deride or abandon cloud computing is akin to “throwing the baby out with the bathwater.” While multiple data centers as used in a cloud do provide redundancy, that quality is seriously undermined if the same software bug is allowed to penetrate each one of these data centers, as happened in this case.

Let’s look at one of the proposed alternatives to a cloud-based Gmail. Suppose all Gmail archival data is stored on tapes in a single physical location. If that location is struck by an earthquake that destroys all the tapes, how is that option any safer?

Here’s what CBS News technology correspondent Larry Magid had to say on the matter: “Look, anything has its risks. Any time there is an air disaster there are people who talk about why we shouldn’t fly, and then there are others who say, yes, but if you look at it compared to all the other ways of traveling, it’s much, much safer. I think that’s kind of what we’re dealing with right now. We did have – I wouldn’t call it a disaster – we had a serious problem, and it’s causing people like yourselves to question cloud computing. But you also have to look at it from the other side. For example, I am very capable of messing up my own system, thank you very much. Hard drives can crash; there are all sorts of risks on any type of technology. If you trade the hard drive for the cloud, in my opinion, you greatly decrease the risk of losing email.”

What people seem to forget is that cloud computing is a new technology, scarcely a decade old, and consequently vulnerable to teething problems (See: A History of Cloud Computing). Although I am in favor of the technology, I don’t deny the existence of manageable risks (See: Cloud Computing Risks (And How to Deal With Them)). However, when weighed against the multiple benefits cloud computing offers (scalability, efficiencies, savings, etc.), the risks are definitely worth it.

As General Powell said in the quote at the beginning of this article, learning from failure is one of the crucial ingredients of success. As long as Google learns from this incident, especially about the importance of true redundancy, cloud computing’s future is secure.

By Sourya Biswas

