April 27, 2011
the Amazon outage set cloud computing back years
I mean, this is just plain wrong. Yes, the AWS event means people will think long and hard about their architecture. Yes, some enterprises that were toying with the idea of public cloud might pull back for a while. Yes, private cloud providers will use the event ad infinitum to justify private versus public. But let’s be a little realistic: it doesn’t spell the end of the cloud.
Then there is the other extreme, exemplified by George Reese, who says that this is a shining moment and shows that, with proper design, the Cloud can be amazingly resilient. Reese calls out the example of Netflix which, despite being an AWS customer, had no real issues during the outage because it designed for failure – it built its system to be redundant and resilient outside of its provider’s setup.
Even the ex-president of private cloud, Christian Reilly, tells us that:
for the traditional enterprise folks, it really doesn’t take much more than an outage of this nature, combined with some horror stories of how certain customers were catastrophically affected, and paradoxically worrying cases of what it took for certain customers not to be affected, to push the exploration of private cloud further up the to-do lists of many enterprise CIOs.
I know Reilly isn’t himself justifying the event as a gravestone moment for the public cloud but, and here I’m getting a little pent up, these same execs who are decrying the public cloud because it isn’t safe (when in actual fact it’s single zone/region/data center use of the public cloud which isn’t safe) are stepping back and oftentimes relying on private infrastructure sitting in… you guessed it – one data center. In a lovely circular way we’ve just recreated the very risk factors which caused so much impact in the AWS case.
Finally we have Klint Finley over on ReadWriteWeb, who does a great job of clearing the FUD away from the issues and contends that the fault for the outage, and ensuing failure of downstream services, is entirely on AWS.
So, where does one look, and what is the prognosis for the cloud?
Well, like Reese, I see the “Amazonoclypse” as a bad event with a silver lining to it. In the starkest possible way, it has highlighted the need to think beyond one zone, one data center, one region and one provider in order to build a robust and resilient service.
So, what are the components and solutions needed to build a service that would ride out an outage like the one we saw recently?
All Cloud vendors are quick to point out just how reliable their data centers are, with their redundant communication channels, power supply structures and the like. Any application running on the cloud needs to consider the same issues: it is unrealistic to rely completely on one single data center. A chain is only as strong as its weakest link, and by relying on one DC only, the idea of multiple redundancies is rendered a fiction.
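To make the point a little more concrete, here is a minimal sketch in Python of what "not relying on one DC" can look like at the application level. Everything in it is an assumption for illustration: the `Endpoint` type, the provider and data center labels, the example.com URLs and the health check are all hypothetical, and a real deployment would also have to deal with data replication, DNS and much more.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Endpoint:
    provider: str      # hypothetical label, e.g. "provider-a"
    data_center: str   # hypothetical label, e.g. "dc-east"
    url: str           # hypothetical address of the service copy


def pick_endpoint(
    endpoints: Iterable[Endpoint],
    is_healthy: Callable[[Endpoint], bool],
) -> Optional[Endpoint]:
    """Return the first endpoint, in preference order, that passes the health check."""
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A health check that blows up is treated the same as an unhealthy endpoint.
            continue
    return None


def has_single_point_of_failure(endpoints: Iterable[Endpoint]) -> bool:
    """True when every endpoint sits in the same data center (or there are none)."""
    return len({e.data_center for e in endpoints}) <= 1


# Hypothetical deployment: the same service running in two data centers.
ENDPOINTS = [
    Endpoint("provider-a", "dc-east", "https://east.example.com"),
    Endpoint("provider-a", "dc-west", "https://west.example.com"),
]

# Simulate an outage of one data center and see where traffic would go.
down = {"dc-east"}
chosen = pick_endpoint(ENDPOINTS, lambda e: e.data_center not in down)
print(chosen.data_center)  # prints dc-west
```

With only the first endpoint in the list, `has_single_point_of_failure` returns `True` and the simulated outage leaves `pick_endpoint` with nothing to return, which is the situation a single-data-center deployment found itself in.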
Spreading a service across multiple providers is a little more contentious, and difficult to effect right now. But with the advent of more open standards (OpenStack anyone?), Cloud users have the ability to obtain service across multiple providers. More and more third-party solutions are helping with this process.
The real opportunity here is for providers that offer infrastructure-vendor-agnostic orchestration and automation services. Case in point: Layer7, who came out quickly with a post explaining why their own rules-based cloud broker product would have avoided downstream issues from the AWS event. I’ll talk more specifically about the Layer7 offering tomorrow, but suffice it to say that third-party services management just became very relevant.
So – yes, the outage was truly bad. Yes, some people got a serious fright from what happened. No, we shouldn’t let Amazon off the hook, and we should expect a very thorough post-mortem. But in no way does this change the landscape for the age-old public-private debate.