Solving Problems On The Cloud Part 1: Netflix’s Example Of Strategic Problem Solving
Technical, marketing, and personnel problems are all issues you may be facing in the cloud today. To deal with them successfully, we must avoid the common mistake of fixating on the immediate issues surrounding the problem and start taking steps to move the roadblocks out of the way. Netflix demonstrated this when they kept a cloud outage, courtesy of their friends at Amazon, from turning into a prolonged one.
Cognitive science defines a problem as an obstacle in the present that is blocking a goal in the future. That’s a reasonable explanation, but it’s not enough to solve anything. Let’s break it down a little more. A problem can be well defined or poorly defined, so the first step is to define the problem in a concrete, proactive way.
Netflix had a problem. When their users want to watch a movie, they don’t want to wait until Amazon fixes a cloud outage.
This issue could result in a serious loss of business. The observation is a good description of a problem, but the problem it describes is still poorly defined. Transforming a problem description into a well-defined problem focuses our attention on the best possible solution. It also keeps us off the path of murky, unclear solutions that leave us staring at our screens, fixating on issues we can do nothing about.
Moving Towards a Solution: Crafting a Well-Defined Problem
A well-defined problem is one that leads to specific steps that can be realistically applied towards a specific end. At some point the professionals at Netflix transformed their ill-defined problem (outages upset customers, who may cancel their service) into a well-defined one: how can we anticipate cloud outages so we can switch to another zone before our customers’ service is interrupted? Their answer was to develop Asgard, a tool for deploying changes and managing Amazon resources that does the job better than the console provided by Amazon.
This seems simple enough, but only because thinking through the problem description resulted in a more concrete direction for solving what became a well-defined problem. We can imagine that in reality there were dozens of possible solutions. The Netflix team had to choose wisely lest the problem continue or even worsen. Still, this type of foundational reordering of the problem is a solid first step toward solving any issue.
During the last week of October, Asgard was put to the test as Netflix faced an outage. Within 20 minutes service was restored. When our own problem-solving skills are put to the test, let’s make sure we begin with a well-defined problem before we start processing scenarios. Next time we will discuss the Gestalt approach to restructuring a problem, adding another tool to our problem-solving toolkit.
By Don Cleveland