Mitigating the Downtime Risks of Virtualization

Nearly every IT professional dreads unplanned downtime. Depending on which systems are hit, it can mean angry communications from employees and the C-suite and often a Twitterstorm of customer ire. See the recent Samsung SmartThings dustup for an example of how much trust can be lost in just one day.

Gartner pegs the financial cost of downtime at $5,600 per minute, or over $300,000 per hour. And a survey by IHS found enterprises experience an average of five downtime events each year, resulting in losses ranging from $1 million for a midsize company to $60 million or more for a large corporation. In addition, the time spent recovering can leave businesses with an “innovation gap,” an inability to redirect resources from maintenance tasks to strategic projects.

The quest for downtime-minimizing technologies remains hot, especially as demand for high-availability IT has grown. Where “four nines” (99.99%) uptime might once have sufficed, five nines or six nines is now expected.
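To put those availability targets in perspective, here is a minimal Python sketch that converts an uptime percentage into a yearly downtime budget and a worst-case cost at the Gartner per-minute rate cited above. The figures are illustrative assumptions, not forecasts.

```python
# Rough downtime-budget math for common availability targets.
# The $5,600/minute figure is the Gartner estimate cited above;
# treat the output as illustrative, not a forecast.

COST_PER_MINUTE = 5_600           # Gartner's per-minute downtime cost
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

for label, availability in [("four nines", 0.9999),
                            ("five nines", 0.99999),
                            ("six nines", 0.999999)]:
    allowed_minutes = (1 - availability) * MINUTES_PER_YEAR
    worst_case_cost = allowed_minutes * COST_PER_MINUTE
    print(f"{label}: ~{allowed_minutes:.1f} min/yr of downtime "
          f"(~${worst_case_cost:,.0f} at ${COST_PER_MINUTE:,}/min)")
```

Four nines allows roughly 53 minutes of downtime a year; six nines allows barely half a minute, which is why the bar for availability keeps rising.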

Enter server virtualization, which enables administrators to partition servers, increase utilization rates, and spread workloads across multiple physical devices. It’s a powerful and increasingly popular technology, but it can be a mixed blessing when it comes to downtime.

Virtualization Minimizes Some Causes, Exacerbates Some Impacts of Downtime

Virtualization is no panacea, but that’s not a call to reconsider industry enthusiasm for it. Doing so would be unproductive anyway. The data center virtualization market, already worth $3.75 billion in 2017, is expected to grow to $8.06 billion by 2022. For good reason. Virtualization has many advantages, some of them downtime-related. For example, it’s easier to employ continuous server mirroring for more seamless backup and recovery.

These benefits are well documented by virtualization technology vendors like VMware and in the IT literature generally. Less frequently discussed are the compromises enterprises make with virtualization, which often boil down to an “all eggs in one basket” problem.

What used to be discrete workloads running on multiple, separate physical servers can, in a virtualized environment, be consolidated onto a single server. The combination of server and hypervisor then becomes a single point of failure, which can have an outsized impact on operations for many reasons.

Increased utilization

First of all, today’s virtualized servers are doing more work. According to a McKinsey & Company report, utilization rates in non-virtualized equipment were mired at 6% to 12%, and Gartner research had similar findings. Virtualization can drive that figure up to 30% or 50%, and sometimes higher. Even back-of-the-napkin math shows any server outage has several times the impact of yesteryear, simply because there is more compute happening within any given box.
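As a rough illustration of that back-of-the-napkin math, the short sketch below compares the work disrupted by an outage before and after consolidation; the specific utilization rates are assumptions drawn from the ranges cited above.

```python
# Rough impact multiplier: how much more work is disrupted when a
# highly utilized, consolidated host fails versus a lightly utilized,
# non-virtualized one. Rates are assumptions from the ranges cited above.
PRE_VIRTUALIZATION_UTILIZATION = 0.10   # roughly 6-12% before virtualization
POST_VIRTUALIZATION_UTILIZATION = 0.40  # roughly 30-50% after consolidation

multiplier = POST_VIRTUALIZATION_UTILIZATION / PRE_VIRTUALIZATION_UTILIZATION
print(f"An outage on the consolidated host disrupts roughly {multiplier:.0f}x as much work.")
```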

Diverse customer consequences

Prior to virtualization, co-location customers, among others, demanded dedicated servers to handle their workloads. Although some still do, the cloud has increased comfort with sharing physical resources by using virtual machines (VMs). Now a single server with virtual partitions could be a resource for dozens of clients, vastly expanding the business impact of downtime. Instead of talking to one irate individual demanding a refund, customer service representatives could be getting emails, tweets, and calls from every corner.

This holds true for on-premises equipment as well. The loss of a single server could as easily affect the accounting systems the finance department relies on, the CRM system the sales team needs, and the resources various customer-facing applications demand, all at the same time. It’s a recipe for a help desk meltdown.

Added complexity

According to CIO Magazine, many virtualization projects “have shifted rather than eliminated complexity in the data center.” In fact, of the 16 outages per year their survey respondents reported, 11 were caused by system failure resulting from complexity. And the more complex the environment, the more difficult the troubleshooting process can be, which can lead to longer, more harmful downtime experiences.

Thin clients

Though not driven directly by virtualization, the industry has made yet another swing of the centralization versus decentralization pendulum. After years of powerful PCs loaded with local applications, we have entered an age of mobile, browser-based, and other very thin client solutions. In many cases, the client does little but collect bits of data for processing elsewhere. Not much can happen at the device level if the cloud-based or other computing resources are unavailable. The slightest problem can result in mounting user frustration as apps crash and error messages are returned.

In summary, the data center of 2018 houses servers that are doing more, for more internal and external customers. At the same time, complexity is adding downtime risk, with problems that can be more difficult to solve and can therefore lead to extended outages. Although effective failover, backup, and recovery processes can help mitigate the combined effects, these tactics alone are not enough.

Additional Solutions for Minimizing Server Downtime

It may sound old school, but data center managers need to stay focused on IT equipment. Hardware failures account for 40% of all reported downtime. Compare that figure with the 25% caused by human error, whether by internal staff or service providers, and the 10% caused by cyberattacks. To have the greatest positive effect on uptime, hardware should obviously be the first target.

There are several recommendations data center managers should implement, if they haven’t already done so:

  • Perform routine maintenance regularly. It should go without saying but often doesn’t. Install recommended patches, check for physical issues like airflow blockages, and heed all alerts and warnings. Maintenance may be routine, but it is no less essential. That means training employees, scheduling tasks, and tracking completion. If maintenance can’t happen on time, all the time, seek outside assistance to get it done so available internal resources can focus on strategic projects and those unavoidable fire drills without leaving systems in jeopardy.
  • Monitor your resources. The first you hear of an outage should never be from a customer. Full-time, 24/7 systems monitoring is a must for any enterprise. Fortunately, there are new, AI-driven technologies combining monitoring with advanced predictive maintenance capabilities for immediate fault detection and integrated, quick-turnaround response (a minimal health-check sketch follows this list). Access is less expensive than you might think.
  • Upgrade your break/fix plan. A disorganized parts closet or an eBay strategy won’t work. Rapid access to spares is vital in getting systems back online without delay. Especially for mission critical systems, station repair kits on site or work with a vendor who can do so and/or deliver spares within hours.
  • Invest in expertise. Parts are only part of the equation. There is significant skill involved in troubleshooting systems in these increasingly complex data center environments. The current IT skills gap may necessitate looking outside the enterprise to complement existing engineering capabilities with those of a third-party provider.
  • Test everything. Data centers evolve, but conducting proof-of-principle testing on each workload before any changes are made will cut down on virtualization problems before they happen. By the same token, systems recovery and DR scenarios are unknowns unless they are real-world verified. Try pulling a power cord and see what happens. Does that idea give you pause? It might be time for some enhancements.
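
On the monitoring point above, here is a minimal sketch of a polling health check that flags failing services before customers do. The endpoints, interval, and notify() stub are hypothetical placeholders; in practice this role is usually filled by a dedicated monitoring platform rather than a standalone script.

```python
# Minimal polling health check: flag any monitored endpoint that stops
# responding so staff hear about an outage before customers do.
# Endpoints, interval, and the notify() stub are hypothetical placeholders.
import time
import urllib.error
import urllib.request

ENDPOINTS = [
    "https://crm.example.internal/health",
    "https://erp.example.internal/health",
]
CHECK_INTERVAL_SECONDS = 60


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def notify(message: str) -> None:
    """Placeholder: wire this to email, chat, or a paging service."""
    print(f"ALERT: {message}")


if __name__ == "__main__":
    while True:
        for url in ENDPOINTS:
            if not is_healthy(url):
                notify(f"{url} failed its health check")
        time.sleep(CHECK_INTERVAL_SECONDS)
```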

There is good news for IT organizations already overwhelmed by demands to maintain more complex environments, execute the digital transformation, and achieve it all with fewer resources and less money, in a tight labor market to boot. Alternatives exist.

Third-party maintenance providers can take on a substantial portion of the equipment-related upkeep, troubleshooting, and support tasks in any data center. With a premium provider on board, it’s possible to radically reduce downtime and reach the availability and reliability goals you’d hoped to achieve when you took the virtualization path in the first place.

By Paul Mercina

Paul Mercina

Director of Product Management

Paul Mercina brings over 20 years of experience in IT center project management to Park Place Technologies, where he has been a catalyst for shaping the evolutionary strategies of Park Place’s offering, tapping key industry insights and identifying customer pain points to help deliver the best possible service. A true visionary, Paul is currently implementing systems that will allow Park Place to grow and diversify their product offering based on customer needs for years to come.

His work is informed by more than a decade at Diebold Nixdorf, where he worked closely with software development teams to introduce new service design, supporting implementation of direct operations in a number of countries across the Americas, Asia and Europe that led to millions of dollars in cost savings for the company.

Mercina shares his technology and business expertise as an adjunct professor at Walsh University’s DeVille School of Business, where he instructs courses on business negotiations, business and project management, and marketing.
