What’s Holding DevOps Back

And How Developers and Businesses Can Vault Forward to Improve and Succeed

Developers spend a lot of valuable time – sometimes after being woken up in the middle of the night – fixing bugs and errors. This work is important because code is never perfect. But the time and effort it takes to do this work often creates problems for developers and businesses.

Spending an outsized part of their time investigating and fixing bugs, and contending with errors in code at odd hours, take a toll on developer productivity and happiness and can lead to developer burnout. “Rethinking Productivity in Software Engineering” says that the most frequent causes of developer unhappiness are time pressure and being stuck in problem-solving. The authors’ research found that being unhappy caused breaks in developers’ flow, resulting in adverse effects on process. The book goes on to note that unhappiness can cause employees to remove themselves from daily work tasks – either temporarily or permanently.

Fixing bugs and errors can be onerous for developers, and the time it consumes means the business’s customers get products that don’t improve as fast as they could. Also, the longer it takes a business to resolve bugs and errors, the longer its customers may be affected by those issues.

Part of the reason for these problems is that most DevOps tools are focused on infrastructure, and that’s only half of the equation. The other half is code. Developers need tools for creating and fixing code, which should include automation to take part of the burden off developers and to speed up the process. Another reason is that businesses and developers have not focused enough on how to reduce the time it takes to resolve code issues, and have not considered how important reducing mean time to resolve (MTTR) is to iterating faster and releasing code more often.

Use The Right Tools for the Job

Existing monitoring and observability tools work well for infrastructure – alerting users when disk space runs out or the network isn’t correctly configured, for example – but not for code.

Infrastructure tools are designed around concepts like services, hosts, metrics and trends. They focus on infrastructure, such as the physical hardware that is abstracted into virtual machines.

The code that developers write and control sits on top of these environments. Tools designed for code do a better job of seeing how code performs across environments and focus on whether the code is working the way the developer intended. Infrastructure tools just assume that the code works and that, if there’s anything wrong, the problem is with the infrastructure – which often is simply not true.

This disconnect is compounded by the fact that in the past, when businesses might only release software once a year, they could spend a lot more time and resources testing code. But now customers are demanding that software improves faster, so teams are shipping more often. This means there is less time to fine-tune code testing and that more issues crop up in production.

That’s fine if developers have the right tools to understand and fix those issues. But they need the right tools for the job, and infrastructure tools are not the right tools. This is akin to doing a home improvement project: you would use a hacksaw to cut metal and a table saw to cut wood, not one saw for both.

As organizations look for code-focused tools, they should seek tools that:

Provide real-time signals that they can trust and use for automation
Work with every language they use in their stack and in every environment in which their code runs (not just development, staging or production), including in the cloud, the edge, mobile, on premises and serverless
Meet their security requirements
Can be easily adopted by every developer in their organization

Leverage Automation to Accelerate Improvement

Automation can go a long way in making developers more productive and happier by enabling them to identify and fix problems more easily. Rollbar finds that automation can lead to a 4x increase in debugging speed, shaving down the eight hours a week per developer that a business would spend on debugging to just two hours a week per developer. As a result, developers and their employers can issue new releases faster and with greater frequency.

What a business chooses to automate is up to that organization. A company may want to start by automating simple things, such as the discovery of errors, and then work toward more complex things, such as automatic remediation based on established runbooks. Businesses can also automate processes such as the triaging and assignment of errors.
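As a rough illustration of automated triage and assignment, the sketch below routes an error to an owner and picks a severity. Everything in it is hypothetical: the `ErrorEvent` fields, the `OWNERS` path-prefix map, the team names and the `page_threshold` value are assumptions made for the example, not part of any particular product.

```python
from dataclasses import dataclass


@dataclass
class ErrorEvent:
    """A grouped error as a monitoring tool might report it (hypothetical shape)."""
    fingerprint: str   # identifier for the group of similar errors
    path: str          # source file where the error was raised
    occurrences: int   # how many times it has been seen so far


# Hypothetical ownership map: source path prefix -> owning team.
OWNERS = {
    "billing/": "payments-team",
    "auth/": "identity-team",
}


def assign_owner(event: ErrorEvent, default: str = "on-call") -> str:
    """Return the team whose path prefix matches the error, else a default."""
    for prefix, team in OWNERS.items():
        if event.path.startswith(prefix):
            return team
    return default


def triage(event: ErrorEvent, page_threshold: int = 100) -> dict:
    """Decide who gets the error and whether it should page or become a ticket."""
    severity = "page" if event.occurrences >= page_threshold else "ticket"
    return {"owner": assign_owner(event), "severity": severity}


decision = triage(ErrorEvent("abc123", "billing/invoice.py", 250))
```

A rule table like this is only a starting point. The decisions are only as good as the signals feeding them, which is why the trustworthiness of those signals matters so much.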

In seeking automation solutions, businesses should ask potential suppliers about:

The speed and reliability of their automation (Will automated processes happen within a couple of seconds, within a minute or within an hour?)
Whether, and to what extent, the signals from their solutions can be trusted (Are there lots of false positives or false negatives? Will actions that run automatically be reliable?)
The speed and reliability of the user interface (When developers need to use the tool by hand, is it fast and easy to use?)

Expedite Deployments with Feature Flagging

Imagine that a developer has shipped code to production and then realizes the code is broken and needs to be changed. If it takes the developer 30 minutes or longer to do a new release, 30 minutes is the fastest possible MTTR. However, if the developer can deploy faster – releasing new code within a few minutes or seconds – the best possible MTTR greatly improves.
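The arithmetic behind that floor can be sketched with a simple additive model. The model itself is an assumption made for illustration: it treats time to resolve as detection time plus fix time plus deployment time, so even with instant detection and an instant fix, MTTR cannot drop below the deployment time.

```python
def best_case_mttr(detect_min: float, fix_min: float, deploy_min: float) -> float:
    """Minutes from failure to resolution under a simple additive model."""
    return detect_min + fix_min + deploy_min


# With instant detection and an instant fix, deployment time is all that is left.
slow_pipeline = best_case_mttr(detect_min=0, fix_min=0, deploy_min=30)
fast_pipeline = best_case_mttr(detect_min=0, fix_min=0, deploy_min=2)
```

Under these assumptions, shortening the release from 30 minutes to 2 minutes lowers the best achievable MTTR by the same 28 minutes, regardless of how quickly the bug is found and fixed.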

Businesses are beginning to adopt feature flagging to allow developers to instantly turn off broken code. That means developers will be able to do in one second what today may take them 30 minutes or longer. When businesses combine that capability with automation, they will gain the ability to do automatic remediation that happens within one second.
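A minimal sketch of that combination follows, assuming an in-memory flag store and a simple error-count rule. The `FeatureFlags` and `AutoRemediator` classes, the `new_checkout` flag name and the threshold of three errors are all hypothetical. A real system would use a shared flag service and a trusted error signal rather than a local counter.

```python
class FeatureFlags:
    """In-memory flag store (a stand-in for a real flag service)."""

    def __init__(self) -> None:
        self._flags: dict[str, bool] = {}

    def enable(self, name: str) -> None:
        self._flags[name] = True

    def disable(self, name: str) -> None:
        self._flags[name] = False

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so new code stays dark until enabled.
        return self._flags.get(name, False)


class AutoRemediator:
    """Turns a flag off once its code path has produced too many errors."""

    def __init__(self, flags: FeatureFlags, max_errors: int) -> None:
        self.flags = flags
        self.max_errors = max_errors
        self._counts: dict[str, int] = {}

    def record_error(self, flag_name: str) -> bool:
        """Count an error; return True if this call disabled the flag."""
        self._counts[flag_name] = self._counts.get(flag_name, 0) + 1
        over_limit = self._counts[flag_name] >= self.max_errors
        if over_limit and self.flags.is_enabled(flag_name):
            self.flags.disable(flag_name)
            return True
        return False


flags = FeatureFlags()
flags.enable("new_checkout")
remediator = AutoRemediator(flags, max_errors=3)

# Simulate three errors reported against the flagged code path.
results = [remediator.record_error("new_checkout") for _ in range(3)]
```

In this sketch the third error flips the flag off without a new release, so later requests fall back to the old code path while the developer investigates.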

This is noteworthy because people are used to MTTR being measured in hours or days. Lengthy MTTR makes businesses and developers cautious and fearful about making changes. That, in turn, often means that their customers don’t get products that improve as quickly as they could, and the business doesn’t learn as quickly about what it should be building. But if a developer can deploy instantly, MTTR can be measured in minutes or less, and it’s much safer to make changes. The business, as a result, has a much greater chance of a successful deployment.

When businesses and their developers have greater visibility into problems in released code, they can pull code back, fix it much more quickly and increase the speed at which they deploy and release. Once they have these capabilities in their workflow, they can go faster and faster.

That’s why forward-looking businesses are taking a holistic view of how to improve code, rather than focusing exclusively on improving the infrastructure. They are adopting tools for code, including automation. And they’re working to improve their time to resolve code issues.

By Brian Rue

Brian is the CEO and Co-founder of Rollbar, an SF-based provider of real-time error monitoring Software as a Service, where he leads the company’s overall strategy and direction. Brian founded the company with Cory Virok in 2012. Prior to Rollbar, Brian was the CTO and Co-founder of Lolapps, a leading publisher of independent games on social networks and mobile platforms. Brian attended Stanford University, where he studied Management Science and Engineering.