The Top 10 DevOps Metrics: How To Measure What Matters – Part 1
Mon 10 May 2021 | Jeff Keyes
By focusing on 10 important metrics, DevOps stakeholders can more clearly understand and act on the issues that determine the success or failure of any DevOps initiative. By Jeff Keyes, VP of Product at Plutora
DevOps is becoming embedded in the standard approach to software delivery in businesses across the world. Although definitions vary, at its core DevOps is a collaborative approach that unifies an organisation’s development and operations teams. To understand DevOps fully, view it as a culture that guides how different teams in a company work together to achieve business goals. Yes, process and tools are key, but they exist to support that wider philosophy.
Despite its widespread adoption and proven impact as a development methodology, many organisations struggle to benchmark their DevOps efforts and results. As a consequence, teams may not engage fully with the process, may fail to optimise their collective approach, or may not fully understand its impact from a technical and business perspective.
What’s often missing is a set of meaningful performance indicators. By focusing on 10 important metrics, DevOps stakeholders can more clearly understand and act on the issues that determine the success or failure of any DevOps initiative. Part One of this two-part guide examines the first five:
Monitor Application Availability
Monitoring application availability is a simple, meaningful way to ensure resources are properly devoted to maintaining uptime: more often than not, even a bug-riddled application that stays online is preferable to one that’s completely offline. As a rule of thumb, unless you have alerted users to planned maintenance downtime, they should have access to your application around the clock.
To measure availability, tools such as Kubernetes (for container orchestration) and Jenkins (for build and deployment automation) can help, but it’s also important to consider application performance monitoring (APM) tools, which help you track availability properly and manage rollback points.
But from an availability point of view, what does good look like? You may have seen service-level agreements (SLAs) claiming upwards of 90% availability, but don’t forget: just because you host your application on a cloud services provider with 99.99% availability, it doesn’t follow that your app inherits the same availability. It’s always worth calculating your own app’s availability before promising any figures to customers. To do this, measure the proportion of a given period during which your application was down and subtract that percentage from the ideal of 100%.
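As a minimal sketch, assuming you already record the start and end of each outage (the `Outage` type and both function names are hypothetical), the calculation looks like this in Python. The second function illustrates why a 99.99% platform doesn’t guarantee 99.99% for your app: if the app also depends on, say, a database at 99.9%, and both must be up, the combined availability under an assumption of independent failures is the product of the two.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Outage:
    """One period of downtime, in seconds from the start of the observation window."""
    start: float
    end: float


def availability_percent(window_seconds: float, outages: Sequence[Outage]) -> float:
    """Percentage of the window during which the app was up.

    Assumes the outages don't overlap and fall entirely inside the window.
    """
    downtime = sum(o.end - o.start for o in outages)
    return 100.0 * (1 - downtime / window_seconds)


def serial_availability(*components: float) -> float:
    """Combined availability (as a fraction) of components that must ALL be up.

    Assumes failures are independent, so the availabilities simply multiply.
    """
    combined = 1.0
    for a in components:
        combined *= a
    return combined


# Example: a 30-day window containing two one-hour outages.
thirty_days = 30 * 24 * 3600
print(round(availability_percent(thirty_days, [Outage(0, 3600), Outage(10_000, 13_600)]), 2))

# A 99.99% hosting platform in series with a 99.9% database.
print(round(serial_availability(0.9999, 0.999) * 100, 3))
```

Two hours of downtime in 30 days works out at roughly 99.72%, and the hosting-plus-database chain comes to about 99.89%: both noticeably below the platform’s headline figure.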
Measure Traffic and Application Usage
Assuming application availability is under control, the next measurement priority is the application’s traffic and usage data. This matters because an application that receives too much traffic may fall over under the load, a situation that is often avoidable with a log analyser that notifies developers when traffic approaches that breaking point.
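The idea behind such an alert can be sketched in a few lines of Python. This assumes a hypothetical log format in which each line begins with an ISO-8601 timestamp, and a capacity figure (`capacity_rpm`, requests per minute) that you would have to establish yourself, for example through load testing. Dedicated log analysers do this far more robustly; the sketch only shows the counting-and-threshold logic.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List


def requests_per_minute(log_lines: Iterable[str]) -> Counter:
    """Count requests per minute.

    Assumes a hypothetical log format in which every line starts with an
    ISO-8601 timestamp, e.g. "2021-05-10T12:00:01 GET /checkout 200".
    """
    counts: Counter = Counter()
    for line in log_lines:
        if not line.strip():
            continue  # skip blank lines
        timestamp = datetime.fromisoformat(line.split()[0])
        # Truncate to the minute so all requests in that minute share a key.
        counts[timestamp.replace(second=0, microsecond=0)] += 1
    return counts


def minutes_near_capacity(counts: Counter, capacity_rpm: int, warn_ratio: float = 0.8) -> List[datetime]:
    """Return, in time order, the minutes whose request count reached warn_ratio * capacity_rpm."""
    threshold = capacity_rpm * warn_ratio
    return sorted(minute for minute, n in counts.items() if n >= threshold)
```

Warning at 80% of capacity (the default `warn_ratio`, an arbitrary choice) leaves some headroom to react before the application actually falls over.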
It’s also important to track usage statistics as your app’s version number increases. A drop in usage is actionable feedback and could be the result of changes that users don’t like. A DevOps team that’s fully engaged with traffic and usage data will take quick steps to correct dips in usage caused by new features.
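A simple way to surface those dips is to compare a usage metric release by release. The sketch below assumes you have one comparable figure per version (for example, average daily active users over the first week after release) and flags any version whose usage fell by more than a chosen threshold; the function names and the 10% default are hypothetical. A flagged drop is a prompt to investigate, not proof that the new features caused it.

```python
from typing import Dict, List


def usage_change(previous: float, current: float) -> float:
    """Percentage change in a usage metric between two consecutive releases."""
    if previous <= 0:
        raise ValueError("previous usage must be positive")
    return 100.0 * (current - previous) / previous


def flag_usage_drops(usage_by_version: Dict[str, float], max_drop_pct: float = 10.0) -> List[str]:
    """Return versions whose usage fell by more than max_drop_pct versus the previous version.

    Relies on the dict being in release order (dicts keep insertion order in Python 3.7+).
    """
    versions = list(usage_by_version)
    flagged = []
    for prev, cur in zip(versions, versions[1:]):
        if usage_change(usage_by_version[prev], usage_by_version[cur]) < -max_drop_pct:
            flagged.append(cur)
    return flagged


usage = {"1.0": 1000, "1.1": 1050, "1.2": 900, "1.3": 880}
print(flag_usage_drops(usage))
```

Here only version 1.2 is flagged: its usage fell by about 14% against 1.1, whereas the dip from 1.2 to 1.3 is about 2%.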
However, traffic spikes can also happen for other reasons, and although the general goal is to ‘maximise’ interest, too much traffic can cause resource allocation problems. What’s more, understanding what normal traffic levels look like for your app makes it easier to spot serious problems, such as DDoS attacks.
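One common way to encode “normal” is a rolling baseline. The sketch below, offered as one simple technique rather than a DDoS detector, flags any sample (for example, requests per hour) that sits more than `k` standard deviations above the mean of the preceding `window` samples. Both parameters are assumptions you would tune to your own traffic patterns.

```python
from statistics import mean, stdev
from typing import List, Sequence


def traffic_anomalies(samples: Sequence[float], window: int = 24, k: float = 3.0) -> List[int]:
    """Return indices of samples more than k standard deviations above the rolling baseline.

    The baseline for sample i is the mean and sample standard deviation of the
    `window` samples immediately before it. Requires window >= 2.
    """
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # With a perfectly flat baseline (sigma == 0), any increase is flagged.
        if samples[i] > mu + k * sigma:
            anomalies.append(i)
    return anomalies


# A day of hourly traffic with a repeating pattern, then a sudden spike.
hourly = [100, 110, 95, 105] * 6 + [102, 900, 98]
print(traffic_anomalies(hourly))
```

Only the 900-request hour (index 25) stands out against the baseline; the ordinary readings either side of it do not.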
Focus on Support Tickets
For app developers, it’s always a good sign when users ask for help using the app more effectively. One of the most common ways users reach out for help is by submitting tickets to your support team. In the DevOps pipeline, this kind of feedback sits between coding and testing: it shapes what gets coded into the app and the results to expect from the tests that follow.
Getting the most from this metric requires a best-practice approach. First, tickets raised because of problems with the ticketing system itself create a paradox; established third-party tools for tracking tickets and their life cycles can help minimise this. Second, while it can be tempting to build a ticketing solution in-house, buying one may prove more cost-efficient once you add up the time and effort required to design and develop it.
Once you’ve managed to sustain low ticket volumes, your DevOps team can focus on keeping the system fresh. This work generates internal tickets, although these are naturally of lower priority than externally generated ones. To measure this metric, simply count the tickets raised over a fixed observation period.
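Counting per period is straightforward once tickets are exported from your ticketing tool. This sketch assumes a hypothetical export that gives each ticket’s opening date and whether it was raised internally, and tallies tickets per ISO calendar week so that external (user) and internal tickets can be tracked separately.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass
class Ticket:
    """Hypothetical minimal ticket record exported from a ticketing tool."""
    opened: date
    internal: bool  # True if raised by the team itself rather than by a user


def tickets_per_week(tickets: Iterable[Ticket]) -> Counter:
    """Count tickets per ISO week, keyed by (year, week, "internal" | "external")."""
    counts: Counter = Counter()
    for t in tickets:
        year, week, _ = t.opened.isocalendar()
        source = "internal" if t.internal else "external"
        counts[(year, week, source)] += 1
    return counts


tickets = [
    Ticket(date(2021, 5, 10), internal=False),
    Ticket(date(2021, 5, 12), internal=False),
    Ticket(date(2021, 5, 12), internal=True),
    Ticket(date(2021, 5, 17), internal=False),
]
print(tickets_per_week(tickets))
```

Weekly buckets are an arbitrary choice; any fixed period works, as long as you keep it consistent so the counts are comparable over time.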
Record the Commit Count
Commits are changes recorded in your codebase using a version control system (VCS) such as Git, often hosted on a platform like GitHub. The more commits your team records, the easier it is to consider it “productive.” Bear in mind, though, that a commit is only useful once it has been reviewed and approved, typically by senior developers, so a raw commit count is not proof of productivity on its own.
Your VCS should be able to report the total number of commits over any period of interest. This is useful insight because developers with more commits can inspire the rest of the team to improve their output. What’s more, there’s a correlation between commit count and deployment rate: a low commit count, for instance, can help explain why your application versioning is slow, and keeping a close eye on it allows managers to support the developers with the fewest commits.
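With Git, one way to get these numbers is to ask `git log` for the author name of each commit in a date range and tally the results. The sketch below uses the real `git log` options `--since`, `--until`, `--no-merges` and `--pretty=format:%an`; the wrapper functions are hypothetical. Running it against the branch that only receives reviewed, approved changes (here, whatever `HEAD` points to) goes some way towards counting only the commits that matter.

```python
import subprocess
from collections import Counter


def commits_by_author(log_output: str) -> Counter:
    """Tally commits per author from `git log --pretty=format:%an` output (one name per line)."""
    return Counter(name for name in log_output.splitlines() if name.strip())


def git_commit_counts(since: str, until: str, repo: str = ".") -> Counter:
    """Count non-merge commits per author on HEAD between two dates, e.g. "2021-04-01"."""
    output = subprocess.run(
        ["git", "-C", repo, "log", "HEAD", "--no-merges",
         f"--since={since}", f"--until={until}", "--pretty=format:%an"],
        capture_output=True, text=True, check=True,
    ).stdout
    return commits_by_author(output)
```

For example, `git_commit_counts("2021-04-01", "2021-04-30").most_common()` would list authors from most to fewest commits for April. Keeping the parsing separate from the `subprocess` call makes it easy to test without a repository.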
Analyse the Number of Tests
But even more important than the overall commit count is the number of tests conducted on each commit. If a single proposed change has to be tested multiple times, chances are the process is at fault. Problems often arise when tests are carried out manually: manual testing can be thorough, but a tired tester is more likely to make mistakes when running checks on commits and builds before deployment.
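If your CI system can export test runs as (commit, outcome) records, a hypothetical format assumed here, two quick checks help spot a faulty process: commits that were tested more often than expected, and commits that both passed and failed without any code change, which usually points to flaky tests rather than bad code.

```python
from collections import Counter, defaultdict
from typing import Iterable, List, Set, Tuple

TestRun = Tuple[str, str]  # (commit SHA, outcome such as "pass" or "fail")


def commits_with_repeated_runs(test_runs: Iterable[TestRun], max_runs: int = 2) -> List[str]:
    """Return SHAs of commits that were tested more than max_runs times."""
    runs_per_commit = Counter(sha for sha, _ in test_runs)
    return sorted(sha for sha, n in runs_per_commit.items() if n > max_runs)


def flaky_commits(test_runs: Iterable[TestRun]) -> List[str]:
    """Return SHAs of commits that recorded both a pass and a fail on the same code."""
    outcomes: defaultdict = defaultdict(set)
    for sha, outcome in test_runs:
        outcomes[sha].add(outcome)
    return sorted(sha for sha, seen in outcomes.items() if {"pass", "fail"} <= seen)


runs = [
    ("a1", "pass"),
    ("b2", "fail"), ("b2", "fail"), ("b2", "pass"),
    ("c3", "fail"), ("c3", "pass"),
]
print(commits_with_repeated_runs(runs))
print(flaky_commits(runs))
```

The limit of two runs per commit is an arbitrary threshold; choose one that reflects how your pipeline normally behaves.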
Alternatively, some DevOps teams run automated tests spread across as many containers as possible in production-like environments. This not only reduces the time you have to wait for a test session, but also increases the number of tests that can run concurrently.
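The effect of spreading tests across workers can be illustrated on a single machine. In this sketch, threads from Python’s standard `concurrent.futures` module stand in for containers, and the test functions are hypothetical placeholders that simply sleep. Threads suit I/O-bound tests; CPU-bound suites would need processes or separate containers. In a real Python project, a plugin such as pytest-xdist (`pytest -n 4`) does this distribution for you.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_in_parallel(tests: Dict[str, Callable[[], None]], workers: int) -> Dict[str, bool]:
    """Run independent test callables concurrently and map each test name to pass (True) or fail (False)."""

    def run_one(test: Callable[[], None]) -> bool:
        try:
            test()
            return True
        except AssertionError:
            return False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(run_one, fn) for name, fn in tests.items()}
        return {name: future.result() for name, future in futures.items()}


def slow_passing_test() -> None:
    time.sleep(0.2)  # placeholder for a slow, I/O-bound test


def slow_failing_test() -> None:
    time.sleep(0.2)
    raise AssertionError("expected value not returned")


start = time.perf_counter()
results = run_in_parallel(
    {"login": slow_passing_test, "search": slow_passing_test,
     "checkout": slow_failing_test, "profile": slow_passing_test},
    workers=4,
)
print(results, f"{time.perf_counter() - start:.1f}s")
```

With four workers, the four 0.2-second tests finish in roughly 0.2 seconds rather than the 0.8 seconds a sequential run would take, and the failing test is still reported individually.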
In Part Two, we’ll look at five more metrics that can help teams focus on optimising the DevOps process, including deployment speed, rollback rates, and lead times.