How to measure the impact of IT operations on your business (Part 2)
Part 2: Identifying candidate KPIs to evaluate
In my first post in this series, I went through the six steps for putting into place a metrics/KPI program that measures the performance of your IT operations within the context of your business goals.
When consulting, this is usually the point where I stress that we have to work the process in order to come up with KPIs that mean something to your specific business. I explain that there is no such thing as one size fits all in this matter. Despite that, the very next question I’m usually asked is: “Can you tell us now what KPIs a company like ours should be measuring?”
Just providing a list of examples would probably send them off on the wrong course, chasing KPIs that were important to someone else’s business. Since figuring out what to measure can be as valuable as the actual measurement, I instead walk them through the following concepts to get them started on step #2 and step #3 of the process.
First, stop and consider what “measurement” really means
Measurement: a set of observations that reduce uncertainty where the result is expressed as a quantity
I lifted the above definition from measurement guru Douglas W. Hubbard. However, if you noodle around in the academic writings on this topic, you’ll see that it’s a fairly widely accepted definition.
When looking for a way to measure something, keep this definition in mind. Whether it’s problem solving, allocating budget, or prioritizing your resources, reducing uncertainty gives you a decisive and valuable advantage. You don’t need to have absolute precision. A coarse swing at something is often going to be enough to get started reducing uncertainty and providing business value.
Don’t forget that not every measurement has to be expressed as a simple number (e.g. “137 occurrences” or “83.2% of the time”). You can measure things on an ordinal scale (e.g. “this is less than that” or “this gets 3 out of 4 stars”). You can use nominal measurements where you are only considering membership in a set (e.g. “this action is in category x, that action is in category y”). Yes/No questions are a valid kind of measurement. You should even consider using subjective methods of measurement (e.g. “do you feel this week was better than last week?”).
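To make those measurement types a little more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Severity` scale, the category names, and the sample observations are made up, not data from a real client.

```python
from collections import Counter
from enum import IntEnum


class Severity(IntEnum):
    """Ordinal scale (hypothetical): the order matters, the distances do not."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Made-up sample observations for one week.
incident_severities = [Severity.LOW, Severity.HIGH, Severity.MEDIUM, Severity.LOW]
change_categories = ["planned", "ad-hoc", "planned", "recovery", "planned"]
felt_better_than_last_week = [True, True, False, True]

# Ordinal: we can say which incident was worst without a precise number.
worst_incident = max(incident_severities)

# Nominal: we only count membership in each category.
changes_by_category = Counter(change_categories)

# Yes/No (and subjective): the share of "yes" answers is still a quantity.
share_yes = sum(felt_better_than_last_week) / len(felt_better_than_last_week)

print(worst_incident.name, dict(changes_by_category), share_yes)
```

None of these is precise in the engineering sense, but each one reduces uncertainty and can be expressed as a quantity, which is all the definition asks for.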
Also, don’t expect that every measurement will be made at the same time interval. Sometimes it makes sense to measure certain things on a daily basis. Sometimes it makes sense to measure other things on a quarterly basis.
No matter what type of measurement you end up employing, make sure that it is clear to everyone, even the casual observer, how and why you are measuring something. This is critical for gaining buy-in and avoiding gaming (which both seem like excellent topics for future posts in this series!).
Then use “The Four Buckets” as a guide to start looking for candidate KPIs
At the end of the KPI development process, you are going to be tracking a small set of KPIs that best measure the performance of your IT operations in its role supporting your business’s goals. But to get there, you need to start with a larger pool of candidate KPIs. In my experience, most useful measurements tend to fall into one or more of the following categories.
I call these “The Four Buckets”.
Again, keep in mind that at this stage you are looking to surface possible KPIs that will be fed into the rest of the process. The end result will only be a small subset of what you started with (5 – 10 at the most!).
1. Resource Utilization – How resources are allocated and how efficiently they are used. Usually we’re talking about people, but other kinds of resources can fall into this bucket as well.
- How much time do developers and administrators spend on build and deployment activity?
- How much productivity is lost to problems and bottlenecks? What is the ripple effect of that?
- What’s the ratio of ad-hoc change or service recovery activity to planned change?
- What’s the cost of moving a unit of change through your lifecycle?
- What’s the mean time to diagnose a service outage? Mean time to repair?
- What was the true cost of each build or deployment problem (resource and schedule impact)?
- What percentage of Development-driven changes require Operations to edit/change procedures or edit/change automation?
- How much management time is spent dealing with build and deployment problems or change management overhead?
- Can Development and QA successfully deploy their own environments? How long does it take per deployment?
- How much of your team’s time is spent recreating and maintaining software infrastructure that already exists elsewhere?
2. Failure Rates – Looking at how often processes, services, or hardware fail is a pretty obvious area of measurement.
- What was the ratio of successful builds to failed or problematic builds?
- What is the ratio of build problems due to poor code vs poor build configuration?
- What was the ratio of successful deployments to failed or problematic deployments?
- What is the ratio of deployment problems due to poor code vs poor deployment configuration or execution?
- What is the mean time between failures?
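As a rough illustration of how two of these questions could be answered from a simple deployment log, here is a minimal Python sketch. The log format, the function names, and the sample data are all hypothetical. Note also that the "time between failures" here is the gap between failed deployments, which is a coarse stand-in for a formal MTBF calculation.

```python
from datetime import datetime

# Hypothetical deployment log: (timestamp, succeeded). Sample data is made up.
deployments = [
    (datetime(2024, 1, 1, 9, 0), True),
    (datetime(2024, 1, 3, 9, 0), False),
    (datetime(2024, 1, 5, 9, 0), True),
    (datetime(2024, 1, 9, 9, 0), False),
    (datetime(2024, 1, 10, 9, 0), True),
]


def success_ratio(records):
    """Fraction of deployments that succeeded, or None if there are no records."""
    if not records:
        return None
    return sum(1 for _, ok in records if ok) / len(records)


def mean_hours_between_failures(records):
    """Mean gap in hours between consecutive failures.

    Returns None when there are fewer than two failures, because a gap
    cannot be computed from a single point.
    """
    failures = sorted(ts for ts, ok in records if not ok)
    if len(failures) < 2:
        return None
    gaps = [(later - earlier).total_seconds() / 3600
            for earlier, later in zip(failures, failures[1:])]
    return sum(gaps) / len(gaps)


print(success_ratio(deployments))                # 3 of 5 succeeded
print(mean_hours_between_failures(deployments))  # one gap of 6 days
```

Even a coarse calculation like this is often enough to start a conversation about whether failure rates belong in your candidate pool.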
3. Operations Throughput – The volume and rate at which change moves through your development to operations pipeline.
- How long does it take to get a release from development, through testing, and into production?
- How much of that is actual testing time, deployment time, handoff time, or waiting?
- How many releases can you successfully deploy per period?
- How many successful individual change requests can your operations team handle per period?
- Are any build and deployment activities the rate-limiting step of your application lifecycle? How does that limit impact your business?
- How many simultaneous changes can your team safely handle?
- What is the business’s perceived “wait time” from code completion to production deployment of a feature?
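The lead time questions above (how long a release takes, and how much of that is testing, deployment, handoff, or waiting) can be sketched in a few lines of Python. The stage names and hours below are made-up sample values, and `lead_time_breakdown` is a hypothetical helper, not part of any particular tool.

```python
# Hypothetical hours spent per stage for a single release (made-up sample data).
stage_hours = {
    "testing": 16,
    "deployment": 2,
    "handoff": 6,
    "waiting": 40,
}


def lead_time_breakdown(stages):
    """Return total lead time and each stage's share of it.

    Assumes `stages` is non-empty and the total is greater than zero.
    """
    total = sum(stages.values())
    shares = {name: hours / total for name, hours in stages.items()}
    return total, shares


total_hours, shares = lead_time_breakdown(stage_hours)
print(f"Total lead time: {total_hours} hours")
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"  {name}: {share:.1%}")
```

A breakdown like this tends to show quickly whether the time goes into real work or into waiting, which is usually the more interesting candidate KPI.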
4. Agility – This looks at how quickly and efficiently your IT operations can react to changes in the needs of your business. This can include change driven by internal or external business pressures. There is often considerable overlap with bucket 3. However, this bucket is focused more on changing/scaling processes than it is on the throughput of those processes once in place. (Of course, you can always argue that all four buckets play some role in enabling a business to be more “agile”.)
- How quickly can you scale up or scale down capacity to meet changing business demands?
- What’s the change management overhead associated with increasing/decreasing capacity? What’s the risk?
- How quickly could you adapt your build and deployment systems to automate any new applications or acquired business lines, and what would it cost?
- What would it cost you to handle an x% growth in the number of applications or business lines (direct resource assignment plus any attention drain from other staff)?
- Could your IT operations handle an x% growth in the number of applications or business lines? (i.e. could it even be done?)