
Archive for the ‘KPIs and Metrics’ Category

DevOps Days Mountain View 2011: DevOps Metrics & Measurement (Video)

Damon Edwards

Panel at DevOps Days Mountain View 2011 on DevOps Metrics and Measurement.

Alexis Lê-Quôc (Datadog)
Patrick Debois (Jedi)
Vladimir Vuksan (Broadcom)
Brian Doll (New Relic)
Laurie Denness (Etsy)
Andrew Shafer (Cloudscaling)

Moderator: John Allspaw (Etsy)

See all videos from DevOps Days Mountain View 2011

DevOps Days Mountain View:
http://devopsdays.org/events/2011-mountainview/

Special thanks to LinkedIn for hosting DevOps Days Mountain View 2011.

Also, thank you to the sponsors:
AppDynamics  DTO Solutions  Google  MaestroDev  New Relic  Nolio
O’Reilly Media  PagerDuty  Puppet Labs  Reactor8  Splunk  StreamStep
ThoughtWorks  Usenix

 

More Video with John Allspaw at Etsy: Dashboard tour & metrics discussion

Damon Edwards

Here’s another video I shot with John Allspaw at Etsy’s Brooklyn, NY offices…

00:00 – John gives a tour of the big dashboards

02:42 – Zoom in on key deployment dashboard (yes, that says 24.6 code deploys per day!)

04:52 – John jumps on the whiteboard to explain what goes into their metrics efforts 

See the previous video with John.

Video with John Allspaw at Etsy: What to put on the big dashboards


Damon Edwards

I had the pleasure of visiting the folks at Etsy recently. Etsy is well known in DevOps circles for their Continuous Deployment and Metrics efforts.

I’ll be posting more videos soon, but in the meantime here is a quick hit I recorded with John Allspaw while he gave me a tour of their Brooklyn HQ. I had just asked John about how they decide what to put on the big dashboards on the wall.

Here is what he told me…

The Five Whys of Cloud


John Willis

 

Editor’s note: This is the first post by our newest dev2ops.org contributor, John Willis.

 

In my “cloudy” travels from Canonical to Opscode to DTO Solutions, I often ask the companies I meet a seemingly simple question – “Why do you want to use the cloud?” – and I get an array of amazingly unsatisfying answers.

Some people cite the classic 8-weeks-to-8-minutes provisioning use case. Some people regurgitate the analyst answer of “for the ROI of (insert magic number here)”. Of course, there is always the CapEx vs. OpEx answer. Now I am no financial type, but I’ve been around enough to know that in many types of businesses, OpEx is more of a dirty word to the CFO than CapEx.

Every once in a while I get a more informed answer like “I want to go to the cloud for agility”. But the usual lack of depth behind the answer proves to be just as unsatisfying.

 

The 5 Whys Epiphany

The other day I was having a conversation with my good friend Michael Cote and it hit me why I never get the depth of answer that I am looking for: I have not been asking the right question(s).

Many of us in our industry have learned and adapted techniques and methodologies from Toyota’s lean manufacturing models. One of my favorites is applying the “Five Whys” technique to determine the root cause of a problem. So I started applying the 5 Whys during my informal surveys.

Here is an example of one of those conversations:

Question 1 – Why do you want the Cloud?

Answer: To decrease provisioning time.

Question 2 – Why?

Answer: So we can get servers provisioned in 8 minutes instead of 8 weeks.

Question 3 – Why?

Answer: So developers can get resources more quickly and get their jobs done faster.

Question 4 – Why?

Answer: So they can get features out to customers faster.

Question 5 – Why?

Answer: So we make more money faster.

Aha, so that is why they wanted their cloud: to make more money, faster. At this point, if I had more time (unfortunately I usually don’t), I would start another line of questions that begins with “Will the cloud do that for you?”

Any of you who have been working with the cloud in a business for more than a year already know the answer to that question – not out of the box it won’t. But dispelling the myth that the cloud alone solves all problems is not my point here.

My point is that the real answer to my initial “Why cloud?” question is the one that surfaces after the fifth “Why?”. The answer to “Why cloud?” should not focus on technical features; it should be based on specific business goals. Keep asking “Why?” until those true goals are clear to everyone. Don’t doubt the power of “Why?”.

 

Begin With the End in Mind

They now teach kids in elementary school the idea of beginning with the end in mind. It’s a lesson we can all learn from.

I like to use my cell-phones-in-Russia analogy. When Eastern Europe opened up to capitalism, they didn’t go out and order landline phones all over the place. They knew from recent history that it was much easier to create new phone networks via cell phones. They began with the end in mind.

All too often, smart people lose sight of this simple idea when it comes to the cloud. Getting a cloud is not the goal. The goal is to achieve a specific business objective (i.e., the root cause of why you want a cloud). Therefore, if your real goal is “We want to make more money faster,” then let’s map out the full path to get there.

I often describe the cloud as a big fence that comes just up to everyone’s eyebrows. It looks really cool, but you can’t quite see what’s over the fence. I urge people to use a step ladder to look over the fence and see the longer journey. Most people will see that what’s beyond the fence is not the end of the journey, but just the beginning.

Depending on what your business goal is, the end state will probably include a cloud. But in almost every case, the end state will require far more than just a cloud.

 

In Cloud We Trust

I often hear in my travels the remark “I thought the cloud did that.” It amazes me that an organization will first implement a cloud and then start asking questions like “where is the autoscaling part?” or “where are the automated load balancers?” or “where is the push-button application deployment?” I call this being blinded by the cloud pixie dust.

When the pixie dust settles and they examine what their cloud really delivers, they will typically find that they are still missing what my good friend Damon Edwards calls the “abilities”. Around your cloud, you are still going to need to create a strong operational infrastructure that delivers scalability, manageability, availability, reliability, flexibility, and – last but not least – agility.

In my forever quest to find the perfect cloud, I have yet to find one that comes with one big red “Abilities Button”. Sure the cloud will help with some of these — and some clouds provide more than others — but the cloud is just one tool that you will need.

Just like before the cloud, successful cloud-based operations include a handful of glue technologies and a whole lot of additional sweat equity.

When analyzing your requirements, don’t let your thinking stop at the word “cloud”. Focus on what that actually means. Focus on ideas like:

  • Utility based infrastructure
  • Self-service resource allocation
  • Self-managing infrastructure
  • Software development life cycle support
  • Behavior driven availability

The cloud is just one part of the equation. Clouds are useful tools, but not a magic bullet. My suggestion, therefore, is to always apply the Five Whys to make sure you know where you are going.

What is DevOps?


Damon Edwards

Update 1: Wikipedia now has a pretty good DevOps page

Update 2: Follow-up posts on the business problems that DevOps solves and the competitive business advantage that DevOps can provide.


 

If you are interested in IT management — and web operations in particular — you might have recently heard the term “DevOps” being tossed around. The #DevOps tag pops up regularly on Twitter. DevOps meetups and DevOpsDays conferences are gaining steam.

DevOps is, in many ways, an umbrella concept that refers to anything that smooths out the interaction between development and operations. However, the ideas behind DevOps run much deeper than that.

 

What is DevOps all about?

DevOps is a response to the growing awareness that there is a disconnect between what is traditionally considered development activity and what is traditionally considered operations activity. This disconnect often manifests itself as conflict and inefficiency.

As Lee Thompson and Andrew Shafer like to put it, there is a “Wall of Confusion” between development and operations. This “Wall” is caused by a combination of conflicting motivations, processes, and tooling.

 

Development-centric folks tend to come from a mindset where change is the thing that they are paid to accomplish. The business depends on them to respond to changing needs. Because of this relationship, they are often incentivized to create as much change as possible.

Operations folks tend to come from a mindset where change is the enemy.  The business depends on them to keep the lights on and deliver the services that make the business money today. Operations is motivated to resist change as it undermines stability and reliability. How many times have we heard the statistic that 80% of all downtime is due to those self-inflicted wounds known as changes?

Both development and operations fundamentally see the world, and their respective roles in it, differently. Each believes that they are doing the right thing for the business… and in isolation they are both correct!

To make matters worse, development and operations teams tend to fall into different parts of a company’s organizational structure (often with different managers and competing corporate politics) and often work at different geographic locations.

Adding to the Wall of Confusion is the all too common mismatch in development and operations tooling. Take a look at the popular tools that developers request and use on a daily basis. Then take a look at the popular tools that systems administrators request and use on a daily basis. With a few notable exceptions, like bug trackers and maybe SCM, it’s doubtful you’ll see much interest in using each other’s tools or significant integration between them. Even if there is some overlap in types of tools, often the implementations will be different in each group.

Nowhere is the Wall of Confusion more obvious than when it comes time for application changes to be pushed from development to operations. Some organizations will call it a “release,” some call it a “deployment,” but one thing they can all agree on is that trouble is likely to ensue. The following scenario is generalized, but if you’ve ever played a part in this process it should ring true.

Development kicks things off by “tossing” a software release “over the wall” to Operations. Operations picks up the release artifacts and begins preparing for their deployment. Operations manually hacks the deployment scripts provided by the developers or creates their own scripts. They also hand-edit configuration files to reflect the production environment, which is significantly different from the Development or QA environments. At best, they are duplicating work that was already done in previous environments; at worst, they are about to introduce or uncover new bugs.

Operations then embarks on what they understand to be the currently correct deployment process, which at this point is essentially being performed for the first time due to the script, configuration, process, and environment differences between Development and Operations. Of course, somewhere along the way a problem occurs and the developers are called in to help troubleshoot. Operations claims that Development gave them faulty artifacts. Developers respond by pointing out that it worked just fine in their environments, so it must be the case that Operations did something wrong. Developers have a difficult time even diagnosing the problem because the configuration, file locations, and procedures used to get into this state are different from what they expect (if security policies even allow them to access the production servers!).

Time is running out on the change window and, of course, there isn’t a reliable way to roll the environment back to a previously known good state. So what should have been an uneventful deployment ends up being an all-hands-on-deck fire drill in which a lot of trial and error finally hacks the production environment into a usable state.

While deployment is the most obvious pain point, it is only one part of the need for DevOps. As John Allspaw points out, the need for cooperation between development and operations starts well before and continues long after deployment.

 

What’s the benefit of DevOps?

DevOps is a powerful idea because it resonates on so many different levels.

From the perspective of individuals toiling in hands-on development or operational roles, DevOps points towards a life that is free from the source of so many of their hassles. It’s by no means a magical panacea, but if you can make DevOps work you are removing barriers that are both a significant time-sink and a source of morale killing frustration. It’s a simple calculation to make: invest in making DevOps a reality and we all should be more efficient, increasingly nimble, and less frustrated. Some may argue that DevOps is a lofty or even farfetched goal, but it’s difficult to argue that you shouldn’t try.

 

For the business, DevOps contributes directly to enabling two powerful and strategic business qualities, “business agility” and “IT alignment”. These may not be terms that the troops in the IT trenches worry about on a daily basis, but they should definitely get the attention of the executives who approve the budgets and sign the checks.

A simple definition of IT alignment is “a desired state in which a business organization is able to use information technology (IT) effectively to achieve business objectives — typically improved financial performance or marketplace competitiveness” [source].

DevOps helps to enable IT alignment by aligning development and operations roles and processes in the context of shared business objectives. Both development and operations need to understand that they are part of a unified business process. DevOps thinking ensures that individual decisions and actions strive to support and improve that unified business process, regardless of organizational structure.

A simple definition of agility in a business context is the “ability of an organization to rapidly adapt to market and environmental changes in productive and cost-effective ways” [source].

Of course, developers also have their own specialized meaning of the word “agile“, but the goals are very similar. Agile development methodologies are designed to keep software development efforts aligned with customer/company goals and produce high quality software despite changing requirements. For most organizations, Scrum, the iterative project management methodology, is the face of Agile.

Agile promises close interaction and fast feedback between the business stakeholders making the decisions and the developers acting on those decisions. If you look at the output of a well-functioning Agile development group, you should see a steady stream of improvement that is in tune with business needs.

However, when you step back and look at the entire development-to-operations lifecycle from an enterprise point of view, that Agile stream and its associated benefits are often obscured. The Wall of Confusion leads to a dissociation of the application lifecycle. Development works at one pace and Operations works at another. The long intervals between production deployments, in effect, turn the Agile efforts of an organization right back into the waterfall lifecycle it was trying to avoid. No matter how Agile the development organization is, it’s exceedingly difficult to change the slow and lumbering nature of a business while the Wall of Confusion is in place. Andrew Rendell has a great post that tells the anecdotal story of how an organization’s cumbersome release processes turn their agile development efforts right back into a waterfall.

DevOps enables the benefits of Agile development to be felt at the organizational level. DevOps does this by allowing for fast and responsive, yet stable, operations that can be kept in sync with the pace of innovation coming out of the development process.

If you are seeking to establish a DevOps project within your organization, be sure to keep the terms “IT alignment” and “business agility” in mind.

 

How do we bring DevOps to life?

Like most emerging topics, it’s easier to find a consensus about the problem than it is about the solution.

If you listen to the current DevOps conversations, there appear to be three areas of focus for DevOps-related solutions:

1. Measurement and incentives to change culture – Changing culture and reward systems is never easy. However, if you don’t change your organization’s culture, fulfilling the promise of DevOps will be difficult, if not impossible.  When looking to influence culture in a business organization, you need to pay close attention to how you measure and judge performance. What you measure influences and incentivizes behavior. All parties across the development-to-operations lifecycle need to understand their stake in the larger business process of which they are a part. The success of both individuals and groups needs to be measured within the context of the success of the entire development-to-operations lifecycle. For many organizations this is a shift from more of a siloed approach to performance measurement, where each group measures and judges performance based on what matters to that specific group. This previous post I wrote dives deeper into the process for getting the correct end-to-end view of measurement into place.

2. Unified processes – The important theme of DevOps is that the entire development-to-operations lifecycle must be viewed as one end-to-end process. Individual methodologies can be followed for individual segments of that process (such as Agile on one end and Visible Ops on the other), so long as those processes can be plugged together to form a unified process (and, in turn, be managed from that unified point of view). Much like the question of measurement and incentives, each organization will have slightly different requirements for achieving that unified process. Here is an excellent post by Six Sigma Black Belt Ray Riescher on his experience bridging Scrum and ITIL.

3. Unified tooling –  This is the area in which most of the DevOps discussion has been focused. This isn’t surprising since it seems to be the natural reflex of technologists, for better or for worse, to jump straight into tooling discussions when looking to solve a problem. If you follow the communities of tools like Puppet, Chef, or ControlTier then you are probably already aware of the significant focus on bridging development and operations tooling. “Infrastructure as code”, “model driven automation”, and “continuous deployment” are all concepts that would fall under the DevOps banner. Alex Honor wrote a good post about some of the design patterns that toolsmiths working on DevOps tools need to worry about.

Jake Sorofman does a great job with the following overview of what types of tooling are required to make DevOps a reality:

A version-controlled software library—which ensures all system artifacts are well defined, consistently shared, and up to date across the release lifecycle. Development and QA organizations draw from the same platform version, and production groups deploy the exact same version that has been certified by QA.

Deeply modeled systems—where a versioned system manifest describes all of the components, policies and dependencies related to a software system, making it simple to reproduce a system on demand or to introduce change without conflicts.

Automation of manual tasks—taking the manual effort out of processes like dependency discovery and resolution, system construction, provisioning, update and rollback. Automation—not hordes of people—becomes the basis for command and control of high-velocity, conflict-free and massive-scale system administration.

It’s essential that all individual tools be considered part of a larger toolchain that spans the entire Development to Operations lifecycle (even if tight technical integration isn’t an option). Tool choice and implementation decisions (on both the toolchain and individual tool levels) need to be made in the context of their impact on that end-to-end lifecycle. If you are wondering how that is done, take a look at this example of an open source fully automated provisioning toolchain that can be plugged into a larger Development to Operations toolchain.
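To make the “deeply modeled systems” and “infrastructure as code” ideas a bit more concrete, here is a minimal sketch in Python. It is purely illustrative: the manifest contents, the sample host state, and the plan_changes helper are all hypothetical and are not the format of any particular tool. The point is only that the desired state of a system can live in version control as plain data, and automation can compute and apply the difference between that description and what a host is actually running.

```python
# Hypothetical illustration only: a "versioned system manifest" reduced to a
# plain data structure, plus a diff that an automation tool could act on.
# Real tools (Puppet, Chef, etc.) have richer models and their own languages.

from typing import Dict, List

# Desired state: what version 42 of the system should look like.
manifest_v42: Dict[str, str] = {
    "webapp": "1.8.3",
    "nginx": "1.0.4",
    "jdk": "1.6.0_26",
}

# Observed state: what is actually running on a given host.
observed: Dict[str, str] = {
    "webapp": "1.7.9",   # out of date
    "nginx": "1.0.4",    # matches the manifest
    "apache": "2.2.19",  # not in the manifest at all
}

def plan_changes(desired: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Return the actions needed to converge a host onto the manifest."""
    actions = []
    for name, version in desired.items():
        if name not in actual:
            actions.append(f"install {name} {version}")
        elif actual[name] != version:
            actions.append(f"upgrade {name} {actual[name]} -> {version}")
    for name in actual:
        if name not in desired:
            actions.append(f"remove {name}")
    return actions

if __name__ == "__main__":
    for action in plan_changes(manifest_v42, observed):
        print(action)
    # upgrade webapp 1.7.9 -> 1.8.3
    # install jdk 1.6.0_26
    # remove apache
```

Tools like Puppet and Chef implement this convergence model far more completely, but even this toy version shows why a versioned manifest makes it simple to reproduce a system on demand or roll back to a known good description.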

 

What DevOps is not!

At the recent OpsCamp Austin, Adam Jacob from OpsCode/Chef railed against the idea that some system administrators were now seeking to change their job title to “DevOps”. I have to admit that, at the time, I was a bit skeptical that this was actually happening. However, I have since witnessed people on multiple occasions expressing this desire to rewrite job titles or establish DevOps as some sort of new role to be filled.

For example, Stephen Nelson-Smith wrote an excellent post about DevOps. While I agree with almost everything he said, I have to strongly disagree with the idea that DevOps should be a unique position or job title.

Turning “DevOps” into a new job title or special role sets a dangerous precedent. This makes DevOps someone else’s problem. You’re a DBA? Don’t worry about DevOps, that’s the DevOps team’s problem. You’re a security expert? Don’t worry about DevOps, that’s the DevOps team’s problem.

Think of it this way. You wouldn’t say “I need to hire an Agile” or “I need to hire a Scrum” or “I need to hire an ITIL” would you? No, you would just say I need to hire developers, project managers, testers, or systems administrators who understand these concepts and methodologies. DevOps is no different.

 

Why the name “DevOps”?

Probably because it’s catchy. It’s also a good mental image of the concept at the widest scale — when you bring Dev and Ops together you get DevOps. There have been other terms for this idea, such as Agile Operations, Agile Infrastructure, and Dev2Ops (a term we’ve been using on this blog since 2007). There are also plenty of examples of people arriving at the idea of DevOps on their own, without calling it “DevOps”. For an excellent example of this, read this recent post by Ernest Mueller or watch John Allspaw and Paul Hammond’s seminal presentation “10+ Deploys Per Day: Dev and Ops Cooperation at Flickr” from Velocity 2009.

For better or for worse, DevOps seems to be the name that is catching people’s imaginations. I credit the efforts of Patrick Debois for championing the term “DevOps”, bringing the first DevOps Days conference to a (successful) reality, and maintaining the devops.info site.

Be sure to join in the DevOps conversation at the upcoming DevOps Day USA conference on June 25, 2010 in Mountain View, CA. It’s the day after O’Reilly’s Velocity 2010 conference, so be sure to hit both!

 

How to measure the impact of IT operations on your business (Part 2)


Damon Edwards

Part 1: Putting a metrics/KPI program into place in 6 steps 

Part 2: Identifying candidate KPIs to evaluate

In my first post in this series, I went through the six steps for putting into place a metrics/KPI program that measures the performance of your IT operations within the context of your business goals.

When I’m consulting, this is usually the point where I stress that we have to work the process in order to come up with KPIs that mean something to your specific business. I explain that there is no such thing as one size fits all in this matter. Despite that, the very next question I’m usually asked is: “Can you tell us now what KPIs a company like ours should be measuring?”

Just providing a list of examples would probably send them off on the wrong course, chasing KPIs that were important to someone else’s business. Since figuring out what to measure can be as valuable as the actual measurement, I instead walk them through the following concepts to get them started on step #2 and step #3 of the process.

 

First, stop and consider what “measurement” really means

Measurement: a set of observations that reduce uncertainty where the result is expressed as a quantity

 

I lifted the above definition from measurement guru Douglas W. Hubbard. However, if you noodle around in the academic writing on this topic, you’ll see that it’s a fairly accepted definition.

When looking for a way to measure something, keep this definition in mind. Whether it’s problem solving, allocating budget, or prioritizing your resources, reducing uncertainty gives you a decisive and valuable advantage. You don’t need absolute precision. A coarse swing at something is often enough to start reducing uncertainty and providing business value.

Don’t forget to consider that not every measurement has to be expressed as a simple number (e.g. “137 occurrences” or “83.2% of the time”). You can measure things on an ordinal scale (e.g. “this is less than that” or “this gets 3 out of 4 stars”). You can use nominal measurements where you are only considering membership in a set (e.g. “this action is in category x, that action is in category y”). Yes/No questions are a valid kind of measurement. You should even consider using subjective methods of measurement (e.g. “do you feel this week was better than last week?”).
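As a quick illustration of those different scales, here is a tiny, entirely made-up example in Python; the field names and values are invented, not a recommendation for any particular tooling. The same weekly review of deployment activity can be captured as counts, percentages, ordinal ratings, categories, yes/no answers, and subjective sentiment at the same time.

```python
# Illustrative only: the same "how did this week's deployments go?" question
# captured at several different measurement scales. All values are made up.

weekly_review = {
    # Quantitative: a count and a percentage
    "deploys_attempted": 37,
    "deploys_clean_pct": 83.2,
    # Ordinal: a ranking, not a precise quantity
    "deploy_experience_stars": 3,          # out of 4
    # Nominal: membership in a category
    "worst_incident_category": "config",   # e.g. "code", "config", "infra"
    # Yes/No: still a legitimate measurement
    "rollback_needed": False,
    # Subjective: team sentiment, compared week over week
    "better_than_last_week": True,
}

for name, value in weekly_review.items():
    print(f"{name}: {value}")
```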

Also, don’t expect that every measurement will be made at the same time interval. Sometimes it makes sense to measure certain things on a daily basis. Sometimes it makes sense to measure other things on a quarterly basis.

No matter what type of measurement you end up employing, make sure that it is clear to everyone — even the casual observer — how and why you are measuring something. This is critical for gaining buy-in and avoiding gaming (both of which seem like excellent topics for future posts in this series!).

 

Then use “The Four Buckets” as a guide to start looking for candidate KPIs

At the end of the KPI development process, you are going to be tracking a small set of KPIs that best measure the performance of your IT operations in its role supporting your business’s goals. But to get there, you need to start with a larger pool of candidate KPIs. In my experience, most useful measurements tend to fall into one or more of the following categories.

I call these “The Four Buckets”.

Again, keep in mind that at this stage you are looking to surface possible KPIs that will be fed into the rest of the process. The end result will only be a small subset of what you started with (5 – 10 at the most!)

 

1. Resource Utilization – How resources are allocated and how efficiently they are used. Usually we’re talking about people, but other kinds of resources can fall into this bucket as well.

  • How much time do developers and administrators spend on build and deployment activity?
  • How much productivity is lost to problems and bottlenecks? What is the ripple effect of that?
  • What’s the ratio of ad-hoc change or service recovery activity to planned change?
  • What’s the cost of moving a unit of change through your lifecycle?
  • What’s the mean time to diagnose a service outage? Mean time to repair? (A rough sketch of computing these follows this list.)
  • What was the true cost of each build or deployment problem (resource and schedule impact)?
  • What percentage of Development driven changes require Operations to edit/change procedures or edit/change automation?
  • How much management time is spent dealing with build and deployment problems or change management overhead?
  • Can Development and QA successfully deploy their own environments? How long does it take per deployment?
  • How much of your team’s time is spent recreating and maintaining software infrastructure that already exists elsewhere?
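As an example of how a couple of the questions above could be answered once the raw data is being captured, here is a rough sketch of computing mean time to diagnose and mean time to repair from incident timestamps. Everything in it is hypothetical (the record layout and the sample data); in practice the timestamps would come from your ticketing or monitoring systems, and even this coarse calculation is enough to start reducing uncertainty.

```python
# Hypothetical sketch: mean time to diagnose (MTTD) and mean time to repair
# (MTTR) from a small list of incident records. Field layout and data are
# invented for illustration only.

from datetime import datetime
from statistics import mean

incidents = [
    # (detected, diagnosed, restored)
    ("2011-06-01 09:14", "2011-06-01 09:40", "2011-06-01 10:05"),
    ("2011-06-08 22:03", "2011-06-08 23:10", "2011-06-09 00:02"),
    ("2011-06-15 14:20", "2011-06-15 14:31", "2011-06-15 14:55"),
]

def minutes_between(start: str, end: str) -> float:
    """Elapsed minutes between two 'YYYY-MM-DD HH:MM' timestamps."""
    fmt = "%Y-%m-%d %H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60

mttd = mean(minutes_between(detected, diagnosed) for detected, diagnosed, _ in incidents)
mttr = mean(minutes_between(detected, restored) for detected, _, restored in incidents)

print(f"Mean time to diagnose: {mttd:.0f} minutes")
print(f"Mean time to repair:   {mttr:.0f} minutes")
```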

 

2. Failure Rates – Looking at how often processes, services, or hardware fail is a pretty obvious area of measurement. (A small worked example of these ratios follows the list below.)

  • What was the ratio of successful builds to failed or problematic builds?
  • What is the ratio of build problems due to poor code vs poor build configuration?
  • What was the ratio of successful deployments to failed or problematic deployments?
  • What is the ratio of deployment problems due to poor code vs poor deployment configuration or execution?
  • What is the mean time between failures?
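A worked example of the ratio questions above, again with invented data: a simple log of deployment outcomes is enough to compute the success rate and to split failures into “poor code” versus “poor configuration” causes.

```python
# Illustrative only: deployment success/failure ratios from a made-up log of
# deployment outcomes. The outcome labels and data are hypothetical.

from collections import Counter

deploy_outcomes = [
    "success", "success", "failed:code", "success", "failed:config",
    "success", "success", "success", "failed:config", "success",
]

# Overall success vs. failure, and failures broken down by cause
counts = Counter(outcome.split(":")[0] for outcome in deploy_outcomes)
causes = Counter(outcome.split(":")[1] for outcome in deploy_outcomes if ":" in outcome)

total = len(deploy_outcomes)
print(f"Successful deployments: {counts['success']}/{total} "
      f"({100 * counts['success'] / total:.0f}%)")
print(f"Failures by cause: {dict(causes)}")
# e.g. Successful deployments: 7/10 (70%)
#      Failures by cause: {'code': 1, 'config': 2}
```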

 

3. Operations Throughput – The volume and rate at which change moves through your development to operations pipeline.

  • How long does it take to get a release from development, through testing, and into production?
  • How much of that is actual testing time, deployment time, handoff time, or waiting? (The sketch after this list illustrates this breakdown.)
  • How many releases can you successfully deploy per period?
  • How many successful individual change requests can your operations team handle per period?
  • Are any build and deployment activities the rate limiting step of your application lifecycle? How does that limit impact your business?
  • How many simultaneous changes can your team safely handle?
  • What is the business’s perceived “wait time” from code completion to production deployment of a feature?
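Here is a small, hypothetical sketch of the lead-time decomposition the first two questions above describe. The stage names and durations are made up; a real version would pull them from your issue tracker, build system, and change calendar, but the arithmetic is the same.

```python
# Hypothetical sketch: decomposing the total "code complete to production"
# lead time for a release into its stages. Stage names and durations (hours)
# are invented for illustration.

release_stages = [
    ("waiting for test environment", 36),
    ("testing", 20),
    ("waiting in change queue", 72),
    ("deployment", 3),
]

total = sum(hours for _, hours in release_stages)
print(f"Total lead time: {total} hours")
for stage, hours in release_stages:
    print(f"  {stage:<30} {hours:>4} h  ({100 * hours / total:.0f}%)")
# In this made-up example, waiting dominates, which is exactly what the
# questions above are probing.
```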

 

4. Agility – This looks at how quickly and efficiently your IT operations can react to changes in the needs of your business. This can include change driven by internal or external business pressures. There is often considerable overlap with bucket 3; however, this bucket is focused more on changing or scaling processes than on the throughput of those processes once they are in place. (Of course, you can always argue that all four buckets play some role in enabling a business to be more “agile”.)

  • How quickly can you scale up or scale down capacity to meet changing business demands?
  • What’s the change management overhead associated with increasing or decreasing capacity? What’s the risk?
  • How quickly, and at what cost, could you adapt your build and deployment systems to automate any new applications or acquired business lines?
  • What would it cost you to handle an x% growth in the number of applications or business lines (direct resource assignment plus any attention drain from other staff)?
  • Could your IT operations handle an x% growth in the number of applications or business lines? (i.e., could it even be done?)

 

 
