
Videos: Luke Kanies, Bill Karpovich, Ernest Mueller at OpsCamp Austin 2010


Damon Edwards

Here’s the first round of “3 Questions” style interviews I filmed at OpsCamp Austin 2010.

Luke Kanies (Reductive Labs / Puppet), Bill Karpovich (Zenoss), and Ernest Mueller (National Instruments) were asked:

1. What brought you to OpsCamp?
2. What excites you in 2010?
3. Wildcard question!…

 

Luke Kanies is the CEO of Reductive Labs and the original author of Puppet.
Wildcard question: How did you enable the Puppet Community’s early success?

 

 

Bill Karpovich is the CEO & Co-Founder of Zenoss.
Wildcard question: Where is the innovation happening in monitoring?

 

 

Ernest Mueller is a Web Systems Architect at National Instruments.
Wildcard question: What advice do you have for open source toolmakers?

 

Scheduled Downtime vs. Seamless Conversion



Lee Thompson

I fly a lot.  I did before I was a consultant, but as a consultant my dependency on air service is even greater.  American Airlines effectively fired me as a customer when they dropped the proverbial “nerd bird” last year, the Austin to San Jose non-stop flight.  On the last few nerd bird flights there was lots of on-board discussion of Central Texas to Bay Area route options after the sudden termination notice we were given by American.  Most of the folks I spoke with had 1 to 2.5 million miles flown on AA.  That is a huge amount of butt time in 500+ mph chairs!

When I’m on planes I frequently work.  But many times I schedule some personal downtime, have a gin-n-tonic, and watch a DVD.  Everyone needs scheduled downtime.  Most of you dev2ops readers are probably burning some of your personal scheduled downtime right now reading this post.  I assume dev2ops readers also don’t get much scheduled downtime for the online business infrastructure they support (if ever!).  So I was blown away when I got this email from the new transportation services vendor I hired…

First off, let me say jetBlue has done a great service picking up poorly served travel customers like the recently terminated nerd bird flock, and their prices are just so reasonable.  The in-flight TV is nice, so you can do things like get the latest news from Haiti. jetBlue is donating their services to get personnel and supplies in, which is just fantastic to see.

But now the eBusiness technologist part of my brain kicks in and says “WHAT!”  An airline business intentionally running without a ticketing system for two days.  Scheduled downtime?  Are you crazy?  I can only imagine what kind of bind jetBlue is in with their technical infrastructure.  Surely the CEO signed off on such a project plan, so the issue must be nasty.  I’m left with so many questions.  Why can’t you phase in the new ticketing system by route and gradually retire the older system?  Why can’t you do what Facebook describes as “Dark Launches”?  Why can’t the new system come up 5 minutes after the old system is powered off?

I’m asking the wrong questions.

You have to take your technologist hat off, put your business hat on, and ask different questions.  What is the cost difference between a seamless conversion and a scheduled outage conversion?  Would jetBlue have to raise airfares or ask the shareholders to absorb losses?  Does the complexity of the requirements for a seamless conversion push the conversion out an extra year, hindering the business?  Does the added complexity of a seamless conversion add tremendous risk to business operations?  Having done numerous transaction system conversions in my career, I can easily say a seamless conversion is probably 4x the size of a scheduled outage conversion (or more).  Minimizing complexity reduces risk by even more than that 4x rate, because the relationship between complexity and risk is non-linear!  Case in point: if the business signed up for the added expense and schedule delay, what business impact would occur if the technology effort failed?  I would imagine, given the above email, the risk was too great.
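
As a back-of-the-envelope illustration of that trade-off, here is a quick sketch. Every number below is invented purely for the example (only the “4x or more” ratio comes from the paragraph above), so treat it as a way to frame the question rather than jetBlue’s actual math:

```python
# Illustrative numbers only -- invented for this sketch, not jetBlue's figures.
outage_cost_per_day = 2_000_000       # assumed revenue/goodwill hit per day without ticketing
outage_days = 2
scheduled_outage_project = 1_500_000  # assumed engineering cost of the simpler conversion

seamless_multiplier = 4               # the "4x (or more)" rule of thumb above
seamless_project = scheduled_outage_project * seamless_multiplier

# Risk grows faster than complexity (non-linear), so square the multiplier
# as a crude stand-in for the bigger project's extra chance of failing outright.
relative_failure_risk = seamless_multiplier ** 2

scheduled_total = scheduled_outage_project + outage_cost_per_day * outage_days
print(f"Scheduled-outage conversion: ~${scheduled_total:,}")
print(f"Seamless conversion:         ~${seamless_project:,} "
      f"plus roughly {relative_failure_risk}x the failure risk")
```

Even with generous assumptions about the cost of the outage itself, the simpler conversion can come out ahead once the risk of the larger project failing is priced in.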

I’m about to head to SFO and jump on a jetBlue flight that, by that time, won’t have a ticketing system.  My travel schedule obviously makes me personally interested in jetBlue’s success in this conversion!

BTW – I chose United non-stop to San Francisco and jetBlue non-stop on the way back.  Good news…  Alaska Airlines service to San Jose picks up in mid-March!

Please welcome dev2ops’s newest blogger…. Lee Thompson


Damon Edwards

Lee Thompson is joining us as a dev2ops contributor.

Lee is currently a consultant specializing in development and operations practices for large-scale and mission-critical web applications. His current clients include household names in the banking, social networking, and e-commerce fields.

Previously, Lee was the Chief Technologist of E*TRADE Financial. To learn more about Lee, you might want to check out this interview we did a few months back.

Lee has seen the world from the Dev side, the Ops side, and everything in between. Alex and I are pleased to welcome him to the dev2ops community and look forward to his contributions.

 

How to measure the impact of IT operations on your business (Part 2)


Damon Edwards

Part 1: Putting a metrics/KPI program into place in 6 steps 

Part 2: Identifying candidate KPIs to evaluate

In my first post in this series, I went through the six steps for putting into place a metrics/KPI program that measures the performance of your IT operations within the context of your business goals.

When consulting, this is usually the point where I stress that we have to work the process in order to come up with KPIs that mean something to your specific business. I explain that there is no such thing as one size fits all in this matter. Despite that, the very next question I’m usually asked is: “Can you tell us now what KPIs a company like ours should be measuring?”

Just providing a list of examples would probably send them off on the wrong course, chasing KPIs that were important to someone else’s business. Since figuring out what to measure can be as valuable as the actual measurement, I instead walk them through the following concepts to get them started on step #2 and step #3 of the process.

 

First, stop and consider what “measurement” really means

Measurement: a set of observations that reduce uncertainty where the result is expressed as a quantity

 

I lifted the above definition from measurement guru Douglas W. Hubbard. However, if you noodle around in the academic writings on this topic, you’ll see that it’s a fairly accepted definition.

When looking for a way to measure something, keep this definition in mind. Whether it’s problem solving, allocating budget, or prioritizing your resources, reducing uncertainty gives you a decisive and valuable advantage. You don’t need absolute precision. A coarse swing at something is often enough to get started reducing uncertainty and providing business value.

Don’t forget to consider that not every measurement has to be expressed as a simple number (e.g. “137 occurrences” or “83.2% of the time”). You can measure things on an ordinal scale (e.g. “this is less than that” or “this gets 3 out of 4 stars”). You can use nominal measurements where you are only considering membership in a set (e.g. “this action is in category x, that action is in category y”). Yes/No questions are a valid kind of measurement. You should even consider using subjective methods of measurement (e.g. “do you feel this week was better than last week?”).

Also, don’t expect that every measurement will be made at the same time interval. Sometimes it makes sense to measure certain things on a daily basis. Sometimes it makes sense to measure other things on a quarterly basis.
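
To make those distinctions concrete, here is a minimal sketch (the names and values are hypothetical, and Python is used only for illustration) of recording each observation along with the scale and interval it uses:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any

class Scale(Enum):
    RATIO = "ratio"            # simple numbers: "137 occurrences", "83.2% of the time"
    ORDINAL = "ordinal"        # ranked: "3 out of 4 stars", "this is less than that"
    NOMINAL = "nominal"        # set membership: "this action is in category x"
    BOOLEAN = "boolean"        # yes/no questions
    SUBJECTIVE = "subjective"  # "do you feel this week was better than last week?"

@dataclass
class Observation:
    name: str
    value: Any
    scale: Scale
    interval: str              # not every measurement runs on the same clock

observations = [
    Observation("failed deployments", 3, Scale.RATIO, "weekly"),
    Observation("release confidence", "3 of 4 stars", Scale.ORDINAL, "per release"),
    Observation("change had a rollback plan", True, Scale.BOOLEAN, "per change"),
    Observation("was this week better than last?", "yes", Scale.SUBJECTIVE, "weekly"),
]
```

The point isn’t the data structure itself; it’s that a measurement only has to reduce uncertainty, not be a precise number on a daily dashboard.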

No matter what type of measurement you end up employing, make sure that it is clear to everyone — even the casual observer — how and why you are measuring something. This is critical for gaining buy-in and avoiding gaming (which both seem like excellent topics for future posts in this series!).

 

Then use “The Four Buckets” as a guide to start looking for candidate KPIs

At the end of the KPI development process, you are going to be tracking a small set of KPIs that best measure the performance of your IT operations in its role supporting your business’s goals. But to get there, you need to start with a larger pool of candidate KPIs. In my experience, most useful measurements tend to fall into one or more of the following categories.

I call these “The Four Buckets”.

Again, keep in mind that at this stage you are looking to surface possible KPIs that will be fed into the rest of the process. The end result will only be a small subset of what you started with (5 – 10 at the most!).

 

1. Resource Utilization – How resources are allocated and how efficiently they are used. Usually we’re talking about people, but other kinds of resources can fall into this bucket as well.

  • How much time do developers and administrators spend on build and deployment activity?
  • How much productivity is lost to problems and bottlenecks? What is the ripple effect of that?
  • What’s the ratio of ad-hoc change or service recovery activity to planned change?
  • What’s the cost of moving a unit of change through your lifecycle?
  • What’s the mean time to diagnose a service outage? Mean time to repair? (One way to compute both is sketched after this list.)
  • What was the true cost of each build or deployment problem (resource and schedule impact)?
  • What percentage of Development driven changes require Operations to edit/change procedures or edit/change automation?
  • How much management time is spent dealing with build and deployment problems or change management overhead?
  • Can Development and QA successfully deploy their own environments? How long does it take per deployment?
  • How much of your team’s time is spent recreating and maintaining software infrastructure that already exists elsewhere?
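
For the mean-time questions above, a minimal sketch of the calculation might look like this, assuming you keep timestamped incident records (the incidents below are made up):

```python
from datetime import datetime

# Hypothetical incident records: (detected, diagnosed, service restored).
incidents = [
    (datetime(2010, 1, 4, 9, 10), datetime(2010, 1, 4, 9, 55), datetime(2010, 1, 4, 11, 30)),
    (datetime(2010, 1, 12, 22, 5), datetime(2010, 1, 12, 23, 40), datetime(2010, 1, 13, 0, 15)),
]

def mean_hours(deltas):
    return sum(d.total_seconds() for d in deltas) / len(deltas) / 3600.0

mttd = mean_hours([diagnosed - detected for detected, diagnosed, _ in incidents])
mttr = mean_hours([restored - detected for detected, _, restored in incidents])
print(f"Mean time to diagnose: {mttd:.1f}h, mean time to repair: {mttr:.1f}h")
```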

 

2. Failure Rates – Looking at how often processes, services, or hardware fail is a pretty obvious area of measurement.

  • What was the ratio of successful builds to failed or problematic builds?
  • What is the ratio of build problems due to poor code vs poor build configuration?
  • What was the ratio of successful deployments to failed or problematic deployments? (A sketch of these ratios follows this list.)
  • What is the ratio of deployment problems due to poor code vs poor deployment configuration or execution?
  • What is the mean time between failures?
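
A minimal sketch of the deployment ratios above, assuming you log each deployment and its failure cause (the log entries are invented):

```python
# Hypothetical deployment log: (succeeded, cause of failure or None).
deployments = [
    (True, None), (False, "configuration"), (True, None),
    (False, "code"), (True, None), (True, None),
]

total = len(deployments)
failures = [cause for ok, cause in deployments if not ok]
print(f"Successful vs. failed deployments: {total - len(failures)}:{len(failures)} "
      f"({len(failures) / total:.0%} failure rate)")
print(f"Failures due to code: {failures.count('code')}, "
      f"due to configuration or execution: {failures.count('configuration')}")
```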

 

3. Operations Throughput – The volume and rate at which change moves through your development to operations pipeline.

  • How long does it take to get a release from development, through testing, and into production? (A sketch of this breakdown follows this list.)
  • How much of that is actual testing time, deployment time, handoff time, or waiting?
  • How many releases can you successfully deploy per period?
  • How many successful individual change requests can your operations team handle per period?
  • Are any build and deployment activities the rate limiting step of your application lifecycle? How does that limit impact your business?
  • How many simultaneous changes can your team safely handle?
  • What is the business’s perceived “wait time” from code completion to production deployment of a feature?
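
For the first two questions in this bucket, a sketch that derives lead time and its components from pipeline timestamps (the timestamps are hypothetical):

```python
from datetime import datetime

# Hypothetical timestamps for one release moving through the pipeline.
release = {
    "code_complete": datetime(2010, 2, 1, 17, 0),
    "qa_start":      datetime(2010, 2, 3, 9, 0),
    "qa_done":       datetime(2010, 2, 8, 12, 0),
    "deployed":      datetime(2010, 2, 10, 6, 0),
}

lead_time = release["deployed"] - release["code_complete"]  # the business's perceived "wait time"
waiting   = release["qa_start"] - release["code_complete"]  # handoff/queue time before testing
testing   = release["qa_done"] - release["qa_start"]
deploying = release["deployed"] - release["qa_done"]

print(f"Lead time: {lead_time}, waiting: {waiting}, testing: {testing}, deploying: {deploying}")
```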

 

4. Agility – This looks at how quickly and efficiently your IT operations can react to changes in the needs of your business. This can include change driven by internal or external business pressures. There is often considerable overlap with bucket 3; however, this bucket is focused more on changing/scaling processes than on the throughput of those processes once in place. (Of course, you can always argue that all four buckets play some role in enabling a business to be more “agile”.)

  • How quickly can you scale up or scale down capacity to meet changing business demands?
  • What’s the change management overhead associated with increasing/decreasing capacity? What’s the risk?
  • How quickly and what would it cost to adapt your build and deployment systems to automate any new applications or acquired business lines?
  • What would it cost you to handle an x% growth in the number of applications or business lines (direct resource assignment plus any attention drain from other staff)?
  • Could your IT operations handle an x% growth in the number of applications or business lines? (i.e. could it even be done?)

 

 

How to measure the impact of IT operations on your business (Part 1)

Damon Edwards

Part 1: Putting a metrics/KPI program into place in 6 steps

Walk into any web-based business and more likely than not you’ll find all sorts of metrics that have been collected with varying levels of accuracy, consistency, and freshness. However, despite what appears to be a wealth of data, no one will seem to be all that happy with it.

The frontline guys are grumbling about having to deal with the collection and reporting overhead while questioning under their breath, “What’s in it for me?”. The technology managers are still making decisions based on anecdotal evidence, gut feelings, or knee-jerk reactions to the latest incidents. And to top it all off, the business managers are clamoring for new dashboards because the last ones didn’t tell them much that was actually useful for managing the business.

In short, there is endless data floating around, but no one appears to have the knowledge they want.

 

 

Invariably another “metrics project” is going to come down the project pipeline. But why would this effort turn out any different than the previous efforts?

Here are six straightforward steps for putting into place a metrics program that actually delivers knowledge about your operations and your business:

Step #0: Stop looking for metrics and start thinking about KPIs
Most metrics projects that are initiated within a technology organization take a bottom-up view of the world. The thought process is often to grab as much data as possible and then try to make some sense of it later with a bit of analytical magic. The dream is that, with enough technical mojo, you’ll arrive at a magical “ah ha!” moment where a piece of important knowledge jumps out.

Unfortunately, as common as this dream is, it rarely comes true. After considerable effort, these bottom-up metrics projects usually end up somewhere between a collection of interesting (but not meaningful) trivia and a loss of interest in the project altogether.

To ensure success, you’ve got to turn your technology-centric view of this problem on its head. Since the role of IT is to support the goals of the business, it logically follows that what you really want to measure is the performance of your IT operations in supporting those business goals. This is done by developing and tracking a set of Key Performance Indicators (KPIs) that align the performance of your IT operations with the performance of your business.

What’s the difference between metrics and KPIs? The following list is the best explanation I’ve seen on how to tell if a metric (or set of metrics) qualifies as a KPI:

  1. A KPI echoes organizational goals
  2. A KPI is decided by management
  3. A KPI provides context
  4. A KPI creates meaning on all organizational levels
  5. A KPI is based on legitimate data
  6. A KPI is easy to understand
  7. A KPI leads to action!

KPIs, by their very nature, are about influencing actions. If a metric isn’t capable of influencing the behavior of your team in a way that would be clearly understood from all directions, then it’s not a KPI.

Step #1: Set up a KPI Advisory Board
KPIs, like all metrics, have little value in isolation. What’s the best way to ensure that your KPIs are going to be successful in measuring and influencing the performance of your organization? Get all of the various stakeholders involved early and often.

As your very first step, create a KPI Advisory Board that is comprised of key stakeholders from each part of your organization (Dev, Ops, QA, Business/Product Management, Finance, etc). The KPI Advisory Board is responsible for validating/updating your KPI choices, overseeing data collection, analyzing results, and relaying results to the rest of the organization. The KPI Advisory Board also helps to keep everyone honest and avoid intentional or unintentional gaming of the KPI process. Make sure the KPI Advisory Board meets on a regularly scheduled basis. Don’t overload the meetings with other issues like architecture or budgeting. Stay focused on organizational and process performance. If the KPI Advisory Board is a burden on anyone, it’s probably not being run correctly.

Due to the human dynamics involved with assembling a KPI Advisory Board, many technologists feel compelled to skip this step. DON’T. The KPI Advisory Board is essential to the success of your KPI program. Reaching consensus on what to measure and how to measure it is as important to the health of an organization as the actual measurement.

 

Step #2: Prioritize what is important for your business
First, get a good understanding of how business performance is measured within your company. Every business should already know what indicates good business performance in their specific case. Every business should also know what its strategic goals are. Ask around if you don’t know. Better yet, if you’ve formed your KPI Advisory Board correctly, those answers will come to you via the members from other parts of your organization. Be sure to take note of the priority/weighting of each goal (this will sometimes vary depending on whom you ask).

Next, create a list of the ways that IT operations can impact/support each of those business goals. Your specific list will obviously depend on your business’s specific goals. However, you’ll likely find that there are four general buckets of IT operations objectives that can be mapped to your business’s goals (I’ll dig into each of these in a future post):

  • Improve resource utilization/efficiency
  • Decrease failure rates
  • Improve operations throughput
  • Enable business agility

What you should now have is a list of specific IT operations objectives mapped to prioritized business goals.

Side note: Beware of falling into the trap of thinking that what matters to the individual members of your technology team is the same as what matters to the business. Sure, fewer individual headaches are a good thing, because headaches are generally a sign of larger systemic issues. But your business representatives on the KPI Advisory Board should push back and remind you that solving your individual headaches may not be an important factor in the success of the business (and therefore not a KPI priority).

 

Step #3: Weight possible KPIs against your prioritized IT operations objectives
Work your way through the list of IT operations objectives and list out all of the possible candidate KPIs for each objective. Be sure to cast a broad net. Look through old metrics projects. Ask around your organization (“How would you measure…?”). Look at what others have published (“Google is your friend”).

I’ve found it handy to keep Douglas Hubbard’s “Four Useful Measurement Assumptions” in mind when searching for candidate KPIs:

  1. Your problem is not as unique as you think
  2. You have more data than you think
  3. You need less data than you think
  4. There is a useful measurement that is much simpler than you think

Most importantly, you need to continuously ask the question “Is this really important?”. Just because something is interesting doesn’t mean it’s valuable.

Go through the list with your KPI Advisory Board. Weight the effectiveness of each candidate KPI for indicating the success/failure of each prioritized IT operations objective. Also, be sure to weight the “difficulty of measurement” for each candidate KPI. While you will ultimately be concerned with a KPI’s ability to indicate progress towards meeting goals, understanding the “difficulty of measurement” can have practical implications when deciding where to focus first.
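
A minimal sketch of that weighting exercise (the objectives, weights, and scores below are invented purely for illustration, and the arithmetic is just a weighted sum):

```python
# Hypothetical business priority weights for two IT operations objectives,
# plus Advisory Board scores (1-5) for three candidate KPIs.
objective_weights = {"decrease failure rates": 0.4, "improve throughput": 0.6}

candidates = {
    "deployment failure ratio": {"decrease failure rates": 5, "improve throughput": 2, "difficulty": 2},
    "lead time to production":  {"decrease failure rates": 2, "improve throughput": 5, "difficulty": 3},
    "tickets closed per admin": {"decrease failure rates": 1, "improve throughput": 2, "difficulty": 1},
}

def weighted_value(scores):
    return sum(weight * scores[objective] for objective, weight in objective_weights.items())

for name, scores in sorted(candidates.items(), key=lambda item: -weighted_value(item[1])):
    print(f"{name}: weighted value {weighted_value(scores):.1f}, "
          f"difficulty of measurement {scores['difficulty']}")
```

However you record the scores, the goal is the same: a ranked list of candidates (with a sense of how hard each is to measure) that the Advisory Board can argue about in step #4.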

Step #4: Gain consensus on a manageable set of KPIs
Work first within the KPI Advisory Board and then across your broader organization to gain consensus on a manageable set of initial KPIs. I’d recommend that you start with no more than 5 – 10 KPIs. Keep it simple. Get the win.

To select your initial set of KPIs, use the weighting and prioritization determined in the previous steps. More often than not, you’ll be surprised at which candidate KPIs come out on top of the list. Of course a bit of gut feeling and pragmatism may come into play, but try to trust the prioritization process. If company politics or strong interpersonal dynamics get in the way, try your best to still reach consensus (but stay focused on getting something done). Your initial goal should be to gain experience with the process and to achieve early success that the organization can buy into. You can always expand and/or refocus later.

Step #5: Baseline and track
Create a baseline for each of the initial KPIs you’ve selected. This is also a good time to validate any of the assumptions you’ve made in the previous steps.

If everything looks reasonable, begin regular tracking of the initial set of KPIs. It’s best to start with manual (or semi-manual) data collection and KPI reporting. If you jump straight for a tool you are going to get caught up in what can/can’t be automated and what the right way is to go about that automation. This will just distract from what you are really trying to figure out… are we tracking the right thing and does the output make sense?

Transparency is an important issue in this process. Make sure that anyone with a reasonable understanding of your operations is able to understand how and why these KPIs indicate success/failure (e.g. “Can the CEO/CFO make sense of this?”).

Step #6: Re-evaluate and expand
Your KPI Advisory Board should meet on a regular basis to validate/analyze results and propose new KPIs (if any). Initially these KPI Advisory Board meetings should be held weekly.  However, over time, monthly KPI Advisory Board meetings (with weekly or daily distribution of KPI reports) will usually be sufficient.

Also, this is the point where you should start investing in automating the data collection and KPI reporting. Once you get a few cycles under your belt, you’ll have an understanding of what needs to be measured and how it should be measured. And perhaps more importantly, you should have earned the buy-in and budget approval needed to get automated data collection and KPI reporting done correctly. Automation will ultimately allow you to track more KPIs (not always a good thing) with a higher degree of accuracy (a good thing) and provide a quicker feedback loop for your organization (always a good thing).

How difficult is it to implement a KPI program?
This is a common question. My consulting company helps companies accelerate the building of their KPI capabilities. We’ve got a mature methodology (based on the steps above). We use decision modeling tools built by partners. We use proven practices for building consensus and extracting analysis. We get our clients to a mature process as quickly as possible. Next to an example like that, this may seem overwhelming to tackle on your own, but it’s really not.

KPI programs might be difficult to perfect, but they are easy to start. Start small, focus on building consensus, and grow iteratively. Since all businesses are dynamic, refining your KPI capabilities is more important than any one measurement.

 

Go to Part 2 in this series –>

 


6 Months In: Fully Automated Provisioning Revisited



Damon Edwards

It’s been about six months since I co-authored the “Web Ops 2.0: Achieving Fully Automated Provisioning” whitepaper along with the good folks at Reductive Labs (the team behind Puppet). While the paper was built on a case study about a joint user of ControlTier and Puppet (and a joint client of my employer, DTO Solutions, and Reductive Labs), the broader goal was to start a discussion around the concept of fully automated provisioning.

So far, so good. In addition to the feedback and lively discussion, we’ve just gotten word of the first independent presentation by a community member. Dan Nemec of Silverpop made a great presentation at AWSome Atlanta (a cloud computing technology focused meetup). John Willis was kind enough to record and post the video:

I’m currently working on an updated and expanded version of the whitepaper and am looking for any contributors who want to participate. Everything is being done under the Creative Commons (Attribution – Share Alike) license.

The core definition of “fully automated provisioning” hasn’t changed: the ability to deploy, update, and repair your application infrastructure using only pre-defined automated procedures.

Nor has the criteria for achieving fully automated provisioning:

  1. Be able to automatically provision an entire environment — from “bare-metal” to running business services — completely from specification
  2. No direct management of individual boxes
  3. Be able to revert to a “previously known good” state at any time
  4. It’s easier to re-provision than it is to repair
  5. Anyone on your team with minimal domain specific knowledge can deploy or update an environment

The representation of the open source toolchain has been updated and currently looks like this:

 

The new column on the left was added to describe the kind of actions that take place at the corresponding layer. The middle column shows each layer of the toolchain. In the right column are examples of existing tools.

There are some other areas that are currently being discussed:

1. Where does application package management fall?
This is an interesting debate. Some people feel that all package distribution and management (system and application packages) should take place at the system configuration management layer. Others think that it’s appropriate for the system configuration management layer to handle system packages and the application service deployment layer to handle application and content packages.

2. How important is consistency across lifecycle?
It’s difficult to argue against consistency, but how far back into the lifecycle should the fully automated provisioning system reach? All Staging/QA environments? All integrated development environments? Individual developers’ systems? It’s a good rule of thumb to deal with non-functional requirements as early in the lifecycle as possible, but that imposes an overhead that must be dealt with.

 
3. Language debate
With a toolchain you are going to have different tools with varying methods of configuration. What kind of overhead are you adding because of differing languages or configuration syntax? Does individual bias towards a particular language or syntax come into play? Is it easier to bend (or some would say abuse) one tool to do most of everything rather than use a toolchain that lets each tool do what it’s supposed to be good at?

4. New case study
I’m working on adding additional case studies. If anyone has a good example of any part of the toolchain in action, let me know.
