Archive for the ‘Business Impact’ Category

Common Objections to DevOps from Enterprise Operations

Alex Honor / 

I’ve been in many large enterprise companies helping them learn about devops and understand how to improve their service delivery capability. These companies have heard about devops and are looking for help creating a strategy to adopt devops principles because they need better time to market and higher quality. Not everyone in the company believes in devops, and the reasons differ. To some, devops sounds like a free-for-all where devs make production changes. To others, devops sounds like a bunch of nice-sounding high ideals, or like something that can’t be adopted because the necessary automation tooling does not exist for their domain.

In the enterprise, the operations group is often centralized and supports many different application groups. When it comes to site availability, the buck stops with ops. If there is a performance problem, outage or issue, the ops team is the first line of defense, sometimes escalating issues back to the application team for bug fixes or for help diagnosing a problem.

Enterprises interested in devops are also usually practicing or adopting agile methodology, in which case demands on ops happen more often: during sprints (e.g., to set up a test environment) or after a sprint, when ops needs to release software to the production site. The quickened pace puts a lot more pressure on the centralized ops team because they often get the work late in the project cycle (i.e., when it’s time to release to production). Because of time pressure or because they are overworked, operations teams have difficulty turning requested work around and begin to hear that developers want to do things for themselves. Those developers might want to rebuild servers, get shell access, install software, run commands and scripts, provision VMs, modify network ACLs, update load balancers, etc. They might also feel like the centralized ops team needs to get out of their way.

How does the ops team, historically the one responsible for uptime in the production environment, permit or expand access to environments they support? How can they avoid being the bottleneck at the tail end of every application team’s project cycle? How does the business remove the friction but not invite chaos, outages and lack of compliance?

If you’re in this kind of enterprise environment, how do you start approaching devops? If you are a centralized operations team facing the pressure to adopt devops, here are some questions and concerns for the organization to ask or think about. The answers to these questions are important steps toward forming your devops strategy.

How does a centralized group handle the work that needs to be done to make applications run in production or across other environments?

Some enterprises begin by creating a specialized team called “devops” whose purpose is to solve “devops problems”. Generally, this means making things more operations friendly. This kind of team might also be the group that takes the handoff from application development teams, wraps their software in automation tooling, deploys it, and hands it off to the Site Reliability team. Unfortunately, a centralized devops team can become a silo and suffer from the same “late in the cycle” handoff challenges the traditional ops group sees. Also, there are always more developers and development projects than there are devops engineers and devops team bandwidth. A centralized devops team can end up facing the same pressures a traditional QA department does when it tries “adding quality testing” as a separate process stage.

To make sure an application operates well in production and across other environments, the devops concerns must be baked into the application architecture. This means the work to make applications easy to configure, deploy and monitor is done inside the development stage. The centralized operations group must then learn to develop a shared software delivery process and tool chain. It’s inside the delivery tool chain where the work gets distributed across teams. The centralized ops group can support the tool chain like architects and service providers, giving the application development teams a framework and scaffolding that they populate with the artifacts needed to drive their pipeline.

What about our compliance policies?

Most enterprises abide by a change policy that dictates who can make production changes. Many times this policy is interpreted to mean anybody outside of ops is not allowed to push changes. Software must be handed off to an ops person to push the change. This handoff can introduce extra lead time and possibly errors due to lack of information.

These compliance rules are defined by the business, and many times people on the delivery end have never actually read the language of these policies; they base process on assumptions or beliefs formed by tribal knowledge. Over time, tools and processes can morph in arcane ways, twisting into inefficient bureaucracy.

It’s common to find that different compliance rules apply depending on the application or customer type. When thinking about how to reduce delivery cycle time, these differences should be taken into account because there might be alternative ways to decide who can make a change and how.

Besides understanding the compliance rules, it should also be simple and fast to audit your compliance.

This means making it easy to find out:

  • who made the change and whether they were authorized
  • where the change was applied
  • what change was made and whether it is acceptable

This kind of query should be instantly accessible and not something done through manual evidence gathering long after the fact (e.g., when something went wrong). Knowing how change was made to an environment should be as visible as seeing a report that shows how busy your servers were in the last 24 hours.
These audit views should contain infrastructure and artifact information because both development and operations people want to know about their environments in software and server terms. A change ticket with a bunch of verbiage and bug links does not paint a complete enough picture.
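
As a minimal sketch of what such an instantly queryable audit view could look like, the Python below models each change as a record and answers the three questions listed above. The `ChangeRecord` fields, the `AUTHORIZED` table, and the user and host names are all hypothetical, invented for illustration rather than taken from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical authorization table: environment -> users allowed to change it.
AUTHORIZED = {
    "production": {"ops-alice"},
    "staging": {"ops-alice", "dev-bob"},
}


@dataclass(frozen=True)
class ChangeRecord:
    """One change event, described in both software and server terms."""
    who: str
    environment: str
    host: str
    artifact: str  # what was changed, e.g. a package name and version
    when: datetime


def was_authorized(record):
    """Answer 'were they authorized?' for a single change."""
    return record.who in AUTHORIZED.get(record.environment, set())


def changes_for(records, environment):
    """Answer 'who changed what, and where?' for one environment, newest first."""
    matching = [r for r in records if r.environment == environment]
    return sorted(matching, key=lambda r: r.when, reverse=True)


def unauthorized_changes(records):
    """List every change made by someone not authorized for that environment."""
    return [r for r in records if not was_authorized(r)]
```

Because every record carries the host and the artifact, the same data can feed both a developer's view ("which version is where?") and an auditor's view ("who pushed it?").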

How do you open access but not lose controls?

After walking through a software delivery process, it’s easy to see that the flow of work slows anytime the work must be done by a single team that is already past its capacity and is losing effectiveness due to context switching between competing priorities. This is the situation an ops team often finds itself in. Ops teams balance work that comes from application development teams (e.g., participating in agile dev sprints), network operations (e.g., handling outages and production issues), business users (e.g., gathering info for compliance, asset info for finance) and finally, their own project work to maintain or improve infrastructure.

To free this process bottleneck, the organization must figure out how the work can be redistributed or satisfied by some self-service function. Since deployment, configuration and monitoring are ops concerns that should be designed into the application, distribute this development to the developers. This can really be a collaboration where ops maintains a base set of automation modules and gives developers ways to extend it. Create a development environment and tooling that lets developers integrate their changes into this ops framework in their own project sandboxes.
Provide developer access to create hosted environments easily through a self-service interface that spins up the VMs or containers and lets them test the ops management code.

Build the compliance auditing logs into the ops management framework so you can track what resources are being created and used. This is important if resource conflicts occur, and it lets you learn where more sandboxing is needed or where more fine-grained configuration should be defined.
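
To make the idea of auditing built into a self-service interface more concrete, here is a small sketch in Python. The `SandboxService` class, its method names, and the tuple layout of the audit entries are hypothetical, and the actual provisioning of a VM or container is stubbed out. The point is only that every request, including the ones that fail, leaves an audit entry.

```python
from dataclasses import dataclass, field


class ResourceConflict(Exception):
    """Raised when a requested sandbox name is already in use."""


@dataclass
class SandboxService:
    """Hypothetical self-service front end; real provisioning is stubbed out."""
    sandboxes: dict = field(default_factory=dict)  # sandbox name -> owner
    audit_log: list = field(default_factory=list)  # (action, user, name, outcome)

    def create(self, user, name):
        if name in self.sandboxes:
            # Record the failed request too, so conflicts show up in the audit view.
            self.audit_log.append(("create", user, name, "conflict"))
            raise ResourceConflict(f"{name} is already owned by {self.sandboxes[name]}")
        self.sandboxes[name] = user  # a real service would start a VM or container here
        self.audit_log.append(("create", user, name, "ok"))

    def destroy(self, user, name):
        if self.sandboxes.get(name) != user:
            self.audit_log.append(("destroy", user, name, "denied"))
            raise PermissionError(f"{user} does not own {name}")
        del self.sandboxes[name]
        self.audit_log.append(("destroy", user, name, "ok"))

    def conflicts(self):
        """Audit entries that show where more sandboxing may be needed."""
        return [entry for entry in self.audit_log if entry[3] == "conflict"]
```

Reviewing `conflicts()` over time is one way to see which shared resources teams keep colliding on.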

Moving faster leads to less quality, right?

To the business, moving fast is critical to staying competitive by increasing their velocity of innovation. This need to quicken the software delivery pace is almost always the chief motivation to adopt devops practices.

Devops success stories often begin with how many deployments are done a day. Ten deploys a day, 1000 deploys a day. To an enterprise these metrics can sound mythical. Some enterprises struggle to make one deploy a month, and I have seen some enterprises making major releases on an annual basis, with the rollout of the release to their customers taking over 30 days. That’s thirty days of lag time, and it puts the production environment in an inconsistent state, making it hard for everyone to cope with production issues. “Is it the new version or the old version causing this yet unidentified issue?” A primary reason operations is reluctant to move faster is the problems that occur during or after a change has been made.

When change leads to problems these are typical outcomes:

  • More control process is added (more approval gates, shorter change windows)
  • Change batches get bigger (cram more work into the given change window)
  • Increase in “emergency fixes” (high priority features get fast tracked to avoid the normal change process)
  • High pressure to make application changes quickly results in patching systems directly rather than going through the normal software release cycle.

Given these outcomes, the idea of moving faster seems crazy because it will obviously lead to breaking more stuff more often.

The question is how do organizations learn to be good at making change to their systems? Firstly, it is helpful to think about what kind of safety practices are important to move change. Moving fast means being able to safely change things fast. Here are some general strategies to consider:

Small batches

Large batches of change require more people on hand due to the volume of work and the work can take longer to get done.
The solution is to push less change through so it’s easier to get it done and have less to check and verify when the change is completed.

Rehearsal

Here’s a good mantra, “Don’t practice until you get it right. Practice until you can’t get it wrong.” Don’t make the production change be the first time you have tried it this way. Your change should have been verified multiple times in non production environments before you tried it in production. Don’t rely on luck. Expect failure.

Verifiable process stages

Whether it is a site build-out or an update to an existing application, be sure you have well-defined checks for your preconditions. This means that if you are deploying an application, you have a scripted test that confirms your external or environment dependencies before you do the deployment. If you are building a site, be sure you have confirmed the hardware and network environment before you install the operating platform. Building this kind of automated testing at process stage boundaries adds a great deal of safety by not letting problems slip downstream. You can use these verification checks to decide to “stop the line”.
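
A scripted precondition gate along these lines might look like the following Python sketch. The function names and the particular checks (free disk space, a reachable TCP port) are assumptions chosen for illustration; a real pipeline would plug in whatever dependencies its own deployment needs.

```python
import shutil
import socket


def check_disk_space(path, min_free_bytes):
    """Precondition: the target filesystem has enough free space."""
    free = shutil.disk_usage(path).free
    return free >= min_free_bytes, f"{free} bytes free on {path}"


def check_port_reachable(host, port, timeout=2.0):
    """Precondition: an external dependency accepts TCP connections."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, f"{host}:{port} reachable"
    except OSError as exc:
        return False, f"{host}:{port} unreachable ({exc})"


def run_preconditions(checks):
    """Run every (name, callable) check; return (ok, list of failure messages)."""
    failures = []
    for name, check in checks:
        passed, detail = check()
        if not passed:
            failures.append(f"{name}: {detail}")
    return not failures, failures


def deploy_if_safe(checks, deploy):
    """Stop the line: only call deploy() when every precondition holds."""
    ok, failures = run_preconditions(checks)
    if not ok:
        return failures
    deploy()
    return []
```

Each check is passed in as a zero-argument callable (for example `lambda: check_disk_space("/var", 10**9)`), so the same gate can sit in front of any stage boundary, not just the production deploy.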

Process discipline

What leads to places full of snowflake environments, each full of idiosyncratic, specially customized servers and networks? Lack of discipline. If the organization does not manage change consistently together, everyone ends up doing things their own way. How do you know you have process discipline? Look for how much variation you see. If process differs between environments, that is a variation. Snowflake servers are the symptoms of process variation. Process variation means you don’t have process under control. There are two simple metrics to understand how much control you have over your process: lead time and scrap rate. Lead time is how long it takes you to make the change. Scrap rate is how often the change must be reworked to make it right. Rehearsal and verifiable process stages will help you bring process under control by reducing scrap rate and stabilizing lead time. The biggest benefit of process discipline is improving your ability to deliver change predictably. The business depends on predictability. With predictability the business can gauge how fast or slow it can move.
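
The two metrics are simple enough to compute from a list of change records. The sketch below is one possible way to do it in Python; the `Change` record and its field names are hypothetical, and using the standard deviation of lead time as a measure of how stable the lead time is, is my own assumption rather than a definition from this post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, pstdev


@dataclass(frozen=True)
class Change:
    requested: datetime
    completed: datetime
    reworked: bool  # True if the change had to be redone to make it right


def lead_time(change):
    """How long it took to make one change."""
    return change.completed - change.requested


def mean_lead_time(changes):
    """Average lead time across all changes."""
    return timedelta(seconds=mean(lead_time(c).total_seconds() for c in changes))


def lead_time_spread(changes):
    """Population standard deviation of lead time; smaller means more predictable."""
    return timedelta(seconds=pstdev(lead_time(c).total_seconds() for c in changes))


def scrap_rate(changes):
    """Fraction of changes that had to be reworked."""
    return sum(1 for c in changes if c.reworked) / len(changes)
```

Tracked per environment, a falling scrap rate and a shrinking spread are reasonable signals that process variation is coming under control.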

More access into ops managed environments?

The better everyone understands how things perform in production, the better the organization can design its systems to support operations. Making it hard for developers or testers to see how the service is running only delays improvements that benefit the customer and reduce pressure on operations. It should be easy for anyone to know which versions of applications are deployed on which hosts, the host configuration and the performance of the application.

Sometimes data privacy rules make accessing data less straightforward. Some logs contain customer data and regulations might restrict access to only limited users. Instead of saying no or making the data collection and scrubbing process manual, make this data available as an automated self service so developers or auditors can get it for themselves.
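
As an illustration of automating the scrubbing step, the Python sketch below redacts two kinds of sensitive values from log lines before they are handed out. The two patterns (email addresses and long digit runs such as card numbers) and the log format in the example are assumptions; which fields actually count as restricted depends on your own data privacy rules.

```python
import re

# Assumed patterns for illustration only; real rules come from your privacy policy.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LONG_NUMBER = re.compile(r"\b\d{12,19}\b")  # e.g. payment card numbers


def scrub_line(line):
    """Replace email addresses and long digit runs with fixed placeholders."""
    line = EMAIL.sub("[email]", line)
    return LONG_NUMBER.sub("[number]", line)


def scrub_log(lines):
    """Scrub every line of a log so it can be shared through self-service."""
    return [scrub_line(line) for line in lines]
```

For example, `scrub_line("login user=jane.doe@example.com status=200")` returns `"login user=[email] status=200"`, which keeps the line useful for debugging without exposing the customer.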

Visibility into the production environment is crucial for developers to make their environments production-like. Modeling the development and test environment so that it resembles production is another example of reducing variability and bringing process under control.

Does this mean shell access for devs?

This question is sometimes the worst one for a traditional enterprise ops team. Oftentimes the question is a symptom of another problem. Why does a dev want shell access to an environment operations is supporting? In a development or early test environment, shell access might be needed to experiment with developing deployment and configuration code. This is a valid reason for shell access.

Is this request for shell access in a staging or production environment? Requests for shell access could be a sign of ad hoc change methods, which undermine the stability of an environment. It’s important that change methods are encapsulated in the automation.

Fundamentally, shell access to live operational environments is a question about risk and trust.


The list doesn’t stop here, but these are the most common questions and concerns I hear. Feel free to share your experiences in the comments below.

How to initiate a DevOps Transformation (Video)

Damon Edwards / 

Here is the full 30-minute video from the keynote I did at DevOps Days Mountain View 2013.

This talk addresses the single most common question I get asked:

“DevOps sounds great… but how do I go about introducing DevOps to my company?”

Which is usually followed by one or more of the following frustrated statements:

“My managers don’t get it”
“The Dev group won’t talk to me”
“The Ops group won’t talk to me”
“QA says I’m dangerous”
“I don’t know where to start”
“People say they are too busy getting real work done”
“Help! My boss told me to buy DevOps by next quarter or else”
“Everyone just argues about tools”

In this talk I give a condensed walk through of a 3 step process that we’ve found to work (who doesn’t love a 3 step process, right?):

1. Build the “why?” (the business case)
2. Build organizational alignment (the trickiest part… but there is another 5 step “workshop” process just for this!)
3. Continuous improvement loops (think: PDCA or Deming/Shewhart Cycles)

The process incorporates everything you would expect from a DevOps transformation (Lean and Systems Thinking, Value Stream Mapping, Waste Analysis, The 3 Ways, Silo busting, etc.) but it does so in a practical and approachable manner. You can even avoid using the word “DevOps” if it’s too politically charged in your organization.

This forms the core of what we do at DTO Solutions with our DevOps Workshops (or “Service Delivery Workshops” for a non-DevOps name). Through that work we’ve been fortunate enough to see this process in action at many different sizes and types of companies. That being said, I’m always interested in more feedback and new ideas!

http://vimeo.com/69079272

John Willis Notes Notable DevOps Culture Traits

Damon Edwards / 

This is a great presentation by John Willis at the SVDevOps Meetup back in April. John discusses the various interesting trends and traits he is seeing in the industry. From Deming to CAMS to GitHub to Etsy… John, as he always does, paints an interesting picture of the roots of DevOps and successful DevOps cultures.

(Video: 59:02)

IT stability and business innovation are not enemies

Damon Edwards / 

Back before the hectic end of the year I was interviewed by HP’s Discover Performance newsletter and online magazine. The questions were about how applying DevOps thinking inside enterprises can enable a faster pace of innovation without increasing risk.

Below is the interview in full. If you like this interview, I recommend signing up for the Discover Performance newsletter. They routinely have good articles on interesting and relevant topics and avoid injecting too much self-serving HP bias (a difficult task for enterprise funded content!).

 

IT stability and business innovation are not enemies

DevOps expert Damon Edwards discusses why Ops should neither resist innovation nor be a scapegoat when things go wrong.

Innovation is a mantra in business, one that the CIO hears more and more. As IT leaders feel pressure to be more responsive, faster moving, and more innovative, Operations leaders worry that their mission—the smooth, steady delivery of high-quality IT services—may be jeopardized by rushed experimentation.

Damon Edwards, co-founder of IT consultancy DTO Solutions, has spent more than a dozen years working on web operations from both the IT and business angles. A major DevOps proponent, he recently posted about “using DevOps to turn IT into a strategic weapon.” Discover Performance reached out and asked him to talk about how Operations leaders—and IT executives in general—should approach innovation.

The (completely achievable) goal, he says, aligns IT goals with business goals by “removing all of the bottlenecks, inefficiencies, and risks between a business idea (the ‘ah-ha!’) and a measurable customer outcome (the ‘ka-ching!’).”

Q: Does there tend to be a basic disconnection between the business and IT on the subject of innovation? Why?

DE: “Disconnect” has become somewhat of an ugly euphemism inside corporations. Unfortunately it’s become code for “I’m right and you’re wrong.” In reality, a “disconnect” is usually just two people operating and making assumptions based on differing definitions. As a result, you get unfortunate infighting between people who, in all other ways, both desperately want the company to succeed.

Talk to folks in the technology roles and they tend to see innovation as being synonymous with invention. There is a rich legacy of invention in the technology world. Getting your name on a patent was an ultimate trophy. Much of the myth and lore of tech and geek culture is built on a love of tinkering and invention. Now contrast that to what you see when you visit the business folks. They see innovation as the application of new ideas to create value for their current customers and to attract new customers. Unfortunately, now that you can win a patent for what is essentially a business idea, the invention/innovation distinction is even more muddled.

If executives let the two core parts of the company operate under completely different definitions, you’re bound to have conflicts and gridlock. You have to make it clear what innovation is and isn’t for your company, and how you’re going to measure its impact.

Q: Isn’t innovation an inherently risky endeavor?

DE: There is always some level of risk with innovation because you’re operating in the unknown. You don’t know if the customers will respond. You don’t know if the response will be what you want or expect. When the revenue and health of a company are tied to getting a large number of favorable responses, there is risk.

But I should be clear that innovation, especially on the web, should be low risk from a technology perspective. You reach your customers through standard interfaces and over standard protocols. We know how to deploy safely 20 times a day. We know how to scale services to hundreds of millions of users. We know how to manage petabytes of data. If you’re running a web company, your innovation risk should almost exclusively be on the business end, not the technology end.

Q: When risks compromise IT performance, heads roll, especially on the Ops team. How do you decrease the risk of innovation for Ops?

DE: Again, I’d ask what went wrong organizationally that put the Ops team in a position of risk. Was the business asking for something that was never done before and needed some never-thought-of-before technology to work? Doubtful. Did the developers change their underlying framework or introduce new technology that wasn’t properly vetted or Ops didn’t know how to handle? Possible. Did Ops upgrade or switch a technology component? Also possible.

My point is that, while Ops is the common scapegoat, the problem often started somewhere else and likely had nothing to do with the business being more innovative. So Ops gets blamed—which is like blaming the canary for the gas in the coal mine—and in response Ops starts saying no all the time. Suddenly “innovation” is the bad guy when it really had nothing to do with it.

Q: You say innovation is a numbers game. How so—and how does DevOps fit in?

DE: Innovation is a numbers game because, like most things in life, business has a countdown clock. If you’re a startup, it’s a simple clock. It’s the amount of cash left in the bank. If you are an established company, it’s a bit more complicated. It could be how long until a competitor beats you to the punch. It could be how long the CEO gives you to meet a goal. The point is, one way or another, you have a finite amount of time to absolutely delight the hell out of your customers by figuring out what they want and delivering exactly that to them.

You don’t control the clock and you don’t control the customer—what do you control? You can control the number of chances you get to delight the customer before the clock runs out. That’s where DevOps comes in. DevOps aims to remove all of the bottlenecks, inefficiencies, and risks between a business idea (the “ah-ha!”) and a measurable customer outcome (the “ka-ching!”). When you remove all of that, you get a lightning-fast and highly reliable service delivery pipeline that spans from the edge of Development all the way to the datacenter. That allows you to run more experiments, get faster feedback, and take more “shots at the prize.”

Q: DevOps promises a more responsive, more collaborative IT department that can realize business ideas faster. So what is holding back its widespread adoption? What’s the challenge or downside?

DE: There was a movie called “Charlie Wilson’s War” that had a great line between Tom Hanks, playing a U.S. Congressman, and Philip Seymour Hoffman, playing a CIA agent. Hoffman asks, “Why is Congress saying one thing and doing nothing?” Hanks replies, “Well, tradition mostly.”

All jokes aside, tradition is a powerful thing and hard to break. Tradition, or “what we’ve always done,” in IT is no different. There was a thread on Slashdot just this past month that asked whether developers should be allowed to deploy their own applications. You should have seen the outcry. The sheer number of commenters who shot down the idea as pure heresy was shocking. And the richest part of all of their denunciation was that the mob said over and over that the idea would “never work at a real company with real revenue at stake.” I thoroughly enjoyed sending that to John Allspaw, who runs all of technology at Etsy, and Jesse Robbins, who was in charge of risk and disaster planning for operations at Amazon. Etsy does over $600 million of transactions per year, and Amazon does about $50 billion in revenue. In both companies, developers are the ones who deploy and own the uptime for their own code. John’s reaction to the thread was a simple yet priceless one: “OMG.”

Q: Cloud and SaaS services promise flexibility and value to the business, but may seem to undermine or complicate traditional Ops teams. How do these disruptive factors fit with efforts to embrace innovation?

DE: We have a saying that we use a lot at DTO Solutions: “Moving to the cloud without changing your processes is just expensive and complicated Hosting 2.0.” The cloud gives you a new abstraction layer that provides all sorts of benefits in the form of flexibility and speed. But to take advantage of those benefits, you first must change your application lifecycle and operating procedures. Furthermore, you have to revisit the architecture and deployment model for your applications. Often you’ll find that the choices that were made in the past were based on outdated ideas like the need for hardware conservation or to fit a monolithic codebase into a waterfall project delivery cycle. The conditions have changed, so companies need to rethink how and why they do things within the context of the new conditions.

The cloud removes all sorts of infrastructure barriers, which is what makes moving at a faster pace possible. DevOps addresses the process and cultural issues. Agile addresses the software development process issues. Customer Development and The Lean Startup remove the business process issues. You add it all up and you are ready for your organization to move at speeds that you never thought were possible.

For more from Damon Edwards, check out DTO Solutions, their DevOps blog, and the upcoming “DevOps Cookbook.” Then check out Discover Performance’s recent DevOps issue.

************************************

 

Improving Flow: Fix the Handoffs to Remove Your Worst Bottlenecks

Alex Honor / 

Minimizing time to market and getting faster feedback from customers are primary concerns for businesses that want to stay competitive. You need to be able to go from a business idea to a customer-facing running service as quickly, reliably, and effortlessly as possible. This is a flow of work that crosses many organizational silos.

Where does this flow often bog down? Handoffs. Whether the handoffs are within a team (e.g. Dev to Dev) or between teams (e.g. Dev to Ops), there is always the need to pass work from one stage of the lifecycle to the next.

http://www.flickr.com/photos/seven13avenue/2791099838/in/photostream/

At DTO Solutions, our clients are often already aware that they have flow problems when they ask us for help. When we use techniques like Value-Stream Mapping to learn how the work flows, handoff problems are prominent forms of waste that jump off the page. The diagram below uses pie charts to highlight the relative time lost due to difficult handoffs during the product life cycle.

What are common reasons for difficult handoffs?

  • Conversations, email, multitudinous wikis, spreadsheets, and trouble ticket systems are used to describe, in human language, how to process work. Words are open to interpretation and documents often lag behind current operating procedure. Just imagine being the person planning or performing the work and traversing the information across these various tools.
  • Software product artifacts differ between stages of the process. Sometimes software resides in a directory on a file share and other times it’s a TAR file. The software handoff may contain the same bits, but must be handled or converted by the downstream stage of the software delivery process.
  • Work can be considered “done” yet be unfinished or in a non-working state. The lack of a test or means to verify the work was done correctly often leads to products that are not ready for the next person down the line. This can leave the person in the downstream stage with what is essentially scrap that has to be rejected or redone.
  • Ad hoc procedures or loose scripting often lead to different approaches and implementations for what should be standard operating procedure. This can lead to silo-specific utilities with different levels of quality and testing.

Handoff problems affect organizations, both big and small. Obviously, one answer to solving handoff problems is to minimize them. But if you are in an organization larger than just a handful of people, that just isn’t a realistic option. To decrease time to market and enable fast feedback, you are going to have to roll up your sleeves to solve the handoff problems.

Where are good places to start making handoffs smoother?

Here are a few of the top fixes that we find important for solving handoff problems at their source:

  • Consistent packaging
    • The most direct way to simplify software handoffs between Dev and Ops is using a common system package format like RPM or Debian. Using a system package format also aligns application deployment and system provisioning practices.
  • Encapsulated procedures
    • Rather than loose scripts or team-specific ones, choose a framework that enables modular automation. Using a modular approach results in a shared tool box of utilities and captured process.
  • Converting information flows into artifact flows
    • Rather than rely on human-readable text as the product for the downstream process to handle, formalize it as an automation product and build on the idea of encapsulated procedures.
  • Procedure verification tests
    • Verification testing should not be dominated by manual checks described in text documents. Building on the idea of converting information into artifacts, implement verification using a test automation framework. Most apps have some level of testing to verify functionality. Build a testing framework to verify that an operational procedure (e.g., software deployment) was successful by executing an automated test.
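
As one hedged example of a procedure verification test, the Python sketch below checks that a deployed directory tree matches a manifest of SHA-256 checksums produced at build time. The manifest format and the function names are hypothetical; a system package format such as RPM or Debian packages can provide comparable verification of installed files, so treat this as an illustration of the idea rather than a replacement for those tools.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of one file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_deployment(root, expected_manifest):
    """Return a list of problems; an empty list means the deployment verified."""
    actual = build_manifest(root)
    problems = []
    for name, digest in expected_manifest.items():
        if name not in actual:
            problems.append(f"missing: {name}")
        elif actual[name] != digest:
            problems.append(f"changed: {name}")
    for name in sorted(actual.keys() - expected_manifest.keys()):
        problems.append(f"unexpected: {name}")
    return problems
```

The manifest is itself an artifact that travels with the software, so the downstream stage verifies the handoff by running a test instead of reading a document.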

In subsequent posts, we’ll address each one of these fixes.

 

Defining and Improving DevOps Culture (Videos)

Damon Edwards / 

Culture. It’s the most mentioned and the most ignored part of the DevOps conversation.

Lots of lip service has been paid to the importance of culture (“It all starts with culture”, “DevOps is a cultural and professional movement”, “Culture over tools”, etc.). But just as fast as everyone agrees that culture is supreme, the conversation turns straight to tools, tools, and more tools.

Recently, John Willis, my fellow dev2ops.org contributor and DevOps Cookbook co-author, let this tweet fly:

John has been as big of a culture warrior as anyone — constantly fighting to elevate the importance of and the discussion around DevOps Culture. He later said that this tweet was part exasperation and part challenge.

It was obvious to John that the difference between high performing and low performing companies was their DevOps culture, not the tools. But rather than be satisfied by the default explanation of DevOps Culture maturity being either that a company “gets it” or “doesn’t get it”, John was challenging the community to dive deeper into the issue.

During the week of Velocity London and DevOps Days Rome, there were finally some presentations that answered that call and were all about the culture. I did a presentation on defining DevOps Culture and what high performing companies do to reinforce it (based on the work of DTO Solutions). Michael Rembetsy and Patrick McDonnell gave a great peek behind the scenes of Etsy’s transformation to a company with a fast moving and high performing culture. Mark Burgess (CFengine) gave an interesting talk on the importance of, and science behind, relationships.

http://vimeo.com/51120539


(slides were updated after the presentation)

 

 

http://vimeo.com/51120837

(when you watch Mark’s video you will understand why there are no slides posted here!)

Update: John Willis knocks it out of the park talking about the importance of culture and the classic influence of Deming on this recent episode of the Food Fight Show.
