
What is DevOps?


Damon Edwards

Update 1: Wikipedia now has a pretty good DevOps page

Update 2: Follow-up posts on the business problems that DevOps solves and the competitive business advantage that DevOps can provide.


 

If you are interested in IT management — and web operations in particular — you might have recently heard the term “DevOps” being tossed around. The #DevOps tag pops up regularly on Twitter. DevOps meetups and DevOpsDays conferences are gaining steam.

DevOps is, in many ways, an umbrella concept that refers to anything that smooths out the interaction between development and operations. However, the ideas behind DevOps run much deeper than that.

 

What is DevOps all about?

DevOps is a response to the growing awareness that there is a disconnect between what is traditionally considered development activity and what is traditionally considered operations activity. This disconnect often manifests itself as conflict and inefficiency.

As Lee Thompson and Andrew Shafer like to put it, there is a “Wall of Confusion” between development and operations. This “Wall” is caused by a combination of conflicting motivations, processes, and tooling.

 

Development-centric folks tend to come from a mindset where change is the thing that they are paid to accomplish. The business depends on them to respond to changing needs. Because of this relationship, they are often incentivized to create as much change as possible.

Operations folks tend to come from a mindset where change is the enemy.  The business depends on them to keep the lights on and deliver the services that make the business money today. Operations is motivated to resist change as it undermines stability and reliability. How many times have we heard the statistic that 80% of all downtime is due to those self-inflicted wounds known as changes?

Both development and operations fundamentally see the world, and their respective roles in it, differently. Each believes that they are doing the right thing for the business… and in isolation they are both correct!

To make matters worse, development and operations teams tend to fall into different parts of a company’s organizational structure (often with different managers and competing corporate politics) and often work at different geographic locations.

Adding to the Wall of Confusion is the all too common mismatch in development and operations tooling. Take a look at the popular tools that developers request and use on a daily basis. Then take a look at the popular tools that systems administrators request and use on a daily basis. With a few notable exceptions, like bug trackers and maybe SCM, it’s doubtful you’ll see much interest in using each other’s tools or significant integration between them. Even if there is some overlap in types of tools, often the implementations will be different in each group.

Nowhere is the Wall of Confusion more obvious than when it comes time for application changes to be pushed from development to operations. Some organizations call it a “release,” some call it a “deployment,” but one thing they can all agree on is that trouble is likely to ensue. The following scenario is generalized, but if you’ve ever played a part in this process it should ring true.

Development kicks things off by “tossing” a software release “over the wall” to Operations. Operations picks up the release artifacts and begins preparing for their deployment. Operations manually hacks the deployment scripts provided by the developers or creates their own scripts. They also hand edit configuration files to reflect the production environment, which is significantly different from the Development or QA environments. At best they are duplicating work that was already done in previous environments; at worst they are about to introduce or uncover new bugs.

Operations then embarks on what they understand to be the currently correct deployment process, which at this point is essentially being performed for the first time due to the script, configuration, process, and environment differences between Development and Operations. Of course, somewhere along the way a problem occurs and the developers are called in to help troubleshoot. Operations claims that Development gave them faulty artifacts. Developers respond by pointing out that it worked just fine in their environments, so it must be the case that Operations did something wrong. Developers have a difficult time even diagnosing the problem because the configuration, file locations, and procedure used to get into this state are different from what they expect (if security policies even allow them to access the production servers!).

Time is running out on the change window and, of course, there isn’t a reliable way to roll the environment back to a previously known good state. So what should have been an uneventful deployment ends up being an all-hands-on-deck fire drill in which a lot of trial and error finally hacks the production environment into a usable state.
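That hand-editing of per-environment configuration is a recurring source of the drift described above. As a purely illustrative sketch (not something this article prescribes, and with entirely hypothetical names and values), here is how a single version-controlled template plus per-environment data can replace hand-edited files:

```python
# Hypothetical illustration: one shared config template, rendered per environment.
# All names, values, and environments here are made up for the example.
from string import Template

CONFIG_TEMPLATE = Template(
    "db_url=$db_url\n"
    "thread_pool_size=$threads\n"
    "log_level=$log_level\n"
)

ENVIRONMENTS = {
    "qa":         {"db_url": "jdbc:postgresql://qa-db:5432/app",   "threads": 8,  "log_level": "DEBUG"},
    "production": {"db_url": "jdbc:postgresql://prod-db:5432/app", "threads": 64, "log_level": "WARN"},
}

def render_config(env: str) -> str:
    """Render the same template with environment-specific values, instead of hand-editing files."""
    return CONFIG_TEMPLATE.substitute(ENVIRONMENTS[env])

if __name__ == "__main__":
    # The QA and production files differ only in data, never in hand-applied edits.
    print(render_config("production"))
```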

While deployment is the most obvious pain point, it is only one part of the need for DevOps. As John Allspaw points out, the need for cooperation between development and operations starts well before and continues long after deployment.

 

What’s the benefit of DevOps?

DevOps is a powerful idea because it resonates on so many different levels.

From the perspective of individuals toiling in hands-on development or operational roles, DevOps points towards a life that is free from the source of so many of their hassles. It’s by no means a magical panacea, but if you can make DevOps work you are removing barriers that are both a significant time-sink and a source of morale-killing frustration. It’s a simple calculation to make: invest in making DevOps a reality and we all should be more efficient, increasingly nimble, and less frustrated. Some may argue that DevOps is a lofty or even far-fetched goal, but it’s difficult to argue that you shouldn’t try.

 

For the business, DevOps contributes directly to enabling two powerful and strategic business qualities, “business agility” and “IT alignment”. These may not be terms that the troops in the IT trenches worry about on a daily basis, but they should definitely get the attention of the executives who approve the budgets and sign the checks.

A simple definition of IT alignment is “a desired state in which a business organization is able to use information technology (IT) effectively to achieve business objectives — typically improved financial performance or marketplace competitiveness” [source].

DevOps helps to enable IT alignment by aligning development and operations roles and processes in the context of shared business objectives. Both development and operations need to understand that they are part of a unified business process. DevOps thinking ensures that individual decisions and actions strive to support and improve that unified business process, regardless of organizational structure.

A simple definition of agility in a business context is the “ability of an organization to rapidly adapt to market and environmental changes in productive and cost-effective ways” [source].

Of course, developers also have their own specialized meaning of the word “agile”, but the goals are very similar. Agile development methodologies are designed to keep software development efforts aligned with customer/company goals and produce high quality software despite changing requirements. For most organizations, Scrum, the iterative project management methodology, is the face of Agile.

Agile promises close interaction and fast feedback between the business stakeholders making the decisions and the developers acting on those decisions. If you look at the output of a well-functioning Agile development group you should see a steady stream of improvement that is in tune with business needs.

However, when you step back and look at the entire development-to-operations lifecycle from an enterprise point of view, that Agile stream and its associated benefits are often obscured. The Wall of Confusion leads to a dissociation of the application lifecycle. Development works at one pace and Operations works at another. The long intervals between production deployments, in effect, turn the Agile efforts of an organization right back into the waterfall lifecycle it was trying to avoid. No matter how Agile the development organization is, it’s exceedingly difficult to change the slow and lumbering nature of a business while the Wall of Confusion is in place. Andrew Rendell has a great post that tells the anecdotal story of how an organization’s cumbersome release processes turn its agile development efforts right back into a waterfall.

DevOps enables the benefits of Agile development to be felt at the organizational level. DevOps does this by allowing for fast and responsive, yet stable, operations that can be kept in sync with the pace of innovation coming out of the development process.

If you are seeking to establish a DevOps project within your organization, be sure to keep the terms “IT alignment” and “business agility” in mind.

 

How do we bring DevOps to life?

Like most emerging topics, it’s easier to find a consensus about the problem than it is about the solution.

If you listen to the current DevOps conversations, there appear to be three areas of focus for DevOps-related solutions:

1. Measurement and incentives to change culture – Changing culture and reward systems is never easy. However, if you don’t change your organization’s culture, fulfilling the promise of DevOps will be difficult, if not impossible.  When looking to influence culture in a business organization, you need to pay close attention to how you measure and judge performance. What you measure influences and incentivizes behavior. All parties across the development-to-operations lifecycle need to understand their stake in the larger business process of which they are a part. The success of both individuals and groups needs to be measured within the context of the success of the entire development-to-operations lifecycle. For many organizations this is a shift from more of a siloed approach to performance measurement, where each group measures and judges performance based on what matters to that specific group. This previous post I wrote dives deeper into the process for getting the correct end-to-end view of measurement into place.

2. Unified processes – The important theme of DevOps is that the entire development-to-operations lifecycle must be viewed as one end-to-end process. Individual methodologies can be followed for individual segments of that process (such as Agile on one end and Visible Ops on the other), so long as those processes can be plugged together to form a unified process (and, in turn, be managed from that unified point-of-view). Much like the question of measurement and incentives, each organization will have slightly different requirements for achieving that unified process. Here is an excellent post by Six Sigma Black Belt Ray Riescher on his experience bridging Scrum and ITIL.

3. Unified tooling –  This is the area in which most of the DevOps discussion has been focused. This isn’t surprising since it seems to be the natural reflex of technologists, for better or for worse, to jump straight into tooling discussions when looking to solve a problem. If you follow the communities of tools like Puppet, Chef, or ControlTier then you are probably already aware of the significant focus on bridging development and operations tooling. “Infrastructure as code”, “model driven automation”, and “continuous deployment” are all concepts that would fall under the DevOps banner. Alex Honor wrote a good post about some of the design patterns that toolsmiths working on DevOps tools need to worry about.

Jake Sorofman does a great job with the following overview of what types of tooling are required to make DevOps a reality:

A version-controlled software library—which ensures all system artifacts are well defined, consistently shared, and up to date across the release lifecycle. Development and QA organizations draw from the same platform version, and production groups deploy the exact same version that has been certified by QA.

Deeply modeled systems—where a versioned system manifest describes all of the components, policies and dependencies related to a software system, making it simple to reproduce a system on demand or to introduce change without conflicts.

Automation of manual tasks—taking the manual effort out of processes like dependency discovery and resolution, system construction, provisioning, update and rollback. Automation—not hordes of people—becomes the basis for command and control of high-velocity, conflict-free and massive-scale system administration.

It’s essential that all individual tools be considered part of a larger toolchain that spans the entire Development to Operations lifecycle (even if tight technical integration isn’t an option). Tool choice and implementation decisions (on both the toolchain and individual tool levels) need to be made in the context of their impact on that end-to-end lifecycle. If you are wondering how that is done, take a look at this example of an open source fully automated provisioning toolchain that can be plugged into a larger Development to Operations toolchain.
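To make ideas like a “versioned system manifest” and “automation of manual tasks” slightly more concrete, here is a minimal, hedged Python sketch. It is not the API of Puppet, Chef, ControlTier, or any other real tool; the manifest fields and steps are hypothetical placeholders for what one stage of a toolchain might consume:

```python
# Hypothetical sketch of a versioned system manifest driving an automated deployment step.
# The fields and actions are illustrative only; real tools model far more than this.
import json

MANIFEST_JSON = """
{
  "version": "1.4.2",
  "artifacts": [{"name": "webapp", "sha256": "<checksum>", "source": "https://repo.example.com/webapp-1.4.2.tar.gz"}],
  "dependencies": ["java-11", "nginx"],
  "config_template": "templates/app.conf.tmpl"
}
"""

def deploy(manifest: dict, environment: str) -> None:
    """Apply the same manifest to any environment; only environment-specific data differs."""
    for dependency in manifest["dependencies"]:
        print(f"ensure dependency installed: {dependency}")  # placeholder for a real install step
    for artifact in manifest["artifacts"]:
        print(f"fetch and verify {artifact['name']} {manifest['version']} from {artifact['source']}")
    print(f"render {manifest['config_template']} for environment: {environment}")
    print(f"record deployed version {manifest['version']} for rollback")

if __name__ == "__main__":
    deploy(json.loads(MANIFEST_JSON), "production")
```

Because the manifest is versioned alongside the code, QA and production work from the same description of the system, and rolling back becomes a matter of re-applying an earlier version rather than hand-reconstructing a previous state.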

 

What DevOps is not!

At the recent OpsCamp Austin, Adam Jacob from OpsCode/Chef railed against the idea that some system administrators were now seeking to change their job title to “DevOps”. I have to admit that, at the time, I was a bit skeptical that this was actually happening. However, I have since witnessed people on multiple occasions expressing this desire to rewrite job titles or establish DevOps as some sort of new role to be filled.

For example, Stephen Nelson-Smith wrote an excellent post about DevOps. While I agree with almost everything he said, I have to strongly disagree with the idea that DevOps should be a unique position or job title.

Turning “DevOps” into a new job title or special role sets a dangerous precedent. This makes DevOps someone else’s problem. You’re a DBA? Don’t worry about DevOps, that’s the DevOps team’s problem. You’re a security expert? Don’t worry about DevOps, that’s the DevOps team’s problem.

Think of it this way. You wouldn’t say “I need to hire an Agile” or “I need to hire a Scrum” or “I need to hire an ITIL”, would you? No, you would just say you need to hire developers, project managers, testers, or systems administrators who understand these concepts and methodologies. DevOps is no different.

 

Why the name “DevOps”?

Probably because it’s catchy. It’s also a good mental image of the concept at the widest scale — when you bring Dev and Ops together you get DevOps. There have been other terms for this idea, such as Agile Operations, Agile Infrastructure, and Dev2Ops (a term we’ve been using on this blog since 2007). There are also plenty of examples of people arriving at the idea of DevOps on their own, without calling it “DevOps”. For an excellent example of this, read this recent post by Ernest Mueller or watch John Allspaw and Paul Hammond’s seminal presentation “10+ Deploys Per Day: Dev and Ops Cooperation at Flickr” from Velocity 2009.

For better or for worse, DevOps seems to be the name that is catching people’s imaginations. I credit the efforts of Patrick Debois for championing the term “DevOps”, bringing the first DevOps Days conference to a (successful) reality, and maintaining the devops.info site.

Be sure to join in the DevOps conversation at the upcoming DevOps Day USA conference on June 25, 2010 in Mountain View, CA. It’s the day after O’Reilly’s Velocity 2010 conference, so be sure to hit both!

 

Q&A: Lee Thompson, former Chief Technologist of E*TRADE Financial


Damon Edwards

I recently caught up with Lee Thompson to discuss a variety of Dev to Ops topics including deployment, testing, and the conflict between dev and ops.

 

Lee recently left E*TRADE Financial where he was VP & Chief Technologist. Lee’s 13 years at E*TRADE saw two major boom and bust cycles and dramatic expansion in E*TRADE’s business.

 

 

Damon:
You’ve had large scale ops and dev roles… what lessons have you learned that required the perspective of both roles?

Lee:
I was heavy into ops during the equity bubble of 1998 to 2000 and during that time we scaled E*TRADE from 40,000 trades a day to 350,000 trades a day. After going through that experience, all my software designs changed. Operability and deployability became large concerns for any kind of infrastructure I was designing. Your development and your architecture staff have to hand the application off to your production administrators so the architects can get some sleep. You don’t want your developers involved in running the site. You want them building the next business function for the company. The only way that is going to happen is to have the non-functional requirements (deployability, scalability, operability) already built into your design. So that experience taught me the importance of non-functional requirements in the design process.

 

Damon:
You use the phrase “wall of confusion” a lot… can you explain the nature of the dev and ops conflict?

Lee:
When dealing with a large distributed compute infrastructure you are going to have applications that are difficult to run. The operations and systems engineering staff who is trying to keep the business functions running is going to say “oh these developers don’t get it”. And then back over on the developer side they are going to say “oh these ops guys don’t get it”. It’s just a totally different mindset. The developers are all about changing things very quickly and the ops team is all about stability and reliability. One company, but two very different mindsets. Both want the company to succeed, but they just see different sides of the same coin. Change and stability are both essential to a company’s success.

 

Damon:
How can we break down the wall of confusion and resolve the dev and ops conflicts?

Lee:
The first step is being clear about non-functional requirements and avoiding what I call “peek-a-boo” requirements.

Here’s a common “peek-a-boo” scenario:
Development (glowing with pride in the business functions they’ve produced): “Here’s the application”
Operations: “Hey, this doesn’t work”
Development: “What do you mean? It works just fine”
Operations: “Well it doesn’t run with our deployment stack”
Development: “What deployment stack?”
Operations: “The stuff we use to push all of the production infrastructure”

The non-functional requirements become late cycle “peek-a-boo” requirements when they aren’t addressed early in development. Late cycle requirements violate continuous integration and agile development principles. The production tooling and requirements have to be accounted for in the development environment, but most enterprises don’t do that. Since the deployment requirements aren’t covered in dev, what ends up happening is that the operations staff receiving the application has to do innovation through necessity, and they end up writing a number of tools that over time become bigger and bigger and more of a problem, which Alex described last year in the Stone Axe post. Deployability is a business requirement and it needs to be accounted for in the development environment just like any other business requirement.

 

Damon:
Deployment seems to be a topic of discussion that is increasing in popularity… why is that?

Lee:
Deployability and the other non-functional requirements have always been there, they were just often overlooked. You just made do. But a few things have happened.

1. Complexity and commodity computing, both in the hardware and the software, have meant that your deployment is getting to the point where automation is mandatory. When I started at E*TRADE there were 3 Solaris boxes. When I left, the number of servers was orders and orders of magnitude larger (the actual number is proprietary). Since the operations staff can’t afford to log into every box, they end up writing tools because the developers didn’t give them any.

2. Composite applications, where applications are composed of other applications, mean that every application has to be deployed and scaled independently.  Versioning further complicates matters. Before the Internet, in the PC and mainframe industries, you were used to delivering and maintaining multiple versions of software simultaneously. In the early days of the Internet, a lot of that went away and you only had two versions — the one deployed and the one about to be deployed. Now with the componentization of services and the mixing and matching of components you’ll find that typically you have several versions of a piece of infrastructure facing different business functions. So you might be running three or four independently deployed and managed versions of the same application component within your firewall.  

3. Cloud computing takes both complexity and the need for deployability up another notch. Now you are looking to remote a portion of your infrastructure into the Internet. Almost everyone who is starting up a company right now is not talking about building a datacenter, they are all talking about pushing to the cloud. So deployability is very much at the forefront of their thinking about how to deliver their business functions.  And the cloud story is only beginning. For example, what happens when you get a new generation of requirements like the ability to automate failover between cloud vendors?

 

Damon:
Testing is one of those things that everyone knows is good, but seems to rarely get adequately funded or properly executed. Why is that?

Lee:
Well, like many things it’s often simply a poor understanding of what goes into doing it right and an oversimplification of what the business value really is.

Just like deployment infrastructure, proper testing infrastructure is a distributed application in and of itself. You have to coordinate the provisioning of a mocked-up environment that mimics your production conditions and then boot up a distributed application that actually runs the tests. The level of thought and effort that has to go into properly doing that can’t be overlooked. Well, not if you are serious about delivering on quality of service.

While integration testing should be a very important piece of your infrastructure, the importance of antagonistic testing also can’t be overlooked. For example, the CEO is going to want to know what happens when you double the load on your business. The only way to really know that is to have a good facsimile of your business application mocked up and those exact scenarios tested. That is a large scale application in and of itself and takes investment.
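As a rough sketch of what that “double the load” question looks like in practice, here is a hedged Python example that replays the same scenario at baseline and at doubled concurrency and compares latency. The target function is a stand-in; a real antagonistic test would drive a production-like, mocked-up environment:

```python
# Hedged illustration of antagonistic (load) testing: run the same scenario at 1x and 2x load.
# place_trade() is a stand-in for a real transaction (e.g., an HTTP call into a test environment);
# unlike a real system, this stand-in will not degrade under load.
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median

def place_trade(order_id: int) -> float:
    """Simulate one business transaction and return its latency in seconds."""
    start = time.perf_counter()
    time.sleep(0.01)  # simulated service time
    return time.perf_counter() - start

def run_scenario(concurrency: int, total_requests: int) -> float:
    """Drive total_requests transactions at the given concurrency; return the median latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(place_trade, range(total_requests)))
    return median(latencies)

if __name__ == "__main__":
    baseline = run_scenario(concurrency=10, total_requests=200)
    doubled = run_scenario(concurrency=20, total_requests=400)
    print(f"median latency at 1x load: {baseline:.4f}s, at 2x load: {doubled:.4f}s")
```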

Beyond service quality, there is business value in proper testing infrastructure that is often overlooked. When you start to build up a large quantity of high-fidelity tests, those tests actually represent knowledge about your business. Knowledge is always an asset to your business. It’s pretty clear that businesses that know a lot about themselves tend to do well and those that lack that knowledge tend not to be very durable.

 

Damon:
The culture of any large company is going to be restrictive. Large financial institutions are, by their very nature, going to be more restrictive. Coming out of that culture, what are you most excited to do?

Lee:
Punditry, blogging, and using social media, to start! You really can’t do that from behind the firewall in the FI world. There are legitimate reasons for the restrictions, and I understand that. Because you have to contend with a lot of regulatory concerns, you just aren’t going to see a lot of financial institution technologists ranting online about what is going on behind the firewall. So I’m excited about becoming a producer of social media content rather than just a consumer.

I also find consulting exciting. It’s been fun getting out there and seeing a variety of companies and how similar their problems are to each other and to what I worked on at E*TRADE. It reminds me how advanced E*TRADE really is and what we had to contend with. I enjoy applying the lessons I’ve learned over my career to helping other companies avoid pitfalls and helping them position their IT organization for success.

DevOps is not a technology problem. DevOps is a business problem.


Damon Edwards

Update: follow-up post about the competitive business advantage that DevOps enables.

 


Since Patrick Debois called for the first DevOps Days event and unleashed the term “DevOps” upon the world, there is no denying that DevOps has evolved into a global movement.

Of course, DevOps has its detractors. Negative opinions range from the misguided (“DevOps is a new name for a Sys Admin”) to the dismissive (“DevOps is just some crazy Devs trying to get rid of Ops” or “DevOps is just some crazy Ops trying to act like Devs so they will be better liked”) to the outright offended (whose arguments tend to defy logic).

I’ve spent the past nine months or so overcoming resistance to the DevOps movement in both public forums and inside client companies. During that time, I’ve begun to notice a common misconception that I think is fueling much of the negative initial reaction that some people have to DevOps ideas. I want to take a shot at clearing it up now:

DevOps is not a technology problem.

Technology plays a key part in enabling solutions to DevOps problems. However, DevOps itself is fundamentally a business problem.

What does the business have to do with DevOps?

The most fundamental business process in any company is getting an idea from inception to where it is making you money.

Within that business process there are all kinds of activities that need to happen, some technology-driven and some human-driven. This is where all of the different functions of IT come into play. Developers, QA, Architecture, Release Engineering, Security, Operations, etc. each do their part to fulfill that process.

But if you take away the context of the business process, what have you got? You’ve got a bunch of people and groups doing their own thing. You lose any real incentive to fight inefficiency, duplication of effort, conflicts, and disconnects between those groups. It’s every person for themselves, literally.

But you know what else happens if you remove the context of the business process? Your job eventually goes away. Enabling the business is why we get paychecks and why we get to spend our time doing what we do.

If there is no business to enable or we don’t do any business enabling, this all just turns into a hobby. And by definition, it’s pretty difficult to get paid for a hobby.

The whole point of DevOps is to enable your business to react to market forces as quickly, efficiently, and reliably as possible. Without the business, there is no other reason for us to be talking about DevOps problems, much less spending any time solving them.

 

 

Doesn’t this sound a lot like the goals of Agile?

If the goals of DevOps sound similar to the goals of Agile, it’s because they are. But Agile and DevOps are different things. You can be great at Agile Development but still have plenty of DevOps issues. On the flip side of that coin, you could do a great job removing many DevOps issues and not use Agile Development methodologies at all (although that is increasingly unlikely).

I like to describe Agile and DevOps as related ideas that share a common Lean ancestry but work on different planes. While Agile dives deep into improving one major IT function (delivering software), DevOps works on improving the interaction and flow across IT functions (stretching the length of the entire development-to-operations lifecycle).

 

But I thought DevOps was all about cool tools?

Technology is the great enabler for making almost any business process more efficient, scalable, and reliable. However, we have to remember that on their own, tools are just tools.

It’s just as likely that you’ll use a new tool to reinforce bad habits and old broken processes as you will to improve your organization. It’s the desired effect on the business process you are supporting that dictates why and how a tool is best used.

When people are clear on what their DevOps problems are and what process improvements need to happen to remedy those DevOps problems, the tool discussion becomes rather straightforward (if not trivial).

Since the nascent DevOps movement is mostly made up of technologists, it’s easy to understand why there is such excitement to jump right into tooling discussions. But perhaps we need to do more to make sure that everyone is up to speed on why the tools are needed and what the desired business process improvements are before diving into our standard “Puppet vs. Chef” or “Files Centric vs. Package Centric” arguments.

 

If DevOps is about a business process then why is it called “DevOps”?

In my opinion, one fault of the early DevOps conversations is that it wasn’t immediately clear just how big the scope of this problem truly is. Now that we have a year of perspective behind us, it turns out that we have been attacking one of the biggest problems in all of business: How to enable a business to react to market forces as quickly as possible.

But alas, the conversation had to start somewhere, so it gravitated to the almost universal problem of conflict and disconnect between Developer culture and Operations culture. Every org chart is different, but it’s fairly easy to cartoonishly divide the world into a Dev camp and an Ops camp for the sake of having common reference points to discuss (even though we all know the world is much more complex and grey).

Within that cartoon Dev/Ops example, most of the early DevOps attention has been about improving deployment activity. Since change activity makes up the bulk of the work across an IT organization, that too was a logical and natural place to start.

Maybe Patrick should have called that first event “BizDevQASecurityOpsCloudUsers Days” or “SolvingABroaderProblemThanAgile Days”… But I doubt anyone would have shown up.

Automated Infrastructure enables Agile Operations


Alex Honor

“Agile” has been applied to such unanticipated domains as enterprises, startups, investing, etc. Agile encompasses several generic common sense principles (e.g., simple design, sustainable pace, many incremental changes, action over bureaucracy), so the desire to bestow its virtues on all kinds of endeavors is understandable.

But why contemplate the idea of Agile Operations? Why would Agile Operations even make sense?

Let’s start by playing devil’s advocate. Some of the Agile principles appear to contradict well-established and accepted systems administration goals, namely stability and availability. Traditional culture in operations leans towards risk-aversion and stasis in an attempt to assure maximum service levels. Many operations groups play a centralized role serving multiple business lines and have evolved to follow a top-down, command-and-control style management structure that wants to limit access to change. From their point of view, change is the enemy of stability and availability. With stability and availability being the primary goals of operations, it’s easy to see where the skepticism towards Agile Operations comes from.


The calls for Agile Operations have initially been driven by product development groups that employ Agile practices. These groups churn out frequent, small improvements to their software systems on a daily basis. The difference in change management philosophy has been the cause of a growing clash between development and operations. The clash intensifies when the business wants to drive these rapid product development iterations all the way through to production (even 10+ times a day).


So, if operations is to avoid being a bottleneck to this Agile empowered flow of product changes, how can they do it in a way that won’t create unmanageable chaos?

To apply Agile to the world of operations, one must first see all infrastructure as programmable. Rather than see infrastructure as islands of equipment that were set up by reading a manual and typing commands into a terminal, one sees infrastructure as a set of components that are bootstrapped and maintained through programs. In other words, infrastructure is managed by executing code, not by directly applying changes manually at the keyboard.


Replacing manual tasks with executable code is the crucial enabler to sharing a common set of change management principles between development and operations. This alignment is truly the key first step in allying development and operations to support the business’ time to market needs. This shared change management model also facilitates a few additional beneficial practices.

  • Shared code bases: Store and control application and infrastructure code in the same place so both dev and ops staff have clear visibility into everything needed to create a running service.
  • Collaborative configuration management: Application and infrastructure configuration management code can be jointly developed early in the development cycle and tested in development integration environments. Code and configuration become the currency between dev and ops.
  • Skill transfer: App and ops engineers can transfer knowledge about the inner workings of the runtime application system and develop skills around tooling to maintain them.
  • Reproducibility: Reproducing a running application from source and a build specification is vital to managing a business at scale. (http://www.itpi.org/home/visibleops.php)
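As a minimal sketch of the principle that infrastructure is maintained by executing code rather than by typing at a keyboard, here is a hedged Python example: desired state is declared as data, and an idempotent apply step converges the machine toward it. It illustrates the idea only and is not the model used by Puppet, Chef, or any specific tool; the paths and resources are hypothetical:

```python
# Hypothetical "infrastructure as code" sketch: declare desired state as data,
# then converge toward it idempotently. Running it twice changes nothing the second time.
import os

DESIRED_STATE = [
    {"type": "directory", "path": "demo-app"},
    {"type": "file", "path": os.path.join("demo-app", "app.conf"), "content": "port=8080\n"},
]

def apply(resource: dict) -> None:
    """Create or correct a single resource only if it differs from the desired state."""
    if resource["type"] == "directory":
        os.makedirs(resource["path"], exist_ok=True)
    elif resource["type"] == "file":
        current = None
        if os.path.exists(resource["path"]):
            with open(resource["path"]) as handle:
                current = handle.read()
        if current != resource["content"]:
            with open(resource["path"], "w") as handle:
                handle.write(resource["content"])

if __name__ == "__main__":
    for resource in DESIRED_STATE:
        apply(resource)  # safe to run repeatedly; the same code bootstraps and maintains the system
```

Because the same code is stored, reviewed, and executed by both dev and ops, it doubles as the shared code base and reproducibility mechanism described in the list above.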

While some may argue that “Agile” in its entirety does not completely apply to the world of operations, an automated infrastructure built on principles like code sharing as a form of collaboration between dev and ops is a sound basis for enabling business agility.