DevOps Archives

Archive for the ‘DevOps’ Category

It’s the 5th Anniversary of DevOps

Damon Edwards / October 21st, 2014

I’ve been proud to have played a part in the rise of the global phenomenon known as DevOps Days. If you aren’t aware of the history of the DevOps movement, it traces its roots (and name) directly back to the first DevOps Days event, organized by Patrick Debois, in Ghent, Belgium.

For its 5th anniversary, DevOps Days is returning to Ghent. John Willis recorded these short interviews with some of the original attendees to commemorate the upcoming milestone event.

Common Objections to DevOps from Enterprise Operations

Alex Honor / June 25th, 2014

I’ve been in many large enterprise companies, helping them learn about devops and understand how to improve their service delivery capability. These companies have heard about devops and are looking for help creating a strategy to adopt devops principles because they need better time to market and higher quality. Not everyone in the company believes in devops, and the reasons differ. To some, devops sounds like a free-for-all where devs make production changes. To others, devops sounds like a bunch of nice-sounding high ideals, or like something that can’t be adopted because the necessary automation tooling does not exist for their domain.

In the enterprise, the operations group is often centralized and supports many different application groups. When it comes to site availability, the buck stops with ops. If there is a performance problem, outage or issue, the ops team is the first line of defense, sometimes escalating issues back to the application team for bug fixes or for help diagnosing a problem.

Enterprises interested in devops are also usually practicing or adopting agile methodology, in which case demands on ops happen more often: during sprints (e.g., to set up a test environment) or after a sprint when ops needs to release software to the production site. The quickened pace puts a lot more pressure on the centralized ops team because they often get the work late in the project cycle (i.e., when it’s time to release to production). Because of time pressure or because they are overworked, operations teams have difficulty turning requested work around and begin to hear that developers want to do things for themselves. Those developers might want to rebuild servers, get shell access, install software, run commands and scripts, provision VMs, modify network ACLs, update load balancers, etc. They essentially want to serve themselves and might feel like the centralized ops team needs to get out of their way.

How does the ops team, historically the one responsible for uptime in the production environment, permit or expand access to environments they support? How can they avoid being the bottleneck at the tail end of every application team’s project cycle? How does the business remove the friction but not invite chaos, outages and lack of compliance?

If you’re in this kind of enterprise environment, how do you start approaching devops? If you are a centralized operations team facing the pressure to adopt devops, here are some questions and concerns for the organization to ask or think about. The answers to these questions are important steps to forming your devops strategy.

How does a centralized group handle the work that needs to be done to make applications run in production or across other environments?

Some enterprises begin by creating a specialized team called “devops” whose purpose is to solve “devops problems”. Generally, this means making things more operations friendly. This kind of team might also be the group that takes the handoff from application development teams, wraps their software in automation tooling, deploys it, and hands it off to the Site Reliability team. Unfortunately, a centralized devops team can become a silo and suffer from the same “late in the cycle” handoff challenges the traditional ops group sees. Also, there are always more developers and development projects than there can be devops engineers and devops team bandwidth. A centralized devops team can end up facing the same pressures as a traditional QA department does when it tries “adding quality testing” as a separate process stage.

To make sure an application operates well in production and across other environments, the devops concerns must be baked into the application architecture. This means the work to make applications easy to configure, deploy and monitor is done inside the development stage. The centralized operations group must then learn to develop a shared software delivery process and tool chain. It’s inside the delivery tool chain where the work gets distributed across teams. The centralized ops group can support the tool chain like architects and service providers, giving the application development teams a framework and scaffolding in which to populate the artifacts needed to drive their pipeline.

What about our compliance policies?

Most enterprises abide by a change policy that dictates who can make production changes. Many times this policy is interpreted to mean anybody outside of ops is not allowed to push changes. Software must be handed off to an ops person to push the change. This handoff can introduce extra lead time and possibly errors due to lack of information.

These compliance rules are defined by the business, and many times people on the delivery end have never actually read the language of these policies and base their process on assumptions or beliefs formed by tribal knowledge. Over time, tools and processes can morph in arcane ways, twisting into inefficient bureaucracy.

It’s common to find that different compliance rules apply depending on the application or customer type. When thinking about how to reduce delivery cycle time, these differences should be taken into account because there might be alternative ways of deciding who can make a change and how it can be made.

Besides understanding the compliance rules, it should also be simple and fast to audit your compliance.

This means making it easy to find out:

who made the change and were they authorized
where the change was applied
what change was made and is it acceptable

This kind of query should be instantly accessible and not something done through manual evidence gathering long after the fact (e.g., when something went wrong). Knowing how change was made to an environment should be as visible as seeing a report that shows how busy your servers were in the last 24 hours.
These audit views should contain infrastructure and artifact information because both development and operations people want to know about their environments in software and server terms. A change ticket with a bunch of verbiage and bug links does not paint a complete enough picture.
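To make the three audit questions concrete, here is a minimal sketch of what an instantly queryable change record could look like. The `ChangeRecord` fields and the `recent_changes` and `unauthorized` helpers are hypothetical names chosen for illustration, not part of any particular tool; a real setup would read these records from whatever your automation framework logs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeRecord:
    """One change event, recorded in both software and server terms."""
    who: str          # who made the change
    authorized: bool  # were they authorized
    where: str        # environment (or host) the change was applied to
    artifact: str     # what changed, e.g. a package name and version
    action: str       # how it changed, e.g. "deploy" or "hotfix"
    when: datetime


def recent_changes(records, environment, since):
    """Return changes applied to `environment` at or after `since`, newest first."""
    matches = [r for r in records if r.where == environment and r.when >= since]
    return sorted(matches, key=lambda r: r.when, reverse=True)


def unauthorized(records):
    """Return only the changes made by someone who was not authorized."""
    return [r for r in records if not r.authorized]


if __name__ == "__main__":
    # Illustrative data only.
    now = datetime(2014, 6, 25, 12, 0)
    log = [
        ChangeRecord("alice", True, "prod", "webapp-1.2.0", "deploy", now - timedelta(hours=1)),
        ChangeRecord("bob", False, "prod", "webapp-1.2.1", "hotfix", now - timedelta(hours=2)),
    ]
    for record in recent_changes(log, "prod", now - timedelta(hours=24)):
        print(record.when, record.who, record.action, record.artifact)
```

The point of the sketch is that "who, where, what" becomes a filter over structured records rather than a manual evidence-gathering exercise.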

How do you open access but not lose controls?

After walking through a software delivery process, it’s easy to see that the flow of work slows anytime the work must be done by a single team that is already past its capacity and is losing effectiveness due to context switching between competing priorities. This is the situation an ops team often finds itself in. Ops teams balance work that comes from application development teams (e.g., participating in agile dev sprints), network operations (e.g., handling outages and production issues), business users (e.g., gathering info for compliance, asset info for finance) and finally, their own project work to maintain or improve infrastructure.

To free this process bottleneck, the organization must figure out how the work can be redistributed or satisfied by some self service function. Since deployment, configuration and monitoring are ops concerns that should be designed into the application, distribute this development to the developers. This can really be a collaboration where ops maintains a base set of automation modules and gives developers ways to extend it. Create a development environment and tooling that lets developers integrate their changes into this ops framework in their own project sandboxes.
Provide developer access to create hosted environments easily through a self service interface that spins up the VMs or containers and lets them test the ops management code.

Build the compliance auditing logs into the ops management framework so you can track what resources are being created and used. This is important if resource conflicts occur, and it lets you learn where more sandboxing is needed or where more fine grained configuration should be defined.

Moving faster leads to less quality, right?

To the business, moving fast is critical to staying competitive by increasing their velocity of innovation. This need to quicken the software delivery pace is almost always the chief motivation to adopt devops practices.

Devops success stories often begin with how many times deployments are done a day. Ten deploys a day, 1000 deploys a day. To an enterprise these metrics can sound mythical. Some enterprises struggle to make one deploy a month, and I have seen some enterprises making major releases on an annual basis, with the rollout of the release to their customers taking over 30 days. That’s thirty days of lag time, and it puts the production environment in an inconsistent state, making it hard for everyone to cope with production issues. “Is it the new version or the old version causing this yet unidentified issue?” A primary reason operations is reluctant to move faster is the problems that occur during or after a change has been made.

When change leads to problems these are typical outcomes:

More control process is added (more approval gates, shorter change windows)
Change batches get bigger (cram more work into the given change window)
Increase in “emergency fixes” (high priority features get fast tracked to avoid the normal change process)
High pressure to make application changes quickly results in systems being patched directly rather than changed through the normal software release cycle.

Given these outcomes the idea of moving faster is crazy because obviously it will lead to breaking more stuff more often.

The question is how do organizations learn to be good at making change to their systems? Firstly, it is helpful to think about what kind of safety practices are important to move change. Moving fast means being able to safely change things fast. Here are some general strategies to consider:

Small batches

Large batches of change require more people on hand due to the volume of work and the work can take longer to get done.
The solution is to push less change through so it’s easier to get it done and have less to check and verify when the change is completed.

Rehearsal

Here’s a good mantra, “Don’t practice until you get it right. Practice until you can’t get it wrong.” Don’t make the production change be the first time you have tried it this way. Your change should have been verified multiple times in non production environments before you tried it in production. Don’t rely on luck. Expect failure.

Verifiable process stages

Whether it is a site build out or an update to an existing application, be sure you have well defined checks for your preconditions. This means if you are deploying an application you have a scripted test that confirms your external or environment dependencies before you do the deployment. If you are building a site, be sure you have confirmed the hardware and network environment before you install the operating platform. Building this kind of automated testing at process stage boundaries adds a great deal of safety by not letting problems slip down stream. You can use these verification checks to decide to “stop the line”.
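As a rough sketch of a verifiable stage boundary, the snippet below runs a list of named precondition checks and refuses to deploy if any of them fail. The function names (`port_reachable`, `run_preconditions`, `deploy_if_ready`) and the example host and port are assumptions made up for illustration; substitute whatever checks matter for your own dependencies.

```python
import socket


def port_reachable(host, port, timeout=2.0):
    """Example check: can we open a TCP connection to an external dependency?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_preconditions(checks):
    """Run each (name, callable) check and return the names of the ones that failed.

    A check that raises an exception is treated as a failure.
    """
    failures = []
    for name, check in checks:
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            failures.append(name)
    return failures


def deploy_if_ready(checks, deploy):
    """Call `deploy` only if every precondition passes; otherwise stop the line."""
    failures = run_preconditions(checks)
    if failures:
        return False, failures
    deploy()
    return True, []


if __name__ == "__main__":
    # Hypothetical dependency; replace with your own hosts and checks.
    checks = [("database reachable", lambda: port_reachable("db.example.internal", 5432))]
    ok, failed = deploy_if_ready(checks, lambda: print("deploying..."))
    if not ok:
        print("Stopping the line. Failed preconditions:", ", ".join(failed))
```

The same pattern applies at any stage boundary: the stage only starts once its inputs have been verified, so problems surface where they originate instead of downstream.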

Process discipline

What leads to places full of snow flake environments, each full of idiosyncratic, specially customized servers and networks? Lack of discipline. If the organization does not manage change consistently together, everyone ends up doing things their own way. How do you know you have process discipline? Look for how much variation you see. If process differs between environments, that is a variation. Snow flake servers are the symptoms of process variation. Process variation means you don’t have process under control. There are two simple metrics to understand how much control you have over your process: lead time and scrap rate. Lead time is how long it takes you to make the change. Scrap rate is how often the change must be reworked to make it right. Rehearsal and verifiable process stages will help you bring process under control by reducing scrap rate and stabilizing lead time. The biggest benefit of process discipline is improving your ability to deliver change predictably. The business depends on predictability. With predictability the business can gauge how fast or slow it can move.
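Both metrics are simple to compute once each change is recorded with a request time, a completion time, and whether it had to be reworked. The sketch below assumes that record shape (the `Change` class and its field names are hypothetical) and reports the spread of lead time alongside the mean, since a stable lead time is the sign of a process under control.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, pstdev


@dataclass(frozen=True)
class Change:
    requested: datetime  # when the change was asked for
    completed: datetime  # when the change was done
    reworked: bool       # did it have to be redone to make it right?


def lead_times_hours(changes):
    """Lead time of each change, in hours."""
    return [(c.completed - c.requested).total_seconds() / 3600 for c in changes]


def scrap_rate(changes):
    """Fraction of changes that had to be reworked (0.0 when there are none)."""
    if not changes:
        return 0.0
    return sum(1 for c in changes if c.reworked) / len(changes)


def lead_time_summary(changes):
    """Mean and population standard deviation of lead time, in hours."""
    hours = lead_times_hours(changes)
    if not hours:
        return {"mean_hours": None, "stdev_hours": None}
    return {"mean_hours": mean(hours), "stdev_hours": pstdev(hours)}
```

A falling scrap rate and a shrinking standard deviation are the signals to watch for; the mean alone can hide a lot of variation.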

More access into ops managed environments?

The better everyone understands how things perform in production, the better the organization can design its systems to support operations. Making it hard for developers or testers to see how the service is running only delays improvements that benefit the customer and reduce pressure on operations. It should be easy for anyone to know which versions of applications are deployed on which hosts, the host configuration and the performance of the application.

Sometimes data privacy rules make accessing data less straightforward. Some logs contain customer data and regulations might restrict access to only limited users. Instead of saying no or making the data collection and scrubbing process manual, make this data available as an automated self service so developers or auditors can get it for themselves.
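An automated scrubbing step can be as small as a filter that redacts sensitive values before logs are handed out. The sketch below is only an illustration: the two regular expressions (email addresses and 13 to 16 digit card-like numbers) are assumed examples of customer data, and real privacy rules will dictate what actually has to be removed.

```python
import re

# Illustrative patterns only; your regulations define what counts as sensitive.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def scrub_line(line):
    """Replace email addresses and card-like numbers with placeholders."""
    line = EMAIL.sub("[email]", line)
    return CARD.sub("[card]", line)


def scrub_log(lines):
    """Scrub every line of a log before it is made available for self service."""
    return [scrub_line(line) for line in lines]
```

Putting a filter like this behind a self service download means developers and auditors get the data they need without anyone collecting and cleaning it by hand.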

Visibility into the production environment is crucial for developers to make their environments production-like. Modeling the development and test environment so that it resembles production is another example of reducing variability and bringing process under control.

Does this mean shell access for devs?

This question is sometimes the worst one for a traditional enterprise ops team. Oftentimes the question is a symptom of another problem. Why does a dev want shell access to an environment operations is supporting? In a development or early test environment, shell access might be needed to experiment with developing deployment and configuration code. This is a valid reason for shell access.

Is this request for shell access in a staging or production environment? Requests for shell access could be a sign of ad hoc change methods and undermine the stability of an environment. It’s important that change methods are encapsulated in the automation.

Fundamentally, shell access to live operational environments is a question about risk and trust.

The list doesn’t stop here, but these are the most common questions and concerns I hear. Feel free to share your experiences in the comments below.

How to initiate a DevOps Transformation (Video)

Damon Edwards / December 8th, 2013

Here is the full 30-minute video from the keynote I did at DevOps Days Mountain View 2013.

This talk addresses the single most common question I get asked:

“DevOps sounds great… but how do I go about introducing DevOps to my company?”

Which is usually followed by one or more of the following frustrated statements:

“My managers don’t get it”
“The Dev group won’t talk to me”
“The Ops group won’t talk to me”
“QA says I’m dangerous”
“I don’t know where to start”
“People say they are too busy getting real work done”
“Help! My boss told me to buy DevOps by next quarter or else”
“Everyone just argues about tools”

In this talk I give a condensed walk through of a 3 step process that we’ve found to work (who doesn’t love a 3 step process, right?):

1. Build the “why?” (the business case)
2. Build organizational alignment (the trickiest part… but there is another 5 step “workshop” process just for this!)
3. Continuous improvement loops (think: PDCA or Deming/Shewhart Cycles)

The process incorporates everything you would expect from a DevOps transformation (Lean and Systems Thinking, Value Stream Mapping, Waste Analysis, The 3 Ways, Silo busting, etc.) but it does so in a practical and approachable manner. You can even avoid using the word “DevOps” if it’s too politically charged in your organization.

This forms the core of what we do at DTO Solutions with our DevOps Workshops (or “Service Delivery Workshops” for a non DevOps name). Through that work we’ve been fortunate enough to see this process in action at many different sizes and types of companies. But that being said, I’m always interested in more feedback and new ideas!

http://vimeo.com/69079272

How Adobe turned operations into a service and built a service delivery platform in the cloud

Damon Edwards / October 2nd, 2013

This video is from Velocity Santa Clara in June 2013. Adobe’s Srinivas Peri and SimplifyOps’ Alex Honor discuss how a packaged software tools group turned itself into an internal provider of operations services. Featured in the presentation is CDOT, the platform they built out of open source tools like Rundeck, Chef, Jenkins, and Zabbix (and non-open source technologies like AWS, Splunk, and PagerDuty). But beyond the tools, this is an interesting story of learning how to shift a group’s mindset and figuring out what your “customers” want along the way. You can view the slides from this presentation here.

John Willis Notes Notable DevOps Culture Traits

Damon Edwards / September 10th, 2013

This is a great presentation by John Willis at the SVDevOps Meetup back in April. John discusses the various interesting trends and traits he is seeing in the industry. From Deming to CAMS to GitHub to Etsy… John, as he always does, paints an interesting picture of the roots of DevOps and successful DevOps cultures.

(Video: 59:02)

IT stability and business innovation are not enemies

Damon Edwards / February 26th, 2013

Back before the hectic end of the year I was interviewed by HP’s Discover Performance newsletter and online magazine. The questions were about how applying DevOps thinking inside enterprises can increase the pace of innovation without increasing risk.

Below is the interview in full. If you like this interview, I recommend signing up for the Discover Performance newsletter. They routinely have good articles on interesting and relevant topics and avoid injecting too much self-serving HP bias (a difficult task for enterprise funded content!).

************************************

IT stability and business innovation are not enemies

DevOps expert Damon Edwards discusses why Ops should neither resist innovation nor be a scapegoat when things go wrong.

Innovation is a mantra in business, one that the CIO hears more and more. As IT leaders feel pressure to be more responsive, faster moving, and more innovative, Operations leaders worry that their mission—the smooth, steady delivery of high-quality IT services—may be jeopardized by rushed experimentation.

Damon Edwards, co-founder of IT consultancy DTO Solutions, has spent more than a dozen years working on web operations from both the IT and business angles. A major DevOps proponent, he recently posted about “using DevOps to turn IT into a strategic weapon.” Discover Performance reached out and asked him to talk about how Operations leaders—and IT executives in general—should approach innovation.

The (completely achievable) goal, he says, aligns IT goals with business goals by “removing all of the bottlenecks, inefficiencies, and risks between a business idea (the ‘ah-ha!’) and a measurable customer outcome (the ‘ka-ching!’).”

Q: Does there tend to be a basic disconnection between the business and IT on the subject of innovation? Why?

DE: “Disconnect” has become somewhat of an ugly euphemism inside corporations. Unfortunately it’s become code for “I’m right and you’re wrong.” In reality, a “disconnect” is usually just two people operating and making assumptions based on differing definitions. As a result, you get unfortunate infighting between people who, in all other ways, both desperately want the company to succeed.

Talk to folks in the technology roles and they tend to see innovation as being synonymous to invention. There is a rich legacy of invention in the technology world. Getting your name on a patent was an ultimate trophy. Much of the myth and lore of tech and geek culture is built on a love of tinkering and invention. Now contrast that to what you see when you visit the business folks. They see innovation as the application of new ideas to create value for their current customers and to attract new customers. Unfortunately, now that you can win a patent for what is essentially a business idea, the invention/innovation distinction is even more muddled.

If executives let the two core parts of the company operate under completely different definitions, you’re bound to have conflicts and gridlock. You have to make it clear what innovation is and isn’t for your company, and how you’re going to measure its impact.

Q: Isn’t innovation an inherently risky endeavor?

DE: There is always some level of risk with innovation because you’re operating in the unknown. You don’t know if the customers will respond. You don’t know if the response will be what you want or expect. When the revenue and health of a company are tied to getting a large number of favorable responses, there is risk.

But I should be clear that innovation, especially on the web, should be low risk from a technology perspective. You reach your customers through standard interfaces and over standard protocols. We know how to deploy safely 20 times a day. We know how to scale services to hundreds of millions of users. We know how to manage petabytes of data. If you’re running a web company, your innovation risk should almost exclusively be on the business end, not the technology end.

Q: When risks compromise IT performance, heads roll, especially on the Ops team. How do you decrease the risk of innovation for Ops?

DE: Again, I’d ask what went wrong organizationally that put the Ops team in a position of risk. Was the business asking for something that was never done before and needed some never-thought-of-before technology to work? Doubtful. Did the developers change their underlying framework or introduce new technology that wasn’t properly vetted or Ops didn’t know how to handle? Possible. Did Ops upgrade or switch a technology component? Also possible.

My point is that, while Ops is the common scapegoat, the problem often started somewhere else and likely had nothing to do with the business being more innovative. So Ops gets blamed—which is like blaming the canary for the gas in the coal mine—and in response Ops starts saying no all the time. Suddenly “innovation” is the bad guy when it really had nothing to do with it.

Q: You say innovation is a numbers game. How so—and how does DevOps fit in?

DE: Innovation is a numbers game because, like most things in life, business has a countdown clock. If you’re a startup, it’s a simple clock. It’s the amount of cash left in the bank. If you are an established company, it’s a bit more complicated. It could be how long until a competitor beats you to the punch. It could be how long the CEO gives you to meet a goal. The point is, one way or another, you have a finite amount of time to absolutely delight the hell out of your customers by figuring out what they want and delivering exactly that to them.

You don’t control the clock and you don’t control the customer—what do you control? You can control the number of chances you get to delight the customer before the clock runs out. That’s where DevOps comes in. DevOps aims to remove all of the bottlenecks, inefficiencies, and risks between a business idea (the “ah-ha!”) and a measurable customer outcome (the “ka-ching!”). When you remove all of that, you get a lightning-fast and highly reliable service delivery pipeline that spans from the edge of Development all the way to the datacenter. That allows you to run more experiments, get faster feedback, and take more “shots at the prize.”

Q: DevOps promises a more responsive, more collaborative IT department that can realize business ideas faster. So what is holding back its widespread adoption? What’s the challenge or downside?

DE: There was a movie called “Charlie Wilson’s War” that had a great line between Tom Hanks, playing a U.S. Congressman, and Philip Seymour Hoffman, playing a CIA agent. Hoffman asks, “Why is Congress saying one thing and doing nothing?” Hanks replies, “Well, tradition mostly.”

All jokes aside, tradition is a powerful thing and hard to break. Tradition, or “what we’ve always done,” in IT is no different. There was a thread on Slashdot just this past month that asked whether developers should be allowed to deploy their own applications. You should have seen the outcry. The sheer number of commenters who shot down the idea as pure heresy was shocking. And the richest part of all of their denunciation was that the mob said over and over that the idea would “never work at a real company with real revenue at stake.” I thoroughly enjoyed sending that to John Allspaw, who runs all of technology at Etsy, and Jesse Robbins, who was in charge of risk and disaster planning for operations at Amazon. Etsy does over $600 million of transactions per year, and Amazon does about $50 billion in revenue. In both companies, developers are the ones who deploy and own the uptime for their own code. John’s reaction to the thread was a simple yet priceless one: “OMG.”

Q: Cloud and SaaS services promise flexibility and value to the business, but may seem to undermine or complicate traditional Ops teams. How do these disruptive factors fit with efforts to embrace innovation?

DE: We have a saying that we use a lot at DTO Solutions: “Moving to the cloud without changing your processes is just expensive and complicated Hosting 2.0.” The cloud gives you a new abstraction layer that provides all sorts of benefits in the form of flexibility and speed. But to take advantage of those benefits, you first must change your application lifecycle and operating procedures. Furthermore, you have to revisit the architecture and deployment model for your applications. Often you’ll find that the choices that were made in the past were based on outdated ideas like the need for hardware conservation or to fit a monolithic codebase into a waterfall project delivery cycle. The conditions have changed, so companies need to rethink how and why they do things within the context of the new conditions.

The cloud removes all sorts of infrastructure barriers, which is what makes moving at a faster pace even possible. DevOps addresses the process and cultural issues. Agile addresses the software development process issues. Customer Development and The Lean Startup remove the business process issues. You add it all up and you are ready for your organization to move at speeds that you never thought were possible.

For more from Damon Edwards, check out DTO Solutions, their DevOps blog, and the upcoming “DevOps Cookbook.” Then check out Discover Performance’s recent DevOps issue.

************************************
