
Value of DevOps Culture: It’s not just hugs and kumbaya

Damon Edwards / 

The importance of culture is a recurring theme in most DevOps discussions. It’s often cited as the thing you should start with and the thing you should worry about the most.

But other than the rather obvious idea that it’s beneficial for any company to have a culture of trust, communication, and collaboration… can using DevOps thinking to change your culture actually provide a distinct business advantage?

Let’s take the example of Continuous Deployment (or its sibling, Continuous Delivery). This is an operating model that embodies a lot of the ideals you’ll hear about in DevOps circles and is impossible to properly implement if your org suffers from DevOps problems.


Companies have plenty of monitoring, what they don’t have is control



Lee Thompson / 

I was honored to be asked to speak at DevOps Days in Manila and just got off stage. I was blown away when I found out over 400 people signed up to attend. Speaking gives me a chance to unload a bunch of baggage I’ve been carrying around for years.

We all bring a lot of baggage with us into a job. The older you are, the more you bring. The first part of my career I did 10 years of real-time industrial control software design, implementation, and integration way way back before the web 1.0 days. Yes, I wrote the software for the furniture Homer Simpson sat in front of at the nuclear plant that was all sticky with donut crumbs…

I took that manufacturing background baggage to E*TRADE in ’96, where I ran into fellow dev2ops contributor Alex Honor, who brought his Ames Research Laboratory baggage of (at the time) massive compute infrastructure and mobile agents. We used to drink a bunch of coffee and try to figure out how this whole internet e-commerce thing needed to be put together. We’d get up crazy early at 4:30AM, listen to Miles, and watch the booming online world wake up and trade stocks, and by 9:00AM have a game plan formulated to make it better.

My manufacturing background was always kicking in at those times, looking for control points. Webserver hits per second, firewall MBits/sec, auth successes or failures per second, trades per second, quotes per second, service queue depths, and the dreaded position request response time. I was quite sure there was a correlation algorithm between these phenomena, and that I could figure it out if I had a few weeks that I didn’t have. I also knew that once I figured it out, the underlying hardware, software, network, and user demand would change radically, throwing my math off. Controlling physical phenomena like oil, paper, and pharmaceutical products followed the math of physics. We didn’t have the math to predict operating system thread/process starvation, and it took us years to figure out that OS context switches per second have a huge kernel scalability impact that is not often measured or written about.

One particularly busy morning in late ’96, Alex was watching our webserver, pointed at a measurement on the screen and said, “I think we’re gonna need another webserver”. With that, we also needed to figure out how to loadbalance webservers. As usual for the era, two webservers was a massive understatement. Within a year, there was more compute infrastructure at E*TRADE supporting the HTTPS web pages than the rest of the trading system, and the trading system had been in place for 12 years by this time… Analytics of measurements (accompanied by jazz music) became an important part of our decision making.


Alex and I were also convinced in early ’97 that sound manufacturing principles used in the physical world made a ton of sense to apply to the virtual online world of the internet. I’m still surprised the big control systems vendors like Honeywell and Emerson haven’t gotten into data center control. No matter, the DevOps community can make progress on it, as it’s so complementary to DevOps goals and it’s what the devops-toolchain project is all about.

Get a bunch of DevOps folks together and the topic of monitoring comes up every time. I always have to ask “Are you happy with it?” and the answer is always “no” (though I don’t think anyone from Etsy was there). When you drill into what’s wrong with their monitoring, you find that most companies have plenty of monitoring; what they don’t have is control.

Say your app in production runs 100 logins/sec and you are getting nominally 3 username/password failures a second. While the load may go up and down, you learn that the 3% failure ratio is nominal and in control. If the ratio climbs higher, that may be emblematic of a script kiddie running a dictionary attack, the password hash database being offline, or an application change making it harder for users to properly input their credentials. If it drops, that may indicate a professional cyber criminal is running an automated attack and getting through the wire. Truman may or may not have said “if you want a new idea, read an old book”. In this case, you should be reading about “Statistical Process Control” or SPC. It was heavily used during WWII. With our login example, the ratio of successful to failed login attempts would be “Control Charted”, and the control chart would evaluate whether the control point was “in control” or “out of control” based on defined criteria like standard deviation thresholds.
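As a rough sketch of that control-charting idea, here is a minimal example in Python. The 3% baseline samples and the 3-sigma limits are illustrative assumptions, not data from any real system:

```python
# Minimal Statistical Process Control sketch for the login-failure example.
# Baseline samples and thresholds are illustrative assumptions only.
from statistics import mean, stdev

def control_limits(samples, sigmas=3):
    """Derive lower/upper control limits from baseline ratio samples."""
    m, s = mean(samples), stdev(samples)
    return m - sigmas * s, m + sigmas * s

def in_control(ratio, lcl, ucl):
    """A point is 'in control' if it falls within the control limits."""
    return lcl <= ratio <= ucl

# Baseline: failure ratios observed over a calm period (~3% nominal)
baseline = [0.031, 0.029, 0.030, 0.032, 0.028, 0.030, 0.031, 0.029]
lcl, ucl = control_limits(baseline)

print(in_control(0.030, lcl, ucl))  # nominal ratio -> True
print(in_control(0.150, lcl, ucl))  # spike: possible dictionary attack -> False
print(in_control(0.001, lcl, ucl))  # drop: automation slipping through -> False
```

A real control chart would also apply run rules (e.g. several consecutive points trending in one direction), but the core decision is just this in/out-of-limits test.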

Measurement itself is a very low level construct providing the raw material for the control goal. You have to go through several more toolchain layers before you get to the automation you are looking for. We hit upon this concept in our talk at Velocity in 2010…

Manufacturing has come a long, long way since WWII. Toyota built significantly on SPC methodologies in developing what eventually became “Lean Manufacturing”, a big part of the reason Toyota became the world’s largest automobile manufacturer in 2008. A key part of Lean is Value Stream Mapping, which is “used to analyze and design the flow of materials and information required to bring a product or service to a consumer” (Wikipedia).

Value Stream Mapping a typical online business through its marketing, product, development, QA, and operations flows will, at a minimum, help effectively communicate roles, responsibilities, and work flows through your org. More typically it becomes a tool to get to a “future state” that has eliminated waste and increased the effectiveness of the org, even when nothing physical is “manufactured”. I find agile development, DevOps, and continuous deployment goals all support lean manufacturing thinking. My personal take is that ITIL has similar goals, but is more of a process-over-people approach instead of a people-over-process approach, and its utility will depend on the organization’s management structure and culture. I prefer people over process, but I do reference ITIL every time I find a rough or wasteful organizational process for ideas on recommending a future state.

I was lucky enough to catch up with Alex, Anthony, and Damon over dinner, and we were talking big about DevOps and Lean. Anthony mentioned that “we use value stream mapping in all of our DevOps engagements to make sure we are solving the right problem”. That really floored me on a few levels. First off, it takes Alex’s DevOps Design Patterns and DevOps Anti-Patterns to the next level, adding a formalism to the DevOps implementation approach much as Lean added formalism on top of SPC. It also adds a self-correcting aspect to a company’s investment in DevOps optimizations. I’ve spoken with many companies who made huge investments in converting to Agile development without any measurable uptick in product deployment rates. While these orgs haven’t reverted back to a waterfall approach, as they like the iterative and collaborative style, they hit the DevOps gap head on.

“We use Value-Stream Mapping in all of our DevOps engagements to make sure we are solving the right problem”
                                                 -Anthony Shortland (DTO Solutions)

Practitioners of Lean Manufacturing see this all the time. Eliminating one bottleneck just moves the constraint downstream to the next bottleneck. To expect greater production rates, you have to look at the value stream in its entirety. If developers were producing motors instead of software functions, a value stream manager would see a huge inventory build-up of motors that produce no value to the customer and identify the overproduction as waste. Development is a big part of the value stream, and making it more efficient is a really good idea. But a growing release backlog is seldom measured or managed. If you treat your business as a Digital Information Manufacturing plant and manage it appropriately to that goal, you can avoid the frequent mistake Anthony and other Lean practitioners are talking about, where you solve a huge problem without benefiting the business or the customer.

To sum up, DevOps inspired technology can learn quite a bit from Lean Manufacturing and Value Stream Mapping. This DevOps stuff is really hard and you’ll need to leverage as much as possible. Always remember that “good programmers are lazy” and it’s good when you apply established tools and techniques. If you don’t think you’re working in a Digital Information Manufacturing plant, I bet your CEO does.

Video: Marten Mickos and Rich Wolski talk DevOps and Private Clouds

Damon Edwards / 

I ran into Marten Mickos and Rich Wolski from Eucalyptus Systems at PuppetConf and got them to sit down for a quick video alongside my fellow DevOps Cafe contributor, John Willis.

I had just come out of Marten’s keynote where he spoke about DevOps far more than I would have expected. In this video we explore the deep connection between DevOps and Private Clouds as well as other industry changes for which they are planning.

Eucalyptus was one of the first private cloud technologies on the scene, and consequently got the benefit and burden of being the early mover. The community had some ups and downs along the way, but their product and industry vision seems encouraging and warrants a closer look (and never count out Marten Mickos in an open source software battle).

Puppet and Chef Rock. Doh. What about all these shell scripts?!


Alex Honor / 

Incorporating a next generation CM tool like Puppet or Chef into your application or system operations is a great way to throw control around your key administrative processes.

Of course, to make the move to a new CM tool, you need to adapt your current processes into the paradigm defined by the new CM tool. There is an upfront cost to retool (and sometimes to rethink) but later on the rewards will come in the form of great time savings and consistency. 

Seems like an easy argument. Why can’t everybody just start working that way? 

If you are in a startup or a greenfield environment, it is just as simple as deciding to work that way and then individually learning some new skills.


In an enterprise or legacy environment, it is not so simple. A lot of things can get in the way and the difficulty becomes apparent when you consider that you are asking an organization to make some pretty big changes:
  • It’s something new: It’s a new tool and a new process.
  • It changes the way people work: There’s a new methodology on how one manages change through a CM process and how teams will work together.
  • Skill base not there yet: The CM model and implementation languages needs to be institutionalized across the organization.
  • It’s a strategic technology choice: To pick a CM tool or not to pick a CM tool isn’t just about which one you choose (e.g., Puppet vs. Chef). It’s about committing to a new way of working and designing how infrastructure and operations are managed.
Moving to a next generation CM tool like Chef or Puppet is a big decision, and in organizations already at scale it usually can’t be done whole hog in one mammoth step. I’ve seen it all too often: organizations realize that the move to CM is a more complicated task than they thought and subsequently procrastinate.

So what are some blocking and tackling moves you can use to make progress?

Begin by asking the question, how are these activities being done right now?
I bet you’ll find that most activities are handled by shell scripts of various sorts: old ones, well written ones, hokey rickety hairballs, true works of art. You’ll see a huge continuum of quality and style. You’ll also find lots of people very comfortable creating automation using shell scripts. Many of those people have built comfortable careers on those skills.


This brings me to the next question: how do you get these people involved in your movement to drive CM? Ultimately, it is these people who will own and manage a CM-based environment, so you need their participation. It might be obvious by this point, but you should consider how to incorporate the work of the script writers. How long will it take to build up expertise for a new solution anyway? How can one bridge between the old and new paradigms?

The pragmatic answer is to start with what got you there. Start with the scripts but figure out a way to cleanly plug them in to a CM management paradigm. Plan for the two styles of automation (procedural scripting vs CM). Big enterprises can’t throw out all the old and bring in the new in one shot. From political, project management, education, and technology points of view, it’s got to be staged.

To facilitate this pragmatic move towards full CM, script writers need:
  • A clean, consistent interface. Make integration easy.
  • Modularity, so new stuff can be swapped/plugged in later.
  • A familiar environment. It must be comfortable for shell scripters.
  • Easy distribution. Make it easy for a shell scripter to hand off a tool to a CM user (or anybody else for that matter).
Having these capabilities drives the early collaboration that is critical to the success of later CM projects. From the shell scripter’s point of view, these capabilities put some sanity, convention and a bit of a framework around how scripting is done. 
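As a rough illustration of those four capabilities, here is a minimal sketch of a dispatcher that puts a clean, modular interface in front of existing scripts. It is in Python for brevity (a tool like rerun implements the same idea in shell), and all module, command, and path names here are hypothetical:

```python
import subprocess

# Registry mapping "module:command" names to handlers -- the clean,
# consistent interface that callers (and later, CM tools) depend on.
REGISTRY = {}

def command(name):
    """Decorator that registers a handler under a module:command name."""
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap

@command("web:restart")
def web_restart(args=()):
    # Wrap the legacy shell script as-is today. Because callers only know
    # the "web:restart" name, this body can be swapped for a CM-native
    # implementation later without anyone noticing.
    return subprocess.call(["/opt/scripts/restart_web.sh", *args])

def dispatch(name, args=()):
    """Single entry point: every operation is invoked the same way."""
    if name not in REGISTRY:
        raise KeyError(f"unknown command: {name}")
    return REGISTRY[name](args)
```

The point of the pattern is the stable naming convention, not the implementation language: the shell scripter keeps writing shell, while everyone else calls a uniform interface that can be migrated piece by piece.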

I know this mismatch between the old shell script way and the new CM way all too well. I’ve had to tackle this problem in several large enterprises. After a while, a solution pattern emerged. 

Since I think this is an important problem that the DevOps community needs to address, I created a GitHub project to document the pattern and provide a reference implementation. The project is called rerun. It’s extremely simple but I think it drives home the point. I’m looking forward to the feedback and hearing from others who have found themselves in similar situations.


For more explanation of the ideas behind this, see the “Why rerun?” page.
Devops Chicago and Devops Camp


Dev2ops / 

Martin J. Logan @martinjlogan is the founder of the DevOps Chicago meetup group. He is also an Erlang hacker and co-author of Erlang and OTP in Action. In his spare time, he serves as Director of Merchandising, SEO, and Mobile Technology at Orbitz Worldwide. In a former life, I had the pleasure of working with Martin on what would now be called a Platform as a Service (PaaS) team at Orbitz.

> @mattokeefe: Martin, when did you first hear about “DevOps”?

> @martinjlogan: For me it was about 10 years ago. I was working at a place called Vail Systems for one of the Camp DevOps speakers, Hal Snyder. He implemented CFEngine back then and got the company to a state of rather high production environment automation. I subsequently left Vail and thought for sure that, moving on to larger companies, I would witness amazing automation compared with what I had seen at little ol’ Vail with Hal. I was shocked to find out that this was definitely not the case. This was the genesis of my discovery of DevOps and formal Ops automation, which at the time was not called anything of the sort.

> @mattokeefe: Hal is working with you again now at Orbitz. Did you help to attract him with the mission of implementing DevOps-style automation?

> @martinjlogan: Indeed I did. I brought Hal in to Orbitz for a presentation on CM automation about 3 years ago. Everyone was impressed with the talk, but the organization at the time was not quite ready for what he was showing us. Well, since then Orbitz has come a long way. Hal impressed our head of operations, Lou Arthur, that first time, and I of course kept his name fresh in people’s heads. Some time later Lou, Hal and I had lunch, and Hal elaborated on some very exciting concepts in automation; I think it was a done deal after that and an offer went out.

> @mattokeefe: Suppose you were to make another hire… would you look for a developer seeking to learn more about Ops, or a sysadmin looking to learn more about development?

> @martinjlogan: Well, that is an interesting question indeed. Companies tend to invest more heavily on the development side looking at Ops as more of a cost center than a driver for returns. DevOps is working to change this mentality. Spending money on broadening Ops is in line with looking at Ops as a revenue generator and I think most places are a bit unbalanced in this respect. At the end of the day though, I am looking for both. We need an engineer that is a sysadmin, or a sysadmin that is an engineer, and we need this person to help us build our Ops as a service that is more and more sophisticated and conducive to the efficient release of software to our customers.

> @mattokeefe: Can DevOps work in a company with highly centralized Operations? ITIL?

> @martinjlogan: There is a lot of technology that underpins DevOps, and Continuous Delivery in particular, but in a lot of ways DevOps is about breaking down walls. In a large organization there are a lot of walls. Any given organization will have an appetite for, see value in, and subsequently benefit from breaking down some of those walls.

> @mattokeefe: In some Agile orgs, walls are broken down by co-locating teams with all disciplines seated in the same area. Is this a good idea with DevOps too, or is it enough to perhaps have a war room where you can sit down together when needed?

> @martinjlogan: I am a big believer that DevOps is an extension of Agile in many ways. It is really taking Agile to its logical conclusion. In Agile we say done means tested. Taken to its logical conclusion, done means in production (in production, of course, implies tested). Agile is also about breaking down walls and fostering the communication and feedback loops that allow empirical process control to actually happen. If I want to be really Agile, then I want to know when things are going wrong in production, I want the whole team to feel the pain and solve the problem together and learn together and implement controls and improvements that solve the problem moving forward. I want them to own that together – so yes, I think sitting together fosters that more than the occasional use of a war room. That said, I think putting whole teams in windowless rooms that were once used for meetings is cruel and unusual.

> @mattokeefe: What are some of the tools that your teams are using today? Do you find that developers and sysadmins have different preferences for tools?

> @martinjlogan: We have quite a variety of tools here. We are certainly big users of Graphite on the monitoring front. We are Jenkins users as well. We are moving over to Git in many places throughout the company. One of the reasons being that it fits in quite nicely with many third party tools like Gerrit. There is definitely some difference in the tools Dev and Ops naturally gravitate towards. I think this is natural. For example Ops has been a proponent of Puppet while there is a strong dev contingent advocating glu.

> @mattokeefe: Which session are you looking forward to the most at Camp DevOps?

> @martinjlogan: That is honestly a tough question. There are quite a few big brains presenting. Jez Humble is amazing, and his book speaks for itself. I saw Chris Read speak a while back and he really impressed me as well with the tremendous depth of hands-on practical knowledge he has. John Willis is also fantastic; he is quite a personality and has done so much for DevOps, not to mention the Chef expertise. It is honestly a difficult one to call. I guess if I had to pick for me right now, Pascal-Louis Perez. He was the CTO at Wealthfront, where he moved them onto Continuous Deployment. Certainly doing such a thing for a financial services company takes some serious ingenuity. Very excited to learn from him. Really though, I am looking forward to the whole thing.

Infrastructure as Code, or insights into Crowbar, Cloud Foundry, and more


Keith Hudgins / 

Note: This is part 3 of a series on Crowbar and Cloud Foundry and integrating the two. If you haven’t yet, please go back and read parts 1 and 2.

Over the last few days I’ve introduced you to Crowbar and Cloud Foundry. Both are fairly new tools in the web/cloud/DevOps orbit, and worth taking a look at, or even better, getting involved in the community efforts to flesh them out into full-featured pieces of kit that are easier to use, more stable, and closer to turnkey. Is either one ready for production? Oh heck yeah, but you’ll have to be able to read the code and follow it well in order to figure out what either project is doing under the hood: they’re both really new, and the documentation (and architecture, when it comes right down to it) is still being written.

So where are these projects going?

Let’s start with Cloud Foundry:

Cloud Foundry works, right now. It’s rough around the edges yet, and there’s still some tooling and packaging that needs to be done to make it easier to run on your own infrastructure, but the bones are very solid and they work. VMWare is actively engaging partners to help expand the capabilities of the platform. Through its Community Leads program, the project has already gained application support for Python and PHP applications.

Just yesterday, the CF project announced Cloud Foundry Micro, which is a VMWare appliance image that you can use to set up a development box on your desktop. This is a neatly packaged box that allows you to test your applications before you deploy them into the Cloud Foundry PaaS. This is cool, but a little limited: it doesn’t yet support PHP and Python, and it’s a bit of a black box. Great for developers, but if you want to crack the hood and see the shiny engine, you’ll need to roll your own.

You can do that (sort of) with the older developer instance installer, which was documented in my last article. (Link’s here for convenience.) Very soon, VMWare will be releasing more robust Chef cookbooks that hopefully will come closer to a production style install library. The cookbooks inside that install package were the basis of the pending Cloud Foundry barclamp.

Since there are now three beta PaaS products based on Cloud Foundry, the future is looking bright. It’s fully open source, and you can either use it or help build it.

So what about Crowbar?

I’m glad you asked! Crowbar is a much lower-level tool on the stack (Your users and/or customers will never see it!) and is being put together by a smaller team (for now), but it solves a very interesting problem (how do you go from a rack of powered-down servers to a running infrastructure?) and is just beginning an important shift in focus.

Crowbar, as we’ve seen before, was originally written to do one thing: install OpenStack on Dell hardware. Very soon, it will begin to install other application stacks (Cloud Foundry, to start) and is opening up to be more easily extendable. CentOS/RHEL support is in the works for client nodes (currently, only Ubuntu Maverick is supported). The initial code has been committed to enable third-party barclamps. There are a small handful of community developers, and (I’ve been assured) the door is open for more. Fork the project on github, read the docs (I’ve got some more here), and start hacking.

Bonus: I’m documenting how to create your own barclamp and the lessons I’ve learned so far. As I write this, it’s a blank page. By the time I’m done, you’ll be able to make a barclamp to deploy your own application by following my instructions.
