Following up from yesterday’s post.
Many of us technicians like to understand complex things and learn about the roots of hard problems.
We are also very well versed in technology and how it works – how to build machines and advanced, sophisticated designs. But at our jobs we work in the context of the business. It’s the business outcome that is the important context to always keep foremost in our minds. Well, at least it should be.
“We have to grasp not only the Know-How but also ‘Know Why…’” – Shigeo Shingo (Toyota)
From time to time, the business comes to us asking for help to reach a desired outcome. The outcome might be to launch a new product, add new features to the web site, anticipate a bunch of new customers, open a new market, or shorten lead time for customer requests.
So, we technicians put on our thinking caps and get excited because we are going to get to make something.
“What’s the best solution?”, we ask ourselves. “It had better cover all contingencies for changing scale or future concerns”, say the far thinkers. “It’s a big project with lots of moving parts”, exclaims the project manager.
What often happens? We technicians get mired in how we solve the problem, using all the tricks up our sleeves, relishing our know-how. We know how that goes. We say: “Meta-object protocols will make our solution infinitely extendable, even dynamically! This inheritance hierarchy design coupled with good component composition will ensure the software architecture is correctly multi-layered.” Blank stare and reply from the business manager: “What?” We say: “Oh, it’s superior technology.”
Admit it. We’ve all been there saying or listening to this kind of stuff.
What stands between us and our goal is the complexity of the solution.
“The most dangerous kind of waste is the waste we do not recognize.” – Shigeo Shingo (Toyota)
Complexity is a killer. We already live in a dynamic, fast paced world and the business is always changing. Our work world is complex so why make our solutions complex? Complex solutions are fragile, hard to roll out and adopt, time consuming to fix and extend. Complex designs just make life harder. Does the business outcome depend on any of this complexity in the solution? If not, we are gold plating and are just creating a form of waste.
So, how do we avoid building in the complexity? Here are some good rules to live by:
1. Stay focused on the “Know Why”.
Be clear you understand the desired outcome the business expects. Are you sure about it? Do others agree with your interpretation? If not, you’ll be alone defending your choices, or you might suffer living with your own decisions.
2. Don’t fall prey to your “Know How”.
Watch out for the inclination to create advanced/sophisticated/over-engineered designs and implementations. This just adds time, risk, and money to get the work done.
3. Be disciplined.
Building a new solution and migrating from the “old way” of doing things to the new way is by definition a transformation. The new way will require its own new “Know How” and will not be perfect/complete/usable the first time (or the 2nd, 3rd… time). If moving to the new solution is painful, you have to stay disciplined and keep iterating to make the solution less painful. Stay focused on the outcome the business cares about and expects from you as your guiding principle.
4. Do the simplest thing.
Choose products and tools that are inherently simple to use and require little Know-How from everyone. Tool shininess gets the heart rate up because new is fun, but this isn’t what the business cares about.
You know the old saying: the shortest path between two points is a straight line.
I recently made a couple of additional videos about the curiosity that is the Rerun project. You can find them below.
The conventional wisdom on shell scripts is that… well… “shell scripts suck”. But why? Shell as a language is extremely powerful and useful, but shell scripts can quickly become unwieldy when used amongst a team or in long-lived operations. But what if you had a framework that solved the team-level problems and the lack of standardization while letting you use the full power and familiarity of shell? Enter Rerun, a simple tool that turns your favorite shell scripts into modular automation with standardized options handling, command line completion, documentation generation, and a built-in test framework. Suddenly shell scripts don’t suck so bad anymore.
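To make the “standardized options handling” point concrete, here is a plain-shell sketch of the option parsing, usage text, and required-option checking that every hand-rolled script ends up reinventing. The `deploy` command and its options below are purely illustrative (my example, not Rerun’s actual generated code); Rerun generates boilerplate of roughly this shape for every command in a module.

```shell
# Illustrative only: a hand-rolled command with the option handling,
# usage text, and required-option checks that Rerun standardizes for you.
deploy() {
  app="" env="dev"   # --env defaults to "dev"
  while [ $# -gt 0 ]; do
    case "$1" in
      --app) app="$2"; shift 2 ;;
      --env) env="$2"; shift 2 ;;
      *) echo "usage: deploy --app <name> [--env <dev|prod>]" >&2; return 2 ;;
    esac
  done
  # Required-option check, normally re-invented per script.
  [ -n "$app" ] || { echo "usage: deploy --app <name> [--env <dev|prod>]" >&2; return 2; }
  echo "deploying $app to $env"
}

deploy --app web --env prod   # prints: deploying web to prod
```

The point of the framework is that every command in every module gets this interface shape (plus generated usage/help text, tab completion, docs, and tests) for free and in the same form, so teammates never have to reverse-engineer each script’s argument conventions.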
Why am I so interested in Rerun? Because I’ve seen Rerun have a positive effect on a very real human problem: In most non-startup organizations, the DevOps divide is made worse by a mismatch of skills, tools, and technologies.
It’s common for the Ops team to have used a tool like Puppet to automate server config and image building. But when it comes to app deployment and config, the various app teams don’t have the Puppet skills or motivation to follow suit. So each app team picks their own tooling or glue language. Of course, this just confuses Ops and makes their lives more difficult. Sometimes there will be a centralized release team (often now awkwardly rebranded as the “DevOps Team”) that will attempt to pick their own solution. But neither Dev nor Ops ends up being happy with the choice, and the “DevOps Team” is now the bottleneck in the middle. Lots of noble DevOps intentions die in scenarios like this.
The effect of Rerun is that everyone can now come to the table on equal footing and use shell as their lingua franca. They can learn to collaborate using a simple framework for the “glue” that holds things together (of course, Ops still builds server images using a config management tool and Dev still builds their apps the way they want to). The built-in documentation generation and test automation framework make handoffs easier. Everyone knows the simple command and options interfaces, but can also read each other’s code if need be (it’s just shell scripts, after all). Once you get everyone engaged and contributing to bridging the DevOps Gap, you can collaboratively start to look to other newer, specialized solutions.
I have to get Rerun’s creator, Alex Honor, to do a full post on Rerun. In the meantime you might find these videos interesting:
Video 1: Chuck Scott gives a tour of how he uses Rerun to turn his “keeper scripts” into reusable, standardized, test-driven automation
Video 2: Group discussion with Anthony Shortland, Lee Thompson, Chuck Scott that looks at an example of a DevOps toolchain automated with Rerun
Need a simple and self-contained* way to automate the full lifecycle of a Jenkins instance (install, uninstall, manage plug-ins, manage jobs, etc.)? Anthony Shortland shows how he gets it done with Rerun.
(*Why simple and self-contained? Many reasons… the company-wide adoption of a full config management solution is proceeding at an uneven pace, the need to use a lowest-common-denominator language so you can have simple handoffs, you want to avoid “religious” tool wars, you need a very small footprint, you need it to be totally portable… and the list goes on)
Here is where you can find the Jenkins Rerun module:
I was honored to be asked to speak at DevOps Days in Manila and just got off stage. I was blown away when I found out over 400 people signed up to attend. Speaking gives me a chance to unload a bunch of baggage I’ve been carrying around for years.
We all bring a lot of baggage with us into a job. The older you are, the more you bring. The first part of my career I did 10 years of real-time industrial control software design, implementation, and integration way way back before the web 1.0 days. Yes, I wrote the software for the furniture Homer Simpson sat in front of at the nuclear plant that was all sticky with donut crumbs…
I took that manufacturing background baggage to E*TRADE in ’96 where I ran into fellow dev2ops contributor Alex Honor, who brought his Ames Research Laboratory baggage of (at the time) massive compute infrastructure and mobile agents. We used to drink a bunch of coffee and try to figure out how this whole internet e-commerce thing needed to be put together. We’d get up crazy early at 4:30AM, listen to Miles, and watch the booming online world wake up and trade stocks, and by 9:00AM have a game plan formulated to make it better.
My manufacturing background was always kicking in at those times, looking for control points. Webserver hits per second, firewall MBits/sec, auth successes or failures per second, trades per second, quotes per second, service queue depths, and the dreaded position request response time. I was quite sure there was a correlation algorithm between these phenomena and I could figure it out if I had a few weeks that I didn’t have. I also knew that once I figured it out, the underlying hardware, software, network, and user demand would change radically, throwing my math off. Controlling physical phenomena like oil, paper, and pharmaceutical products followed the math of physics. We didn’t have the math to predict operating system thread/process starvation, and it took us years to figure out that OS context switches per second point to a huge kernel scalability issue that is not often measured or written about.
One particularly busy morning in late ’96 Alex was watching our webserver, pointed at a measurement on the screen and said, “I think we’re gonna need another webserver”. With that, we also needed to figure out how to load balance webservers. As usual for the era, two webservers was a massive understatement. Within a year, there was more compute infrastructure at E*TRADE supporting the HTTPS web pages than the rest of the trading system, and the trading system had been in place for 12 years by this time… Analytics of measurements (accompanied by jazz music) became an important part of our decision making.
Alex and I were also convinced in early ’97 that sound manufacturing principles used in the physical world made a ton of sense to apply to the virtual online world of the internet. I’m still surprised the big control systems vendors like Honeywell and Emerson haven’t gotten into data center control. No matter; the DevOps community can make progress on it, as it’s so complementary to DevOps goals and it’s what the devops-toolchain project is all about.
Get a bunch of DevOps folks together and the topic of monitoring comes up every time. I always have to ask “Are you happy with it?” and the answer is always “no” (though I don’t think anyone at Etsy was there). When you drill into what’s wrong with their monitoring, you may find that most companies have plenty of monitoring; what they don’t have is control.
Say your app in production handles 100 logins/sec and you see nominally 3 username/password failures a second. While the load may go up and down, you learn that the 3% ratio is nominal and in control. If the ratio ticks higher, that may be emblematic of a script kiddie running a dictionary attack, the password hash database being offline, or an application change making it harder for users to properly input their credentials. If it drops down, that may indicate a professional cyber criminal is running an automated attack and getting through the wire. Truman may or may not have said “if you want a new idea, read an old book”. In this case, you should be reading about “Statistical Process Control”, or SPC. It was heavily used during WWII. With our login example, the ratio of successful to failed login attempts would be “Control Charted”, and the control chart would evaluate whether the control point was “in control” or “out of control” based on defined criteria like standard deviation thresholds.
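As a toy illustration (my sketch, not from the post), here is the 3-sigma rule behind a Shewhart-style control chart in plain shell and awk. It reads one failure ratio per line; the first 10 samples establish the baseline, and later samples are flagged when they fall outside mean ± 3σ. The window size and the 3σ limits are arbitrary choices here:

```shell
# Sketch of a Shewhart-style control chart check: baseline the first 10
# samples, then flag anything outside mean +/- 3 sigma as out of control.
check_control() {
  awk '
    NR <= 10 { sum += $1; sumsq += $1 * $1; n++; next }  # baseline window
    !limits_set {
      mean  = sum / n
      sigma = sqrt(sumsq / n - mean * mean)
      ucl = mean + 3 * sigma   # upper control limit
      lcl = mean - 3 * sigma   # lower control limit
      limits_set = 1
    }
    { status = ($1 > ucl || $1 < lcl) ? "OUT OF CONTROL" : "in control"
      print $1, status }
  '
}

# A stable ~3% failure ratio baseline, then a spike to 5%.
printf '0.029\n0.031\n0.029\n0.031\n0.029\n0.031\n0.029\n0.031\n0.029\n0.031\n0.030\n0.050\n' \
  | check_control
# prints:
# 0.030 in control
# 0.050 OUT OF CONTROL
```

A real control chart would also apply run rules (e.g. several consecutive points on one side of the mean), but the core idea is just this: learn the nominal behavior, then alert on statistically unusual deviation rather than on raw values.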
Measurement itself is a very low level construct providing the raw material for the control goal. You have to go through several more toolchain layers before you get to the automation you are looking for. We hit upon this concept in our talk at Velocity in 2010…
Manufacturing has come a long, long way since WWII. Toyota built significantly on SPC methodologies in what eventually became “Lean Manufacturing”, a big part of the reason Toyota became the world’s largest automobile manufacturer in 2008. A key part of Lean is Value Stream Mapping, which is “used to analyze and design the flow of materials and information required to bring a product or service to a consumer” (Wikipedia).
Value Stream Mapping a typical online business through marketing, product, development, QA, and operations flows will, at a minimum, help effectively communicate roles, responsibilities, and work flows through your org. More typically it becomes a tool to get to a “future state” that has eliminated waste and increased the effectiveness of the org, even when nothing physical was “manufactured”. I find agile development, DevOps, and continuous deployment goals all support lean manufacturing thinking. My personal take is that ITIL has similar goals, but is more of a process-over-people approach instead of a people-over-process approach, and its utility will depend on the organization’s management structure and culture. I prefer people over process, but I do reference ITIL every time I find a rough or wasteful organizational process, for ideas on recommending a future state.
I was lucky enough to catch up with Alex, Anthony, and Damon over dinner and we were talking big about DevOps and Lean. Anthony mentioned that “we use value stream mapping in all of our DevOps engagements to make sure we are solving the right problem”. That really floored me on a few levels. First off, it takes Alex’s DevOps Design Patterns and DevOps Anti-Patterns to the next level, much as Lean built on SPC, adding a formalism to the DevOps implementation approach. It also adds a self-correcting aspect to a company’s investment in DevOps optimizations. I’ve spoken with many companies who made huge investments in converting to Agile development without any measurable uptick in product deployment rates. While these orgs haven’t reverted back to a waterfall approach, as they like the iterative and collaborative style, they hit the DevOps gap head on.
“We use Value-Stream Mapping in all of our DevOps engagements to make sure we are solving the right problem”
-Anthony Shortland (DTO Solutions)
Practitioners of Lean Manufacturing see this all the time. Eliminating one bottleneck just moves the flow downstream to the next bottleneck. To expect greater production rates, you have to look at the value stream in its entirety. If developers were producing motors instead of software functions, a value stream manager would see a huge inventory build-up of motors that produce no value to the customer and identify the overproduction as waste. Development is a big part of the value stream, and making it more efficient is a really good idea. But the growth of the release backlog is seldom measured or managed. If you treat your business as a Digital Information Manufacturing plant and manage it appropriately to that goal, you can avoid the frequent mistake Anthony and other Lean practitioners are talking about, where you solve a huge problem without benefiting the business or the customer.
To sum up, DevOps-inspired technology can learn quite a bit from Lean Manufacturing and Value Stream Mapping. This DevOps stuff is really hard and you’ll need to leverage as much as possible. Always remember that “good programmers are lazy”, and it’s good when you apply established tools and techniques. If you don’t think you’re working in a Digital Information Manufacturing plant, I bet your CEO does.