Where are the design patterns for software operations?
In the world of software development, application developers are accustomed to drawing from the wealth of design patterns that address common programming problems, codify best practices, and establish proven reusable solutions. There are several well known design pattern repositories that catalog solutions into various categories from fundamental ones described by the GangOfFour to architecture specific ones like J2EE Patterns, even ones for social organization. An Anti-pattern is a pattern that tells how to go from a problem to a bad solution. Design patterns help avoid re-inventing solutions and when combined together can form the basis of a problem solving “play book.” When used effectively, design patterns become a common problem solving language and can lead to better written software.
But what happens after the code is written? For most organizations today, software operations – the acts of deploying, configuring, and operating software (and all of its related code and data artifacts) – is arguably as important as writing the software itself. If such an organization can’t efficiently and reliably operate the software, the quality of the software will not matter. But if one looks for design patterns that codify best practices for automating software operations, nothing turns up. Where is the catalog of design patterns that address the problems encountered when managing environments of software deployments and the overall life cycle of the business service?
Anyone that has managed software operations for different organizations, will recognize the same kinds of problems and will often re-invent solutions that were successful in the past. Others that work closer to the bleeding edge will encounter problems that other groups will face later. If these problems could be discussed in terms of design patterns (or failures as anti-patterns), solutions and best practices for managing software operations would be more consistent across organizations.
Here are two specific problem areas that everyone can identify with:
- Packages: Depending on the application and infrastructure, one will find multiple package formats in use. Operating systems use their own (eg, .rpm, .deb, .pkg, .msi, etc) and so do software runtime environments (eg, java, .net). Each format has its own way (to greater or lesser extents) of being created, extracted, and described (including dependencies). These differences lead to multiple package silos and administrative gray areas (cumbersome handoffs between dev and admin groups). It would be preferable to have a common repository that can host any kind of package type, and a homogeneous interface to controlling their life cycle (creation, installation and removal).
- Services: At a certain level, one can view applications as a set of interacting long running processes. Again, depending on the application architecture, these processes might be standalone unix-style daemons, or windows services. Each service has its own way of being started or stopped, as well as a procedure for checking its current runtime state. Often times, shutting down a service is not a simple matter of just invoking a single command. Things go wrong at shutdown requiring other logic to figure out the next course of action. Besides coping with these differences, the deployment process is also difficult because change of runtime state and software package installation is intertwined. Software operations would benefit from a body of design patterns that described proven strategies to managing runtime state and a common model for describing these states.
Here is a sampling of general recurring problems in the world of software operations:
- Complex application deployments: Applications are based on technologies from different vendors, are spread out over numerous machines in multiple environments, and use different architectures
- Inconsistent management interfaces: Every application component and supporting piece of infrastrucure has a different way of being managed. This includes both how components are controlled and how they are configured.
- Hard to scale administrative management: As the layers of software components increase, so does the difficulty to coordinate actions across them. This is especially difficult when the same application can be setup to run in a minimal footprint while another can be designed to support massive load and redundancy.
- Incoherent life cycles: Applications are typically multi-tiered, where each tier may be on its own development track, uses its own release paradigm and requisite tools.
- Generally, these problems are found in combination which means coping with them on the whole is a difficult challenge.
What’s needed: Domain specific patterns for software operations
The body of existing design patterns can and should be used to analyze and solve some of the above problems. To make the design patterns more readily useful to software operations, we need a set of domain specific patterns. These patterns would be expressed in terms of concepts familiar to software operations groups (eg, package, service, process, node, etc) and would be geared to coping with typical problems they face (eg, various startup, shutdown strategies for services among many others). Ideally, these patterns can be composed into a system of patterns that help solve larger scale problems.
Developing patterns is a bit of an organic process but the most durable patterns are ones that have been proven over and over again in different contexts. The first step is to establish a repository to which various patterns can be contributed and a supporting forum where their merits can be discussed. Ultimately, the software operations community will find consensus about some of these patterns, thus establishing some common vocabulary and a basis for framework development.