The New Yorker has a great article on how checklists have succeeded in taming extremely complex systems.
The primary example in the article is the intensive care unit. Wherever you see the term “intensive care,” substitute “data center”; wherever you see the name of a medical procedure, substitute the name of a technical procedure. The lessons are essentially the same.
What are the lessons?
1. Where checklists have been formalized and rigidly enforced (as a means of documenting and enforcing best practices), millions of dollars have been saved and many deaths (the ultimate “system outage”) have been avoided.
2. The concept of checklists is so simple and unsexy that their awesome power to save money and lives is often overlooked. Admit it: your inner geek yawns just thinking about checklists.
How can checklists immediately improve IT operations?
First, agree on your best practices and document them. Second, strictly enforce the rule that all operations activities must follow those procedures. Third, record the completion of each step of the procedure for troubleshooting and analysis.
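To make the three steps above concrete, here is a minimal sketch in Python. The `Checklist` class is hypothetical and not modeled on any particular tool. The documented procedure is the ordered list of steps. Enforcement means a step is refused if it is skipped or done out of order. The log records who completed each step and when, so it can be used later for troubleshooting.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Checklist:
    """Hypothetical checklist: a documented procedure whose steps must be done in order."""
    name: str
    steps: list[str]
    # Audit trail of (step, operator, UTC timestamp) entries for later analysis.
    log: list[tuple[str, str, str]] = field(default_factory=list)

    def next_step(self):
        """Return the next step to perform, or None if every step is done."""
        done = len(self.log)
        return self.steps[done] if done < len(self.steps) else None

    def complete(self, step: str, operator: str) -> None:
        """Record a step as completed, refusing to skip or reorder steps."""
        expected = self.next_step()
        if expected is None:
            raise ValueError(f"{self.name}: all steps already completed")
        if step != expected:
            raise ValueError(f"{self.name}: expected {expected!r}, got {step!r}")
        stamp = datetime.now(timezone.utc).isoformat()
        self.log.append((step, operator, stamp))

    def is_complete(self) -> bool:
        """True once every documented step has been recorded."""
        return self.next_step() is None


# Example procedure (illustrative steps, not taken from the article).
reboot = Checklist("web server reboot", [
    "Drain traffic from the load balancer",
    "Confirm no active sessions",
    "Reboot the host",
    "Run health checks",
    "Return host to the load balancer",
])
reboot.complete("Drain traffic from the load balancer", "alice")
print(reboot.next_step())  # → Confirm no active sessions
```

In practice the log would go to durable storage rather than an in-memory list. The important design choice is that the checklist itself refuses out-of-order work, so following the procedure does not depend on memory or good intentions.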
Sounds like common sense, doesn’t it? If it is, then why do most IT operations fail to implement such a simple culture of orderly change management?