This morning I was reminded of the importance of keeping notes about the problems I encounter and the solutions that fixed them. A co-worker of mine who was researching a problem came across my blog post through a Google search. It seems their environment is suffering from the same symptoms. Had I not kept a note about that issue, they would most likely have gone through the same level of effort I originally did to research and find the solution. Hopefully when they apply the fix it resolves their problem (I think it will), and they won't have to spend several hours or days getting to a resolution.
This brought to mind the importance of a run book. If you are not familiar with the term, a run book is basically an operations manual. It spells out what needs to be done for various tasks. For example, if we need to deploy a new application into production, the run book has a full set of instructions (a recipe, if you will) on what to do. Likewise, when a troubleshooter debugs a problem, they record in the run book what the problem was, the steps they took to determine the root cause, the fixes they tried, and which one(s) finally worked. That way, should the problem recur and a different shift of people see it, they can refer to the run book and go through the same steps.
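As a rough sketch of what one of those troubleshooting entries might look like if you kept the run book in a structured form, here is a small Python example. The class names, field names, and the sample entry are all made up for illustration; they are not taken from any particular run book tool, and a wiki page or text file with the same headings would serve just as well.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AttemptedFix:
    """One fix that was tried, and whether it resolved the problem."""
    description: str
    worked: bool


@dataclass
class RunBookEntry:
    """A troubleshooting record: the problem, the diagnosis, the fixes tried."""
    problem: str
    diagnosis_steps: List[str] = field(default_factory=list)
    attempted_fixes: List[AttemptedFix] = field(default_factory=list)

    def working_fixes(self) -> List[str]:
        """Return only the fixes that finally worked."""
        return [fix.description for fix in self.attempted_fixes if fix.worked]


def find_entries(run_book: List[RunBookEntry], symptom: str) -> List[RunBookEntry]:
    """Case-insensitive search of problem descriptions for a symptom."""
    needle = symptom.lower()
    return [entry for entry in run_book if needle in entry.problem.lower()]


# Hypothetical sample entry, invented purely to show the shape of a record.
run_book = [
    RunBookEntry(
        problem="Application server stops responding under load",
        diagnosis_steps=[
            "Checked the server logs for errors",
            "Compared the configuration against a healthy server",
        ],
        attempted_fixes=[
            AttemptedFix("Restarted the server", worked=False),
            AttemptedFix("Corrected the mismatched configuration value", worked=True),
        ],
    )
]

# A later shift seeing the same symptom can look it up and go straight to
# the fix that worked instead of repeating the research.
for entry in find_entries(run_book, "stops responding"):
    print(entry.problem)
    print(entry.working_fixes())
```

The point of recording the failed fixes alongside the successful one is that the next person can skip the dead ends as well as find the answer.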
The run book does not have to address only technical details. It can also provide operational response instructions, such as how to run a war room, who needs to be involved, when various organizations or teams are engaged, what triggers an engagement, and so on. Every operational detail can be recorded in the run book.
Does your production environment have a run book?