“Those who cannot remember the past are condemned to repeat it.” - George Santayana
Post-mortem means “after death” in Latin. Usually you’d hear it in the medical context: an autopsy is a post-mortem examination to find out the cause of death.
It’s a tiny bit ironic that as a teenager I was into medicine (thanks to House M.D.), and the area I was most interested in was… pathology (pathologists are the ones who perform post-mortems). I wanted to discover the root causes of what happened.
Years later, I do get to do (almost) exactly that…
The concept of a post-mortem also exists in tech, where it can be extremely helpful for learning from mistakes. In this Quality Foundations article, I’ll share more on how post-mortems are used in tech development.
Whenever I join a new tech project, I find myself asking: where are the post-mortems? Is there a post-mortem about this issue that just took place?
For anyone who cares about quality and continuous improvement, post-mortems are a gold mine to learn from.
What is a Post-Mortem and when is it created?
In some contexts, a post-mortem is a project management exercise held after a project is completed.
The way I’ve seen it used in development teams is slightly different: when there’s a production issue like an outage or loss of service for our customers, we need to take that seriously and learn from it. Why did it happen? What should we do to make sure it does not happen again?
A post-mortem is a written document that examines the reasons for an issue that affected production.
It records the timeline of events and the measures taken, analyzes the root cause, and gathers the follow-up action items. It’s like a themed retrospective of sorts. It can be written asynchronously, synchronously, or a mix of both, combined with a meeting if necessary.
It is essential to ensure that the post-mortem remains objective and focuses on facts rather than blame. Identify what happened, not whom to blame.
Who is involved in writing it?
Anyone involved in the incident may take part in writing the post-mortem. Usually it’s the people who have the most context on what happened and who resolved it.
When it comes to reading it, it’s open to all. In addition, you could combine the post-mortem with a mini retrospective that walks through all the steps that took place. I especially appreciated post-mortems done with the whole team in which the incident happened, so that others have a chance to learn, too.
What does it look like?
A post-mortem is a written document. It could be written as a Notion or Confluence page (or in any other tool or system you’re using). Usually the company has a dedicated space for storing all incident post-mortems so you can learn from them.
There are a few main points to cover in the post-mortem:
- Summary of the issue that took place
- Timeline of the events
- Root causes
- Any graphs, links, or attachments that help in understanding the issue or a possible solution
- Follow-up action items
You could do a much deeper analysis and adjust the points based on your needs and context.
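The main points above could also be captured as a small data structure, which makes it easy to check whether a draft is still missing a section. This is only a minimal sketch: the class names, field names, and the `missing_sections` helper are assumptions made for illustration, and the sample incident is made up.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TimelineEntry:
    at: datetime        # when the event happened
    description: str    # what happened or which measure was taken


@dataclass
class ActionItem:
    description: str
    owner: str
    done: bool = False


@dataclass
class PostMortem:
    title: str
    summary: str
    timeline: list[TimelineEntry] = field(default_factory=list)
    root_causes: list[str] = field(default_factory=list)
    attachments: list[str] = field(default_factory=list)  # graphs, links, files
    action_items: list[ActionItem] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of the main sections that are still empty."""
        sections = {
            "summary": self.summary.strip(),
            "timeline": self.timeline,
            "root_causes": self.root_causes,
            "action_items": self.action_items,
        }
        return [name for name, value in sections.items() if not value]


# Made-up draft: summary and root cause are filled in, the rest is not.
draft = PostMortem(
    title="Checkout outage (illustrative example)",
    summary="Customers could not complete checkout.",
    root_causes=["Expired TLS certificate"],
)
print(draft.missing_sections())  # ['timeline', 'action_items']
```

Attachments are deliberately left out of the check, since not every incident has graphs or links worth adding.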
[Figure: Simplified example of a post-mortem template]
We learned, wrote it down, now what?
The point of post-mortems is to learn from our mistakes. If you see the same issue happening over and over again, it’s a great chance to catch the patterns and check whether we completed all the action points we said we would. As an outcome of a post-mortem, we should identify actionable items that can be implemented to prevent a similar incident in the future. These should include process improvements, system changes, and updated documentation.
Bonus points: If you’re extra into learning from and utilizing post-mortems, there’s a great metric to follow: unresolved post-mortem items. Learning from failures means not only writing down what went wrong, but also acting on it. If we want to embrace a learning-from-mistakes culture and the lesson “we fail, we learn”, we need to complete our action items first.
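One way to follow that metric is to count the action items that are still open across all post-mortems. The sketch below assumes the items have already been exported from wherever they are tracked; the `TrackedItem` class, the function names, and the sample items are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class TrackedItem:
    post_mortem: str   # which post-mortem the item came from
    description: str
    done: bool


def unresolved_items(items):
    """Return the action items that have not been completed yet."""
    return [item for item in items if not item.done]


def unresolved_ratio(items):
    """Share of action items still open (0.0 when there are no items)."""
    if not items:
        return 0.0
    return len(unresolved_items(items)) / len(items)


# Made-up sample data for illustration only.
items = [
    TrackedItem("Checkout outage", "Add certificate expiry alert", done=True),
    TrackedItem("Checkout outage", "Document renewal process", done=False),
    TrackedItem("Slow search", "Add index on search table", done=True),
    TrackedItem("Slow search", "Load-test before release", done=False),
]

print(len(unresolved_items(items)))  # 2
print(unresolved_ratio(items))       # 0.5
```

Tracking the raw count over time is usually enough; the ratio is just a convenient way to compare teams or periods with different numbers of incidents.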
Post-mortems are a great way to analyze the issues that took place. They help us better understand what happened, share knowledge with more people, and collect action items to prevent similar issues in the future.