Postmortem: What Happened?

In the software industry, a postmortem is a report on an incident, usually written after the underlying bug has been fixed. It is a process intended to help a team learn from past incidents, and it typically involves an analysis or discussion soon after the event has taken place. In fields such as medicine, a postmortem refers to the examination of a body after death (an autopsy). In a technical context, the term describes the process of analyzing and learning from past incidents. The fundamental purpose, gaining insight from what has already happened, is the same in both disciplines.

Whenever the users of a company's service run into difficulty while using it, the company must find and fix the issue, which is crucial to maintaining its relevance and integrity with its users. Afterwards, the company usually writes a report describing the likely cause of the issue (the bug), how it was fixed, and how to avoid similar issues in future development. This report is known as a postmortem.

In this article, we will see why postmortems are important in software development and look at a template showing what one looks like, relating it to a real-world business scenario.

Uses Of A Postmortem

A postmortem is useful in many ways. Below are some of the key ones, though the list goes on.

  • Providing an overview of the main cause of an incident and why it happened.

  • Protecting the company from future incidents similar to previous ones.

  • Serving as a reference when the team forgets how a similar issue was fixed.

  • Maintaining the company's integrity and its users' trust. Users can see that their needs are being met as the products they use are continuously improved.

  • Helping the company improve its services and product development for its end users.

Key Points To Note When Writing A Postmortem

  • Brief: Every point and statement in a postmortem should be short and precise. There is no need for long explanations; go straight to the point.

  • Clarity: All your points must be clear and understandable. Nobody wants to read something they don't understand. Make your writing understandable even to non-technical people who may come across it.

  • Comprehensive: While being brief, try to be comprehensive and include every relevant detail about the incident. This set of questions may help with your writing:

    1. What happened? Describe the incident clearly and objectively.

    2. When did it happen? Provide a timeline of events.

    3. Who/What was involved? Note any person, team, or system that was part of the incident or the response.

    4. What was the impact? Detail the consequences, including downtime, financial loss, customer impact and other relevant metrics.

  • Accountability: Take ownership of every action taken while resolving the issue. This ensures that all actions taken during the process of solving the problem are documented.

  • Resolution: This usually comes last and covers the solution, the lessons learned, and how the company plans to improve if similar issues arise. It should be detailed and discussed to prevent future occurrences.

Writing A Postmortem

Below is an example of how a postmortem is written: short and straight to the point. This template will help you report any incident. It is not limited to software engineers; it can be used in any field that requires incidents to be documented.

POSTMORTEM REPORT

Incident Overview
Title: <Title of the Incident based on what happened>

Date: <date of writing the postmortem>

Duration: <time taken to identify and resolve the incident>

Impact: <impact of incident on company or users>

Report Prepared By: <writer of postmortem>

Summary of Incident: <summary of incident>

Timeline Of Events: <the timeline of each event, recorded as you would in meeting minutes>
                ...
18:28 UTC - Monitoring tools detect an unusually high rate of 500 errors in the server logs. An alert is automatically sent to the on-call system administrator and development team.
18:35 UTC - The first response team acknowledges the alert and begins initial assessment.
                ...

Corrective and Preventive Measures: <the measures to take to prevent or handle similar events>

Conclusion: <your conclusion on the event>

Recommendation for Future Incidents: <your recommendations for handling similar incidents in the future>
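If you write postmortems often, it can help to generate this skeleton from structured data, so the timeline always stays in chronological order and the duration is computed rather than estimated. The Python sketch below is one hypothetical way to do that. The `Postmortem` and `TimelineEvent` classes and the `render` function are illustrative names, not part of any standard tool, and it assumes the duration is measured from the first to the last timeline entry.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TimelineEvent:
    # When the event happened (timezone-aware) and a short description of it.
    at: datetime
    description: str


@dataclass
class Postmortem:
    # Hypothetical fields mirroring the sections of the template above.
    title: str
    date: str
    impact: str
    prepared_by: str
    summary: str
    timeline: list[TimelineEvent] = field(default_factory=list)
    measures: str = ""
    conclusion: str = ""
    recommendation: str = ""

    def duration_minutes(self) -> int:
        # Assumption: duration = time between the first and last timeline entries.
        if not self.timeline:
            return 0
        times = [event.at for event in self.timeline]
        return int((max(times) - min(times)).total_seconds() // 60)


def render(pm: Postmortem) -> str:
    # Sort events so the timeline always reads in chronological order,
    # and print each timestamp in UTC as in the template.
    events = sorted(pm.timeline, key=lambda event: event.at)
    timeline_lines = [
        f"{event.at.astimezone(timezone.utc):%H:%M} UTC - {event.description}"
        for event in events
    ]
    sections = [
        "POSTMORTEM REPORT",
        "",
        "Incident Overview",
        f"Title: {pm.title}",
        f"Date: {pm.date}",
        f"Duration: {pm.duration_minutes()} minutes",
        f"Impact: {pm.impact}",
        f"Report Prepared By: {pm.prepared_by}",
        f"Summary of Incident: {pm.summary}",
        "Timeline Of Events:",
        *timeline_lines,
        f"Corrective and Preventive Measures: {pm.measures}",
        f"Conclusion: {pm.conclusion}",
        f"Recommendation for Future Incidents: {pm.recommendation}",
    ]
    return "\n".join(sections)


if __name__ == "__main__":
    # Illustrative data: the two timeline entries come from the template;
    # the calendar date and all other fields are placeholders.
    day = datetime(2024, 1, 1, tzinfo=timezone.utc)  # hypothetical date
    report = Postmortem(
        title="<title of the incident>",
        date="<date of writing the postmortem>",
        impact="<impact on company or users>",
        prepared_by="<writer of postmortem>",
        summary="<summary of incident>",
        timeline=[
            TimelineEvent(day.replace(hour=18, minute=35),
                          "The first response team acknowledges the alert."),
            TimelineEvent(day.replace(hour=18, minute=28),
                          "Monitoring tools detect a high rate of 500 errors."),
        ],
    )
    print(render(report))
```

Entries can be recorded in any order during the incident; sorting at render time keeps the final report readable, and computing the duration from the same timestamps avoids the two drifting apart.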

With this template, writing a simple postmortem about an incident becomes easy. To see an example of a postmortem in action, you can head over to my GitHub page.
