senior
advanced
SRE

Incident Postmortems

Question

Describe a production incident you handled and how you structured the postmortem. What makes a good blameless postmortem?

Answer

A blameless postmortem focuses on improving systems, not on blaming individuals. A typical structure:
  • Incident timeline: what happened and when
  • Impact assessment: users affected and duration
  • Root cause analysis, for example with the 5 Whys technique
  • Contributing factors
  • Action items, each with an owner and a deadline
  • Lessons learned
Share the finished postmortem widely so other teams can prevent similar issues.

Why This Matters

Blameless postmortems create the psychological safety engineers need to report issues honestly. If people fear punishment, they hide problems. Frame questions as "How did our systems allow this?" rather than "Who made a mistake?". This cultural shift is essential for building reliable systems and learning organizations.

Code Examples

Postmortem template

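A minimal sketch of such a template, following the structure described in the answer above. The header fields (status, severity, authors), the table layouts and the placeholder wording are illustrative assumptions, not a standard format. Adapt them to your organization's incident process and severity scale.

```markdown
# Postmortem: <short incident title>

**Date:** YYYY-MM-DD
**Status:** Draft | In review | Final
**Authors:** <names or roles>
**Severity:** <your organization's scale>

## Summary
<Two or three sentences: what broke, for whom, and for how long.
Describe systems and events, not people.>

## Impact
- Users affected: <count or percentage, and which segments>
- Duration: <start of impact to full recovery, in UTC>
- Other impact: <failed requests, SLO error budget consumed, support tickets>

## Timeline (UTC)
| Time  | Event                                           |
|-------|-------------------------------------------------|
| HH:MM | <trigger, e.g. deploy or configuration change>  |
| HH:MM | <detection: alert fired or user report>         |
| HH:MM | <mitigation applied>                            |
| HH:MM | <full recovery confirmed>                       |

## Root cause analysis (5 Whys)
1. Why did <symptom> happen? Because <cause 1>.
2. Why did <cause 1> happen? Because <cause 2>.
3. Why did <cause 2> happen? Because <cause 3>.
4. Why did <cause 3> happen? Because <cause 4>.
5. Why did <cause 4> happen? Because <root cause: a gap in a system or process>.

## Contributing factors
- <Conditions that made the incident more likely or worse without causing it
  on their own, e.g. a missing alert or an outdated runbook>

## Action items
| Action | Owner | Due date   | Tracking ticket |
|--------|-------|------------|-----------------|
| <...>  | <...> | YYYY-MM-DD | <...>           |

## Lessons learned
- <What this incident taught us about our systems and processes>

## Distribution
<Teams and channels this postmortem was shared with>
```

Five questions is a rule of thumb, not a fixed requirement. Stop when you reach a cause that can be fixed at the system or process level. Phrase each timeline entry as what a system did or what action was taken, rather than who got it wrong, so the document stays blameless.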
Common Mistakes
  • Blaming individuals instead of focusing on systemic improvements
  • Not following up on action items from previous postmortems
  • Writing postmortems only for major incidents (minor ones teach too)
Follow-up Questions
Interviewers often ask these as follow-up questions
  • How do you ensure action items from postmortems actually get completed?
  • What's the difference between root cause and contributing factors?
  • How do you balance incident response speed with thoroughness?
Tags
incident-management
postmortem
sre
culture
leadership