Coding & Development · Premium · Intermediate

Blameless Postmortem — Turn Production Failures Into Team Growth

Generate a professional, blameless postmortem document from an incident. Learn from failures without finger-pointing.

Copy & Paste this prompt
You are a Site Reliability Engineer (SRE) who has led postmortems at companies with 99.99% uptime requirements. You believe every incident is a learning opportunity, never a blame opportunity.

Generate a complete blameless postmortem for an incident.

Incident title: [BRIEF TITLE — e.g., 'API Gateway 502 errors for 45 minutes']
Date/Time: [WHEN DID IT HAPPEN?]
Duration: [HOW LONG?]
Severity: [SEV1-CRITICAL / SEV2-HIGH / SEV3-MEDIUM / SEV4-LOW]
Impact: [WHO WAS AFFECTED? HOW MANY USERS? REVENUE LOSS?]
What happened: [DESCRIBE IN YOUR OWN WORDS]
What fixed it: [HOW WAS IT RESOLVED?]

Create a postmortem document with:

1. EXECUTIVE SUMMARY — 3 sentences a VP can read (impact, cause, resolution)
2. TIMELINE — Minute-by-minute from first alert to full resolution
3. ROOT CAUSE — Not "the server crashed" but WHY it crashed, and WHY the conditions existed
4. CONTRIBUTING FACTORS — Things that made the impact worse
5. WHAT WENT WELL — Celebrate what prevented worse outcomes
6. WHAT WENT WRONG — Process failures, not people failures
7. ACTION ITEMS — Each with:
   - [ ] Task description
   - Priority: P0/P1/P2
   - Owner: [role, not name]
   - Deadline: [timeframe]
8. DETECTION — How did we find out? How SHOULD we have found out?
9. RECURRENCE PREVENTION — Systemic change to prevent this entire class of problems

Tone: Blameless. Use "the system" not "John". Focus on process failures, not human errors.
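
For teams that run this prompt often, the nine-section skeleton can be scaffolded before the LLM or the meeting fills it in. A minimal Python sketch of that idea, using only the section names from the prompt above (the helper name and output format are illustrative, not part of the prompt):

```python
# Hypothetical helper: scaffolds the nine-section skeleton the prompt asks for.
# Section names mirror the prompt above; everything else is illustrative.

SECTIONS = [
    "EXECUTIVE SUMMARY",
    "TIMELINE",
    "ROOT CAUSE",
    "CONTRIBUTING FACTORS",
    "WHAT WENT WELL",
    "WHAT WENT WRONG",
    "ACTION ITEMS",
    "DETECTION",
    "RECURRENCE PREVENTION",
]

def postmortem_skeleton(title: str, date: str, severity: str) -> str:
    """Return an empty postmortem document ready to be filled in."""
    header = f"{title}\nDate: {date} | Severity: {severity}\n"
    body = "\n".join(f"{i}. {name}\n   TODO\n" for i, name in enumerate(SECTIONS, 1))
    return header + "\n" + body

if __name__ == "__main__":
    print(postmortem_skeleton(
        "API Gateway 502 errors for 45 minutes", "April 15", "SEV2-HIGH"))
```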
#postmortem #sre #incident-management #devops #reliability

Works with

ChatGPT · Claude · Copilot

💡 Pro Tips

  • Write the postmortem within 48 hours while memory is fresh
  • Always include 'What Went Well' — it prevents postmortems from being demoralizing
  • Action items without owners and deadlines are wishes, not plans (a quick check like the sketch below can enforce this)
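
The third tip lends itself to a mechanical check. A minimal sketch, assuming action items are tracked as plain dicts (all field names here are hypothetical):

```python
# Hypothetical check for the third tip: flag action items that are
# missing an owner or a deadline. Field names are illustrative.

def incomplete_items(action_items: list[dict]) -> list[str]:
    """Return descriptions of action items lacking an owner or a deadline."""
    return [
        item.get("task", "<no description>")
        for item in action_items
        if not item.get("owner") or not item.get("deadline")
    ]

items = [
    {"task": "Add load tests to the deploy pipeline", "priority": "P0",
     "owner": "Platform team", "deadline": "2 weeks"},
    {"task": "Document rollback procedure", "priority": "P1"},  # a wish, not a plan
]

for task in incomplete_items(items):
    print(f"Needs an owner and a deadline: {task}")
```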

✨ Example Output

EXECUTIVE SUMMARY: On April 15, the API gateway returned 502 errors for 45 minutes, affecting ~12,000 users and ~$8,400 in lost transactions. Root cause: a memory leak in the connection pooling library triggered by a config change deployed without load testing. Resolved by rolling back the config change.

WHAT WENT WELL:
✅ Alerting fired within 2 minutes
✅ On-call responded in 5 minutes
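
Figures like "within 2 minutes" fall straight out of the timeline section. A small sketch of how those deltas could be computed, assuming each timeline entry carries a wall-clock timestamp (the event names and times are illustrative):

```python
from datetime import datetime

def ts(hhmm: str) -> datetime:
    """Parse a wall-clock time; only same-day deltas matter here."""
    return datetime.strptime(hhmm, "%H:%M")

# Hypothetical timeline matching the example incident; clock times are illustrative.
timeline = {
    "impact_start":   ts("14:02"),
    "alert_fired":    ts("14:04"),
    "oncall_engaged": ts("14:09"),
    "mitigated":      ts("14:47"),
}

def minutes(start: datetime, end: datetime) -> int:
    """Whole minutes between two timeline events."""
    return int((end - start).total_seconds() // 60)

print("Time to detect:  ", minutes(timeline["impact_start"], timeline["alert_fired"]), "min")
print("Time to respond: ", minutes(timeline["alert_fired"], timeline["oncall_engaged"]), "min")
print("Time to mitigate:", minutes(timeline["impact_start"], timeline["mitigated"]), "min")
```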