Blameless Postmortem — Turn Production Failures Into Team Growth
Generate a professional, blameless postmortem document from an incident. Learn from failures without finger-pointing.
Copy & Paste this prompt
You are a Site Reliability Engineer (SRE) who has led postmortems at companies with 99.99% uptime requirements. You believe every incident is a learning opportunity, never a blame opportunity. Generate a complete blameless postmortem for an incident.

Incident title: [BRIEF TITLE — e.g., 'API Gateway 502 errors for 45 minutes']
Date/Time: [WHEN DID IT HAPPEN?]
Duration: [HOW LONG?]
Severity: [SEV1-CRITICAL / SEV2-HIGH / SEV3-MEDIUM / SEV4-LOW]
Impact: [WHO WAS AFFECTED? HOW MANY USERS? REVENUE LOSS?]
What happened: [DESCRIBE IN YOUR OWN WORDS]
What fixed it: [HOW WAS IT RESOLVED?]

Create a postmortem document with:

1. EXECUTIVE SUMMARY — 3 sentences a VP can read (impact, cause, resolution)
2. TIMELINE — Minute-by-minute from first alert to full resolution
3. ROOT CAUSE — Not "the server crashed" but WHY it crashed, and WHY the conditions existed
4. CONTRIBUTING FACTORS — Things that made the impact worse
5. WHAT WENT WELL — Celebrate what prevented worse outcomes
6. WHAT WENT WRONG — Process failures, not people failures
7. ACTION ITEMS — Each with:
   - [ ] Task description
   - Priority: P0/P1/P2
   - Owner: [role, not name]
   - Deadline: [timeframe]
8. DETECTION — How did we find out? How SHOULD we have found out?
9. RECURRENCE PREVENTION — Systemic change to prevent this entire class of problems

Tone: Blameless. Use "the system" not "John". Focus on process failures, not human errors.
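If you want to run this prompt programmatically rather than pasting it into a chat window, the bracketed fields map cleanly onto a string template. Here is a minimal sketch, assuming the `openai` Python package (v1+), an `OPENAI_API_KEY` environment variable, and a hypothetical `gpt-4o` model name; the incident values are illustrative, taken from the example output below:

```python
# Minimal sketch: fill the prompt's bracketed fields and send it to a chat model.
# Assumes the `openai` Python package (v1+) and an OPENAI_API_KEY environment
# variable; the model name and incident values are hypothetical examples.
from openai import OpenAI

# Abbreviated template: paste the full prompt text from above here,
# replacing each [BRACKETED FIELD] with a {named} placeholder.
TEMPLATE = (
    "You are a Site Reliability Engineer (SRE) who has led postmortems at "
    "companies with 99.99% uptime requirements. "
    "Generate a complete blameless postmortem for an incident.\n"
    "Incident title: {title}\n"
    "Date/Time: {when}\n"
    "Duration: {duration}\n"
    "Severity: {severity}\n"
    "Impact: {impact}\n"
    "What happened: {what_happened}\n"
    "What fixed it: {what_fixed_it}\n"
)

incident = {
    "title": "API Gateway 502 errors for 45 minutes",
    "when": "April 15, 14:02 UTC",
    "duration": "45 minutes",
    "severity": "SEV2-HIGH",
    "impact": "~12,000 users, ~$8,400 in lost transactions",
    "what_happened": "Gateway returned 502s after a config deploy.",
    "what_fixed_it": "Rolled back the config change.",
}

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",  # hypothetical; use whichever chat model you have access to
    messages=[{"role": "user", "content": TEMPLATE.format(**incident)}],
)
print(response.choices[0].message.content)
```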
#postmortem #sre #incident-management #devops #reliability
Works with
ChatGPT, Claude, Copilot
💡 Pro Tips
- Write the postmortem within 48 hours while memory is fresh
- Always include 'What Went Well' — it prevents postmortems from being demoralizing
- Action items without owners and deadlines are wishes, not plans (see the sketch below)
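A minimal sketch of that last tip, assuming nothing beyond the Python standard library: if action items live as structured data rather than free text, a missing owner or deadline fails loudly instead of quietly becoming a wish. The `ActionItem` class and the example item are hypothetical illustrations, not part of the prompt:

```python
# Minimal sketch (hypothetical structure, not part of the prompt): represent
# action items as data so a missing owner or deadline raises an error.
from dataclasses import dataclass

@dataclass
class ActionItem:
    task: str
    priority: str   # "P0", "P1", or "P2"
    owner: str      # a role, not a name -- keeps the postmortem blameless
    deadline: str   # a timeframe, e.g. "within 2 weeks"

    def __post_init__(self):
        # Validate at construction time so incomplete items never get recorded.
        if self.priority not in ("P0", "P1", "P2"):
            raise ValueError(f"invalid priority: {self.priority}")
        if not self.owner or not self.deadline:
            raise ValueError("action items need an owner and a deadline")

# Hypothetical example item:
item = ActionItem(
    task="Add load testing to the config deploy pipeline",
    priority="P0",
    owner="Platform team on-call lead",
    deadline="within 2 weeks",
)
```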
✨ Example Output
EXECUTIVE SUMMARY: On April 15, the API gateway returned 502 errors for 45 minutes, affecting ~12,000 users and ~$8,400 in lost transactions. Root cause: a memory leak in the connection pooling library triggered by a config change deployed without load testing. Resolved by rolling back.

WHAT WENT WELL:
✅ Alerting fired within 2 minutes
✅ On-call responded in 5 minutes