Your pager fires at 3 AM. The database connection pool is exhausted, requests are piling up, and customers are seeing 500 errors. Two hours later you've patched the leak, restarted the service, and crawled back to bed. Now comes the part nobody enjoys: writing the postmortem.
Incident postmortems are critical for engineering teams — they capture what went wrong, why, and what to do about it. But writing one from scattered logs, Slack threads, and PagerDuty alerts is tedious. In this tutorial, you'll build a Python CLI tool that ingests raw incident data and produces a structured postmortem report using Claude via the EzAI API. The entire thing runs in about 30 seconds and costs less than a penny per report.
## How the Pipeline Works
The generator follows a five-stage pipeline: collect raw data, send it through Claude for root cause analysis, score the impact, generate a structured report, and optionally push it to Slack or email. Each stage is a single function, and Claude handles the heavy lifting — pattern recognition across log lines, timeline reconstruction, and natural-language summarization.
*Five-stage pipeline: logs in, structured postmortem out — powered by Claude via EzAI*
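Conceptually, the pipeline is plain function composition. The stubs below are placeholders to show the shape only; every name here is illustrative, and the real collection, analysis, and rendering code is built in the sections that follow.

```python
def collect(log_dir: str) -> str:
    """Stage 1: gather raw incident data (stub)."""
    return f"raw logs from {log_dir}"

def analyze(raw: str) -> dict:
    """Stage 2: root cause analysis. In the real tool, Claude does this."""
    return {"title": "Example incident"}

def score(pm: dict) -> dict:
    """Stage 3: impact scoring (stub)."""
    pm["severity"] = "SEV2"
    return pm

def report(pm: dict) -> str:
    """Stage 4: render the structured report (stub)."""
    return f"# {pm['title']} ({pm['severity']})"

def deliver(md: str) -> str:
    """Stage 5: optional push to Slack or email (stub: pass-through)."""
    return md

print(deliver(report(score(analyze(collect("./incident-logs"))))))
```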
## Setup and Dependencies
You need an EzAI API key and Python 3.10+. Install the Anthropic SDK — it works with EzAI out of the box since EzAI is a drop-in replacement for the Anthropic API.
```bash
pip install anthropic python-dateutil
```
Set your API key as an environment variable:
```bash
export EZAI_API_KEY="your-key-here"
```
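Optionally, a fail-fast guard at startup gives a clearer error than the `KeyError` you'd otherwise hit on the first request. This helper (`require_key` is a suggested name, not part of the SDK) takes the environment as a plain dict so it's easy to test:

```python
def require_key(env: dict) -> str:
    """Return the API key, or exit with a readable message if it's missing."""
    key = env.get("EZAI_API_KEY")
    if not key:
        raise SystemExit("EZAI_API_KEY is not set; export it before running.")
    return key
```

Call it as `api_key=require_key(os.environ)` when constructing the client.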
## Building the Core Generator
The generator takes a directory of log files (plain text, JSON lines, or structured alerting output) and feeds them to Claude with a carefully crafted system prompt. Claude returns a JSON object with distinct sections: summary, timeline, root cause, impact assessment, and action items.
```python
import anthropic
import json
import os
from pathlib import Path

client = anthropic.Anthropic(
    api_key=os.environ["EZAI_API_KEY"],
    base_url="https://ezaiapi.com",
)

SYSTEM_PROMPT = """You are a senior SRE writing an incident postmortem.
Analyze the provided logs and produce a JSON object with these keys:
- "title": concise incident title (max 80 chars)
- "severity": "SEV1" | "SEV2" | "SEV3" | "SEV4"
- "summary": 2-3 sentence executive summary
- "timeline": array of {"time": "ISO8601", "event": "description"}
- "root_cause": detailed root cause analysis (3-5 sentences)
- "impact": {"users_affected": int|null, "duration_minutes": int,
  "revenue_impact": string, "services": [string]}
- "action_items": [{"priority": "P0"|"P1"|"P2", "owner": string,
  "task": string, "due": string}]
- "lessons_learned": array of strings
- "detection": how the issue was detected, time-to-detect

Be specific. Reference exact timestamps, error codes, and metrics
from the logs. Do not invent data not present in the input."""

def collect_logs(log_dir: str) -> str:
    """Read all log files from a directory, sorted by name."""
    log_dir = Path(log_dir)
    chunks = []
    for f in sorted(log_dir.glob("*")):
        if f.is_file() and f.suffix in (".log", ".txt", ".json", ".jsonl"):
            content = f.read_text(errors="replace")[:50000]
            chunks.append(f"=== {f.name} ===\n{content}")
    return "\n\n".join(chunks)

def generate_postmortem(log_dir: str, model: str = "claude-sonnet-4-5") -> dict:
    """Analyze logs and generate a structured postmortem."""
    raw_logs = collect_logs(log_dir)
    if not raw_logs:
        raise ValueError(f"No log files found in {log_dir}")

    message = client.messages.create(
        model=model,
        max_tokens=4096,
        system=SYSTEM_PROMPT,
        messages=[{
            "role": "user",
            "content": f"Analyze this incident data:\n\n{raw_logs}"
        }],
    )

    # Parse the JSON response
    text = message.content[0].text
    # Strip markdown code fences if present
    if text.startswith("```"):
        text = text.split("\n", 1)[1].rsplit("```", 1)[0]
    return json.loads(text)
```
Two things to note here. First, the `base_url` points to `ezaiapi.com` — that's the only change from a direct Anthropic setup. Second, we cap each log file at 50,000 characters. Claude Sonnet's 200K context window can handle far more, but keeping the input focused reduces cost and improves accuracy.
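To sanity-check what Claude actually receives, you can mirror the concatenation logic of `collect_logs` against a throwaway directory. The file names and log lines below are made up for illustration:

```python
import tempfile
from pathlib import Path

# Build a throwaway incident directory with two fake log files.
tmp = Path(tempfile.mkdtemp())
(tmp / "api.log").write_text("2026-04-03T03:02:11Z ERROR connection pool exhausted\n")
(tmp / "notes.txt").write_text("03:05 paged on-call, restarted service at 04:40\n")

# Same logic as collect_logs: one "=== name ===" header per file.
chunks = []
for f in sorted(tmp.glob("*")):
    if f.is_file() and f.suffix in (".log", ".txt", ".json", ".jsonl"):
        chunks.append(f"=== {f.name} ===\n{f.read_text(errors='replace')[:50000]}")
print("\n\n".join(chunks))
```

Each file arrives under its own header, which gives Claude enough structure to attribute events to the right source in the timeline.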
## Rendering the Report
Raw JSON isn't what you'd paste into a Notion doc. Let's add a renderer that produces clean Markdown suitable for Confluence, GitHub Issues, or Slack.
```python
def render_markdown(pm: dict) -> str:
    """Convert postmortem JSON to formatted Markdown."""
    lines = [
        f"# {pm['title']}",
        f"**Severity:** {pm['severity']}  ",  # two trailing spaces = hard break
        f"**Duration:** {pm['impact']['duration_minutes']} minutes  ",
        f"**Services:** {', '.join(pm['impact']['services'])}\n",
        f"## Summary\n{pm['summary']}\n",
        "## Timeline",
    ]
    for entry in pm["timeline"]:
        lines.append(f"- **{entry['time']}** — {entry['event']}")
    lines.append(f"\n## Root Cause\n{pm['root_cause']}\n")
    lines.append(f"## Detection\n{pm['detection']}\n")
    lines.append("## Action Items")
    lines.append("| Priority | Owner | Task | Due |")
    lines.append("|----------|-------|------|-----|")
    for item in pm["action_items"]:
        lines.append(
            f"| {item['priority']} | {item['owner']} "
            f"| {item['task']} | {item['due']} |"
        )
    lines.append("\n## Lessons Learned")
    for lesson in pm["lessons_learned"]:
        lines.append(f"- {lesson}")
    return "\n".join(lines)
```
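One edge case worth guarding in the table renderer: a literal `|` inside a task description breaks the Markdown row. A tiny escape helper (an addition of mine, not part of the code above) handles it:

```python
def md_escape(cell: str) -> str:
    """Escape pipe characters so free-text table cells stay intact."""
    return str(cell).replace("|", "\\|")

# A task description containing "|" would otherwise split into extra columns.
row = f"| P0 | alice | {md_escape('drain pool | add alerting')} | 2026-04-10 |"
print(row)
```

Wrapping `item['task']` (and any other free-text cell) in `md_escape` keeps the action-items table well-formed no matter what Claude writes.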
## Adding the CLI Interface
Wrap it in a CLI with `argparse` so your on-call engineers can run `python postmortem.py ./incident-logs/` right from the terminal. We support multiple output formats and model selection for teams that want to balance speed against depth.
```python
import argparse
import sys

def main():
    parser = argparse.ArgumentParser(
        description="Generate incident postmortems from log files"
    )
    parser.add_argument("log_dir", help="Directory containing incident logs")
    parser.add_argument(
        "--model", default="claude-sonnet-4-5",
        help="Model to use (default: claude-sonnet-4-5)"
    )
    parser.add_argument(
        "--format", choices=["markdown", "json"],
        default="markdown", help="Output format"
    )
    parser.add_argument("--output", "-o", help="Write to file instead of stdout")
    args = parser.parse_args()

    print("Analyzing incident logs...", file=sys.stderr)
    pm = generate_postmortem(args.log_dir, model=args.model)

    if args.format == "json":
        output = json.dumps(pm, indent=2)
    else:
        output = render_markdown(pm)

    if args.output:
        Path(args.output).write_text(output)
        print(f"Postmortem saved to {args.output}", file=sys.stderr)
    else:
        print(output)

if __name__ == "__main__":
    main()
```
Now your engineers can run:
```bash
# Quick postmortem from last night's incident
python postmortem.py ./incidents/2026-04-03-db-pool/ -o postmortem.md

# Use Opus for complex multi-service outages
python postmortem.py ./incidents/2026-04-03-db-pool/ --model claude-opus-4 -o postmortem.md

# JSON output for integration with ticketing systems
python postmortem.py ./incidents/2026-04-03-db-pool/ --format json | jq '.action_items'
```
## Extending with Streaming for Long Incidents
Major outages produce massive log files. For incidents with 100K+ lines, you'll want streaming responses so the team gets real-time feedback instead of staring at a spinner. EzAI supports SSE streaming through the same SDK:
```python
def generate_streaming(log_dir: str, model: str = "claude-sonnet-4-5"):
    """Stream the postmortem generation for real-time feedback."""
    raw_logs = collect_logs(log_dir)
    with client.messages.stream(
        model=model,
        max_tokens=4096,
        system=SYSTEM_PROMPT,
        messages=[{
            "role": "user",
            "content": f"Analyze this incident data:\n\n{raw_logs}"
        }],
    ) as stream:
        full_text = ""
        for text in stream.text_stream:
            full_text += text
            print(text, end="", flush=True)
    # Strip markdown code fences, same as the non-streaming path,
    # before parsing the accumulated text.
    if full_text.startswith("```"):
        full_text = full_text.split("\n", 1)[1].rsplit("```", 1)[0]
    return json.loads(full_text)
```
## Cost Breakdown
Postmortem generation is surprisingly cheap. A typical incident with 5-10 log files (~30K tokens input) runs through Sonnet for about $0.003 per report. Even if your team runs 50 postmortems a month, that's $0.15 total. With EzAI's discounted pricing, it's even less.
For complex multi-service outages where you need Claude Opus's deeper reasoning, the cost bumps to roughly $0.05 per report — still cheaper than the 2 hours an engineer would spend writing it manually.
## Production Tips
- **Chunk large log files** — If a single log exceeds 100K characters, split it by time window before sending. Claude performs better on focused chunks than on massive undifferentiated blobs.
- **Add retry logic** — Wrap the API call with exponential backoff. Check our retry strategies guide for a production-ready implementation.
- **Cache results** — Hash the input logs and cache the JSON output. Same incident data should return the same postmortem without burning another API call.
- **Validate JSON output** — Claude occasionally produces slightly malformed JSON on very long responses. Add a `try`/`except` around `json.loads()` with a repair fallback.
- **Hook into your incident workflow** — Trigger the generator from PagerDuty webhooks or Slack commands so postmortems start generating the moment an incident resolves.
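The JSON-validation tip can be sketched as a small wrapper: fence stripping plus a crude repair that keeps only the outermost object. This is a heuristic fallback, not a full JSON repairer, and `safe_parse` is a name of my choosing:

```python
import json

def safe_parse(text: str) -> dict:
    """Parse model output as JSON, tolerating code fences and stray prose."""
    text = text.strip()
    if text.startswith("```"):
        # Drop the opening fence line and the trailing fence.
        text = text.split("\n", 1)[1].rsplit("```", 1)[0]
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Crude repair: keep only the outermost {...} span and retry.
        start, end = text.find("{"), text.rfind("}")
        if start != -1 and end > start:
            return json.loads(text[start:end + 1])
        raise

print(safe_parse('Here you go:\n{"title": "DB pool exhaustion"}')["title"])
```

Swapping `safe_parse` in for the bare `json.loads()` calls in both generators makes the tool resilient to the occasional chatty preamble.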
## What's Next
You now have a working postmortem generator that turns scattered logs into structured, actionable reports. Extend it further by adding Jira ticket creation from action items, Slack delivery with your AI Slack bot, or a web UI that lets engineers annotate and edit before publishing.
The full source code is about 120 lines of Python. Get your EzAI API key and start generating postmortems — your on-call team will thank you.