Lesson 4 of 5 · 14 min read · Includes quiz

Post-Incident Review & Lessons Learned

Blameless PIRs, root cause analysis, 5-Whys, MTTD/MTTR metrics, updating playbooks

What You'll Learn

  • Explain why post-incident reviews (PIRs) are essential for improving detection, response, and organizational resilience
  • Conduct a blameless PIR using timeline reconstruction, root cause analysis, and the 5-Whys technique
  • Document actionable lessons learned and convert them into detection rule updates, playbook revisions, and process improvements
  • Calculate and interpret key IR metrics — Mean Time to Detect (MTTD), Mean Time to Respond (MTTR), and dwell time
  • Share incident-derived threat intelligence through MISP using appropriate TLP markings
  • Apply PIR methodology in Lab 13.4 to produce a complete post-incident documentation package

Why Post-Incident Reviews Matter

Every incident — whether a ransomware outbreak, a phishing compromise, or a false alarm that consumed four hours of analyst time — contains information your team can use to improve. The post-incident review (PIR) is the mechanism that extracts that information and converts it into concrete improvements.

Organizations that skip PIRs repeat the same mistakes. The same detection gaps that let an attacker dwell for 72 hours will let the next attacker dwell for 72 hours. The same playbook ambiguity that delayed containment by 45 minutes will delay it again. The PIR breaks this cycle.

PIR vs. Post-Mortem: These terms are often used interchangeably. Some organizations reserve "post-mortem" for major incidents and "PIR" for all incidents. The process is identical — what matters is that it happens consistently, regardless of incident severity.

The Blameless Culture Imperative

A PIR fails if people are afraid to speak honestly. If the analyst who missed the initial alert fears punishment, they will not disclose what happened. If the responder who ran the wrong containment command fears blame, the team will never learn from that mistake.

Blameless does not mean free of accountability. It means:

  • Focus on systems and processes, not individuals
  • Ask "what allowed this to happen?" not "who let this happen?"
  • Treat human error as a symptom of process gaps, not a root cause
  • Document findings as opportunities for improvement, not evidence for disciplinary action

Blame-Oriented Language                           | Blameless Language
--------------------------------------------------|------------------------------------------------------------------------------------------
"The analyst failed to detect the alert"          | "The alert was not visible in the analyst's queue due to filter configuration"
"The responder made a mistake during containment" | "The containment playbook did not cover this scenario, leading to an improvised response"
"The admin left the port open"                    | "The change management process did not include a security review step for firewall changes"

Conducting an Effective PIR

When to Hold the PIR

Schedule the PIR within 48-72 hours of incident closure — soon enough that details are fresh, but far enough from the crisis that participants can reflect objectively. For major incidents, hold an initial "hot wash" within 24 hours for immediate fixes, followed by a deeper PIR within a week.

Who Should Attend

Role                                 | Why They Attend
-------------------------------------|------------------------------------------------------------------------
Incident Commander / Lead            | Provides the authoritative timeline and decision log
All responding analysts (L1, L2, L3) | Each tier saw different aspects of the incident
Detection engineering                | Evaluates why existing rules did or did not fire
IT/sysadmin (if involved)            | Explains infrastructure context and containment actions taken
Management (observer)                | Understands resource and process gaps without directing the conversation

The PIR Agenda

A structured agenda prevents the PIR from becoming a rambling war story session:

  1. Timeline reconstruction (20 min) — Build the authoritative chronological record
  2. What went well (10 min) — Identify strengths to preserve
  3. What needs improvement (15 min) — Identify gaps without blame
  4. Root cause analysis (15 min) — Determine why the incident happened and why response gaps existed
  5. Action items (15 min) — Assign specific, measurable improvements with owners and deadlines

Timeline Reconstruction

The timeline is the foundation of every PIR. Reconstruct it from multiple sources:

Source                  | What It Provides
------------------------|------------------------------------------
SIEM/Wazuh logs         | Alert timestamps, rule IDs, severity
TheHive case log        | Analyst actions, task assignments, notes
Communication records   | Slack/Teams messages, email threads, phone logs
Endpoint telemetry      | Velociraptor artifacts, process timelines
Network logs            | Firewall blocks, DNS queries, proxy logs
Change management       | Any infrastructure changes during the window

Build the timeline collaboratively on a shared screen. Each participant fills in their piece. The result should answer: What happened, in what order, and what was the gap between each event and the team's awareness/response?

Time (UTC)    | Event                                    | Source         | Gap
--------------|------------------------------------------|----------------|------------
Day 1 09:00   | Phishing email delivered                 | Email gateway  | —
Day 1 09:04   | User clicks link, credential harvested   | Proxy logs     | —
Day 1 09:15   | Attacker logs in via VPN                 | VPN logs       | Not detected
Day 1 10:30   | Lateral movement to file server          | Wazuh          | Alert fired
Day 1 10:45   | L1 analyst triages alert                 | TheHive        | 15 min
Day 1 11:20   | L2 escalation — confirms compromise      | TheHive        | 35 min
Day 1 11:45   | Containment — VPN session killed, pwd reset | TheHive     | 25 min
Day 1 12:00   | Sweep begins for lateral movement        | Velociraptor   | —
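The gap column in a timeline like this can be computed automatically once timestamps are normalized. A minimal sketch, assuming illustrative timestamps that match the Day 1 example above:

```python
from datetime import datetime

# Hypothetical timeline entries reconstructed during the PIR.
# Dates and times are illustrative, mirroring the Day 1 example above.
timeline = [
    ("2026-01-05 09:00", "Phishing email delivered"),
    ("2026-01-05 09:04", "User clicks link, credential harvested"),
    ("2026-01-05 09:15", "Attacker logs in via VPN"),
    ("2026-01-05 10:30", "Lateral movement alert fires"),
    ("2026-01-05 10:45", "L1 analyst triages alert"),
    ("2026-01-05 11:20", "L2 escalation confirms compromise"),
    ("2026-01-05 11:45", "Containment: VPN session killed"),
]

def gaps_minutes(events):
    """Return the gap in minutes between each pair of consecutive events."""
    times = [datetime.strptime(t, "%Y-%m-%d %H:%M") for t, _ in events]
    return [int((b - a).total_seconds() // 60) for a, b in zip(times, times[1:])]

print(gaps_minutes(timeline))  # [4, 11, 75, 15, 35, 25]
```

The largest gap (75 minutes between the VPN login and the first alert) immediately points the discussion at the detection gap that the root cause analysis below drills into.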

Root Cause Analysis: The 5-Whys Technique

The 5-Whys is a simple but powerful technique for drilling past symptoms to find root causes. Start with the problem statement and ask "why?" repeatedly:

Problem: Attacker dwelled in the environment for 90 minutes before detection.

  1. Why? The initial VPN login from the attacker's IP did not trigger an alert.
  2. Why? The VPN authentication rule only alerts on failed logins, not successful logins from new locations.
  3. Why? The detection rule was written two years ago when geo-impossible travel detection was not available.
  4. Why? There is no scheduled review cycle for detection rules.
  5. Why? Detection engineering resources are allocated 100% to new rule development with no maintenance budget.

Root cause: No process exists for periodic detection rule review and update.

Action item: Implement quarterly detection rule review cycle. Assign detection engineering 20% maintenance time.

Stop when you reach a systemic cause. If your 5-Whys chain ends at "the analyst did not notice" — you have not gone deep enough. Ask why the analyst did not notice: alert fatigue? Poor dashboard design? Inadequate staffing? The root cause is always a process, tooling, or resource gap.

[Figure: Post-incident review template showing the five sections: Timeline, What Went Well, Improvement Areas, Root Cause Analysis, and Action Items with owners and deadlines]

Documenting Lessons Learned

A PIR without documented action items is a conversation that changes nothing. Every lesson learned must be:

  • Specific — Not "improve detection" but "add geo-impossible travel rule to VPN authentication alerts"
  • Assigned — Every action item has a named owner
  • Deadlined — Every action item has a completion date
  • Tracked — Action items live in a tracking system (Jira, TheHive tasks, or a dedicated PIR tracker), not in a document that gets forgotten
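The four requirements above can be enforced with a simple record type. This is an illustrative sketch, not a real tracker schema; the field names and the word-count check are assumptions:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ActionItem:
    description: str   # Specific: concrete, not "improve detection"
    owner: str         # Assigned: a named owner
    due: date          # Deadlined: a completion date
    tracker_id: str    # Tracked: e.g. a Jira or TheHive task reference

    def is_valid(self) -> bool:
        # Reject vague one- or two-word descriptions and missing fields.
        return (len(self.description.split()) >= 4
                and bool(self.owner) and bool(self.tracker_id))

item = ActionItem(
    description="Add geo-impossible travel rule to VPN authentication alerts",
    owner="detection-engineering",
    due=date(2026, 2, 15),
    tracker_id="PIR-2026-017-A1",
)
print(item.is_valid())  # True
```

A record that fails this check ("Improve detection" with no owner) is exactly the kind of lesson that gets forgotten.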

Converting Lessons into Improvements

Lesson Category   | Example Finding                                  | Concrete Action
------------------|--------------------------------------------------|-------------------------------------------------------------------------------------------------
Detection gap     | VPN login from new country was not detected      | Write Sigma rule for geo-impossible VPN logins; deploy to Wazuh within 2 weeks
Playbook gap      | No playbook for compromised VPN credentials      | Create VPN credential compromise playbook; include session kill, password reset, MFA re-enrollment
Tool gap          | No way to isolate VPN users remotely             | Evaluate Velociraptor network isolation capability; test in staging within 30 days
Process gap       | Escalation from L1 to L2 took 35 minutes         | Implement auto-escalation for alerts matching known campaign IOCs; reduce target to <10 min
Communication gap | IT team was not notified of containment actions  | Add IT notification step to all containment playbooks

Updating Playbooks and Detection Rules

The most valuable PIR output is immediate improvement to your defensive posture:

Detection Rule Updates

After every true positive incident, ask:

  1. Did an existing rule fire? If yes, was the alert clear enough for triage? Could it fire earlier in the kill chain?
  2. Should a new rule be created? What behavior would have detected this attack at an earlier stage?
  3. Should existing rules be tuned? Did false positives from unrelated rules slow down triage?

A rule produced by this process might look like the following (Sigma format; the history placeholder variable is illustrative):

title: VPN Login from New Country - Post-PIR Rule
status: test
description: |
  Created after PIR-2026-017. Detects VPN authentication
  from a country not seen in the user's 30-day history.
  Addresses 90-minute detection gap identified in the
  credential harvesting incident.
references:
    - internal:PIR-2026-017
logsource:
    product: vpn
    service: authentication
detection:
    selection:
        action: "login_success"
    filter_known:
        src_country|expand: "%user_country_history_30d%"
    condition: selection and not filter_known
level: high
tags:
    - attack.initial_access
    - attack.t1078.004

Playbook Updates

After every incident where the playbook was insufficient:

  1. Add the missing scenario or decision branch
  2. Update containment actions with the correct tool commands
  3. Add the "lessons learned" reference so future analysts know why the step exists
  4. Test the updated playbook in a tabletop exercise within 30 days

Measuring IR Effectiveness

You cannot improve what you do not measure. Three metrics form the foundation of IR performance tracking:

Mean Time to Detect (MTTD)

Definition: Average time between an attack action and the SOC becoming aware of it.

MTTD = (Time of first detection) - (Time of initial compromise)

Industry benchmarks:

Category                      | MTTD
------------------------------|------------
Best-in-class SOC             | < 1 hour
Average enterprise            | 24-72 hours
Mandiant global median (2024) | 10 days
Target for CyberBlue analysts | < 4 hours

Mean Time to Respond (MTTR)

Definition: Average time between first detection and containment of the threat.

MTTR = (Time of containment) - (Time of first detection)

Category                      | MTTR
------------------------------|-------------
Best-in-class SOC             | < 30 minutes
Average enterprise            | 4-24 hours
Target for CyberBlue analysts | < 2 hours

Dwell Time

Definition: Total time an attacker has access to the environment before being fully eradicated.

Dwell Time = MTTD + MTTR + Eradication Time
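The three formulas reduce to simple timestamp arithmetic. A minimal sketch using the credential-harvesting example from the timeline above; the eradication timestamp is an assumption chosen so the totals match the PIR-2026-017 row in the table below:

```python
from datetime import datetime

def hours_between(start: str, end: str) -> float:
    """Elapsed hours between two 'YYYY-MM-DD HH:MM' timestamps."""
    fmt = "%Y-%m-%d %H:%M"
    return (datetime.strptime(end, fmt)
            - datetime.strptime(start, fmt)).total_seconds() / 3600

# Illustrative timestamps for a single incident.
initial_compromise = "2026-01-05 09:00"
first_detection    = "2026-01-05 10:30"
containment        = "2026-01-05 11:45"
eradication        = "2026-01-05 13:00"  # assumed for this sketch

mttd = hours_between(initial_compromise, first_detection)  # 1.5 h (90 min)
mttr = hours_between(first_detection, containment)         # 1.25 h (75 min)
erad = hours_between(containment, eradication)             # 1.25 h
dwell_time = mttd + mttr + erad                            # 4.0 h

print(mttd, mttr, dwell_time)  # 1.5 1.25 4.0
```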

[Figure: IR metrics dashboard showing MTTD, MTTR, and dwell time trends over 12 months with target threshold lines and incident count overlay]

Tracking Metrics Over Time

Track these metrics per incident and plot trends quarterly:

Incident ID  | Type           | MTTD      | MTTR      | Dwell Time
-------------|----------------|-----------|-----------|------------
PIR-2026-012 | Phishing       | 45 min    | 2 hr      | 3.5 hr
PIR-2026-013 | Malware        | 6 hr      | 1.5 hr    | 9 hr
PIR-2026-014 | Brute Force    | 10 min    | 20 min    | 35 min
PIR-2026-015 | Lateral Mvmt   | 3 hr      | 4 hr      | 12 hr
PIR-2026-016 | Data Exfil     | 18 hr     | 6 hr      | 30 hr
PIR-2026-017 | Cred Harvest   | 90 min    | 75 min    | 4 hr
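Aggregating a table like this is where trends emerge. A minimal sketch that averages the quarter and flags incidents over the CyberBlue targets (MTTD < 4 h, MTTR < 2 h); the hour values are the table's times converted to decimals:

```python
# (incident type, MTTD in hours, MTTR in hours) from the table above.
incidents = [
    ("Phishing",     0.75, 2.0),
    ("Malware",      6.0,  1.5),
    ("Brute Force",  0.17, 0.33),
    ("Lateral Mvmt", 3.0,  4.0),
    ("Data Exfil",   18.0, 6.0),
    ("Cred Harvest", 1.5,  1.25),
]

def averages(rows):
    """Quarterly average MTTD and MTTR, rounded to two decimals."""
    n = len(rows)
    avg_mttd = sum(m for _, m, _ in rows) / n
    avg_mttr = sum(r for _, _, r in rows) / n
    return round(avg_mttd, 2), round(avg_mttr, 2)

# Incident types that breached either target this quarter.
over_target = [name for name, m, r in incidents if m > 4 or r > 2]

print(averages(incidents))  # (4.9, 2.51)
print(over_target)          # ['Malware', 'Lateral Mvmt', 'Data Exfil']
```

Recomputing these averages each quarter, per incident type, is what turns the table into the trend lines discussed next.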

Trends reveal systemic issues. If MTTD is consistently high for phishing incidents, invest in email gateway detection. If MTTR is high for lateral movement, invest in network segmentation and automated containment.

💡 Track improvement after PIR action items are completed. If your Q1 average MTTD for phishing was 2 hours and you deployed a new email detection rule in the Q1 PIR, measure Q2 phishing MTTD to validate the improvement. Metrics without action are vanity; action without metrics is guesswork.

Sharing Threat Intelligence from Incidents

Every incident produces IOCs, TTPs, and behavioral patterns that may be valuable to other teams and organizations. Sharing this intelligence completes the feedback loop.

Internal Sharing

Update your MISP instance with incident-derived IOCs:

IOC Type        | Example                                           | MISP Category
----------------|---------------------------------------------------|------------------
IP addresses    | C2 servers, exfiltration endpoints                | Network activity
Domains         | Phishing domains, C2 domains                      | Network activity
File hashes     | Malware samples, tools dropped                    | Payload delivery
Email addresses | Phishing sender addresses                         | Social network
URLs            | Credential harvesting pages                       | Network activity
YARA rules      | Detection signatures created during investigation | Payload delivery

TLP Markings for Sharing

Apply the appropriate Traffic Light Protocol (TLP) marking before sharing outside your organization:

TLP Level | Sharing Scope                           | When to Use
----------|-----------------------------------------|-------------------------------------------------------------------
TLP:RED   | Named recipients only                   | Incident involves sensitive business data or ongoing investigation
TLP:AMBER | Organization + need-to-know partners    | IOCs from confirmed incidents, not yet public
TLP:GREEN | Community (sector ISAC, trusted groups) | Anonymized IOCs and TTPs useful to peers
TLP:CLEAR | Public                                  | Fully anonymized indicators, published advisories

External Sharing via MISP

1. Create a MISP event for the incident
2. Add IOCs with appropriate types and categories
3. Add ATT&CK tags for the techniques observed
4. Set TLP marking based on sensitivity
5. Add a narrative description (sanitized of internal details)
6. Publish to your sharing groups (sector ISAC, partner organizations)
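In practice these steps are usually scripted with PyMISP. Here is a dependency-free sketch that builds the event payload those steps describe; the field names mirror MISP's event JSON, and all values (event info, tags, IOCs) are illustrative assumptions:

```python
# Sketch of a MISP event payload for the steps above (normally built via PyMISP).
event = {
    # Step 5: sanitized narrative, no internal details.
    "info": "Credential harvesting via phishing (sanitized)",
    "Tag": [
        {"name": "tlp:green"},  # Step 4: TLP marking for sector-ISAC sharing
        # Step 3: ATT&CK tagging (galaxy tag format; technique from this lesson's example)
        {"name": 'misp-galaxy:mitre-attack-pattern="Valid Accounts: Cloud Accounts - T1078.004"'},
    ],
    "Attribute": [],
}

def add_ioc(event, ioc_type, category, value):
    """Step 2: attach an IOC with its MISP type and category."""
    event["Attribute"].append(
        {"type": ioc_type, "category": category, "value": value}
    )

# Illustrative IOCs only; never real internal data.
add_ioc(event, "domain", "Network activity", "login-portal.example")
add_ioc(event, "url", "Network activity", "https://login-portal.example/auth")

print(len(event["Attribute"]))  # 2
```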

🚨 Sanitize before sharing. Never include internal hostnames, IP ranges, employee names, or business-specific details in externally shared intelligence. Share the IOCs and TTPs, not the story of your organization's vulnerability.

Building Institutional Memory

Individual analysts come and go. Institutional memory — the collective knowledge of past incidents, decisions, and improvements — must persist regardless of staff turnover.

The PIR Knowledge Base

Maintain a searchable archive of all PIRs. When a new incident occurs, analysts should search past PIRs for similar scenarios:

  • "Have we seen this malware family before? What worked last time?"
  • "We had a similar VPN compromise in Q3 — what did the PIR recommend?"
  • "This looks like the same TTP as PIR-2026-013 — check if the action items were implemented"

Connecting PIRs to Detection Coverage

Map every PIR to the ATT&CK techniques involved. Over time, this creates a heat map showing which techniques your organization has encountered and whether you have adequate detection for each:

ATT&CK Technique     | Incidents | Detection Rule Exists | Rule Quality
---------------------|-----------|-----------------------|---------------
T1566.001 Phishing   | 5         | Yes                   | High (tuned)
T1078.004 Cloud Acc  | 2         | Yes (post-PIR)        | Medium (new)
T1021.001 RDP        | 3         | Yes                   | High (tuned)
T1053.005 Sched Task | 1         | No                    | — COVERAGE GAP
T1071.001 Web C2     | 4         | Yes                   | High (tuned)
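Once the mapping is data rather than a static table, coverage gaps fall out of a one-line query. A minimal sketch over the rows above:

```python
# (technique ID, incident count, detection rule exists) from the table above.
coverage = [
    ("T1566.001", 5, True),
    ("T1078.004", 2, True),
    ("T1021.001", 3, True),
    ("T1053.005", 1, False),
    ("T1071.001", 4, True),
]

# Techniques seen in real incidents but lacking any detection rule.
gaps = [tech for tech, count, has_rule in coverage if not has_rule]
print(gaps)  # ['T1053.005']
```

Each technique in `gaps` is a ready-made, high-priority action item for the next PIR cycle.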

Key Takeaways

  • Post-incident reviews transform every incident — true positive or false positive — into concrete improvements to detection, response, and organizational process
  • Blameless culture is non-negotiable: focus on systems and processes, not individuals, or people will hide information that the team needs to improve
  • The 5-Whys technique drills past symptoms to systemic root causes — keep asking until you reach a process, tooling, or resource gap
  • Every PIR action item must be specific, assigned, deadlined, and tracked — undocumented lessons are forgotten lessons
  • MTTD, MTTR, and dwell time are the three foundational IR metrics; track them per incident and trend quarterly to identify systemic improvement opportunities
  • Share incident-derived intelligence through MISP with appropriate TLP markings — your IOCs may prevent the same attack at a partner organization
  • Build institutional memory by maintaining a searchable PIR archive linked to ATT&CK techniques and detection coverage

What's Next

You have learned how to extract maximum value from every incident through structured post-incident reviews. In Lesson 13.5 — Incident Reporting & Communication, you will learn to write professional incident reports for different audiences — from technical deep-dives for your security team to executive summaries for leadership — and understand regulatory reporting requirements that apply when incidents involve protected data.

Knowledge Check: Post-Incident Review

10 questions · 70% to pass

1. What is the primary purpose of a blameless post-incident review?

2. Using the 5-Whys technique, an analyst determines that a 90-minute detection gap occurred because there is no process for periodic detection rule review. What type of root cause is this?

3. When should a post-incident review be scheduled after incident closure?

4. Which IR metric measures the time between an attack action and the SOC becoming aware of it?

5. What TLP marking should you apply when sharing incident IOCs with your sector ISAC (Information Sharing and Analysis Center)?

6. In Lab 13.4, you conduct a PIR for a simulated incident. Which of the following is the FIRST step in the PIR agenda?

7. An organization's MTTD for phishing incidents averaged 2 hours in Q1. After implementing a new email detection rule from a Q1 PIR, what should they measure in Q2?

8. What must every PIR action item include to be effective?

9. In Lab 13.4, you reconstruct an incident timeline from multiple sources. Why is timeline reconstruction done collaboratively with all responders rather than by a single person?

10. Before sharing incident-derived IOCs externally through MISP, what critical step must you perform?