Security Incident Response
A security incident is a race against an adversary. The goal of Incident Response (IR) is to minimize the **Blast Radius** and **Time to Containment**. Without a practiced runbook, teams often attempt "Eradication" before "Containment," which can tip off the attacker and provoke data destruction.
The NIST IR Lifecycle
1. **Preparation:** Hardening the environment and establishing communication channels (e.g., an out-of-band Signal group).
2. **Detection & Analysis:** Identifying Indicators of Compromise (IoCs) via a SIEM or EDR.
3. **Containment:** Isolating the affected systems. **This is the priority.**
4. **Eradication:** Removing the attacker's persistence (backdoors, web shells).
5. **Recovery:** Restoring services from known-clean backups.
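6. **Post-Incident Activity:** The "Blameless Post-Mortem" and root cause analysis.

NIST SP 800-61 Rev. 2 groups steps 3 to 5 into a single "Containment, Eradication, and Recovery" phase. They are listed separately here because their order matters.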
Containment Strategies
| Type | Action | When to use |
|---|---|---|
| **Network Isolation** | Modify security groups to block all ingress/egress. | Active command-and-control (C2) beaconing. |
| **Account Suspension** | Revoke all OAuth tokens and force password resets. | Compromised credentials or phishing. |
| **System Shutdown** | Hard power-off of virtual machines. | Last resort only, because it destroys volatile memory (RAM). |
| **Process Suspension** | `kill -STOP` the malicious PID. | You need to halt a process but preserve its memory for forensic analysis. |
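
The process-suspension row can be sketched as a small helper. This is a minimal illustration, not a vetted tool: `suspend_pid` is a hypothetical name, and the check assumes a Linux host where `/proc/<pid>/status` is readable.

```bash
# Hypothetical helper: suspend a suspicious process and confirm it stopped.
# Linux-only, because it reads the process state from /proc.
suspend_pid() {
    local pid="$1" key state rest attempt
    # SIGSTOP cannot be caught, blocked, or ignored by the target process.
    kill -STOP "$pid" || return 1
    # Signal delivery is asynchronous, so poll briefly for state "T" (stopped).
    for attempt in 1 2 3 4 5; do
        state=""
        while read -r key state rest; do
            [ "$key" = "State:" ] && break
        done < "/proc/$pid/status"
        if [ "$state" = "T" ]; then
            echo "PID $pid suspended"
            return 0
        fi
        sleep 0.2
    done
    echo "PID $pid did not reach stopped state" >&2
    return 1
}
```

A suspended process keeps its memory mapped, so it can still be dumped. `kill -CONT <pid>` resumes it if the suspension turns out to be a false positive.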
Forensic Evidence Collection
Never "clean" a compromised server until you have captured evidence from it, beginning with the **volatile evidence** that a reboot or shutdown would destroy.
1. **Memory Dump:** Capture RAM for rootkits and fileless malware.
2. **Disk Image:** Bit-for-bit copy of the storage.
3. **Network Logs:** VPC Flow Logs or PCAPs showing data exfiltration.
```bash
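# Example: capturing a memory dump on Linux using LiME.
# Assumes lime.ko was built for the running kernel; run as root.
# Write to external media so evidence on the local disk is not overwritten.
insmod lime.ko "path=/mnt/usb/mem_dump.bin format=raw"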
```
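
Captured evidence is only useful if you can later show it was not altered. One common approach, sketched here under assumptions, is to record a SHA-256 digest of every evidence file immediately after capture. The helper names and the `evidence.sha256` manifest name are hypothetical, and the sketch assumes GNU coreutils and findutils.

```bash
# Hypothetical helper: record a SHA-256 digest for each file in an evidence directory.
# The manifest name evidence.sha256 is an assumption, not a standard.
hash_evidence() {
    local dir="$1"
    (
        # Work inside the directory so the manifest stores relative paths.
        cd "$dir" || exit 1
        # Hash every regular file except the manifest itself, in a stable order.
        find . -type f ! -name evidence.sha256 -print0 \
            | sort -z \
            | xargs -0 -r sha256sum > evidence.sha256
    )
}

# Hypothetical helper: re-check the digests later.
# A non-zero exit status means a file changed or went missing.
verify_evidence() {
    ( cd "$1" && sha256sum --check --quiet evidence.sha256 )
}
```

Running `hash_evidence /mnt/usb` right after the capture and `verify_evidence /mnt/usb` before analysis gives a simple integrity check. Storing a copy of the manifest somewhere the compromised host cannot reach makes it harder to forge.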
The "Golden Hour" Checklist
- **Identify the IC:** One person is the Incident Commander; everyone else reports to them.
- **Open a War Room:** Create a dedicated, locked Slack/Teams channel.
- **Snapshot Everything:** Before you touch the system, take a cloud provider snapshot of the VM.
- **Log Everything:** Assign a "Scribe" to record every decision and timestamp. This is critical for legal compliance and post-mortems.
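
The Scribe's job is easier with a consistent entry format. The sketch below is one possible shape, not a prescribed tool: `scribe` and the `INCIDENT_LOG` variable are hypothetical names, and the tab-separated layout is an assumption.

```bash
# Hypothetical helper: append a UTC-timestamped entry to the incident log.
# INCIDENT_LOG is an assumed variable name; it defaults to ./incident.log.
scribe() {
    local log="${INCIDENT_LOG:-./incident.log}"
    local author="$1"
    shift
    # ISO 8601 in UTC avoids ambiguity when responders span time zones.
    printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$author" "$*" >> "$log"
}

# Usage (not executed here):
#   scribe alice "Declared incident; Bob is Incident Commander"
```

Appending rather than editing keeps the timeline in the order decisions were actually made, which is what a post-mortem or legal review needs.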
Common Failure: The "Whack-a-Mole" Error
A common IR failure is killing an attacker's process as soon as you see it. Advanced persistent threats (APTs) often maintain several independent persistence mechanisms. If you remove only one, the attacker learns they have been detected and can use the others to regain access and hide deeper.
**Fix:** Map the attacker's full footprint first, then remove every persistence mechanism and entry point in one coordinated action.
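
Mapping the footprint usually starts with a read-only inventory of the places where persistence tends to live. The sketch below is illustrative and far from exhaustive: `list_persistence` is a hypothetical helper, the path list is an assumption about a typical Linux host, and `find -printf` requires GNU findutils.

```bash
# Hypothetical helper: list files in common Linux persistence locations.
# Pass a mount point to inspect a mounted disk image instead of the live system.
list_persistence() {
    local root="${1:-/}"
    local path
    root="${root%/}"   # strip a trailing slash so "/" becomes an empty prefix
    # Illustrative locations: cron, systemd units, SSH keys, shell init, preload.
    for path in \
        etc/crontab etc/cron.d var/spool/cron \
        etc/systemd/system \
        root/.ssh/authorized_keys \
        etc/profile.d etc/rc.local etc/ld.so.preload
    do
        [ -e "$root/$path" ] || continue
        # Print modification time and path so that sorting groups recent changes together.
        find "$root/$path" -type f -printf '%TY-%Tm-%TdT%TH:%TM %p\n'
    done | sort
}
```

The output is sorted by modification time, so files changed around the suspected intrusion window appear together. Modification times can be forged, so treat the listing as a starting point for investigation rather than proof.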
Metrics that Matter
- **MTTD (Mean Time to Detect):** How long was the attacker inside before you noticed?
- **MTTC (Mean Time to Contain):** How long did it take to stop the bleeding once noticed?
- **MTTR (Mean Time to Recover):** Total time until the system was back to a "clean" production state.
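
These metrics are simple averages over incident timestamps. The sketch below computes them from a CSV on stdin. `ir_metrics` and the column layout are hypothetical, and it assumes that both containment and recovery are measured from the moment of detection, which your organization may define differently.

```bash
# Hypothetical helper: compute mean IR metrics from a CSV on stdin.
# Assumed columns, all in epoch seconds: compromised,detected,contained,recovered
ir_metrics() {
    awk -F, '
        NR > 1 {                  # skip the header row
            ttd += $2 - $1        # time to detect
            ttc += $3 - $2        # time to contain, measured from detection
            ttr += $4 - $2        # time to recover, measured from detection (assumption)
            n++
        }
        END {
            if (n == 0) { print "no incidents"; exit 1 }
            printf "MTTD=%.1fh MTTC=%.1fh MTTR=%.1fh\n", ttd/n/3600, ttc/n/3600, ttr/n/3600
        }'
}

# Usage (not executed here):
#   printf '%s\n' 'compromised,detected,contained,recovered' \
#       '0,72000,79200,108000' '100000,114400,128800,164800' | ir_metrics
#   # prints: MTTD=12.0h MTTC=3.0h MTTR=12.0h
```

In the usage example, the two incidents took 20 and 4 hours to detect, 2 and 4 hours to contain, and 10 and 14 hours to recover, which gives the means shown.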
Further Reading
- [ThreatModeling](ThreatModeling) — Proactive security design to prevent incidents.
- [SecurityLoggingAndAuditTrails](SecurityLoggingAndAuditTrails) — What you need to have in place *before* the breach.
- [BlamelessPostMortems](BlamelessPostMortems) — Learning from the incident without finger-pointing.