Security Incident Response

A security incident is a race against an adversary. The goal of Incident Response (IR) is to minimize the Blast Radius and Time to Containment. Without a practiced runbook, teams often perform "Eradication" before "Containment," which tips off the attacker and leads to data destruction.

The NIST IR Lifecycle

Preparation: Hardening the environment and establishing communication channels (e.g., an out-of-band Signal group).
Detection & Analysis: Identifying the "Indicator of Compromise" (IoC) via SIEM or EDR.
Containment: Isolating the affected systems. This is the priority.
Eradication: Removing the attacker's persistence (backdoors, web shells).
Recovery: Restoring services from known-clean backups.
Post-Incident Activity: The "Blameless Post-Mortem" and root cause analysis.

Containment Strategies

Type	Action	When to use?
Network Isolation	Modify security groups to block all ingress/egress.	Active C2 (Command & Control) beaconing.
Account Suspension	Revoke all OAuth tokens and force password resets.	Compromised credentials / Phishing.
System Shutdown	Hard power-off of virtual machines.	Last resort; destroys volatile memory (RAM).
Process Suspension	`kill -STOP` the malicious PID.	Preserves memory for forensic analysis.

Forensic Evidence Collection

Never "clean" a compromised server until you have captured the volatile evidence.

Memory Dump: Capture RAM for rootkits and fileless malware.
Disk Image: Bit-for-bit copy of the storage.
Network Logs: VPC Flow Logs or PCAPs showing data exfiltration.

# Example: capturing a memory dump on Linux (using LiME)
insmod lime.ko "path=/mnt/usb/mem_dump.bin format=raw"

The "Golden Hour" Checklist

Identify the IC: One person is the Incident Commander; everyone else reports to them.
Open a War Room: Create a dedicated, locked Slack/Teams channel.
Snapshot Everything: Before you touch the system, take a cloud provider snapshot of the VM.
Log Everything: Assign a "Scribe" to record every decision and timestamp. This is critical for legal compliance and post-mortems.

Common Failure: The "Whack-a-Mole" Error

The most common IR failure is killing an attacker's process as soon as you see it. Advanced persistent threats (APTs) often have 3-4 different persistence mechanisms. If you kill one, they will use the others to hide deeper. Fix: Observe and map the attacker's footprint before initiating a coordinated wipe of all entry points simultaneously.

Metrics that Matter

MTTD (Mean Time to Detect): How long was the attacker inside before you noticed?
MTTC (Mean Time to Contain): How long did it take to stop the bleeding once noticed?
MTTR (Mean Time to Recover): Total time until the system was back to a "clean" production state.