Study Guide: Chapter 9 — Incident Response & Remediation

Pre-Quiz — Sections 1 & 2: IR Framework & Recovery Procedures

1. What are the four phases of the NIST incident response lifecycle?

Detection, Containment, Recovery, Closure

Preparation; Detection & Analysis; Containment, Eradication & Recovery; Post-Incident Activity

Planning, Execution, Verification, Documentation

Assessment, Response, Remediation, Monitoring

2. During an incident, why can compromised systems' communication tools not be trusted?

They are typically too slow for emergency communications

Attackers may still have access and can observe responder communications

They lack encryption capabilities by default

Legal regulations prohibit their use during investigations

3. What is a “fully hydrated snapshot” in Cohesity’s SnapTree technology?

A snapshot compressed using deduplication to minimize storage

A snapshot that records only incremental changes since the last backup

A complete, self-contained representation of data at a point in time, requiring no incremental reconstruction

A snapshot that has been replicated to a cloud vault

4. What is the correct order of steps in the instant mass restore workflow?

Present storage, identify clean data, power on VMs, cleanup, migrate

Identify clean data, present NFS datastore to ESX, instantiate VMs, storage vMotion, cleanup

Instantiate VMs, scan for threats, migrate storage, validate, cleanup

Scan backups, clone VMs, present to ESX, run vMotion, verify

5. What is the primary purpose of a live mount operation?

To permanently restore a VM to production storage

To boot a VM directly from a backup snapshot without copying data to production storage first

To create an incremental backup of a running VM

To replicate a VM to the FortKnox vault

Section 1: Incident Response Framework

When a cyberattack strikes, the difference between a contained incident and a full-blown catastrophe often comes down to preparation. Cohesity integrates into every phase of the NIST incident response lifecycle, providing both detection and recovery capabilities from a unified platform.

NIST Incident Response Lifecycle

The NIST framework defines four phases that serve as the industry-standard approach to incident response:

Phase	Description	Cohesity's Role
1. Preparation	Establish policies, tools, and teams	Configure backup policies, FortKnox vaulting, anomaly detection, IR simulations
2. Detection & Analysis	Identify and assess incident scope	ML anomaly detection via Helios, CyberScan vulnerability indexing, curated IoC feeds mapped to MITRE ATT&CK
3. Containment, Eradication & Recovery	Stop spread, remove threats, restore	Clean room provisioning, instant mass restore, point-in-time recovery from immutable snapshots
4. Post-Incident Activity	Learn and improve defenses	Audit log analysis, backup policy adjustments, lessons-learned documentation

flowchart LR
    A["1. Preparation\n- Backup policies\n- FortKnox vaulting\n- Anomaly detection\n- IR simulations"] --> B["2. Detection &\nAnalysis\n- ML anomaly detection\n- CyberScan indexing\n- IoC feeds\n- MITRE ATT&CK mapping"]
    B --> C["3. Containment,\nEradication &\nRecovery\n- Clean room provisioning\n- Instant mass restore\n- Point-in-time recovery"]
    C --> D["4. Post-Incident\nActivity\n- Audit log analysis\n- Policy adjustments\n- Lessons learned"]
    D -->|"Continuous\nImprovement"| A
    style A fill:#2d6a4f,color:#fff
    style B fill:#e76f51,color:#fff
    style C fill:#264653,color:#fff
    style D fill:#6a4c93,color:#fff

Roles and Responsibilities

Effective response requires clearly defined roles bridging security operations and IT operations on a single platform:

Role	Responsibility	Cohesity Interaction
Incident Commander	Coordinates overall response	Reviews Helios dashboards, approves recovery plans
Security Analyst	Investigates threats, performs forensics	Uses clean room, reviews IoC feeds, applies YARA rules
Backup Administrator	Manages recovery operations	Executes instant mass restore, manages FortKnox vault
Network Engineer	Implements isolation controls	Configures VLANs for clean room environments
Executive Sponsor	Approves business decisions	Receives briefings, approves quorum-based operations

Communication, Escalation, and IR Integration

During an incident, communication tools running on compromised systems cannot be trusted, because attackers may still have access and can observe responder communications. Organizations therefore need secured backup copies of collaboration and authentication tools, including Active Directory, stored in the cyber vault so that responders can establish trustworthy communication channels.

Cohesity integrates into IR plans through three mechanisms:

CERT Partnerships — Access to expert responders who understand the Cohesity platform
Third-Party Tool Interoperability — Seamless integration with existing security toolchains
Managed Recovery Services — Partnership with 11:11 Systems for fully isolated clean room environments

Cohesity also provides a Cyber Incident Response Simulator — a gamified training tool for practicing response procedures in realistic scenarios before a real attack occurs.

Section 2: Recovery Procedures

Point-in-Time Recovery and Snapshot Selection

Point-in-time recovery allows administrators to roll back data to a specific moment before a disruption. Cohesity's patented SnapTree technology uses a B+ tree metadata structure to create fully hydrated snapshots — each snapshot is a complete, self-contained representation of the data at that point in time rather than a chain of incremental changes.
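The difference between a fully hydrated snapshot and an incremental chain can be illustrated with a toy model. The sketch below is not SnapTree and does not implement a B+ tree; `SnapshotStore`, `take_snapshot`, and `read` are hypothetical names chosen for illustration. It only demonstrates the property described above: every snapshot exposes a complete view, unchanged blocks are shared rather than copied, and a read never replays earlier increments.

```python
from types import MappingProxyType


class SnapshotStore:
    """Toy model: each snapshot is a complete, read-only block map.

    Hypothetical illustration only. A real implementation would share
    tree nodes instead of copying the whole map of references.
    """

    def __init__(self):
        self._snapshots = []  # list index doubles as the snapshot id

    def take_snapshot(self, changed_blocks, deleted=()):
        # Start from the previous complete view (references, not data copies).
        view = dict(self._snapshots[-1]) if self._snapshots else {}
        view.update(changed_blocks)
        for block_id in deleted:
            view.pop(block_id, None)
        self._snapshots.append(MappingProxyType(view))
        return len(self._snapshots) - 1

    def read(self, snapshot_id, block_id):
        # Single lookup in one self-contained view; no increments are replayed.
        return self._snapshots[snapshot_id][block_id]


store = SnapshotStore()
monday = store.take_snapshot({"a": b"alpha", "b": b"beta"})
tuesday = store.take_snapshot({"b": b"BETA"})  # only block "b" changed
```

Reading block `"a"` from `tuesday` works even though that snapshot only recorded a change to `"b"`, and the older `monday` view of `"b"` is left untouched.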

Selecting the right recovery point is critical during ransomware incidents. Two key tools assist this decision:

CyberScan — Displays each snapshot's vulnerability index and actionable recommendations to identify clean restore points
Helios Anomaly Detection — Uses ML to monitor change rates, ingest patterns, and entropy levels, alerting on deviations from normal baselines
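To make the idea of entropy levels and deviations from a baseline concrete, here is a minimal sketch. It is not how Helios is implemented; the function names, the z-score approach, and the threshold of 3.0 are assumptions made for illustration. It computes the Shannon entropy of a byte sample (encrypted data tends toward 8 bits per byte) and flags a new measurement that sits far from the historical mean.

```python
import math
from collections import Counter
from statistics import mean, stdev


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` when it lies more than `threshold` standard
    deviations from the mean of `history` (assumed illustrative rule)."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold
```

The same `is_anomalous` check could be applied to any of the monitored signals, such as daily change rate or ingest volume, as long as a history of prior values is available.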

Instant Mass Restore

Instant mass restore recovers hundreds of files, objects, VMs, and databases simultaneously, which can reduce RTO to minutes. The workflow has five steps:

flowchart TD
    S1["Step 1: Identify Clean Backup Data\n(CyberScan vulnerability index)"]
    S2["Step 2: Present NFS Datastore to ESX\n(QoS policies applied)"]
    S3["Step 3: Instantiate & Power On VMs\n(Fully hydrated snapshots)"]
    S4["Step 4: Storage vMotion Migration\n(Automated, non-disruptive)"]
    S5["Step 5: Cleanup Temporary Datastore"]
    S1 --> S2 --> S3 --> S4 --> S5
    style S1 fill:#e76f51,color:#fff
    style S2 fill:#264653,color:#fff
    style S3 fill:#2a9d8f,color:#fff
    style S4 fill:#e9c46a,color:#000
    style S5 fill:#6a4c93,color:#fff

Animation Slot: Step-by-step instant mass restore workflow showing VMs coming online from Cohesity cluster NFS datastore, then migrating via storage vMotion to production storage

Worked Example: Ransomware Recovery of 50 VMs

An organization detects ransomware at 2:00 AM affecting 50 production VMs. The backup administrator opens CyberScan and identifies that yesterday's 10:00 PM snapshot has a clean vulnerability index. Within minutes, the NFS datastore is presented to ESX and all 50 VMs are instantiated from the clean snapshot. Applications come online immediately while storage vMotion runs in the background. By 8:00 AM, the environment is fully restored.
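The snapshot choice in this example follows a simple rule: take the most recent snapshot that is marked clean and predates the incident. A minimal sketch of that rule follows. The `Snapshot` class, its `is_clean` flag, and `pick_restore_point` are hypothetical stand-ins, not part of any Cohesity API, and the dates are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class Snapshot:
    taken_at: datetime
    is_clean: bool  # hypothetical flag standing in for a clean scan result


def pick_restore_point(
    snapshots: Sequence[Snapshot], not_after: datetime
) -> Optional[Snapshot]:
    """Return the latest clean snapshot taken strictly before `not_after`."""
    candidates = [s for s in snapshots if s.is_clean and s.taken_at < not_after]
    return max(candidates, key=lambda s: s.taken_at, default=None)


# Illustrative data: a clean 10:00 PM snapshot and a later, suspect one.
snapshots = [
    Snapshot(datetime(2024, 1, 1, 22, 0), is_clean=True),
    Snapshot(datetime(2024, 1, 2, 1, 0), is_clean=False),
]
detected_at = datetime(2024, 1, 2, 2, 0)
choice = pick_restore_point(snapshots, detected_at)
```

If the investigation later establishes an earlier compromise time, passing that time as `not_after` narrows the choice to snapshots that predate the compromise rather than the detection.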

Two additional technologies underpin this performance:

MegaFile Technology — Distributes large files across cluster nodes for parallel operations, achieving 3x performance improvements
SpanFS Distributed File System — Immediately surfaces data via NFS/SMB during background recovery

Granular Recovery and Live Mount

For targeted attacks or accidental deletions, Cohesity supports granular recovery through Enterprise Search: an administrator searches by file name, selects the most recent clean copy, and restores it.

Live mount boots a VM directly from a backup snapshot without first copying data to production storage. Use cases include rapid backup integrity validation, forensic analysis on snapshots without affecting production, and temporary application access during recovery.

Pre-Quiz — Sections 3 & 4: Clean Room Recovery & Post-Incident Remediation

6. What is a Minimum Viable Recovery Environment (MVRE)?

A full-scale replica of the production environment for disaster recovery

An environment sized based on IR team guidance with just enough infrastructure for investigation and initial recovery

The smallest possible Cohesity cluster configuration

A temporary VM running forensic tools on the production network

7. What are the four stages of FortKnox cyber vaulting?

Backup, Encrypt, Store, Verify

Replicate, Scan, Lock, Recover

Ingest, Deduplicate, Compress, Archive

Connect, Transfer, Validate, Disconnect

8. Which Cohesity feature can automatically provision a clean room without human intervention when an anomaly is detected?

CyberScan

Helios Dashboard

Recovery Agent

DataLock WORM

9. Why is backup retention data especially valuable for root cause analysis compared to security logs?

Backup data is encrypted and therefore more trustworthy

Backup retention typically extends far beyond security log retention, revealing attack activity from weeks or months before detection

Backup data contains network traffic captures that security logs lack

Security logs are always deleted by attackers, while backups are never targeted

10. Which of the following is NOT a common post-incident backup policy adjustment?

Increased snapshot frequency for critical systems

Extended retention periods for regulatory compliance

Reducing FortKnox vault copies to save costs

Adding new YARA rules based on discovered threat indicators

Section 3: Clean Room and Isolated Recovery

Isolated Recovery Environment (IRE) Concepts

An isolated recovery environment (IRE) is a trusted, segregated infrastructure where security teams examine digital evidence without risk of contamination or detection by adversaries. Cohesity implements this through its clean room architecture, creating a Minimum Viable Recovery Environment (MVRE) — sized based on IR team guidance rather than matching production scale.

Component	Purpose	Implementation
Hardware & Sizing	Compute and storage for recovery	Sized per IR team guidance, not production scale
Network Isolation	Prevent reinfection and attacker observation	VLANs, separate firewalls, or physical cable disconnection
Forensic Tools	Enable threat investigation	Pre-staged in the vault as secured backup copies
Gold Images	Trusted OS and app baselines	Critical system images stored in cyber vault
Authentication	Establish trusted identity services	Active Directory backup restored in isolation
Bare Metal Restoration	Rebuild infrastructure from scratch	Capabilities stored with vaulted resources

Cohesity FortKnox

Cohesity FortKnox is a SaaS-based cyber vaulting and recovery solution providing an immutable copy of data through a virtual air gap — logical and physical isolation that prevents ransomware from reaching vaulted data even when production systems are fully compromised.

flowchart LR
    R["Replicate\nSecure copy to\nvaulted environment"] --> S["Scan\nAnomaly detection &\nthreat scanning"]
    S --> L["Lock\nImmutable snapshots\nprevent modification"]
    L --> RC["Recover\nRestore with confidence\nin minutes"]
    subgraph VirtualAirGap["Virtual Air Gap"]
        R
        S
        L
        RC
    end
    P["Production\nEnvironment"] -.->|"Isolated\nReplication"| R
    style P fill:#e76f51,color:#fff
    style R fill:#264653,color:#fff
    style S fill:#2a9d8f,color:#fff
    style L fill:#e9c46a,color:#000
    style RC fill:#2d6a4f,color:#fff

FortKnox is available as a fully managed SaaS on AWS, Azure, and GCP, as well as on-premises/self-managed. Security controls include:

Multi-Factor Authentication (MFA) — Only verified users access the vault
Quorum-Based Access — Multiple authorized personnel must approve critical operations
Role-Based Access Controls (RBAC) — Restricts vault interaction to authorized personnel
DataLock / WORM — Write Once Read Many immutability that even security officers cannot modify or delete
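Quorum-based access can be pictured as a request that stays blocked until enough distinct, authorized people approve it. The sketch below is a generic illustration, not the FortKnox implementation. `VaultRequest` and its fields are hypothetical, and the rule that a requester cannot approve their own request is an added assumption.

```python
from dataclasses import dataclass, field


@dataclass
class VaultRequest:
    """Hypothetical critical operation awaiting quorum approval."""
    operation: str
    requester: str
    required_approvals: int
    approvers: set = field(default_factory=set)

    def approve(self, user: str, authorized_users: set) -> None:
        if user not in authorized_users:
            raise PermissionError(f"{user} is not authorized to approve")
        if user == self.requester:
            # Assumed rule: self-approval does not count toward the quorum.
            raise PermissionError("requester cannot approve their own request")
        self.approvers.add(user)  # a set, so repeat approvals count once

    @property
    def approved(self) -> bool:
        return len(self.approvers) >= self.required_approvals


authorized = {"alice", "bob", "carol"}
request = VaultRequest("change-retention", requester="alice", required_approvals=2)
request.approve("bob", authorized)
request.approve("bob", authorized)  # duplicate approval has no effect
pending = request.approved           # still one approver short
request.approve("carol", authorized)
```

Storing approvers in a set is the key design choice here: it guarantees that one person approving twice can never satisfy a two-person quorum.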

The underlying SpanFS immutable file system maintains backup jobs in time-based snapshots that cannot be accessed externally or modified by ransomware. FortKnox modernizes the classic 3-2-1 backup strategy by serving as the critical offsite, isolated copy.

Clean Room Recovery Procedures

Cohesity implements a three-phase response process within the clean room:

flowchart TD
    DET["Anomaly Detected\n(Recovery Agent)"] --> P1
    subgraph P1["Phase 1: Isolate the Threats"]
        ISO1["Spin up isolated\nclean room"] --> ISO2["Move suspect asset\ninto clean room"]
    end
    P1 --> P2
    subgraph P2["Phase 2: Secure Forensic Investigation"]
        F1["AI/ML threat hunting"] --> F2["IoC feed analysis\n& YARA rules"]
        F2 --> F3["Timeline analysis\nacross snapshots"]
    end
    P2 --> P3
    subgraph P3["Phase 3: Structured Recovery"]
        R1["Eliminate threats"] --> R2["Implement enhanced\ncontrols"]
        R2 --> R3["Restore validated data\nto production"]
    end
    style DET fill:#e76f51,color:#fff
    style P1 fill:#264653,color:#fff
    style P2 fill:#2a9d8f,color:#fff
    style P3 fill:#2d6a4f,color:#fff

Animation Slot: Clean room provisioning sequence — anomaly detected, isolated environment spins up, suspect VM live-mounted, forensic tools loaded, investigation timeline displayed

Worked Example: Automated Clean Room Provisioning

At 3:15 AM, Cohesity's anomaly detection identifies unusual entropy patterns in a backup job for a critical database server. The Recovery Agent automatically provisions a clean room, instantiates the suspect VM via live mount, and alerts the on-call analyst. By the time the analyst logs in, the isolated environment is ready: forensic tools are loaded, the suspect system is mounted, and a 72-hour timeline view is available. The analyst confirms a ransomware infection and identifies the initial compromise at 11:42 PM. The team then restores from the 11:00 PM snapshot, the last one taken before the compromise, all without the attacker knowing the investigation was underway.

Validating Backup Integrity

Before data returns to production, multiple validation mechanisms apply:

CyberScan vulnerability indexing — Per-snapshot cleanliness assessment
Integrated anomaly detection — Verifies data cleanliness during FortKnox "Scan" phase
Live mount testing — Boot recovered VM in isolation to verify application functionality
Custom YARA rule scanning — Targeted search for specific threat indicators from the investigation

Section 4: Post-Incident Remediation

Root Cause Analysis

Cohesity provides several data sources for root cause analysis:

Timeline Analysis — Compare filesystem states across multiple snapshots to trace attack progression; backup retention extends far beyond security log retention
Anomaly Detection History — Helios records historical change rates, ingest patterns, and entropy levels; retrospective review may reveal early indicators
CyberScan Vulnerability Reports — Snapshot-level assessments reveal when vulnerabilities were introduced
Audit Logs — Document all administrative actions, access attempts, and configuration changes
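Timeline analysis across snapshots amounts to comparing filesystem states and asking when a given change first appeared. The sketch below assumes each snapshot can be summarized as a manifest mapping file path to content hash. That manifest format and the function names are assumptions for illustration, not a Cohesity export format.

```python
def diff_manifests(before: dict, after: dict) -> dict:
    """Compare two {path: content_hash} manifests from consecutive snapshots."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    modified = sorted(
        path for path in before.keys() & after.keys() if before[path] != after[path]
    )
    return {"added": added, "removed": removed, "modified": modified}


def first_appearance(timeline, path):
    """Return the label of the earliest snapshot containing `path`.

    `timeline` is a list of (label, manifest) pairs in chronological order.
    """
    for label, manifest in timeline:
        if path in manifest:
            return label
    return None


# Illustrative manifests; paths and hashes are invented.
day1 = {"/etc/hosts": "h1", "/srv/app.bin": "h2"}
day2 = {"/etc/hosts": "h1", "/srv/app.bin": "h9", "/tmp/dropper.exe": "h7"}
timeline = [("day1", day1), ("day2", day2)]
changes = diff_manifests(day1, day2)
```

Because backup retention usually extends well beyond log retention, the same comparison can be walked back across weeks of snapshots to bracket when a suspicious file was introduced.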

Re-Securing the Environment

After recovery, the environment must be hardened against the specific attack and similar patterns:

Credential Rotation — Rotate all exposed credentials, including backup service accounts
Access Control Review — Tighten RBAC, enable quorum-based access for critical operations, verify MFA enforcement
Network Segmentation Validation — Remediate VLAN and firewall gaps identified during investigation
Patch Management — Address all CyberScan-identified vulnerabilities before returning to production
Enhanced Monitoring — Update anomaly detection baselines and thresholds

Lessons Learned and IR Plan Updates

Review Area	Questions to Address
Detection Effectiveness	How long was the attacker present? Could anomaly thresholds be tuned?
Response Time	How quickly was the clean room provisioned? Was Recovery Agent behavior appropriate?
Recovery Completeness	Were all affected systems identified? Were clean restore points missed?
Communication	Did escalation work? Were backup communication tools accessible?
Tool Readiness	Were forensic tools and gold images current in the vault? Were YARA rules up to date?

Backup Policy Adjustments

Incidents frequently reveal that backup policies need adjustment. Common post-incident changes include:

Increased snapshot frequency for critical systems (reducing RPO)
Extended retention periods for regulatory and legal requirements
Additional FortKnox vault copies for systems previously protected only by local snapshots
Updated DataLock/WORM policies extending immutability windows based on observed attack dwell time
New YARA rules reflecting threat indicators discovered during investigation
Compliance alignment with NIST, ISO 27040, and DORA frameworks
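The link between dwell time, immutability window, and snapshot frequency can be checked with simple arithmetic. The sketch below rests on a simplifying assumption: snapshots are taken at a fixed interval, counted backward from the moment of detection. The function name and example figures are hypothetical. It counts how many locked snapshots predate the compromise and are therefore candidate clean restore points.

```python
from datetime import timedelta


def clean_locked_snapshots(
    lock_window: timedelta, dwell_time: timedelta, snapshot_interval: timedelta
) -> int:
    """Count snapshots older than the dwell time but still inside the lock window.

    Assumes one snapshot every `snapshot_interval`, measured back from detection.
    """
    still_locked = lock_window // snapshot_interval    # snapshots within the window
    after_compromise = dwell_time // snapshot_interval  # of those, taken during dwell
    return max(0, still_locked - after_compromise)


# Illustrative figures: attacker present for 21 days before detection.
dwell = timedelta(days=21)
short_window = clean_locked_snapshots(timedelta(days=14), dwell, timedelta(days=1))
long_window = clean_locked_snapshots(timedelta(days=30), dwell, timedelta(days=1))
```

With a 14-day window no locked snapshot predates the compromise, while a 30-day window leaves nine. This is the reasoning behind extending immutability windows based on observed dwell time.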

flowchart TD
    INC["Incident Resolved"] --> RCA["Root Cause Analysis\n- Timeline analysis\n- Anomaly detection history\n- CyberScan reports\n- Audit logs"]
    RCA --> SEC["Re-Secure Environment\n- Credential rotation\n- Access control review\n- Network segmentation\n- Patch management"]
    SEC --> LL["Lessons Learned\n- Detection effectiveness\n- Response time review\n- Recovery completeness"]
    LL --> POL["Backup Policy\nAdjustments"]
    POL --> PREP["Updated Preparation\n(NIST Phase 1)"]
    PREP -.->|"Next Incident\nCycle"| INC
    style INC fill:#e76f51,color:#fff
    style RCA fill:#264653,color:#fff
    style SEC fill:#2a9d8f,color:#fff
    style LL fill:#e9c46a,color:#000
    style POL fill:#6a4c93,color:#fff
    style PREP fill:#2d6a4f,color:#fff

Post-Quiz — Sections 1 & 2: IR Framework & Recovery Procedures