Chapter 9: Incident Response & Remediation

Learning Objectives

Pre-Quiz — Sections 1 & 2: IR Framework & Recovery Procedures

1. What are the four phases of the NIST incident response lifecycle?

Detection, Containment, Recovery, Closure
Preparation; Detection & Analysis; Containment, Eradication & Recovery; Post-Incident Activity
Planning, Execution, Verification, Documentation
Assessment, Response, Remediation, Monitoring

2. During an incident, why can compromised systems' communication tools not be trusted?

They are typically too slow for emergency communications
Attackers may still have access and can observe responder communications
They lack encryption capabilities by default
Legal regulations prohibit their use during investigations

3. What is a “fully hydrated snapshot” in Cohesity’s SnapTree technology?

A snapshot compressed using deduplication to minimize storage
A snapshot that records only incremental changes since the last backup
A complete, self-contained representation of data at a point in time, requiring no incremental reconstruction
A snapshot that has been replicated to a cloud vault

4. What is the correct order of steps in the instant mass restore workflow?

Present storage, identify clean data, power on VMs, cleanup, migrate
Identify clean data, present NFS datastore to ESX, instantiate VMs, storage vMotion, cleanup
Instantiate VMs, scan for threats, migrate storage, validate, cleanup
Scan backups, clone VMs, present to ESX, run vMotion, verify

5. What is the primary purpose of a live mount operation?

To permanently restore a VM to production storage
To boot a VM directly from a backup snapshot without copying data to production storage first
To create an incremental backup of a running VM
To replicate a VM to the FortKnox vault

Section 1: Incident Response Framework

When a cyberattack strikes, the difference between a contained incident and a full-blown catastrophe often comes down to preparation. Cohesity integrates into every phase of the NIST incident response lifecycle, providing both detection and recovery capabilities from a unified platform.

NIST Incident Response Lifecycle

The NIST framework (Special Publication 800-61 Rev. 2) defines four phases that serve as the industry-standard approach to incident response:

| Phase | Description | Cohesity's Role |
| --- | --- | --- |
| 1. Preparation | Establish policies, tools, and teams | Configure backup policies, FortKnox vaulting, anomaly detection, IR simulations |
| 2. Detection & Analysis | Identify and assess incident scope | ML anomaly detection via Helios, CyberScan vulnerability indexing, curated IoC feeds mapped to MITRE ATT&CK |
| 3. Containment, Eradication & Recovery | Stop spread, remove threats, restore | Clean room provisioning, instant mass restore, point-in-time recovery from immutable snapshots |
| 4. Post-Incident Activity | Learn and improve defenses | Audit log analysis, backup policy adjustments, lessons-learned documentation |
flowchart LR
    A["1. Preparation\n- Backup policies\n- FortKnox vaulting\n- Anomaly detection\n- IR simulations"] --> B["2. Detection &\nAnalysis\n- ML anomaly detection\n- CyberScan indexing\n- IoC feeds\n- MITRE ATT&CK mapping"]
    B --> C["3. Containment,\nEradication &\nRecovery\n- Clean room provisioning\n- Instant mass restore\n- Point-in-time recovery"]
    C --> D["4. Post-Incident\nActivity\n- Audit log analysis\n- Policy adjustments\n- Lessons learned"]
    D -->|"Continuous\nImprovement"| A
    style A fill:#2d6a4f,color:#fff
    style B fill:#e76f51,color:#fff
    style C fill:#264653,color:#fff
    style D fill:#6a4c93,color:#fff

Roles and Responsibilities

Effective response requires clearly defined roles bridging security operations and IT operations on a single platform:

| Role | Responsibility | Cohesity Interaction |
| --- | --- | --- |
| Incident Commander | Coordinates overall response | Reviews Helios dashboards, approves recovery plans |
| Security Analyst | Investigates threats, performs forensics | Uses clean room, reviews IoC feeds, applies YARA rules |
| Backup Administrator | Manages recovery operations | Executes instant mass restore, manages FortKnox vault |
| Network Engineer | Implements isolation controls | Configures VLANs for clean room environments |
| Executive Sponsor | Approves business decisions | Receives briefings, approves quorum-based operations |

Communication, Escalation, and IR Integration

During an incident, the communication tools on compromised systems cannot be trusted, because attackers may still have access and can observe responder communications. Organizations need secured backup copies of collaboration and authentication tools, including Active Directory, stored in the cyber vault so responders can establish trustworthy communication channels.

Cohesity integrates into IR plans through three mechanisms:

Cohesity also provides a Cyber Incident Response Simulator — a gamified training tool for practicing response procedures in realistic scenarios before a real attack occurs.

Key Points — Incident Response Framework

Section 2: Recovery Procedures

Point-in-Time Recovery and Snapshot Selection

Point-in-time recovery allows administrators to roll back data to a specific moment before a disruption. Cohesity's patented SnapTree technology uses a B+ tree metadata structure to create fully hydrated snapshots — each snapshot is a complete, self-contained representation of the data at that point in time rather than a chain of incremental changes.
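To make the distinction concrete, the following is a minimal Python sketch contrasting an incremental chain with a fully hydrated view. It is a toy model under stated assumptions, not Cohesity's SnapTree implementation: the real structure is described as a B+ tree, whereas this sketch uses flat dictionaries, and the names `restore_from_chain` and `take_hydrated_snapshot` are hypothetical.

```python
from typing import Dict, List

Blocks = Dict[int, bytes]  # block number -> block contents


def restore_from_chain(full: Blocks, increments: List[Blocks], upto: int) -> Blocks:
    """Incremental chain: replay every delta up to the target to rebuild the state."""
    state = dict(full)
    for delta in increments[:upto]:
        state.update(delta)
    return state


def take_hydrated_snapshot(previous: Blocks, changes: Blocks) -> Blocks:
    """Hydrated view: each snapshot holds a complete block map of its own.

    Unchanged blocks are shared by reference, so no block data is duplicated.
    """
    snapshot = dict(previous)  # copies references only, not block data
    snapshot.update(changes)
    return snapshot


# Illustrative data: one full backup followed by three sets of changes.
base = {0: b"boot", 1: b"data-v1", 2: b"logs-v1"}
deltas = [{1: b"data-v2"}, {2: b"logs-v2"}, {1: b"data-v3"}]

# Build one complete view per backup.
snapshots = [base]
for delta in deltas:
    snapshots.append(take_hydrated_snapshot(snapshots[-1], delta))

# Restoring the state after the second set of changes:
hydrated = snapshots[2]                               # direct lookup, no replay
replayed = restore_from_chain(base, deltas, upto=2)   # must walk the chain
```

Both paths yield the same data, but the hydrated view is read directly, while the chain must be replayed and grows slower to restore as it gets longer.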

Selecting the right recovery point is critical during ransomware incidents. Two key tools assist this decision:

Instant Mass Restore

Instant mass restore recovers hundreds of files, objects, VMs, and databases simultaneously, reducing RTO to minutes. The five-step workflow:

flowchart TD
    S1["Step 1: Identify Clean Backup Data\n(CyberScan vulnerability index)"]
    S2["Step 2: Present NFS Datastore to ESX\n(QoS policies applied)"]
    S3["Step 3: Instantiate & Power On VMs\n(Fully hydrated snapshots)"]
    S4["Step 4: Storage vMotion Migration\n(Automated, non-disruptive)"]
    S5["Step 5: Cleanup Temporary Datastore"]
    S1 --> S2 --> S3 --> S4 --> S5
    style S1 fill:#e76f51,color:#fff
    style S2 fill:#264653,color:#fff
    style S3 fill:#2a9d8f,color:#fff
    style S4 fill:#e9c46a,color:#000
    style S5 fill:#6a4c93,color:#fff

Worked Example: Ransomware Recovery of 50 VMs

An organization detects ransomware at 2:00 AM affecting 50 production VMs. The backup administrator opens CyberScan and identifies that yesterday's 10:00 PM snapshot has a clean vulnerability index. Within minutes, the NFS datastore is presented to ESX and all 50 VMs are instantiated from the clean snapshot. Applications come online immediately while storage vMotion runs in the background. By 8:00 AM, the environment is fully restored.
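The recovery-point decision and the timings in this example can be sketched in a few lines of Python. This is an illustration only: the `Snapshot` class, the `scan_clean` flag, and the calendar date are hypothetical, and the 1:00 AM infected snapshot is added purely to show the filter at work. It does not call any Cohesity API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Snapshot:
    taken_at: datetime
    scan_clean: bool  # hypothetical flag: True if the scan reported no findings


def latest_clean_snapshot(snapshots: List[Snapshot], before: datetime) -> Optional[Snapshot]:
    """Return the most recent clean snapshot taken before the given time, if any."""
    candidates = [s for s in snapshots if s.scan_clean and s.taken_at < before]
    return max(candidates, key=lambda s: s.taken_at, default=None)


# Times from the worked example; the calendar date itself is arbitrary.
detected = datetime(2024, 1, 2, 2, 0)   # ransomware detected at 2:00 AM
restored = datetime(2024, 1, 2, 8, 0)   # environment fully restored at 8:00 AM

snapshots = [
    Snapshot(datetime(2024, 1, 1, 14, 0), scan_clean=True),
    Snapshot(datetime(2024, 1, 1, 22, 0), scan_clean=True),   # yesterday's 10:00 PM snapshot
    Snapshot(datetime(2024, 1, 2, 1, 0), scan_clean=False),   # hypothetical infected snapshot
]

choice = latest_clean_snapshot(snapshots, before=detected)
data_loss_window = detected - choice.taken_at   # changes made after the restore point
full_restore_time = restored - detected         # time until storage migration completed
```

Under these assumptions the data-loss window is four hours (10:00 PM to 2:00 AM) and full restoration takes six hours, although applications are back online much earlier because the VMs run from the backup datastore while migration proceeds.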

Two additional technologies underpin this performance:

Granular Recovery and Live Mount

For targeted attacks or accidental deletions, Cohesity supports granular recovery through Enterprise Search: administrators search by file name, select the most recent clean copy, and restore it.

Live mount boots a VM directly from a backup snapshot without copying data to production storage. Use cases include: rapid backup integrity validation, forensic analysis on snapshots without affecting production, and providing temporary application access during recovery.

Key Points — Recovery Procedures

Pre-Quiz — Sections 3 & 4: Clean Room Recovery & Post-Incident Remediation

6. What is a Minimum Viable Recovery Environment (MVRE)?

A full-scale replica of the production environment for disaster recovery
An environment sized based on IR team guidance with just enough infrastructure for investigation and initial recovery
The smallest possible Cohesity cluster configuration
A temporary VM running forensic tools on the production network

7. What are the four stages of FortKnox cyber vaulting?

Backup, Encrypt, Store, Verify
Replicate, Scan, Lock, Recover
Ingest, Deduplicate, Compress, Archive
Connect, Transfer, Validate, Disconnect

8. Which Cohesity feature can automatically provision a clean room without human intervention when an anomaly is detected?

CyberScan
Helios Dashboard
Recovery Agent
DataLock WORM

9. Why is backup retention data especially valuable for root cause analysis compared to security logs?

Backup data is encrypted and therefore more trustworthy
Backup retention typically extends far beyond security log retention, revealing attack activity from weeks or months before detection
Backup data contains network traffic captures that security logs lack
Security logs are always deleted by attackers, while backups are never targeted

10. Which of the following is NOT a common post-incident backup policy adjustment?

Increased snapshot frequency for critical systems
Extended retention periods for regulatory compliance
Reducing FortKnox vault copies to save costs
Adding new YARA rules based on discovered threat indicators

Section 3: Clean Room and Isolated Recovery

Isolated Recovery Environment (IRE) Concepts

An isolated recovery environment (IRE) is a trusted, segregated infrastructure where security teams examine digital evidence without risk of contamination or detection by adversaries. Cohesity implements this through its clean room architecture, creating a Minimum Viable Recovery Environment (MVRE) — sized based on IR team guidance rather than matching production scale.

| Component | Purpose | Implementation |
| --- | --- | --- |
| Hardware & Sizing | Compute and storage for recovery | Sized per IR team guidance, not production scale |
| Network Isolation | Prevent reinfection and attacker observation | VLANs, separate firewalls, or physical cable disconnection |
| Forensic Tools | Enable threat investigation | Pre-staged in the vault as secured backup copies |
| Gold Images | Trusted OS and app baselines | Critical system images stored in cyber vault |
| Authentication | Establish trusted identity services | Active Directory backup restored in isolation |
| Bare Metal Restoration | Rebuild infrastructure from scratch | Capabilities stored with vaulted resources |

Cohesity FortKnox

Cohesity FortKnox is a SaaS-based cyber vaulting and recovery solution providing an immutable copy of data through a virtual air gap — logical and physical isolation that prevents ransomware from reaching vaulted data even when production systems are fully compromised.

flowchart LR
    R["Replicate\nSecure copy to\nvaulted environment"] --> S["Scan\nAnomaly detection &\nthreat scanning"]
    S --> L["Lock\nImmutable snapshots\nprevent modification"]
    L --> RC["Recover\nRestore with confidence\nin minutes"]
    subgraph VirtualAirGap["Virtual Air Gap"]
        R
        S
        L
        RC
    end
    P["Production\nEnvironment"] -.->|"Isolated\nReplication"| R
    style P fill:#e76f51,color:#fff
    style R fill:#264653,color:#fff
    style S fill:#2a9d8f,color:#fff
    style L fill:#e9c46a,color:#000
    style RC fill:#2d6a4f,color:#fff

FortKnox is available as a fully managed SaaS on AWS, Azure, and GCP, as well as on-premises/self-managed. Security controls include:

The underlying SpanFS immutable file system maintains backup jobs in time-based snapshots that cannot be accessed externally or modified by ransomware. FortKnox modernizes the classic 3-2-1 backup strategy by serving as the critical offsite, isolated copy.
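The 3-2-1 rule is commonly stated as: keep at least three copies of the data, on two different types of media, with at least one copy offsite. A small Python check makes the rule explicit. The `BackupCopy` class and the copy names below are hypothetical, and the sketch assumes the production copy counts toward the three.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackupCopy:
    name: str
    media: str              # e.g. "disk", "tape", "cloud"
    offsite: bool
    isolated: bool = False  # e.g. held behind an air gap


def satisfies_3_2_1(copies: List[BackupCopy]) -> bool:
    """At least 3 copies, on at least 2 media types, with at least 1 offsite."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


def has_isolated_offsite_copy(copies: List[BackupCopy]) -> bool:
    """The stricter condition a cyber vault is meant to add."""
    return any(c.offsite and c.isolated for c in copies)


copies = [
    BackupCopy("production", "disk", offsite=False),
    BackupCopy("local-backup", "disk", offsite=False),
    BackupCopy("vault", "cloud", offsite=True, isolated=True),
]

with_vault = satisfies_3_2_1(copies)
without_vault = satisfies_3_2_1(copies[:2])
```

In this example the vault copy is what satisfies both the second media type and the offsite requirement; removing it leaves two on-site disk copies that fail the rule.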

Clean Room Recovery Procedures

Cohesity implements a three-phase response process within the clean room:

flowchart TD
    DET["Anomaly Detected\n(Recovery Agent)"] --> P1
    subgraph P1["Phase 1: Isolate the Threats"]
        ISO1["Spin up isolated\nclean room"] --> ISO2["Move suspect asset\ninto clean room"]
    end
    P1 --> P2
    subgraph P2["Phase 2: Secure Forensic Investigation"]
        F1["AI/ML threat hunting"] --> F2["IoC feed analysis\n& YARA rules"]
        F2 --> F3["Timeline analysis\nacross snapshots"]
    end
    P2 --> P3
    subgraph P3["Phase 3: Structured Recovery"]
        R1["Eliminate threats"] --> R2["Implement enhanced\ncontrols"]
        R2 --> R3["Restore validated data\nto production"]
    end
    style DET fill:#e76f51,color:#fff
    style P1 fill:#264653,color:#fff
    style P2 fill:#2a9d8f,color:#fff
    style P3 fill:#2d6a4f,color:#fff

Worked Example: Automated Clean Room Provisioning

At 3:15 AM, Cohesity's anomaly detection identifies unusual entropy patterns in a backup job for a critical database server. The Recovery Agent automatically provisions a clean room, instantiates the suspect VM via live mount, and alerts the on-call analyst. By the time the analyst logs in, the isolated environment is ready, with forensic tools loaded, the suspect system mounted, and a 72-hour timeline view available. The analyst confirms the ransomware infection and identifies the initial compromise at 11:42 PM, and the team restores from the 11:00 PM snapshot, all without the attacker knowing the investigation was underway.
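Entropy is a useful signal here because encrypted data looks statistically random, while most business data does not. The sketch below computes Shannon entropy in bits per byte using only the Python standard library. It illustrates the general idea, not Cohesity's detection algorithm, which the source does not describe; the `THRESHOLD` value is a hypothetical cut-off, and random bytes from `os.urandom` stand in for ransomware-encrypted content.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (between 0 and 8)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


# Repetitive structured text, typical of a database export.
plain = b"customer_id,name,balance\n" * 400

# Random bytes of the same length, standing in for encrypted content.
random_like = os.urandom(len(plain))

THRESHOLD = 7.5  # hypothetical cut-off, in bits per byte

plain_suspicious = shannon_entropy(plain) > THRESHOLD
encrypted_suspicious = shannon_entropy(random_like) > THRESHOLD
```

A single high reading is not proof of ransomware, since compressed files and media also score close to 8 bits per byte. The more telling signal is a sudden change relative to the same workload's history, which is why the example speaks of unusual entropy patterns rather than an absolute value.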

Validating Backup Integrity

Before data returns to production, multiple validation mechanisms apply:

Key Points — Clean Room and Isolated Recovery

Section 4: Post-Incident Remediation

Root Cause Analysis

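Cohesity provides several data sources for root cause analysis: timeline analysis across snapshots, anomaly detection history, CyberScan reports, and audit logs. Because backup retention typically extends far beyond security log retention, backup data can reveal attack activity from weeks or months before detection.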

Re-Securing the Environment

After recovery, the environment must be hardened against the specific attack and similar patterns:

  1. Credential Rotation — Rotate all exposed credentials, including backup service accounts
  2. Access Control Review — Tighten RBAC, enable quorum-based access for critical operations, verify MFA enforcement
  3. Network Segmentation Validation — Remediate VLAN and firewall gaps identified during investigation
  4. Patch Management — Address all CyberScan-identified vulnerabilities before returning to production
  5. Enhanced Monitoring — Update anomaly detection baselines and thresholds
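Quorum-based access, mentioned in step 2, means a sensitive operation proceeds only after a minimum number of authorized people approve it. The Python sketch below shows the core logic under stated assumptions: the `QuorumRequest` class, the operation name, and the user names are hypothetical, and the rule that requesters cannot approve their own requests is a common design choice rather than something the source specifies.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class QuorumRequest:
    operation: str
    requester: str
    approvers_allowed: Set[str]
    required: int
    approvals: Set[str] = field(default_factory=set)

    def approve(self, user: str) -> None:
        """Record an approval from an authorized user other than the requester."""
        if user not in self.approvers_allowed:
            raise PermissionError(f"{user} is not an authorized approver")
        if user == self.requester:
            raise PermissionError("requesters cannot approve their own request")
        self.approvals.add(user)  # a set, so repeat approvals count once

    @property
    def authorized(self) -> bool:
        return len(self.approvals) >= self.required


request = QuorumRequest(
    operation="delete-vault-copy",   # hypothetical operation name
    requester="alice",
    approvers_allowed={"alice", "bob", "carol", "dave"},
    required=2,
)

request.approve("bob")
after_one = request.authorized    # one approval is not enough
request.approve("carol")
after_two = request.authorized    # quorum of two reached
```

The point of the control is that a single stolen credential, including a backup service account, cannot authorize a destructive operation on its own.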

Lessons Learned and IR Plan Updates

| Review Area | Questions to Address |
| --- | --- |
| Detection Effectiveness | How long was the attacker present? Could anomaly thresholds be tuned? |
| Response Time | How quickly was the clean room provisioned? Was Recovery Agent behavior appropriate? |
| Recovery Completeness | Were all affected systems identified? Were clean restore points missed? |
| Communication | Did escalation work? Were backup communication tools accessible? |
| Tool Readiness | Were forensic tools and gold images current in the vault? Were YARA rules up to date? |

Backup Policy Adjustments

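Incidents frequently reveal that backup policies need adjustment. Common post-incident changes include increased snapshot frequency for critical systems, extended retention periods for regulatory compliance, and new YARA rules based on discovered threat indicators.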

flowchart TD
    INC["Incident Resolved"] --> RCA["Root Cause Analysis\n- Timeline analysis\n- Anomaly detection history\n- CyberScan reports\n- Audit logs"]
    RCA --> SEC["Re-Secure Environment\n- Credential rotation\n- Access control review\n- Network segmentation\n- Patch management"]
    SEC --> LL["Lessons Learned\n- Detection effectiveness\n- Response time review\n- Recovery completeness"]
    LL --> POL["Backup Policy\nAdjustments"]
    POL --> PREP["Updated Preparation\n(NIST Phase 1)"]
    PREP -.->|"Next Incident\nCycle"| INC
    style INC fill:#e76f51,color:#fff
    style RCA fill:#264653,color:#fff
    style SEC fill:#2a9d8f,color:#fff
    style LL fill:#e9c46a,color:#000
    style POL fill:#6a4c93,color:#fff
    style PREP fill:#2d6a4f,color:#fff

Key Points — Post-Incident Remediation

Post-Quiz — Sections 1 & 2: IR Framework & Recovery Procedures

1. What are the four phases of the NIST incident response lifecycle?

Detection, Containment, Recovery, Closure
Preparation; Detection & Analysis; Containment, Eradication & Recovery; Post-Incident Activity
Planning, Execution, Verification, Documentation
Assessment, Response, Remediation, Monitoring

2. During an incident, why can compromised systems' communication tools not be trusted?

They are typically too slow for emergency communications
Attackers may still have access and can observe responder communications
They lack encryption capabilities by default
Legal regulations prohibit their use during investigations

3. What is a “fully hydrated snapshot” in Cohesity’s SnapTree technology?

A snapshot compressed using deduplication to minimize storage
A snapshot that records only incremental changes since the last backup
A complete, self-contained representation of data at a point in time, requiring no incremental reconstruction
A snapshot that has been replicated to a cloud vault

4. What is the correct order of steps in the instant mass restore workflow?

Present storage, identify clean data, power on VMs, cleanup, migrate
Identify clean data, present NFS datastore to ESX, instantiate VMs, storage vMotion, cleanup
Instantiate VMs, scan for threats, migrate storage, validate, cleanup
Scan backups, clone VMs, present to ESX, run vMotion, verify

5. What is the primary purpose of a live mount operation?

To permanently restore a VM to production storage
To boot a VM directly from a backup snapshot without copying data to production storage first
To create an incremental backup of a running VM
To replicate a VM to the FortKnox vault
Post-Quiz — Sections 3 & 4: Clean Room Recovery & Post-Incident Remediation

6. What is a Minimum Viable Recovery Environment (MVRE)?

A full-scale replica of the production environment for disaster recovery
An environment sized based on IR team guidance with just enough infrastructure for investigation and initial recovery
The smallest possible Cohesity cluster configuration
A temporary VM running forensic tools on the production network

7. What are the four stages of FortKnox cyber vaulting?

Backup, Encrypt, Store, Verify
Replicate, Scan, Lock, Recover
Ingest, Deduplicate, Compress, Archive
Connect, Transfer, Validate, Disconnect

8. Which Cohesity feature can automatically provision a clean room without human intervention when an anomaly is detected?

CyberScan
Helios Dashboard
Recovery Agent
DataLock WORM

9. Why is backup retention data especially valuable for root cause analysis compared to security logs?

Backup data is encrypted and therefore more trustworthy
Backup retention typically extends far beyond security log retention, revealing attack activity from weeks or months before detection
Backup data contains network traffic captures that security logs lack
Security logs are always deleted by attackers, while backups are never targeted

10. Which of the following is NOT a common post-incident backup policy adjustment?

Increased snapshot frequency for critical systems
Extended retention periods for regulatory compliance
Reducing FortKnox vault copies to save costs
Adding new YARA rules based on discovered threat indicators
