Chapter 7: Digital Forensics and Malware Analysis

Learning Objectives

Section 1: Evidence Types and Classification

Digital evidence is any binary data, stored on or transmitted by a computing device, that can be used in an investigation. Its sources span hard drive images, packet captures, audit logs, deleted file fragments, and memory dumps. Not all evidence is equally useful: courts apply a classification hierarchy that determines how much weight each piece carries and what documentation is required.

The Three Core Evidence Classes

Best Evidence

Best evidence is the original, unaltered record. In digital forensics, the typical example is a forensic disk image whose SHA-256 hash was recorded at acquisition.

A SHA-256 hash taken at acquisition time and re-verified at every subsequent step is the mechanism by which analysts prove a file has not changed.

Corroborative Evidence

Corroborative evidence supports and reinforces primary evidence. It does not prove the main fact on its own, but it makes the primary evidence more credible. A typical example is a firewall log entry that matches a SIEM alert.

Indirect (Circumstantial) Evidence

Indirect evidence does not directly prove a fact; instead, it allows a reasonable inference about what happened. A typical example is a set of deleted temporary files suggesting that a program was executed.

| Evidence Type | Definition | Digital Example | Strength |
| --- | --- | --- | --- |
| Best Evidence | Original, unaltered record | SHA-256-verified disk image | Highest: direct proof |
| Corroborative | Supports and reinforces primary evidence | Firewall log matching SIEM alert | Medium: strengthens primary |
| Indirect/Circumstantial | Allows inference about a fact | Deleted temp files suggesting execution | Variable: builds cumulatively |

Chain of Custody

Chain of custody is the documented, unbroken record of who collected evidence and when, how it was transported and stored, who accessed it and for what purpose, and what transformations were performed. A practical form records the SHA-256 hash of every artifact at the moment of collection. Before analysis, the hash is re-verified. If hashes do not match, the evidence may be inadmissible.
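The hash-at-collection, re-verify-before-analysis step can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `sha256_of_file` and `verify_artifact` are hypothetical, and a real custody form would also record who performed the check and when.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Chunked reads keep memory use flat even for large disk images.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, recorded_hash: str) -> bool:
    """Compare the current digest with the one recorded at collection."""
    return sha256_of_file(path) == recorded_hash.strip().lower()
```

A mismatch returned by `verify_artifact` is the signal to stop and document the discrepancy before any analysis continues.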

Rules of Evidence

For digital evidence to be accepted in legal proceedings, it must satisfy four properties:

  1. Authentic — it is what it claims to be (proven via hash verification and documentation)
  2. Complete — it tells the whole story, not a cherry-picked fragment
  3. Reliable — collected using sound, repeatable methods
  4. Believable — can be explained to a judge or jury in plain terms

FIGURE 7.1: Evidence Classification and Chain of Custody Flow

[Figure panels: Security Incident (evidence collection begins); Best Evidence (original + SHA-256 hash); Corroborative (supports primary facts); Indirect (circumstantial inference); Chain of Custody (document every handoff for legal admissibility)]

Key Points — Evidence Types and Classification

flowchart TD
    E[Digital Evidence] --> CF[Computer Forensics\nDisk images, file systems]
    E --> NF[Network Forensics\nPacket captures, flow data]
    E --> MF[Mobile Device Forensics\nPhone storage, app data]
    E --> MEM[Memory Forensics\nVolatile RAM - processes]
    E --> MM[Multimedia Forensics\nImage / video / audio]
    CF --> CF_T["Autopsy · FTK · EnCase"]
    NF --> NF_T["Wireshark · Zeek · NetworkMiner"]
    MF --> MF_T["Cellebrite · Oxygen Forensics"]
    MEM --> MEM_T["Volatility · Rekall"]
    MM --> MM_T["ExifTool · FotoForensics"]
    style E fill:#0d2137,stroke:#58a6ff,color:#e6edf3
    style CF fill:#161b22,stroke:#58a6ff,color:#e6edf3
    style NF fill:#161b22,stroke:#58a6ff,color:#e6edf3
    style MF fill:#161b22,stroke:#58a6ff,color:#e6edf3
    style MEM fill:#161b22,stroke:#58a6ff,color:#e6edf3
    style MM fill:#161b22,stroke:#58a6ff,color:#e6edf3
    style CF_T fill:#0d1117,stroke:#30363d,color:#8b949e
    style NF_T fill:#0d1117,stroke:#30363d,color:#8b949e
    style MF_T fill:#0d1117,stroke:#30363d,color:#8b949e
    style MEM_T fill:#0d1117,stroke:#30363d,color:#8b949e
    style MM_T fill:#0d1117,stroke:#30363d,color:#8b949e

Pre-Check — Section 1: Evidence Types

1. An analyst captures a disk image with a write-blocker and records its SHA-256 hash. What class of evidence is this?

2. Firewall logs that independently confirm the same blocked connection recorded in a SIEM alert are an example of what?

3. Prefetch records suggesting a malicious executable ran, even though the executable was deleted, are classified as:

Section 2: Log Analysis and Event Identification

Logs are the primary narrative of any security incident. Every operating system, application, network device, and security platform generates log records as it operates. The investigator's job is to correlate these records across sources and reconstruct a coherent timeline.

Windows Event Logs

Windows Event Logs are stored in .evtx format under C:\Windows\System32\winevt\Logs\. The most security-relevant event IDs:

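| Log Channel | Event ID | What It Records |
| --- | --- | --- |
| Security | 4624 | Successful logon |
| Security | 4625 | Failed logon; watch for brute-force patterns |
| Security | 4648 | Logon using explicit credentials (e.g., runas); can indicate credential reuse during lateral movement |
| Security | 4688 | Process creation; records command execution (the command line appears only when command-line auditing is enabled) |
| Security | 4698 / 4702 | Scheduled task created / updated (persistence) |
| Security | 4720 / 4732 | User account created / member added to a security-enabled local group |
| System | 7045 | New service installed; common malware persistence method |
| Microsoft-Windows-PowerShell/Operational | 4104 | PowerShell Script Block Logging; full script content |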

Pattern: Lateral Movement Detection

  1. Event 4625 (failed logon) for administrator from 192.168.10.45 — 47 times in 90 seconds
  2. Event 4624 (successful logon) from the same IP — logon type 3 (network logon)
  3. Event 4688 (process creation) — cmd.exe spawned by services.exe

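This sequence (a burst of failed logons, then a network logon, then a command shell spawned by services.exe) is a classic indicator of a successful password-guessing attack followed by remote command execution through a Windows service. The rapid failures against a single account point to brute force rather than credential stuffing, which typically tries known username and password pairs across many accounts.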
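The three-step pattern above can be expressed as a small correlation function. The sketch below assumes events have already been parsed into dictionaries with hypothetical field names (`time`, `id`, `ip`, `logon_type`, `process`, `parent`); real .evtx parsing and field naming depend on the tooling in use, and the thresholds are illustrative.

```python
from datetime import datetime, timedelta


def find_bruteforce_then_shell(events, min_failures=10,
                               window=timedelta(seconds=120)):
    """Return source IPs with many 4625s, then a type-3 4624,
    then cmd.exe spawned by services.exe."""
    events = sorted(events, key=lambda e: e["time"])
    suspects = []
    for i, logon in enumerate(events):
        if logon["id"] != 4624 or logon.get("logon_type") != 3:
            continue
        ip = logon["ip"]
        # Failed logons from the same IP shortly before the success.
        failures = [
            e for e in events[:i]
            if e["id"] == 4625 and e.get("ip") == ip
            and logon["time"] - e["time"] <= window
        ]
        # A later command shell whose parent is services.exe.
        shell = any(
            e["id"] == 4688 and e.get("process") == "cmd.exe"
            and e.get("parent") == "services.exe"
            for e in events[i + 1:]
        )
        if len(failures) >= min_failures and shell and ip not in suspects:
            suspects.append(ip)
    return suspects
```

Because 4688 events carry no source IP, this sketch only checks that a matching shell appears after the logon on the same host's event stream; a production rule would also bound that gap in time.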

Linux / syslog

| File | Contents |
| --- | --- |
| /var/log/auth.log | SSH logins, sudo usage, PAM authentication |
| /var/log/syslog | General system messages |
| /var/log/kern.log | Kernel messages: driver errors, network issues |
| /var/log/secure (RHEL/CentOS) | Authentication events |
| /var/log/audit/audit.log | Auditd events: file access, syscalls |

A burst of "Failed password" entries from a single IP followed by "Accepted password" is the syslog signature of a successful brute-force attack.
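As a rough illustration of that signature, the following sketch scans sshd log lines for repeated failures followed by a success from the same address. The regular expression reflects the common OpenSSH message wording, and the threshold of five failures is an arbitrary assumption; both would need tuning against real logs.

```python
import re
from collections import Counter

# Matches the usual OpenSSH password messages, including "invalid user" variants.
SSH_AUTH = re.compile(
    r"(Failed|Accepted) password for (?:invalid user )?(\S+) from (\S+)"
)


def successful_bruteforce_ips(lines, min_failures=5):
    """Return IPs that logged in after at least min_failures failed attempts."""
    failures = Counter()
    hits = set()
    for line in lines:
        match = SSH_AUTH.search(line)
        if not match:
            continue
        outcome, _user, ip = match.groups()
        if outcome == "Failed":
            failures[ip] += 1
        elif failures[ip] >= min_failures:
            hits.add(ip)
    return hits
```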

SIEM: Aggregation, Correlation, and Alerting

A SIEM ingests log data from hundreds of sources, normalizes it into a common schema, and applies correlation rules. A single failed login is noise. Ten thousand failed logins across fifty accounts from one IP in five minutes is an alert. Common platforms: Splunk (SPL), Microsoft Sentinel (KQL), IBM QRadar, ELK Stack.

SIEM workflow:

  1. Collection — agents push logs to the SIEM
  2. Normalization — raw data parsed into structured fields
  3. Indexing — records stored for search
  4. Correlation — rules match patterns across multiple events
  5. Alerting — correlation rule fires; alert routed to SOC queue
  6. Investigation — analysts query raw logs surrounding the alert
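
Example correlation search:

index=windows EventCode=4656 Object_Name="*lsass*" Access_Mask="0x1410"
| stats count by ComputerName, Account_Name, Process_Name
| where count > 1

This Splunk SPL query detects attempts to open a handle to lsass.exe with access mask 0x1410, which includes the right to read process memory and is a hallmark of credential dumping tools like Mimikatz. Field names such as Object_Name and Access_Mask depend on how the Windows add-on extracts events in a given deployment.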

SOAR: Automated Response Playbooks

SOAR platforms take SIEM alerts and execute automated response workflows (playbooks). A typical phishing SOAR playbook:

  1. Receive alert: "User clicked suspicious URL"
  2. Automatically query VirusTotal for the URL's reputation score
  3. Pull email headers, extract sender, reply-to, originating IP
  4. Check originating IP against threat intelligence feeds
  5. If malicious: quarantine inbox, block URL at proxy, create ITSM ticket
  6. Notify SOC analyst with pre-populated summary for human review

SOAR can reduce mean time to respond (MTTR) from hours to minutes. Platforms include Palo Alto Cortex XSOAR, Splunk SOAR, and IBM Security SOAR.
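The decision logic of the phishing playbook can be sketched as a plain function. This is not any vendor's playbook format: the reputation lookups are passed in as callables so the example stays self-contained, and the alert fields (`url`, `origin_ip`) and action names are hypothetical.

```python
from typing import Callable


def phishing_playbook(alert: dict,
                      url_is_malicious: Callable[[str], bool],
                      ip_is_listed: Callable[[str], bool]) -> list:
    """Return the ordered response actions for a suspicious-URL alert."""
    actions = []
    # Steps 2-4: enrich the alert with URL and IP reputation.
    malicious = url_is_malicious(alert["url"]) or ip_is_listed(alert["origin_ip"])
    if malicious:
        # Step 5: containment actions run only on a malicious verdict.
        actions += ["quarantine_inbox", "block_url_at_proxy", "create_itsm_ticket"]
    # Step 6: a human analyst is always notified for review.
    actions.append("notify_soc_analyst")
    return actions
```

Injecting the lookups makes the flow easy to test offline; in a real deployment they would wrap the reputation service and threat feed clients.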

Application and Command-Line Logs

A typical suspicious Apache log entry:

192.168.1.100 - admin [15/Jan/2026:03:15:42 +0000] "GET /admin/config.php?cmd=id HTTP/1.1" 200 1234 "-" "curl/7.68.0"

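Red flags: the cmd=id parameter (a command injection attempt), the user-agent curl/7.68.0 (a scripted client rather than a browser), the HTTP 200 response (the request was served, so the response body should be checked for command output), and a request to the privileged /admin/ path at 03:15. Note that 192.168.1.100 is a private (RFC 1918) address, so the request came from inside the network or arrived through a proxy or NAT device.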
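A minimal sketch of how such red flags could be checked automatically is shown below. It assumes the Apache combined log format; the parameter names and user-agent prefixes in the two lookup sets are illustrative assumptions, not a complete detection list.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Apache "combined" log format.
COMBINED_LOG = re.compile(
    r'^(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)
SUSPICIOUS_PARAMS = {"cmd", "exec", "command"}            # illustrative only
SCRIPTED_AGENTS = ("curl/", "wget/", "python-requests/")  # illustrative only


def red_flags(line: str) -> list:
    """Return human-readable red flags found in one access-log line."""
    match = COMBINED_LOG.match(line.strip())
    if match is None:
        return ["unparseable line"]
    url = urlsplit(match["target"])
    flags = []
    if SUSPICIOUS_PARAMS & set(parse_qs(url.query)):
        flags.append("command-style query parameter")
    if match["agent"].lower().startswith(SCRIPTED_AGENTS):
        flags.append("scripted user-agent")
    if url.path.startswith("/admin/"):
        flags.append("privileged path")
    # A 2xx status only matters when something else already looks wrong.
    if flags and match["status"].startswith("2"):
        flags.append("request succeeded")
    return flags
```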

PowerShell Event ID 4104 logs the full content of scripts as they execute. An encoded command:

powershell.exe -EncodedCommand JABjAGwAaQBlAG4AdAAgAD0AIABOAGUAdwAtAE8AYgBqAGUAYwB0...

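is a common sign of obfuscation, although some administrative tools also pass encoded commands. The argument is Base64 over UTF-16LE text; decoding it in an isolated analysis environment often reveals a reverse shell or dropper.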
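Decoding such an argument takes only the standard library, because PowerShell expects the value to be Base64-encoded UTF-16LE text. The sketch below decodes just the visible prefix of the sample above (the original is truncated, so the result is truncated too); the function name is hypothetical.

```python
import base64


def decode_encoded_command(b64_text: str) -> str:
    """Decode a PowerShell -EncodedCommand value (Base64 over UTF-16LE)."""
    # Re-pad in case the captured string was cut off mid-group.
    padded = b64_text + "=" * (-len(b64_text) % 4)
    raw = base64.b64decode(padded)
    # Drop a dangling byte so truncated input still decodes as UTF-16LE.
    raw = raw[: len(raw) - (len(raw) % 2)]
    return raw.decode("utf-16-le")


prefix = "JABjAGwAaQBlAG4AdAAgAD0AIABOAGUAdwAtAE8AYgBqAGUAYwB0"
print(decode_encoded_command(prefix))  # start of the script, cut off where the sample ends
```

Decoding only transforms text and never runs it, but the decoded script should still be handled as untrusted content.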

FIGURE 7.2: SIEM Collection and SOAR Response Pipeline

[Figure panels: OS Logs (Win / Linux); App Logs (Web / DB / API); Network (FW / IDS / DNS); SIEM (normalize, correlate); Alert (rule fires, threshold met); SOAR Playbook (auto-response: block IP, quarantine, create ticket); SOC Analyst (human review and escalation)]

Key Points — Log Analysis and Event Identification

sequenceDiagram
    participant SIEM as SIEM
    participant SOAR as SOAR Playbook
    participant VT as VirusTotal API
    participant TI as Threat Intel Feed
    participant FW as Proxy / Firewall
    participant ITSM as ITSM (Ticket)
    participant SOC as SOC Analyst
    SIEM->>SOAR: Alert - "User clicked suspicious URL"
    SOAR->>VT: Query URL hash score
    VT-->>SOAR: Score: Malicious (87/90 engines)
    SOAR->>TI: Check originating IP reputation
    TI-->>SOAR: IP listed - known C2 infrastructure
    SOAR->>FW: Block URL at proxy
    SOAR->>ITSM: Create incident ticket (pre-populated)
    SOAR->>SOC: Notify - summary report for human review
    SOC->>SOC: Validate, escalate, or close

Post-Check — Section 2: Log Analysis

4. Which Windows Event ID records a new service being installed — a common malware persistence method?

5. An analyst sees 47 Event ID 4625 entries from 192.168.10.45 in 90 seconds, followed immediately by a 4624 (network logon) and then a 4688 with cmd.exe spawned by services.exe. What does this pattern indicate?

6. What is the primary purpose of a SOAR playbook compared to a SIEM?

7. A PowerShell command line contains -EncodedCommand JABjAGwAaQBlAG4AdA.... What does this indicate?

Section 3: Malware Analysis Techniques

Analyzing malware presents a fundamental problem: the most direct way to learn what a sample does is to run it, but running it risks infecting the analysis environment. The solution combines two complementary approaches: static analysis (examining code without executing it) and dynamic analysis (executing it in a controlled, isolated environment).

Static Analysis

Static analysis examines a malware sample without executing it. Key techniques:

File Identification and Hashing

The first step is always to hash the file:

sha256sum suspicious_file.exe
# prints the 64-character hex digest, followed by the file name

Submit the SHA-256 hash to VirusTotal or a threat intelligence database. A match shows immediately whether the file belongs to a known malware family; the absence of a match does not mean the file is benign.

String Extraction

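Running strings against a binary extracts printable character sequences, often revealing hardcoded C2 addresses, file paths for dropped payloads, registry persistence keys, and HTTP user-agent strings. By default, GNU strings finds only single-byte (ASCII) text; the -e l option selects 16-bit little-endian encoding, which Windows binaries commonly use for Unicode text.

strings -n 8 malware.exe | grep -E "(https?|cmd|powershell|HKCU|HKLM)"
strings -n 8 -e l malware.exe | grep -E "(https?|cmd|powershell|HKCU|HKLM)"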
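Where the strings utility is unavailable, the same idea can be approximated in Python. The sketch below is a simplified stand-in, not a reimplementation of GNU strings: it looks for runs of printable ASCII bytes and for the same characters stored as UTF-16LE, with a hypothetical function name and the same minimum length of 8.

```python
import re


def extract_strings(data: bytes, min_len: int = 8) -> list:
    """Return printable ASCII runs, then UTF-16LE runs, of at least min_len chars."""
    # Printable ASCII: bytes 0x20 through 0x7e, repeated min_len or more times.
    ascii_runs = re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)
    # UTF-16LE: each printable byte is followed by a zero byte.
    wide_runs = re.findall(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len, data)
    return ([run.decode("ascii") for run in ascii_runs]
            + [run.decode("utf-16-le") for run in wide_runs])
```

Reading a sample as bytes and scanning it this way never executes it, which keeps the step within static analysis.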

PE Header Analysis

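For Windows PE format executables, tools like PEStudio examine properties such as per-section entropy (values approaching 8.0 suggest packed or encrypted content) and the Import Address Table (imports such as VirtualAllocEx, WriteProcessMemory, and CreateRemoteThread point to process injection).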
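Per-section entropy is one of the properties PE inspection tools report, and the underlying calculation is Shannon entropy over byte frequencies. The sketch below computes it for any byte string; extracting the section bytes from a PE file is assumed to happen elsewhere. Results range from 0 (one repeated byte) to 8 (all byte values equally likely), so values near 8 suggest compressed or encrypted content.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum -p * log2(p) over the observed byte values.
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )
```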

Disassembly and Decompilation

Ghidra (released by the NSA, free and open source) and IDA Pro disassemble binary code into assembly language and can decompile it to pseudo-C, allowing analysts to trace execution logic and identify anti-analysis techniques.

Dynamic Analysis: Sandboxes and Detonation Chambers

When a sample is heavily packed, obfuscated, or uses runtime decryption, dynamic analysis in a sandbox or detonation chamber is required.

| Feature | Sandbox | Detonation Chamber |
| --- | --- | --- |
| Primary purpose | Automated behavioral analysis | Deep investigation, often manual |
| Implementation | Cloud-based VM, automated | Full emulation or dedicated hardware |
| Speed | Minutes per sample | Minutes to hours |
| Output | Automated report + IOC extraction | Detailed forensic artifacts |
| Examples | Cuckoo, Any.run, Hybrid Analysis | FireEye Malware Analysis, Falcon X |

The Sandbox Detonation Workflow

  1. File submission — analyst uploads sample via UI or API
  2. Pre-filter — quick signature check; known malware flagged immediately
  3. Detonation — sample executes in isolated VM (Windows, Linux, or Android)
  4. Event logging — sandbox monitors all system activity during execution
  5. Report generation — analysis returned in human-readable format, typically within minutes

What the Sandbox Monitors

| Artifact Category | Specific Observations |
| --- | --- |
| Network communications | DNS queries, HTTP/HTTPS requests, C2 beacon patterns |
| File system changes | Files created, modified, or deleted; dropped payloads |
| Registry modifications | Persistence keys added (Run, RunOnce, Services) |
| Process activity | Child processes spawned, process injection, process hollowing |
| Memory operations | Heap allocations, injected shellcode |
| System calls | API function calls and their arguments |

Interpreting a Sandbox Report

VERDICT: MALICIOUS (High Confidence)
MALWARE FAMILY: Ransomware — LockBit variant

NETWORK ACTIVITY:
  DNS: resolves lockbit-news.onion.to — [C2 CHECK-IN]
  HTTP POST: http://45.33.32.156/upload — [DATA EXFILTRATION]

FILE SYSTEM:
  CREATED: C:\Users\Public\readme.txt — [RANSOM NOTE]
  MODIFIED: 847 files with extension change to .locked

REGISTRY:
  HKCU\Software\Microsoft\Windows\CurrentVersion\Run\svchost32 — [PERSISTENCE]

PROCESS:
  cmd.exe -> vssadmin.exe delete shadows /all /quiet — [SHADOW COPY DELETION]

MITRE ATT&CK MAPPING:
  T1486 — Data Encrypted for Impact
  T1490 — Inhibit System Recovery
  T1547.001 — Registry Run Keys / Startup Folder

Sandbox Evasion Techniques

| Technique | How It Works | Detection Approach |
| --- | --- | --- |
| Sleep/Delay | Sleeps longer than sandbox timeout (e.g., 10 min) | Accelerate system clock in sandbox |
| VM detection | Checks for VMware/VirtualBox artifacts | Use bare-metal sandboxes or mask VM indicators |
| Human interaction check | Waits for mouse movement or keystrokes | Sandbox simulates user activity |
| Environment fingerprinting | Checks screen resolution, username, file count | Configure sandbox with realistic profiles |
| Anti-debugging | Detects debugger via timing or API checks | Use stealthy debugger configurations |

flowchart TD
    START([Suspicious File Received]) --> HASH[Hash the file - sha256sum]
    HASH --> VT{Known in VirusTotal?}
    VT -- Yes --> REPORT1[Document family and IOCs]
    VT -- No --> STATIC[Static Analysis\nStrings · PE headers · Disassembly]
    STATIC --> PACKED{Packed / Obfuscated?}
    PACKED -- No --> STATICDONE[Document findings\nImports, C2 strings, artifacts]
    PACKED -- Yes --> DYNAMIC[Dynamic Analysis\nSandbox / Detonation Chamber]
    DYNAMIC --> MONITOR[Monitor: Network · Registry\nFile system · Processes]
    MONITOR --> SANDBOX_REPORT[Sandbox Report\nIOCs + ATT&CK mapping]
    STATICDONE --> COMBINE[Combine Findings]
    SANDBOX_REPORT --> COMBINE
    REPORT1 --> COMBINE
    COMBINE --> SHARE[Share via STIX/TAXII\nUpdate SIEM rules]
    style START fill:#0d2137,stroke:#58a6ff,color:#e6edf3
    style HASH fill:#161b22,stroke:#58a6ff,color:#e6edf3
    style VT fill:#1a1200,stroke:#d29922,color:#e6edf3
    style PACKED fill:#1a1200,stroke:#d29922,color:#e6edf3
    style DYNAMIC fill:#1a0a0a,stroke:#f85149,color:#e6edf3
    style MONITOR fill:#1a0a0a,stroke:#f85149,color:#e6edf3
    style SANDBOX_REPORT fill:#0a1f0a,stroke:#3fb950,color:#e6edf3
    style COMBINE fill:#161b22,stroke:#58a6ff,color:#e6edf3
    style SHARE fill:#0a1f0a,stroke:#3fb950,color:#e6edf3

FIGURE 7.3: Sandbox / Detonation Chamber Workflow

[Figure panels: Submit (file / URL / API); Pre-Filter (signature check / AV); Detonate (isolated VM: Win / Linux / Android, full monitoring); Artifacts (PCAP, memdump, video); IOC Extract (hashes, IPs, domains); ATT&CK Map (TTP technique IDs); Report (HTML / JSON verdict)]

Key Points — Malware Analysis Techniques

Post-Check — Section 3: Malware Analysis

8. A PE file's .text section shows an entropy value of 7.9. What does this indicate?

9. During sandbox detonation, a sample executes vssadmin.exe delete shadows /all /quiet. Which MITRE ATT&CK technique does this map to?

10. What does the presence of CreateRemoteThread, VirtualAllocEx, and WriteProcessMemory in a PE file's Import Address Table strongly indicate?

11. A malware sample detects a mouse cursor that hasn't moved in 3 minutes and refuses to execute its payload. What evasion technique is this?

Section 4: IOC Recognition and Threat Intelligence

An Indicator of Compromise (IOC) is a forensic artifact that, when observed in a system or network, indicates with high confidence that a security breach has occurred or is in progress. IOCs are the actionable outputs of malware analysis and incident investigation — they answer: "What observable evidence can we use to detect this threat?"

File-Level IOCs: Hash Values

The most precise IOC for a specific file is its cryptographic hash. SHA-256 is the current standard:

| Algorithm | Output Length | Current Status | Use Case |
| --- | --- | --- | --- |
| MD5 | 128-bit (32 hex chars) | Deprecated: collision-prone | Legacy systems only |
| SHA-1 | 160-bit (40 hex chars) | Deprecated | Legacy compatibility |
| SHA-256 | 256-bit (64 hex chars) | Preferred standard | All modern forensic and IOC use |
| SHA-512 | 512-bit (128 hex chars) | High security | Sensitive data integrity |
| ssdeep | Variable (fuzzy hash) | Active use | Similarity matching between variants |

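ssdeep (fuzzy hashing) identifies malware variants that share significant byte-level regions, which is useful for tracking malware families across small modifications such as padding or patched strings. Repacking or heavy recompilation can still defeat it.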
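When triaging a feed of hash IOCs, the digest lengths in the table give a quick first guess at the algorithm. The sketch below is a heuristic only: length cannot distinguish algorithms that share an output size (SHA-256 and SHA3-256, for example), and the function name is hypothetical.

```python
import re
from typing import Optional

# Hex-digit lengths taken from the table above.
DIGEST_LENGTHS = {32: "MD5", 40: "SHA-1", 64: "SHA-256", 128: "SHA-512"}


def guess_hash_algorithm(value: str) -> Optional[str]:
    """Guess a hash algorithm from the hex length, or return None."""
    candidate = value.strip().lower()
    if not re.fullmatch(r"[0-9a-f]+", candidate):
        return None
    return DIGEST_LENGTHS.get(len(candidate))
```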

Network-Level IOCs

IP Addresses

IP IOCs have a short shelf life — attackers frequently rotate infrastructure. Validation rules:

Domain Names

Domain IOCs are more stable than IPs. Malicious domains frequently show:

| IOC Type | Specificity | Stability | Action |
| --- | --- | --- | --- |
| IP Address | High (exact server) | Low (easily changed) | Block at firewall / proxy |
| Domain | Medium (campaign level) | Medium (days to weeks) | Block at DNS resolver |
| URL | Very high (specific resource) | Low (path can change) | Block at proxy / WAF |
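Routing each indicator to the right control starts with knowing which kind it is. The sketch below classifies a raw string with the standard library; the category labels are hypothetical, and the domain check is intentionally naive (anything with a dot that is not an IP address). Flagging private addresses matters because they are rarely meaningful as shared IOCs.

```python
import ipaddress
from urllib.parse import urlsplit


def classify_indicator(value: str) -> str:
    """Classify a network IOC as url, ip-public, ip-private, domain, or unknown."""
    value = value.strip()
    if "://" in value:
        return "url" if urlsplit(value).hostname else "unknown"
    try:
        address = ipaddress.ip_address(value)
    except ValueError:
        # Not an IP literal; treat dotted names as domains.
        return "domain" if "." in value else "unknown"
    return "ip-private" if address.is_private else "ip-public"
```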

System Artifact IOCs

Registry Keys (Windows Persistence)

| Registry Path | Purpose |
| --- | --- |
| HKCU\Software\Microsoft\Windows\CurrentVersion\Run | User-level persistence: runs when that user logs on |
| HKLM\Software\Microsoft\Windows\CurrentVersion\Run | System-level persistence: runs at logon for every user |
| HKLM\System\CurrentControlSet\Services\ | Service installation: typically runs as SYSTEM |
| HKCU\Software\Microsoft\Windows NT\CurrentVersion\Winlogon | Winlogon shell replacement |
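A detection rule for these locations can be approximated as a prefix match on the registry path. The sketch below covers only the four paths in the table and uses hypothetical names; because it matches prefixes, the Run entries also catch RunOnce, and a production rule would compare key names exactly.

```python
# Lower-cased prefixes for the persistence locations listed in the table.
PERSISTENCE_PREFIXES = (
    r"hkcu\software\microsoft\windows\currentversion\run",
    r"hklm\software\microsoft\windows\currentversion\run",
    r"hklm\system\currentcontrolset\services",
    r"hkcu\software\microsoft\windows nt\currentversion\winlogon",
)
HIVE_ALIASES = {"hkey_current_user": "hkcu", "hkey_local_machine": "hklm"}


def is_persistence_location(path: str) -> bool:
    """Return True if a registry path falls under a known persistence key."""
    normalized = path.strip().lower()
    # Accept both long and short hive names.
    hive, _, rest = normalized.partition("\\")
    normalized = HIVE_ALIASES.get(hive, hive) + "\\" + rest
    return normalized.startswith(PERSISTENCE_PREFIXES)
```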

File System and Process Artifacts

STIX and TAXII: Sharing Intelligence

STIX (Structured Threat Information eXpression) is a JSON-based language for describing cyber threat intelligence objects:

| STIX Object Type | Represents |
| --- | --- |
| indicator | An IOC with detection pattern (hashes, IPs, domains) |
| malware | A malware family: behaviors, capabilities |
| threat-actor | An APT group or threat actor |
| campaign | A coordinated series of attacks |
| attack-pattern | A MITRE ATT&CK technique |
| relationship | Links between objects (malware used by actor) |

TAXII (Trusted Automated Exchange of Intelligence Information) defines how STIX bundles are distributed. TAXII 2.1 exposes REST API endpoints over HTTPS, organized around Collections (named repositories of STIX objects that clients poll) and Channels (a publish/subscribe model for real-time distribution, which the 2.1 specification reserves but does not yet define). TAXII consumers such as SIEMs and EDR platforms can automatically pull new indicators and create detection rules.
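To make the STIX data model concrete, the sketch below builds a STIX 2.1 indicator object for a file hash using only the standard library. It fills the properties the 2.1 specification requires for an indicator; the function name and the sample name value are hypothetical, and production code would more likely use a dedicated STIX library with validation.

```python
import json
import uuid
from datetime import datetime, timezone


def make_file_hash_indicator(sha256_hex: str, name: str) -> dict:
    """Build a minimal STIX 2.1 indicator for a SHA-256 file hash."""
    # STIX timestamps are UTC in RFC 3339 format; milliseconds used here.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "pattern": f"[file:hashes.'SHA-256' = '{sha256_hex.lower()}']",
        "pattern_type": "stix",
        "valid_from": now,
    }


if __name__ == "__main__":
    # Placeholder digest for illustration, not a real malware hash.
    print(json.dumps(make_file_hash_indicator("ab" * 32, "Example dropper"), indent=2))
```

An object like this would normally be wrapped in a bundle and published to a TAXII collection for partners to poll.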

The Pyramid of Pain

David Bianco's Pyramid of Pain ranks IOC types by how difficult they are for attackers to change once defenders start detecting them:

| Level | IOC Type | Pain for Attacker | Defender Value |
| --- | --- | --- | --- |
| (Bottom) Trivial | Hash values | Recompile or pad the file | Easy to detect, easy to evade |
| Easy | IP addresses | Rotate infrastructure | Moderate detection value |
| Simple | Domain names | New domain registration | Better: takes hours |
| Annoying | Network/Host artifacts | Modify tools | High value |
| Challenging | Tools | Retool entire capability | Very high |
| (Top) Tough | TTPs | Change entire attack methodology | Highest: forces new tradecraft |

flowchart TD
    TTP["TTPs - Tactics, Techniques & Procedures\nHardest to change - forces attacker to retrain"]
    TOOLS["Tools\nMust replace entire toolchain"]
    ARTIFACTS["Network & Host Artifacts\nRequires modifying tool behavior"]
    DOMAINS["Domain Names\nNew registration + propagation delay"]
    IPS["IP Addresses\nRotate to new server - easy"]
    HASHES["Hash Values\nRecompile or pad file - trivial"]
    HASHES --> IPS --> DOMAINS --> ARTIFACTS --> TOOLS --> TTP
    style TTP fill:#0a1f0a,stroke:#3fb950,color:#e6edf3
    style TOOLS fill:#0d2137,stroke:#58a6ff,color:#e6edf3
    style ARTIFACTS fill:#161b22,stroke:#58a6ff,color:#e6edf3
    style DOMAINS fill:#1a1200,stroke:#d29922,color:#e6edf3
    style IPS fill:#1a1200,stroke:#d29922,color:#e6edf3
    style HASHES fill:#1a0a0a,stroke:#f85149,color:#e6edf3

FIGURE 7.4: IOC Sources, STIX Packaging, and TAXII Distribution

[Figure panels: File Hash (SHA-256, 64 hex); IP / Domain (C2 indicators); Registry Key (persistence artifact); Email / URL (phishing indicators); STIX 2.1 (JSON data model, structured objects); TAXII 2.1 (REST API transport, Collections / Channels); SIEM (auto-rule creation); EDR (hash / IP blocking); Threat Intel (feed enrichment)]

Key Points — IOC Recognition and Threat Intelligence

Post-Check — Section 4: IOC Recognition and Threat Intelligence

12. An analyst identifies a malicious file and wants to share its IOC with partner organizations. Which hash algorithm should they use?

13. According to the Pyramid of Pain, which IOC type is hardest for an attacker to change once defenders start detecting it?

14. What is the relationship between STIX and TAXII?

15. A SIEM alert fires on an unexpected registry value at HKCU\Software\Microsoft\Windows\CurrentVersion\Run\svchost32 pointing to %APPDATA%\temp\update.exe. This is an example of which type of IOC?
