Chapter 18: Software Management and Network Health Monitoring
Learning Objectives
Automate device software version management using Catalyst Center SWIM APIs
Build network health monitoring solutions using Catalyst Center and Meraki APIs
Implement automated software image distribution and upgrade workflows using Python and Ansible
Construct health dashboards and alerting systems that consume controller data and trigger automated remediation
Section 1: Software Image Management (SWIM)
Pre-Quiz — Section 1: Software Image Management
1. Which step in the SWIM workflow is a mandatory policy gate that must be completed before Catalyst Center will allow image distribution to proceed?
A. Import Image
B. Tag as Golden Image
C. Distribute Image
D. Poll Task Status
2. A SWIM distribution call returns immediately with a taskId. What is the correct subsequent action?
A. Wait a fixed 10 minutes, then check device version
B. Poll /dna/intent/api/v1/task/{task_id} until endTime is populated
C. Issue the activation call immediately without waiting
D. Check the Software Images dashboard in the GUI only
3. Which parameter in the SWIM activation API allows scheduling a device reload for a future maintenance window?
A. maintenanceWindow
B. delayActivation
C. scheduleAt
D. activationTime
4. Which Ansible module from the cisco.dnac collection handles the full SWIM lifecycle declaratively?
A. cisco.dnac.image_distribution
B. cisco.dnac.swim_workflow_manager
C. cisco.dnac.software_upgrade
D. cisco.ios.software_install
5. During SWIM, which step causes service interruption on the target device?
A. Import: the image binary is uploaded to Catalyst Center
B. Tag as Golden: compliance policy is applied
C. Distribute: the image is copied to device flash/disk
D. Activate: the device reloads to boot the new image
1.1 What Is SWIM?
Software Image Management (SWIM) is Catalyst Center's lifecycle automation framework for network device operating system images. It replaces ad-hoc manual processes with a governed pipeline that enforces approval gates, tracks compliance, and coordinates upgrades at scale.
Think of SWIM as a combination of an enterprise software package manager (like apt or yum) and a change-management workflow engine. SWIM maintains a repository of network OS images and enforces the concept of a golden image — the single approved version for each device family and role.
1.2 The Five-Step SWIM Workflow
The SWIM lifecycle consists of five sequential operations. Distribution and activation are asynchronous — they return a taskId immediately and require polling for completion.
SWIM Five-Step Pipeline:
1. Import: Upload binary to DNAC repository
2. Tag Golden: Mark approved for device family + site
3. Distribute: Push to device flash (no disruption)
4. Activate: Schedule reload (disruptive)
5. Poll Task: Wait for endTime or error
flowchart TD
A([Start: Security Advisory or Version Policy]) --> B[Step 1: Import Image\nUpload binary to DNAC\nrepository via URL or file]
B --> C{Import task\ncomplete?}
C -- Poll taskId --> C
C -- endTime populated --> D[Step 2: Tag as Golden\nAssign approved image to\ndevice family + role + site]
D --> E[Step 3: Distribute\nPush image binary to\ndevice flash/disk via HTTPS/SFTP\nNo service interruption]
E --> F{Distribution task\ncomplete?}
F -- Poll taskId --> F
F -- endTime populated --> G[Step 4: Activate\nSchedule reload for\nmaintenance window\nscheduleAt parameter]
G --> H{Activation task\ncomplete?\nTimeout: 1800s}
H -- Poll taskId --> H
H -- endTime populated --> I([Device running\nnew golden image])
H -- isError=true --> J([Raise RuntimeError\ncheck failureReason])
style A fill:#1a4a7a,color:#fff,stroke:#0d2d4a
style I fill:#1a6b3a,color:#fff,stroke:#0d3d20
style J fill:#8b1a1a,color:#fff,stroke:#5a0d0d
style D fill:#4a3a7a,color:#fff,stroke:#2d2050
1.3 Core SWIM API Endpoints
Operation | Method | Endpoint
Import image via URL | POST | /dna/intent/api/v1/image/importation/source/url
List imported images | GET | /dna/intent/api/v1/image/importation
Tag as golden image | POST | /dna/intent/api/v1/image/importation/golden
Distribute to device | POST | /dna/intent/api/v1/image/distribution
Activate on device | POST | /dna/intent/api/v1/image/activation/device
Check task status | GET | /dna/intent/api/v1/task/{task_id}
All endpoints require the X-Auth-Token header obtained from /dna/system/api/v1/auth/token.
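The token exchange can be sketched with the standard library alone. This is a minimal sketch, assuming the token endpoint accepts an HTTP POST with Basic authentication and returns a JSON body with a "Token" key; the helper names, host name, and credentials are placeholders, not part of any SDK.

```python
import base64
import json
import urllib.request

TOKEN_PATH = "/dna/system/api/v1/auth/token"


def build_token_request(base_url, username, password):
    """Build (but do not send) the POST that exchanges credentials for a token."""
    credentials = f"{username}:{password}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + TOKEN_PATH,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/json",
        },
    )


def extract_token(response_body):
    """Pull the token out of the JSON response body (assumed key: "Token")."""
    return json.loads(response_body)["Token"]


def auth_headers(token):
    """Headers to attach to every subsequent intent API call."""
    return {"X-Auth-Token": token, "Content-Type": "application/json"}
```

Sending the request with urllib.request.urlopen (or an equivalent HTTP client) and passing the response body to extract_token yields the value for the X-Auth-Token header.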
1.4 Golden Image Compliance Enforcement
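Once you tag an image golden for a site/family/role combination, Catalyst Center continuously evaluates every device for compliance. Non-compliant devices (those running a non-golden OS version) can then be identified programmatically and upgraded automatically. The query below lists repository images that are not tagged golden for a site; because it returns image records rather than devices, matching the result to the affected devices requires a comparison against device inventory: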
GET /dna/intent/api/v1/image/importation?isTaggedGolden=false&siteId=<uuid>
flowchart TD
A([Scheduled Compliance Check]) --> B["GET /image/importation\n?isTaggedGolden=false&siteId=X"]
B --> C{Non-compliant\ndevices found?}
C -- No --> Z([All devices compliant\nLog and exit])
C -- Yes --> D[Open change ticket\nor auto-initiate SWIM]
D --> E[Tag golden image\nfor site + role]
E --> F["POST /image/distribution\nReturns taskId"]
F --> G["Poll /task/{taskId}\nevery 10s"]
G --> H{task.endTime\npopulated?}
H -- No, elapsed < timeout --> G
H -- isError = true --> I([Raise RuntimeError\nfailureReason logged])
H -- Yes --> J["POST /image/activation\nscheduleAt = maintenance window\nReturns taskId"]
J --> K["Poll /task/{taskId}\nevery 10s, timeout 1800s"]
K --> L{Activation\ncomplete?}
L -- No --> K
L -- isError = true --> I
L -- Yes --> M([Device upgraded\nUpdate compliance record])
style A fill:#1a4a7a,color:#fff,stroke:#0d2d4a
style Z fill:#1a6b3a,color:#fff,stroke:#0d3d20
style M fill:#1a6b3a,color:#fff,stroke:#0d3d20
style I fill:#8b1a1a,color:#fff,stroke:#5a0d0d
1.5 Python SDK — Async Task Polling Pattern
The dnacentersdk library wraps all SWIM REST endpoints. The critical pattern every SWIM integration must implement is async task polling:
import time

from dnacentersdk import DNACenterAPI

# Assumed setup: credentials and base URL come from the DNA_CENTER_* environment variables
api = DNACenterAPI()


def poll_task(task_id, timeout=600, interval=10):
    """Poll a Catalyst Center async task until completion or timeout."""
    elapsed = 0
    while elapsed < timeout:
        result = api.task.get_task_by_id(task_id=task_id)
        task_data = result.response
        if task_data.get("isError"):  # Task failed; surface the reason
            raise RuntimeError(f"Task failed: {task_data.get('failureReason')}")
        if task_data.get("endTime"):  # Task completed successfully
            return task_data
        time.sleep(interval)
        elapsed += interval
    raise TimeoutError(f"Task {task_id} did not complete within {timeout}s")
1.6 Ansible SWIM: swim_workflow_manager
The cisco.dnac.swim_workflow_manager Ansible module handles the full lifecycle in a single task. The dnac_api_task_timeout and dnac_task_poll_interval parameters control async wait behavior. Setting taggingPriority: true supersedes any previously tagged golden image for the same combination.
1.7 SWIM at Scale: Scheduling Maintenance Windows
The scheduleAt parameter accepts a UTC epoch timestamp in milliseconds. Distribution can happen during business hours (non-disruptive) while activation is deferred to the weekend window — enabling fire-and-forget upgrade campaigns across hundreds of devices.
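Computing the scheduleAt value is plain date arithmetic. The following is a minimal sketch, assuming the maintenance window opens at a fixed weekday and hour in UTC; the function name and its defaults (Saturday, 02:00) are illustrative choices, not part of the API.

```python
from datetime import datetime, timedelta, timezone


def next_window_epoch_ms(now, weekday=5, hour=2):
    """Return the next occurrence of weekday/hour (UTC) as epoch milliseconds.

    weekday follows datetime.weekday(): Monday is 0, Saturday is 5.
    `now` must be a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    # Move forward to the requested weekday (0 to 6 days ahead).
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    # If that moment has already passed today, use next week's window.
    if candidate <= now:
        candidate += timedelta(days=7)
    return int(candidate.timestamp() * 1000)
```

For example, calling next_window_epoch_ms(datetime(2024, 1, 3, 12, 0, tzinfo=timezone.utc)) on a Wednesday returns 1704506400000, which is Saturday 2024-01-06 02:00 UTC.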
Key Points — Section 1: SWIM
SWIM automates the full OS upgrade lifecycle through five ordered steps: import, tag-as-golden, distribute, activate, poll.
The golden image tag is a mandatory policy gate — Catalyst Center rejects distribution requests for any image not tagged golden for the target site/family/role.
Both distribution and activation are asynchronous; the caller must poll /dna/intent/api/v1/task/{task_id} until endTime is populated or isError is true.
Allow up to 1800 seconds (30 min) timeout for activation polling; device reloads can take 10–20 minutes depending on platform.
Use scheduleAt (epoch milliseconds) to target a maintenance window without running the script at 2 AM; combine with non-disruptive distribution during business hours.
Post-Quiz — Section 1: Software Image Management
1. Which step in the SWIM workflow is a mandatory policy gate that must be completed before Catalyst Center will allow image distribution to proceed?
A. Import Image
B. Tag as Golden Image
C. Distribute Image
D. Poll Task Status
2. A SWIM distribution call returns immediately with a taskId. What is the correct subsequent action?
A. Wait a fixed 10 minutes, then check device version
B. Poll /dna/intent/api/v1/task/{task_id} until endTime is populated
C. Issue the activation call immediately without waiting
D. Check the Software Images dashboard in the GUI only
3. Which parameter in the SWIM activation API allows scheduling a device reload for a future maintenance window?
A. maintenanceWindow
B. delayActivation
C. scheduleAt
D. activationTime
4. Which Ansible module from the cisco.dnac collection handles the full SWIM lifecycle declaratively?
A. cisco.dnac.image_distribution
B. cisco.dnac.swim_workflow_manager
C. cisco.dnac.software_upgrade
D. cisco.ios.software_install
5. During SWIM, which step causes service interruption on the target device?
A. Import: the image binary is uploaded to Catalyst Center
B. Tag as Golden: compliance policy is applied
C. Distribute: the image is copied to device flash/disk
D. Activate: the device reloads to boot the new image
Section 2: Network Health Monitoring with Catalyst Center
Pre-Quiz — Section 2: Network Health Monitoring
1. An individual device has System Health = 9, Data Plane Connectivity = 3, and Control Plane Connectivity = 8. What is its Device Health Score?
A. 9 (the highest component score)
B. 6.67 (the average of all three)
C. 3 (the minimum of all three)
D. 20 (the sum of all three)
2. Catalyst Center's overall Network Health Score (%) is calculated as:
A. Average of all individual device scores
B. Percentage of devices with a score in the 8–10 healthy range divided by total devices
C. Number of healthy devices minus number of unhealthy devices
D. Percentage of devices reachable via SNMP polling
3. Which Catalyst Center Assurance API endpoint returns per-device and aggregate infrastructure health scores?
A. GET /dna/intent/api/v1/client-health
B. GET /dna/intent/api/v1/application-health
C. GET /dna/intent/api/v1/network-health
D. GET /dna/intent/api/v1/device-health
4. For Application Health scoring, which three KPIs are evaluated against CVD thresholds?
A. CPU utilization, memory usage, and interface errors
B. Packet loss, network latency, and jitter
C. Uptime, reachability, and SNMP response time
D. Throughput, VLAN count, and spanning-tree convergence time
5. How do you retrieve Catalyst Center Assurance health data for a specific point in the past (e.g., during a reported outage)?
A. Query a separate historical archive API at /dna/intent/api/v1/history
B. Pass a timestamp query parameter (epoch milliseconds) to the standard health API
C. Historical data is only accessible through the Catalyst Center GUI, not via API
D. Use the startTime and endTime parameters on the inventory API
2.1 The Assurance Architecture
Catalyst Center Assurance continuously collects telemetry from every managed device using SNMP polling, model-driven streaming telemetry (gRPC/gNMI), syslog ingestion, NetFlow records, and 802.11 wireless radio data. Raw telemetry is normalized, correlated, and aggregated into health scores that update every five minutes.
Client health uses the same 8–10 healthy threshold but is maintained separately for wired and wireless populations. This separation prevents a large healthy wired fleet from masking a spike in wireless issues after an AP firmware upgrade.
2.4 Application Health Score and CVD Thresholds
Traffic Class | Latency | Packet Loss | Jitter
Voice | < 150 ms | < 1% | < 30 ms
Video | < 200 ms | < 1% | < 50 ms
Transactional | < 300 ms | < 3% | N/A
Bulk Data | < 500 ms | < 5% | N/A
Thresholds are customizable per traffic class via PUT /dna/intent/api/v1/AssuranceGetHealthScoreDefinitions.
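To make the table concrete, the sketch below checks a set of measured KPIs against the per-class thresholds. It only flags which KPIs breach their limits; it does not reproduce how Catalyst Center converts KPIs into a 1–10 score. The dictionary keys and the function name are illustrative assumptions.

```python
# Thresholds copied from the table above; None means the KPI is not evaluated.
CVD_THRESHOLDS = {
    "voice": {"latency_ms": 150, "loss_pct": 1, "jitter_ms": 30},
    "video": {"latency_ms": 200, "loss_pct": 1, "jitter_ms": 50},
    "transactional": {"latency_ms": 300, "loss_pct": 3, "jitter_ms": None},
    "bulk_data": {"latency_ms": 500, "loss_pct": 5, "jitter_ms": None},
}


def kpi_violations(traffic_class, measured):
    """Return the KPIs in `measured` that meet or exceed the class threshold."""
    limits = CVD_THRESHOLDS[traffic_class]
    return [
        kpi
        for kpi, limit in limits.items()
        if limit is not None
        and measured.get(kpi) is not None
        and measured[kpi] >= limit
    ]
```

A voice flow measured at 120 ms latency, 0.2% loss, and 45 ms jitter would be flagged only for jitter, while the same jitter on a transactional flow is ignored because that class has no jitter threshold.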
2.5 Historical Health Queries
All three Assurance health APIs (network-health, client-health, and application-health) accept an optional timestamp query parameter (epoch milliseconds) for point-in-time historical retrieval. Catalyst Center retains Assurance data for a configurable period (typically 90 days).
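A point-in-time query therefore only needs the outage time converted to epoch milliseconds. The helper below is a minimal sketch; its name and the host name in the example are placeholders, and it only builds the URL rather than sending the request.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def historical_health_url(base_url, endpoint, when):
    """Build a health API URL for a past moment.

    endpoint is one of the health API names, e.g. "network-health".
    `when` must be timezone-aware so the epoch conversion is unambiguous.
    """
    if when.tzinfo is None:
        raise ValueError("when must be a timezone-aware datetime")
    timestamp_ms = int(when.timestamp() * 1000)
    query = urlencode({"timestamp": timestamp_ms})
    return f"{base_url.rstrip('/')}/dna/intent/api/v1/{endpoint}?{query}"
```

Requiring an aware datetime avoids the common mistake of converting a local wall-clock time as if it were UTC, which would shift the query by the local offset.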
Key Points — Section 2: Network Health Monitoring
Three Assurance domains (network, client, application) all use a 1–10 scale; 8–10 is healthy, 4–7 is fair, 1–3 is poor.
Device scores use a weakest-link model: MIN(system, data plane, control plane) — one degraded subsystem pulls the whole score down.
Client health is tracked separately for wired and wireless to prevent healthy wired scores from masking wireless degradation.
Application health CVD thresholds (latency, loss, jitter) are per-traffic-class and customizable via API.
Historical queries use an epoch-millisecond timestamp parameter on the standard health endpoints; data is retained for roughly 90 days.
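The scoring rules in these key points reduce to a few lines of arithmetic. The sketch below assumes scores are integers on the 1–10 scale and that the network health percentage is the share of devices in the 8–10 band; the function names are illustrative.

```python
def device_health_score(system, data_plane, control_plane):
    """Weakest-link model: the lowest component score is the device score."""
    return min(system, data_plane, control_plane)


def health_band(score):
    """Map a 1-10 score to its band: 8-10 healthy, 4-7 fair, 1-3 poor."""
    if score >= 8:
        return "healthy"
    if score >= 4:
        return "fair"
    return "poor"


def network_health_percent(device_scores, healthy_min=8):
    """Percentage of devices whose score falls in the healthy band."""
    if not device_scores:
        raise ValueError("at least one device score is required")
    healthy = sum(1 for score in device_scores if score >= healthy_min)
    return 100.0 * healthy / len(device_scores)
```

With component scores of 9, 3, and 8, device_health_score returns 3, so the device is rated poor even though two of its three subsystems are healthy.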
Post-Quiz — Section 2: Network Health Monitoring
1. An individual device has System Health = 9, Data Plane Connectivity = 3, and Control Plane Connectivity = 8. What is its Device Health Score?
A. 9 (the highest component score)
B. 6.67 (the average of all three)
C. 3 (the minimum of all three)
D. 20 (the sum of all three)
2. Catalyst Center's overall Network Health Score (%) is calculated as:
A. Average of all individual device scores
B. Percentage of devices with a score in the 8–10 healthy range divided by total devices
C. Number of healthy devices minus number of unhealthy devices
D. Percentage of devices reachable via SNMP polling
3. Which Catalyst Center Assurance API endpoint returns per-device and aggregate infrastructure health scores?
A. GET /dna/intent/api/v1/client-health
B. GET /dna/intent/api/v1/application-health
C. GET /dna/intent/api/v1/network-health
D. GET /dna/intent/api/v1/device-health
4. For Application Health scoring, which three KPIs are evaluated against CVD thresholds?
A. CPU utilization, memory usage, and interface errors
B. Packet loss, network latency, and jitter
C. Uptime, reachability, and SNMP response time
D. Throughput, VLAN count, and spanning-tree convergence time
5. How do you retrieve Catalyst Center Assurance health data for a specific point in the past (e.g., during a reported outage)?
A. Query a separate historical archive API at /dna/intent/api/v1/history
B. Pass a timestamp query parameter (epoch milliseconds) to the standard health API
C. Historical data is only accessible through the Catalyst Center GUI, not via API
D. Use the startTime and endTime parameters on the inventory API
Section 3: Monitoring with Meraki and SD-WAN
Pre-Quiz — Section 3: Meraki and SD-WAN Monitoring
1. How does authentication work with the Meraki Dashboard API?
A. OAuth 2.0 bearer token obtained from a token endpoint
B. Session cookie obtained from POST /j_security_check
C. An API key passed in the X-Cisco-Meraki-API-Key header
D. Basic authentication with username and password on every request
2. Which Meraki API endpoint provides online/offline/alerting status for all devices across an entire organization in a single call?
A. GET /networks/{networkId}/devices
B. GET /organizations/{orgId}/devices/statuses
C. GET /organizations/{orgId}/inventory/devices
D. GET /organizations/{orgId}/health/summary
3. What authentication mechanism does vManage (Cisco SD-WAN) use for its REST API?
A. API key in the X-Auth-Token header
B. Session cookie from POST /j_security_check
C. Bearer token from OAuth 2.0 flow
D. Client certificate mutual TLS
3.1 Meraki API-Based Health Monitoring
Unlike Catalyst Center (on-premises), Meraki monitoring is cloud-native. All telemetry flows to the Meraki cloud dashboard and is accessible via REST API using an API key in the X-Cisco-Meraki-API-Key header at base URL https://api.meraki.com/api/v1/.
Endpoint | Description
GET /organizations/{orgId}/devices/statuses | Online/offline/alerting status for all org devices
GET /networks/{networkId}/devices/{serial}/lossAndLatencyHistory | Per-device loss and latency time-series
GET /organizations/{orgId}/summary/top/devices/byUsage | Top devices by traffic volume
GET /organizations/{orgId}/uplinks/statuses | WAN uplink status for all MX appliances
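As a minimal sketch of consuming the org-wide status endpoint, the helpers below build the request pieces and summarize a response body. They assume each returned device record carries a "status" field, as the endpoint description implies; the function names are illustrative and no request is actually sent.

```python
from collections import Counter

MERAKI_BASE_URL = "https://api.meraki.com/api/v1"


def meraki_headers(api_key):
    """Headers for every Dashboard API call; the key is the only credential."""
    return {"X-Cisco-Meraki-API-Key": api_key, "Accept": "application/json"}


def statuses_url(org_id):
    """URL of the org-wide device status endpoint."""
    return f"{MERAKI_BASE_URL}/organizations/{org_id}/devices/statuses"


def summarize_statuses(devices):
    """Count devices per status value from a devices/statuses response body."""
    return Counter(device.get("status", "unknown") for device in devices)
```

Passing the parsed JSON list from the endpoint to summarize_statuses gives a quick count per status that can feed a dashboard tile or an alert threshold.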
3.2 SD-WAN (vManage) Health Monitoring
vManage REST API uses session cookie auth from POST /j_security_check. Key endpoints include GET /dataservice/device for inventory/status, GET /dataservice/device/counters for OMP/BFD counters, and GET /dataservice/alarms for active fabric alarms.
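The login step can be sketched as a form-encoded POST. This sketch assumes the j_username and j_password form fields and a JSESSIONID session cookie; the helper names and host name are placeholders, and the request is built but not sent.

```python
import urllib.parse
import urllib.request


def build_login_request(base_url, username, password):
    """Build (but do not send) the form POST that opens a vManage session."""
    form = urllib.parse.urlencode(
        {"j_username": username, "j_password": password}
    ).encode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/j_security_check",
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def session_cookie(set_cookie_header):
    """Keep only the name=value pair (e.g. JSESSIONID=...) from Set-Cookie."""
    return set_cookie_header.split(";", 1)[0].strip()
```

The value returned by session_cookie is then sent back in the Cookie header on each /dataservice call for the lifetime of the session.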
3.3 Cross-Platform Health Aggregation
Large enterprises span multiple controllers: Catalyst Center (campus), vManage (SD-WAN), and Meraki (branches). A normalization layer translates controller-specific schemas into a common format:
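def normalize_catalyst(device):
    # Catalyst Center device → common schema
    health = device["overallHealth"]
    return {"source": "Catalyst Center", "health_score": health,
            "status": "healthy" if health >= 8 else "degraded"}

def normalize_meraki(device):
    # Meraki device → common schema (binary: online or not)
    return {"source": "Meraki", "health_score": 10 if device["status"] == "online" else 1}

def normalize_sdwan(device):
    # SD-WAN device → common schema (binary: reachable or not)
    return {"source": "SD-WAN", "health_score": 10 if device["reachability"] == "reachable" else 1}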
flowchart LR
subgraph SOURCES["Controller Data Sources"]
CC["Catalyst Center\nAssurance API\nX-Auth-Token header"]
VM["vManage API\nSD-WAN Fabric\nSession cookie auth"]
MK["Meraki Dashboard API\nCloud-Managed Branches\nX-Cisco-Meraki-API-Key header"]
end
subgraph NORM["Normalization Layer\n(Python Service)"]
N1["normalize_to_common_schema()\nsource: catalyst_center\nhealth_score, status"]
N2["normalize_to_common_schema()\nsource: sdwan\nreachability to score"]
N3["normalize_to_common_schema()\nsource: meraki\nonline status to score"]
end
subgraph DEDUP["Alert Processing"]
AD["Correlate by 60s window"]
AR["De-duplicate by root cause"]
AE["Enrich with topology context"]
AS["Suppress during maintenance"]
AD --> AR --> AE --> AS
end
subgraph OUTPUTS["Downstream Systems"]
G["Grafana Dashboard"]
P["PagerDuty Escalation"]
S["ServiceNow Ticketing"]
end
CC --> N1
VM --> N2
MK --> N3
N1 --> DEDUP
N2 --> DEDUP
N3 --> DEDUP
DEDUP --> G
DEDUP --> P
DEDUP --> S
style SOURCES fill:#1a2a4a,color:#fff,stroke:#0d1a2d
style NORM fill:#2a1a4a,color:#fff,stroke:#1a0d2d
style DEDUP fill:#3a2a1a,color:#fff,stroke:#2d1a0d
style OUTPUTS fill:#1a3a2a,color:#fff,stroke:#0d2018
3.4 Alert Aggregation and Deduplication
Alert storms occur when a single upstream failure (a WAN circuit going down) generates dozens of downstream alerts across multiple controllers simultaneously. An effective aggregation layer must:
Correlate by time window — group alerts arriving within a 60-second window affecting the same network segment
De-duplicate by root cause — create one "WAN circuit failure" alert rather than 30 individual device alerts
Suppress during maintenance — suppress alerts for devices in scheduled maintenance windows
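The three requirements above can be sketched in one pass over the alert stream. This is a simplified illustration, assuming each normalized alert is a dict with hypothetical keys "ts" (epoch seconds), "segment", and "device", and using a fixed window anchored at the first alert of each incident; a production correlator would also use topology data to name the root cause.

```python
def correlate_alerts(alerts, window_s=60, in_maintenance=frozenset()):
    """Collapse alerts into incidents per network segment and time window.

    Alerts from devices listed in `in_maintenance` are suppressed.
    Returns a list of incident dicts in order of first occurrence.
    """
    incidents = []
    latest = {}  # segment -> most recent incident opened for that segment
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        if alert["device"] in in_maintenance:
            continue  # Suppress during maintenance
        current = latest.get(alert["segment"])
        # Open a new incident if none exists or the window has expired.
        if current is None or alert["ts"] - current["first_ts"] > window_s:
            current = {
                "segment": alert["segment"],
                "first_ts": alert["ts"],
                "devices": [],
                "count": 0,
            }
            incidents.append(current)
            latest[alert["segment"]] = current
        # De-duplicate: count every alert, but list each device once.
        current["count"] += 1
        if alert["device"] not in current["devices"]:
            current["devices"].append(alert["device"])
    return incidents
```

Each returned incident stands in for all the raw alerts it absorbed, so a downstream ticketing system receives one record per segment and window instead of one per device.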
Key Points — Section 3: Meraki and SD-WAN Monitoring
Meraki is cloud-native; authentication uses an API key in the X-Cisco-Meraki-API-Key header — no token exchange required.
The /organizations/{orgId}/devices/statuses endpoint is the Meraki equivalent of Catalyst Center's network-health API — org-wide status in one paginated call.
vManage (SD-WAN) uses session cookie authentication from POST /j_security_check, unlike the token-based Catalyst Center flow.
Cross-platform environments require a normalization layer mapping controller-specific health models to a common schema for unified dashboarding.
Alert deduplication and root-cause correlation are essential to prevent alert storms from a single upstream failure generating hundreds of tickets.
Post-Quiz — Section 3: Meraki and SD-WAN Monitoring
1. How does authentication work with the Meraki Dashboard API?
A. OAuth 2.0 bearer token obtained from a token endpoint
B. Session cookie obtained from POST /j_security_check
C. An API key passed in the X-Cisco-Meraki-API-Key header
D. Basic authentication with username and password on every request
2. Which Meraki API endpoint provides online/offline/alerting status for all devices across an entire organization in a single call?
A. GET /networks/{networkId}/devices
B. GET /organizations/{orgId}/devices/statuses
C. GET /organizations/{orgId}/inventory/devices
D. GET /organizations/{orgId}/health/summary
3. What authentication mechanism does vManage (Cisco SD-WAN) use for its REST API?
A. API key in the X-Auth-Token header
B. Session cookie from POST /j_security_check
C. Bearer token from OAuth 2.0 flow
D. Client certificate mutual TLS
Section 4: Automated Alerting and Remediation
Pre-Quiz — Section 4: Automated Alerting and Remediation
1. At which tier of the Self-Healing Maturity Model does ENAUTO automation skill — building Python services and Ansible playbooks that detect issues and execute corrective actions — primarily apply?
2. When registering a Catalyst Center webhook subscription, what HTTP status code must the receiver return to acknowledge successful receipt of an event?
A. 201 Created
B. 204 No Content
C. 200 OK
D. 202 Accepted
3. Which Catalyst Center API enriches a raw event ID with root cause analysis, recommended actions, affected hosts, and historical occurrence count?
A. GET /dna/intent/api/v1/event/subscription
B. GET /dna/intent/api/v1/issues/{issue_id}
C. GET /dna/intent/api/v1/network-health
D. GET /dna/intent/api/v1/event/webhook
4. What is the critical differentiator of NSO (Network Services Orchestrator) for multi-device remediation compared to a simple Python script?
A. NSO can push changes faster than REST API calls
B. NSO provides atomic multi-device transactions with rollback: either all changes apply or none do
C. NSO generates Ansible playbooks automatically
D. NSO eliminates the need for device authentication
5. What percentage of network alerts does Cisco IT's production self-healing automation handle without human intervention?
A. 75%
B. 95%
C. 99%
D. 99.998%
4.1 The Self-Healing Maturity Model
Tier | Name | Description | Technology
1 | Auto-Detection | Real-time visibility through continuous monitoring | Catalyst Center Assurance, Meraki alerts
2 | Auto-Correlation | Intelligent grouping to identify root causes | Catalyst Center AI analytics
3 | Auto-Remediation | Automated evaluation and execution of corrective actions | Python + Catalyst Center APIs, Ansible AWX
4 | Autonomous Operation | Full closed-loop AI-driven autonomy | Emerging (LLM-based, 2025–2026)
Cisco IT's production automation handles 99.998% of all network alerts without human intervention, processing millions of daily events.
4.2 Catalyst Center Event Notifications and Webhooks
Instead of polling health APIs every five minutes, subscribe to specific events — Catalyst Center pushes notifications via HTTPS POST the moment conditions change. Event domains include Assurance (health degradation, AI anomalies), SWIM (distribution/activation completion), and Network (reachability changes, interface transitions).
Two-step setup:
Register a webhook destination via POST /dna/intent/api/v1/event/webhook
Subscribe to specific event IDs via POST /dna/intent/api/v1/event/subscription
Before executing remediation, enrich the raw event. The Issue Enrichment API returns root cause analysis, recommended actions, affected hosts, and historical occurrence count. Pass the issue ID in both the URL path and the entity_value header.
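The enrichment call described above can be sketched as follows. The endpoint path and the entity_value header are taken from this chapter; the helper name is hypothetical, and the exact path and headers should be confirmed against the API reference for the Catalyst Center release in use. The function only assembles the request.

```python
def build_issue_enrichment_request(base_url, token, issue_id):
    """Return (url, headers) for the issue enrichment GET request.

    The issue ID appears twice: in the URL path and in the
    entity_value header, as described in the text above.
    """
    url = f"{base_url.rstrip('/')}/dna/intent/api/v1/issues/{issue_id}"
    headers = {
        "X-Auth-Token": token,
        "entity_value": issue_id,
        "Accept": "application/json",
    }
    return url, headers
```

The returned pair can be handed to any HTTP client; the JSON response then supplies the context that the remediation handler evaluates.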
4.4 Flask Webhook Receiver and REMEDIATION_MAP Pattern
The central orchestration pattern is a REMEDIATION_MAP dictionary that maps event IDs to handler functions. Each handler receives the enriched context and decides whether to auto-fix, escalate, or log.
The webhook endpoint must always return HTTP 200 — Catalyst Center expects acknowledgement regardless of internal processing outcome.
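The dispatch core of this pattern can be sketched without any web framework. The event IDs, the "occurrences" field, and the handler names below are hypothetical placeholders for illustration; real event IDs come from the Catalyst Center event catalog, and the handlers would call Ansible, NSO, or a paging service instead of returning a label.

```python
def handle_interface_down(context):
    """Auto-fix a first occurrence; escalate a recurring problem."""
    if context.get("occurrences", 0) >= 3:  # hypothetical threshold and field
        return "escalate"
    return "auto-fix"


def handle_device_unreachable(context):
    """Unreachable devices cannot be fixed in-band, so always escalate."""
    return "escalate"


def log_only(context):
    """Fallback for event IDs with no registered handler."""
    return "log-only"


# Hypothetical event IDs mapped to their handlers.
REMEDIATION_MAP = {
    "EXAMPLE-INTERFACE-DOWN": handle_interface_down,
    "EXAMPLE-DEVICE-UNREACHABLE": handle_device_unreachable,
}


def handle_webhook(payload):
    """Dispatch one event and return (body, status); status is always 200."""
    event_id = payload.get("eventId")
    handler = REMEDIATION_MAP.get(event_id, log_only)
    try:
        action = handler(payload)
    except Exception as exc:
        # A failing handler must not turn into a non-200 response.
        action = f"handler-error: {exc}"
    return {"eventId": event_id, "action": action}, 200
```

In a Flask application, the POST route would pass the parsed JSON body to handle_webhook and return its body and status unchanged, which keeps the always-200 rule in one place.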
flowchart TD
subgraph DETECT["Detection Layer"]
CA["Catalyst Center Assurance\nHealth scores + AI anomaly detection\nIssue correlation every 5 min"]
CA --> EN["Event Notification System\nSubscribe per event ID\nDomains: Assurance, SWIM, Network"]
end
subgraph ORCHESTRATE["Orchestration Layer"]
WR["Flask/FastAPI\nWebhook Receiver\nHTTPS POST /webhook"]
IE["Issue Enrichment API\n/dna/intent/api/v1/issues/{id}\nRoot cause + occurrence count"]
CE["Context Evaluation\nOccurrence threshold\nSeverity classification"]
RD["REMEDIATION_MAP\nDispatch to handler\nby event ID"]
WR --> IE --> CE --> RD
end
subgraph ACTIONS["Action Layer"]
AF["Auto-Fix\nAnsible AWX runbook\nor NSO atomic transaction"]
ES["Escalate\nPagerDuty / Webex / Slack"]
TK["Ticket + Audit Log\nServiceNow / Splunk"]
end
subgraph FEEDBACK["Feedback Layer"]
FB["Remediation outcomes\nRefine thresholds\nUpdate alert rules via GitOps"]
end
EN -- "HTTPS POST\n(eventId, deviceId, issueId)" --> WR
RD --> AF
RD --> ES
RD --> TK
AF --> FB
ES --> FB
TK --> FB
FB --> CA
style DETECT fill:#1a2a4a,color:#fff,stroke:#0d1a2d
style ORCHESTRATE fill:#2a1a4a,color:#fff,stroke:#1a0d2d
style ACTIONS fill:#1a3a2a,color:#fff,stroke:#0d2018
style FEEDBACK fill:#3a2a1a,color:#fff,stroke:#2d1a0d
4.5 NSO for Multi-Device Remediation
NSO's MAAPI Python API provides atomic multi-device transactions with rollback. Changes to two devices either both commit together or neither does — preventing partial failure states that leave the network worse than before:
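import ncs

iface = "0/0"  # Hypothetical failing interface on the primary device
with ncs.maapi.single_write_trans("admin", "python") as t:
    root = ncs.maagic.get_root(t)
    primary = root.devices.device["primary-rtr"]  # Hypothetical device names
    backup = root.devices.device["backup-rtr"]
    try:
        primary.config.ios__interface.GigabitEthernet[iface].shutdown = True
        backup.config.ios__interface.GigabitEthernet["0/1"].shutdown = False
        t.apply()  # Atomic: both commit or neither does
    except Exception:
        t.revert()  # Discard the pending changes for both devices
        raise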
4.6 Notification Integrations
Webex: POST to https://webexapis.com/v1/messages with Authorization: Bearer {token} and {"roomId": ..., "text": ...}.
Slack: POST to an incoming webhook URL with an attachments payload. Color-code by severity: green (info), orange (warning), red (critical).
PagerDuty: POST to https://events.pagerduty.com/v2/enqueue with routing key and severity.
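The three request bodies can be sketched as small payload builders. The Webex body follows the fields given above; the Slack colors are arbitrary hex values chosen to match the green/orange/red scheme, and the PagerDuty body assumes the Events API v2 layout (routing_key, event_action, and a payload with summary, source, and severity). The function names are illustrative.

```python
SLACK_COLORS = {"info": "#2eb886", "warning": "#ff9900", "critical": "#cc0000"}
PAGERDUTY_SEVERITIES = {"critical", "error", "warning", "info"}


def webex_message(room_id, text):
    """Body for POST https://webexapis.com/v1/messages."""
    return {"roomId": room_id, "text": text}


def slack_message(text, severity):
    """Body for a Slack incoming webhook, color-coded by severity."""
    return {"attachments": [{"color": SLACK_COLORS[severity], "text": text}]}


def pagerduty_event(routing_key, summary, source, severity):
    """Body for POST https://events.pagerduty.com/v2/enqueue."""
    if severity not in PAGERDUTY_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity}")
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {"summary": summary, "source": source, "severity": severity},
    }
```

Keeping payload construction separate from the HTTP call makes each integration easy to unit test and lets a single handler fan out the same alert text to all three systems.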
Key Points — Section 4: Alerting and Remediation
Tier 3 (Auto-Remediation) is the primary ENAUTO exam focus: Python/Ansible systems that detect, evaluate context, and execute fixes.
Catalyst Center webhooks require two setup steps: register a destination, then subscribe to specific event IDs by domain.
Always return HTTP 200 from a webhook receiver — Catalyst Center expects acknowledgement regardless of internal processing.
Issue Enrichment API (GET /dna/intent/api/v1/issues/{id}) provides root cause, recommended actions, and occurrence count — use this to drive intelligent (not hardcoded) remediation decisions.
NSO's transaction model (t.apply() / t.revert()) is the critical differentiator for multi-device remediation — atomic all-or-nothing with rollback.
Post-Quiz — Section 4: Automated Alerting and Remediation
1. At which tier of the Self-Healing Maturity Model does ENAUTO automation skill — building Python services and Ansible playbooks that detect issues and execute corrective actions — primarily apply?
2. When registering a Catalyst Center webhook subscription, what HTTP status code must the receiver return to acknowledge successful receipt of an event?
A. 201 Created
B. 204 No Content
C. 200 OK
D. 202 Accepted
3. Which Catalyst Center API enriches a raw event ID with root cause analysis, recommended actions, affected hosts, and historical occurrence count?
A. GET /dna/intent/api/v1/event/subscription
B. GET /dna/intent/api/v1/issues/{issue_id}
C. GET /dna/intent/api/v1/network-health
D. GET /dna/intent/api/v1/event/webhook
4. What is the critical differentiator of NSO (Network Services Orchestrator) for multi-device remediation compared to a simple Python script?
A. NSO can push changes faster than REST API calls
B. NSO provides atomic multi-device transactions with rollback: either all changes apply or none do
C. NSO generates Ansible playbooks automatically
D. NSO eliminates the need for device authentication
5. What percentage of network alerts does Cisco IT's production self-healing automation handle without human intervention?
A. 75%
B. 95%
C. 99%
D. 99.998%