Chapter 17

Testing, Validation, and Network Simulation

Learning Objectives

Section 1: Testing and Validation Frameworks

Pre-Quiz — Section 1: Test Your Prior Knowledge

1. What does the "shift-left" principle mean in network automation testing?

2. Which component provides network-specific parsers for over 2,000 Cisco show commands?

3. In pyATS, what is the purpose of the testbed YAML file?

4. What are the three structural sections of a pyATS aetest test script, in order?

5. What key advantage does Genie's diff operation provide over a simple text diff?

1.1 The Shift-Left Testing Philosophy

The core principle of automated network testing is the shift-left approach: move validation as early as possible in the change lifecycle. Each testing gate eliminates a class of errors before they reach the next, more expensive stage. A typo in a Jinja2 template caught by a linter costs nothing; the same typo reaching a production core router can cost hours of downtime.

```mermaid
flowchart LR
    A([Lint]) --> B([Schema\nValidate])
    B --> C([Unit\nTest])
    C --> D([Virtual\nLab])
    D --> E([Pre-Change\nSnapshot])
    E --> F([Deploy])
    F --> G([Post-Change\nVerify])
    style A fill:#d4edda,stroke:#28a745,color:#000
    style B fill:#d4edda,stroke:#28a745,color:#000
    style C fill:#fff3cd,stroke:#ffc107,color:#000
    style D fill:#fff3cd,stroke:#ffc107,color:#000
    style E fill:#fde8d8,stroke:#fd7e14,color:#000
    style F fill:#f8d7da,stroke:#dc3545,color:#000
    style G fill:#f8d7da,stroke:#dc3545,color:#000
    subgraph cost["Cheapest to Fail → Most Expensive to Fail"]
        A
        B
        C
        D
        E
        F
        G
    end
```

1.2 pyATS and Genie

pyATS (Python Automated Test Systems) is Cisco's freely available Python framework for automated network testing. Genie is the network-specific library built on top of pyATS: if pyATS is the chassis, Genie is the purpose-built body kit, fitted with the instruments a network environment needs.

| Component | Role |
| --- | --- |
| pyATS Framework | Generic, pluggable Python test framework (aetest, topology, datastructures) |
| Genie Library | Network-specific parsers, device models, diff engine, testbed definitions |
| XPRESSO | Web dashboard for managing test suites, testbeds, results, and insights |
| Bindings | Integrations with Robot Framework, pytest, Jenkins, and third-party tools |
```mermaid
flowchart TD
    subgraph pyats["pyATS / Genie Stack"]
        direction TB
        A["XPRESSO Dashboard\n(Web UI & Insights)"]
        B["Bindings\n(pytest · Robot · Jenkins)"]
        C["Genie Library\n(Parsers · Device Models · Diff Engine · Testbed)"]
        D["pyATS Framework\n(aetest · Topology · Datastructures · Connections)"]
    end
    E["Network Devices\n(IOS-XE · NX-OS · IOS-XR · ASA)"]
    F["CML Virtual Lab"]
    G["CI/CD Pipeline\n(GitLab / GitHub Actions)"]
    A --> C
    B --> D
    C --> D
    D --> E
    D --> F
    B --> G
    style A fill:#cce5ff,stroke:#004085,color:#000
    style B fill:#cce5ff,stroke:#004085,color:#000
    style C fill:#d4edda,stroke:#155724,color:#000
    style D fill:#d4edda,stroke:#155724,color:#000
    style E fill:#f8d7da,stroke:#721c24,color:#000
    style F fill:#fff3cd,stroke:#856404,color:#000
    style G fill:#e2e3e5,stroke:#383d41,color:#000
```

1.3 Testbed YAML and aetest

The testbed YAML file is the entry point for any pyATS/Genie workflow — the equivalent of an Ansible inventory file. It defines every device in scope including OS type, connection protocol, IP address, and credentials. The aetest module provides the three-phase structure: CommonSetup (connect to devices), Testcase (one or more test classes), and CommonCleanup (disconnect and restore state).
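A minimal testbed sketch might look like the following. The device names, IP addresses, and platform values here are illustrative examples, not taken from a real lab; the password is pulled from an environment variable rather than stored in the file.

```yaml
# testbed.yaml -- hypothetical two-device testbed (names and IPs are examples)
testbed:
  name: lab-testbed
  credentials:
    default:
      username: admin
      password: "%ENV{PYATS_PASSWORD}"   # resolved from an environment variable

devices:
  core-sw1:
    os: iosxe
    type: switch
    connections:
      cli:
        protocol: ssh
        ip: 10.10.20.11
  edge-r1:
    os: iosxe
    type: router
    connections:
      cli:
        protocol: ssh
        ip: 10.10.20.21
```

With this file in place, `pyats run job` or a Genie CLI command can target every listed device by name.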

1.4 Pre/Post Change Validation with Genie Learn and Diff

Genie's learn and diff capability captures structured snapshots of device state before and after a change, then produces a machine-readable comparison. Unlike text diff, Genie diff identifies that "prefix 10.100.0.0/24 was added to VRF CUSTOMER-A via BGP neighbor 192.168.1.2" — semantic meaning, not just line changes.
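Genie exposes this comparison through `genie.utils.diff.Diff`. The underlying idea, comparing two nested state snapshots and classifying entries as added, removed, or changed, can be sketched with plain dictionaries and no dependencies:

```python
# Sketch of pre/post state comparison on nested dicts. Genie's Diff does this
# natively (genie.utils.diff.Diff); this stdlib version only shows the idea.

def diff_state(pre: dict, post: dict, path: str = "") -> list[str]:
    """Return human-readable added/removed/changed entries between snapshots."""
    changes = []
    for key in sorted(pre.keys() | post.keys()):
        here = f"{path}.{key}" if path else str(key)
        if key not in post:
            changes.append(f"removed: {here}")
        elif key not in pre:
            changes.append(f"added: {here}")
        elif isinstance(pre[key], dict) and isinstance(post[key], dict):
            changes.extend(diff_state(pre[key], post[key], here))
        elif pre[key] != post[key]:
            changes.append(f"changed: {here}: {pre[key]} -> {post[key]}")
    return changes

pre = {"10.0.0.1": {"session_state": "Established", "prefixes_received": 12}}
post = {"10.0.0.1": {"session_state": "Established", "prefixes_received": 14}}
print(diff_state(pre, post))
# → ['changed: 10.0.0.1.prefixes_received: 12 -> 14']
```

Because the comparison walks structured keys instead of raw text lines, the report names the exact neighbor and attribute that changed, which is the property that makes Genie diff pipeline-friendly.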

```mermaid
sequenceDiagram
    participant E as Engineer / Pipeline
    participant G as Genie CLI / API
    participant D as Network Device
    participant S as Snapshot Store
    Note over E,S: Pre-Change Phase
    E->>G: genie learn ospf bgp routing interface
    G->>D: SSH — show ospf / bgp / routing commands
    D-->>G: Structured CLI output
    G-->>S: Save snapshots/pre/ (JSON)
    Note over E,S: Change Deployment
    E->>D: Deploy configuration change (Ansible / NAPALM)
    D-->>E: Change applied
    Note over E,S: Post-Change Phase
    E->>G: genie learn ospf bgp routing interface
    G->>D: SSH — same show commands
    D-->>G: Structured CLI output
    G-->>S: Save snapshots/post/ (JSON)
    Note over E,S: Diff and Validate
    E->>G: genie diff snapshots/pre/ snapshots/post/
    G-->>E: Structured diff report (added / removed / changed)
    alt No unexpected changes
        E->>E: PASS — pipeline continues
    else Unexpected state change
        E->>E: FAIL — pipeline aborts, rollback triggered
    end
```

Example: Genie Pre/Post Change Diff

The snapshots below show the structured BGP state Genie captures before and after a change. In this example the pre- and post-change state are identical, so genie diff reports no differences and the pipeline continues.

```
PRE-CHANGE (snapshots/pre/)
bgp.vrf.default.neighbor.10.0.0.1:
    session_state: Established
    prefixes_received: 12
bgp.vrf.default.neighbor.10.0.0.2:
    session_state: Established
    prefixes_received: 8

POST-CHANGE (snapshots/post/)
bgp.vrf.default.neighbor.10.0.0.1:
    session_state: Established
    prefixes_received: 12
bgp.vrf.default.neighbor.10.0.0.2:
    session_state: Established
    prefixes_received: 8
```

Key Points — Section 1

Section 2: Cisco Platform APIs for Validation

Pre-Quiz — Section 2: Test Your Prior Knowledge

1. In a testing context, what is a "test oracle"?

2. Which Catalyst Center Assurance API endpoint would you query to detect whether a change introduced new network issues?

3. After deploying a change to an SD-WAN edge device, which API endpoint confirms BFD session health?

4. What makes the Meraki Dashboard change log endpoint particularly valuable for post-deployment audit validation?

2.1 Platform APIs as Test Oracles

Every Cisco platform covered in this study guide exposes an API. In a testing context, these APIs serve as test oracles — authoritative sources that confirm whether a deployed change produced the expected outcome. Rather than relying solely on CLI parsing, modern validation workflows query platform APIs for structured, machine-readable state information.

2.2 Catalyst Center Assurance API

The Catalyst Center Assurance API aggregates device health, client health, and issue data across the entire fabric. After deploying a change, the Assurance API becomes the validation endpoint.

| API Endpoint | Validation Use Case |
| --- | --- |
| GET /dna/intent/api/v1/network-health | Verify overall health score did not degrade |
| GET /dna/intent/api/v1/device-health | Check per-device health after config push |
| GET /dna/intent/api/v1/issues | Identify new issues introduced by the change |
| GET /dna/intent/api/v1/topology/physical-topology | Confirm topology matches expected state |
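A post-deployment gate built on the network-health endpoint reduces to a simple decision: did the overall score degrade beyond an allowed tolerance? The helper below sketches that decision. The response shape (a "response" list with a "healthScore" field) and the tolerance value are assumptions for illustration; check the Assurance API reference for the exact schema.

```python
# Hedged sketch: pass/fail decision from a Catalyst Center network-health
# payload. The payload structure shown here is an assumption, not the
# documented schema.

def health_check(payload: dict, baseline_score: int, tolerance: int = 5) -> bool:
    """True when the overall health score has not degraded beyond tolerance."""
    entries = payload.get("response", [])
    if not entries:
        return False  # no data is itself a failure condition
    score = entries[0].get("healthScore", 0)
    return score >= baseline_score - tolerance

# Baseline captured pre-change was 92
print(health_check({"response": [{"healthScore": 90}]}, baseline_score=92))  # → True
print(health_check({"response": [{"healthScore": 78}]}, baseline_score=92))  # → False
```

The same pattern (capture baseline, query post-change, compare with tolerance) applies to the device-health and issues endpoints as well.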

2.3 SD-WAN vManage Monitoring APIs

The Cisco Catalyst SD-WAN vManage REST API provides monitoring endpoints for validating SD-WAN changes. Post-deployment validation should confirm BFD session health, OMP route distribution, and application-aware routing policy application.

| Endpoint | Purpose |
| --- | --- |
| GET /dataservice/device/bfd/summary | Verify BFD sessions are UP after tunnel changes |
| GET /dataservice/device/omp/routes/received | Confirm OMP routes are being received |
| GET /dataservice/device/control/connections/summary | Check control-plane connections |
| GET /dataservice/device/app-route/statistics | Validate application-aware routing is functioning |
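For BFD validation, the check is strict rather than tolerance-based: after a tunnel change, every expected session should be up. The sketch below evaluates a summary payload; the field names ("sessions_total", "sessions_up") are illustrative, not the exact vManage schema, so adapt them to the real /dataservice/device/bfd/summary response.

```python
# Hedged sketch: validate BFD session health after an SD-WAN tunnel change.
# Field names are illustrative placeholders, not the documented vManage keys.

def bfd_sessions_healthy(summary: dict) -> bool:
    """All expected BFD sessions must be up; an empty summary fails."""
    total = summary.get("sessions_total", 0)
    up = summary.get("sessions_up", 0)
    return total > 0 and up == total

print(bfd_sessions_healthy({"sessions_total": 4, "sessions_up": 4}))  # → True
print(bfd_sessions_healthy({"sessions_total": 4, "sessions_up": 3}))  # → False
```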

2.4 Meraki Change Log for Audit Validation

The Meraki Dashboard API records every configuration change — including those made by automation scripts — in a change log. This enables post-deployment audit validation: confirming your automation script applied exactly the changes it was supposed to, and nothing more.
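The audit reduces to set arithmetic: compare the event types the change log actually recorded against the event types the automation was expected to produce, and flag anything extra. The event type strings below are illustrative examples.

```python
# Hedged sketch: audit a Meraki change log against intended changes.
# Event type names are illustrative, not an exhaustive Meraki vocabulary.

EXPECTED_EVENTS = {"vlan_created", "vlan_updated"}

def audit_changes(changelog_events: list[dict]) -> set[str]:
    """Return the set of unexpected event types found in the change log."""
    observed = {event["type"] for event in changelog_events}
    return observed - EXPECTED_EVENTS

events = [
    {"type": "vlan_created", "admin": "api-service-account"},
    {"type": "vlan_updated", "admin": "api-service-account"},
    {"type": "policy_rule_deleted", "admin": "api-service-account"},
]
print(audit_changes(events))  # → {'policy_rule_deleted'}
```

A non-empty result means the deployment touched something it should not have, so the pipeline should fail and trigger investigation or rollback.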

2.5 ISE for Policy Validation

After deploying network access policies through ISE automation, the pxGrid or ERS APIs can confirm that policy elements are correctly configured and that authentication/authorization is functioning as expected: pushed Network Access Policies are active, endpoint groups are applied, and live session data confirms authentication is succeeding post-change.

Key Takeaway: Every Cisco platform API doubles as a validation endpoint. Design automation workflows to query APIs post-deployment for health, state, and change audit data — transforming platform APIs into a closed-loop validation system.

Key Points — Section 2

Section 3: Network Topology Simulation

Pre-Quiz — Section 3: Test Your Prior Knowledge

1. What is the official Python client library for programmatic interaction with Cisco Modeling Labs?

2. What method does a CML lab object expose to automatically generate a pyATS-compatible testbed YAML from a running lab?

3. In a CI/CD pipeline using CML, why is the lab teardown placed in a finally block?

4. What advantage does storing CML topology definitions as YAML in Git provide?

3.1 Why Simulate? The Case for Virtual Labs

Testing automation code against production devices introduces risk. Testing against physical lab hardware requires dedicated equipment, physical access, and scheduling. Virtual network simulation — running actual device images in software — provides a safe, disposable, on-demand environment. Like flight simulators that allow testing autopilot software without putting passengers at risk, virtual labs provide a production-identical environment where failure is safe and reproducible.

3.2 Cisco Modeling Labs (CML) Overview

CML is Cisco's enterprise-grade network simulation platform, built from the ground up as API-first. Every operation — creating labs, starting nodes, generating testbeds — is available through a RESTful API and documented via OpenAPI/Swagger at https://<cml-server>/api/v0/ui/.

| Capability | Description |
| --- | --- |
| Full REST API | Create, manage, and tear down labs programmatically |
| Real Cisco images | Run actual IOS-XE, NX-OS, IOS-XR, ASA, and other images |
| Topology YAML | Version-control lab definitions as code |
| pyATS integration | Auto-generate testbed YAML from running labs |
| Dynamic modification | Add nodes and links to a running simulation |

3.3 CML API with virl2-client

The virl2-client library wraps the CML REST API in a Pythonic interface. The client version must match the CML controller version. Basic workflow: ClientLibrary()create_lab()create_node()create_link()lab.start().

3.4 CML in CI/CD: Full Lifecycle Management

In a pipeline, CML enables programmatic lab lifecycle management: spin up a fresh lab per CI run, run tests, tear down. Each run gets a clean environment — no state bleeding between test runs. The finally block guarantees teardown even on test failure, preventing resource leaks.
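That lifecycle can be sketched as a single function with guaranteed teardown. The method names follow the virl2_client API (import_lab, start, wait_until_lab_converged, get_pyats_testbed, stop, wipe, remove); the client object is passed in as a parameter, so the flow itself can be exercised without a live CML server.

```python
# Sketch of CI lab lifecycle management with virl2-client-style calls.
# The client is injected, so any object exposing these methods works.

def run_integration_tests(client, topology_yaml: str, run_tests) -> bool:
    """Spin up a fresh lab, run the test callback, and always tear down."""
    lab = client.import_lab(topology_yaml, title="CI-Test")
    try:
        lab.start()
        lab.wait_until_lab_converged()      # block until all nodes boot
        testbed = lab.get_pyats_testbed()   # auto-generated testbed YAML
        return run_tests(testbed)
    finally:
        # Guaranteed teardown: even when tests raise, no lab is leaked
        lab.stop()
        lab.wipe()
        lab.remove()
```

The finally block is the load-bearing piece: a failed test run still releases CML server resources, which keeps repeated CI runs from exhausting the host.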

```mermaid
sequenceDiagram
    participant P as CI Pipeline
    participant C as virl2-client
    participant CML as CML Server
    participant L as Virtual Lab Nodes
    participant T as pyATS / pytest
    P->>C: import_lab(topology_yaml, title="CI-Test-42")
    C->>CML: POST /api/v0/import (topology YAML)
    CML-->>C: lab_id created
    P->>C: lab.start()
    C->>CML: PUT /api/v0/labs/{id}/start
    CML->>L: Boot IOS-XE / NX-OS node images
    L-->>CML: Nodes reach BOOTED state
    P->>C: lab.wait_until_lab_converged(timeout=600)
    CML-->>P: All nodes converged
    P->>C: lab.get_pyats_testbed()
    CML-->>P: testbed.yaml (auto-generated)
    P->>T: pytest tests/integration/ --testbed testbed.yaml
    T->>L: SSH — run show commands / apply configs
    L-->>T: Parsed structured output
    T-->>P: PASS / FAIL results
    Note over P,L: Finally block — always executes
    P->>C: lab.stop() → lab.wipe() → lab.remove()
    CML-->>P: Lab destroyed, resources freed
```

CML Lab Lifecycle in a CI Pipeline

The pipeline stages run in sequence:

1. Import Lab YAML (virl2-client)
2. Start Lab (CML Server)
3. Wait Converged (~10 min)
4. Get Testbed YAML (auto-generated)
5. Run Tests (pyATS / pytest)
6. Teardown Lab (finally block)
Key Takeaway: CML transforms virtual lab management from a manual, click-driven exercise into a fully programmable, API-driven workflow. The same pyATS test suite runs identically against the virtual lab during development and against production during deployment — only the testbed changes.

Key Points — Section 3

Section 4: Automated Testing Pipelines

Pre-Quiz — Section 4: Test Your Prior Knowledge

1. In a pytest network test, what does scope="session" on a fixture accomplish?

2. What is the primary advantage of Robot Framework over pytest for network testing in some organizations?

3. In a GitLab CI/CD pipeline, where should device credentials be stored?

4. In Test-Driven Automation (TDA), when should you write the test?

5. Which pipeline stages run on every commit/pull request (not just on merge to main)?

4.1 CI/CD Principles for Network Automation

Continuous Integration means every change to automation code triggers an automated pipeline that lints, validates, and tests the change. Continuous Deployment means after passing all automated tests, changes are deployed automatically or with a single human approval. The GitOps model treats the Git repository as the single source of truth for network state: any merge to main triggers a pipeline that reconciles the live network to match the repository.

4.2 Pipeline Stages

| Stage | Purpose | Tools |
| --- | --- | --- |
| Lint | Syntax and style checking | yamllint, ansible-lint, pylint, black |
| Schema Validate | Enforce data models and policy constraints | YANG validators, Cerberus, custom scripts |
| Unit Test | Test automation logic in isolation (no devices) | pytest, unittest |
| Integration Test | Deploy to virtual lab and run tests | CML + pyATS, GNS3 + pytest |
| Pre-change Snapshot | Capture production state before deployment | genie learn |
| Deploy | Push configs to production devices | Ansible, NAPALM, Terraform |
| Post-change Validate | Verify change succeeded, no regressions | genie diff, pyATS test suite |
| Notify | Report results to stakeholders | Slack, email, PagerDuty, ticketing |
```mermaid
flowchart TD
    PR([Git Push /\nPull Request]) --> L
    subgraph merge_request["On Every Commit / PR"]
        L["Lint\nyamllint · ansible-lint · black · pylint"]
        SV["Schema Validate\nYANG · Cerberus · custom checks"]
        UT["Unit Test\npytest — no devices needed"]
        L --> SV --> UT
    end
    subgraph main_branch["On Merge to main"]
        IT["Integration Test\nCML virtual lab + pyATS"]
        PC["Pre-Change Snapshot\ngenie learn — production devices"]
        DEP["Deploy\nAnsible / NAPALM / Terraform"]
        PV["Post-Change Validate\ngenie diff · pyATS regression suite"]
        NT["Notify\nSlack · PagerDuty · ticketing"]
        IT --> PC --> DEP --> PV --> NT
    end
    UT -->|"merge approved"| IT
    PV -->|"diff clean"| SUCCESS([Change Complete])
    PV -->|"unexpected diff"| ROLLBACK([Rollback and Alert])
    style L fill:#d4edda,stroke:#28a745,color:#000
    style SV fill:#d4edda,stroke:#28a745,color:#000
    style UT fill:#d4edda,stroke:#28a745,color:#000
    style IT fill:#fff3cd,stroke:#ffc107,color:#000
    style PC fill:#cce5ff,stroke:#004085,color:#000
    style DEP fill:#f8d7da,stroke:#dc3545,color:#000
    style PV fill:#cce5ff,stroke:#004085,color:#000
    style NT fill:#e2e3e5,stroke:#383d41,color:#000
    style SUCCESS fill:#d4edda,stroke:#155724,color:#000
    style ROLLBACK fill:#f8d7da,stroke:#721c24,color:#000
```

4.3 pytest for Network Testing

pytest is the de facto standard Python testing framework for network automation. Its fixture system manages expensive shared resources such as device connections: a fixture declared with scope="session" connects once and shares that connection across every test in the run, while @pytest.mark.parametrize executes one test body against many devices without duplicating code.
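The session-scoped fixture and parametrization patterns can be sketched together. The device names and the connection stub below are illustrative; a real fixture would open SSH sessions through pyATS or Netmiko instead of building plain dictionaries.

```python
import pytest

DEVICES = ["edge-r1", "edge-r2", "core-sw1"]  # illustrative inventory

@pytest.fixture(scope="session")
def connections():
    # Connect once for the whole session (stubbed here; a real fixture
    # would establish SSH sessions and yield live connection objects).
    conns = {name: {"host": name, "connected": True} for name in DEVICES}
    yield conns
    # Teardown after the last test: disconnect every device.
    for conn in conns.values():
        conn["connected"] = False

@pytest.mark.parametrize("device", DEVICES)
def test_device_reachable(device, connections):
    # One test body, executed once per device in DEVICES.
    assert connections[device]["connected"]
```

Running `pytest -v` against this file reports one result per device, so a single failing switch is visible without hiding the others.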

4.4 Robot Framework for Keyword-Driven Testing

Robot Framework integrates with pyATS through the pyats.contrib package. Its natural-language keyword syntax makes tests readable by non-developers, which is valuable when change advisory boards or compliance teams need to review test results. It produces HTML reports that can be published to GitLab or GitHub Pages.

4.5 Test-Driven Automation (TDA)

TDA applies the TDD philosophy to network automation: write the test before writing the automation code. This forces clear thinking about desired state before implementation begins.

```mermaid
sequenceDiagram
    participant E as Engineer
    participant T as Test Suite (pytest / pyATS)
    participant D as Network Device
    participant A as Automation Code
    Note over E,A: Step 1 — Write the test first
    E->>T: Write test_vlan_100_exists_on_all_switches()
    E->>T: pytest tests/
    T->>D: SSH — show vlan brief
    D-->>T: VLAN 100 not present
    T-->>E: FAIL (expected — desired state not yet deployed)
    Note over E,A: Step 2 — Write the automation
    E->>A: Author Ansible playbook / Nornir script
    A->>D: Configure VLAN 100 on all access switches
    D-->>A: Configuration applied
    Note over E,A: Step 3 — Verify the test passes
    E->>T: pytest tests/
    T->>D: SSH — show vlan brief
    D-->>T: VLAN 100 present, name = SALES
    T-->>E: PASS
    Note over E,A: Step 4 — Refactor
    E->>A: Clean up playbook / script structure
    E->>T: pytest tests/ (regression check)
    T-->>E: PASS — test stays green
```

4.6 Secrets Management in Pipelines

Never store credentials in pipeline YAML files or testbed YAML directly. Use your CI/CD platform's secrets management facility. In pyATS testbeds, reference environment variables using %ENV{VAR_NAME}:
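For example, a testbed fragment (the device name is illustrative) that resolves both credential fields from environment variables injected by the CI/CD platform:

```yaml
devices:
  edge-r1:
    os: iosxe
    credentials:
      default:
        username: "%ENV{PYATS_USERNAME}"
        password: "%ENV{PYATS_PASSWORD}"
    connections:
      cli:
        protocol: ssh
        ip: 10.10.20.21
```

The YAML file itself contains no secrets and can be committed to Git; the pipeline supplies PYATS_USERNAME and PYATS_PASSWORD at runtime.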

| Platform | Secret Storage Mechanism |
| --- | --- |
| GitLab CI | CI/CD Variables (masked, protected) |
| GitHub Actions | Repository Secrets |
| Jenkins | Credentials Store |
| HashiCorp Vault | Vault secrets engine with dynamic credentials |
Key Takeaway: CI/CD pipelines bring software engineering discipline to network automation. By combining linting, unit tests, virtual lab integration tests, pre/post change snapshots, and structured test frameworks, teams deliver network changes that are validated, auditable, and reversible — transforming change management from a high-risk event into a routine automated workflow.

Key Points — Section 4

Post-Quiz — Test Your Understanding

Post-Quiz — Section 1: Testing and Validation Frameworks

1. Which pyATS command-line tool generates structured snapshots of network features like OSPF, BGP, and routing from real devices?

2. According to the shift-left pipeline model, which stage is cheapest to fail?

3. In a Genie testbed YAML, how do you securely reference a device password stored in an environment variable?

4. What does the aetest.CommonSetup section typically do in a pyATS test script?

5. After running ospf_diff = Diff(pre_ospf, post_ospf) and calling ospf_diff.findDiff(), how do you check in Python if any OSPF state change was detected?

Post-Quiz — Section 2: Cisco Platform APIs for Validation

1. You deploy an Ansible playbook that adds VLANs to a Meraki network. Post-deployment, you query the change log and find event types vlan_created, vlan_updated, and policy_rule_deleted. What should your validation script do?

2. To validate that OMP routes are being received after an SD-WAN vEdge change, which vManage endpoint do you query?

3. A Catalyst Center Assurance validation shows the network health score dropped from 92% to 78% after deployment. What should the pipeline do?

Post-Quiz — Section 3: Network Topology Simulation

1. You are writing a CI/CD integration that starts a CML lab. After calling lab.start(), what method should you call before running pyATS tests?

2. In the CML topology YAML format, what does the node_definition field specify?

3. A CML lab is used for CI testing. After test failures, engineers discover the CML server is running out of resources. The most likely cause is:

Post-Quiz — Section 4: Automated Testing Pipelines

1. You want to run the same BGP neighbor test against 20 different devices without duplicating test code. Which pytest feature enables this?

2. In TDA (Test-Driven Automation), you write a test for test_ospf_neighbor_count_equals_3() and run it immediately. What result do you expect and why?

3. A GitLab CI pipeline has an integration test stage that fails. The pipeline is configured with when: always for artifact collection. What happens to test result artifacts?

4. Which GitOps concept describes the Git repository as the single source of truth, where merging to main triggers reconciliation of live network state?


Answer Explanations