Chapter 7: Network Automation and Orchestration Design

Learning Objectives

Pre-Study Assessment

Answer these questions before studying the material to gauge your current understanding.

Pre-Quiz: API-Driven Network Management

1. An architect needs to implement real-time streaming telemetry from network devices with minimal polling overhead. Which protocol is best suited for this requirement?

   A. NETCONF
   B. RESTCONF
   C. gNMI
   D. SNMP

2. What is the primary advantage of NETCONF's candidate datastore over direct configuration changes?

   A. It supports JSON encoding for easier parsing
   B. It allows streaming telemetry subscriptions
   C. It enables atomic commit-or-rollback transactions on staged changes
   D. It uses HTTP methods for CRUD operations

3. A network team managing a multi-vendor environment needs vendor-neutral YANG models that reflect real-world operator requirements. Which model category best fits?

   A. Native (Vendor) models
   B. IETF models
   C. OpenConfig models
   D. Custom proprietary models

4. What common capability do NETCONF, RESTCONF, and gNMI all share?

   A. They all use XML encoding
   B. They all use YANG as their data modeling language
   C. They all support native streaming telemetry
   D. They all operate over SSH

5. What is a key function of an API gateway in a network automation architecture?

   A. Replacing YANG models with custom schemas
   B. Providing protocol translation, rate limiting, and centralized authentication
   C. Eliminating the need for any southbound protocols
   D. Directly replacing network controllers

Pre-Quiz: Controller-Based Automation

6. How does Cisco NSO determine what configuration changes to push to devices?

   A. It pushes full configuration templates regardless of current state
   B. It calculates the difference between desired and current state, applying only the delta
   C. It relies on device-side scripts to self-configure
   D. It replays the entire CLI command history

7. What distinguishes intent-based networking (IBN) from traditional network automation?

   A. IBN uses CLI commands instead of APIs
   B. IBN eliminates the need for network monitoring
   C. IBN includes a continuous assurance layer that verifies declared intent is being met
   D. IBN only works with single-vendor environments

8. In a Software-Defined Access (SDA) fabric, which component maintains the Host Tracking Database using LISP?

   A. Edge Nodes
   B. Border Nodes
   C. Control Nodes
   D. Catalyst Center

Pre-Quiz: CI/CD for Network Infrastructure

9. In a network CI/CD pipeline, what is the purpose of the validation stage using tools like Batfish and Open Policy Agent?

   A. To check YAML syntax and formatting only
   B. To simulate routing behavior and enforce policy compliance without touching a live network
   C. To deploy configurations to production devices
   D. To monitor post-deployment health

10. What is the key difference between push-style and pull-style GitOps for networks?

   A. Push-style uses YANG models while pull-style does not
   B. Pull-style controllers continuously detect and correct drift; push-style relies on pipeline triggers
   C. Push-style is always more secure than pull-style
   D. Pull-style cannot use Git as a source of truth

11. An organization wants to begin adopting network automation but has no existing automation infrastructure. What is the recommended first phase?

   A. Implement full closed-loop automation with intent-based networking
   B. Deploy a complete CI/CD pipeline with all five validation stages
   C. Start with inventory and read-only API operations to build familiarity
   D. Replace all CLI access with NETCONF immediately

12. Why is RESTCONF particularly well-suited for integration with cloud-native tooling like Terraform?

   A. It supports full transaction rollback with candidate datastores
   B. It uses stateless HTTP with JSON, compatible with standard web infrastructure like load balancers and API gateways
   C. It natively supports streaming telemetry subscriptions
   D. It operates over SSH for maximum security

13. What differentiates a CI/CD pipeline from a simple automation script for network changes?

   A. CI/CD pipelines can only use Ansible while scripts use Python
   B. CI/CD pipelines include pre-deployment validation, testing, and post-deployment verification stages
   C. Automation scripts are always faster than CI/CD pipelines
   D. CI/CD pipelines do not support rollback mechanisms

7.1 API-Driven Network Management

The evolution from CLI-based network management to programmatic interfaces represents one of the most significant shifts in network engineering. Modern networks expose structured, machine-readable interfaces that allow software to configure, monitor, and optimize infrastructure at scale.

7.1.1 REST, gRPC, and NETCONF API Design Patterns

NETCONF operates over SSH (port 830), uses XML encoding, and provides transaction-like capabilities with built-in validation and rollback. It defines configuration datastores (running, candidate, startup), so engineers can stage changes in the candidate datastore, validate them, and commit them to the running datastore atomically.

sequenceDiagram
    participant Operator as Automation Client
    participant NC as NETCONF Server (Device)
    participant Cand as Candidate Datastore
    participant Run as Running Datastore
    Operator->>NC: Open SSH Session (port 830)
    NC-->>Operator: Hello (capabilities exchange)
    Operator->>NC: lock(candidate)
    Operator->>Cand: edit-config (staged changes)
    Operator->>NC: validate(candidate)
    NC-->>Operator: Validation OK
    Operator->>NC: commit
    Cand->>Run: Atomic apply
    NC-->>Operator: Commit OK
    Operator->>NC: unlock(candidate)
    Operator->>NC: close-session

Figure 7.1: NETCONF session lifecycle with candidate datastore commit workflow
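
The lock, edit, validate, commit sequence in Figure 7.1 can be sketched with a small in-memory model. This is not a NETCONF client and opens no SSH session: `ToyDevice`, `apply_change`, and the MTU validation rule are hypothetical names chosen for illustration. Only the operation names (lock, edit-config, validate, commit, discard-changes, unlock) mirror the protocol.

```python
import copy


class ValidationError(Exception):
    """Raised when the staged configuration fails a validation rule."""


class ToyDevice:
    """In-memory stand-in for a device with running and candidate datastores."""

    def __init__(self, running):
        self.running = copy.deepcopy(running)
        self.candidate = copy.deepcopy(running)
        self.locked = False

    def lock(self):
        if self.locked:
            raise RuntimeError("candidate is already locked")
        self.locked = True

    def unlock(self):
        self.locked = False

    def edit_config(self, changes):
        # Staged edits touch only the candidate, never the running datastore.
        if not self.locked:
            raise RuntimeError("lock the candidate before editing")
        self.candidate.update(changes)

    def validate(self):
        # Hypothetical rule: the MTU must fall inside a sane range.
        mtu = self.candidate.get("mtu", 1500)
        if not 576 <= mtu <= 9216:
            raise ValidationError(f"mtu {mtu} out of range")

    def commit(self):
        # All staged changes become active together.
        self.running = copy.deepcopy(self.candidate)

    def discard_changes(self):
        # Reset the candidate back to the running configuration.
        self.candidate = copy.deepcopy(self.running)


def apply_change(device, changes):
    """Lock, stage, validate, commit; discard on failure. True if committed."""
    device.lock()
    try:
        device.edit_config(changes)
        device.validate()
        device.commit()
        return True
    except ValidationError:
        device.discard_changes()
        return False
    finally:
        device.unlock()
```

Calling `apply_change` with an out-of-range MTU returns `False` and leaves `running` untouched, which is the property that makes staged commits safer than editing the running configuration directly.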

RESTCONF (RFC 8040) maps HTTP methods to CRUD operations on YANG-modeled data. It supports JSON and XML, and its stateless nature makes it well-suited for cloud-native tooling.

| HTTP Method | RESTCONF Operation | Description |
|---|---|---|
| GET | Read | Retrieve configuration or state data |
| POST | Create | Create a new configuration resource |
| PUT | Create/Replace | Create or replace an entire resource |
| PATCH | Update | Merge changes into existing configuration |
| DELETE | Delete | Remove a configuration resource |
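
As a rough illustration of this mapping, the sketch below builds a RESTCONF request with the standard library but does not send it. The `build_request` helper, the `METHODS` names, and the `/restconf` API root are assumptions for the example (RFC 8040 lets a server advertise a different root). The resource path uses the standard `ietf-interfaces` module.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Operation names from the table mapped to HTTP methods.
METHODS = {
    "read": "GET",
    "create": "POST",
    "replace": "PUT",
    "update": "PATCH",
    "delete": "DELETE",
}
MEDIA_TYPE = "application/yang-data+json"


def build_request(host, operation, interface, body=None):
    """Build (but do not send) a RESTCONF request for one interface."""
    base = f"https://{host}/restconf/data/ietf-interfaces:interfaces"
    if operation == "create":
        # POST targets the parent container; the new entry goes in the body.
        url = base
    else:
        # List keys are percent-encoded so names such as Gi0/0 survive.
        key = quote(interface, safe="")
        url = f"{base}/interface={key}"
    headers = {"Accept": MEDIA_TYPE}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = MEDIA_TYPE
    return Request(url, data=data, headers=headers, method=METHODS[operation])
```

Because each request is self-contained, the same object could be passed through a load balancer or API gateway unchanged, which is the stateless property the text refers to.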

gNMI uses gRPC with Protocol Buffers over HTTP/2. It excels at streaming telemetry -- clients subscribe to YANG data model paths and receive updates on change, eliminating polling overhead.

Animation: Side-by-side comparison of SNMP polling vs. gNMI streaming telemetry, showing polling intervals with gaps versus continuous push-based updates
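
The difference between polling and on-change streaming can be shown with a toy simulation. This is not a gNMI client; `poll`, `subscribe_on_change`, and the sample data are hypothetical, and the code models only the behaviour described above.

```python
def poll(samples, interval):
    """Polling: read the value every `interval` ticks, changed or not."""
    return [(t, samples[t]) for t in range(0, len(samples), interval)]


def subscribe_on_change(samples):
    """On-change subscription: emit an update only when the value differs."""
    updates = []
    previous = object()  # sentinel that never equals a real sample
    for t, value in enumerate(samples):
        if value != previous:
            updates.append((t, value))
            previous = value
    return updates


# Hypothetical interface state, one sample per tick, with a brief flap at t=3.
state = ["up", "up", "up", "down", "up", "up", "up", "up", "up", "up"]
polled = poll(state, interval=5)
pushed = subscribe_on_change(state)
```

With a polling interval of five ticks the flap at tick 3 falls between polls and is never observed, while the on-change stream reports both transitions and sends nothing while the state is stable.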

Protocol Comparison

| Feature | NETCONF | RESTCONF | gNMI |
|---|---|---|---|
| Transport | SSH (port 830) | HTTPS | gRPC over HTTP/2 |
| Encoding | XML | JSON or XML | Protocol Buffers |
| Data Model | YANG | YANG | YANG |
| Streaming Telemetry | Limited | No | Native support |
| Transaction Support | Full (candidate datastore) | Partial | Set operations |
| Ideal Use Case | Configuration management | Web/cloud integration | Real-time telemetry |

7.1.2 YANG Data Models and Their Role in Automation

YANG defines the structure, constraints, and semantics of network configuration and state data. Three categories of YANG models exist:

| Category | Source | Best For |
|---|---|---|
| Native (Vendor) | Cisco, Juniper, Arista | Full platform coverage, vendor-specific features |
| IETF | IETF standards body | Learning, basic cross-vendor interoperability |
| OpenConfig | Operator consortium (Google, Microsoft, AT&T) | Production multi-vendor environments |

A practical design approach: use OpenConfig models as the primary abstraction for multi-vendor environments, falling back to native models only for platform-specific features.
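
A minimal sketch of that preference order follows. The registries are illustrative only: `openconfig-interfaces` and `openconfig-bgp` are real module names, but the vendor keys, the native module names, and the feature coverage are assumptions made for the example.

```python
# Vendor-neutral models, tried first.
OPENCONFIG = {
    "interfaces": "openconfig-interfaces",
    "bgp": "openconfig-bgp",
}

# Hypothetical native models, used only when OpenConfig has no coverage.
NATIVE = {
    "vendor-a": {"interfaces": "vendor-a-native", "event-scripts": "vendor-a-native"},
    "vendor-b": {"interfaces": "vendor-b-native"},
}


def select_model(feature, vendor):
    """Prefer the vendor-neutral model; fall back to the vendor's native one."""
    if feature in OPENCONFIG:
        return ("openconfig", OPENCONFIG[feature])
    native = NATIVE.get(vendor, {})
    if feature in native:
        return ("native", native[feature])
    raise LookupError(f"no model covers {feature!r} on {vendor!r}")
```

Keeping the fallback explicit makes it easy to audit how much of an automation codebase still depends on platform-specific models.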

7.1.3 API Gateway and Abstraction Layer Design

API gateways sit between external consumers and network devices, providing authentication, rate limiting, protocol translation, and request aggregation. In the reverse direction, Model-Driven Telemetry (MDT) uses a push model in which devices stream operational data based on YANG subscriptions.

flowchart TB
    Consumers["External Consumers\n(Terraform, Ansible, Custom Apps)"]
    GW["API Gateway / Abstraction Layer\n- Auth & Rate Limiting\n- Protocol Translation\n- Request Aggregation"]
    NC["NETCONF\n(SSH/830)"]
    RC["RESTCONF\n(HTTPS)"]
    GNMI["gNMI\n(gRPC/HTTP2)"]
    D1["Router A"]
    D2["Switch B"]
    D3["Firewall C"]
    Consumers -->|REST API calls| GW
    GW --> NC
    GW --> RC
    GW --> GNMI
    NC --> D1
    RC --> D2
    GNMI --> D3
    D1 -.->|Streaming Telemetry| GW
    D2 -.->|Streaming Telemetry| GW
    D3 -.->|Streaming Telemetry| GW

Figure 7.2: API gateway abstracting protocol diversity between consumers and network devices
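
The gateway responsibilities in Figure 7.2 can be sketched in a few lines. This is a toy under stated assumptions: `TokenBucket`, `Gateway`, the API key, and the device names are hypothetical, the clock is passed in explicitly to keep the example deterministic, and no southbound call is actually made.

```python
class TokenBucket:
    """Rate limiter: `rate` tokens per second, up to `capacity` in reserve."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def allow(self, now):
        # Refill for the elapsed time, then spend one token if available.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Gateway:
    """Authenticates, rate-limits, and picks a southbound protocol."""

    def __init__(self, api_keys, device_protocols, bucket):
        self.api_keys = api_keys                  # set of accepted keys
        self.device_protocols = device_protocols  # device name -> protocol
        self.bucket = bucket

    def handle(self, api_key, device, now):
        """Return (status, detail) for one northbound request."""
        if api_key not in self.api_keys:
            return (401, "unauthorized")
        if not self.bucket.allow(now):
            return (429, "rate limit exceeded")
        protocol = self.device_protocols.get(device)
        if protocol is None:
            return (404, "unknown device")
        # A real gateway would now call the matching southbound client.
        return (200, f"forward to {device} via {protocol}")
```

The consumer only ever sees one REST-style interface; which of NETCONF, RESTCONF, or gNMI is used behind it is a lookup the gateway owns.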

Key Points: API-Driven Network Management

7.2 Controller-Based Automation

Controllers abstract the complexity of multi-device, multi-vendor environments behind a unified management plane, transforming individual device interactions into coordinated network-wide operations.

7.2.1 Cisco Catalyst Center

Catalyst Center provides four core automation capabilities: Visibility (discovery, topology mapping), Intent (business policy translation), Deployment (zero-touch provisioning, templates), and Management (monitoring, assurance analytics). It exposes a REST API enabling integration with CI/CD pipelines via Ansible, Python, or Terraform.

7.2.2 Cisco NSO Design Patterns

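NSO addresses multi-vendor, multi-domain orchestration using two key concepts: service-to-device model translation, in which a YANG service model is mapped onto per-device models, and state convergence, in which NSO compares desired state with current device state and pushes only the minimal delta.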

NSO uses NETCONF as its primary southbound protocol but supports CLI via Network Element Drivers (NEDs) for legacy devices.

flowchart TB
    Op["Operator Request\n'Create L3VPN Service'"]
    SM["NSO Service Model\n(YANG)"]
    SC["State Convergence Engine\nDiff: Desired vs Current"]
    DM1["Device Model\n(Cisco IOS-XR)"]
    DM2["Device Model\n(Juniper JunOS)"]
    DM3["Device Model\n(Legacy CLI)"]
    R1["PE Router 1\nvia NETCONF"]
    R2["PE Router 2\nvia NETCONF"]
    R3["CE Router 3\nvia NED/CLI"]
    Op --> SM
    SM --> SC
    SC --> DM1
    SC --> DM2
    SC --> DM3
    DM1 -->|Minimal delta config| R1
    DM2 -->|Minimal delta config| R2
    DM3 -->|Minimal delta config| R3

Figure 7.3: NSO service-to-device translation with state convergence
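
The diff-based idea behind state convergence can be sketched as follows. This is not NSO's actual algorithm or API: `compute_delta`, `apply_delta`, and the flat path-to-value configuration format are simplifying assumptions for illustration.

```python
def compute_delta(desired, current):
    """Return the minimal operations that turn `current` into `desired`."""
    ops = []
    for path in sorted(desired):
        # Emit a set only for paths that are missing or hold a different value.
        if path not in current or current[path] != desired[path]:
            ops.append(("set", path, desired[path]))
    for path in sorted(current):
        # Anything present on the device but absent from intent is removed.
        if path not in desired:
            ops.append(("delete", path, None))
    return ops


def apply_delta(current, ops):
    """Apply operations to a copy of `current` and return the result."""
    result = dict(current)
    for action, path, value in ops:
        if action == "set":
            result[path] = value
        else:
            result.pop(path, None)
    return result
```

When desired and current state already match, `compute_delta` returns an empty list, so re-running the same service request sends nothing to the device.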

Animation: NSO state convergence engine receiving a service request, computing diff between desired and current device state, then pushing only the minimal delta configuration to each device

| Design Aspect | Catalyst Center | NSO |
|---|---|---|
| Primary Domain | Enterprise campus/branch | Multi-vendor, multi-domain |
| Service Modeling | Template-based | YANG service models |
| Multi-Vendor Support | Cisco-focused | Extensive multi-vendor |
| Change Strategy | Template push | State convergence (diff-based) |
| Ideal Scale | Single enterprise | SP / large multi-domain enterprise |

7.2.3 Intent-Based Networking and Closed-Loop Automation

IBN operates through three building blocks:

  1. Translation -- Captures business requirements and converts them into enforceable network policies (e.g., SGTs and access control)
  2. Activation -- Deploys policies consistently across all relevant devices in the fabric
  3. Assurance -- Continuously monitors and verifies that declared intent is being met; detects drift and triggers remediation
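
The three building blocks can be sketched as three small functions plus a loop that ties them together. Everything here is hypothetical: the intent format, the SGT numbers, and the device names are invented for illustration and do not reflect any controller's data model.

```python
import copy


def translate(intent):
    """Translation: turn a business statement into per-group policy."""
    return {
        group: {"sgt": sgt, "deny": sorted(intent["deny"].get(group, []))}
        for group, sgt in intent["groups"].items()
    }


def activate(policy, devices):
    """Activation: push the same policy to every device in the fabric."""
    for name in devices:
        devices[name] = copy.deepcopy(policy)


def assure(policy, devices):
    """Assurance: return the devices whose state no longer matches intent."""
    return sorted(name for name, state in devices.items() if state != policy)


def closed_loop(intent, devices):
    """One pass of the loop: detect drift, then remediate drifted devices."""
    policy = translate(intent)
    drifted = assure(policy, devices)
    for name in drifted:
        devices[name] = copy.deepcopy(policy)
    return drifted
```

Running `closed_loop` on a schedule is what separates this from a one-off configuration push: an out-of-band change on one edge device is detected and corrected on the next pass.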

Software-Defined Access (SDA) is the primary enabler for IBN on campus networks:

flowchart TB
    CC["Catalyst Center\n(Policy & Assurance)"]
    CN["Control Node\nLISP Map Server\nHost Tracking DB"]
    BN["Border Node\nFabric-to-External\n(WAN / DC / Internet)"]
    EN1["Edge Node 1\nVXLAN Encap/Decap"]
    EN2["Edge Node 2\nVXLAN Encap/Decap"]
    UL["IP Underlay\n(IS-IS Routed)"]
    EP1["Endpoints\n(802.1X / MAB)"]
    EP2["Endpoints\n(802.1X / MAB)"]
    CC ---|Intent & Policy| CN
    CC ---|Assurance| BN
    CN ---|LISP Registration| EN1
    CN ---|LISP Registration| EN2
    EN1 --- UL
    EN2 --- UL
    BN --- UL
    EP1 ---|SGT Assignment| EN1
    EP2 ---|SGT Assignment| EN2

Figure 7.4: Software-Defined Access fabric architecture
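
The control node's role in Figure 7.4 can be pictured as a lookup table from endpoint to the edge node currently serving it. The sketch below is a toy model only: it implements no LISP messages, and `ControlNode`, the addresses, and the node names are hypothetical.

```python
class ControlNode:
    """Toy host tracking database: endpoint address -> edge node."""

    def __init__(self):
        self.htdb = {}

    def register(self, endpoint_ip, edge_node):
        # An edge node registers an endpoint it has just learned. A later
        # registration from a different edge node models the endpoint roaming.
        self.htdb[endpoint_ip] = edge_node

    def resolve(self, endpoint_ip):
        # Edge nodes ask where to tunnel traffic for a destination endpoint.
        return self.htdb.get(endpoint_ip)


cn = ControlNode()
cn.register("10.1.1.10", "edge-1")
cn.register("10.1.2.20", "edge-2")
cn.register("10.1.1.10", "edge-2")  # the first endpoint roams to edge-2
```

Because every edge node resolves destinations through the same database, an endpoint can move between edge nodes without other nodes needing any reconfiguration.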

IBN is not just about automating configuration pushes. The critical differentiator is the assurance layer -- the ability to continuously verify that the network operates according to declared business intent and to take corrective action when drift occurs.

Key Points: Controller-Based Automation

7.3 CI/CD for Network Infrastructure

The same CI/CD principles that transformed software development apply to network infrastructure, where configuration changes are the "code" and the production network is the deployment target.

7.3.1 Infrastructure as Code (IaC) for Networks

IaC means expressing network configurations in declarative, version-controlled files. Key benefits include a reviewable change history in Git, validation before deployment, and automated rollback.

7.3.2 Testing and Validation Pipelines

A well-designed network CI/CD pipeline has five quality-gate stages:

  1. Lint -- Syntax and format checks (ansible-lint, pyang, yamllint)
  2. Validate -- Policy compliance via Batfish (offline routing simulation) and OPA (Rego policy enforcement)
  3. Test -- Sandbox deployment using CML/GNS3, smoke tests for connectivity and routing
  4. Deploy -- Apply to production with dry-run previews and approval gates
  5. Verify -- Post-deployment health checks; automated rollback on failure

Animation: Configuration change flowing through all five CI/CD pipeline stages (Lint, Validate, Test, Deploy, Verify) with green checkmarks at each gate, showing a failed validation triggering rejection before reaching production

| Rollback Method | Description | Speed |
|---|---|---|
| Configuration snapshots | Pre-change config stored in Git or on device | Fast |
| Device-native rollback | Cisco configure replace, Juniper rollback | Very fast |
| IaC state revert | Terraform apply with previous state | Moderate |
| Full pipeline re-run | Re-execute pipeline with previous Git commit | Slower but thorough |
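
A rough sketch of the five quality gates with snapshot-based rollback is shown below. It calls no real tools: `run_pipeline`, the `checks` callables, and the dictionary standing in for the production network are assumptions for illustration, and the approval gate before deployment is omitted for brevity.

```python
STAGES = ["lint", "validate", "test", "deploy", "verify"]


def run_pipeline(change, checks, network):
    """Run the five stages in order; stop at the first failing gate."""
    snapshot = dict(network)  # pre-change configuration snapshot
    for stage in STAGES:
        if stage == "deploy":
            network.update(change)  # only this stage touches production
            continue
        if not checks[stage](change, network):
            if stage == "verify":
                # Post-deployment failure: restore the snapshot.
                network.clear()
                network.update(snapshot)
                return ("rolled back", stage)
            # Pre-deployment failure: production was never touched.
            return ("rejected", stage)
    return ("deployed", None)
```

A failure in Lint, Validate, or Test rejects the change before production is modified, while a failure in Verify restores the snapshot. That ordering is what distinguishes the pipeline from a script that simply pushes configuration.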

7.3.3 GitOps Workflows for Network Configuration

GitOps makes Git the single source of truth and uses automated agents to reconcile actual state with declared state. Four principles:

  1. Git as Source of Truth -- All configs live in Git repositories
  2. Declarative Configuration -- Describe desired end state, not steps
  3. Automated State Reconciliation -- Agents compare live network against Git and correct drift
  4. Push-Based or Pull-Based Deployment

flowchart LR
    subgraph Push["Push Model"]
        direction LR
        Dev1["Engineer\nCommits to Git"] --> MR1["Merge to Main"]
        MR1 --> CICD1["CI/CD Pipeline\nTriggered"]
        CICD1 --> Net1["Network\nDevices"]
    end
    subgraph Pull["Pull Model"]
        direction LR
        Dev2["Engineer\nCommits to Git"] --> MR2["Merge to Main"]
        Ctrl["Controller / Agent\nPolls Git Repo"] --> MR2
        Ctrl -->|Reconcile State| Net2["Network\nDevices"]
        Net2 -.->|Drift Detected| Ctrl
    end

Figure 7.5: GitOps push-based vs pull-based deployment models

The push model is simpler and more common today. The pull model, the approach Argo CD and Flux take for Kubernetes, provides stronger drift-correction guarantees but requires custom tooling or a platform such as NSO.
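
The practical difference shows up when someone changes a device outside of Git. In the sketch below, `push_deploy`, `pull_reconcile`, and the dictionaries standing in for the repository and the live network are hypothetical; a real agent would read from a Git remote and write through a southbound protocol.

```python
def push_deploy(git_config, network):
    """Push model: runs once, when a merge triggers the pipeline."""
    network.clear()
    network.update(git_config)


def pull_reconcile(git_config, network):
    """Pull model: one polling cycle of an agent. Returns the drifted keys."""
    keys = set(git_config) | set(network)
    drift = sorted(k for k in keys if git_config.get(k) != network.get(k))
    if drift:
        # Live state is overwritten with what Git declares.
        network.clear()
        network.update(git_config)
    return drift


git = {"ntp": "192.0.2.10", "snmp": "off"}
live = {}
push_deploy(git, live)
live["snmp"] = "on"  # out-of-band CLI change made after the deployment
drift_found = pull_reconcile(git, live)
```

Under the push model the manual change would persist until the next merge triggers the pipeline. The polling agent finds and reverts it on its next cycle without any new commit.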

7.3.4 Evolution from CLI to Model-Driven Operations

| Dimension | CLI-Based | Model-Driven |
|---|---|---|
| Configuration | Manual CLI commands | Declarative YANG via APIs |
| Change Tracking | Ad-hoc notes | Git version control |
| Validation | "Show" commands after | Pre-deployment simulation |
| Rollback | Manual re-entry | Automated, transactional |
| Monitoring | SNMP polling | Model-driven telemetry (push) |
| Multi-Vendor | Vendor-specific CLI | Standardized YANG (OpenConfig) |

The four-phase maturity journey:

flowchart LR
    P1["Phase 1\nInventory &\nRead Operations"]
    P2["Phase 2\nStandardized\nTemplates"]
    P3["Phase 3\nCI/CD Pipeline-\nDriven Changes"]
    P4["Phase 4\nClosed-Loop\nAutomation"]
    P1 -->|Build API familiarity| P2
    P2 -->|Programmatic execution| P3
    P3 -->|Add assurance layer| P4
    style P1 fill:#e8f4f8,stroke:#2196F3
    style P2 fill:#e8f4f8,stroke:#2196F3
    style P3 fill:#e8f4f8,stroke:#2196F3
    style P4 fill:#e8f4f8,stroke:#2196F3

Figure 7.6: Network automation maturity journey

Adopt network automation incrementally. Start with read-only operations to build confidence, progress to template-driven changes, and then move to full CI/CD pipelines. Jumping directly to closed-loop automation without these foundational practices is likely to fail.
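
A Phase 1 task might look like the sketch below: take interface data that was read from a device and summarize it, changing nothing. The payload is shaped like a JSON rendering of the standard `ietf-interfaces` model, but the interface names and values are made up, and `summarize` is a hypothetical helper. In practice the payload would come from a read-only API call rather than a constant.

```python
import json

# Sample payload; names and values are invented for illustration.
SAMPLE = json.dumps({
    "ietf-interfaces:interfaces": {
        "interface": [
            {"name": "GigabitEthernet1",
             "type": "iana-if-type:ethernetCsmacd", "enabled": True},
            {"name": "GigabitEthernet2",
             "type": "iana-if-type:ethernetCsmacd", "enabled": False},
            {"name": "Loopback0",
             "type": "iana-if-type:softwareLoopback", "enabled": True},
        ]
    }
})


def summarize(payload):
    """Build a read-only inventory summary from a JSON interface listing."""
    interfaces = json.loads(payload)["ietf-interfaces:interfaces"]["interface"]
    # `enabled` defaults to true in the model, so a missing leaf counts as enabled.
    enabled = sorted(i["name"] for i in interfaces if i.get("enabled", True))
    disabled = sorted(i["name"] for i in interfaces if not i.get("enabled", True))
    return {"total": len(interfaces), "enabled": enabled, "disabled": disabled}
```

Because the code only reads and reports, a mistake costs nothing on the network, which makes this kind of task a low-risk way to build familiarity with the APIs and data models.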

Key Points: CI/CD for Network Infrastructure

Post-Study Assessment

Now that you have studied the material, answer the same questions again to measure your improvement.

Post-Quiz: API-Driven Network Management

1. An architect needs to implement real-time streaming telemetry from network devices with minimal polling overhead. Which protocol is best suited for this requirement?

   A. NETCONF
   B. RESTCONF
   C. gNMI
   D. SNMP

2. What is the primary advantage of NETCONF's candidate datastore over direct configuration changes?

   A. It supports JSON encoding for easier parsing
   B. It allows streaming telemetry subscriptions
   C. It enables atomic commit-or-rollback transactions on staged changes
   D. It uses HTTP methods for CRUD operations

3. A network team managing a multi-vendor environment needs vendor-neutral YANG models that reflect real-world operator requirements. Which model category best fits?

   A. Native (Vendor) models
   B. IETF models
   C. OpenConfig models
   D. Custom proprietary models

4. What common capability do NETCONF, RESTCONF, and gNMI all share?

   A. They all use XML encoding
   B. They all use YANG as their data modeling language
   C. They all support native streaming telemetry
   D. They all operate over SSH

5. What is a key function of an API gateway in a network automation architecture?

   A. Replacing YANG models with custom schemas
   B. Providing protocol translation, rate limiting, and centralized authentication
   C. Eliminating the need for any southbound protocols
   D. Directly replacing network controllers

Post-Quiz: Controller-Based Automation

6. How does Cisco NSO determine what configuration changes to push to devices?

   A. It pushes full configuration templates regardless of current state
   B. It calculates the difference between desired and current state, applying only the delta
   C. It relies on device-side scripts to self-configure
   D. It replays the entire CLI command history

7. What distinguishes intent-based networking (IBN) from traditional network automation?

   A. IBN uses CLI commands instead of APIs
   B. IBN eliminates the need for network monitoring
   C. IBN includes a continuous assurance layer that verifies declared intent is being met
   D. IBN only works with single-vendor environments

8. In a Software-Defined Access (SDA) fabric, which component maintains the Host Tracking Database using LISP?

   A. Edge Nodes
   B. Border Nodes
   C. Control Nodes
   D. Catalyst Center

Post-Quiz: CI/CD for Network Infrastructure

9. In a network CI/CD pipeline, what is the purpose of the validation stage using tools like Batfish and Open Policy Agent?

   A. To check YAML syntax and formatting only
   B. To simulate routing behavior and enforce policy compliance without touching a live network
   C. To deploy configurations to production devices
   D. To monitor post-deployment health

10. What is the key difference between push-style and pull-style GitOps for networks?

   A. Push-style uses YANG models while pull-style does not
   B. Pull-style controllers continuously detect and correct drift; push-style relies on pipeline triggers
   C. Push-style is always more secure than pull-style
   D. Pull-style cannot use Git as a source of truth

11. An organization wants to begin adopting network automation but has no existing automation infrastructure. What is the recommended first phase?

   A. Implement full closed-loop automation with intent-based networking
   B. Deploy a complete CI/CD pipeline with all five validation stages
   C. Start with inventory and read-only API operations to build familiarity
   D. Replace all CLI access with NETCONF immediately

12. Why is RESTCONF particularly well-suited for integration with cloud-native tooling like Terraform?

   A. It supports full transaction rollback with candidate datastores
   B. It uses stateless HTTP with JSON, compatible with standard web infrastructure like load balancers and API gateways
   C. It natively supports streaming telemetry subscriptions
   D. It operates over SSH for maximum security

13. What differentiates a CI/CD pipeline from a simple automation script for network changes?

   A. CI/CD pipelines can only use Ansible while scripts use Python
   B. CI/CD pipelines include pre-deployment validation, testing, and post-deployment verification stages
   C. Automation scripts are always faster than CI/CD pipelines
   D. CI/CD pipelines do not support rollback mechanisms
