Chapter 8: Software-Defined Architecture and SD-WAN Design
Learning Objectives
Design SD-WAN overlay and underlay architectures for enterprise WANs
Evaluate fabric-based architectures including VXLAN and LISP for campus and data center networks
Compare controller-based solution designs across SD-WAN, SD-Access, and ACI
Plan multi-domain integration strategies that maintain end-to-end segmentation
Pre-Study Knowledge Check
Answer these questions before studying the material to establish your baseline understanding.
Pre-Quiz
1. An enterprise is deploying Cisco SD-WAN. Which component is responsible for distributing OMP routes and enforcing centralized routing policy across the overlay?
a) vManage
b) vBond
c) vSmart
d) vEdge
2. A TLOC in Cisco SD-WAN is uniquely identified by which tuple?
3. An architect needs to connect 200 branch sites with centralized security inspection at the data center. Which SD-WAN overlay topology is most appropriate?
a) Full mesh
b) Hub-and-spoke
c) Partial mesh with no regional hubs
d) Point-to-point tunnels
4. Application-Aware Routing in SD-WAN uses which mechanism to measure tunnel quality in real time?
a) SNMP polling of interface counters
b) NetFlow export analysis
c) BFD probes measuring loss, latency, and jitter
d) ICMP echo requests to the vSmart controller
5. In SD-Access, what protocol provides the control plane by mapping Endpoint Identifiers (EIDs) to Routing Locators (RLOCs)?
a) VXLAN
b) IS-IS
c) LISP
d) BGP EVPN
6. What is the primary benefit of the anycast gateway in an SD-Access fabric?
a) It provides WAN failover between transport links
b) It enables seamless endpoint mobility without IP reconfiguration by presenting the same gateway IP and MAC on every fabric edge
c) It encrypts all east-west traffic within the fabric
d) It replaces the need for a LISP control plane node
7. An enterprise wants to enforce different access policies for employees, contractors, and IoT devices within the same Virtual Network. Which SD-Access mechanism provides this granular control?
a) VPN segmentation via VRF instances
b) Scalable Group Tags (SGTs) with SGACLs
c) VXLAN Network Identifiers (VNIs)
d) Access port VLAN assignments
8. In ACI, what is the default behavior for traffic between two Endpoint Groups (EPGs) that have no Contract defined between them?
a) Traffic is permitted with logging
b) Traffic is rate-limited to 1 Mbps
c) Traffic is denied
d) Traffic is permitted but unencrypted
9. An organization has two geographically distributed data centers and needs independent fault domains with the ability to stretch policies across sites. Which ACI extension model should they use?
a) ACI Multi-Pod
b) ACI Multi-Site with Nexus Dashboard Orchestrator
c) ACI Single-Pod with stretched VLANs
d) VXLAN flood-and-learn between sites
10. When integrating SD-Access and SD-WAN, what is the preferred approach for new deployments that provides end-to-end SGT propagation?
a) IP Transit with SXP for SGT exchange
b) GRE tunnels between border nodes
c) Integrated Domain consolidating SD-Access border and SD-WAN edge functions
d) Static VRF-to-VPN mapping with no SGT propagation
11. During an SD-WAN migration, why should data center and hub sites be migrated before remote branches?
a) Hub sites require less testing than branch sites
b) Hub sites serve as transit points routing traffic between SD-WAN and legacy sites during the transition
c) Branch sites cannot connect to the overlay until all hub sites are decommissioned
d) vManage can only be installed at data center sites
12. What happens to forwarding in an ACI fabric if all three APIC controllers fail simultaneously?
a) All traffic stops immediately because APIC is in the data path
b) The fabric continues forwarding using its last-known configuration
c) Only spine-to-spine traffic continues; leaf-to-leaf fails
d) The fabric reverts to a default allow-all policy
13. Why are odd-numbered controller clusters recommended for SD-WAN deployments?
a) Odd numbers provide better CPU load distribution
b) Even-numbered clusters cannot replicate the configuration database
c) Odd numbers avoid split-brain scenarios during network partitions
d) Licensing requires an odd number of controllers
14. In SD-Access, which component serves as the common policy anchor for SGT assignment and propagation across all three domains (SD-Access, SD-WAN, ACI)?
15. A fabric edge node in SD-Access detects a new endpoint. What is the correct sequence of events?
a) The edge floods the endpoint's MAC to all other edges, then registers with the control plane
b) The edge registers the EID-to-RLOC mapping with the control plane node (Map-Server), and other edges query the Map-Resolver when they need to reach that endpoint
c) The edge sends the endpoint's IP to vSmart, which distributes it via OMP
d) The edge creates a static ARP entry and pushes it to all border nodes
Cisco SD-WAN centralizes WAN intelligence through four functional planes, each served by a dedicated component. Think of vSmart as a route reflector that distributes not just routing information, but also security keys and policy directives through the Overlay Management Protocol (OMP).
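| Component | Plane | Function |
|---|---|---|
| vManage | Management | Configuration, monitoring, REST API, RBAC |
| vBond | Orchestration | Device authentication, NAT traversal, Zero Touch Provisioning (ZTP) |
| vSmart | Control | Route distribution via OMP, policy enforcement, crypto key orchestration |
| vEdge / cEdge | Data | Tunnel endpoints forwarding encrypted traffic between sites |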
```mermaid
flowchart LR
    subgraph Management Plane
        vManage["vManage\nConfig, Monitoring,\nREST API, RBAC"]
    end
    subgraph Orchestration Plane
        vBond["vBond\nAuthentication,\nNAT Traversal, ZTP"]
    end
    subgraph Control Plane
        vSmart["vSmart\nOMP Route Distribution,\nPolicy, Crypto Keys"]
    end
    subgraph Data Plane
        Edge1["WAN Edge 1\nIPsec Tunnels"]
        Edge2["WAN Edge 2\nIPsec Tunnels"]
    end
    vManage <-->|"Management"| vSmart
    vManage <-->|"Management"| vBond
    vBond -->|"Auth and Discovery"| Edge1
    vBond -->|"Auth and Discovery"| Edge2
    vSmart <-->|"OMP"| Edge1
    vSmart <-->|"OMP"| Edge2
    Edge1 <-->|"IPsec Data Plane"| Edge2
```
Figure 8.1: Cisco SD-WAN four-plane architecture with controller and edge components
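OMP runs between each WAN Edge and vSmart inside the authenticated DTLS or TLS control connection (DTLS over UDP, base port 12346, is the default) rather than on a dedicated TCP port. It distributes five categories of information: TLOCs (transport locators identifying each WAN circuit), routes (VPN reachability), service routes (for firewall and load balancer chaining), security keys (IPsec key rotation), and policies (traffic engineering and segmentation rules).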
Animation: OMP distributing TLOCs, routes, and security keys from vSmart to WAN Edge devices, showing the unified control-plane protocol flow
Key Points -- SD-WAN Components
Four planes: management (vManage), orchestration (vBond), control (vSmart), data (vEdge/cEdge)
OMP is the unified control-plane protocol carrying routes, TLOCs, security keys, service routes, and policies
vSmart functions like a BGP route reflector but for the entire SD-WAN overlay
vBond handles initial device authentication and NAT traversal for Zero Touch Provisioning
8.1.2 Overlay and Underlay Design
The underlay provides IP reachability between TLOCs. SD-WAN treats all transports (MPLS, broadband, LTE, 5G, satellite) as equivalent pipes differentiated by color labels. This transport independence lets organizations mix carriers and technologies without redesigning the overlay.
The overlay is a virtual IP fabric built with IPsec tunnels between TLOCs. All tunnels are encrypted by default with AES-256-GCM. Topology selection is a critical design decision:
| Topology | Tunnel Count | Latency | Best For |
|---|---|---|---|
| Full Mesh | O(n²) | Optimal (direct path) | Small-to-medium deployments with site-to-site traffic |
| Hub-and-Spoke | O(n) | Higher (transit via hub) | Centralized applications, security inspection at hub |
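The scaling difference in the table can be made concrete with a little arithmetic. The sketch below is a simplification: it assumes one tunnel per site pair (a single transport per site) and, for hub-and-spoke, that spokes connect only to hubs. The function names are illustrative, not part of any Cisco tooling.

```python
def full_mesh_tunnels(sites: int) -> int:
    """Tunnels when every site peers directly with every other site."""
    return sites * (sites - 1) // 2


def hub_and_spoke_tunnels(sites: int, hubs: int = 1) -> int:
    """Tunnels when each spoke builds one tunnel to every hub.

    Assumption: hubs are counted among `sites`, and hub-to-hub
    tunnels are ignored to keep the comparison simple.
    """
    spokes = sites - hubs
    return spokes * hubs


for n in (10, 50, 200):
    print(f"{n} sites: full mesh={full_mesh_tunnels(n)}, "
          f"hub-and-spoke={hub_and_spoke_tunnels(n)}")
```

At 200 sites the full mesh needs 19,900 tunnels against 199 for a single hub, which is why large branch estates usually favor hub-and-spoke or regional hubs. Real deployments with several transports per site multiply these counts further.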
VPN Segmentation provides isolated routing domains within the overlay. Each VPN is functionally equivalent to a VRF. Inter-VPN traffic requires explicit service insertion (e.g., a firewall) -- VPNs are isolated by default.
Key Points -- Overlay and Underlay
The underlay's only job is IP reachability between TLOCs; all transports are color-labeled and treated equally
Full mesh scales as O(n²) tunnels, hub-and-spoke as O(n), partial mesh balances between the two
All overlay tunnels encrypted by default with AES-256-GCM
8.1.3 Transport Independence and Path Selection Policies
Application-Aware Routing (AAR) continuously monitors overlay tunnel quality using BFD probes that measure loss, latency, and jitter. When a transport violates SLA thresholds, traffic is automatically rerouted to the best alternative path.
```mermaid
graph TD
    A["Traffic Arrives at WAN Edge"] --> B["NBAR2 Classifies Application\n(L3-L7 DPI)"]
    B --> C["Match to SLA Class\n(Loss / Latency / Jitter)"]
    C --> D["BFD Probes Measure\nAll Tunnel Paths"]
    D --> E{"Path Meets\nSLA Thresholds?"}
    E -->|"Yes"| F["Forward on\nPreferred Path"]
    E -->|"No"| G["Reroute to Best\nAlternative Path"]
    G --> F
```
The AAR workflow: (1) define SLA classes with thresholds, (2) NBAR2 classifies traffic via deep packet inspection, (3) BFD probes measure all paths, (4) traffic dynamically steers to compliant paths. QoS integration maps DSCP markings between overlay and underlay, configured centrally in vManage.
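The path-selection step of that workflow can be sketched in a few lines of Python. This is a conceptual model only, not how a WAN Edge implements AAR: the class names, the fallback rule when no path is compliant, and the loss and jitter thresholds are assumptions made for illustration (only the 150 ms latency figure echoes the voice example in this section).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaClass:
    name: str
    max_loss_pct: float
    max_latency_ms: float
    max_jitter_ms: float


@dataclass(frozen=True)
class PathStats:
    color: str          # transport color label, e.g. "mpls"
    loss_pct: float     # values a BFD probe would report
    latency_ms: float
    jitter_ms: float


def meets_sla(path: PathStats, sla: SlaClass) -> bool:
    """A path is compliant only if all three metrics are within threshold."""
    return (path.loss_pct <= sla.max_loss_pct
            and path.latency_ms <= sla.max_latency_ms
            and path.jitter_ms <= sla.max_jitter_ms)


def select_path(paths, sla: SlaClass, preferred_color: str) -> PathStats:
    """Prefer the configured color if compliant, else the best compliant path.

    Assumption: if nothing is compliant, fall back to the lowest-latency path.
    """
    compliant = [p for p in paths if meets_sla(p, sla)]
    for p in compliant:
        if p.color == preferred_color:
            return p
    candidates = compliant or list(paths)
    return min(candidates, key=lambda p: (p.latency_ms, p.loss_pct, p.jitter_ms))


VOICE = SlaClass("voice", max_loss_pct=1.0, max_latency_ms=150.0, max_jitter_ms=30.0)
paths = [
    PathStats("mpls", loss_pct=0.1, latency_ms=180.0, jitter_ms=5.0),
    PathStats("biz-internet", loss_pct=0.3, latency_ms=60.0, jitter_ms=8.0),
]
chosen = select_path(paths, VOICE, preferred_color="mpls")
print(chosen.color)  # biz-internet, because MPLS latency exceeds 150 ms
```

When the MPLS measurements return to within thresholds, the same function selects the preferred color again, which mirrors the steer-and-return behavior described above.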
Animation: Voice traffic flowing over MPLS, then automatically switching to broadband when MPLS latency exceeds 150ms threshold -- demonstrating AAR in action
Key Points -- AAR and Path Selection
AAR uses BFD probes to measure loss, latency, and jitter on all overlay tunnels in real time
NBAR2 provides L3-L7 deep packet inspection for application classification
QoS configuration is centralized in vManage and pushed to all edges
8.1.4 SD-WAN High Availability and Redundancy
Controller redundancy: vManage clusters of 3+ nodes on the same L2 subnet; vSmart/vBond minimum of 2 (ideally 3+) with DNS round-robin. Odd-numbered clusters avoid split-brain during partitions.
WAN Edge redundancy: Dual routers per site with multiple transports (MPLS + broadband + LTE) for N+1 or N+2 path diversity.
BFD tuning: Default 1000ms/7x for most deployments. Aggressive 300ms/3x for high-quality MPLS. Conservative timers for lossy broadband to prevent false tunnel flaps.
Migration sequence: Controllers first, then DC/hub sites, then remote branches -- ensuring hub sites can route between SD-WAN and legacy during transition.
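The BFD timer pairs listed above translate directly into failure-detection time, since a tunnel is declared down after the multiplier's worth of consecutive probes is missed. A minimal sketch of that arithmetic (the function name is illustrative):

```python
def detection_time_ms(hello_interval_ms: int, multiplier: int) -> int:
    """Approximate time to declare a tunnel down: interval x missed probes."""
    return hello_interval_ms * multiplier


default_ms = detection_time_ms(1000, 7)     # default timers
aggressive_ms = detection_time_ms(300, 3)   # aggressive MPLS timers

print(f"default: {default_ms} ms, aggressive: {aggressive_ms} ms")
```

The default pair detects a failure in roughly 7 seconds, while the aggressive pair does so in under 1 second. The trade-off is that on a lossy broadband link three consecutive lost probes are far more likely than seven, which is the source of the false flaps mentioned above.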
Key Points -- HA and Redundancy
Deploy controllers in odd-numbered clusters (3+) to avoid split-brain scenarios
BFD timer tuning depends on transport quality: aggressive for MPLS, conservative for broadband
Migration order: controllers, then hubs/DCs, then branches (core-outward)
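The odd-number recommendation is easier to see with quorum arithmetic. The sketch below assumes a simple majority-quorum rule, which the text does not spell out; treat it as a general illustration of clustered systems, not a description of vManage internals.

```python
def quorum(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Nodes that can fail while a majority still survives."""
    return nodes - quorum(nodes)


def has_quorum(partition_size: int, nodes: int) -> bool:
    """True if one side of a network partition can keep operating."""
    return partition_size >= quorum(nodes)


for n in (3, 4, 5):
    print(f"{n} nodes: quorum={quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")

# A 4-node cluster split 2/2 leaves neither side with a majority.
print(has_quorum(2, 4), has_quorum(2, 3))
```

Under this rule a fourth node adds cost without adding fault tolerance (both 3 and 4 nodes tolerate one failure), and an even split can leave no side able to act.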
8.2 Software-Defined Access Design
8.2.1 VXLAN Fabric Overlay Design
VXLAN provides the SD-Access data plane, encapsulating Layer 2 frames inside IP/UDP headers (port 4789) for transport over a Layer 3 routed underlay. The 24-bit VNI supports up to ~16 million segments, far exceeding the 4,096 VLAN limit.
| Construct | Function | Scale |
|---|---|---|
| VTEP | Encapsulates/decapsulates VXLAN frames | One per fabric edge/border |
| VNI | 24-bit segment ID replacing VLANs | ~16 million segments |
| L2 VNI | Maps VLANs to overlay segments | Per-VLAN basis |
| L3 VNI | Maps VRFs to overlay segments | Per-VRF basis |
| SGT in VXLAN Header | Carries Scalable Group Tag for policy | Up to 64,000 groups |
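To show how a 24-bit VNI and a 16-bit SGT fit into the 8-byte VXLAN header, here is a small packing sketch. The field layout follows the VXLAN Group Policy Option format as commonly documented (flags, a 16-bit group policy ID, then the VNI); treat the bit positions as an assumption to verify against platform documentation, and the function names as illustrative.

```python
import struct

VXLAN_UDP_PORT = 4789
FLAG_VNI_VALID = 0x08     # "I" bit: the VNI field is valid
FLAG_GROUP_POLICY = 0x80  # "G" bit: a group policy ID (SGT) is present


def pack_vxlan_gpo(vni: int, sgt: int = 0) -> bytes:
    """Build an 8-byte VXLAN header carrying a VNI and an optional SGT."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    if not 0 <= sgt < 2**16:
        raise ValueError("SGT must fit in 16 bits")
    flags = FLAG_VNI_VALID | (FLAG_GROUP_POLICY if sgt else 0)
    # Layout: flags(8) | reserved/policy flags(8) | group ID(16) | VNI(24) | reserved(8)
    return struct.pack("!BBHI", flags, 0, sgt, vni << 8)


def unpack_vxlan_gpo(header: bytes) -> tuple:
    """Return (vni, sgt) from the first 8 bytes of a VXLAN header."""
    _flags, _reserved, sgt, word = struct.unpack("!BBHI", header[:8])
    return word >> 8, sgt


header = pack_vxlan_gpo(vni=4099, sgt=17)
print(header.hex())
print(unpack_vxlan_gpo(header))
print(2**24)  # number of possible VNIs
```

The last line prints 16777216, the "~16 million segments" figure, and a 16-bit group field gives 65,536 possible tag values, of which a platform may reserve some.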
Fabric device roles:
Fabric Edge Node -- Access-layer switch (Catalyst 9000) acting as LISP xTR. Provides anycast gateway, 802.1X authentication, and VXLAN encapsulation.
Fabric Border Node -- Connects fabric to external networks as LISP PxTR. Variants: Internal (known routes), External (default route), Combined.
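Fabric Control Plane Node -- Runs LISP Map-Server/Map-Resolver; recommended as a dedicated pair per site.
Intermediate Node -- Underlay-only Layer 3 device that routes traffic between fabric nodes and performs no LISP or VXLAN encapsulation function.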
Figure 8.4: SD-Access fabric device roles and their relationships
Animation: An endpoint connecting to a fabric edge, triggering 802.1X authentication, VXLAN encapsulation, and EID-to-RLOC registration with the control plane
Key Points -- VXLAN Fabric
VXLAN provides MAC-in-IP encapsulation on UDP port 4789; 24-bit VNI supports ~16 million segments
SD-Access extends the VXLAN header to carry SGT for inline group-based policy
Anycast gateway presents identical IP/MAC on every fabric edge, enabling seamless endpoint mobility
Four device roles: Edge (access), Border (external connectivity), Control Plane (LISP), Intermediate (underlay only)
8.2.2 LISP-Based Control Plane
LISP separates endpoint identity from network location. An EID (Endpoint Identifier) is the endpoint's IP address (identity), while an RLOC (Routing Locator) is the loopback of the fabric node where the endpoint attaches (location). The Mapping System (Map-Server/Map-Resolver) resolves EID-to-RLOC lookups -- analogous to DNS.
```mermaid
sequenceDiagram
    participant EP as Endpoint
    participant FE1 as Fabric Edge 1 (Source xTR)
    participant MS as Control Plane Node (Map-Server/Resolver)
    participant FE2 as Fabric Edge 2 (Destination xTR)
    participant DST as Destination Host
    EP->>FE1: Connects and Authenticates
    FE1->>MS: Register EID-to-RLOC Mapping
    Note over FE1,MS: EID=10.1.1.5 to RLOC=192.168.1.1
    FE1->>MS: Map-Request (where is DST?)
    MS->>FE1: Map-Reply (RLOC=192.168.2.1)
    FE1->>FE2: VXLAN-Encapsulated Traffic
    FE2->>DST: Decapsulated Original Frame
```
Figure 8.5: LISP control plane resolution and VXLAN data forwarding
Key benefits: (1) underlay routing tables contain only RLOC entries (fabric node loopbacks), not individual host routes, keeping FIBs compact; (2) subnets can stretch across multiple fabric edges without Layer 2 flooding or spanning-tree complexity. LISP Instance IDs map to VXLAN VNIs for network virtualization.
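The register-then-resolve pattern can be modeled as a keyed lookup table. The sketch below is a toy model of the mapping system, not the LISP wire protocol: the class and method names are hypothetical, the instance ID 4099 and the destination EID 10.1.2.9 are made-up values, and the other addresses reuse the example in Figure 8.5.

```python
from typing import Dict, Optional, Tuple


class MappingSystem:
    """Toy Map-Server/Map-Resolver: (instance ID, EID) -> RLOC."""

    def __init__(self) -> None:
        self._db: Dict[Tuple[int, str], str] = {}

    def register(self, instance_id: int, eid: str, rloc: str) -> None:
        """Map-Register: an edge announces that it hosts this EID."""
        self._db[(instance_id, eid)] = rloc

    def resolve(self, instance_id: int, eid: str) -> Optional[str]:
        """Map-Request/Map-Reply: return the RLOC, or None if unknown."""
        return self._db.get((instance_id, eid))


ms = MappingSystem()
ms.register(4099, "10.1.1.5", "192.168.1.1")   # endpoint behind Fabric Edge 1
ms.register(4099, "10.1.2.9", "192.168.2.1")   # destination behind Fabric Edge 2

print(ms.resolve(4099, "10.1.2.9"))   # 192.168.2.1

# Endpoint roams to Fabric Edge 2: only the mapping changes, not its IP.
ms.register(4099, "10.1.1.5", "192.168.2.1")
print(ms.resolve(4099, "10.1.1.5"))   # 192.168.2.1
```

Keying on the instance ID as well as the EID is what lets overlapping addresses coexist in different virtual networks, while the underlay still only needs routes to the two RLOC loopbacks.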
Key Points -- LISP Control Plane
LISP separates identity (EID) from location (RLOC); the mapping system resolves lookups like DNS
When endpoints move, only the EID-to-RLOC mapping updates -- no IP reconfiguration needed
LISP Instance IDs map directly to VXLAN VNIs for per-tenant network virtualization
8.2.3 Macro and Micro-Segmentation with SGTs
Macro-segmentation uses Virtual Networks (VRF + LISP Instance ID + unique L3 VNI) for complete traffic isolation between user communities (e.g., Corporate, IoT, Guest). Traffic between VNs requires a fusion device (typically a firewall).
Micro-segmentation uses SGTs (16-bit values assigned via Cisco ISE during authentication) for granular access control within a VN. SGACLs define source-to-destination SGT policies. The critical advantage: policies follow the user, not the port or VLAN.
SGT propagation methods: VXLAN header (inline, primary in fabric), CMD header (L2 links between TrustSec devices), SXP (TCP-based, for non-TrustSec devices).
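An SGACL policy is essentially a source-by-destination matrix, which the following sketch models as a dictionary. The SGT values, group names, and matrix entries are invented for illustration, and the default action is passed in explicitly because the text does not state which default a given deployment uses.

```python
from enum import Enum


class Action(Enum):
    PERMIT = "permit"
    DENY = "deny"


# Hypothetical SGT assignments (real values are assigned in Cisco ISE).
EMPLOYEE, CONTRACTOR, IOT, SERVERS = 10, 20, 30, 100

# Policy cells keyed by (source SGT, destination SGT).
MATRIX = {
    (EMPLOYEE, SERVERS): Action.PERMIT,
    (CONTRACTOR, SERVERS): Action.DENY,
    (IOT, EMPLOYEE): Action.DENY,
}


def evaluate(src_sgt: int, dst_sgt: int, matrix: dict, default: Action) -> Action:
    """Look up the cell for this tag pair; fall back to the default action."""
    return matrix.get((src_sgt, dst_sgt), default)


print(evaluate(EMPLOYEE, SERVERS, MATRIX, Action.DENY))     # Action.PERMIT
print(evaluate(CONTRACTOR, SERVERS, MATRIX, Action.DENY))   # Action.DENY
print(evaluate(IOT, SERVERS, MATRIX, Action.DENY))          # Action.DENY (default)
```

Nothing in the lookup refers to an IP address, port, or VLAN, which is the sense in which the policy follows the user: the same tag pair yields the same decision wherever the endpoint attaches.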
Key Points -- Segmentation
Two tiers: macro-segmentation (Virtual Networks/VRFs) for broad isolation, micro-segmentation (SGTs/SGACLs) for granular policy
SGTs are 16-bit values assigned at authentication; policies follow the user across wired, wireless, and VPN
Inter-VN traffic must traverse a fusion firewall; intra-VN policy enforced via SGACL source-destination matrix
ACI uses a spine-leaf (Clos) topology on Nexus 9000 switches. Every leaf connects to every spine; no direct leaf-to-leaf or spine-to-spine links. This delivers predictable latency (one spine hop for any server-to-server path) and bandwidth scaling via ECMP.
APIC Controller Cluster: Typically 3 APICs attached to leaf switches. APIC is the single source of truth for configuration but is not in the data-forwarding path -- if all APICs fail, the fabric continues forwarding with its last-known configuration.
| Construct | Purpose | Analogy |
|---|---|---|
| Tenant | Top-level isolation container | A building in a campus |
| VRF | Layer 3 forwarding domain | A floor within the building |
| Bridge Domain | Layer 2 forwarding domain | A wing on the floor |
| EPG | Logical grouping sharing policy | A department in the wing |
| Contract | Allowed communication between EPGs | A service agreement between departments |
| Application Profile | Groups related EPGs | An organizational chart |
Critical design principle: In ACI, everything is denied by default. Communication between EPGs requires an explicit Contract. This whitelist model is the inverse of traditional networking.
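The deny-by-default model can be illustrated with a short allow-list check. This is a deliberately reduced model: real ACI contracts use subjects and filters and are often applied in both directions, whereas the sketch considers only consumer-to-provider traffic on a destination TCP port. The EPG names and ports are hypothetical, and intra-EPG traffic is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    consumer_epg: str
    provider_epg: str
    tcp_ports: frozenset


def is_permitted(src_epg: str, dst_epg: str, dst_port: int, contracts) -> bool:
    """Inter-EPG traffic passes only if some contract explicitly allows it."""
    return any(
        c.consumer_epg == src_epg
        and c.provider_epg == dst_epg
        and dst_port in c.tcp_ports
        for c in contracts
    )


contracts = [
    Contract("web", "app", frozenset({8443})),
    Contract("app", "db", frozenset({5432})),
]

print(is_permitted("web", "app", 8443, contracts))  # True: contract exists
print(is_permitted("web", "db", 5432, contracts))   # False: no web-to-db contract
print(is_permitted("web", "app", 22, contracts))    # False: port not in contract
```

With an empty contract list every call returns False, which is the starting state of a new EPG pair under the allow-list model.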
Animation: ACI spine-leaf topology showing VXLAN encapsulation from leaf VTEP, traversing a spine, and arriving at destination leaf VTEP -- with the APIC cluster managing policy from the side
Key Points -- ACI
Spine-leaf Clos topology: every leaf to every spine, predictable one-hop latency, ECMP scaling
APIC is out-of-band: fabric survives APIC failure with last-known configuration
Policy hierarchy: Tenant, VRF, Bridge Domain, EPG, Contract -- deny-all by default between EPGs
ACI uses iVXLAN (proprietary extension) within the fabric for data plane forwarding
SD-Access + SD-WAN: Integrated Domain (preferred) consolidates border/edge functions on Catalyst 8500 with end-to-end SGT propagation. IP Transit is simpler but less seamless.
SD-WAN + ACI: Tenant VRFs map to SD-WAN service VPNs; ACI border leafs use L3Outs with BGP peering per tenant. EPG-to-SGT requires ISE/pxGrid.
SD-Access + ACI: Border nodes peer with ACI border leafs; CTS inline tagging propagates SGTs; VRF stitching through fusion firewalls.
```mermaid
flowchart LR
    subgraph Campus["SD-Access Domain"]
        CC["Catalyst Center"]
        ISE["Cisco ISE\n(Policy Anchor)"]
        SDA_Border["SD-Access\nBorder Node"]
    end
    subgraph WAN["SD-WAN Domain"]
        vManage["vManage"]
        WAN_Edge["WAN Edge /\nCatalyst 8500"]
    end
    subgraph DC["ACI Domain"]
        APIC["APIC Cluster"]
        NDO["Nexus Dashboard\nOrchestrator"]
        ACI_Border["ACI Border\nLeaf"]
    end
    CC <-->|"VN-to-VPN\nMapping"| vManage
    ISE <-->|"SGT Policy"| CC
    ISE <-->|"pxGrid\nEPG-to-SGT"| APIC
    SDA_Border <-->|"VXLAN + SGT"| WAN_Edge
    WAN_Edge <-->|"L3Out / BGP\nper Tenant VRF"| ACI_Border
    NDO <-->|"Policy\nOrchestration"| APIC
```
Figure 8.6: Multi-domain integration across SD-Access, SD-WAN, and ACI
Key Points -- Multi-Domain Integration
Multi-Pod shares one APIC cluster (same metro); Multi-Site uses independent clusters with Nexus Dashboard Orchestrator (geo-distributed)
Integrated Domain is the preferred SD-Access + SD-WAN approach: end-to-end SGT via VXLAN headers across WAN
Cisco ISE is the common policy anchor across all three domains, using pxGrid for ACI EPG-to-SGT translation
Controller coordination (Catalyst Center to vManage) automates VPN-to-VN mapping for consistent segmentation
8.3.3 Migration Strategies
All three architectures follow the same migration principles: deploy incrementally, maintain coexistence with legacy networks, use border/aggregation layers as integration points, and validate monitoring tools before migrating production traffic.
SD-WAN: Controllers first, then DC/hubs, then branches. Decommission legacy only after full validation.
SD-Access: Deploy Catalyst Center + ISE, build IS-IS underlay alongside existing network, migrate building-by-building, enable SGACLs incrementally (monitor-only first).
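ACI: Deploy spine-leaf alongside legacy DC, use L3Outs for BGP peering with legacy, migrate EPG-by-EPG, and start with permissive policies during migration (often called a network-centric approach), then tighten contracts once traffic flows are validated.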
Key Points -- Migration
Always migrate incrementally with rollback plans; never forklift-replace
Border/aggregation layers bridge legacy and fabric infrastructure during transition
Start segmentation in monitor-only mode, then enforce after validating assignments
Validate assurance tools (vAnalytics, Catalyst Center Assurance, APIC health scores) before production migration
Post-Study Knowledge Check
Now that you have studied the material, answer the same questions again to measure your learning progress.
Post-Quiz
1. An enterprise is deploying Cisco SD-WAN. Which component is responsible for distributing OMP routes and enforcing centralized routing policy across the overlay?
a) vManage
b) vBond
c) vSmart
d) vEdge
2. A TLOC in Cisco SD-WAN is uniquely identified by which tuple?
3. An architect needs to connect 200 branch sites with centralized security inspection at the data center. Which SD-WAN overlay topology is most appropriate?
a) Full mesh
b) Hub-and-spoke
c) Partial mesh with no regional hubs
d) Point-to-point tunnels
4. Application-Aware Routing in SD-WAN uses which mechanism to measure tunnel quality in real time?
a) SNMP polling of interface counters
b) NetFlow export analysis
c) BFD probes measuring loss, latency, and jitter
d) ICMP echo requests to the vSmart controller
5. In SD-Access, what protocol provides the control plane by mapping Endpoint Identifiers (EIDs) to Routing Locators (RLOCs)?
a) VXLAN
b) IS-IS
c) LISP
d) BGP EVPN
6. What is the primary benefit of the anycast gateway in an SD-Access fabric?
a) It provides WAN failover between transport links
b) It enables seamless endpoint mobility without IP reconfiguration by presenting the same gateway IP and MAC on every fabric edge
c) It encrypts all east-west traffic within the fabric
d) It replaces the need for a LISP control plane node
7. An enterprise wants to enforce different access policies for employees, contractors, and IoT devices within the same Virtual Network. Which SD-Access mechanism provides this granular control?
a) VPN segmentation via VRF instances
b) Scalable Group Tags (SGTs) with SGACLs
c) VXLAN Network Identifiers (VNIs)
d) Access port VLAN assignments
8. In ACI, what is the default behavior for traffic between two Endpoint Groups (EPGs) that have no Contract defined between them?
a) Traffic is permitted with logging
b) Traffic is rate-limited to 1 Mbps
c) Traffic is denied
d) Traffic is permitted but unencrypted
9. An organization has two geographically distributed data centers and needs independent fault domains with the ability to stretch policies across sites. Which ACI extension model should they use?
a) ACI Multi-Pod
b) ACI Multi-Site with Nexus Dashboard Orchestrator
c) ACI Single-Pod with stretched VLANs
d) VXLAN flood-and-learn between sites
10. When integrating SD-Access and SD-WAN, what is the preferred approach for new deployments that provides end-to-end SGT propagation?
a) IP Transit with SXP for SGT exchange
b) GRE tunnels between border nodes
c) Integrated Domain consolidating SD-Access border and SD-WAN edge functions
d) Static VRF-to-VPN mapping with no SGT propagation
11. During an SD-WAN migration, why should data center and hub sites be migrated before remote branches?
a) Hub sites require less testing than branch sites
b) Hub sites serve as transit points routing traffic between SD-WAN and legacy sites during the transition
c) Branch sites cannot connect to the overlay until all hub sites are decommissioned
d) vManage can only be installed at data center sites
12. What happens to forwarding in an ACI fabric if all three APIC controllers fail simultaneously?
a) All traffic stops immediately because APIC is in the data path
b) The fabric continues forwarding using its last-known configuration
c) Only spine-to-spine traffic continues; leaf-to-leaf fails
d) The fabric reverts to a default allow-all policy
13. Why are odd-numbered controller clusters recommended for SD-WAN deployments?
a) Odd numbers provide better CPU load distribution
b) Even-numbered clusters cannot replicate the configuration database
c) Odd numbers avoid split-brain scenarios during network partitions
d) Licensing requires an odd number of controllers
14. In SD-Access, which component serves as the common policy anchor for SGT assignment and propagation across all three domains (SD-Access, SD-WAN, ACI)?
15. A fabric edge node in SD-Access detects a new endpoint. What is the correct sequence of events?
a) The edge floods the endpoint's MAC to all other edges, then registers with the control plane
b) The edge registers the EID-to-RLOC mapping with the control plane node (Map-Server), and other edges query the Map-Resolver when they need to reach that endpoint
c) The edge sends the endpoint's IP to vSmart, which distributes it via OMP
d) The edge creates a static ARP entry and pushes it to all border nodes