Chapter 8: Software-Defined Architecture and SD-WAN Design

Pre-Study Knowledge Check

Answer these questions before studying the material to establish your baseline understanding.

Pre-Quiz

1. An enterprise is deploying Cisco SD-WAN. Which component is responsible for distributing OMP routes and enforcing centralized routing policy across the overlay?

a) vManage
b) vBond
c) vSmart
d) vEdge

2. A TLOC in Cisco SD-WAN is uniquely identified by which tuple?

a) System-IP, interface-name, encapsulation
b) System-IP, color, encapsulation
c) Site-ID, hostname, transport-type
d) VPN-ID, color, interface-name

3. An architect needs to connect 200 branch sites with centralized security inspection at the data center. Which SD-WAN overlay topology is most appropriate?

a) Full mesh
b) Hub-and-spoke
c) Partial mesh with no regional hubs
d) Point-to-point tunnels

4. Application-Aware Routing in SD-WAN uses which mechanism to measure tunnel quality in real time?

a) SNMP polling of interface counters
b) NetFlow export analysis
c) BFD probes measuring loss, latency, and jitter
d) ICMP echo requests to the vSmart controller

5. In SD-Access, what protocol provides the control plane by mapping Endpoint Identifiers (EIDs) to Routing Locators (RLOCs)?

a) VXLAN
b) IS-IS
c) LISP
d) BGP EVPN

6. What is the primary benefit of the anycast gateway in an SD-Access fabric?

a) It provides WAN failover between transport links
b) It enables seamless endpoint mobility without IP reconfiguration by presenting the same gateway IP and MAC on every fabric edge
c) It encrypts all east-west traffic within the fabric
d) It replaces the need for a LISP control plane node

7. An enterprise wants to enforce different access policies for employees, contractors, and IoT devices within the same Virtual Network. Which SD-Access mechanism provides this granular control?

a) VPN segmentation via VRF instances
b) Scalable Group Tags (SGTs) with SGACLs
c) VXLAN Network Identifiers (VNIs)
d) Access port VLAN assignments

8. In ACI, what is the default behavior for traffic between two Endpoint Groups (EPGs) that have no Contract defined between them?

a) Traffic is permitted with logging
b) Traffic is rate-limited to 1 Mbps
c) Traffic is denied
d) Traffic is permitted but unencrypted

9. An organization has two geographically distributed data centers and needs independent fault domains with the ability to stretch policies across sites. Which ACI extension model should they use?

a) ACI Multi-Pod
b) ACI Multi-Site with Nexus Dashboard Orchestrator
c) ACI Single-Pod with stretched VLANs
d) VXLAN flood-and-learn between sites

10. When integrating SD-Access and SD-WAN, what is the preferred approach for new deployments that provides end-to-end SGT propagation?

a) IP Transit with SXP for SGT exchange
b) GRE tunnels between border nodes
c) Integrated Domain consolidating SD-Access border and SD-WAN edge functions
d) Static VRF-to-VPN mapping with no SGT propagation

11. During an SD-WAN migration, why should data center and hub sites be migrated before remote branches?

a) Hub sites require less testing than branch sites
b) Hub sites serve as transit points routing traffic between SD-WAN and legacy sites during the transition
c) Branch sites cannot connect to the overlay until all hub sites are decommissioned
d) vManage can only be installed at data center sites

12. What happens to forwarding in an ACI fabric if all three APIC controllers fail simultaneously?

a) All traffic stops immediately because APIC is in the data path
b) The fabric continues forwarding using its last-known configuration
c) Only spine-to-spine traffic continues; leaf-to-leaf fails
d) The fabric reverts to a default allow-all policy

13. Why are odd-numbered controller clusters recommended for SD-WAN deployments?

a) Odd numbers provide better CPU load distribution
b) Even-numbered clusters cannot replicate the configuration database
c) Odd numbers avoid split-brain scenarios during network partitions
d) Licensing requires an odd number of controllers

14. In SD-Access, which component serves as the common policy anchor for SGT assignment and propagation across all three domains (SD-Access, SD-WAN, ACI)?

a) Catalyst Center
b) APIC
c) Cisco ISE
d) Nexus Dashboard Orchestrator

15. A fabric edge node in SD-Access detects a new endpoint. What is the correct sequence of events?

a) The edge floods the endpoint's MAC to all other edges, then registers with the control plane
b) The edge registers the EID-to-RLOC mapping with the control plane node (Map-Server), and other edges query the Map-Resolver when they need to reach that endpoint
c) The edge sends the endpoint's IP to vSmart, which distributes it via OMP
d) The edge creates a static ARP entry and pushes it to all border nodes

8.1 SD-WAN Architecture Design

8.1.1 Cisco SD-WAN (Viptela) Architecture Components

Cisco SD-WAN separates WAN intelligence into four functional planes, each served by a dedicated component. The control-plane component, vSmart, behaves much like a BGP route reflector, except that it distributes security keys and policy directives alongside routing information through the Overlay Management Protocol (OMP).

| Component | Plane | Primary Function |
|---|---|---|
| vManage | Management | Centralized configuration, monitoring, policy authoring, REST API, RBAC |
| vBond | Orchestration | Device authentication, NAT traversal, Zero Touch Provisioning (ZTP) |
| vSmart | Control | Route distribution via OMP, policy enforcement, crypto key orchestration |
| vEdge / cEdge | Data | Tunnel endpoints forwarding encrypted traffic between sites |

flowchart LR
    subgraph Management Plane
        vManage["vManage\nConfig, Monitoring,\nREST API, RBAC"]
    end
    subgraph Orchestration Plane
        vBond["vBond\nAuthentication,\nNAT Traversal, ZTP"]
    end
    subgraph Control Plane
        vSmart["vSmart\nOMP Route Distribution,\nPolicy, Crypto Keys"]
    end
    subgraph Data Plane
        Edge1["WAN Edge 1\nIPsec Tunnels"]
        Edge2["WAN Edge 2\nIPsec Tunnels"]
    end
    vManage <-->|"Management"| vSmart
    vManage <-->|"Management"| vBond
    vBond -->|"Auth and Discovery"| Edge1
    vBond -->|"Auth and Discovery"| Edge2
    vSmart <-->|"OMP"| Edge1
    vSmart <-->|"OMP"| Edge2
    Edge1 <-->|"IPsec Data Plane"| Edge2

Figure 8.1: Cisco SD-WAN four-plane architecture with controller and edge components

OMP runs inside the authenticated DTLS or TLS control connection between each WAN Edge and vSmart (DTLS, the default, uses UDP with a base port of 12346). It distributes five categories of information: TLOCs (transport locators identifying each WAN circuit), routes (VPN reachability), service routes (for firewall and load balancer chaining), security keys (IPsec key rotation), and policies (traffic engineering and segmentation rules).
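To make the relationship between TLOCs and routes concrete, the following is a minimal Python sketch of an OMP-style table. It assumes the TLOC tuple of system IP, color, and encapsulation from the knowledge check; the class names `Tloc` and `OmpRib`, the method names, and all addresses are hypothetical and do not correspond to any Cisco API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tloc:
    """A transport locator, identified by (system IP, color, encapsulation)."""
    system_ip: str
    color: str
    encap: str


class OmpRib:
    """Toy model of what a controller might hold: known TLOCs and VPN routes."""

    def __init__(self):
        self.tlocs = set()
        self.routes = {}  # (vpn_id, prefix) -> set of Tloc next hops

    def advertise_tloc(self, tloc):
        # A set de-duplicates identical tuples, mirroring TLOC uniqueness.
        self.tlocs.add(tloc)

    def advertise_route(self, vpn_id, prefix, tloc):
        # A route is only useful if its next-hop TLOC is already known.
        if tloc not in self.tlocs:
            raise ValueError(f"unknown TLOC: {tloc}")
        self.routes.setdefault((vpn_id, prefix), set()).add(tloc)

    def next_hops(self, vpn_id, prefix):
        # Routes are scoped per VPN, so the same prefix can exist in several VPNs.
        hops = self.routes.get((vpn_id, prefix), set())
        return sorted(hops, key=lambda t: (t.system_ip, t.color))


if __name__ == "__main__":
    rib = OmpRib()
    mpls = Tloc("10.0.0.1", "mpls", "ipsec")
    inet = Tloc("10.0.0.1", "biz-internet", "ipsec")
    for tloc in (mpls, inet):
        rib.advertise_tloc(tloc)
        rib.advertise_route(10, "172.16.1.0/24", tloc)
    print(rib.next_hops(10, "172.16.1.0/24"))
```

One site with two circuits appears as two TLOCs that share a system IP but differ in color, which is why a single prefix can resolve to several next hops.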

Animation: OMP distributing TLOCs, routes, and security keys from vSmart to WAN Edge devices, showing the unified control-plane protocol flow

8.1.2 Overlay and Underlay Design

The underlay provides IP reachability between TLOCs. SD-WAN treats all transports (MPLS, broadband, LTE, 5G, satellite) as equivalent pipes differentiated by color labels. This transport independence lets organizations mix carriers and technologies without redesigning the overlay.

The overlay is a virtual IP fabric built with IPsec tunnels between TLOCs. All tunnels are encrypted by default with AES-256-GCM. Topology selection is a critical design decision:

| Topology | Tunnel Count | Latency | Best For |
|---|---|---|---|
| Full Mesh | O(n²) | Optimal (direct path) | Small-to-medium deployments with site-to-site traffic |
| Hub-and-Spoke | O(n) | Higher (transit via hub) | Centralized applications, security inspection at hub |
| Partial Mesh | Between O(n) and O(n²) | Balanced | Large deployments with regional hubs |

flowchart LR
    subgraph Full Mesh
        A1["Site A"] <--> B1["Site B"]
        A1 <--> C1["Site C"]
        B1 <--> C1
    end
    subgraph Hub-and-Spoke
        Hub["Hub Site"] <--> S1["Spoke 1"]
        Hub <--> S2["Spoke 2"]
        Hub <--> S3["Spoke 3"]
    end
    subgraph Partial Mesh
        R1["Regional Hub 1"] <--> R2["Regional Hub 2"]
        R1 <--> P1["Spoke A"]
        R1 <--> P2["Spoke B"]
        R2 <--> P3["Spoke C"]
    end

Figure 8.2: SD-WAN overlay topology options
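The difference between O(n²) and O(n) tunnel growth is easier to appreciate with numbers. The sketch below counts site-to-site tunnels under a simplifying assumption of one tunnel per connected site pair (one TLOC per site); real deployments build tunnels per TLOC pair, so actual counts are higher. The function names are hypothetical.

```python
def full_mesh_tunnels(sites: int) -> int:
    """Every site pairs with every other site: n * (n - 1) / 2 tunnels."""
    if sites < 1:
        raise ValueError("sites must be at least 1")
    return sites * (sites - 1) // 2


def hub_and_spoke_tunnels(sites: int, hubs: int = 1) -> int:
    """Each spoke connects to every hub; hubs are fully meshed with each other."""
    if not 1 <= hubs <= sites:
        raise ValueError("hubs must be between 1 and sites")
    spokes = sites - hubs
    return spokes * hubs + hubs * (hubs - 1) // 2


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, full_mesh_tunnels(n), hub_and_spoke_tunnels(n))
    # For 200 sites: 19900 full-mesh tunnels versus 199 with a single hub.
```

Setting `hubs` to a small number greater than one approximates a partial mesh with regional hubs, which lands between the two extremes.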

VPN Segmentation provides isolated routing domains within the overlay. Each VPN is functionally equivalent to a VRF. Inter-VPN traffic requires explicit service insertion (e.g., a firewall) -- VPNs are isolated by default.

8.1.3 Transport Independence and Path Selection Policies

Application-Aware Routing (AAR) continuously monitors overlay tunnel quality using BFD probes that measure loss, latency, and jitter. When a transport violates SLA thresholds, traffic is automatically rerouted to the best alternative path.

graph TD
    A["Traffic Arrives at WAN Edge"] --> B["NBAR2 Classifies Application\n(L3-L7 DPI)"]
    B --> C["Match to SLA Class\n(Loss / Latency / Jitter)"]
    C --> D["BFD Probes Measure\nAll Tunnel Paths"]
    D --> E{"Path Meets\nSLA Thresholds?"}
    E -->|"Yes"| F["Forward on\nPreferred Path"]
    E -->|"No"| G["Reroute to Best\nAlternative Path"]
    G --> F

Figure 8.3: Application-Aware Routing decision flow

The AAR workflow: (1) define SLA classes with thresholds, (2) NBAR2 classifies traffic via deep packet inspection, (3) BFD probes measure all paths, (4) traffic dynamically steers to compliant paths. QoS integration maps DSCP markings between overlay and underlay, configured centrally in vManage.
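The four-step workflow above can be sketched in Python as a simple path-selection function. This is an illustration of the decision logic only: the names `SlaClass`, `PathStats`, and `select_path` are hypothetical, the threshold values are examples, and the fallback used when no path is compliant (pick the least-degraded path) is an assumption, since real behavior depends on how the policy is configured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaClass:
    name: str
    max_loss_pct: float
    max_latency_ms: float
    max_jitter_ms: float


@dataclass(frozen=True)
class PathStats:
    """Latest probe measurements for one tunnel, keyed by transport color."""
    color: str
    loss_pct: float
    latency_ms: float
    jitter_ms: float


def meets_sla(path: PathStats, sla: SlaClass) -> bool:
    return (path.loss_pct <= sla.max_loss_pct
            and path.latency_ms <= sla.max_latency_ms
            and path.jitter_ms <= sla.max_jitter_ms)


def _quality(path: PathStats):
    # Lower is better; loss is compared first, then latency, then jitter.
    return (path.loss_pct, path.latency_ms, path.jitter_ms)


def select_path(paths, sla, preferred_color=None):
    if not paths:
        raise ValueError("no paths available")
    compliant = [p for p in paths if meets_sla(p, sla)]
    for p in compliant:
        if p.color == preferred_color:
            return p  # preferred path is within SLA, keep using it
    if compliant:
        return min(compliant, key=_quality)  # best compliant alternative
    # Assumption: with no compliant path, fall back to the least-degraded one.
    return min(paths, key=_quality)


if __name__ == "__main__":
    voice = SlaClass("voice", max_loss_pct=1.0, max_latency_ms=150.0, max_jitter_ms=30.0)
    paths = [PathStats("mpls", 0.1, 180.0, 5.0), PathStats("biz-internet", 0.5, 80.0, 10.0)]
    print(select_path(paths, voice, preferred_color="mpls").color)
```

In the example, MPLS exceeds the 150 ms latency threshold, so voice traffic moves to the broadband path even though MPLS is the preferred color.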

Animation: Voice traffic flowing over MPLS, then automatically switching to broadband when MPLS latency exceeds 150ms threshold -- demonstrating AAR in action

8.1.4 SD-WAN High Availability and Redundancy

Controller redundancy: vManage clusters of 3+ nodes on the same L2 subnet; vSmart/vBond minimum of 2 (ideally 3+) with DNS round-robin. Odd-numbered clusters avoid split-brain during partitions.
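The preference for odd-numbered clusters follows from majority-quorum arithmetic, sketched below. This is generic quorum math under the assumption that a partition may act only when it holds a strict majority of members; it is not a description of the internal election logic of any specific controller, and the function names are hypothetical.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of members that forms a strict majority."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many members can fail while a majority still remains."""
    return cluster_size - majority(cluster_size)


def partition_has_quorum(cluster_size: int, reachable: int) -> bool:
    """True if one side of a network partition can still make decisions."""
    return reachable >= majority(cluster_size)


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(n, majority(n), tolerated_failures(n))
```

A four-node cluster tolerates one failure, the same as a three-node cluster, and an even 2/2 partition leaves neither side with quorum. The extra node adds cost without adding resilience, which is why three or five nodes are the usual choices.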

WAN Edge redundancy: Dual routers per site with multiple transports (MPLS + broadband + LTE) for N+1 or N+2 path diversity.

BFD tuning: Default 1000ms/7x for most deployments. Aggressive 300ms/3x for high-quality MPLS. Conservative timers for lossy broadband to prevent false tunnel flaps.
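As a rough guide to what these timers mean in practice, the time to declare a tunnel down is approximately the hello interval multiplied by the multiplier. The sketch below applies that approximation to the two profiles named above; the function name and the `PROFILES` dictionary are hypothetical conveniences.

```python
def bfd_detection_time_ms(hello_interval_ms: int, multiplier: int) -> int:
    """Approximate failure detection time: interval x consecutive missed hellos."""
    if hello_interval_ms <= 0 or multiplier <= 0:
        raise ValueError("interval and multiplier must be positive")
    return hello_interval_ms * multiplier


# (hello interval in ms, multiplier) taken from the text above
PROFILES = {
    "default": (1000, 7),
    "aggressive": (300, 3),
}

if __name__ == "__main__":
    for name, (interval, mult) in PROFILES.items():
        print(name, bfd_detection_time_ms(interval, mult), "ms")
```

The default profile detects a failure in about 7 seconds, while the aggressive profile detects it in under 1 second at the cost of more probe traffic and a higher risk of false flaps on lossy links.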

Migration sequence: Controllers first, then DC/hub sites, then remote branches -- ensuring hub sites can route between SD-WAN and legacy during transition.

8.2 Software-Defined Access Design

8.2.1 VXLAN Fabric Overlay Design

VXLAN provides the SD-Access data plane, encapsulating Layer 2 frames inside IP/UDP headers (port 4789) for transport over a Layer 3 routed underlay. The 24-bit VNI supports up to ~16 million segments, far exceeding the 4,096 VLAN limit.
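To show where the 24-bit VNI sits on the wire, the sketch below packs and parses the basic 8-byte VXLAN header using Python's `struct` module. It assumes the standard layout (8 flag bits, 24 reserved bits, 24-bit VNI, 8 reserved bits) and does not model the group-policy extension that SD-Access uses to carry the SGT. The function names are hypothetical.

```python
import struct

VXLAN_UDP_PORT = 4789
VNI_BITS = 24
MAX_VNI = (1 << VNI_BITS) - 1   # 16,777,215, so about 16 million segments
FLAG_VNI_VALID = 0x08           # "I" flag: the VNI field carries a valid value


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte header: flags + reserved, then VNI + reserved."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError(f"VNI must fit in {VNI_BITS} bits")
    return struct.pack("!II", FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN payload."""
    if len(header) < 8:
        raise ValueError("VXLAN header is 8 bytes")
    word0, word1 = struct.unpack("!II", header[:8])
    if not (word0 >> 24) & FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return word1 >> 8


if __name__ == "__main__":
    hdr = pack_vxlan_header(5000)
    print(hdr.hex(), unpack_vni(hdr))
```

The header is prepended to the original Layer 2 frame and carried in a UDP datagram addressed to port 4789, which is what lets the underlay treat overlay traffic as ordinary routed IP.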

| Construct | Function | Scale |
|---|---|---|
| VTEP | Encapsulates/decapsulates VXLAN frames | One per fabric edge/border |
| VNI | 24-bit segment ID replacing VLANs | ~16 million segments |
| L2 VNI | Maps VLANs to overlay segments | Per-VLAN basis |
| L3 VNI | Maps VRFs to overlay segments | Per-VRF basis |
| SGT in VXLAN Header | Carries Scalable Group Tag for policy | Up to 64,000 groups |

Fabric device roles:

graph TD
    CPN["Fabric Control Plane Node\n(LISP Map-Server/Resolver)"]
    BN_Int["Internal Border Node\n(Known Routes)"]
    BN_Ext["External Border Node\n(Default Route)"]
    IntNode["Intermediate Nodes\n(IS-IS Underlay Only)"]
    FE1["Fabric Edge Node 1\n(Anycast GW, 802.1X, VXLAN)"]
    FE2["Fabric Edge Node 2\n(Anycast GW, 802.1X, VXLAN)"]
    CPN <-->|"EID-RLOC\nRegistration"| FE1
    CPN <-->|"EID-RLOC\nRegistration"| FE2
    BN_Int -->|"To DC / Firewall"| ExtNet["Data Center and\nInternal Networks"]
    BN_Ext -->|"Default Route"| WAN["Internet / WAN"]
    FE1 <-->|"VXLAN\nOverlay"| IntNode
    FE2 <-->|"VXLAN\nOverlay"| IntNode
    IntNode <--> BN_Int
    IntNode <--> BN_Ext
    FE1 --- EP1["Wired and Wireless\nEndpoints"]
    FE2 --- EP2["Wired and Wireless\nEndpoints"]

Figure 8.4: SD-Access fabric device roles and their relationships

Animation: An endpoint connecting to a fabric edge, triggering 802.1X authentication, VXLAN encapsulation, and EID-to-RLOC registration with the control plane

8.2.2 LISP-Based Control Plane

LISP separates endpoint identity from network location. An EID (Endpoint Identifier) is the endpoint's IP address (identity), while an RLOC (Routing Locator) is the loopback of the fabric node where the endpoint attaches (location). The Mapping System (Map-Server/Map-Resolver) resolves EID-to-RLOC lookups -- analogous to DNS.

sequenceDiagram
    participant EP as Endpoint
    participant FE1 as Fabric Edge 1 (Source xTR)
    participant MS as Control Plane Node (Map-Server/Resolver)
    participant FE2 as Fabric Edge 2 (Destination xTR)
    participant DST as Destination Host
    EP->>FE1: Connects and Authenticates
    FE1->>MS: Register EID-to-RLOC Mapping
    Note over FE1,MS: EID=10.1.1.5 to RLOC=192.168.1.1
    FE1->>MS: Map-Request (where is DST?)
    MS->>FE1: Map-Reply (RLOC=192.168.2.1)
    FE1->>FE2: VXLAN-Encapsulated Traffic
    FE2->>DST: Decapsulated Original Frame

Figure 8.5: LISP control plane resolution and VXLAN data forwarding

Key benefits: (1) underlay routing tables contain only RLOC entries (fabric node loopbacks), not individual host routes, keeping FIBs compact; (2) subnets can stretch across multiple fabric edges without Layer 2 flooding or spanning-tree complexity. LISP Instance IDs map to VXLAN VNIs for network virtualization.
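The register-and-resolve behavior described in this section can be modeled as a small lookup service. The sketch below is a toy model, not a LISP implementation: the classes `MappingSystem` and `FabricEdge`, the instance ID 4099, and the destination EID 10.1.2.9 are hypothetical, while the other addresses reuse the values from Figure 8.5. Map-cache and mobility signaling details are deliberately left out.

```python
class MappingSystem:
    """Stands in for the control plane node (Map-Server and Map-Resolver)."""

    def __init__(self):
        self._db = {}  # (instance_id, eid) -> rloc

    def map_register(self, instance_id, eid, rloc):
        # A newer registration replaces the old one, which is how a move is recorded.
        self._db[(instance_id, eid)] = rloc

    def map_request(self, instance_id, eid):
        # Returns the RLOC, or None when the EID is not registered.
        return self._db.get((instance_id, eid))


class FabricEdge:
    def __init__(self, rloc, mapping_system):
        self.rloc = rloc
        self.ms = mapping_system

    def attach_endpoint(self, instance_id, eid):
        # On endpoint detection the edge registers itself as the EID's location.
        self.ms.map_register(instance_id, eid, self.rloc)

    def resolve(self, instance_id, eid):
        # Looked up on demand; nothing is flooded to other edges.
        return self.ms.map_request(instance_id, eid)


if __name__ == "__main__":
    ms = MappingSystem()
    fe1 = FabricEdge("192.168.1.1", ms)
    fe2 = FabricEdge("192.168.2.1", ms)
    fe1.attach_endpoint(4099, "10.1.1.5")
    fe2.attach_endpoint(4099, "10.1.2.9")
    print(fe1.resolve(4099, "10.1.2.9"))   # RLOC of Fabric Edge 2
    fe2.attach_endpoint(4099, "10.1.1.5")  # endpoint roams to Fabric Edge 2
    print(fe2.resolve(4099, "10.1.1.5"))
```

Because the key includes the instance ID, the same EID can exist independently in different virtual networks, and a roaming endpoint keeps its IP address while only its RLOC changes.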

8.2.3 Macro and Micro-Segmentation with SGTs

Macro-segmentation uses Virtual Networks (VRF + LISP Instance ID + unique L3 VNI) for complete traffic isolation between user communities (e.g., Corporate, IoT, Guest). Traffic between VNs requires a fusion device (typically a firewall).

Micro-segmentation uses SGTs (16-bit values assigned via Cisco ISE during authentication) for granular access control within a VN. SGACLs define source-to-destination SGT policies. The critical advantage: policies follow the user, not the port or VLAN.

SGT propagation methods: VXLAN header (inline, primary in fabric), CMD header (L2 links between TrustSec devices), SXP (TCP-based, for non-TrustSec devices).
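An SGACL policy is essentially a matrix indexed by source and destination tag. The sketch below models that matrix in Python; the class `SgaclMatrix`, the tag numbers, and the group names are hypothetical, and the default action for unmatched pairs is passed in as a parameter because it is a deployment choice rather than something this chapter specifies.

```python
from enum import Enum

SGT_MAX = (1 << 16) - 1  # SGTs are 16-bit values


class Action(Enum):
    PERMIT = "permit"
    DENY = "deny"


class SgaclMatrix:
    """Source-SGT by destination-SGT policy table with a configurable default."""

    def __init__(self, default: Action = Action.DENY):
        self.default = default  # assumption: unmatched cells use this action
        self._cells = {}        # (src_sgt, dst_sgt) -> Action

    @staticmethod
    def _check(sgt: int) -> None:
        if not 0 <= sgt <= SGT_MAX:
            raise ValueError("SGT must fit in 16 bits")

    def set_policy(self, src_sgt: int, dst_sgt: int, action: Action) -> None:
        self._check(src_sgt)
        self._check(dst_sgt)
        self._cells[(src_sgt, dst_sgt)] = action

    def evaluate(self, src_sgt: int, dst_sgt: int) -> Action:
        # Only the tags matter here; IP address, port, and VLAN play no part.
        return self._cells.get((src_sgt, dst_sgt), self.default)


if __name__ == "__main__":
    EMPLOYEES, CONTRACTORS, IOT = 10, 20, 30  # hypothetical tag values
    matrix = SgaclMatrix()
    matrix.set_policy(EMPLOYEES, IOT, Action.PERMIT)
    print(matrix.evaluate(EMPLOYEES, IOT), matrix.evaluate(CONTRACTORS, IOT))
```

Since the lookup never references an address or port, the same decision applies wherever the user connects, which is the sense in which policy follows the user.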

8.3 Fabric and Overlay Integration

8.3.1 ACI Fabric Design for Data Center Networks

ACI uses a spine-leaf (Clos) topology on Nexus 9000 switches. Every leaf connects to every spine; no direct leaf-to-leaf or spine-to-spine links. This delivers predictable latency (one spine hop for any server-to-server path) and bandwidth scaling via ECMP.

APIC Controller Cluster: Typically 3 APICs attached to leaf switches. APIC is the single source of truth for configuration but is not in the data-forwarding path -- if all APICs fail, the fabric continues forwarding with its last-known configuration.

| Construct | Purpose | Analogy |
|---|---|---|
| Tenant | Top-level isolation container | A building in a campus |
| VRF | Layer 3 forwarding domain | A floor within the building |
| Bridge Domain | Layer 2 forwarding domain | A wing on the floor |
| EPG | Logical grouping sharing policy | A department in the wing |
| Contract | Allowed communication between EPGs | A service agreement between departments |
| Application Profile | Groups related EPGs | An organizational chart |

Critical design principle: In ACI, traffic between EPGs is denied by default. Communication between EPGs requires an explicit Contract. This allowlist (whitelist) model is the inverse of traditional networking, where traffic is forwarded unless an ACL blocks it.
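The contract model can be illustrated with a short Python sketch in which traffic between EPGs is dropped unless a contract with a matching filter exists. The names `Filter` and `ContractPolicy` and the EPG labels are hypothetical, only the consumer-to-provider direction is modeled, and permitting traffic within a single EPG is treated here as a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    """One allowed traffic type, for example TCP port 443."""
    protocol: str
    port: int


class ContractPolicy:
    def __init__(self):
        self._contracts = {}  # (consumer_epg, provider_epg) -> set of Filter

    def add_contract(self, consumer: str, provider: str, filters) -> None:
        self._contracts.setdefault((consumer, provider), set()).update(filters)

    def is_allowed(self, src_epg: str, dst_epg: str, protocol: str, port: int) -> bool:
        if src_epg == dst_epg:
            return True  # assumption: members of one EPG may talk to each other
        allowed = self._contracts.get((src_epg, dst_epg), set())
        # No contract means an empty filter set, so the result is deny.
        return Filter(protocol, port) in allowed


if __name__ == "__main__":
    policy = ContractPolicy()
    policy.add_contract("web", "app", [Filter("tcp", 8443)])
    print(policy.is_allowed("web", "app", "tcp", 8443))  # contract and filter match
    print(policy.is_allowed("web", "db", "tcp", 3306))   # no contract defined
```

Compared with the SGT matrix used in SD-Access, a contract also names the specific protocols and ports that are permitted, so an existing contract does not open all traffic between two EPGs.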

Animation: ACI spine-leaf topology showing VXLAN encapsulation from leaf VTEP, traversing a spine, and arriving at destination leaf VTEP -- with the APIC cluster managing policy from the side

8.3.2 Multi-Site and Multi-Domain Integration

| Feature | ACI Multi-Pod | ACI Multi-Site |
|---|---|---|
| APIC Cluster | Shared (single) | Independent per site |
| Fault Domain | Shared | Isolated per site |
| Policy Management | Single APIC | Nexus Dashboard Orchestrator |
| IPN Requirements | Low-latency, lossless | Standard IP connectivity |
| Use Case | Co-located pods, same metro | Geo-distributed data centers |

Multi-domain integration connects SD-Access (campus), SD-WAN (WAN), and ACI (data center):

flowchart LR
    subgraph Campus["SD-Access Domain"]
        CC["Catalyst Center"]
        ISE["Cisco ISE\n(Policy Anchor)"]
        SDA_Border["SD-Access\nBorder Node"]
    end
    subgraph WAN["SD-WAN Domain"]
        vManage["vManage"]
        WAN_Edge["WAN Edge /\nCatalyst 8500"]
    end
    subgraph DC["ACI Domain"]
        APIC["APIC Cluster"]
        NDO["Nexus Dashboard\nOrchestrator"]
        ACI_Border["ACI Border\nLeaf"]
    end
    CC <-->|"VN-to-VPN\nMapping"| vManage
    ISE <-->|"SGT Policy"| CC
    ISE <-->|"pxGrid\nEPG-to-SGT"| APIC
    SDA_Border <-->|"VXLAN + SGT"| WAN_Edge
    WAN_Edge <-->|"L3Out / BGP\nper Tenant VRF"| ACI_Border
    NDO <-->|"Policy\nOrchestration"| APIC

Figure 8.6: Multi-domain integration across SD-Access, SD-WAN, and ACI

8.3.3 Migration Strategies

All three architectures follow the same migration principles: deploy incrementally, maintain coexistence with legacy networks, use border/aggregation layers as integration points, and validate monitoring tools before migrating production traffic.

Post-Study Knowledge Check

Now that you have studied the material, answer the 15 Pre-Quiz questions again to measure your learning progress.
