Design scalable data center network architectures using spine-leaf and fabric topologies
Evaluate data center interconnect (DCI) options for multi-site architectures
Design data center networks that support workload mobility and application requirements
Pre-Study Assessment
1. Why has the spine-leaf architecture replaced the traditional three-tier data center design?
It reduces the total number of switches needed
It provides predictable single-hop latency and ECMP paths optimized for east-west traffic
It eliminates the need for any routing protocols
It supports only north-south traffic patterns more efficiently
2. What is the primary advantage of EVPN over flood-and-learn in a VXLAN fabric?
EVPN reduces the VXLAN encapsulation overhead from 50 bytes to 20 bytes
EVPN replaces BGP with a simpler OSPF-based control plane
EVPN distributes MAC/IP reachability via MP-BGP, eliminating inefficient flooding
EVPN allows VXLAN to operate without any underlay network
3. Why is symmetric IRB preferred over asymmetric IRB in large VXLAN EVPN fabrics?
Symmetric IRB is faster because it skips the routing step entirely
Symmetric IRB uses a transit L3 VNI so each leaf only needs locally attached VLANs, improving scalability
Symmetric IRB eliminates the need for VRFs in the fabric
Symmetric IRB requires fewer spine switches in the topology
4. When should you choose ACI Multi-Site over ACI Multi-Pod?
When you need seamless L2 extension within a metro area
When a single APIC cluster is sufficient for management
When strict fault domain isolation and geographic distance require independent APIC clusters per site
When the deployment has fewer than 200 leaf switches total
5. What is the role of DWDM in a data center interconnect design?
It replaces VXLAN as the overlay encapsulation protocol
It provides Layer 1 optical transport, multiplexing multiple wavelengths on a single fiber pair
It provides the control plane for MAC learning between sites
It is an alternative to spine-leaf for intra-DC connectivity
6. What is the biggest risk of extending Layer 2 between data centers without proper mitigation?
Increased north-south bandwidth consumption
A failure in one site (broadcast storm, STP miscalculation) can propagate and take down all connected sites
Loss of VXLAN encapsulation capability
Inability to use OSPF as the underlay routing protocol
7. In an active-active data center design, what prevents traffic from hairpinning across the DCI link when a VM migrates?
Static routes pointing to the nearest data center
Distributed anycast gateways with the same virtual IP and MAC at both sites
Disabling all L2 extension between sites
Using OTV instead of VXLAN for encapsulation
8. Why must FCoE traffic receive special handling in a VXLAN EVPN fabric?
FCoE uses a different UDP port than VXLAN
Fibre Channel demands lossless transport, requiring PFC and DCB with CoS-to-DSCP mapping across the routed fabric
FCoE is incompatible with spine-leaf topologies
FCoE requires dedicated spine switches separate from data traffic
9. Why is the one-arm routed model preferred for load balancer placement in EVPN fabrics?
It allows the load balancer to inspect all Layer 2 headers
It aligns with the L3-everywhere philosophy and enables optimal ECMP paths without L2 dependencies
It requires no IP address configuration on the load balancer
It eliminates the need for GSLB in multi-site deployments
10. What is the maximum RTT latency supported by ACI Multi-Pod between pods?
10 ms
50 ms
150 ms
500 ms
11. What is a key advantage of OTV over raw VLAN trunking for DCI?
OTV provides higher bandwidth than VLAN trunks
OTV natively isolates broadcast/flooding and prevents L2 loops between sites
OTV is a multi-vendor standard supported by all switch platforms
OTV eliminates the need for any IP connectivity between sites
12. Which EVPN route type is considered the workhorse for advertising host MAC and IP between VTEPs?
Type 1 (Ethernet Auto-Discovery)
Type 2 (MAC/IP Advertisement)
Type 3 (Inclusive Multicast Tag)
Type 5 (IP Prefix)
13. Why should all leaf-to-spine links in a Clos fabric use the same link speed?
Mixed speeds require additional spine switches
Equal link speeds enable proper ECMP load balancing; mismatched speeds cause uneven traffic distribution
BGP cannot advertise routes over links with different speeds
Mixed speeds prevent VXLAN encapsulation from functioning
14. What is the recommended underlay MTU for a VXLAN fabric, and why?
1500 bytes, the standard Ethernet MTU
9198 bytes (jumbo), to accommodate the 50-byte VXLAN overhead without fragmentation
4096 bytes, matching the maximum VLAN count
16000 bytes, to support the maximum VNI addressing space
15. In a combined ACI deployment, what is the common pattern for using Multi-Pod and Multi-Site together?
Multi-Site within a campus, Multi-Pod across regions
Multi-Pod within a metro area for unified management, Multi-Site across regions for fault isolation
Multi-Pod for storage traffic only, Multi-Site for compute traffic
They cannot be combined in the same deployment
11.1 Data Center Fabric Architecture
11.1.1 Spine-Leaf Topology Design and Scaling
The modern data center has shifted from the traditional three-tier design (access, aggregation, core) to the spine-leaf fabric architecture. This shift is driven by the explosion of east-west traffic from virtualization, microservices, and distributed storage. The spine-leaf design, rooted in Clos network theory from 1953, provides many parallel paths between any two endpoints.
The design consists of two layers: spine switches (the high-speed backbone) and leaf switches (host connectivity at the edge). The fundamental rules are strict: leaf switches connect only to spines, spines connect only to leaves. Every leaf-to-leaf flow traverses exactly one spine hop, producing consistent, predictable latency.
Figure 11.1: Spine-Leaf Fabric Topology -- every leaf connects to every spine, providing ECMP paths and predictable single-hop latency
Animation: Traffic flow through a spine-leaf fabric showing ECMP path selection and linear scaling as new leaves/spines are added
Scaling: Add leaf switches for more server ports (each new leaf connects to every spine). Add spine switches for more bandwidth per path (every leaf-to-leaf path gains an ECMP path). All leaf-to-spine links must use the same speed for proper ECMP load balancing.
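The scaling rules above reduce to simple arithmetic. The sketch below, with illustrative numbers not tied to any particular platform, shows how leaves add ports and spines add ECMP paths:

```python
# Sketch: spine-leaf capacity math for a Clos fabric (illustrative numbers,
# not from any specific platform datasheet).

def fabric_capacity(spines: int, leaves: int,
                    host_ports_per_leaf: int, uplink_gbps: int):
    """Return (total host ports, ECMP paths per leaf pair, uplink Gbps per leaf)."""
    total_host_ports = leaves * host_ports_per_leaf
    # Each leaf has one link to every spine, so any leaf-to-leaf pair
    # has exactly `spines` equal-cost single-hop paths.
    ecmp_paths = spines
    uplink_capacity = spines * uplink_gbps  # per-leaf bandwidth into the fabric
    return total_host_ports, ecmp_paths, uplink_capacity

# Scale out: adding a leaf adds ports; adding a spine adds one ECMP path
# between every leaf pair without touching existing wiring.
print(fabric_capacity(spines=4, leaves=20, host_ports_per_leaf=48, uplink_gbps=100))
# -> (960, 4, 400)
```

Note the assumption that all uplinks run at the same speed; the ECMP count is only meaningful when every path has equal cost.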
Underlay design: OSPF (single area, point-to-point interfaces) or eBGP (unique AS per device for fault isolation). Best practices include unnumbered interfaces and jumbo MTU (9198 bytes) to accommodate the 50-byte VXLAN overhead.
Key Points: Spine-Leaf Topology
Two-tier Clos design: every leaf connects to every spine, providing predictable single-hop latency and ECMP paths
Scale out by adding leaves (more ports) or spines (more bandwidth) -- no existing wiring changes needed
Underlay uses OSPF (single area) or eBGP (per-device AS), with unnumbered interfaces and 9198-byte jumbo MTU
All leaf-to-spine links must use the same speed for proper ECMP load balancing
11.1.2 VXLAN EVPN Fabric Design
VXLAN encapsulates Layer 2 Ethernet frames inside Layer 3 UDP packets (port 4789), allowing L2 segments to stretch across the routed underlay. Each virtual network is identified by a 24-bit VNI, supporting approximately 16 million segments (vs. 4,096 VLANs in 802.1Q). VTEPs on leaf switches handle encapsulation/decapsulation, adding approximately 50 bytes of overhead.
Figure 11.2: VXLAN EVPN Fabric -- MP-BGP distributes MAC/IP reachability between VTEPs across the routed spine-leaf underlay
Animation: EVPN Type 2 route advertisement flow -- host connects to leaf, MAC/IP advertised via MP-BGP to all VTEPs, eliminating flood-and-learn
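The overhead and VNI arithmetic described above can be verified in a few lines. Field sizes follow RFC 7348; the 9000-byte tenant MTU in the final check is an illustrative value:

```python
# Sketch: where VXLAN's ~50 bytes of overhead come from, and why the 24-bit
# VNI yields ~16M segments. Header sizes per RFC 7348 (IPv4 outer header).

OUTER_ETH = 14   # outer Ethernet header (no 802.1Q tag)
OUTER_IP  = 20   # outer IPv4 header
OUTER_UDP = 8    # outer UDP header, destination port 4789
VXLAN_HDR = 8    # VXLAN header carrying the 24-bit VNI

overhead = OUTER_ETH + OUTER_IP + OUTER_UDP + VXLAN_HDR
vni_space = 2 ** 24

print(overhead)    # 50 bytes added to every encapsulated frame
print(vni_space)   # 16777216 possible VNIs vs. 4096 802.1Q VLANs

# Underlay MTU check: a 9198-byte jumbo underlay carries a 9000-byte
# tenant frame plus VXLAN overhead without fragmentation.
assert 9000 + overhead <= 9198
```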
EVPN Route Types: Type 1 (Ethernet Auto-Discovery) for multi-homing and fast convergence. Type 2 (MAC/IP Advertisement) -- the workhorse, carrying MAC, IP, and VNI info. Type 3 (Inclusive Multicast Tag) for BUM flooding trees. Type 5 (IP Prefix) for external routes into the fabric.
Asymmetric vs. Symmetric IRB: Asymmetric IRB requires every VLAN/VNI on every leaf (poor scalability). Symmetric IRB uses a transit L3 VNI per tenant VRF -- each leaf only needs locally attached VLANs. Symmetric is the recommended and only model supporting Type 5 routes.
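The scalability gap between the two IRB models is easy to quantify. A minimal sketch, using illustrative fabric sizes, counts the VNI instances a single leaf must hold under each model:

```python
# Sketch: why symmetric IRB scales better. With asymmetric IRB every leaf must
# instantiate every VLAN/VNI in the fabric; with symmetric IRB a leaf needs
# only its locally attached VNIs plus one transit L3 VNI per tenant VRF.
# The fabric sizes below are illustrative.

def asymmetric_vnis_per_leaf(total_l2_vnis: int) -> int:
    # Every VLAN/VNI in the fabric, everywhere.
    return total_l2_vnis

def symmetric_vnis_per_leaf(local_l2_vnis: int, tenant_vrfs: int) -> int:
    # Local VNIs plus one transit L3 VNI per tenant VRF.
    return local_l2_vnis + tenant_vrfs

# A 500-VNI fabric where a typical leaf hosts 20 VLANs across 4 tenant VRFs:
print(asymmetric_vnis_per_leaf(500))   # 500 VNI instances on every leaf
print(symmetric_vnis_per_leaf(20, 4))  # 24 VNI instances on this leaf
```

The gap widens as the fabric grows: asymmetric state scales with total fabric size, symmetric state only with what each leaf actually serves.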
Topology Models: Bridged Overlay (entry-level, no inter-VLAN routing in the fabric). Centrally Routed Bridging (CRB) -- routing on the spines, suited to cost-sensitive designs. Edge Routed Bridging (ERB) -- recommended for most deployments, distributing routing to the leaf switches.
Key Points: VXLAN EVPN
VXLAN provides MAC-in-UDP encapsulation with 24-bit VNI addressing (~16M segments), while EVPN provides BGP-based control plane replacing flood-and-learn
EVPN Type 2 routes are the workhorse -- they carry MAC, IP, and VNI so any leaf knows which VTEP hosts a given endpoint
Symmetric IRB with a transit L3 VNI scales far better than asymmetric IRB and is required for Type 5 routes
Edge Routed Bridging (ERB) is the recommended deployment model, distributing forwarding intelligence to every leaf
11.1.3 ACI Architecture and Multi-Pod/Multi-Site
Cisco ACI uses a declarative policy model managed by the APIC controller cluster. Administrators define application profiles and endpoint groups (EPGs); the fabric provisions connectivity automatically. Internally, ACI runs IS-IS in the underlay with a VXLAN overlay, but its endpoint-learning control plane (COOP on the spines) is proprietary rather than EVPN-based.
Multi-Pod: 2-12 pods under a single APIC cluster via an IP-routed IPN. 50 ms RTT maximum latency between pods. Native L2 extension. Single management domain (config errors propagate to all pods).
Multi-Site: independent APIC cluster per site, with inter-site policy coordinated by an orchestrator (Nexus Dashboard Orchestrator, formerly MSO). No shared fault domain -- a failure or misconfiguration in one site cannot propagate to another, making it the choice for geographically dispersed deployments.
Animation: Multi-Pod vs. Multi-Site comparison showing shared APIC domain (Multi-Pod) versus independent APIC clusters with orchestrator (Multi-Site)
Key Points: ACI Multi-Pod / Multi-Site
Multi-Pod: single APIC cluster, 50 ms RTT max, native L2 extension -- best for metro/regional with unified management
Multi-Site: independent APIC per site, orchestrator-based policy, strict blast-radius containment -- best for geo-dispersed deployments
Common pattern: Multi-Pod within a metro area + Multi-Site across regions for the best of both approaches
11.2 Data Center Interconnect
11.2.1 DCI Options: Dark Fiber, DWDM, OTV, VXLAN
DCI technologies evolved through three generations:
Generation 1 (Pre-2008): Raw VLAN extension via L2 trunks, QinQ, or EoMPLS. Suffered from STP dependencies, single points of failure, and the full risk of a stretched L2 domain.
Generation 2 (2008+) -- OTV: MAC-in-IP encapsulation (42-byte header) with built-in IS-IS control plane. Native flood isolation, loop prevention, and multi-homing. Cisco proprietary, max 12 sites.
Generation 3 (2014+) -- VXLAN EVPN DCI: Extends the full fabric paradigm between sites. Standards-based (RFC 7348), control-plane MAC learning via MP-BGP, 16M VNI segments. Always pair VXLAN DCI with EVPN -- never deploy flood-and-learn across a WAN.
flowchart LR
subgraph DC1["Data Center 1"]
L1["Leaf / VTEP Border"]
S1["Spine"]
L1a["Leaf Compute"]
end
subgraph Transport["DCI Transport"]
DWDM1["DWDM Mux"]
DF["Dark Fiber"]
DWDM2["DWDM Mux"]
end
subgraph DC2["Data Center 2"]
L2["Leaf / VTEP Border"]
S2["Spine"]
L2a["Leaf Compute"]
end
L1a --- S1 --- L1
L1 --- DWDM1 --- DF --- DWDM2
DWDM2 --- L2
L2 --- S2 --- L2a
Figure 11.4: DCI Architecture -- VXLAN EVPN overlay rides DWDM wavelengths over dark fiber
DWDM operates at Layer 1, multiplexing multiple optical wavelengths onto a single fiber pair. It is the transport underlay upon which overlay technologies ride -- not an alternative to OTV or VXLAN.
Key Points: DCI Technologies
OTV provides safe, backward-compatible L2 extension with native flood isolation and loop prevention (Cisco proprietary)
VXLAN EVPN DCI is the standard for new multi-site fabrics -- always pair with EVPN, never use flood-and-learn over WAN
DWDM is a Layer 1 optical transport underlay, not a competing overlay -- it carries VXLAN, FC, and replication traffic on separate wavelengths
50 bytes of VXLAN overhead must be accommodated by the WAN transport MTU
11.2.2 Layer 2 Extension Risks and Mitigation
L2 extension between data centers carries significant risks: failure domain expansion (broadcast storms propagate across sites), suboptimal traffic paths (hairpinning), split-brain scenarios, and STP propagation.
Mitigations: Use OTV or EVPN with flood suppression (never raw VLANs). Deploy distributed anycast gateways to prevent hairpinning. Implement site-aware routing, BFD failover, and orchestrated MAC withdrawal for split-brain. OTV and VXLAN isolate STP domains by design.
Golden rule: Extend Layer 2 only when applications demand it, and always through technology providing flood isolation and loop prevention. Prefer L3 interconnection when possible.
Key Points: L2 Extension Risks
Never extend raw VLANs between DCs -- always use OTV or EVPN with flood suppression
Distributed anycast gateways prevent traffic hairpinning after VM migration
Prefer L3 interconnection whenever the application can tolerate it -- smaller blast radius
11.2.3 Active-Active vs. Active-Standby Data Center Design
Active-Standby: One DC handles production; the second is warm/cold standby. Simpler DCI, but the standby site is underutilized and failover takes minutes to hours.
Active-Active: Both DCs serve production simultaneously. Requires L2 extension, distributed anycast default gateways, GSLB, and synchronized state for stateful services. EVPN provides active-active multihoming, MAC mobility tracking, and mass MAC withdrawal for sub-second convergence.
flowchart LR
GSLB["Global Server Load Balancer DNS"]
subgraph SiteA["Site A -- Active"]
GWA["Anycast Gateway VIP: 10.1.1.1"]
LBA["Local ADC"]
SVRA["App Servers"]
GWA --- LBA --- SVRA
end
subgraph DCI["DCI Link"]
EVPN_DCI["VXLAN EVPN MAC Mobility"]
end
subgraph SiteB["Site B -- Active"]
GWB["Anycast Gateway VIP: 10.1.1.1"]
LBB["Local ADC"]
SVRB["App Servers"]
GWB --- LBB --- SVRB
end
GSLB -.-> SiteA
GSLB -.-> SiteB
SiteA --- EVPN_DCI --- SiteB
Figure 11.5: Active-Active Data Center Design with distributed anycast gateways and GSLB
Animation: Active-active failover sequence -- site failure triggers mass MAC withdrawal, GSLB redirects traffic, anycast gateway at surviving site handles all requests
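The GSLB step in the failover sequence above can be reduced to its essence: answer DNS queries only with sites that pass health checks, so a dead site simply drops out of the answer set. The site names, VIPs, and health states below are hypothetical:

```python
# Sketch: the GSLB decision in an active-active failover, reduced to its
# essence. Site names and VIPs are hypothetical (RFC 5737 documentation
# addresses); a real GSLB adds probes, TTL tuning, and load weighting.

sites = {
    "site-a": {"vip": "203.0.113.10", "healthy": True},
    "site-b": {"vip": "198.51.100.10", "healthy": True},
}

def gslb_answer(sites: dict) -> list:
    """Return the VIPs of healthy sites; a failed site falls out of DNS."""
    return [s["vip"] for s in sites.values() if s["healthy"]]

print(gslb_answer(sites))            # both VIPs while both sites are up
sites["site-a"]["healthy"] = False   # site A fails (fabric-side: mass MAC withdrawal)
print(gslb_answer(sites))            # only site B's VIP is handed out
```

In the fabric, EVPN's mass MAC withdrawal handles east-west convergence at the same moment GSLB redirects new north-south sessions.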
Key Points: Active-Active vs. Active-Standby
Active-active maximizes utilization and minimizes failover time but requires distributed gateways, GSLB, and state synchronization
Active-standby is simpler with less DCI bandwidth needed, but wastes capacity and has longer recovery times
EVPN provides critical active-active capabilities: multihoming, MAC mobility tracking, and mass MAC withdrawal
11.3 Data Center Services Design
11.3.1 Load Balancing and Application Delivery
One-arm (routed): ADC connects via a single routed interface on a leaf; traffic is source-NAT'd. Simple and free of L2 dependencies, but SNAT hides the client IP unless the ADC inserts it (e.g., via X-Forwarded-For).
Two-arm (inline): ADC bridges between client/server VLANs. Full visibility but creates L2 dependency and potential bottleneck.
DSR (Direct Server Return): ADC handles inbound only; servers respond directly. High throughput but complex troubleshooting.
In EVPN fabrics, one-arm routed is preferred -- it aligns with L3-everywhere and enables ECMP. For multi-site, GSLB at the DNS layer directs users to the optimal site.
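The reason one-arm mode needs SNAT can be shown by tracing the address pairs on each leg of a flow. The sketch below uses hypothetical RFC 5737 / private addresses; the point is that the server replies to the ADC's SNAT address over plain L3 routing, with no L2 adjacency required:

```python
# Sketch: address rewriting in a one-arm routed load balancer flow.
# All addresses are illustrative, not from any real deployment.

def one_arm_snat(client_ip: str, vip: str, snat_ip: str, server_ip: str):
    """Return the (src, dst) pair on each leg of a one-arm routed flow."""
    client_to_vip = (client_ip, vip)      # leg 1: client -> ADC virtual IP
    adc_to_server = (snat_ip, server_ip)  # leg 2: ADC rewrites src (SNAT) and dst
    # The server replies to snat_ip, so the return path is routed back through
    # the ADC -- the trade-off is that the server no longer sees client_ip.
    return client_to_vip, adc_to_server

print(one_arm_snat("198.51.100.7", "203.0.113.80", "10.0.50.1", "10.0.10.21"))
```

Because both legs are ordinary routed flows, the fabric can ECMP them like any other traffic, which is exactly why this model fits the L3-everywhere design.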
11.3.2 Storage Network Integration (FCoE, iSCSI)
FCoE: Encapsulates Fibre Channel in Ethernet. Requires lossless transport via Priority Flow Control (PFC) and Data Center Bridging (DCB). In VXLAN fabrics, CoS must be mapped to DSCP at leaf ingress. High complexity but lowest latency.
iSCSI: Transports SCSI over standard TCP/IP. Natively compatible with any IP network including VXLAN. Simpler and cheaper, but higher latency due to TCP overhead. QoS marking still recommended.
Guidance: For new VXLAN EVPN deployments, iSCSI is often simpler and more cost-effective. FCoE remains relevant for existing FC investments or lowest-latency requirements.
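The CoS-to-DSCP mapping mentioned for FCoE can be sketched as a simple ingress table. Only the outer IP header's DSCP survives the routed VXLAN underlay, so the leaf must translate the inner 802.1p CoS at ingress; the specific values below are illustrative assumptions, not a vendor default:

```python
# Sketch: CoS-to-DSCP mapping applied at leaf ingress so lossless (PFC) and
# storage classes keep their treatment across the routed VXLAN underlay, where
# only the outer IP DSCP is visible. Values are illustrative, not a standard.

COS_TO_DSCP = {
    3: 24,  # FCoE (PFC no-drop class) -> CS3
    4: 32,  # iSCSI storage            -> CS4
    0: 0,   # best-effort data         -> default
}

def mark_outer_dscp(cos: int) -> int:
    """Map an inner 802.1p CoS value to the DSCP set in the outer IP header."""
    return COS_TO_DSCP.get(cos, 0)  # unknown classes fall back to best effort

print(mark_outer_dscp(3))  # 24 -- FCoE keeps its no-drop class fabric-wide
```

The mapping must be consistent on every leaf, and PFC must be enabled end to end on the matching class, or the lossless guarantee silently breaks at the first unconfigured hop.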
11.3.3 Compute and Network Convergence
Converged/HCI environments collapse compute, storage, and networking into integrated nodes. Key design considerations: bandwidth planning for aggregate traffic on shared links, QoS segmentation across traffic types, VRF-based multi-tenancy with L3 VNIs, and automation (Ansible, Terraform, APIC) for fabric-wide consistency.
Key Points: Data Center Services
One-arm routed load balancer placement is preferred in EVPN fabrics -- aligns with L3-everywhere, enables ECMP paths
FCoE requires end-to-end lossless behavior (PFC/DCB) with CoS-to-DSCP mapping in VXLAN fabrics -- high complexity
iSCSI runs over standard TCP/IP and is natively VXLAN-compatible -- simpler for new deployments
Converged environments need QoS segmentation, VRF multi-tenancy, and automation for consistency at scale
Post-Study Assessment
1. Why has the spine-leaf architecture replaced the traditional three-tier data center design?
It reduces the total number of switches needed
It provides predictable single-hop latency and ECMP paths optimized for east-west traffic
It eliminates the need for any routing protocols
It supports only north-south traffic patterns more efficiently
2. What is the primary advantage of EVPN over flood-and-learn in a VXLAN fabric?
EVPN reduces the VXLAN encapsulation overhead from 50 bytes to 20 bytes
EVPN replaces BGP with a simpler OSPF-based control plane
EVPN distributes MAC/IP reachability via MP-BGP, eliminating inefficient flooding
EVPN allows VXLAN to operate without any underlay network
3. Why is symmetric IRB preferred over asymmetric IRB in large VXLAN EVPN fabrics?
Symmetric IRB is faster because it skips the routing step entirely
Symmetric IRB uses a transit L3 VNI so each leaf only needs locally attached VLANs, improving scalability
Symmetric IRB eliminates the need for VRFs in the fabric
Symmetric IRB requires fewer spine switches in the topology
4. When should you choose ACI Multi-Site over ACI Multi-Pod?
When you need seamless L2 extension within a metro area
When a single APIC cluster is sufficient for management
When strict fault domain isolation and geographic distance require independent APIC clusters per site
When the deployment has fewer than 200 leaf switches total
5. What is the role of DWDM in a data center interconnect design?
It replaces VXLAN as the overlay encapsulation protocol
It provides Layer 1 optical transport, multiplexing multiple wavelengths on a single fiber pair
It provides the control plane for MAC learning between sites
It is an alternative to spine-leaf for intra-DC connectivity
6. What is the biggest risk of extending Layer 2 between data centers without proper mitigation?
Increased north-south bandwidth consumption
A failure in one site (broadcast storm, STP miscalculation) can propagate and take down all connected sites
Loss of VXLAN encapsulation capability
Inability to use OSPF as the underlay routing protocol
7. In an active-active data center design, what prevents traffic from hairpinning across the DCI link when a VM migrates?
Static routes pointing to the nearest data center
Distributed anycast gateways with the same virtual IP and MAC at both sites
Disabling all L2 extension between sites
Using OTV instead of VXLAN for encapsulation
8. Why must FCoE traffic receive special handling in a VXLAN EVPN fabric?
FCoE uses a different UDP port than VXLAN
Fibre Channel demands lossless transport, requiring PFC and DCB with CoS-to-DSCP mapping across the routed fabric
FCoE is incompatible with spine-leaf topologies
FCoE requires dedicated spine switches separate from data traffic
9. Why is the one-arm routed model preferred for load balancer placement in EVPN fabrics?
It allows the load balancer to inspect all Layer 2 headers
It aligns with the L3-everywhere philosophy and enables optimal ECMP paths without L2 dependencies
It requires no IP address configuration on the load balancer
It eliminates the need for GSLB in multi-site deployments
10. What is the maximum RTT latency supported by ACI Multi-Pod between pods?
10 ms
50 ms
150 ms
500 ms
11. What is a key advantage of OTV over raw VLAN trunking for DCI?
OTV provides higher bandwidth than VLAN trunks
OTV natively isolates broadcast/flooding and prevents L2 loops between sites
OTV is a multi-vendor standard supported by all switch platforms
OTV eliminates the need for any IP connectivity between sites
12. Which EVPN route type is considered the workhorse for advertising host MAC and IP between VTEPs?
Type 1 (Ethernet Auto-Discovery)
Type 2 (MAC/IP Advertisement)
Type 3 (Inclusive Multicast Tag)
Type 5 (IP Prefix)
13. Why should all leaf-to-spine links in a Clos fabric use the same link speed?
Mixed speeds require additional spine switches
Equal link speeds enable proper ECMP load balancing; mismatched speeds cause uneven traffic distribution
BGP cannot advertise routes over links with different speeds
Mixed speeds prevent VXLAN encapsulation from functioning
14. What is the recommended underlay MTU for a VXLAN fabric, and why?
1500 bytes, the standard Ethernet MTU
9198 bytes (jumbo), to accommodate the 50-byte VXLAN overhead without fragmentation
4096 bytes, matching the maximum VLAN count
16000 bytes, to support the maximum VNI addressing space
15. In a combined ACI deployment, what is the common pattern for using Multi-Pod and Multi-Site together?
Multi-Site within a campus, Multi-Pod across regions
Multi-Pod within a metro area for unified management, Multi-Site across regions for fault isolation
Multi-Pod for storage traffic only, Multi-Site for compute traffic
They cannot be combined in the same deployment