Chapter 11: Data Center Network Design

Learning Objectives

Pre-Study Assessment

1. Why has the spine-leaf architecture replaced the traditional three-tier data center design?

It reduces the total number of switches needed
It provides predictable single-hop latency and ECMP paths optimized for east-west traffic
It eliminates the need for any routing protocols
It supports only north-south traffic patterns more efficiently

2. What is the primary advantage of EVPN over flood-and-learn in a VXLAN fabric?

EVPN reduces the VXLAN encapsulation overhead from 50 bytes to 20 bytes
EVPN replaces BGP with a simpler OSPF-based control plane
EVPN distributes MAC/IP reachability via MP-BGP, eliminating inefficient flooding
EVPN allows VXLAN to operate without any underlay network

3. Why is symmetric IRB preferred over asymmetric IRB in large VXLAN EVPN fabrics?

Symmetric IRB is faster because it skips the routing step entirely
Symmetric IRB uses a transit L3 VNI so each leaf only needs locally attached VLANs, improving scalability
Symmetric IRB eliminates the need for VRFs in the fabric
Symmetric IRB requires fewer spine switches in the topology

4. When should you choose ACI Multi-Site over ACI Multi-Pod?

When you need seamless L2 extension within a metro area
When a single APIC cluster is sufficient for management
When strict fault domain isolation and geographic distance require independent APIC clusters per site
When the deployment has fewer than 200 leaf switches total

5. What is the role of DWDM in a data center interconnect design?

It replaces VXLAN as the overlay encapsulation protocol
It provides Layer 1 optical transport, multiplexing multiple wavelengths on a single fiber pair
It provides the control plane for MAC learning between sites
It is an alternative to spine-leaf for intra-DC connectivity

6. What is the biggest risk of extending Layer 2 between data centers without proper mitigation?

Increased north-south bandwidth consumption
A failure in one site (broadcast storm, STP miscalculation) can propagate and take down all connected sites
Loss of VXLAN encapsulation capability
Inability to use OSPF as the underlay routing protocol

7. In an active-active data center design, what prevents traffic from hairpinning across the DCI link when a VM migrates?

Static routes pointing to the nearest data center
Distributed anycast gateways with the same virtual IP and MAC at both sites
Disabling all L2 extension between sites
Using OTV instead of VXLAN for encapsulation

8. Why must FCoE traffic receive special handling in a VXLAN EVPN fabric?

FCoE uses a different UDP port than VXLAN
Fibre Channel demands lossless transport, requiring PFC and DCB with CoS-to-DSCP mapping across the routed fabric
FCoE is incompatible with spine-leaf topologies
FCoE requires dedicated spine switches separate from data traffic

9. Why is the one-arm routed model preferred for load balancer placement in EVPN fabrics?

It allows the load balancer to inspect all Layer 2 headers
It aligns with the L3-everywhere philosophy and enables optimal ECMP paths without L2 dependencies
It requires no IP address configuration on the load balancer
It eliminates the need for GSLB in multi-site deployments

10. What is the maximum RTT latency supported by ACI Multi-Pod between pods?

10 ms
50 ms
150 ms
500 ms

11. What is a key advantage of OTV over raw VLAN trunking for DCI?

OTV provides higher bandwidth than VLAN trunks
OTV natively isolates broadcast/flooding and prevents L2 loops between sites
OTV is a multi-vendor standard supported by all switch platforms
OTV eliminates the need for any IP connectivity between sites

12. Which EVPN route type is considered the workhorse for advertising host MAC and IP between VTEPs?

Type 1 (Ethernet Auto-Discovery)
Type 2 (MAC/IP Advertisement)
Type 3 (Inclusive Multicast Tag)
Type 5 (IP Prefix)

13. Why should all leaf-to-spine links in a Clos fabric use the same link speed?

Mixed speeds require additional spine switches
Equal link speeds enable proper ECMP load balancing; mismatched speeds cause uneven traffic distribution
BGP cannot advertise routes over links with different speeds
Mixed speeds prevent VXLAN encapsulation from functioning

14. What is the recommended underlay MTU for a VXLAN fabric, and why?

1500 bytes, the standard Ethernet MTU
9198 bytes (jumbo), to accommodate the 50-byte VXLAN overhead without fragmentation
4096 bytes, matching the maximum VLAN count
16000 bytes, to support the maximum VNI addressing space

15. In a combined ACI deployment, what is the common pattern for using Multi-Pod and Multi-Site together?

Multi-Site within a campus, Multi-Pod across regions
Multi-Pod within a metro area for unified management, Multi-Site across regions for fault isolation
Multi-Pod for storage traffic only, Multi-Site for compute traffic
They cannot be combined in the same deployment

11.1 Data Center Fabric Architecture

11.1.1 Spine-Leaf Topology Design and Scaling

The modern data center has shifted from the traditional three-tier design (access, aggregation, core) to the spine-leaf fabric architecture. This shift is driven by the explosion of east-west traffic from virtualization, microservices, and distributed storage. The spine-leaf design, rooted in Clos network theory from 1953, provides many parallel paths between any two endpoints.

The design consists of two layers: spine switches (the high-speed backbone) and leaf switches (host connectivity at the edge). The fundamental rules are strict: leaf switches connect only to spines, and spines connect only to leaves. Any traffic between hosts on different leaves traverses exactly one spine hop, producing consistent, predictable latency (traffic between hosts on the same leaf never leaves that leaf).

graph TD
    S1["Spine 1"]
    S2["Spine 2"]
    S3["Spine 3"]
    L1["Leaf 1"]
    L2["Leaf 2"]
    L3["Leaf 3"]
    L4["Leaf 4"]
    H1["Servers"]
    H2["Servers"]
    H3["Servers"]
    H4["Servers"]
    S1 --- L1
    S1 --- L2
    S1 --- L3
    S1 --- L4
    S2 --- L1
    S2 --- L2
    S2 --- L3
    S2 --- L4
    S3 --- L1
    S3 --- L2
    S3 --- L3
    S3 --- L4
    L1 --- H1
    L2 --- H2
    L3 --- H3
    L4 --- H4

Figure 11.1: Spine-Leaf Fabric Topology -- every leaf connects to every spine, providing ECMP paths and predictable single-hop latency

Animation: Traffic flow through a spine-leaf fabric showing ECMP path selection and linear scaling as new leaves/spines are added

Scaling: Add leaf switches for more server ports (each new leaf connects to every spine). Add spine switches for more fabric bandwidth (every leaf-to-leaf pair gains an additional ECMP path). All leaf-to-spine links must use the same speed for proper ECMP load balancing.
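The scaling rules above lend themselves to simple sizing arithmetic. The sketch below is a hypothetical helper, not a vendor tool: port counts and link speeds are illustrative assumptions, and it assumes each leaf runs exactly one uplink to every spine.

```python
# Hypothetical sizing helper for a two-tier Clos (spine-leaf) fabric.
# All port counts and speeds are illustrative assumptions.

def clos_capacity(leaves: int, spines: int, leaf_ports: int = 48,
                  uplink_speed_gbps: int = 100,
                  host_speed_gbps: int = 25) -> dict:
    """Return host-port count, ECMP fan-out, and oversubscription ratio."""
    host_ports_per_leaf = leaf_ports           # host-facing ports per leaf
    uplinks_per_leaf = spines                  # one uplink to every spine
    downlink_bw = host_ports_per_leaf * host_speed_gbps
    uplink_bw = uplinks_per_leaf * uplink_speed_gbps
    return {
        "host_ports": leaves * host_ports_per_leaf,
        "ecmp_paths_per_leaf_pair": spines,    # grows as spines are added
        "oversubscription": round(downlink_bw / uplink_bw, 2),
    }

print(clos_capacity(leaves=8, spines=4))
# 8 leaves x 48 ports = 384 host ports; 48x25G down vs 4x100G up = 3:1
```

Adding a spine to this example drops the oversubscription ratio from 3:1 toward 2.4:1 without touching any leaf, which is exactly the horizontal-scaling property the text describes.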

Underlay design: OSPF (single area, point-to-point interfaces) or eBGP (unique AS per device for fault isolation). Best practices include unnumbered interfaces and jumbo MTU (9198 bytes) to accommodate the 50-byte VXLAN overhead.
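The 50-byte figure and the jumbo-MTU recommendation follow directly from the header stack VXLAN adds. A quick back-of-the-envelope check (assuming an untagged IPv4 outer header, as is typical):

```python
# The 50-byte VXLAN overhead, header by header (IPv4 outer, no 802.1Q tag).
OUTER_ETH = 14   # outer Ethernet header
OUTER_IP = 20    # outer IPv4 header
OUTER_UDP = 8    # outer UDP header (destination port 4789)
VXLAN_HDR = 8    # VXLAN header carrying the 24-bit VNI

overhead = OUTER_ETH + OUTER_IP + OUTER_UDP + VXLAN_HDR
print(overhead)  # 50 bytes

tenant_mtu = 9000                        # jumbo MTU offered to hosts
required_underlay_mtu = tenant_mtu + overhead
print(required_underlay_mtu)             # 9050 -- fits under 9198
```

An underlay MTU of 9198 therefore leaves headroom even for a 9000-byte tenant frame plus encapsulation, which is why fragmentation never occurs in a correctly configured fabric.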

Key Points: Spine-Leaf Topology

11.1.2 VXLAN EVPN Fabric Design

VXLAN encapsulates Layer 2 Ethernet frames inside Layer 3 UDP packets (port 4789), allowing L2 segments to stretch across the routed underlay. Each virtual network is identified by a 24-bit VNI, supporting approximately 16 million segments (vs. 4,096 VLANs in 802.1Q). VTEPs on leaf switches handle encapsulation/decapsulation, adding approximately 50 bytes of overhead.
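The 24-bit VNI sits inside an 8-byte VXLAN header (RFC 7348: an 8-bit flags field with the I-bit set, reserved bits, the VNI, and a final reserved byte). A minimal sketch of packing that header, illustrating both the 8-byte size and the 16M-segment limit:

```python
import struct

# Minimal sketch of the 8-byte VXLAN header per RFC 7348:
# flags (I-bit set), 24 reserved bits, 24-bit VNI, 8 reserved bits.
def vxlan_header(vni: int) -> bytes:
    if not 0 <= vni < 2**24:
        raise ValueError("VNI is a 24-bit field")
    flags = 0x08                       # I flag: the VNI field is valid
    # Pack flags, 3 reserved bytes, then VNI shifted into the top 24 bits
    return struct.pack("!B3xI", flags, vni << 8)

hdr = vxlan_header(10001)
print(len(hdr))    # 8 -- the VXLAN portion of the ~50-byte overhead
print(2**24)       # 16777216 possible VNIs vs 4096 802.1Q VLANs
```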

graph TD
    subgraph "EVPN Control Plane"
        BGP["MP-BGP EVPN Route Reflector"]
    end
    subgraph "Spine Layer"
        SP1["Spine 1 - IP Underlay"]
        SP2["Spine 2 - IP Underlay"]
    end
    subgraph "Leaf / VTEP Layer"
        V1["Leaf 1 / VTEP 1 - VNI 10001, 10002"]
        V2["Leaf 2 / VTEP 2 - VNI 10001"]
        V3["Leaf 3 / VTEP 3 - VNI 10002"]
    end
    subgraph "Hosts"
        SRV1["Server A - VNI 10001"]
        SRV2["Server B - VNI 10001"]
        SRV3["Server C - VNI 10002"]
    end
    BGP -. "MAC/IP routes" .-> V1
    BGP -. "MAC/IP routes" .-> V2
    BGP -. "MAC/IP routes" .-> V3
    SP1 --- V1
    SP1 --- V2
    SP1 --- V3
    SP2 --- V1
    SP2 --- V2
    SP2 --- V3
    V1 --- SRV1
    V2 --- SRV2
    V3 --- SRV3

Figure 11.2: VXLAN EVPN Fabric -- MP-BGP distributes MAC/IP reachability between VTEPs across the routed spine-leaf underlay

Animation: EVPN Type 2 route advertisement flow -- host connects to leaf, MAC/IP advertised via MP-BGP to all VTEPs, eliminating flood-and-learn

EVPN Route Types: Type 1 (Ethernet Auto-Discovery) for multi-homing and fast convergence. Type 2 (MAC/IP Advertisement) -- the workhorse, carrying MAC, IP, and VNI info. Type 3 (Inclusive Multicast Tag) for BUM flooding trees. Type 5 (IP Prefix) for external routes into the fabric.

Asymmetric vs. Symmetric IRB: Asymmetric IRB requires every VLAN/VNI to be configured on every leaf (poor scalability). Symmetric IRB uses a transit L3 VNI per tenant VRF, so each leaf only needs its locally attached VLANs. Symmetric IRB is the recommended model and the only one that supports Type 5 routes.
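The scalability difference is easiest to see as arithmetic. The numbers below are hypothetical, chosen only to show the scaling shape: asymmetric state grows with the whole fabric, symmetric state grows with one leaf's local footprint.

```python
# Illustrative per-leaf VNI state under asymmetric vs symmetric IRB.
# All counts are made-up assumptions to show the scaling behavior.
total_l2_vnis = 2000      # L2 segments fabric-wide
local_l2_vnis = 40        # segments actually attached to this one leaf
tenant_vrfs = 10          # tenant VRFs present on this leaf

# Asymmetric: every VLAN/VNI in the fabric must exist on every leaf.
asymmetric_state = total_l2_vnis

# Symmetric: only local VNIs, plus one transit L3 VNI per tenant VRF.
symmetric_state = local_l2_vnis + tenant_vrfs

print(asymmetric_state, symmetric_state)  # 2000 vs 50
```

Doubling the fabric's segment count doubles the asymmetric figure but leaves the symmetric figure untouched, which is why symmetric IRB is the recommended model for large fabrics.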

Topology Models: Bridged Overlay (entry-level, no inter-VLAN routing in the fabric). Centrally Routed Bridging (CRB) -- routing on the spines, suited to cost-sensitive designs. Edge Routed Bridging (ERB) -- recommended for most deployments; distributes routing and bridging to the leaf switches.

Key Points: VXLAN EVPN

11.1.3 ACI Architecture and Multi-Pod/Multi-Site

Cisco ACI uses a declarative policy model managed by the APIC. Administrators define application profiles and endpoint groups (EPGs); the fabric provisions connectivity automatically. ACI runs IS-IS in its underlay with a VXLAN overlay, but endpoint learning and policy distribution use a proprietary control plane rather than standard EVPN.

Multi-Pod: 2-12 pods under a single APIC cluster via an IP-routed IPN. 50 ms RTT maximum latency. Native L2 extension. Single management domain (config errors propagate to all pods).

Multi-Site: Independent APIC clusters per site, connected via Nexus Dashboard Orchestrator. Strict fault domain isolation. Relaxed latency requirements (intercontinental). L3 interconnection preferred.

Animation: Multi-Pod vs. Multi-Site comparison showing shared APIC domain (Multi-Pod) versus independent APIC clusters with orchestrator (Multi-Site)

Key Points: ACI Multi-Pod / Multi-Site

11.2 Data Center Interconnect

11.2.1 DCI Options: Dark Fiber, DWDM, OTV, VXLAN

DCI technologies evolved through three generations:

Generation 1 (Pre-2008): Raw VLAN extension via L2 trunks, QinQ, or EoMPLS. Suffered from STP dependencies, single points of failure, and the full risk of a stretched L2 domain.

Generation 2 (2008+) -- OTV: MAC-in-IP encapsulation (42-byte header) with built-in IS-IS control plane. Native flood isolation, loop prevention, and multi-homing. Cisco proprietary, max 12 sites.

Generation 3 (2014+) -- VXLAN EVPN DCI: Extends the full fabric paradigm between sites. Standards-based (RFC 7348), control-plane MAC learning via MP-BGP, 16M VNI segments. Always pair VXLAN DCI with EVPN -- never deploy flood-and-learn across a WAN.

flowchart LR
    subgraph DC1["Data Center 1"]
        L1["Leaf / VTEP Border"]
        S1["Spine"]
        L1a["Leaf Compute"]
    end
    subgraph Transport["DCI Transport"]
        DWDM1["DWDM Mux"]
        DF["Dark Fiber"]
        DWDM2["DWDM Mux"]
    end
    subgraph DC2["Data Center 2"]
        L2["Leaf / VTEP Border"]
        S2["Spine"]
        L2a["Leaf Compute"]
    end
    L1a --- S1 --- L1
    L1 --- DWDM1 --- DF --- DWDM2
    DWDM2 --- L2
    L2 --- S2 --- L2a

Figure 11.4: DCI Architecture -- VXLAN EVPN overlay rides DWDM wavelengths over dark fiber

DWDM operates at Layer 1, multiplexing multiple optical wavelengths onto a single fiber pair. It is the transport underlay upon which overlay technologies ride -- not an alternative to OTV or VXLAN.

Key Points: DCI Technologies

11.2.2 Layer 2 Extension Risks and Mitigation

L2 extension between data centers carries significant risks: failure domain expansion (broadcast storms propagate across sites), suboptimal traffic paths (hairpinning), split-brain scenarios, and STP propagation.

Mitigations: Use OTV or EVPN with flood suppression (never raw VLANs). Deploy distributed anycast gateways to prevent hairpinning. Implement site-aware routing, BFD failover, and orchestrated MAC withdrawal for split-brain. OTV and VXLAN isolate STP domains by design.

Golden rule: Extend Layer 2 only when applications demand it, and always through technology providing flood isolation and loop prevention. Prefer L3 interconnection when possible.

Key Points: L2 Extension Risks

11.2.3 Active-Active vs. Active-Standby Data Center Design

Active-Standby: One DC handles production; the second is warm/cold standby. Simpler DCI, but the standby site is underutilized and failover takes minutes to hours.

Active-Active: Both DCs serve production simultaneously. Requires L2 extension or distributed anycast gateways, distributed default gateways, GSLB, and synchronized state for stateful services. EVPN provides active-active multihoming, MAC mobility tracking, and mass MAC withdrawal for sub-second convergence.

flowchart LR
    GSLB["Global Server Load Balancer DNS"]
    subgraph SiteA["Site A -- Active"]
        GWA["Anycast Gateway VIP: 10.1.1.1"]
        LBA["Local ADC"]
        SVRA["App Servers"]
        GWA --- LBA --- SVRA
    end
    subgraph DCI["DCI Link"]
        EVPN_DCI["VXLAN EVPN MAC Mobility"]
    end
    subgraph SiteB["Site B -- Active"]
        GWB["Anycast Gateway VIP: 10.1.1.1"]
        LBB["Local ADC"]
        SVRB["App Servers"]
        GWB --- LBB --- SVRB
    end
    GSLB -.-> SiteA
    GSLB -.-> SiteB
    SiteA --- EVPN_DCI --- SiteB

Figure 11.5: Active-Active Data Center Design with distributed anycast gateways and GSLB

Animation: Active-active failover sequence -- site failure triggers mass MAC withdrawal, GSLB redirects traffic, anycast gateway at surviving site handles all requests

Key Points: Active-Active vs. Active-Standby

11.3 Data Center Services Design

11.3.1 Load Balancing and Application Delivery

One-arm (routed): The ADC connects to a single leaf; traffic is source-NAT'd so return traffic flows back through the ADC. Simple, no L2 dependency, but SNAT can hide the client IP from the servers.

Two-arm (inline): ADC bridges between client/server VLANs. Full visibility but creates L2 dependency and potential bottleneck.

DSR (Direct Server Return): ADC handles inbound only; servers respond directly. High throughput but complex troubleshooting.

In EVPN fabrics, one-arm routed is preferred -- it aligns with L3-everywhere and enables ECMP. For multi-site, GSLB at the DNS layer directs users to the optimal site.
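The client-IP caveat of the one-arm model comes from the source-NAT rewrite itself. A toy sketch (all addresses and the port choice are made up for illustration) shows why the servers see the ADC rather than the client:

```python
# Toy illustration of one-arm source NAT: the ADC rewrites the source
# of the server-bound connection to its own address, so return traffic
# comes back through the ADC -- but the real client IP is hidden.
# All addresses and ports here are hypothetical.

def snat(flow: tuple, adc_ip: str = "10.9.9.10",
         adc_port: int = 40001) -> tuple:
    """flow = (src_ip, src_port, dst_ip, dst_port); returns rewritten flow."""
    src_ip, src_port, dst_ip, dst_port = flow
    # Destination (the selected server) is preserved; source becomes the ADC.
    return (adc_ip, adc_port, dst_ip, dst_port)

client_flow = ("203.0.113.7", 51514, "10.1.1.100", 443)
print(snat(client_flow))  # ('10.9.9.10', 40001, '10.1.1.100', 443)
```

In practice the original client address is typically restored at the application layer (e.g. an inserted HTTP header), since the network layer no longer carries it.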

11.3.2 Storage Network Integration (FCoE, iSCSI)

FCoE: Encapsulates Fibre Channel in Ethernet. Requires lossless transport via Priority Flow Control (PFC) and Data Center Bridging (DCB). In VXLAN fabrics, CoS must be mapped to DSCP at leaf ingress. High complexity but lowest latency.

iSCSI: Transports SCSI over standard TCP/IP. Natively compatible with any IP network including VXLAN. Simpler and cheaper, but higher latency due to TCP overhead. QoS marking still recommended.

Guidance: For new VXLAN EVPN deployments, iSCSI is often simpler and more cost-effective. FCoE remains relevant for existing FC investments or lowest-latency requirements.

11.3.3 Compute and Network Convergence

Converged/HCI environments collapse compute, storage, and networking into integrated nodes. Key design considerations: bandwidth planning for aggregate traffic on shared links, QoS segmentation across traffic types, VRF-based multi-tenancy with L3 VNIs, and automation (Ansible, Terraform, APIC) for fabric-wide consistency.

Key Points: Data Center Services

Post-Study Assessment

1. Why has the spine-leaf architecture replaced the traditional three-tier data center design?

It reduces the total number of switches needed
It provides predictable single-hop latency and ECMP paths optimized for east-west traffic
It eliminates the need for any routing protocols
It supports only north-south traffic patterns more efficiently

2. What is the primary advantage of EVPN over flood-and-learn in a VXLAN fabric?

EVPN reduces the VXLAN encapsulation overhead from 50 bytes to 20 bytes
EVPN replaces BGP with a simpler OSPF-based control plane
EVPN distributes MAC/IP reachability via MP-BGP, eliminating inefficient flooding
EVPN allows VXLAN to operate without any underlay network

3. Why is symmetric IRB preferred over asymmetric IRB in large VXLAN EVPN fabrics?

Symmetric IRB is faster because it skips the routing step entirely
Symmetric IRB uses a transit L3 VNI so each leaf only needs locally attached VLANs, improving scalability
Symmetric IRB eliminates the need for VRFs in the fabric
Symmetric IRB requires fewer spine switches in the topology

4. When should you choose ACI Multi-Site over ACI Multi-Pod?

When you need seamless L2 extension within a metro area
When a single APIC cluster is sufficient for management
When strict fault domain isolation and geographic distance require independent APIC clusters per site
When the deployment has fewer than 200 leaf switches total

5. What is the role of DWDM in a data center interconnect design?

It replaces VXLAN as the overlay encapsulation protocol
It provides Layer 1 optical transport, multiplexing multiple wavelengths on a single fiber pair
It provides the control plane for MAC learning between sites
It is an alternative to spine-leaf for intra-DC connectivity

6. What is the biggest risk of extending Layer 2 between data centers without proper mitigation?

Increased north-south bandwidth consumption
A failure in one site (broadcast storm, STP miscalculation) can propagate and take down all connected sites
Loss of VXLAN encapsulation capability
Inability to use OSPF as the underlay routing protocol

7. In an active-active data center design, what prevents traffic from hairpinning across the DCI link when a VM migrates?

Static routes pointing to the nearest data center
Distributed anycast gateways with the same virtual IP and MAC at both sites
Disabling all L2 extension between sites
Using OTV instead of VXLAN for encapsulation

8. Why must FCoE traffic receive special handling in a VXLAN EVPN fabric?

FCoE uses a different UDP port than VXLAN
Fibre Channel demands lossless transport, requiring PFC and DCB with CoS-to-DSCP mapping across the routed fabric
FCoE is incompatible with spine-leaf topologies
FCoE requires dedicated spine switches separate from data traffic

9. Why is the one-arm routed model preferred for load balancer placement in EVPN fabrics?

It allows the load balancer to inspect all Layer 2 headers
It aligns with the L3-everywhere philosophy and enables optimal ECMP paths without L2 dependencies
It requires no IP address configuration on the load balancer
It eliminates the need for GSLB in multi-site deployments

10. What is the maximum RTT latency supported by ACI Multi-Pod between pods?

10 ms
50 ms
150 ms
500 ms

11. What is a key advantage of OTV over raw VLAN trunking for DCI?

OTV provides higher bandwidth than VLAN trunks
OTV natively isolates broadcast/flooding and prevents L2 loops between sites
OTV is a multi-vendor standard supported by all switch platforms
OTV eliminates the need for any IP connectivity between sites

12. Which EVPN route type is considered the workhorse for advertising host MAC and IP between VTEPs?

Type 1 (Ethernet Auto-Discovery)
Type 2 (MAC/IP Advertisement)
Type 3 (Inclusive Multicast Tag)
Type 5 (IP Prefix)

13. Why should all leaf-to-spine links in a Clos fabric use the same link speed?

Mixed speeds require additional spine switches
Equal link speeds enable proper ECMP load balancing; mismatched speeds cause uneven traffic distribution
BGP cannot advertise routes over links with different speeds
Mixed speeds prevent VXLAN encapsulation from functioning

14. What is the recommended underlay MTU for a VXLAN fabric, and why?

1500 bytes, the standard Ethernet MTU
9198 bytes (jumbo), to accommodate the 50-byte VXLAN overhead without fragmentation
4096 bytes, matching the maximum VLAN count
16000 bytes, to support the maximum VNI addressing space

15. In a combined ACI deployment, what is the common pattern for using Multi-Pod and Multi-Site together?

Multi-Site within a campus, Multi-Pod across regions
Multi-Pod within a metro area for unified management, Multi-Site across regions for fault isolation
Multi-Pod for storage traffic only, Multi-Site for compute traffic
They cannot be combined in the same deployment

Your Progress

Answer Explanations