Chapter 13: Multicast, QoS, and Traffic Engineering Design

Learning Objectives

Pre-Study Assessment

Answer these questions before studying to gauge your current understanding. You will see the same questions again after studying to measure your progress.

Pre-Quiz: IP Multicast Design

1. An enterprise is deploying a live IPTV service where all receivers tune in to known broadcast sources. Which PIM mode is the optimal design choice?

PIM Sparse Mode, because it is the general-purpose default
PIM Source-Specific Multicast, because sources are known and it eliminates RP dependency
PIM Bidirectional, because IPTV involves many-to-many communication
PIM Dense Mode, because it floods to ensure all receivers get the stream

2. What is the primary role of MSDP in an Anycast RP deployment?

To elect a single active RP from among multiple candidates
To synchronize Source Active messages between RPs so each RP knows about all active sources
To replace IGMP for receiver group membership signaling
To distribute multicast routing table entries via BGP

3. In a VXLAN BGP EVPN fabric, why is PIM Bidirectional often recommended for the underlay multicast design?

Because it requires no RP, simplifying the underlay
Because every VTEP both sends and receives BUM traffic, creating a many-to-many pattern that BiDir handles with minimal state
Because PIM-SM cannot operate in VXLAN environments
Because BiDir eliminates the need for IGMP snooping on access switches

4. A network architect must design multicast for a VXLAN EVPN fabric. Which statement correctly describes the two-layer multicast approach?

The underlay uses PIM-SSM for BUM replication, and the overlay uses PIM BiDir for tenant multicast
Both underlay and overlay use the same PIM-SM instance with shared RPs on the spines
The underlay uses PIM BiDir with Anycast RP on spines for BUM replication, and the overlay uses TRM with PIM-SM for tenant multicast
The overlay handles all multicast natively via BGP EVPN type-6 routes with no underlay multicast required
Pre-Quiz: QoS Design

5. Why should WRED never be applied to the EF (Expedited Forwarding) priority queue?

Because WRED only works with TCP traffic, and EF is used for all protocol types
Because the priority queue already drops all excess traffic by design
Because voice traffic is typically UDP-based and cannot respond to early drops, so WRED would destroy packets without benefit
Because WRED requires DSCP values below 32, and EF uses DSCP 46

6. What is the fundamental difference between traffic shaping and traffic policing?

Shaping operates on ingress; policing operates on egress
Shaping buffers excess traffic and introduces delay; policing drops or re-marks excess traffic immediately with no buffering
Shaping works only with MPLS traffic; policing works with all IP traffic
Shaping uses DSCP markings; policing uses IP Precedence

7. In an enterprise QoS design, the voice priority queue (LLQ) is typically capped at what percentage of WAN link bandwidth, and why?

10%, because voice traffic is very low bandwidth
50%, because voice should always receive half the available bandwidth
33%, to prevent the strict priority queue from starving all other traffic classes
100%, because voice must always be prioritized above everything else

8. Where in the network should traffic classification and DSCP marking ideally occur?

At the WAN edge router, where bandwidth is most constrained
At the core switches, where they have the most processing power
As close to the source as possible, ideally at the access layer switch port
At the data center firewall, where deep packet inspection is available
Pre-Quiz: Traffic Engineering

9. What is the fundamental scalability limitation of MPLS-TE with RSVP-TE?

RSVP-TE cannot support more than 256 tunnels per router
Every midpoint router must maintain per-tunnel RSVP state, creating an N-squared scaling problem
MPLS-TE requires a centralized controller that becomes a single point of failure
The MPLS label space is limited to 1,048,576 entries

10. How does SR-TE fundamentally differ from MPLS-TE in terms of path state?

SR-TE uses LDP instead of RSVP-TE for label distribution
SR-TE encodes the entire path as a segment list at the headend, so midpoint routers maintain no tunnel state
SR-TE maintains state at every midpoint but refreshes less frequently than RSVP-TE
SR-TE eliminates the need for a headend router by distributing path computation to all nodes

11. What is On-Demand Next-Hop (ODN) in SR-TE, and what problem does it solve?

ODN pre-provisions backup paths for all possible failures in the network
ODN automatically creates SR policies when a BGP route with a color community is received, eliminating manual tunnel provisioning
ODN replaces BGP for VPN prefix advertisement in SR domains
ODN computes the shortest IGP path and overrides any TE constraints

12. How does TI-LFA differ from MPLS-TE Fast Reroute in providing failure protection?

TI-LFA provides slower convergence but requires less CPU than FRR
TI-LFA computes backup paths automatically from the IGP topology using the SR label stack, with no pre-provisioned backup tunnels required
TI-LFA requires a centralized controller while MPLS-TE FRR is fully distributed
TI-LFA only provides link protection, not node protection, unlike MPLS-TE FRR

13. A service provider wants intent-based traffic steering where paths are selected based on latency constraints without manual per-tunnel CSPF computation. Which SR-TE feature best addresses this?

Adjacency SIDs with static segment lists
Flexible Algorithm (Flex-Algo) with a latency-minimizing algorithm definition
RSVP-TE bandwidth reservation with CSPF
LDP label distribution with IGP metric tuning

14. When is MPLS-TE with RSVP-TE still the better choice over SR-TE?

When the network has more than 500 routers and scalability is critical
When native per-tunnel bandwidth reservation is a hard requirement or legacy devices do not support segment routing
When SDN controller integration is the strategic direction
When the operator wants to minimize operational complexity

15. In an Assured Forwarding QoS design, what is the significance of the drop precedence values (e.g., AF21 vs. AF23)?

Higher drop precedence numbers receive more bandwidth allocation during congestion
Higher drop precedence numbers (e.g., AF23) are dropped first during congestion within the same AF class, enabling differentiated treatment for in-contract vs. out-of-contract traffic
Drop precedence values determine the strict priority ordering between different AF classes
Drop precedence is only relevant for UDP traffic; TCP traffic ignores these markings

Section 1: IP Multicast Design

Multicast enables a single source to efficiently deliver data to multiple receivers without duplicating traffic at the source. The network replicates packets only at branch points where paths diverge toward receivers, dramatically reducing bandwidth consumption compared to unicast replication.

For the CCDE exam, multicast design decisions center on three questions: which PIM mode fits the application, how to provide RP redundancy, and how multicast integrates with modern fabric architectures.

1.1 PIM Mode Selection

Protocol Independent Multicast (PIM) operates in several modes. "Independent" means PIM builds no routing topology of its own; it relies on the existing unicast routing table -- populated by whatever unicast routing protocol is already running -- for its RPF checks.

PIM Sparse Mode (PIM-SM) is the default choice. It builds distribution trees via a Rendezvous Point (RP). Receivers signal via IGMP, a shared tree (*,G) is built through the RP, and last-hop routers can switch to a source-specific shortest path tree (S,G) for optimal forwarding.

PIM Source-Specific Multicast (PIM-SSM) eliminates the RP entirely. Receivers specify both group and source, joining an (S,G) channel directly. Optimal for one-to-many with known sources (IPTV, financial feeds). Requires IGMPv3.

PIM Bidirectional (PIM BiDir) is for many-to-many with numerous senders (e.g., enterprise videoconferencing). Traffic forwards toward the RP unconditionally on a shared tree with no (S,G) state.
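The three modes translate into only a few configuration lines each. A minimal sketch in Cisco IOS-style syntax (the RP addresses and interface name are illustrative; exact commands vary by platform):

```
! PIM Sparse Mode: enable multicast routing and point at an RP
ip multicast-routing
ip pim rp-address 10.0.0.1
!
! PIM-SSM: no RP; enable the default SSM range (232.0.0.0/8)
ip pim ssm default
interface GigabitEthernet0/1
 ip pim sparse-mode
 ip igmp version 3        ! SSM receivers must signal (S,G) via IGMPv3
!
! PIM BiDir: an RP is still required, flagged with the bidir keyword
ip pim bidir-enable
ip pim rp-address 10.0.0.2 bidir
```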

```mermaid
graph TD
    Source["Source S1"] -->|"1. Register"| RP["Rendezvous Point (RP)"]
    RP -->|"2. Shared Tree (*,G) Join propagated"| R1["Router R1"]
    R1 --> R2["Router R2"]
    R2 --> Rcv1["Receiver A (IGMP Join)"]
    RP --> R3["Router R3"]
    R3 --> Rcv2["Receiver B (IGMP Join)"]
    Source -.->|"3. SPT Switchover (S,G)"| R2
    Source -.-> R3
    style Source fill:#4a90d9,color:#fff
    style RP fill:#e07b39,color:#fff
    style Rcv1 fill:#5cb85c,color:#fff
    style Rcv2 fill:#5cb85c,color:#fff
```

Figure 13.1: PIM-SM Shared Tree to SPT Switchover. The source registers with the RP, receivers join via the RP, and last-hop routers can switch to a source-specific shortest path tree (dashed lines).

| PIM Mode | Best Application Pattern | RP Required | State Maintained | IGMPv3 Required |
| --- | --- | --- | --- | --- |
| PIM-SM | General one-to-many, many-to-many | Yes | (S,G) and (*,G) | No |
| PIM-SSM | One-to-many with known sources | No | (S,G) only | Yes |
| PIM BiDir | Many-to-many with dense senders | Yes (DF election) | (*,G) only | No |
| PIM-DM | Small LAN segments only | No | Flood-and-prune | No |

Key Points: PIM Mode Selection

Animation: PIM-SM shared tree formation followed by SPT switchover when traffic begins flowing

1.2 Rendezvous Point Placement and Redundancy

For PIM-SM, the RP is the single most critical design element. Three RP discovery mechanisms exist: static configuration, Auto-RP (Cisco proprietary), and Bootstrap Router (BSR, standards-based RFC 5059).

Anycast RP is the preferred redundancy strategy. Multiple RPs share a single common IP address. Each RP also has a unique loopback for MSDP peering. Sources register with the nearest RP (by IGP metric), and that RP propagates Source Active messages to all peers via MSDP. This provides active/active redundancy.
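The design in Figure 13.2 might be configured on RP1 roughly as follows (IOS-style sketch using the figure's addresses; RP2 mirrors it with its own unique loopback):

```
interface Loopback0
 ip address 10.1.1.1 255.255.255.255   ! unique address, used for MSDP peering
interface Loopback1
 ip address 10.0.0.1 255.255.255.255   ! shared anycast RP address (same on RP2)
!
ip pim rp-address 10.0.0.1             ! all routers point at the anycast address
ip msdp peer 10.2.2.2 connect-source Loopback0
ip msdp originator-id Loopback0        ! avoids RPF failures on SA messages
```

Because sources register with whichever RP is closest by IGP metric, a failed RP is bypassed automatically as soon as the IGP reconverges on the surviving anycast instance.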

```mermaid
graph TD
    SrcA["Source A"] -->|"Register (nearest RP)"| RP1["RP1 Anycast: 10.0.0.1 Unique: 10.1.1.1"]
    SrcB["Source B"] -->|"Register (nearest RP)"| RP2["RP2 Anycast: 10.0.0.1 Unique: 10.2.2.2"]
    RP1 <-->|"MSDP SA Messages (TCP)"| RP2
    RP1 --> DR1["DR / Last-Hop Router"]
    RP2 --> DR2["DR / Last-Hop Router"]
    DR1 --> RcvX["Receivers"]
    DR2 --> RcvY["Receivers"]
    style RP1 fill:#e07b39,color:#fff
    style RP2 fill:#e07b39,color:#fff
    style SrcA fill:#4a90d9,color:#fff
    style SrcB fill:#4a90d9,color:#fff
    style RcvX fill:#5cb85c,color:#fff
    style RcvY fill:#5cb85c,color:#fff
```

Figure 13.2: Anycast RP with MSDP Synchronization. Two RPs share the same anycast address. MSDP peering synchronizes Source Active messages for active/active redundancy.

Key Points: RP Design

1.3 Multicast in Overlay and Fabric Networks

VXLAN BGP EVPN fabrics require multicast design in two planes: the underlay, which replicates BUM (broadcast, unknown unicast, and multicast) traffic between VTEPs -- commonly PIM BiDir with Anycast RP on the spines, because every VTEP both sends and receives BUM traffic in a many-to-many pattern; and the overlay, which carries tenant multicast -- typically Tenant Routed Multicast (TRM) running PIM-SM.

IGMP Snooping is critical at the access layer. Without it, multicast floods to every port in the VLAN, defeating multicast efficiency.
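On a Nexus-style spine, a BiDir underlay for BUM replication might be sketched as follows (the RP address and group range are illustrative; BiDir RP redundancy on NX-OS platforms typically uses the phantom-RP technique rather than MSDP):

```
feature pim
!
! All VXLAN BUM groups are served by a single BiDir shared tree per range
ip pim rp-address 10.254.254.1 group-list 239.1.1.0/25 bidir
!
interface Ethernet1/1
 ip pim sparse-mode
! Note: IGMP snooping is enabled by default on most data center switches
```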

Key Points: Fabric Multicast

Animation: VXLAN EVPN fabric showing underlay BUM replication via PIM BiDir and overlay TRM forwarding paths

Section 2: End-to-End QoS Design

Quality of Service breaks the "all traffic is equal" assumption. Without QoS, a bulk file transfer and a real-time voice call compete equally -- and the voice call loses. QoS provides classification, prioritization, and protection so applications receive the treatment they require.

2.1 Classification and Marking Strategy

The DiffServ model classifies and marks packets at the network edge and provides consistent per-hop behavior (PHB) at every node along the path.

Classification identifies traffic type using criteria such as source/destination address, port number, or NBAR. Marking sets the DSCP value (6 bits, 64 possible codepoints). The cardinal rule: classify and mark as close to the source as possible.
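Applied at an access switch port, the "mark at the source" rule might look like this IOS-style sketch (the ACL name, RTP port range, and interface are illustrative):

```
ip access-list extended VOICE-RTP
 permit udp any any range 16384 32767   ! typical RTP media port range
!
class-map match-any VOICE
 match access-group name VOICE-RTP
!
policy-map MARK-INGRESS
 class VOICE
  set dscp ef                           ! mark voice as EF (46) at the edge
 class class-default
  set dscp default                      ! re-mark untrusted traffic to best effort
!
interface GigabitEthernet1/0/1
 service-policy input MARK-INGRESS
```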

| PHB | DSCP Value | Intended Use |
| --- | --- | --- |
| Expedited Forwarding (EF) | 46 | Voice and ultra-low-latency traffic |
| Assured Forwarding (AFxy) | Various | Tiered data with drop precedence |
| Class Selector (CSx) | 8, 16, 24, 32, 40, 48 | Backward-compatible with IP Precedence |
| Default / Best Effort | 0 | Everything else |

Assured Forwarding defines four classes (AF1x through AF4x), each with three drop precedences. The DSCP value for AFxy is 8x + 2y, so AF23 is decimal 22. Within the same class, higher drop precedence packets are dropped first during congestion (AF23 before AF22 before AF21). This enables differentiated treatment for in-contract vs. out-of-contract traffic.

Key Points: Classification and Marking

2.2 Queuing, Shaping, and Policing

Low-Latency Queuing (LLQ) combines a strict priority queue with CBWFQ. Voice (EF) enters the priority queue and is always serviced first. Other classes get minimum bandwidth guarantees via CBWFQ.

WRED randomly drops data queue packets before overflow, preventing TCP global synchronization. Never apply WRED to the EF priority queue -- voice is UDP-based and cannot respond to early drops.

Shaping buffers excess egress traffic to smooth bursts (adding latency). Policing drops or re-marks excess immediately, with no buffering. The typical WAN edge pattern: shape outbound to the contracted rate, apply a hierarchical child queuing policy (H-QoS) within the shaped rate, and let the service provider police inbound at its edge.
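That WAN edge pattern can be sketched in IOS-style MQC syntax (the 50 Mbps contracted rate, class names, and percentages are illustrative; note the 33% cap on the priority queue and the absence of WRED in the EF class):

```
class-map match-any VOICE
 match dscp ef
class-map match-any CRITICAL-DATA
 match dscp af21 af22 af23
!
policy-map LLQ-CHILD
 class VOICE
  priority percent 33           ! cap the strict PQ so it cannot starve data
 class CRITICAL-DATA
  bandwidth percent 40
  random-detect dscp-based      ! WRED on data classes only, never on EF
 class class-default
  bandwidth percent 25
  random-detect
!
policy-map WAN-SHAPER
 class class-default
  shape average 50000000        ! shape to the 50 Mbps contracted rate
  service-policy LLQ-CHILD      ! nested child policy = hierarchical QoS
!
interface GigabitEthernet0/1
 service-policy output WAN-SHAPER
```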

```mermaid
flowchart LR
    A["Ingress Packet"] --> B["Classify and Mark (DSCP)"]
    B --> C["Shape to Contracted WAN Rate"]
    C --> D{"H-QoS Scheduler"}
    D -->|"EF (Voice)"| E["Priority Queue (LLQ)"]
    D -->|"AF Classes"| F["CBWFQ Bandwidth Guarantee"]
    D -->|"Best Effort"| G["Default Queue"]
    F --> H["WRED Congestion Avoidance"]
    E --> I["Egress Interface"]
    H --> I
    G --> I
    style B fill:#4a90d9,color:#fff
    style C fill:#e07b39,color:#fff
    style E fill:#d9534f,color:#fff
    style F fill:#f0ad4e,color:#fff
    style H fill:#5bc0de,color:#fff
```

Figure 13.3: H-QoS Processing Pipeline at the WAN Edge.

| Mechanism | Direction | Handles Excess | Adds Latency | Best Use Case |
| --- | --- | --- | --- | --- |
| Shaping | Egress only | Buffers and delays | Yes | WAN edge outbound |
| Policing | Ingress or egress | Drops or re-marks | No | Edge enforcement, SP ingress |
| LLQ/CBWFQ | Egress | Prioritizes and guarantees BW | Minimal | All congestion points |
| WRED | Egress (data queues) | Drops early, randomly | No | Core routers, AF data classes |

Key Points: Queuing and Congestion Management

Animation: H-QoS pipeline showing packet classification, shaping, priority queuing, and WRED congestion avoidance in action

2.3 QoS Across Network Domains

QoS must be consistent end-to-end:

| Parameter | Voice (G.711) | Interactive Video | Streaming Video |
| --- | --- | --- | --- |
| One-way Latency | < 150 ms | < 150 ms | < 4-5 sec (buffered) |
| Jitter | < 30 ms | < 30 ms | Tolerant (buffered) |
| Packet Loss | < 1% | < 1% | < 5% |
| DSCP Marking | EF (46) | AF41 or CS4 | AF31 or CS5 |
| Queue Treatment | Strict Priority | Priority or Guaranteed BW | Guaranteed BW + WRED |

Key Points: Cross-Domain QoS

Section 3: Traffic Engineering

Traffic engineering steers traffic along specific paths to optimize resource utilization, meet SLA requirements, or avoid congested links. Without it, IGP shortest-path routing sends all traffic along the same "best" path while parallel links sit underutilized.

3.1 MPLS Traffic Engineering (MPLS-TE)

MPLS-TE uses RSVP-TE to signal explicit Label Switched Paths (LSPs). The headend router computes a path using CSPF (Constrained Shortest Path First), considering available bandwidth, admin groups, and SRLGs. RSVP-TE signals the path hop-by-hop, reserving bandwidth on each link.

Fast Reroute (FRR) provides sub-50ms protection: on failure, the Point of Local Repair (PLR) switches traffic to a pre-provisioned backup tunnel. Two approaches exist: Facility Backup (one bypass tunnel protects many LSPs, higher scalability) and One-to-One Detour (a separate backup per LSP, lower scalability).
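A headend tunnel with dynamic CSPF, bandwidth reservation, and FRR protection might be sketched in IOS-style syntax (the destination address, bandwidth value, and IGP process are illustrative):

```
mpls traffic-eng tunnels
!
router ospf 1
 mpls traffic-eng router-id Loopback0
 mpls traffic-eng area 0
!
interface Tunnel1
 ip unnumbered Loopback0
 tunnel mode mpls traffic-eng
 tunnel destination 10.255.0.5
 tunnel mpls traffic-eng bandwidth 20000          ! reserve 20 Mbps via RSVP-TE
 tunnel mpls traffic-eng path-option 10 dynamic   ! headend runs CSPF for the path
 tunnel mpls traffic-eng fast-reroute             ! request FRR protection
```

Every midpoint along the signaled LSP must hold RSVP state for this tunnel -- the scaling cost the table in Section 3.3 contrasts with SR-TE.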

```mermaid
flowchart LR
    HE["Headend (CSPF + RSVP)"] -->|"Primary LSP"| M1["Midpoint R1 (RSVP state)"]
    M1 -->|"Primary LSP"| M2["Midpoint R2 (RSVP state)"]
    M2 -->|"Primary LSP"| TE["Tailend"]
    M1 -.->|"FRR Backup (Facility)"| BK["Bypass Router"]
    BK -.-> M2
    style HE fill:#4a90d9,color:#fff
    style TE fill:#5cb85c,color:#fff
    style M1 fill:#f0ad4e,color:#000
    style M2 fill:#f0ad4e,color:#000
    style BK fill:#d9534f,color:#fff
```

Figure 13.4: MPLS-TE LSP with Fast Reroute (Facility Backup).

Key Points: MPLS-TE

3.2 Segment Routing Traffic Engineering (SR-TE)

SR-TE encodes the entire path as an ordered list of segments (labels) at the ingress router. No signaling protocol is needed along the path -- LDP and RSVP-TE are eliminated. Label distribution is handled by the IGP or BGP.

| Segment Type | Scope | Purpose | Example |
| --- | --- | --- | --- |
| Prefix SID | Global (SRGB) | Shortest path to a prefix/node | "Route via Node R5" |
| Adjacency SID | Local | Specific link/adjacency | "Use link R3-to-R4" |
| Node SID | Global | Identifies a specific router | "Route to router R7" |
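An explicit SR policy combining both SID types, using the labels from Figure 13.5, might be sketched in IOS XR-style syntax (the policy name, color, and endpoint address are illustrative):

```
segment-routing
 traffic-eng
  segment-list SL-VIA-R3R4
   index 10 mpls label 16005    ! Prefix SID: shortest path toward node 5
   index 20 mpls label 24034    ! Adjacency SID: force the R3-to-R4 link
  !
  policy LOW-LATENCY
   color 100 end-point ipv4 10.255.0.5
   candidate-paths
    preference 100
     explicit segment-list SL-VIA-R3R4
```

The headend imposes the full label stack; every midpoint simply swaps or pops labels with no knowledge of the policy.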

On-Demand Next-Hop (ODN) automatically creates SR policies when a BGP route with a color community arrives. Intent-based, SLA-aware networking without manual provisioning.
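ODN reduces to a small template at the headend. An IOS XR-style sketch (color value and metric type are illustrative):

```
segment-routing
 traffic-eng
  on-demand color 100       ! any BGP route arriving with color 100...
   dynamic
    metric
     type latency           ! ...triggers an automatic latency-optimized policy
```

No per-destination tunnels are configured; the policy is instantiated when a matching colored route is received and torn down when it is withdrawn.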

Flexible Algorithm (Flex-Algo) defines custom routing algorithms (128-255) with constraints like minimize latency or avoid certain affinities. Replaces per-tunnel CSPF with a declarative model.
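A latency-minimizing Flex-Algo might be declared under the IGP as in this IOS XR-style sketch (the algorithm number 128 and SID value are illustrative):

```
router isis CORE
 flex-algo 128
  metric-type delay          ! algorithm 128 minimizes measured link delay
  advertise-definition       ! flood the definition so all nodes compute it
 !
 interface Loopback0
  address-family ipv4 unicast
   prefix-sid algorithm 128 absolute 16805   ! second SID for the Flex-Algo plane
```

Steering a service onto the low-latency plane then requires only one label: the node's algorithm-128 Prefix SID.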

TI-LFA provides automatic fast reroute using the SR label stack -- no pre-provisioned backup tunnels needed. Sub-50ms convergence computed from the IGP topology.
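Enabling TI-LFA is a per-interface IGP setting rather than a tunnel design exercise, as in this IOS XR-style sketch (interface name illustrative):

```
router isis CORE
 interface GigabitEthernet0/0/0/0
  address-family ipv4 unicast
   fast-reroute per-prefix
   fast-reroute per-prefix ti-lfa   ! backup paths computed from the IGP topology
```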

```mermaid
flowchart LR
    HE["Headend PE (Encodes segment list)"] -->|"Prefix SID: 16005"| R2["R2 (label swap only)"]
    R2 -->|"Adj SID: 24034"| R3["R3 (label swap only)"]
    R3 -->|"Adj SID forces specific link"| R4["R4"]
    R4 -->|"Prefix SID pop"| R5["Egress PE (R5)"]
    ODN["BGP VPN Route + Color Community"] -.->|"ODN triggers auto SR policy"| HE
    style HE fill:#4a90d9,color:#fff
    style R5 fill:#5cb85c,color:#fff
    style ODN fill:#8e44ad,color:#fff
    style R2 fill:#95a5a6,color:#fff
    style R3 fill:#95a5a6,color:#fff
    style R4 fill:#95a5a6,color:#fff
```

Figure 13.5: SR-TE Path with On-Demand Next-Hop (ODN). Midpoint routers (gray) perform simple label swap with no tunnel state.

Key Points: SR-TE

Animation: SR-TE segment list encoding at headend and label swap operations at each midpoint router along the path

3.3 MPLS-TE vs. SR-TE Design Comparison

| Design Aspect | MPLS-TE (RSVP-TE) | SR-TE |
| --- | --- | --- |
| Signaling Protocol | RSVP-TE on every hop | None (IGP/BGP distributes SIDs) |
| Midpoint State | Per-tunnel state on every router | No midpoint state (headend only) |
| Scalability | Limited by RSVP state (N-squared) | Highly scalable |
| Bandwidth Reservation | Native per-tunnel reservation | Requires external controller (PCE) |
| Fast Reroute | FRR with pre-provisioned tunnels | TI-LFA computed automatically |
| Operational Complexity | High (LDP + RSVP interaction) | Low (IGP-based, minimal config) |
| SDN Integration | Possible but complex | Native via PCE and ODN |

Choose MPLS-TE when: native bandwidth reservation is required, legacy devices do not support SR, or regulatory mandates require guaranteed bandwidth.

Choose SR-TE when: scalability and simplicity are priorities, SDN automation is strategic, intent-based steering (ODN, Flex-Algo) is desired, or it is a greenfield deployment.

Migration: SR coexists with LDP and RSVP-TE. Deploy SR-MPLS alongside LDP, use the sr-prefer setting so routers forward on SR labels instead of LDP labels, migrate tunnels from RSVP-TE to SR-TE policies, then decommission the legacy protocols.

Key Points: TE Technology Selection

Post-Study Assessment

Now that you have studied the material, answer these same questions again. Compare your pre and post scores to measure your learning progress.

Post-Quiz: IP Multicast Design

1. An enterprise is deploying a live IPTV service where all receivers tune in to known broadcast sources. Which PIM mode is the optimal design choice?

PIM Sparse Mode, because it is the general-purpose default
PIM Source-Specific Multicast, because sources are known and it eliminates RP dependency
PIM Bidirectional, because IPTV involves many-to-many communication
PIM Dense Mode, because it floods to ensure all receivers get the stream

2. What is the primary role of MSDP in an Anycast RP deployment?

To elect a single active RP from among multiple candidates
To synchronize Source Active messages between RPs so each RP knows about all active sources
To replace IGMP for receiver group membership signaling
To distribute multicast routing table entries via BGP

3. In a VXLAN BGP EVPN fabric, why is PIM Bidirectional often recommended for the underlay multicast design?

Because it requires no RP, simplifying the underlay
Because every VTEP both sends and receives BUM traffic, creating a many-to-many pattern that BiDir handles with minimal state
Because PIM-SM cannot operate in VXLAN environments
Because BiDir eliminates the need for IGMP snooping on access switches

4. A network architect must design multicast for a VXLAN EVPN fabric. Which statement correctly describes the two-layer multicast approach?

The underlay uses PIM-SSM for BUM replication, and the overlay uses PIM BiDir for tenant multicast
Both underlay and overlay use the same PIM-SM instance with shared RPs on the spines
The underlay uses PIM BiDir with Anycast RP on spines for BUM replication, and the overlay uses TRM with PIM-SM for tenant multicast
The overlay handles all multicast natively via BGP EVPN type-6 routes with no underlay multicast required
Post-Quiz: QoS Design

5. Why should WRED never be applied to the EF (Expedited Forwarding) priority queue?

Because WRED only works with TCP traffic, and EF is used for all protocol types
Because the priority queue already drops all excess traffic by design
Because voice traffic is typically UDP-based and cannot respond to early drops, so WRED would destroy packets without benefit
Because WRED requires DSCP values below 32, and EF uses DSCP 46

6. What is the fundamental difference between traffic shaping and traffic policing?

Shaping operates on ingress; policing operates on egress
Shaping buffers excess traffic and introduces delay; policing drops or re-marks excess traffic immediately with no buffering
Shaping works only with MPLS traffic; policing works with all IP traffic
Shaping uses DSCP markings; policing uses IP Precedence

7. In an enterprise QoS design, the voice priority queue (LLQ) is typically capped at what percentage of WAN link bandwidth, and why?

10%, because voice traffic is very low bandwidth
50%, because voice should always receive half the available bandwidth
33%, to prevent the strict priority queue from starving all other traffic classes
100%, because voice must always be prioritized above everything else

8. Where in the network should traffic classification and DSCP marking ideally occur?

At the WAN edge router, where bandwidth is most constrained
At the core switches, where they have the most processing power
As close to the source as possible, ideally at the access layer switch port
At the data center firewall, where deep packet inspection is available
Post-Quiz: Traffic Engineering

9. What is the fundamental scalability limitation of MPLS-TE with RSVP-TE?

RSVP-TE cannot support more than 256 tunnels per router
Every midpoint router must maintain per-tunnel RSVP state, creating an N-squared scaling problem
MPLS-TE requires a centralized controller that becomes a single point of failure
The MPLS label space is limited to 1,048,576 entries

10. How does SR-TE fundamentally differ from MPLS-TE in terms of path state?

SR-TE uses LDP instead of RSVP-TE for label distribution
SR-TE encodes the entire path as a segment list at the headend, so midpoint routers maintain no tunnel state
SR-TE maintains state at every midpoint but refreshes less frequently than RSVP-TE
SR-TE eliminates the need for a headend router by distributing path computation to all nodes

11. What is On-Demand Next-Hop (ODN) in SR-TE, and what problem does it solve?

ODN pre-provisions backup paths for all possible failures in the network
ODN automatically creates SR policies when a BGP route with a color community is received, eliminating manual tunnel provisioning
ODN replaces BGP for VPN prefix advertisement in SR domains
ODN computes the shortest IGP path and overrides any TE constraints

12. How does TI-LFA differ from MPLS-TE Fast Reroute in providing failure protection?

TI-LFA provides slower convergence but requires less CPU than FRR
TI-LFA computes backup paths automatically from the IGP topology using the SR label stack, with no pre-provisioned backup tunnels required
TI-LFA requires a centralized controller while MPLS-TE FRR is fully distributed
TI-LFA only provides link protection, not node protection, unlike MPLS-TE FRR

13. A service provider wants intent-based traffic steering where paths are selected based on latency constraints without manual per-tunnel CSPF computation. Which SR-TE feature best addresses this?

Adjacency SIDs with static segment lists
Flexible Algorithm (Flex-Algo) with a latency-minimizing algorithm definition
RSVP-TE bandwidth reservation with CSPF
LDP label distribution with IGP metric tuning

14. When is MPLS-TE with RSVP-TE still the better choice over SR-TE?

When the network has more than 500 routers and scalability is critical
When native per-tunnel bandwidth reservation is a hard requirement or legacy devices do not support segment routing
When SDN controller integration is the strategic direction
When the operator wants to minimize operational complexity

15. In an Assured Forwarding QoS design, what is the significance of the drop precedence values (e.g., AF21 vs. AF23)?

Higher drop precedence numbers receive more bandwidth allocation during congestion
Higher drop precedence numbers (e.g., AF23) are dropped first during congestion within the same AF class, enabling differentiated treatment for in-contract vs. out-of-contract traffic
Drop precedence values determine the strict priority ordering between different AF classes
Drop precedence is only relevant for UDP traffic; TCP traffic ignores these markings

Your Progress

Answer Explanations