Chapter 6: Centralized, Decentralized, and Hybrid Control Planes


Pre-Study Assessment

Answer these questions before studying the material to establish your baseline understanding.

Pre-Quiz

1. What is the primary mechanism by which decentralized control planes achieve scalability?

- Deploying more powerful routers with faster CPUs
- Hierarchy, summarization, and domain partitioning
- Using a single global routing table shared by all devices
- Centralizing all route computation on a dedicated server

2. In OpenFlow, what happens in reactive mode when a switch receives a packet with no matching flow rule?

- The switch drops the packet silently
- The switch forwards it out all ports (flooding)
- The switch sends a packet-in message to the controller
- The switch buffers the packet indefinitely until a rule is installed

3. Which consensus algorithm do modern SDN controllers like ONOS and OpenDaylight use for distributed state synchronization?

- Paxos
- RAFT
- Two-Phase Commit
- Zab (ZooKeeper Atomic Broadcast)

4. In Cisco SD-Access, what protocol provides the overlay control plane for endpoint-to-location mapping?

- OSPF
- BGP EVPN
- LISP
- MPLS LDP

5. What is the single most important design question for a centralized control plane?

- How fast is the controller?
- How many flow rules can the controller install per second?
- What happens when the controller fails?
- Which vendor makes the best controller?

6. Which of the following correctly describes the four phases of link-state protocol convergence in order?

- SPF computation, failure detection, LSA flooding, FIB update
- Failure detection, LSA/LSP generation and flooding, SPF computation, RIB/FIB update
- LSA flooding, failure detection, RIB update, SPF computation
- Failure detection, SPF computation, LSA flooding, RIB/FIB update

7. Why is IS-IS preferred over OSPF as the underlay routing protocol in Cisco SD-Access?

- IS-IS supports more areas than OSPF
- IS-IS runs over Layer 2 and has simpler extensibility via TLV structures
- IS-IS has faster convergence than OSPF in all scenarios
- IS-IS requires fewer CPU resources than OSPF

8. What does the PDLC pattern stand for, and why is it significant?

- Protocol Driven Logical Configuration -- a method for automating router configs
- Physically Distributed, Logically Centralized -- controllers are dispersed but present a unified view
- Partially Decentralized Layered Control -- a hybrid routing hierarchy
- Path Distribution and Label Computation -- an MPLS optimization technique

9. In a hybrid control plane architecture, which function should be distributed rather than centralized?

- Policy definition and identity management
- Network assurance and telemetry aggregation
- Failure detection and fast reroute (BFD, LFA)
- Configuration provisioning and compliance

10. What is the key advantage of a stateful PCE over a stateless PCE?

- Stateful PCE uses less memory
- Stateful PCE maintains an LSP database enabling global optimization and re-optimization
- Stateful PCE does not require communication with routers
- Stateful PCE supports only Segment Routing, not MPLS

11. What happens to an SD-Access fabric when DNA Center (Catalyst Center) becomes temporarily unavailable?

- The entire fabric stops forwarding traffic immediately
- The fabric continues forwarding and enforcing existing policies; new changes are deferred
- The fabric reverts to a flat Layer 2 topology
- All VXLAN tunnels are torn down until recovery

12. Which EIGRP feature enables sub-second failover without full route recomputation?

- Route summarization at area boundaries
- Feasible successors (pre-computed backup paths via DUAL)
- Incremental SPF calculation
- BFD-assisted hello timers

13. For a leaderless SDN controller cluster (like OpenDaylight), what is the primary performance advantage over a leader-based cluster (like ONOS)?

- It supports more switches per controller node
- It eliminates the need for consensus algorithms entirely
- It achieves shorter topology discovery and flow installation times in small-to-medium environments
- It provides stronger consistency guarantees than leader-based clusters

14. Why does Segment Routing TI-LFA (Topology Independent LFA) represent an improvement over basic LFA and Remote LFA?

- TI-LFA uses less router memory than LFA
- TI-LFA guarantees 100% backup path coverage regardless of topology constraints
- TI-LFA eliminates the need for BFD
- TI-LFA works only with centralized controllers, avoiding distributed complexity

15. In SD-Access wireless, what architectural change eliminates the traditional data traffic "hairpin" through the WLC?

- Moving to autonomous AP mode with no controller
- Using VXLAN directly from fabric-enabled APs for distributed data plane forwarding
- Replacing CAPWAP with GRE tunnels
- Deploying local-mode APs with FlexConnect

1. Decentralized Control Plane Design

A decentralized control plane is the traditional networking model where every router and switch independently runs routing protocols, exchanges topology information with neighbors, and makes autonomous forwarding decisions. There is no single device or software platform that holds a master copy of the network state -- the "truth" emerges from the collective agreement of all participating devices.


1.1 Distributed Routing Protocol Comparison

The CCDE candidate must understand the design trade-offs among the major distributed routing protocols. Each brings distinct convergence characteristics, scalability limits, and operational models.

| Protocol | Type | Hierarchy Model | Convergence Speed | Scalability Approach | Best Fit |
|---|---|---|---|---|---|
| OSPF | Link-state | Areas (backbone + non-backbone) | Fast (SPF) | Area partitioning, stub areas | Enterprise campus, mid-size SP |
| IS-IS | Link-state | Levels (L1/L2) | Fast (SPF) | Level hierarchy, mesh groups | Large SP, DC underlay, SD-Access |
| BGP | Path-vector | AS hierarchy, confederations | Moderate (timer-dependent) | Incremental updates, route reflectors | Inter-domain, DC fabric, WAN |
| EIGRP | Advanced distance-vector | Summarization boundaries | Very fast (feasible successors) | Query scoping, stub routing | Enterprise branch, campus |

IS-IS runs directly over Layer 2 (not IP) and uses TLV structures for simpler extensibility -- this is why Cisco SD-Access selects it as the default underlay protocol. BGP's incremental update mechanism (advertising only changes) gives it inherent scalability for very large topologies. EIGRP maintains feasible successors for sub-second failover without full recomputation.
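DUAL's feasible-successor check fits in a few lines. The sketch below makes three simplifying assumptions. It uses plain integer metrics instead of EIGRP's composite bandwidth/delay metric. It takes the feasible distance (FD) to be the current successor's total distance, whereas real DUAL keeps the lowest distance seen since the route last went passive. The names `NeighborRoute` and `select_paths` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NeighborRoute:
    neighbor: str
    reported_distance: int  # the neighbor's own metric to the destination (RD)
    link_cost: int          # metric from this router to that neighbor

    @property
    def total_distance(self) -> int:
        return self.reported_distance + self.link_cost


def select_paths(routes):
    """Return (successor, feasible_successors) for a single destination."""
    successor = min(routes, key=lambda r: r.total_distance)
    feasible_distance = successor.total_distance  # simplified FD
    # Feasibility condition: an RD strictly below the FD proves the neighbor's
    # path cannot loop back through this router.
    feasible = [r for r in routes
                if r is not successor and r.reported_distance < feasible_distance]
    return successor, feasible


routes = [
    NeighborRoute("R2", reported_distance=20, link_cost=10),  # total 30
    NeighborRoute("R3", reported_distance=25, link_cost=10),  # total 35
    NeighborRoute("R4", reported_distance=40, link_cost=5),   # total 45
]
successor, feasible = select_paths(routes)
print(successor.neighbor, [r.neighbor for r in feasible])  # R2 ['R3']
```

If R2 fails, R3 can be installed immediately without a query. R4 is not a feasible successor because its reported distance (40) is not below the FD (30), so DUAL cannot prove that path loop-free.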

Animation: Distributed routing protocol convergence comparison -- showing how OSPF area flooding, IS-IS level hierarchy, BGP incremental updates, and EIGRP feasible successors each respond to the same link failure event.

1.2 Convergence Optimization

Convergence -- the time for all devices to agree on a consistent network view after a topology change -- is one of the most critical design considerations. The convergence timeline for link-state protocols involves four sequential phases:

```mermaid
graph LR
    A["1. Failure Detection<br/>Physical layer / BFD /<br/>Hello timer expiry"] --> B["2. LSA/LSP Generation<br/>& Flooding<br/>Throttle timers control<br/>propagation speed"]
    B --> C["3. SPF Computation<br/>Dijkstra / iSPF on<br/>each router independently"]
    C --> D["4. RIB/FIB Update<br/>Forwarding table<br/>reprogrammed (TCAM)"]
    style A fill:#4a90d9,stroke:#333,color:#fff
    style B fill:#f5a623,stroke:#333,color:#fff
    style C fill:#d0021b,stroke:#333,color:#fff
    style D fill:#7ed321,stroke:#333,color:#fff
```

Figure 6.1: Link-State Protocol Convergence Timeline
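In phase 3, every router performs the same computation: Dijkstra's algorithm over the link-state database, rooted at itself. The sketch below assumes a dictionary-based LSDB (`router -> {neighbor: cost}`). It omits details that real OSPF and IS-IS implementations add, such as the two-way connectivity check and equal-cost multipath. The function name `spf` is hypothetical.

```python
import heapq


def spf(lsdb, root):
    """Dijkstra SPF over a link-state database rooted at `root`.

    Returns (dist, first_hop): the cost to each reachable router and the
    next hop the root would install in its RIB/FIB for it.
    """
    dist = {root: 0}
    first_hop = {root: None}
    visited = set()
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry; a shorter path was already settled
        visited.add(node)
        for nbr, cost in lsdb.get(node, {}).items():
            nd = d + cost
            if nbr not in dist or nd < dist[nbr]:
                dist[nbr] = nd
                # Directly attached neighbors are their own next hop;
                # everything further away inherits the parent's next hop.
                first_hop[nbr] = nbr if node == root else first_hop[node]
                heapq.heappush(heap, (nd, nbr))
    return dist, first_hop


lsdb = {
    "A": {"B": 10, "C": 5},
    "B": {"A": 10, "C": 3, "D": 1},
    "C": {"A": 5, "B": 3, "D": 9},
    "D": {"B": 1, "C": 9},
}
dist, first_hop = spf(lsdb, "A")
print(dist["D"], first_hop["D"])  # 9 C
```

A reaches D via C and B (5 + 3 + 1 = 9), so C is the installed next hop. When phase 2 delivers an LSA that changes one of these costs, this tree is what gets recomputed, either in full or incrementally with iSPF.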

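Key convergence optimization strategies:

- Faster failure detection: BFD detects link or neighbor loss far sooner than waiting for routing-protocol hello timers to expire.
- Tuned throttle timers: LSA/LSP generation and SPF throttling react quickly to the first change and back off during sustained churn.
- Incremental SPF (iSPF): only the part of the shortest-path tree affected by the change is recomputed.
- Pre-computed backup paths: LFA, Remote LFA, Segment Routing TI-LFA, and EIGRP feasible successors let a router switch to a repair path locally, before network-wide convergence completes.
- Hierarchy and summarization: limiting how far a change is flooded limits how many routers must rerun SPF.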

Animation: Step-by-step convergence timeline showing how a single link failure propagates through detection, LSA flooding, SPF computation, and FIB update phases with timing annotations.
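Pre-computed repair paths let a router avoid waiting for flooding, SPF and FIB update before repairing traffic. The sketch below shows the basic loop-free alternate (LFA) condition from RFC 5286. Neighbor N of source S is loop-free for destination D when dist(N, D) < dist(N, S) + dist(S, D). The cost table is hypothetical and assumed to come from SPF runs rooted at each neighbor. `loop_free_alternates` is an illustrative name.

```python
def loop_free_alternates(spt_cost, source, primary_next_hop, neighbors, destination):
    """Return the neighbors of `source` that satisfy the basic LFA inequality.

    spt_cost[x][y] is the shortest-path cost from x to y.
    """
    lfas = []
    for n in neighbors:
        if n == primary_next_hop:
            continue  # the primary path is the one being protected
        # Loop-free if N's best path to D does not come back through S.
        if spt_cost[n][destination] < spt_cost[n][source] + spt_cost[source][destination]:
            lfas.append(n)
    return lfas


# Hypothetical costs: S reaches D at cost 20 via its primary neighbor N1.
spt_cost = {
    "S": {"D": 20},
    "N2": {"D": 15, "S": 10},  # 15 < 10 + 20: N2 forwards to D without using S
    "N3": {"D": 30, "S": 10},  # 30 is not < 30: N3's best path runs through S
}
print(loop_free_alternates(spt_cost, "S", "N1", ["N1", "N2", "N3"], "D"))  # ['N2']
```

When no neighbor passes this test, basic LFA has no repair path. This coverage gap is what Remote LFA and TI-LFA address.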

1.3 Scalability and Hierarchy Design

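Without hierarchy, every device must hold the full topology and process every topology change anywhere in the domain. Per-device load therefore grows with the size of the whole network rather than with the size of the local area. Hierarchy design patterns include:

- OSPF areas: a backbone area with non-backbone areas, including stub areas that block external routes.
- IS-IS levels: Level 1 routing within an area and Level 2 routing between areas.
- Route summarization at area, level, or EIGRP summarization boundaries. Summarization hides detail and also bounds the scope of EIGRP queries, alongside EIGRP stub routing.
- BGP route reflectors and confederations, which avoid a full iBGP mesh.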

The fundamental design question: How large can a single control plane domain be before it must be partitioned? The answer depends on prefix count, topology churn rate, weakest router capacity, and convergence time requirements.
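Summarization is the lever that keeps a domain's size from leaking past its boundary. The sketch below uses Python's standard `ipaddress` module to compute the aggregate an area or level border router could advertise in place of its component prefixes. The prefixes are hypothetical.

```python
import ipaddress


def summarize(prefixes):
    """Collapse contiguous prefixes into the fewest covering aggregates."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


area_prefixes = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]
print(summarize(area_prefixes))  # ['10.1.0.0/22']
```

A flap on any single /24 inside the area changes nothing outside it, provided the /22 is still advertised. Routers beyond the boundary therefore see no change to process. If the prefixes are not contiguous and aligned, no single aggregate exists, so the addressing plan is part of the hierarchy design.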

2. Centralized Control Plane Design

A centralized control plane consolidates network intelligence into a single controller or small cluster that maintains a global view and pushes forwarding decisions to data plane devices. This is the foundational concept behind Software-Defined Networking (SDN).


2.1 SDN Controller Architectures

OpenFlow: Reactive vs. Proactive Mode

```mermaid
flowchart TD
    subgraph Reactive["Reactive Mode"]
        R1["Packet arrives<br/>at switch"] --> R2["No matching<br/>flow rule"]
        R2 --> R3["Packet-in to<br/>controller"]
        R3 --> R4["Controller computes<br/>forwarding decision"]
        R4 --> R5["Flow rule installed<br/>on switch"]
    end
    subgraph Proactive["Proactive Mode"]
        P1["Controller pre-computes<br/>flow rules"] --> P2["Rules pre-installed<br/>on switches"]
        P2 --> P3["Packet arrives<br/>at switch"]
        P3 --> P4["Matches existing<br/>flow rule"]
        P4 --> P5["Forwarded<br/>immediately"]
    end
    style Reactive fill:#fff3e0,stroke:#e65100
    style Proactive fill:#e8f5e9,stroke:#2e7d32
```

Figure 6.2: OpenFlow Reactive vs. Proactive Mode
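The difference between the two modes comes down to when the flow table is populated. The toy model below is not an OpenFlow library. `Switch`, `Controller`, `handle_packet_in` and `preinstall` are hypothetical names. The flow table is assumed to be an exact match on destination address, and the packet-out that a real controller would send for the first packet is reduced to a return value.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Switch:
    """Toy OpenFlow-style switch with an exact-match table: dst -> output port."""
    name: str
    # Invoked on a table miss; stands in for the packet-in to the controller.
    packet_in: Optional[Callable[[Switch, str], Optional[int]]] = None
    flow_table: Dict[str, int] = field(default_factory=dict)
    packet_ins: int = 0

    def install_flow(self, dst: str, out_port: int) -> None:
        self.flow_table[dst] = out_port

    def receive(self, dst: str) -> Optional[int]:
        if dst in self.flow_table:      # match: forwarded without the controller
            return self.flow_table[dst]
        if self.packet_in is None:      # no controller hook: drop
            return None
        self.packet_ins += 1            # table miss: punt to the controller
        return self.packet_in(self, dst)


class Controller:
    def __init__(self, routes: Dict[str, int]):
        self.routes = routes  # global view, reduced to dst -> port for brevity

    def handle_packet_in(self, switch: Switch, dst: str) -> Optional[int]:
        port = self.routes.get(dst)
        if port is not None:
            switch.install_flow(dst, port)  # reactive: rule installed after first packet
        return port

    def preinstall(self, switch: Switch) -> None:
        for dst, port in self.routes.items():  # proactive: rules installed before traffic
            switch.install_flow(dst, port)


ctrl = Controller({"10.0.0.2": 2, "10.0.0.3": 3})
reactive = Switch("s1", packet_in=ctrl.handle_packet_in)
reactive.receive("10.0.0.2")  # miss -> packet-in -> flow installed
reactive.receive("10.0.0.2")  # matches the newly installed flow
proactive = Switch("s2", packet_in=ctrl.handle_packet_in)
ctrl.preinstall(proactive)
proactive.receive("10.0.0.2")
print(reactive.packet_ins, proactive.packet_ins)  # 1 0
```

Reactive mode pays a controller round trip on the first packet of every new flow and depends on the controller to admit new flows. Proactive mode moves that cost ahead of the traffic, at the price of installing rules that may never be used.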

| Architecture | Description | Scalability | Complexity |
|---|---|---|---|
| Single Controller | One controller manages all switches | Limited | Low |
| Distributed (Flat) | Peer controllers share load equally | High | Moderate |
| Hierarchical | Local controllers report to a global controller | Highest | High |
| Hybrid | Combination of patterns | Configurable | Variable |

PCEP: Centralized Path Computation

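PCEP (Path Computation Element Communication Protocol) focuses specifically on path computation for MPLS and Segment Routing TE tunnels. It does not replace the entire distributed control plane. It centralizes only the function that benefits most from a global view: the PCE computes paths, while the routers acting as path computation clients (PCCs) continue to run their distributed IGP.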

```mermaid
graph TD
    A["Stateless PCE<br/>On-demand path computation<br/>No tunnel state retained"] --> B["Stateful PCE<br/>Maintains LSP database<br/>Global optimization"]
    B --> C["PCECC<br/>Central LSP setup & initiation<br/>Downloads label entries"]
    C --> D["SR-PCEP Extensions<br/>Computes segment lists<br/>Programs SR-MPLS TE Policies"]
    A -.->|"Increasing centralization"| D
    style A fill:#e3f2fd,stroke:#1565c0
    style B fill:#bbdefb,stroke:#1565c0
    style C fill:#90caf9,stroke:#1565c0
    style D fill:#64b5f6,stroke:#1565c0,color:#fff
```

Figure 6.3: PCEP Capability Evolution -- from stateless to full SR policy programming

Animation: PCEP path computation flow -- showing a PCC requesting a constrained path, the PCE computing it using global topology, and the resulting label stack being installed on the headend router.
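A stateful PCE gains its advantage from remembering what it has already placed. The toy model below has three parts:

- Constrained shortest path first (CSPF): links without enough unreserved bandwidth are pruned before the shortest-path search.
- An LSP database: each placed LSP is recorded.
- Reservation: every placed LSP reduces the available bandwidth seen by later requests.

The topology, the bandwidth units, and the names `cspf` and `StatefulPCE` are hypothetical. Real PCEs also handle path diversity, priorities, and re-optimization.

```python
import heapq


def cspf(links, src, dst, demand):
    """Shortest path using only links with at least `demand` unreserved bandwidth.

    links: {(a, b): {"cost": int, "avail": float}} for directed links.
    Returns the node list, or None if no path satisfies the constraint.
    """
    adj = {}
    for (a, b), attrs in links.items():
        if attrs["avail"] >= demand:  # constraint: prune links that cannot fit the LSP
            adj.setdefault(a, []).append((b, attrs["cost"]))
    heap = [(0, src, [src])]
    seen = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nbr, c in adj.get(node, []):
            if nbr not in seen:
                heapq.heappush(heap, (cost + c, nbr, path + [nbr]))
    return None


class StatefulPCE:
    """Keeps an LSP database so later computations see earlier reservations."""

    def __init__(self, links):
        self.links = links
        self.lsp_db = {}

    def request(self, name, src, dst, demand):
        path = cspf(self.links, src, dst, demand)
        if path is not None:
            for a, b in zip(path, path[1:]):
                self.links[(a, b)]["avail"] -= demand  # reserve along the chosen path
            self.lsp_db[name] = (path, demand)
        return path


links = {
    ("PE1", "P1"): {"cost": 10, "avail": 100},
    ("P1", "PE2"): {"cost": 10, "avail": 100},
    ("PE1", "P2"): {"cost": 15, "avail": 100},
    ("P2", "PE2"): {"cost": 15, "avail": 100},
}
pce = StatefulPCE(links)
print(pce.request("lsp-a", "PE1", "PE2", 60))  # ['PE1', 'P1', 'PE2']
print(pce.request("lsp-b", "PE1", "PE2", 60))  # ['PE1', 'P2', 'PE2']
```

The second request is steered onto the longer path because the LSP database shows that the shorter path can no longer fit a second 60-unit LSP.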

2.2 Controller Redundancy and High Availability

A single centralized controller is a single point of failure. One controller cannot simultaneously deliver efficiency, scalability, and high availability, which is why production deployments use controller clusters.

```mermaid
flowchart TD
    subgraph AS["Active/Standby"]
        AS_A["Active Controller<br/>Handles all traffic"] --- AS_S["Standby Controller<br/>Idle replica"]
        AS_A --> SW1["Switches"]
    end
    subgraph LB["Leader-Based (ONOS)"]
        LB_L["Leader Node<br/>Coordinates cluster"] --- LB_F1["Follower 1"]
        LB_L --- LB_F2["Follower 2"]
        LB_L --> SW2["Switches"]
        LB_F1 --> SW2
        LB_F2 --> SW2
    end
    subgraph LL["Leaderless (ODL)"]
        LL_1["Node 1"] --- LL_2["Node 2"]
        LL_2 --- LL_3["Node 3"]
        LL_1 --- LL_3
        LL_1 --> SW3["Switches"]
        LL_2 --> SW3
        LL_3 --> SW3
    end
    style AS fill:#ffebee,stroke:#c62828
    style LB fill:#fff3e0,stroke:#e65100
    style LL fill:#e8f5e9,stroke:#2e7d32
```

Figure 6.4: SDN Controller HA Models

| Aspect | Active/Standby | Leader-Based (ONOS) | Leaderless (ODL) |
|---|---|---|---|
| Resource efficiency | Low (idle standbys) | High (all active) | High (all active) |
| Failover speed | Moderate | Fast (leader election) | Fast (no leader needed) |
| Consistency model | Strong (single writer) | Strong (RAFT) | Strong (RAFT) |
| Best fit | Small deployments | Large-scale SDN | Small-to-medium SDN |

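The PDLC (Physically Distributed, Logically Centralized) pattern is the dominant production architecture. Controllers are physically dispersed for performance and fault tolerance but present a unified logical view. This introduces CAP theorem trade-offs: a distributed system can guarantee at most two of Consistency, Availability, and Partition tolerance. Partitions between dispersed controllers cannot be ruled out. During a partition, the practical choice is therefore between keeping state consistent and keeping every controller available to accept changes.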
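RAFT-style majority quorums show concretely how a strongly consistent cluster resolves the CAP trade-off: during a partition, it gives up availability on the minority side. The arithmetic below is generic majority-quorum math, not a description of any specific controller's configuration. The function names are hypothetical.

```python
def quorum(cluster_size: int) -> int:
    """Smallest majority of a RAFT-style cluster."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Node failures the cluster survives while still reaching a majority."""
    return (cluster_size - 1) // 2


def can_commit(partition_size: int, cluster_size: int) -> bool:
    """A partition can elect a leader and commit writes only with a majority."""
    return partition_size >= quorum(cluster_size)


for n in (3, 4, 5):
    print(n, quorum(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2

# A 5-node cluster split 3/2: only the 3-node side keeps accepting changes.
print(can_commit(3, 5), can_commit(2, 5))  # True False
```

A 4-node cluster tolerates no more failures than a 3-node cluster, which is why clusters are normally sized with an odd number of nodes. The minority side refusing writes is the consistency-over-availability choice in practice.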

2.3 Centralized vs. Distributed Failure Domains

In a decentralized control plane, failure domains are naturally bounded: an OSPF area failure stays within that area. In a centralized control plane, the blast radius can be much larger -- a controller bug or misconfigured policy can affect every managed switch simultaneously.

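Strategies for minimizing centralized failure domains:

- Cluster the controller (the PDLC pattern) so that the loss of a single node does not remove the control plane.
- Partition the network into multiple controller domains, or use hierarchical controllers, so that a fault or a bad policy affects only one domain.
- Keep forwarding state on the devices so that the data plane continues to forward with existing state while the controller is unreachable, as an SD-Access fabric does.
- Leave failure detection and fast reroute (BFD, LFA) on the devices rather than routing them through the controller.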


3. Hybrid Control Plane Architectures

A hybrid control plane combines centralized and decentralized elements, centralizing what benefits from consistency (policy, analytics, path computation) while distributing what benefits from local autonomy (forwarding, failure detection, fast reroute). The hybrid model dominates modern enterprise design.


3.1 Centralized Policy with Distributed Forwarding

```mermaid
flowchart TD
    subgraph Centralized["Centralized Functions"]
        C1["Policy Definition<br/>& Identity Mgmt"]
        C2["Path Computation<br/>(PCEP / SR-TE)"]
        C3["Assurance &<br/>Analytics"]
        C4["Config Orchestration<br/>& Compliance"]
    end
    subgraph Distributed["Distributed Functions"]
        D1["Packet<br/>Forwarding"]
        D2["Failure Detection<br/>& Fast Reroute"]
        D3["Topology Discovery<br/>(Routing Protocols)"]
        D4["Real-Time Link/Node<br/>Adaptation"]
    end
    Centralized -->|"Policy push /<br/>intent translation"| Distributed
    Distributed -->|"Telemetry /<br/>state feedback"| Centralized
    style Centralized fill:#e8eaf6,stroke:#283593
    style Distributed fill:#fce4ec,stroke:#b71c1c
```

Figure 6.5: Hybrid Control Plane Function Separation
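The split between centralized policy and distributed forwarding determines what survives a controller outage. The toy model below has two sides:

- An edge device makes every forwarding decision from locally installed policy, with no controller round trip.
- The controller owns intent and queues changes while it cannot reach the devices.

`EdgeDevice`, `PolicyController`, the group names, the default-deny behavior and the `reachable` flag are all hypothetical simplifications, not any product's API.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeDevice:
    """Distributed side: forwards using only locally installed policy."""
    name: str
    policy: dict = field(default_factory=dict)  # (src_group, dst_group) -> "permit" | "deny"

    def forward(self, src_group: str, dst_group: str) -> bool:
        # Purely local decision; default-deny for unknown pairs is an assumption.
        return self.policy.get((src_group, dst_group), "deny") == "permit"


@dataclass
class PolicyController:
    """Centralized side: owns intent and pushes it to every device."""
    devices: list
    reachable: bool = True  # False models the controller being unable to reach devices
    pending: list = field(default_factory=list)

    def set_policy(self, rule, action):
        self.pending.append((rule, action))
        if self.reachable:
            self.sync()

    def sync(self):
        for rule, action in self.pending:
            for dev in self.devices:
                dev.policy[rule] = action
        self.pending.clear()


edge = EdgeDevice("edge-1")
ctrl = PolicyController([edge])
ctrl.set_policy(("employees", "servers"), "permit")
ctrl.reachable = False                            # controller outage
ctrl.set_policy(("guests", "servers"), "permit")  # deferred, not pushed
print(edge.forward("employees", "servers"), edge.forward("guests", "servers"))  # True False
ctrl.reachable = True
ctrl.sync()                                       # deferred change applied on recovery
print(edge.forward("guests", "servers"))          # True
```

Existing policy keeps being enforced during the outage. Only new intent waits, which is the behavior the hybrid model aims for.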

| Function | Centralized or Distributed | Rationale |
|---|---|---|
| Identity and access policy | Centralized | Consistent enforcement across all access points |
| Underlay routing | Distributed | Resilience to controller failures, fast convergence |
| Overlay control (endpoint mapping) | Centralized/Hybrid | Global view enables optimal forwarding, mobility |
| Traffic engineering | Centralized (PCEP/SR) | Global optimization requires global topology view |
| Failure detection and fast reroute | Distributed (BFD, LFA) | Sub-second response requires local autonomy |
| Configuration and provisioning | Centralized (automation) | Consistency, compliance, speed of deployment |
| Network assurance and telemetry | Centralized (analytics) | Correlation across devices requires aggregation |

Animation: SD-Access packet walk -- showing a frame entering a fabric edge node, LISP map-request/reply to the control plane node, VXLAN encapsulation over IS-IS underlay, and decapsulation at the destination edge node.

3.2 Cisco SD-Access: The Canonical Hybrid Model

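SD-Access cleanly separates overlay and underlay control planes:

- Underlay: IS-IS provides IP reachability between the fabric nodes' underlay addresses (RLOCs).
- Overlay control plane: LISP. Fabric edge nodes register their endpoints with the control plane node, which answers map-requests with each endpoint's current location (its EID-to-RLOC mapping).
- Overlay data plane: VXLAN encapsulation between fabric nodes across the underlay.
- Management and policy: DNA Center (Catalyst Center) automates provisioning and defines policy. If it becomes temporarily unavailable, the fabric continues forwarding with existing policy and new changes are deferred.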

SD-Access Wireless demonstrates the hybrid model well: the control plane is centralized (CAPWAP to WLC), but the data plane is distributed (VXLAN from fabric-enabled APs), eliminating the traditional WLC data hairpin.
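The overlay lookup in an SD-Access fabric follows the LISP pattern: endpoints are registered with the control plane node, looked up on demand, cached at the edge, and the traffic is then encapsulated toward the destination edge. The toy model below captures that pattern. `ControlPlaneNode`, `FabricEdge`, the method names and all addresses are hypothetical. Real LISP message formats, map-cache aging, mobility handling and VXLAN header fields are not modeled.

```python
class ControlPlaneNode:
    """Stand-in for the LISP map server/resolver: EID -> RLOC registrations."""

    def __init__(self):
        self.registrations = {}

    def map_register(self, eid, rloc):
        self.registrations[eid] = rloc

    def map_request(self, eid):
        return self.registrations.get(eid)  # None stands in for a negative map-reply


class FabricEdge:
    def __init__(self, rloc, cp_node):
        self.rloc = rloc        # this edge's underlay address, reachable via IS-IS
        self.cp = cp_node
        self.map_cache = {}
        self.map_requests = 0

    def attach_endpoint(self, eid):
        # The edge registers locally attached endpoints with the control plane node.
        self.cp.map_register(eid, self.rloc)

    def send(self, dst_eid, payload):
        rloc = self.map_cache.get(dst_eid)
        if rloc is None:
            self.map_requests += 1
            rloc = self.cp.map_request(dst_eid)
            if rloc is None:
                return None  # miss handling is outside this sketch
            self.map_cache[dst_eid] = rloc
        # VXLAN-style encapsulation: outer header between RLOCs, inner frame untouched.
        return {"outer_src": self.rloc, "outer_dst": rloc, "inner": payload}


cp = ControlPlaneNode()
edge1 = FabricEdge("192.168.255.1", cp)
edge2 = FabricEdge("192.168.255.2", cp)
edge2.attach_endpoint("10.10.10.20")
first = edge1.send("10.10.10.20", "hello")   # map-request, then cached
second = edge1.send("10.10.10.20", "again")  # served from the map-cache
print(first["outer_dst"], edge1.map_requests)  # 192.168.255.2 1
```

Only the first packet to a destination involves the control plane node. After that, the edge forwards from its map-cache, and the underlay needs routes only to RLOCs, never to endpoints.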

3.3 Trade-offs Between Control Plane Models

| Dimension | Decentralized | Centralized | Hybrid |
|---|---|---|---|
| Resilience | High -- independent devices | Low-to-Moderate -- controller dependency | High -- distributed forwarding survives outage |
| Scalability | Moderate -- protocol limits | Limited by controller capacity | High -- planes scale independently |
| Convergence | Protocol-dependent (ms to s) | Potentially faster but bottleneck risk | Distributed fast reroute + centralized re-optimization |
| Policy consistency | Difficult -- per-device drift | Excellent -- single source of truth | Excellent -- centralized policy, distributed enforcement |
| Failure domain | Small (area/domain) | Large (entire controller domain) | Mixed (depends on which plane fails) |
| Vendor dependency | Low (open protocols) | High (controller lock-in) | Moderate (framework-specific) |

When to choose each model:

- Decentralized, for maximum resilience (for example, an SP backbone).
- Centralized, for uniform policy where controller dependency is tolerable (for example, a small DC).
- Hybrid, for enterprise campus and multi-site networks where both policy consistency and resilience are critical. This is the dominant modern choice.

Animation: Side-by-side failure scenario comparison -- showing how a link failure propagates differently in decentralized (contained to area), centralized (controller-dependent recovery), and hybrid (fast local reroute + deferred re-optimization) architectures.

Post-Study Assessment

Now that you have studied the material, answer the same questions again to measure your learning.

