Chapter 6: Centralized, Decentralized, and Hybrid Control Planes
Learning Objectives
Compare centralized, decentralized, and hybrid control plane architectures with trade-off analysis across scalability, resiliency, convergence, and operational complexity dimensions.
Design control plane architectures appropriate for different network scales and requirements, from small campus deployments to multi-site enterprise fabrics.
Evaluate controller-based solutions and their impact on network resiliency and scalability, including SDN controllers, PCEP, and Cisco SD-Access.
Pre-Study Assessment
Answer these questions before studying the material to establish your baseline understanding.
Pre-Quiz
1. What is the primary mechanism by which decentralized control planes achieve scalability?
A. Deploying more powerful routers with faster CPUs
B. Hierarchy, summarization, and domain partitioning
C. Using a single global routing table shared by all devices
D. Centralizing all route computation on a dedicated server
2. In OpenFlow, what happens in reactive mode when a switch receives a packet with no matching flow rule?
A. The switch drops the packet silently
B. The switch forwards it out all ports (flooding)
C. The switch sends a packet-in message to the controller
D. The switch buffers the packet indefinitely until a rule is installed
3. Which consensus algorithm do modern SDN controllers like ONOS and OpenDaylight use for distributed state synchronization?
4. In Cisco SD-Access, what protocol provides the overlay control plane for endpoint-to-location mapping?
A. OSPF
B. BGP EVPN
C. LISP
D. MPLS LDP
5. What is the single most important design question for a centralized control plane?
A. How fast is the controller?
B. How many flow rules can the controller install per second?
C. What happens when the controller fails?
D. Which vendor makes the best controller?
6. Which of the following correctly describes the four phases of link-state protocol convergence in order?
7. Why is IS-IS preferred over OSPF as the underlay routing protocol in Cisco SD-Access?
A. IS-IS supports more areas than OSPF
B. IS-IS runs over Layer 2 and has simpler extensibility via TLV structures
C. IS-IS has faster convergence than OSPF in all scenarios
D. IS-IS requires fewer CPU resources than OSPF
8. What does the PDLC pattern stand for, and why is it significant?
A. Protocol Driven Logical Configuration -- a method for automating router configs
B. Physically Distributed, Logically Centralized -- controllers are dispersed but present a unified view
C. Partially Decentralized Layered Control -- a hybrid routing hierarchy
D. Path Distribution and Label Computation -- an MPLS optimization technique
9. In a hybrid control plane architecture, which function should be distributed rather than centralized?
A. Policy definition and identity management
B. Network assurance and telemetry aggregation
C. Failure detection and fast reroute (BFD, LFA)
D. Configuration provisioning and compliance
10. What is the key advantage of a stateful PCE over a stateless PCE?
A. Stateful PCE uses less memory
B. Stateful PCE maintains an LSP database enabling global optimization and re-optimization
C. Stateful PCE does not require communication with routers
D. Stateful PCE supports only Segment Routing, not MPLS
11. What happens to an SD-Access fabric when DNA Center (Catalyst Center) becomes temporarily unavailable?
A. The entire fabric stops forwarding traffic immediately
B. The fabric continues forwarding and enforcing existing policies; new changes are deferred
C. The fabric reverts to a flat Layer 2 topology
D. All VXLAN tunnels are torn down until recovery
12. Which EIGRP feature enables sub-second failover without full route recomputation?
A. Route summarization at area boundaries
B. Feasible successors (pre-computed backup paths via DUAL)
C. Incremental SPF calculation
D. BFD-assisted hello timers
13. For a leaderless SDN controller cluster (like OpenDaylight), what is the primary performance advantage over a leader-based cluster (like ONOS)?
A. It supports more switches per controller node
B. It eliminates the need for consensus algorithms entirely
C. It achieves shorter topology discovery and flow installation times in small-to-medium environments
D. It provides stronger consistency guarantees than leader-based clusters
14. Why does Segment Routing TI-LFA (Topology Independent LFA) represent an improvement over basic LFA and Remote LFA?
A. TI-LFA uses less router memory than LFA
B. TI-LFA guarantees 100% backup path coverage regardless of topology constraints
C. TI-LFA eliminates the need for BFD
D. TI-LFA works only with centralized controllers, avoiding distributed complexity
15. In SD-Access wireless, what architectural change eliminates the traditional data traffic "hairpin" through the WLC?
A. Moving to autonomous AP mode with no controller
B. Using VXLAN directly from fabric-enabled APs for distributed data plane forwarding
C. Replacing CAPWAP with GRE tunnels
D. Deploying local-mode APs with FlexConnect
1. Decentralized Control Plane Design
A decentralized control plane is the traditional networking model where every router and switch independently runs routing protocols, exchanges topology information with neighbors, and makes autonomous forwarding decisions. There is no single device or software platform that holds a master copy of the network state -- the "truth" emerges from the collective agreement of all participating devices.
Key Points: Decentralized Control Plane Fundamentals
Each device independently computes forwarding decisions using distributed algorithms -- no master controller exists.
Scalability is achieved through hierarchy, summarization, and domain partitioning, not more powerful hardware.
Convergence optimization spans four sequential phases: failure detection, LSA/LSP flooding, SPF computation, and RIB/FIB update.
Different routing protocols (OSPF, IS-IS, BGP, EIGRP) offer distinct trade-offs in convergence speed, scalability approach, and operational complexity.
Failure domains are naturally bounded by protocol areas and domains, which keeps the blast radius small.
1.1 Distributed Routing Protocol Comparison
The CCDE candidate must understand the design trade-offs among the major distributed routing protocols. Each brings distinct convergence characteristics, scalability limits, and operational models.
| Protocol | Type | Hierarchy Model | Convergence Speed | Scalability Approach | Best Fit |
| --- | --- | --- | --- | --- | --- |
| OSPF | Link-state | Areas (backbone + non-backbone) | Fast (SPF) | Area partitioning, stub areas | Enterprise campus, mid-size SP |
| IS-IS | Link-state | Levels (L1/L2) | Fast (SPF) | Level hierarchy, mesh groups | Large SP, DC underlay, SD-Access |
| BGP | Path-vector | AS hierarchy, confederations | Moderate (timer-dependent) | Incremental updates, route reflectors | Inter-domain, DC fabric, WAN |
| EIGRP | Advanced distance-vector | Summarization boundaries | Very fast (feasible successors) | Query scoping, stub routing | Enterprise branch, campus |
IS-IS runs directly over Layer 2 (not IP) and uses TLV structures for simpler extensibility -- this is why Cisco SD-Access selects it as the default underlay protocol. BGP's incremental update mechanism (advertising only changes) gives it inherent scalability for very large topologies. EIGRP maintains feasible successors for sub-second failover without full recomputation.
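The EIGRP feasible-successor mechanism comes down to one comparison. The sketch below is a minimal illustration and not Cisco's implementation. It assumes plain integer metrics instead of EIGRP's composite metric, and it assumes a stable (passive) route, so the feasible distance (FD) equals the current best metric. The class `NeighborRoute`, the function `select_paths`, and the router names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NeighborRoute:
    """One neighbor's advertisement for a single destination (simplified integer metric)."""
    neighbor: str
    reported_distance: int  # RD: the neighbor's own metric to the destination
    link_cost: int          # metric of the link from this router to that neighbor

    @property
    def computed_distance(self) -> int:
        # Total metric through this neighbor: local link cost plus the neighbor's RD
        return self.link_cost + self.reported_distance


def select_paths(routes: list) -> tuple:
    """Return (successor, feasible_successors) for one destination.

    Assumes a passive route, so FD equals the current best computed distance.
    """
    successor = min(routes, key=lambda r: r.computed_distance)
    feasible_distance = successor.computed_distance
    # Feasibility condition: an RD strictly below FD guarantees that the
    # neighbor's path does not loop back through this router.
    feasible = [r for r in routes
                if r is not successor and r.reported_distance < feasible_distance]
    return successor, feasible


routes = [
    NeighborRoute("R2", reported_distance=20, link_cost=10),  # CD 30 -> successor
    NeighborRoute("R3", reported_distance=25, link_cost=10),  # CD 35, RD 25 < FD 30
    NeighborRoute("R4", reported_distance=40, link_cost=5),   # CD 45, RD 40 >= FD 30
]
successor, backups = select_paths(routes)
print(successor.neighbor, [r.neighbor for r in backups])  # R2 ['R3']
```

If R2 fails, R3 is already known to be loop-free and can be installed locally without a DUAL query. R4 has the higher total metric and also fails the feasibility test, so it cannot be used until the route goes active.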
Animation: Distributed routing protocol convergence comparison -- showing how OSPF area flooding, IS-IS level hierarchy, BGP incremental updates, and EIGRP feasible successors each respond to the same link failure event.
1.2 Convergence Optimization
Convergence -- the time for all devices to agree on a consistent network view after a topology change -- is one of the most critical design considerations. The convergence timeline for link-state protocols involves four sequential phases:
graph LR
A["1. Failure Detection\nPhysical layer / BFD /\nHello timer expiry"] --> B["2. LSA/LSP Generation\n& Flooding\nThrottle timers control\npropagation speed"]
B --> C["3. SPF Computation\nDijkstra / iSPF on\neach router independently"]
C --> D["4. RIB/FIB Update\nForwarding table\nreprogrammed (TCAM)"]
style A fill:#4a90d9,stroke:#333,color:#fff
style B fill:#f5a623,stroke:#333,color:#fff
style C fill:#d0021b,stroke:#333,color:#fff
style D fill:#7ed321,stroke:#333,color:#fff
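The four phases run one after another, so a rough convergence budget is the sum of the per-phase times. The sketch below treats it that way. The per-phase numbers are hypothetical values chosen for illustration, not measurements. The 40-second figure is the common OSPF default dead interval on broadcast and point-to-point links. The helper names are also hypothetical.

```python
def bfd_detection_ms(tx_interval_ms: int, multiplier: int) -> int:
    """BFD declares the session down after `multiplier` consecutive missed control packets."""
    return tx_interval_ms * multiplier


def convergence_budget_ms(detection: int, flooding: int, spf: int, fib: int) -> int:
    """The four phases are sequential, so end-to-end time is roughly their sum."""
    return detection + flooding + spf + fib


# Hypothetical per-phase values for illustration only (not measurements).
with_bfd = convergence_budget_ms(bfd_detection_ms(50, 3), flooding=20, spf=50, fib=100)
with_hello = convergence_budget_ms(40_000, flooding=20, spf=50, fib=100)  # OSPF default dead interval (40 s)
print(with_bfd, with_hello)  # 320 40170
```

With hello-based detection, detection time dominates the budget. This is why fast failure detection is usually the first optimization applied, before tuning flooding, SPF, or FIB programming.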
Optimization techniques applied across these phases include:
Prefix prioritization: Ensures critical prefixes converge before less important ones.
Hello/dead timer tuning: Shortens detection time, at the cost of a higher risk of false-positive adjacency drops.
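Prefix prioritization changes the order in which prefixes are programmed in the last phase, not which prefixes are programmed. The sketch below uses a hypothetical three-class policy: host routes such as loopbacks first, then operator-tagged prefixes, then everything else. Real platforms let the operator define the classes. The function names and the sample prefixes are assumptions.

```python
import ipaddress


def priority(prefix: str, tagged_high: frozenset) -> int:
    """Lower value = programmed into the FIB earlier (hypothetical three-class policy)."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen == net.max_prefixlen:
        return 0  # host routes, e.g. loopbacks used as BGP/LDP next hops
    if prefix in tagged_high:
        return 1  # operator-tagged prefixes, e.g. voice subnets
    return 2      # everything else


def fib_install_order(prefixes: list, tagged_high: frozenset = frozenset()) -> list:
    # sorted() is stable, so prefixes in the same class keep their SPF output order.
    return sorted(prefixes, key=lambda p: priority(p, tagged_high))


spf_output = ["10.1.0.0/16", "10.9.9.0/24", "192.0.2.1/32", "10.2.0.0/16"]
order = fib_install_order(spf_output, frozenset({"10.9.9.0/24"}))
print(order)  # ['192.0.2.1/32', '10.9.9.0/24', '10.1.0.0/16', '10.2.0.0/16']
```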
Animation: Step-by-step convergence timeline showing how a single link failure propagates through detection, LSA flooding, SPF computation, and FIB update phases with timing annotations.
1.3 Scalability and Hierarchy Design
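Without hierarchy, every device must process every topology change. As a result, per-device load (LSDB size, flooding, and SPF runs) grows with the size of the entire domain rather than with the size of the local area. Hierarchy design patterns include: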
OSPF area design: Stub and totally stubby areas reduce LSDB size; NSSA areas allow external route injection at non-backbone locations.
IS-IS level design: L1 routers maintain only intra-area topology; L1/L2 routers summarize between levels.
BGP route reflectors: Eliminate full-mesh iBGP; a cluster can serve hundreds of clients.
EIGRP stub routing: Stub routers do not participate in DUAL diffusing computation, reducing query scope.
Summarization: Aggregating prefixes at hierarchy boundaries reduces routing table size and limits change propagation.
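The summarization point can be checked with Python's standard `ipaddress.collapse_addresses`, which merges contiguous networks into the fewest covering prefixes. The sixteen /24 subnets are a hypothetical addressing plan for one OSPF area or IS-IS L1 domain.

```python
import ipaddress

# Hypothetical addressing plan: sixteen contiguous /24s inside one area.
area_subnets = [ipaddress.ip_network(f"10.20.{i}.0/24") for i in range(16)]

# collapse_addresses merges contiguous networks into the fewest covering prefixes.
summary = list(ipaddress.collapse_addresses(area_subnets))
print(summary)  # [IPv4Network('10.20.0.0/20')]

# Addressing that is not aligned to a power-of-two block summarizes less cleanly.
partial = list(ipaddress.collapse_addresses(area_subnets[:15]))
print(len(partial))  # 4
```

Outside the area, one /20 stands in for sixteen /24s. A flap of an individual /24 inside the area then no longer changes what the rest of the domain sees. The second case shows why hierarchical designs allocate addresses in aligned blocks.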
The fundamental design question: How large can a single control plane domain be before it must be partitioned? The answer depends on prefix count, topology churn rate, weakest router capacity, and convergence time requirements.
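Session count is one concrete input to this sizing question, using the route-reflector bullet above as the example. The arithmetic below compares a full iBGP mesh with a single route-reflector cluster. It assumes two reflectors that peer with each other and clients that peer with both reflectors for redundancy. The function names are illustrative.

```python
def full_mesh_sessions(n: int) -> int:
    """Every iBGP speaker peers with every other: n(n-1)/2 sessions."""
    return n * (n - 1) // 2


def route_reflector_sessions(n: int, reflectors: int) -> int:
    """Reflectors are fully meshed among themselves; each client peers with every reflector."""
    clients = n - reflectors
    return full_mesh_sessions(reflectors) + clients * reflectors


print(full_mesh_sessions(100), route_reflector_sessions(100, 2))  # 4950 197
```

Full-mesh growth is quadratic. The reflector design grows linearly with the number of clients, which is why reflectors remain the default iBGP scaling tool.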
2. Centralized Control Plane Design
A centralized control plane consolidates network intelligence into a single controller or small cluster that maintains a global view and pushes forwarding decisions to data plane devices. This is the foundational concept behind Software-Defined Networking (SDN).
Key Points: Centralized Control Plane
SDN controllers maintain a global network view and push flow rules to switches via protocols like OpenFlow.
OpenFlow operates in reactive mode (per-flow setup via controller) or proactive mode (pre-installed rules) -- each with distinct latency and scalability trade-offs.
PCEP centralizes only path computation (not all forwarding), representing a pragmatic partial centralization approach.
Controller HA typically relies on distributed clustering with Raft consensus; the PDLC pattern (Physically Distributed, Logically Centralized) is the production standard.
The critical design question is not controller speed but what happens when the controller fails -- graceful degradation is essential.
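The reactive versus proactive distinction can be modeled with a toy flow table. The sketch below is not an OpenFlow library; `Controller`, `Switch`, and the `(src, dst)` exact match are hypothetical simplifications of the real multi-field match. In reactive mode, a table miss triggers a packet-in, and the controller installs a rule (a stand-in for a flow-mod). In proactive mode, the rule exists before traffic arrives.

```python
from typing import Dict, Optional, Tuple

FlowKey = Tuple[str, str]  # toy match on (source host, destination host)


class Controller:
    """Hypothetical controller holding a global view: destination host -> egress port."""

    def __init__(self, host_ports: Dict[str, int]):
        self.host_ports = host_ports
        self.packet_in_count = 0

    def packet_in(self, flow_table: Dict[FlowKey, int], key: FlowKey) -> int:
        self.packet_in_count += 1
        port = self.host_ports[key[1]]
        flow_table[key] = port  # stands in for a flow-mod that installs the rule
        return port


class Switch:
    def __init__(self, controller: Controller, proactive_rules: Optional[Dict[FlowKey, int]] = None):
        # Proactive mode: rules are pre-installed before any traffic arrives.
        self.flow_table: Dict[FlowKey, int] = dict(proactive_rules or {})
        self.controller = controller

    def forward(self, src: str, dst: str) -> int:
        key = (src, dst)
        port = self.flow_table.get(key)
        if port is None:
            # Table miss in reactive mode: punt the packet to the controller (packet-in).
            port = self.controller.packet_in(self.flow_table, key)
        return port


ctrl = Controller({"h2": 2, "h3": 3})
reactive = Switch(ctrl)
reactive.forward("h1", "h2")  # first packet: miss -> packet-in -> rule installed
reactive.forward("h1", "h2")  # later packets hit the installed rule
proactive = Switch(ctrl, {("h1", "h3"): 3})
proactive.forward("h1", "h3")  # hit on a pre-installed rule, controller not involved
print(ctrl.packet_in_count)  # 1
```

In reactive mode, the first packet of every new flow waits for a controller round trip, so controller load tracks the new-flow arrival rate. Proactive mode avoids that cost but spends flow-table space on rules that may never be used.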
PCEP (Path Computation Element Communication Protocol) focuses specifically on path computation for MPLS and Segment Routing TE tunnels. Rather than replacing the entire distributed control plane, it centralizes computation only where a global view provides the most benefit.
graph TD
A["Stateless PCE\nOn-demand path computation\nNo tunnel state retained"] --> B["Stateful PCE\nMaintains LSP database\nGlobal optimization"]
B --> C["PCECC\nCentral LSP setup & initiation\nDownloads label entries"]
C --> D["SR-PCEP Extensions\nComputes segment lists\nPrograms SR-MPLS TE Policies"]
A -.->|"Increasing centralization"| D
style A fill:#e3f2fd,stroke:#1565c0
style B fill:#bbdefb,stroke:#1565c0
style C fill:#90caf9,stroke:#1565c0
style D fill:#64b5f6,stroke:#1565c0,color:#fff
Figure 6.3: PCEP Capability Evolution -- from stateless to full SR policy programming
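The advantage of a stateful PCE can be shown with a toy constrained SPF (CSPF). The sketch below is a simplified model, not a PCEP implementation. It prunes links that lack the requested bandwidth, runs Dijkstra on the IGP metric, and records each placed LSP by reserving its bandwidth. The class name, the four-router topology, and the use of one shared bandwidth pool per bidirectional link are all hypothetical simplifications.

```python
import heapq
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


class StatefulPCE:
    """Toy stateful PCE: a traffic-engineering view plus an LSP database.

    links[(a, b)] = (igp_metric, available_bandwidth). Links are treated as
    bidirectional with one shared bandwidth pool, a simplification.
    """

    def __init__(self, links: Dict[Tuple[str, str], Tuple[int, int]]):
        self.metric: Dict[FrozenSet[str], int] = {}
        self.avail: Dict[FrozenSet[str], int] = {}
        self.adj: Dict[str, Set[str]] = {}
        for (a, b), (metric, bw) in links.items():
            key = frozenset((a, b))
            self.metric[key], self.avail[key] = metric, bw
            self.adj.setdefault(a, set()).add(b)
            self.adj.setdefault(b, set()).add(a)
        self.lsp_db: Dict[str, Tuple[List[str], int]] = {}  # LSP name -> (path, bandwidth)

    def cspf(self, src: str, dst: str, bw: int) -> Optional[List[str]]:
        """Constrained SPF: prune links without `bw` available, then Dijkstra on IGP metric."""
        dist, prev = {src: 0}, {}
        heap = [(0, src)]
        while heap:
            d, node = heapq.heappop(heap)
            if node == dst:
                break
            if d > dist[node]:
                continue  # stale heap entry
            for nbr in self.adj.get(node, set()):
                key = frozenset((node, nbr))
                if self.avail[key] < bw:
                    continue  # constraint not met: link is pruned
                nd = d + self.metric[key]
                if nd < dist.get(nbr, float("inf")):
                    dist[nbr], prev[nbr] = nd, node
                    heapq.heappush(heap, (nd, nbr))
        if dst not in dist:
            return None
        path = [dst]
        while path[-1] != src:
            path.append(prev[path[-1]])
        return path[::-1]

    def create_lsp(self, name: str, src: str, dst: str, bw: int) -> Optional[List[str]]:
        path = self.cspf(src, dst, bw)
        if path is None:
            return None
        for a, b in zip(path, path[1:]):
            self.avail[frozenset((a, b))] -= bw  # the reservation is the retained state
        self.lsp_db[name] = (path, bw)
        return path


pce = StatefulPCE({("A", "B"): (10, 100), ("B", "D"): (10, 100),
                   ("A", "C"): (15, 100), ("C", "D"): (15, 100)})
print(pce.create_lsp("lsp1", "A", "D", 80))  # ['A', 'B', 'D']
print(pce.create_lsp("lsp2", "A", "D", 50))  # ['A', 'C', 'D']
print(pce.create_lsp("lsp3", "A", "D", 60))  # None
```

A stateless PCE answering the second request from topology alone would return A-B-D again and oversubscribe A-B. The LSP database is what lets the stateful PCE route around existing reservations and later re-optimize them.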
Animation: PCEP path computation flow -- showing a PCC requesting a constrained path, the PCE computing it using global topology, and the resulting label stack being installed on the headend router.
2.2 Controller Redundancy and High Availability
A single centralized controller is a single point of failure. One controller instance also cannot meet three requirements at once: efficiency, scalability, and high availability.
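The PDLC (Physically Distributed, Logically Centralized) pattern is the dominant production architecture. Controllers are physically dispersed for performance and fault tolerance but present a unified logical view. This introduces CAP theorem trade-offs: a distributed system can guarantee at most two of Consistency, Availability, and Partition tolerance. Partitions between controller sites cannot be ruled out, so the practical choice during a partition is between keeping state consistent and keeping every controller instance available.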
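Raft-based clusters resolve the CAP trade-off in favor of consistency through majority quorums. The arithmetic below is a minimal sketch of that rule, with illustrative function names. It shows why clusters are usually sized at 3 or 5 members, and why a minority partition stops accepting writes.

```python
def quorum(cluster_size: int) -> int:
    """Votes/acknowledgements needed: a strict majority of the configured members."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Members that can fail while a majority still remains."""
    return (cluster_size - 1) // 2


def side_can_commit(side_size: int, cluster_size: int) -> bool:
    """In a partition, only the side holding a majority can elect a leader and commit writes."""
    return side_size >= quorum(cluster_size)


for n in (1, 3, 4, 5):
    print(n, quorum(n), tolerated_failures(n))
# 1 1 0
# 3 2 1
# 4 3 1
# 5 3 2

print(side_can_commit(3, 5), side_can_commit(2, 5))  # True False
print(side_can_commit(2, 4))  # False: a 2/2 split leaves neither side writable
```

A fourth member raises the quorum without tolerating any additional failure. A 3/2 split keeps the majority side consistent and writable, while the minority side gives up availability for writes.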
2.3 Centralized vs. Distributed Failure Domains
In a decentralized control plane, failure domains are naturally bounded: an OSPF area failure stays within that area. In a centralized control plane, the blast radius can be much larger -- a controller bug or misconfigured policy can affect every managed switch simultaneously.
Strategies for minimizing centralized failure domains:
Controller placement: Position controllers close to the managed switches, with redundant paths to them.
Domain partitioning: Different controllers/clusters for different network segments.
Graceful degradation: Switches continue forwarding on last-known flow tables when controller connectivity is lost.
Physical infrastructure independence: Independent power, cabling, and rack placement.
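The OpenFlow switch specification describes two behaviors for loss of controller connectivity: fail secure and fail standalone. The toy model below sketches how graceful degradation looks from the switch's side. It is illustrative only. It ignores flow timeouts, although in fail secure mode existing entries still expire according to their timeouts. It also reduces standalone mode to flooding unknown destinations. The class and method names are hypothetical.

```python
from enum import Enum
from typing import Dict, Optional


class FailMode(Enum):
    SECURE = "secure"          # keep existing entries; traffic that would go to the controller is dropped
    STANDALONE = "standalone"  # fall back to conventional (non-OpenFlow) switching


class DataPlaneSwitch:
    def __init__(self, flows: Dict[str, int], fail_mode: FailMode):
        self.flows = dict(flows)  # last-known flow table: destination -> egress port
        self.fail_mode = fail_mode
        self.controller_up = True

    def handle(self, dst: str) -> Optional[str]:
        if dst in self.flows:
            # Existing entries keep forwarding whether or not the controller is reachable.
            return f"port {self.flows[dst]}"
        if self.controller_up:
            return "packet-in"  # normal table-miss path
        if self.fail_mode is FailMode.STANDALONE:
            return "flood"  # simplification of legacy learning-switch behavior for an unknown destination
        return None  # fail secure: new flows cannot be set up, so the packet is dropped


sw = DataPlaneSwitch({"h2": 2}, FailMode.SECURE)
sw.controller_up = False  # controller connectivity lost
print(sw.handle("h2"), sw.handle("h9"))  # port 2 None
```

Established flows survive the outage, but nothing new can be programmed. This is the same "existing state keeps working, changes are deferred" property that the hybrid designs in the next section rely on.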
Key Points: Failure Domain Comparison
Decentralized failure domains are small and bounded by protocol areas: a failure inside an OSPF area stays in that area.
Centralized failure domains can span the entire controller domain -- a single bug or misconfiguration propagates everywhere.
Graceful degradation (switches forwarding on last-known state) is the essential safety net for centralized designs.
3. Hybrid Control Plane Architectures
A hybrid control plane combines centralized and decentralized elements, centralizing what benefits from consistency (policy, analytics, path computation) while distributing what benefits from local autonomy (forwarding, failure detection, fast reroute). The hybrid model dominates modern enterprise design.
Cisco SD-Access is the canonical example: IS-IS (distributed underlay), LISP (centralized overlay mapping), VXLAN (distributed data plane), and DNA Center, now Catalyst Center (centralized intent).
If DNA Center goes down, the fabric continues forwarding and enforcing existing policies -- only new changes are deferred.
The CCDE exam tests your ability to justify which functions you centralize vs. distribute, and what happens when each component fails.
3.1 Centralized Policy with Distributed Forwarding
Figure 6.5: Hybrid Control Plane Function Separation
| Function | Centralized or Distributed | Rationale |
| --- | --- | --- |
| Identity and access policy | Centralized | Consistent enforcement across all access points |
| Underlay routing | Distributed | Resilience to controller failures, fast convergence |
| Overlay control (endpoint mapping) | Centralized/Hybrid | Global view enables optimal forwarding, mobility |
| Traffic engineering | Centralized (PCEP/SR) | Global optimization requires global topology view |
| Failure detection and fast reroute | Distributed (BFD, LFA) | Sub-second response requires local autonomy |
| Configuration and provisioning | Centralized (automation) | Consistency, compliance, speed of deployment |
| Network assurance and telemetry | Centralized (analytics) | Correlation across devices requires aggregation |
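The fast-reroute row depends on backup next hops that each router computes locally, before any failure occurs. Basic LFA (RFC 5286) accepts a neighbor N of source S as a loop-free alternate toward destination D only if dist(N, D) < dist(N, S) + dist(S, D). The sketch below checks that inequality with plain Dijkstra. It uses a hypothetical four-router square: S reaches D through primary next hop E, and N is the candidate backup. It does not implement Remote LFA or TI-LFA, which extend coverage to topologies where this test fails.

```python
import heapq
from typing import Dict

Graph = Dict[str, Dict[str, int]]


def shortest_distances(graph: Graph, source: str) -> Dict[str, int]:
    """Plain Dijkstra over an undirected weighted graph (adjacency dict)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def is_loop_free_alternate(graph: Graph, s: str, n: str, d: str) -> bool:
    """RFC 5286 loop-free condition: dist(N, D) < dist(N, S) + dist(S, D)."""
    from_n = shortest_distances(graph, n)
    from_s = shortest_distances(graph, s)
    return from_n[d] < from_n[s] + from_s[d]


def square(n_to_d_cost: int) -> Graph:
    """Hypothetical topology S-E-D (primary) and S-N-D (candidate backup)."""
    edges = [("S", "E", 1), ("E", "D", 1), ("S", "N", 1), ("N", "D", n_to_d_cost)]
    graph: Graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


print(is_loop_free_alternate(square(2), "S", "N", "D"))  # True
print(is_loop_free_alternate(square(3), "S", "N", "D"))  # False
```

With an N-D cost of 3, N has an equal-cost path back through S. Traffic that S shifts to N could therefore loop back to S, and basic LFA has no protection for that case. Coverage gaps like this are what Remote LFA and TI-LFA address.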
Animation: SD-Access packet walk -- showing a frame entering a fabric edge node, LISP map-request/reply to the control plane node, VXLAN encapsulation over IS-IS underlay, and decapsulation at the destination edge node.
3.2 Cisco SD-Access: The Canonical Hybrid Model
SD-Access cleanly separates overlay and underlay control planes:
Underlay (IS-IS): Distributed routing providing resilient IP reachability between all fabric nodes.
Overlay (LISP): Map Server + Map Resolver maintain the Host Tracking Database (HTDB) of EID-to-RLOC bindings. Edge nodes register endpoints and query mappings.
Data Plane (VXLAN): Distributed packet encapsulation/decapsulation transporting Layer 2 frames over the Layer 3 underlay.
Management (DNA Center): Centralized intent-based platform for policy, assurance, provisioning, and lifecycle management.
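The LISP overlay's division of labor can be sketched as a registry plus a per-edge cache. The model below is a simplified illustration of the Map-Register and Map-Request roles. It is not the LISP message format or Cisco's implementation. `ControlPlaneNode`, `FabricEdge`, and all addresses are hypothetical. EIDs are endpoint addresses, and RLOCs are edge-node underlay addresses reachable through IS-IS.

```python
from typing import Dict, Optional


class ControlPlaneNode:
    """Toy LISP Map Server/Map Resolver holding a host tracking database (EID -> RLOC)."""

    def __init__(self):
        self.htdb: Dict[str, str] = {}

    def map_register(self, eid: str, rloc: str) -> None:
        # A newer registration from another edge (endpoint roamed) overwrites the old RLOC.
        self.htdb[eid] = rloc

    def map_request(self, eid: str) -> Optional[str]:
        return self.htdb.get(eid)  # None models "no mapping known"


class FabricEdge:
    def __init__(self, rloc: str, cp: ControlPlaneNode):
        self.rloc = rloc  # this edge's underlay address
        self.cp = cp
        self.map_cache: Dict[str, str] = {}

    def attach(self, eid: str) -> None:
        """An endpoint was detected locally: register it with the control plane node."""
        self.cp.map_register(eid, self.rloc)

    def outer_destination(self, dst_eid: str) -> Optional[str]:
        """RLOC used as the outer VXLAN destination toward dst_eid (cache, else Map-Request)."""
        if dst_eid not in self.map_cache:
            rloc = self.cp.map_request(dst_eid)
            if rloc is None:
                return None  # unknown-destination handling is out of scope for this sketch
            self.map_cache[dst_eid] = rloc
        return self.map_cache[dst_eid]


cp = ControlPlaneNode()
edge1 = FabricEdge("192.168.255.1", cp)
edge2 = FabricEdge("192.168.255.2", cp)
edge3 = FabricEdge("192.168.255.3", cp)
edge2.attach("10.10.10.20")
print(edge1.outer_destination("10.10.10.20"))  # 192.168.255.2
edge3.attach("10.10.10.20")  # endpoint roams to edge3
print(cp.map_request("10.10.10.20"))  # 192.168.255.3
print(edge1.outer_destination("10.10.10.20"))  # 192.168.255.2 (stale cache entry)
```

The last line shows the mobility problem a real deployment must solve. The database is updated, but edge1's cached mapping is stale. Actual LISP deployments include mechanisms for refreshing stale map-caches, and this sketch omits them.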
SD-Access Wireless demonstrates the hybrid model well: the control plane is centralized (CAPWAP to WLC), but the data plane is distributed (VXLAN from fabric-enabled APs), eliminating the traditional WLC data hairpin.
3.3 Trade-offs Between Control Plane Models
| Dimension | Decentralized | Centralized | Hybrid |
| --- | --- | --- | --- |
| Resilience | High -- independent devices | Low-to-Moderate -- controller dependency | High -- distributed forwarding survives outage |
| Scalability | Moderate -- protocol limits | Limited by controller capacity | High -- planes scale independently |
| Convergence | Protocol-dependent (ms to s) | Potentially faster but bottleneck risk | Distributed fast reroute + centralized re-optimization |
When to choose each model: Decentralized for maximum resilience (SP backbone). Centralized for uniform policy with tolerance for controller dependency (small DC). Hybrid for enterprise campus/multi-site where policy consistency and resilience are both critical -- the dominant modern choice.
Animation: Side-by-side failure scenario comparison -- showing how a link failure propagates differently in decentralized (contained to area), centralized (controller-dependent recovery), and hybrid (fast local reroute + deferred re-optimization) architectures.
Post-Study Assessment
Now that you have studied the material, answer the same questions again to measure your learning.
Post-Quiz
1. What is the primary mechanism by which decentralized control planes achieve scalability?
A. Deploying more powerful routers with faster CPUs
B. Hierarchy, summarization, and domain partitioning
C. Using a single global routing table shared by all devices
D. Centralizing all route computation on a dedicated server
2. In OpenFlow, what happens in reactive mode when a switch receives a packet with no matching flow rule?
A. The switch drops the packet silently
B. The switch forwards it out all ports (flooding)
C. The switch sends a packet-in message to the controller
D. The switch buffers the packet indefinitely until a rule is installed
3. Which consensus algorithm do modern SDN controllers like ONOS and OpenDaylight use for distributed state synchronization?
4. In Cisco SD-Access, what protocol provides the overlay control plane for endpoint-to-location mapping?
A. OSPF
B. BGP EVPN
C. LISP
D. MPLS LDP
5. What is the single most important design question for a centralized control plane?
A. How fast is the controller?
B. How many flow rules can the controller install per second?
C. What happens when the controller fails?
D. Which vendor makes the best controller?
6. Which of the following correctly describes the four phases of link-state protocol convergence in order?
7. Why is IS-IS preferred over OSPF as the underlay routing protocol in Cisco SD-Access?
A. IS-IS supports more areas than OSPF
B. IS-IS runs over Layer 2 and has simpler extensibility via TLV structures
C. IS-IS has faster convergence than OSPF in all scenarios
D. IS-IS requires fewer CPU resources than OSPF
8. What does the PDLC pattern stand for, and why is it significant?
A. Protocol Driven Logical Configuration -- a method for automating router configs
B. Physically Distributed, Logically Centralized -- controllers are dispersed but present a unified view
C. Partially Decentralized Layered Control -- a hybrid routing hierarchy
D. Path Distribution and Label Computation -- an MPLS optimization technique
9. In a hybrid control plane architecture, which function should be distributed rather than centralized?
A. Policy definition and identity management
B. Network assurance and telemetry aggregation
C. Failure detection and fast reroute (BFD, LFA)
D. Configuration provisioning and compliance
10. What is the key advantage of a stateful PCE over a stateless PCE?
A. Stateful PCE uses less memory
B. Stateful PCE maintains an LSP database enabling global optimization and re-optimization
C. Stateful PCE does not require communication with routers
D. Stateful PCE supports only Segment Routing, not MPLS
11. What happens to an SD-Access fabric when DNA Center (Catalyst Center) becomes temporarily unavailable?
A. The entire fabric stops forwarding traffic immediately
B. The fabric continues forwarding and enforcing existing policies; new changes are deferred
C. The fabric reverts to a flat Layer 2 topology
D. All VXLAN tunnels are torn down until recovery
12. Which EIGRP feature enables sub-second failover without full route recomputation?
A. Route summarization at area boundaries
B. Feasible successors (pre-computed backup paths via DUAL)
C. Incremental SPF calculation
D. BFD-assisted hello timers
13. For a leaderless SDN controller cluster (like OpenDaylight), what is the primary performance advantage over a leader-based cluster (like ONOS)?
A. It supports more switches per controller node
B. It eliminates the need for consensus algorithms entirely
C. It achieves shorter topology discovery and flow installation times in small-to-medium environments
D. It provides stronger consistency guarantees than leader-based clusters
14. Why does Segment Routing TI-LFA (Topology Independent LFA) represent an improvement over basic LFA and Remote LFA?
A. TI-LFA uses less router memory than LFA
B. TI-LFA guarantees 100% backup path coverage regardless of topology constraints
C. TI-LFA eliminates the need for BFD
D. TI-LFA works only with centralized controllers, avoiding distributed complexity
15. In SD-Access wireless, what architectural change eliminates the traditional data traffic "hairpin" through the WLC?
A. Moving to autonomous AP mode with no controller
B. Using VXLAN directly from fabric-enabled APs for distributed data plane forwarding
C. Replacing CAPWAP with GRE tunnels
D. Deploying local-mode APs with FlexConnect