Chapter 11: Cisco UCS Configuration for AI Workloads
Learning Objectives
Configure domain profiles and service profiles on Cisco UCS for AI workloads
Implement power, storage, and NTP policies for AI compute nodes
Configure LAN connectivity, vNIC, and QoS on UCS
Design system classes and QoS policies for AI traffic on UCS
Section 1: UCS Domain Profiles and Service Profiles
Pre-Quiz: Domain Profiles and Service Profiles
1. What is a UCS Domain Profile in Intersight?
A policy that configures a single server's BIOS settings
A top-level construct that configures a Fabric Interconnect pair
A template for creating VLANs across multiple switches
A storage configuration for boot-from-SAN
2. What does "stateless computing" mean in UCS?
Servers do not retain any data after power-off
Server identity is abstracted from physical hardware and can migrate between servers
The server runs without an operating system
Fabric Interconnects operate without configuration
3. Which four policy categories are used in a Server Profile in Intersight Managed Mode?
Compute, Network, Storage, Management
BIOS, Boot, Power, Thermal
LAN, SAN, VLAN, VSAN
Domain, Server, Adapter, QoS
4. Why is template-based provisioning essential for AI clusters?
It reduces the number of VLANs needed
It ensures consistent configuration across all GPU nodes and enables rapid replacement
It eliminates the need for power policies
It automatically enables RoCE on all vNICs
5. When a VLAN policy referenced by multiple domain profiles is updated, what happens?
Only the first domain profile receives the update
All domain profiles referencing it must be manually redeployed
Every domain profile referencing that policy inherits the change automatically
The update is queued until the next maintenance window
Key Points
A Domain Profile configures a Fabric Interconnect pair (ports, VLANs, VSANs, NTP, QoS system classes). Domain profile templates enable reuse across multiple AI clusters.
A Service Profile (UCSM) / Server Profile (IMM) abstracts server identity (UUID, MAC, WWNN, WWPN, boot policy) from hardware -- enabling stateless computing and workload mobility.
Server Profiles in IMM organize policies into four categories: Compute, Network, Storage, Management.
Template-based provisioning lets you define a golden configuration once and derive hundreds of identical server profiles. Updates to the template sync automatically to all derived profiles.
Identity pools (MAC, WWPN, UUID) ensure each derived profile gets unique identifiers while sharing identical policy configuration.
Domain Profile Architecture
A UCS Domain Profile is the top-level configuration construct in Cisco Intersight that represents and configures a pair of Fabric Interconnects (FIs). It encapsulates all the policies that define FI behavior: port configurations, port channels, VLANs, VSANs, and network control settings. A single domain policy (such as a VLAN policy) can be assigned to any number of domain profiles -- updating the policy once propagates changes to all referencing profiles automatically.
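The shared-policy model described above can be sketched in a few lines of Python. This is a hypothetical illustration (class and attribute names are invented, not Intersight API objects): each domain profile holds a reference to a single policy object, so one update is visible to every referencing profile.

```python
# Minimal sketch (hypothetical names, not Intersight API objects): one shared
# policy instance referenced by many domain profiles, so a single update
# propagates to every profile automatically.

class VlanPolicy:
    def __init__(self, name, vlans):
        self.name = name
        self.vlans = set(vlans)

class DomainProfile:
    def __init__(self, name, vlan_policy):
        self.name = name
        self.vlan_policy = vlan_policy  # a reference, not a copy

    def effective_vlans(self):
        return sorted(self.vlan_policy.vlans)

ai_vlans = VlanPolicy("AI-VLANs", [100, 200])
profiles = [DomainProfile(f"ai-cluster-{i}", ai_vlans) for i in range(3)]

ai_vlans.vlans.add(300)  # update the policy once...
# ...and all referencing profiles see the change, no redeployment of copies
assert all(p.effective_vlans() == [100, 200, 300] for p in profiles)
```

The key design choice mirrored here is reference semantics: if each profile stored its own copy of the VLAN list, every update would require touching every profile.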
| Policy | Purpose | AI Relevance |
| --- | --- | --- |
| Port Policy | Configures FI port roles and port channels | Ensures sufficient 100G/200G uplinks for GPU traffic |
| VLAN Policy | Configures L2 broadcast domains | Segregates AI training, storage, and management traffic |
| VSAN Policy | Configures Fibre Channel domains | Enables boot-from-SAN for stateless AI nodes |
| Network Control Policy | CDP, LLDP, MAC settings | Required for proper DCBX negotiation with upstream switches |
| NTP Policy | Time synchronization | Critical for distributed training coordination |
| QoS System Class | Traffic prioritization | Enables no-drop classes for RoCE/RDMA |
Service Profile and Server Profile Design
Cisco UCS implements stateless computing through service profiles (UCS Manager) and server profiles (Intersight Managed Mode). A service profile abstracts the complete server identity -- UUID, MAC addresses, WWNN, WWPN, boot policy, firmware level, and BIOS settings -- from the physical hardware. When migrated to another server, the entire identity moves with it.
| Category | Policies | AI Focus |
| --- | --- | --- |
| Compute | BIOS, Boot Order, Power | GPU-optimized BIOS settings, UEFI boot, power no-cap |
| Network | LAN Connectivity, SAN Connectivity, Adapter Policies | vNIC configuration, RoCE enablement, jumbo MTU |
| Storage | Local disk, SAN storage, Boot-from-SAN | M.2 RAID1 boot, NVMe data drives, SAN boot targets |
| Management | IPMI, Serial over LAN, SNMP, Syslog | Monitoring, out-of-band access, log collection |
Template-Based Provisioning for AI Clusters
For AI clusters where tens or hundreds of identically configured GPU nodes are required, template-based provisioning is essential. Server Profile Templates in Intersight (or Service Profile Templates in UCS Manager) let you define a golden configuration once and derive individual profiles from it. Any modification to the template automatically syncs to all derived profiles.
Worked Example: Creating a GPU Node Server Profile Template in Intersight -- (1) Create template with name AI-GPU-Node-Template, (2) attach Compute policies (GPU-optimized BIOS, UEFI boot, no-cap power), (3) attach Network policies (two RoCE-enabled vNICs, MTU 9000, no-drop QoS), (4) attach Storage (M.2 RAID1 boot), (5) attach Management (SNMP, Syslog), (6) derive individual server profiles and associate each to a physical server.
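The derive-from-template step can be sketched as follows. This is an illustrative model, not Intersight code: the template dict, pool prefix, and profile names are invented, but the pattern — shared policy configuration plus per-profile unique identity drawn from a pool — is the one described above.

```python
# Hypothetical sketch of template-based derivation with an identity pool:
# every derived profile shares the template's policies but draws a unique MAC.
import itertools

def mac_pool(prefix="00:25:B5:00:00"):
    # Generator yielding sequential MACs under an illustrative pool prefix.
    for i in itertools.count():
        yield f"{prefix}:{i:02X}"

TEMPLATE = {
    "bios": "GPU-Optimized",
    "boot": "UEFI-M2-RAID1",
    "power": "no-cap",
    "qos": "Platinum-NoDrop",
}

def derive_profiles(template, count, pool):
    # Each derived profile = template policies + a unique name and identity.
    return [dict(template, name=f"ai-gpu-node-{n:03d}", mac=next(pool))
            for n in range(1, count + 1)]

profiles = derive_profiles(TEMPLATE, 100, mac_pool())
assert len({p["mac"] for p in profiles}) == 100              # unique identities
assert all(p["qos"] == "Platinum-NoDrop" for p in profiles)  # shared policy
```

This is why identity pools matter: without them, deriving a hundred profiles from one golden configuration would produce a hundred identical (and therefore conflicting) MAC addresses.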
Animation: Drag-and-drop domain profile assembly -- attach policies (Port, VLAN, VSAN, NTP, QoS) to a domain profile template, then derive multiple domain profiles for AI clusters.
Post-Quiz: Domain Profiles and Service Profiles
1. A domain profile template is updated to add a new VLAN. What happens to the three domain profiles derived from it?
Nothing -- derived profiles are snapshots at creation time
All three automatically inherit the new VLAN
Only the most recently deployed profile inherits it
The update is rejected because derived profiles are locked
2. Which construct provides stateless computing in UCS Manager?
Domain profile
Service profile
VLAN policy
Power policy
3. In Intersight Managed Mode, which policy category includes BIOS and boot order?
Network
Storage
Compute
Management
4. Where do derived server profiles get unique MAC addresses and WWPNs?
They are manually assigned by the administrator
From identity pools referenced by the template
From the physical server's hardware ROM
From the Fabric Interconnect's MAC table
5. A domain profile configures which level of UCS infrastructure?
Individual server blades
The Fabric Interconnect pair
GPU adapter cards
Storage arrays
Section 2: Power and NTP Policies
Pre-Quiz: Power and NTP Policies
1. How much power can a single NVIDIA H100 GPU draw?
150W
350W
700W
1200W
2. Which power redundancy mode is recommended for AI deployments?
Non-Redundant
N+1 Redundancy
Grid Redundancy
Active-Standby
3. What does "no-cap" power priority mean in UCS?
The server has unlimited power from the grid
The blade is prioritized over others during dynamic power rebalancing
Power capping is disabled for the entire chassis
The PSUs run at maximum output at all times
4. Why is NTP critical for AI training clusters?
It controls GPU clock speeds
Distributed training frameworks rely on synchronized timing for barrier operations
It determines the training batch size
It is required to boot the operating system
5. What does Extended Power Capacity provide on UCS X-Series?
Doubles the number of available PSU slots
Increases total power allocation by 15%
Enables hot-swap of GPU modules
Adds battery backup for uninterruptible operation
Key Points
Grid Redundancy is the default and recommended power mode for AI -- it protects against full power circuit loss (e.g., PDU failure).
No-cap power priority ensures GPU blades are prioritized during power rebalancing; prevents GPU throttling under contention.
Extended Power Capacity on X-Series increases the chassis power budget by 15% -- critical for 8-GPU servers exceeding 6,000W.
UCS dynamically rebalances power: active blades can borrow from idle blades; priority (no-cap > high > medium > low) determines allocation under contention.
NTP is configured at the FI level. Best practice: at least 2 redundant NTP servers (stratum-1 or stratum-2). Needed for distributed training coordination, log correlation, security protocols, and performance benchmarking.
Power Policy for GPU Systems
GPU-accelerated AI servers are among the most power-hungry systems in a data center. A single NVIDIA H100 GPU draws up to 700W, and a server with eight GPUs can easily exceed 6,000W total system power. Cisco UCS power policies must ensure GPU nodes receive adequate power under all conditions.
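The arithmetic behind the ">6,000W" claim is worth making explicit. The GPU figure comes from the text; the non-GPU overhead and chassis budget below are assumed round numbers for illustration only.

```python
# Back-of-envelope power budget for an 8-GPU node. 700 W per H100 is from the
# text; OVERHEAD_W and the chassis budget are assumed illustrative values.

GPU_W, GPUS = 700, 8
OVERHEAD_W = 1200                 # assumed: CPUs, DIMMs, NICs, fans, drives
total = GPU_W * GPUS + OVERHEAD_W
assert total > 6000               # matches the ">6,000 W" claim in the text

# Extended Power Capacity raises the chassis power budget by 15%:
budget = 9000                     # hypothetical chassis budget in watts
extended = budget * 115 // 100    # integer math: +15%
print(total, extended)            # 6800 10350
```

Even with generous headroom, a single 8-GPU blade consumes most of a typical chassis budget, which is why the priority and extended-capacity settings below matter.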
| Redundancy Mode | Description | PSU Behavior on Failure | AI Recommendation |
| --- | --- | --- | --- |
| Grid Redundancy | Two independent power sources | Surviving PSUs on alternate circuit continue | Recommended for all AI deployments |
| N+1 Redundancy | One extra PSU beyond minimum | Remaining PSUs share load | Acceptable for non-critical AI dev |
| Non-Redundant | All PSUs active, no redundancy | Single PSU failure may cause outage | Never use for AI workloads |
Power Capping and Dynamic Rebalancing
UCS uses power control policies to manage how power is allocated and borrowed among blades within a chassis. During normal operation, active blades can borrow power from idle blades. When all blades are active and at their power cap, the priority determines which blades get preference. For AI workloads, use no-cap or high priority.
```mermaid
stateDiagram-v2
    [*] --> InitialAllocation: Server powers on
    InitialAllocation: Initial Power Allocation
    InitialAllocation --> NormalOperation: Power budget assigned
    NormalOperation: Normal Operation
    NormalOperation --> BorrowingPower: Blade needs more power
    BorrowingPower: Borrowing from Idle Blades
    BorrowingPower --> NormalOperation: Load decreases
    NormalOperation --> Contention: All blades active at cap
    Contention: Power Contention
    Contention --> Throttled: Low-priority blade
    Contention --> FullPower: No-cap / High-priority blade
    Throttled: GPU Throttled
    FullPower: Full Power Maintained
    FullPower --> NormalOperation: Contention resolves
    Throttled --> NormalOperation: Contention resolves
```
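The contention behavior in the state diagram can be sketched as a simple priority allocator. This is a conceptual model, not the actual UCS firmware algorithm: blade names, wattages, and the chassis budget are made-up values, but the ordering rule (no-cap satisfied first, lower priorities absorb the shortfall) follows the text.

```python
# Sketch of priority-based power allocation under contention: no-cap blades
# are satisfied first; lower-priority blades absorb the shortfall (throttle).
PRIORITY = {"no-cap": 0, "high": 1, "medium": 2, "low": 3}

def allocate(chassis_budget, blades):
    """blades: list of (name, priority, demand_watts) -> {name: granted_watts}."""
    granted, remaining = {}, chassis_budget
    for name, prio, demand in sorted(blades, key=lambda b: PRIORITY[b[1]]):
        granted[name] = min(demand, remaining)
        remaining -= granted[name]
    return granted

blades = [("gpu-1", "no-cap", 6800), ("gpu-2", "high", 6800),
          ("dev-1", "low", 2000)]
alloc = allocate(15000, blades)
assert alloc["gpu-1"] == 6800        # no-cap blade keeps full power
assert alloc["dev-1"] < 2000         # low-priority blade is throttled
```

Setting GPU blades to no-cap simply moves them to the front of this ordering, which is exactly what prevents GPU throttling when the chassis budget is exhausted.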
| Power Feature | Default Setting | AI-Optimized Setting | Impact |
| --- | --- | --- | --- |
| Redundancy Mode | Grid | Grid | Protects against full circuit loss |
| Power Control Priority | Medium | No-Cap | Prevents GPU throttling under load |
| Extended Power Capacity | Disabled | Enabled | +15% power budget for GPU headroom |
| Power Save Mode | Enabled | Evaluate per deployment | May turn off unused PSUs to save energy |
NTP for AI Clusters
NTP is applied at the Fabric Interconnect level and is common to the FI pair. The NTP policy accepts one to four NTP server addresses. Accurate time synchronization is critical for:
Distributed training coordination -- frameworks like Horovod and NCCL rely on synchronized timing for barrier operations and gradient aggregation
Log correlation -- troubleshooting stalled training jobs across 64+ GPU nodes requires aligned timestamps
Security protocols -- TLS, SSH, and Kerberos depend on acceptable clock skew
Performance benchmarking -- NVIDIA Nsight and DCGM need consistent time references for cross-node comparisons
Best practice: configure at least two NTP servers, preferring internal stratum-1 or stratum-2 sources.
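A quick way to reason about synchronization quality is worst-case pairwise skew across the cluster. The sketch below is illustrative: in practice the per-node offsets come from chrony or ntpq on each host, and the node names and offset values here are made up.

```python
# Illustrative clock-skew check across training nodes. Real deployments pull
# per-node offsets from chrony/ntpq; the sample values below are made up.

def max_skew_us(offsets_us):
    """Worst-case pairwise skew (microseconds) given per-node NTP offsets."""
    return max(offsets_us.values()) - min(offsets_us.values())

offsets = {"gpu-node-01": 400, "gpu-node-02": -900, "gpu-node-03": 1100}
assert max_skew_us(offsets) == 2000   # 1100 - (-900) microseconds
```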
Animation: Power contention simulation -- show 4 blades in a chassis competing for power, with priority-based allocation and GPU throttling visualization when low-priority blades lose budget.
Post-Quiz: Power and NTP Policies
1. An AI chassis has all blades active at maximum load. Which blade gets full power first?
The blade with the most GPUs
The blade with no-cap power priority
The blade that powered on first
All blades share equally regardless of priority
2. Extended Power Capacity on UCS X-Series increases the power budget by what percentage?
5%
10%
15%
25%
3. At which UCS level is NTP configured?
Individual blade BIOS
Fabric Interconnect level
Per-vNIC adapter policy
Storage controller
4. How many NTP servers should be configured as a best practice?
Exactly one for consistency
At least two for redundancy
At least five for accuracy
NTP is not needed if all servers are in the same rack
5. Which power redundancy mode should NEVER be used for AI workloads?
Grid Redundancy
N+1 Redundancy
Non-Redundant
Active-Standby
Section 3: Storage Policies on UCS
Pre-Quiz: Storage Policies
1. What is the recommended boot drive configuration for AI compute nodes on UCS?
Single NVMe drive in RAID0
Two M.2 drives in RAID1
Four SAS drives in RAID5
USB flash drive
2. Why are M.2 boot drives preferred for AI servers?
They are the cheapest storage option
They free up PCIe slots and drive bays for GPUs and NVMe data storage
They provide the highest IOPS for training data
They support RAID5 for better redundancy
3. What is boot-from-SAN?
Booting from a local SAN-attached NVMe drive
Booting an OS from external SAN-based storage rather than a local disk
Using SAN storage as swap space during training
A method to install the OS over the network via PXE
4. What is the default RAID mode of the UCS-M2-HWRAID controller?
RAID1
RAID0
JBOD
RAID5
5. Which FC zoning model is the default recommendation for most deployments?
Single initiator, multiple targets
Multiple initiators, single target
Single initiator, single target
Fabric-wide zoning
Key Points
Use M.2 RAID1 for OS boot on AI servers -- frees all PCIe slots and front-panel bays for GPUs and NVMe data drives.
Two M.2 RAID controllers available: UCS-M2-HWRAID (SATA, RAID1 only) and UCS-M2-NVRAID (NVMe, RAID0/1, recommended for new builds).
The default mode for UCS-M2-HWRAID is JBOD -- it must be explicitly reconfigured to RAID1 for production.
Boot-from-SAN enables true stateless computing: when a service profile migrates, the new server boots from the same SAN OS image. Requires vHBAs, WWNN/WWPN, and proper FC zoning.
NVMe local storage should be reserved for dataset staging, model checkpoints, and scratch space -- not OS boot.
FC zoning: Single initiator, single target (one zone per vHBA-storage port pair) is the default and clearest for troubleshooting.
Local Disk Policies
The best practice for AI compute nodes is two disks in RAID1 as a boot drive, keeping the OS separate from data storage. M.2 boot drives have become the preferred approach because they free up all PCIe slots and front-panel drive bays for GPU cards and NVMe data storage.
| Controller | Model | Supported RAID | Boot Mode | Notes |
| --- | --- | --- | --- | --- |
| UCS-M2-HWRAID | SATA M.2 RAID | RAID1 only | UEFI only | Legacy option, widely deployed |
| UCS-M2-NVRAID | NVMe M.2 RAID | RAID0, RAID1 | UEFI only | Higher performance, recommended for new builds |
With the OS on M.2 drives, the remaining NVMe slots can be dedicated to high-speed dataset staging, model checkpoint storage, and scratch space. NVMe local storage provides the lowest latency for these operations, critical when training jobs need to load datasets of hundreds of GB to multiple TB quickly.
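The key gotchas from this section (the JBOD default on UCS-M2-HWRAID, RAID1 for boot, UEFI-only controllers) can be captured in a small validator. This is an illustrative sketch with invented field names, not a real policy schema.

```python
# Sketch validating a local-disk policy for an AI boot volume. Field names are
# hypothetical; the rules encode the gotchas from the text (JBOD default on
# UCS-M2-HWRAID, RAID1 for boot, UEFI-only M.2 controllers).

def check_boot_policy(policy):
    issues = []
    if policy.get("raid") == "JBOD":
        issues.append("controller left at JBOD default: no OS mirroring")
    elif policy.get("raid") != "RAID1":
        issues.append("boot volume should be two M.2 drives in RAID1")
    if policy.get("boot_mode") != "UEFI":
        issues.append("M.2 controllers boot in UEFI mode only")
    return issues

assert check_boot_policy({"raid": "JBOD", "boot_mode": "UEFI"})       # flags JBOD
assert not check_boot_policy({"raid": "RAID1", "boot_mode": "UEFI"})  # clean
```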
SAN Connectivity and Boot-from-SAN
Boot from SAN allows servers to boot an OS from external SAN-based storage rather than a local disk. This is central to UCS's stateless computing model. When a service profile migrates, the new server boots from the exact same OS image on the SAN.
Configuring Boot-from-SAN
1. Open the Service Profile / Server Profile storage settings and navigate to vHBAs
2. Assign a WWNN (static or from a pool)
3. Click Add SAN Boot, specify the vHBA name and primary/secondary path
4. Enter the WWPN of the storage target and the appropriate LUN ID
5. Configure FC zoning: initiator (vHBA) to target (storage array)
| Zoning Model | Description | When to Use |
| --- | --- | --- |
| Single Initiator, Single Target | One zone per vHBA-storage port pair; two members per zone | Default for most deployments; clearest troubleshooting |
| Single Initiator, Multiple Targets | One zone per vHBA containing all its target ports | When zone count may reach or exceed platform limits |
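The difference between the two zoning models is easiest to see as zone generation. The sketch below is illustrative (vHBA and storage-port names are placeholders, not real WWPNs): single-initiator/single-target zone counts grow multiplicatively, which is what eventually pushes large fabrics toward the multi-target model.

```python
# Sketch of the two FC zoning models: single-initiator/single-target yields one
# two-member zone per vHBA-target pair; single-initiator/multi-target collapses
# each vHBA's targets into one zone. Names are placeholders, not real WWPNs.

def sis_zones(initiators, targets):
    # One zone per (initiator, target) pair -- two members each.
    return [{"name": f"z_{i}_{t}", "members": [i, t]}
            for i in initiators for t in targets]

def simt_zones(initiators, targets):
    # One zone per initiator containing all of its targets.
    return [{"name": f"z_{i}", "members": [i, *targets]} for i in initiators]

vhbas = ["vhba0", "vhba1"]
ports = ["sp_a1", "sp_a2"]
assert len(sis_zones(vhbas, ports)) == 4    # grows multiplicatively
assert len(simt_zones(vhbas, ports)) == 2   # fewer zones near platform limits
```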
```mermaid
graph TD
    subgraph Server["AI GPU Server"]
        subgraph Boot["Boot Storage"]
            M2C["M.2 RAID Controller"]
            M2A["M.2 Drive A"] --> M2C
            M2B["M.2 Drive B"] --> M2C
            M2C -->|RAID1 Mirror| OS["OS Boot Volume (UEFI)"]
        end
        subgraph Data["Data Storage (PCIe Slots)"]
            NV1["NVMe Drive 1"]
            NV2["NVMe Drive 2"]
            NV3["NVMe Drive N"]
        end
        subgraph SAN["SAN Connectivity"]
            VHBA0["vHBA0 (Primary Path)"]
            VHBA1["vHBA1 (Secondary Path)"]
        end
    end
    NV1 -->|"Dataset Staging"| GPU["GPU Training Jobs"]
    NV2 -->|"Checkpoints"| GPU
    NV3 -->|"Scratch Space"| GPU
    VHBA0 -->|FC Fabric A| SA["Storage Array (Boot LUN)"]
    VHBA1 -->|FC Fabric B| SA
    SA -.->|"Boot-from-SAN (stateless)"| OS
    style Boot fill:#e8f4e8
    style Data fill:#e8e8f4
    style SAN fill:#f4e8e8
```
Best Practices for AI Storage Configuration
| Practice | Rationale |
| --- | --- |
| Use M.2 RAID1 for OS boot | Frees PCIe slots for GPUs; provides OS redundancy |
| Use UEFI boot mode exclusively | Required for M.2 controllers; standard for modern AI servers |
| Configure boot-from-SAN for stateless nodes | Enables rapid server replacement without OS reinstallation |
| Dedicate NVMe drives to dataset staging | Minimizes I/O bottleneck during training data loading |
| Use single-drive RAID0 only when one disk is present | Provides a bootable virtual drive when mirroring is not possible (no redundancy) |
Animation: AI server storage architecture walkthrough -- show M.2 RAID1 boot path, NVMe data path to GPUs, and SAN boot failover between dual vHBAs across two FC fabrics.
Post-Quiz: Storage Policies
1. Why must UCS-M2-HWRAID be explicitly reconfigured for production AI deployments?
It defaults to RAID0, which has no redundancy
It defaults to JBOD mode, which provides no mirroring
It defaults to RAID5, which is too slow
It does not support UEFI boot by default
2. What happens when a service profile with boot-from-SAN migrates to a new physical server?
The OS must be reinstalled on the new server
The new server boots from the same SAN OS image using the migrated identity
The SAN storage is automatically replicated to the new server's local disk
Boot-from-SAN does not support profile migration
3. Which M.2 RAID controller is recommended for new AI server builds?
UCS-M2-HWRAID
UCS-M2-NVRAID
UCS-M2-SATARAID
Any controller works equally well
4. What should NVMe local storage be used for on AI servers?
OS boot volume
Dataset staging, model checkpoints, and scratch space
Backup of the SAN boot LUN
VLAN configuration storage
5. In boot-from-SAN configuration, what identity must be assigned to each vHBA?
IP address and subnet mask
WWNN and WWPN
MAC address and VLAN
UUID and serial number
Section 4: LAN Connectivity and QoS on UCS
Pre-Quiz: LAN Connectivity and QoS
1. What MTU should be configured on vNICs carrying RoCEv2 AI training traffic?
1500
4096
9000
16000
2. Which QoS system class and CoS value are used for RoCEv2 on UCS?
Gold, CoS 4
Platinum, CoS 5, no-drop
Silver, CoS 3, drop
Best Effort, CoS 0
3. What mechanism prevents packet drops for RoCEv2 traffic?
TCP retransmission
Priority Flow Control (PFC)
Link aggregation
VLAN trunking
4. RoCEv2 on UCS cannot coexist with which feature on the same vNIC?
VLAN tagging
NVGRE, NetFlow, or VMQ
Jumbo frames
RSS (Receive Side Scaling)
5. Why must QoS configuration be consistent across UCS and upstream Nexus switches?
Different vendors require different CoS values
A PFC mismatch at any point causes RDMA packet drops
Nexus switches do not support no-drop classes
UCS cannot communicate with Nexus without identical firmware
Key Points
AI training vNICs need: MTU 9000 (jumbo frames), dedicated AI VLAN, RoCE-enabled adapter policy, and Platinum no-drop QoS (CoS 5).
The Platinum system class with CoS 5 and no-drop triggers PFC, which pauses transmission when buffers fill rather than dropping packets -- essential for RDMA.
End-to-end QoS consistency is mandatory: the UCS Fabric Interconnects and the upstream Nexus 9000 switches must all apply matching PFC and ECN settings on the same CoS value.
RoCEv2 cannot coexist with NVGRE, NetFlow, or VMQ on the same vNIC. Requires VIC 1400 or 15000 series adapters (M5+).
Adapter policy tuning: enable RSS, maximize TX/RX queues, set queue pairs (min 4, up to 8192), and use static interrupt coalescing for sustained high-throughput AI traffic. Disable adaptive interrupt coalescing at >80% link utilization.
Typical AI cluster VLAN design: RoCEv2 training (Platinum, MTU 9000), Storage (Gold, MTU 9000), Management (Best Effort, MTU 1500), Provisioning (Best Effort, MTU 1500).
vNIC Configuration
The LAN Connectivity Policy defines how vNICs connect to the network. For AI workloads, vNIC configuration is where network performance is won or lost.
| Parameter | Description | AI-Optimized Setting |
| --- | --- | --- |
| VLAN Assignment | Native and allowed VLANs | Dedicated VLANs for AI training, storage, management |
| MAC Address | Static or from pool | Pool-based for template-driven provisioning |
| MTU | Maximum Transmission Unit | 9000 (jumbo frames) for RDMA/RoCE |
| Failover | Active/standby behavior | Enabled for resiliency |
| Adapter Policy | Determines vNIC behavior | RoCE-enabled policy |
| QoS Policy | Assigns system class to traffic | No-drop class for RDMA interfaces |
| Network Control Policy | CDP, LLDP, MAC settings | LLDP enabled for DCBX negotiation |
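An AI-optimized vNIC definition pulls all of these parameters together. The helper below is a hypothetical sketch — the field names and policy names are illustrative, not Intersight API attributes — but it shows the full set of settings a training vNIC should carry.

```python
# Hypothetical helper assembling an AI-optimized vNIC definition with the
# settings from the table; names are illustrative, not Intersight API fields.

def ai_vnic(name, vlan, mac_pool="AI-MAC-Pool"):
    return {
        "name": name,
        "vlan": vlan,                  # dedicated AI training VLAN
        "mac_source": mac_pool,        # pool-based for template provisioning
        "mtu": 9000,                   # jumbo frames for RDMA/RoCE
        "failover": True,              # resiliency across fabrics
        "adapter_policy": "AI-RoCEv2", # RoCE-enabled adapter policy
        "qos_policy": "AI-RoCE-QoS",   # maps to the Platinum no-drop class
        "network_control": "LLDP-On",  # DCBX negotiation with upstream
    }

nic = ai_vnic("eth-roce-a", vlan=100)
assert nic["mtu"] == 9000 and nic["failover"]
```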
LAN Connectivity and VLAN Design
| VLAN Purpose | Traffic Type | MTU | QoS Class |
| --- | --- | --- | --- |
| AI Training / GPU-to-GPU | RoCEv2 RDMA | 9000 | Platinum (no-drop, CoS 5) |
| Storage (NVMe-oF/iSCSI) | Storage I/O | 9000 | Gold or Platinum |
| Management | IPMI, SSH, Intersight | 1500 | Best Effort |
| Provisioning / PXE | OS deployment | 1500 | Best Effort |
QoS System Classes for AI Traffic
Cisco UCS Manager supports multiple QoS system classes configured at LAN > LAN Cloud > QoS System Class. These map to CoS values and determine how the Fabric Interconnect prioritizes and queues traffic. Enabling RoCE requires configuring Platinum with CoS 5 as no-drop, which triggers Priority Flow Control (PFC). A single dropped RDMA packet forces an expensive transport-layer retransmission, destroying RDMA's latency advantage.
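The behavioral difference between a drop class and a no-drop class can be shown with a toy queue model. This is a deliberately simplified illustration (real PFC operates on buffer thresholds and pause frames per CoS), but it captures why no-drop matters for RDMA: congestion produces back-pressure instead of loss.

```python
# Toy model of drop vs. no-drop queuing. A drop-class queue discards frames
# when full; a no-drop (PFC) queue asserts pause so nothing is lost. This is
# a simplification -- real PFC pauses the sender per CoS via pause frames.

def enqueue(queue, depth, frames, no_drop):
    dropped = paused = 0
    for f in range(frames):
        if len(queue) < depth:
            queue.append(f)
        elif no_drop:
            paused += 1   # PFC pause: sender backs off, frame is retried
        else:
            dropped += 1  # drop class: frame is lost, RDMA must recover
    return dropped, paused

dropped, paused = enqueue([], depth=8, frames=12, no_drop=False)
assert dropped == 4 and paused == 0   # lossy: 4 frames lost
dropped, paused = enqueue([], depth=8, frames=12, no_drop=True)
assert dropped == 0 and paused == 4   # lossless: pauses instead of drops
```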
```mermaid
flowchart LR
    subgraph Server["GPU Server"]
        VNIC["vNIC (RoCEv2 Enabled)"]
        AP["Adapter Policy: Queue Pairs, RSS, Interrupt Coalescing"]
        QP["QoS Policy (Platinum)"]
    end
    subgraph FI["Fabric Interconnect"]
        SC["QoS System Class: Platinum = CoS 5 No-Drop"]
        PFC1["PFC Enabled on CoS 5"]
    end
    subgraph Nexus["Upstream Nexus 9000"]
        PFC2["PFC Enabled on CoS 5"]
        ECN["ECN Configured for RDMA Class"]
    end
    subgraph Dest["Destination GPU Server"]
        VNIC2["vNIC (RoCEv2 Enabled)"]
    end
    AP --> VNIC
    QP --> VNIC
    VNIC -->|"CoS 5 Tagged, MTU 9000"| SC
    SC --> PFC1
    PFC1 -->|"Lossless Path"| PFC2
    PFC2 --> ECN
    ECN -->|"Lossless Path"| VNIC2
    style VNIC fill:#4a90d9,color:#fff
    style VNIC2 fill:#4a90d9,color:#fff
    style SC fill:#d94a4a,color:#fff
    style PFC1 fill:#d94a4a,color:#fff
    style PFC2 fill:#d94a4a,color:#fff
    style ECN fill:#d94a4a,color:#fff
```
| QoS System Class | CoS Value | Drop Policy | Typical Use |
| --- | --- | --- | --- |
| Platinum | 5 | No-Drop | RoCEv2 / RDMA for AI training |
| Gold | 4 | Drop | Storage traffic (iSCSI, NVMe-oF) |
| Silver | 2 | Drop | Standard application traffic |
| Bronze | 1 | Drop | Background / bulk transfers |
| Best Effort | 0 | Drop | Management, default traffic |
Adapter-Level QoS for RoCE
Beyond system-class configuration, the adapter policy on each vNIC must be tuned for RoCEv2. Cisco provides predefined adapter policies, though custom user-defined policies are recommended for Linux RDMA AI training workloads.
| Adapter Policy | RoCE Mode | Use Case |
| --- | --- | --- |
| Win-HPN-SMBd | RoCEv2 Mode 1 | Windows HPN with SMB Direct |
| MQ-SMBd | RoCEv2 Mode 2 | Multi-queue SMB Direct |
| Custom (user-defined) | Configurable | Linux RDMA for AI training (recommended) |
RoCEv2 constraints: Cannot coexist with NVGRE, NetFlow, or VMQ on the same vNIC. Requires VIC 1400 or VIC 15000 series adapters (M5+ servers). Supports up to 2 RoCEv2-enabled vNICs per adapter and 4 virtual ports per adapter interface. Queue pairs: minimum 4, maximum up to 8192 (platform-dependent).
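These constraints lend themselves to an automated pre-deployment check. The sketch below encodes the rules stated above; the function name and input shape are invented for illustration, and the adapter-model check is a simple string prefix rather than a real hardware inventory lookup.

```python
# Sketch encoding the RoCEv2 constraints from the text as a validator:
# VIC 1400/15000 adapters, max 2 RoCEv2 vNICs per adapter, no coexistence
# with NVGRE/NetFlow/VMQ, queue pairs between 4 and 8192.

INCOMPATIBLE = {"NVGRE", "NetFlow", "VMQ"}

def validate_roce(adapter_model, roce_vnics, features, queue_pairs):
    errors = []
    if not (adapter_model.startswith("VIC 14") or adapter_model.startswith("VIC 15")):
        errors.append("RoCEv2 requires VIC 1400 or VIC 15000 series adapters")
    if roce_vnics > 2:
        errors.append("at most 2 RoCEv2-enabled vNICs per adapter")
    clash = INCOMPATIBLE & set(features)
    if clash:
        errors.append(f"RoCEv2 cannot coexist with {sorted(clash)} on the same vNIC")
    if not 4 <= queue_pairs <= 8192:
        errors.append("queue pairs must be between 4 and 8192")
    return errors

assert not validate_roce("VIC 1467", 2, ["RSS"], 1024)   # valid AI config
assert len(validate_roce("VIC 1387", 3, ["VMQ"], 2)) == 4  # every rule violated
```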
| Tuning Parameter | AI Recommendation |
| --- | --- |
| Interrupt Coalescing | Static coalescing with tuned intervals for sustained high throughput |
| Adaptive Interrupt Coalescing | Disable for AI workloads at >80% link utilization |
| Receive Side Scaling (RSS) | Enable on all vNICs for high-throughput data pipelines |
| TX/RX Queue Count | Maximize to enable parallel packet processing across CPU cores |
RoCEv2 Configuration Workflow
```mermaid
flowchart TD
    S1["Step 1: Enable No-Drop System Class (LAN > LAN Cloud > QoS System Class: Platinum, CoS 5, No-Drop)"]
    S2["Step 2: Create QoS Policy (LAN > Policies > QoS Policies: AI-RoCE-QoS - Platinum)"]
    S3["Step 3: Create Adapter Policy (Enable RoCEv2, set queue pairs, enable RSS, max TX/RX queues)"]
    S4["Step 4: Create LAN Connectivity Policy (Add RDMA vNIC: AI VLAN, MTU 9000, attach QoS + adapter policy)"]
    S5["Step 5: Verify Upstream Switches (Nexus 9000: PFC on CoS 5, ECN for same traffic class)"]
    S6["Step 6: Attach to Server Profile Template (Reference LAN Connectivity Policy in AI-GPU-Node-Template)"]
    S1 --> S2 --> S3 --> S4 --> S5 --> S6
    S1 -.->|"System-level config"| FI["Fabric Interconnect"]
    S4 -.->|"Per-server config"| SP["Server Profile"]
    S5 -.->|"Network-level config"| NX["Nexus 9000"]
    style S1 fill:#d94a4a,color:#fff
    style S2 fill:#d97a4a,color:#fff
    style S3 fill:#d9b34a,color:#fff
    style S4 fill:#7bc47f,color:#fff
    style S5 fill:#4a90d9,color:#fff
    style S6 fill:#7a4ad9,color:#fff
```
Animation: End-to-end RoCEv2 packet flow -- trace a tagged CoS 5 packet from GPU server vNIC through the Fabric Interconnect (PFC queuing), across uplink to Nexus 9000 (PFC + ECN), and into the destination GPU server vNIC. Highlight lossless behavior at each hop.
Post-Quiz: LAN Connectivity and QoS
1. What is the first step in configuring RoCEv2 on UCS Manager?
Create the LAN Connectivity Policy
Enable the Platinum no-drop system class with CoS 5
Configure the adapter policy with queue pairs
Verify upstream Nexus switch PFC settings
2. What happens if PFC is configured on UCS but NOT on the upstream Nexus switches?
Traffic falls back to TCP automatically
RDMA packets will be dropped at the mismatch point
The Nexus switches auto-negotiate PFC via DCBX
Only management traffic is affected
3. Why should Adaptive Interrupt Coalescing be disabled for AI workloads at high utilization?
It consumes too much CPU
It provides no latency benefit when link utilization exceeds 80%
It conflicts with RoCEv2
It causes packet drops
4. How many RoCEv2-enabled vNICs can be configured per VIC adapter?
1
2
4
8
5. Which adapter policy type is recommended for Linux RDMA AI training on UCS?