Chapter 11: Cisco UCS Configuration for AI Workloads

Learning Objectives

Section 1: UCS Domain Profiles and Service Profiles

Pre-Quiz: Domain Profiles and Service Profiles

1. What is a UCS Domain Profile in Intersight?

A policy that configures a single server's BIOS settings
A top-level construct that configures a Fabric Interconnect pair
A template for creating VLANs across multiple switches
A storage configuration for boot-from-SAN

2. What does "stateless computing" mean in UCS?

Servers do not retain any data after power-off
Server identity is abstracted from physical hardware and can migrate between servers
The server runs without an operating system
Fabric Interconnects operate without configuration

3. Which four policy categories are used in a Server Profile in Intersight Managed Mode?

Compute, Network, Storage, Management
BIOS, Boot, Power, Thermal
LAN, SAN, VLAN, VSAN
Domain, Server, Adapter, QoS

4. Why is template-based provisioning essential for AI clusters?

It reduces the number of VLANs needed
It ensures consistent configuration across all GPU nodes and enables rapid replacement
It eliminates the need for power policies
It automatically enables RoCE on all vNICs

5. When a VLAN policy referenced by multiple domain profiles is updated, what happens?

Only the first domain profile receives the update
All domain profiles referencing it must be manually redeployed
Every domain profile referencing that policy inherits the change automatically
The update is queued until the next maintenance window

Key Points

Domain Profile Architecture

A UCS Domain Profile is the top-level configuration construct in Cisco Intersight that represents and configures a pair of Fabric Interconnects (FIs). It encapsulates all the policies that define FI behavior: port configurations, port channels, VLANs, VSANs, and network control settings. A single domain policy (such as a VLAN policy) can be assigned to any number of domain profiles -- updating the policy once propagates changes to all referencing profiles automatically.
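The reference model (one policy, many consumers) can be sketched in a few lines of Python. The class names here are illustrative only, not Intersight API objects; they simply show why updating a shared policy reaches every profile that references it:

```python
# Illustrative model of Intersight policy reuse: domain profiles hold a
# reference to a shared policy object, so one update is seen by all of them.

class VlanPolicy:
    def __init__(self, vlans):
        self.vlans = set(vlans)

class DomainProfile:
    def __init__(self, name, vlan_policy):
        self.name = name
        self.vlan_policy = vlan_policy  # reference, not a copy

    def allowed_vlans(self):
        return sorted(self.vlan_policy.vlans)

# One VLAN policy shared by three AI-cluster domain profiles.
shared = VlanPolicy([100, 200])
profiles = [DomainProfile(f"AI-Cluster-{c}", shared) for c in "ABC"]

# Updating the policy once propagates to every referencing profile.
shared.vlans.add(300)
assert all(p.allowed_vlans() == [100, 200, 300] for p in profiles)
```

The key design point is reference semantics: a profile stores a pointer to the policy, never a private copy, so there is no per-profile redeployment step.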

```mermaid
graph TD
    DPT["Domain Profile Template"] --> DP1["Domain Profile 1<br/>(AI Cluster A)"]
    DPT --> DP2["Domain Profile 2<br/>(AI Cluster B)"]
    DPT --> DP3["Domain Profile 3<br/>(AI Cluster C)"]
    DP1 --> FI1["Fabric Interconnect Pair A"]
    DP2 --> FI2["Fabric Interconnect Pair B"]
    DP3 --> FI3["Fabric Interconnect Pair C"]
    PP["Port Policy"] -.->|referenced by| DP1
    PP -.->|referenced by| DP2
    PP -.->|referenced by| DP3
    VP["VLAN Policy"] -.->|referenced by| DP1
    VP -.->|referenced by| DP2
    VP -.->|referenced by| DP3
    VSANP["VSAN Policy"] -.->|referenced by| DP1
    NCP["Network Control Policy"] -.->|referenced by| DP1
    NTPP["NTP Policy"] -.->|referenced by| DP1
    QOSP["QoS System Class"] -.->|referenced by| DP1
    style DPT fill:#4a90d9,color:#fff
    style PP fill:#f5a623,color:#fff
    style VP fill:#f5a623,color:#fff
    style VSANP fill:#f5a623,color:#fff
    style NCP fill:#f5a623,color:#fff
    style NTPP fill:#f5a623,color:#fff
    style QOSP fill:#f5a623,color:#fff
```
| Domain Profile Component | Purpose | AI Workload Relevance |
| --- | --- | --- |
| Port Policy | Defines server, uplink, and FCoE port roles | Ensures sufficient 100G/200G uplinks for GPU traffic |
| VLAN Policy | Configures L2 broadcast domains | Segregates AI training, storage, and management traffic |
| VSAN Policy | Configures Fibre Channel domains | Enables boot-from-SAN for stateless AI nodes |
| Network Control Policy | CDP, LLDP, MAC settings | Required for proper DCBX negotiation with upstream switches |
| NTP Policy | Time synchronization | Critical for distributed training coordination |
| QoS System Class | Traffic prioritization | Enables no-drop classes for RoCE/RDMA |

Service Profile and Server Profile Design

Cisco UCS implements stateless computing through service profiles (UCS Manager) and server profiles (Intersight Managed Mode). A service profile abstracts the complete server identity -- UUID, MAC addresses, WWNN, WWPN, boot policy, firmware level, and BIOS settings -- from the physical hardware. When migrated to another server, the entire identity moves with it.

| Policy Category | Included Policies | AI Configuration Focus |
| --- | --- | --- |
| Compute | BIOS, Boot Order, Firmware, Power, Thermal, Persistent Memory | GPU-optimized BIOS settings, UEFI boot, power no-cap |
| Network | LAN Connectivity, SAN Connectivity, Adapter Policies | vNIC configuration, RoCE enablement, jumbo MTU |
| Storage | Local disk, SAN storage, Boot-from-SAN | M.2 RAID1 boot, NVMe data drives, SAN boot targets |
| Management | IPMI, Serial over LAN, SNMP, Syslog | Monitoring, out-of-band access, log collection |

Template-Based Provisioning for AI Clusters

For AI clusters where tens or hundreds of identically configured GPU nodes are required, template-based provisioning is essential. Server Profile Templates in Intersight (or Service Profile Templates in UCS Manager) let you define a golden configuration once and derive individual profiles from it. Any modification to the template automatically syncs to all derived profiles.
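The derive-from-template flow, including unique identities drawn from pools, can be modeled like this. The class and field names are illustrative, not the Intersight object model; the MAC prefix is a placeholder:

```python
import itertools

# Illustrative model of template-based provisioning: derived profiles share
# the template's policies by reference and draw unique identities from pools.

class MacPool:
    """Hands out sequential MACs from an illustrative address block."""
    def __init__(self):
        self._counter = itertools.count()

    def allocate(self):
        return f"00:25:B5:A1:00:{next(self._counter):02X}"

class ServerProfileTemplate:
    def __init__(self, name, policies, mac_pool):
        self.name = name
        self.policies = policies    # shared by all derived profiles
        self.mac_pool = mac_pool

    def derive(self, profile_name):
        return ServerProfile(profile_name, self)

class ServerProfile:
    def __init__(self, name, template):
        self.name = name
        self.template = template                 # template updates sync here
        self.mac = template.mac_pool.allocate()  # unique identity per profile

template = ServerProfileTemplate(
    "AI-GPU-Node-Template",
    policies={"bios": "GPU-optimized", "boot": "UEFI"},
    mac_pool=MacPool(),
)
nodes = [template.derive(f"gpu-node-{i:02d}") for i in range(4)]

# Every node sees a template change; every node has a unique MAC.
template.policies["power"] = "no-cap"
assert all(n.template.policies["power"] == "no-cap" for n in nodes)
assert len({n.mac for n in nodes}) == len(nodes)
```

This captures the two properties the section relies on: configuration is shared (template sync), while identity is per-profile (pool allocation).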

```mermaid
flowchart TD
    SPT["Server Profile Template<br/>(AI-GPU-Node-Template)"]
    subgraph Policies["Attached Policies"]
        CP["Compute Policies<br/>BIOS, Boot Order,<br/>Power (no-cap)"]
        NP["Network Policies<br/>LAN Connectivity,<br/>SAN Connectivity"]
        STP["Storage Policies<br/>M.2 RAID1 Boot"]
        MP["Management Policies<br/>SNMP, Syslog"]
    end
    CP --> SPT
    NP --> SPT
    STP --> SPT
    MP --> SPT
    SPT -->|derive| SP1["Server Profile 1"]
    SPT -->|derive| SP2["Server Profile 2"]
    SPT -->|derive| SP3["Server Profile 3"]
    SPT -->|derive| SPN["Server Profile N"]
    SP1 -->|associate| S1["GPU Server 1<br/>UUID, MAC, WWPN<br/>from pools"]
    SP2 -->|associate| S2["GPU Server 2"]
    SP3 -->|associate| S3["GPU Server 3"]
    SPN -->|associate| SN["GPU Server N"]
    SPT -.->|"template update<br/>syncs to all"| SP1
    SPT -.->|syncs| SP2
    SPT -.->|syncs| SP3
    style SPT fill:#4a90d9,color:#fff
    style SP1 fill:#7bc47f,color:#fff
    style SP2 fill:#7bc47f,color:#fff
    style SP3 fill:#7bc47f,color:#fff
    style SPN fill:#7bc47f,color:#fff
```
Worked Example: Creating a GPU Node Server Profile Template in Intersight

  1. Create a template named AI-GPU-Node-Template.
  2. Attach Compute policies: GPU-optimized BIOS, UEFI boot mode, no-cap power priority.
  3. Attach Network policies: two RoCE-enabled vNICs, MTU 9000, no-drop QoS.
  4. Attach Storage policies: M.2 RAID1 boot.
  5. Attach Management policies: SNMP and Syslog.
  6. Derive individual server profiles and associate each with a physical server.
Animation: Drag-and-drop domain profile assembly -- attach policies (Port, VLAN, VSAN, NTP, QoS) to a domain profile template, then derive multiple domain profiles for AI clusters.
Post-Quiz: Domain Profiles and Service Profiles

1. A domain profile template is updated to add a new VLAN. What happens to the three domain profiles derived from it?

Nothing -- derived profiles are snapshots at creation time
All three automatically inherit the new VLAN
Only the most recently deployed profile inherits it
The update is rejected because derived profiles are locked

2. Which construct provides stateless computing in UCS Manager?

Domain profile
Service profile
VLAN policy
Power policy

3. In Intersight Managed Mode, which policy category includes BIOS and boot order?

Network
Storage
Compute
Management

4. Where do derived server profiles get unique MAC addresses and WWPNs?

They are manually assigned by the administrator
From identity pools referenced by the template
From the physical server's hardware ROM
From the Fabric Interconnect's MAC table

5. A domain profile configures which level of UCS infrastructure?

Individual server blades
The Fabric Interconnect pair
GPU adapter cards
Storage arrays

Section 2: Power and NTP Policies

Pre-Quiz: Power and NTP Policies

1. How much power can a single NVIDIA H100 GPU draw?

150W
350W
700W
1200W

2. Which power redundancy mode is recommended for AI deployments?

Non-Redundant
N+1 Redundancy
Grid Redundancy
Active-Standby

3. What does "no-cap" power priority mean in UCS?

The server has unlimited power from the grid
The blade is prioritized over others during dynamic power rebalancing
Power capping is disabled for the entire chassis
The PSUs run at maximum output at all times

4. Why is NTP critical for AI training clusters?

It controls GPU clock speeds
Distributed training frameworks rely on synchronized timing for barrier operations
It determines the training batch size
It is required to boot the operating system

5. What does Extended Power Capacity provide on UCS X-Series?

Doubles the number of available PSU slots
Increases total power allocation by 15%
Enables hot-swap of GPU modules
Adds battery backup for uninterruptible operation

Key Points

Power Policy for GPU Systems

GPU-accelerated AI servers are among the most power-hungry systems in a data center. A single NVIDIA H100 GPU draws up to 700W, and a server with eight GPUs can easily exceed 6,000W total system power. Cisco UCS power policies must ensure GPU nodes receive adequate power under all conditions.

| Redundancy Mode | Description | PSU Behavior on Failure | AI Recommendation |
| --- | --- | --- | --- |
| Grid Redundancy | Two independent power sources | Surviving PSUs on alternate circuit continue | Recommended for all AI deployments |
| N+1 Redundancy | One extra PSU beyond minimum | Remaining PSUs share load | Acceptable for non-critical AI dev |
| Non-Redundant | All PSUs active, no redundancy | Single PSU failure may cause outage | Never use for AI workloads |

Power Capping and Dynamic Rebalancing

UCS uses power control policies to manage how power is allocated and borrowed among blades within a chassis. During normal operation, active blades can borrow power from idle blades. When all blades are active and at their power cap, the priority determines which blades get preference. For AI workloads, use no-cap or high priority.
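The borrow-and-prioritize behavior can be sketched as a simple allocation loop. The wattages, blade names, and priority ordering below are illustrative, not actual UCS firmware logic:

```python
# Illustrative chassis power allocation: when total demand exceeds the
# chassis budget, no-cap blades are funded first and lower-priority blades
# absorb the shortfall (which is what throttles their GPUs).

PRIORITY_ORDER = {"no-cap": 0, "high": 1, "medium": 2, "low": 3}

def allocate_power(blades, chassis_budget_w):
    """blades: list of dicts with 'name', 'demand_w', 'priority'."""
    remaining = chassis_budget_w
    grants = {}
    for blade in sorted(blades, key=lambda b: PRIORITY_ORDER[b["priority"]]):
        grant = min(blade["demand_w"], remaining)
        grants[blade["name"]] = grant
        remaining -= grant
    return grants

blades = [
    {"name": "gpu-1", "demand_w": 6000, "priority": "no-cap"},
    {"name": "gpu-2", "demand_w": 6000, "priority": "no-cap"},
    {"name": "dev-1", "demand_w": 3000, "priority": "low"},
]
grants = allocate_power(blades, chassis_budget_w=14000)
assert grants["gpu-1"] == 6000 and grants["gpu-2"] == 6000
assert grants["dev-1"] == 2000  # low-priority blade is throttled
```

The example shows why no-cap matters for GPU nodes: under contention, only blades funded first keep their full demand.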

```mermaid
stateDiagram-v2
    [*] --> InitialAllocation: Server powers on
    InitialAllocation: Initial Power Allocation
    InitialAllocation --> NormalOperation: Power budget assigned
    NormalOperation: Normal Operation
    NormalOperation --> BorrowingPower: Blade needs more power
    BorrowingPower: Borrowing from Idle Blades
    BorrowingPower --> NormalOperation: Load decreases
    NormalOperation --> Contention: All blades active at cap
    Contention: Power Contention
    Contention --> Throttled: Low-priority blade
    Contention --> FullPower: No-cap / High-priority blade
    Throttled: GPU Throttled
    FullPower: Full Power Maintained
    FullPower --> NormalOperation: Contention resolves
    Throttled --> NormalOperation: Contention resolves
```
| Power Feature | Default Setting | AI-Optimized Setting | Impact |
| --- | --- | --- | --- |
| Redundancy Mode | Grid | Grid | Protects against full circuit loss |
| Power Control Priority | Medium | No-Cap | Prevents GPU throttling under load |
| Extended Power Capacity | Disabled | Enabled | +15% power budget for GPU headroom |
| Power Save Mode | Enabled | Evaluate per deployment | May turn off unused PSUs to save energy |

NTP for AI Clusters

NTP is applied at the Fabric Interconnect level and is common to the FI pair. The NTP policy accepts one to four NTP server addresses. Accurate time synchronization is critical for distributed training: frameworks depend on consistent timing for barrier and collective operations, and correlating logs and telemetry across hundreds of nodes requires aligned timestamps.

Best practice: configure at least two NTP servers, preferring internal stratum-1 or stratum-2 sources.
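On the node side, a minimal chrony configuration following this best practice might look like the sketch below; the server names are placeholders for your internal stratum-1 or stratum-2 sources:

```
# /etc/chrony.conf (illustrative fragment)
server ntp1.example.internal iburst   # internal stratum-1/2 source
server ntp2.example.internal iburst   # second source for redundancy
makestep 1.0 3                        # step the clock on large initial offsets
rtcsync                               # keep the hardware clock in sync
```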

Animation: Power contention simulation -- show 4 blades in a chassis competing for power, with priority-based allocation and GPU throttling visualization when low-priority blades lose budget.
Post-Quiz: Power and NTP Policies

1. An AI chassis has all blades active at maximum load. Which blade gets full power first?

The blade with the most GPUs
The blade with no-cap power priority
The blade that powered on first
All blades share equally regardless of priority

2. Extended Power Capacity on UCS X-Series increases the power budget by what percentage?

5%
10%
15%
25%

3. At which UCS level is NTP configured?

Individual blade BIOS
Fabric Interconnect level
Per-vNIC adapter policy
Storage controller

4. How many NTP servers should be configured as a best practice?

Exactly one for consistency
At least two for redundancy
At least five for accuracy
NTP is not needed if all servers are in the same rack

5. Which power redundancy mode should NEVER be used for AI workloads?

Grid Redundancy
N+1 Redundancy
Non-Redundant
Active-Standby

Section 3: Storage Policies on UCS

Pre-Quiz: Storage Policies

1. What is the recommended boot drive configuration for AI compute nodes on UCS?

Single NVMe drive in RAID0
Two M.2 drives in RAID1
Four SAS drives in RAID5
USB flash drive

2. Why are M.2 boot drives preferred for AI servers?

They are the cheapest storage option
They free up PCIe slots and drive bays for GPUs and NVMe data storage
They provide the highest IOPS for training data
They support RAID5 for better redundancy

3. What is boot-from-SAN?

Booting from a local SAN-attached NVMe drive
Booting an OS from external SAN-based storage rather than a local disk
Using SAN storage as swap space during training
A method to install the OS over the network via PXE

4. What is the default RAID mode of the UCS-M2-HWRAID controller?

RAID1
RAID0
JBOD
RAID5

5. Which FC zoning model is the default recommendation for most deployments?

Single initiator, multiple targets
Multiple initiators, single target
Single initiator, single target
Fabric-wide zoning

Key Points

Local Disk Policies

The best practice for AI compute nodes is two disks in RAID1 as a boot drive, keeping the OS separate from data storage. M.2 boot drives have become the preferred approach because they free up all PCIe slots and front-panel drive bays for GPU cards and NVMe data storage.

| Controller | Model | Supported RAID | Boot Mode | Notes |
| --- | --- | --- | --- | --- |
| UCS-M2-HWRAID | SATA M.2 RAID | RAID1 only | UEFI only | Legacy option, widely deployed |
| UCS-M2-NVRAID | NVMe M.2 RAID | RAID0, RAID1 | UEFI only | Higher performance, recommended for new builds |

With the OS on M.2 drives, the remaining NVMe slots can be dedicated to high-speed dataset staging, model checkpoint storage, and scratch space. NVMe local storage provides the lowest latency for these operations, critical when training jobs need to load datasets of hundreds of GB to multiple TB quickly.

SAN Connectivity and Boot-from-SAN

Boot from SAN allows servers to boot an OS from external SAN-based storage rather than a local disk. This is central to UCS's stateless computing model. When a service profile migrates, the new server boots from the exact same OS image on the SAN.

Configuring Boot-from-SAN

  1. Open Service Profile / Server Profile storage settings, navigate to vHBAs
  2. Assign a WWNN (static or from pool)
  3. Click Add SAN Boot, specify vHBA name and primary/secondary path
  4. Enter the WWPN of the storage target and the appropriate LUN ID
  5. Configure FC zoning: initiator (vHBA) to target (storage array)
| Zoning Model | Description | When to Use |
| --- | --- | --- |
| Single Initiator, Single Target | One zone per vHBA-storage port pair; two members per zone | Default for most deployments; clearest troubleshooting |
| Single Initiator, Multiple Targets | One zone per vHBA containing all its target ports | When zone count may reach or exceed platform limits |
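For step 5 of the configuration, a single-initiator/single-target zone on an NX-OS-based FC switch looks roughly like the fragment below; the WWPNs, zone names, and VSAN number are placeholders:

```
! Illustrative NX-OS zoning for one vHBA-to-array-port pair (fabric A)
zone name AI-NODE01-HBA0_ARRAY-P1 vsan 10
  member pwwn 20:00:00:25:b5:a1:00:01   ! initiator: vHBA0 WWPN
  member pwwn 50:00:09:72:08:11:22:33   ! target: storage array port WWPN
zoneset name AI-FABRIC-A vsan 10
  member AI-NODE01-HBA0_ARRAY-P1
zoneset activate name AI-FABRIC-A vsan 10
```

A matching zone is configured on fabric B for the secondary vHBA, giving the dual-path boot topology shown in the storage diagram.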
```mermaid
graph TD
    subgraph Server["AI GPU Server"]
        subgraph Boot["Boot Storage"]
            M2C["M.2 RAID Controller"]
            M2A["M.2 Drive A"] --> M2C
            M2B["M.2 Drive B"] --> M2C
            M2C -->|RAID1 Mirror| OS["OS Boot Volume (UEFI)"]
        end
        subgraph Data["Data Storage (PCIe Slots)"]
            NV1["NVMe Drive 1"]
            NV2["NVMe Drive 2"]
            NV3["NVMe Drive N"]
        end
        subgraph SAN["SAN Connectivity"]
            VHBA0["vHBA0 (Primary Path)"]
            VHBA1["vHBA1 (Secondary Path)"]
        end
    end
    NV1 -->|"Dataset Staging"| GPU["GPU Training Jobs"]
    NV2 -->|"Checkpoints"| GPU
    NV3 -->|"Scratch Space"| GPU
    VHBA0 -->|FC Fabric A| SA["Storage Array (Boot LUN)"]
    VHBA1 -->|FC Fabric B| SA
    SA -.->|"Boot-from-SAN (stateless)"| OS
    style Boot fill:#e8f4e8
    style Data fill:#e8e8f4
    style SAN fill:#f4e8e8
```

Best Practices for AI Storage Configuration

| Practice | Rationale |
| --- | --- |
| Use M.2 RAID1 for OS boot | Frees PCIe slots for GPUs; provides OS redundancy |
| Use UEFI boot mode exclusively | Required for M.2 controllers; standard for modern AI servers |
| Configure boot-from-SAN for stateless nodes | Enables rapid server replacement without OS reinstallation |
| Dedicate NVMe drives to dataset staging | Minimizes I/O bottleneck during training data loading |
| Use single-drive RAID0 only when one disk is present | Avoids unnecessary virtual drive creation overhead |
Animation: AI server storage architecture walkthrough -- show M.2 RAID1 boot path, NVMe data path to GPUs, and SAN boot failover between dual vHBAs across two FC fabrics.
Post-Quiz: Storage Policies

1. Why must UCS-M2-HWRAID be explicitly reconfigured for production AI deployments?

It defaults to RAID0, which has no redundancy
It defaults to JBOD mode, which provides no mirroring
It defaults to RAID5, which is too slow
It does not support UEFI boot by default

2. What happens when a service profile with boot-from-SAN migrates to a new physical server?

The OS must be reinstalled on the new server
The new server boots from the same SAN OS image using the migrated identity
The SAN storage is automatically replicated to the new server's local disk
Boot-from-SAN does not support profile migration

3. Which M.2 RAID controller is recommended for new AI server builds?

UCS-M2-HWRAID
UCS-M2-NVRAID
UCS-M2-SATARAID
Any controller works equally well

4. What should NVMe local storage be used for on AI servers?

OS boot volume
Dataset staging, model checkpoints, and scratch space
Backup of the SAN boot LUN
VLAN configuration storage

5. In boot-from-SAN configuration, what identity must be assigned to each vHBA?

IP address and subnet mask
WWNN and WWPN
MAC address and VLAN
UUID and serial number

Section 4: LAN Connectivity and QoS on UCS

Pre-Quiz: LAN Connectivity and QoS

1. What MTU should be configured on vNICs carrying RoCEv2 AI training traffic?

1500
4096
9000
16000

2. Which QoS system class and CoS value are used for RoCEv2 on UCS?

Gold, CoS 4
Platinum, CoS 5, no-drop
Silver, CoS 3, drop
Best Effort, CoS 0

3. What mechanism prevents packet drops for RoCEv2 traffic?

TCP retransmission
Priority Flow Control (PFC)
Link aggregation
VLAN trunking

4. RoCEv2 on UCS cannot coexist with which feature on the same vNIC?

VLAN tagging
NVGRE, NetFlow, or VMQ
Jumbo frames
RSS (Receive Side Scaling)

5. Why must QoS configuration be consistent across UCS and upstream Nexus switches?

Different vendors require different CoS values
A PFC mismatch at any point causes RDMA packet drops
Nexus switches do not support no-drop classes
UCS cannot communicate with Nexus without identical firmware

Key Points

vNIC Configuration

The LAN Connectivity Policy defines how vNICs connect to the network. For AI workloads, vNIC configuration is where network performance is won or lost.

| Parameter | Description | AI-Optimized Setting |
| --- | --- | --- |
| VLAN Assignment | Native and allowed VLANs | Dedicated VLANs for AI training, storage, management |
| MAC Address | Static or from pool | Pool-based for template-driven provisioning |
| MTU | Maximum Transmission Unit | 9000 (jumbo frames) for RDMA/RoCE |
| Failover | Active/standby behavior | Enabled for resiliency |
| Adapter Policy | Determines vNIC behavior | RoCE-enabled policy |
| QoS Policy | Assigns system class to traffic | No-drop class for RDMA interfaces |
| Network Control Policy | CDP, LLDP, MAC settings | LLDP enabled for DCBX negotiation |

LAN Connectivity and VLAN Design

| VLAN Purpose | Traffic Type | MTU | QoS Class |
| --- | --- | --- | --- |
| AI Training / GPU-to-GPU | RoCEv2 RDMA | 9000 | Platinum (no-drop, CoS 5) |
| Storage (NVMe-oF/iSCSI) | Storage I/O | 9000 | Gold or Platinum |
| Management | IPMI, SSH, Intersight | 1500 | Best Effort |
| Provisioning / PXE | OS deployment | 1500 | Best Effort |

QoS System Classes for AI Traffic

Cisco UCS Manager supports multiple QoS system classes configured at LAN > LAN Cloud > QoS System Class. These map to CoS values and determine how the Fabric Interconnect prioritizes and queues traffic. Enabling RoCE requires configuring Platinum with CoS 5 as no-drop, which triggers Priority Flow Control (PFC). A single dropped RDMA packet forces an expensive transport-layer retransmission, destroying RDMA's latency advantage.
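The end-to-end requirement -- every hop must treat CoS 5 as no-drop -- can be expressed as a simple check. This is a conceptual model for reasoning about the fabric, not a Cisco verification tool:

```python
# Conceptual check: an RDMA path is lossless only if every hop enables PFC
# on the RoCE CoS value. One mismatched hop reintroduces drops.

ROCE_COS = 5

def lossless_path(hops, cos=ROCE_COS):
    """hops: list of (hop_name, set_of_pfc_enabled_cos_values).
    Returns the hops where the no-drop class is missing."""
    return [name for name, pfc_cos in hops if cos not in pfc_cos]

path = [
    ("ucs-fi-a", {5}),          # Platinum no-drop -> PFC on CoS 5
    ("nexus-leaf-1", set()),    # PFC not configured upstream
    ("nexus-leaf-2", {5}),
]
misconfigured = lossless_path(path)
assert misconfigured == ["nexus-leaf-1"]  # RDMA drops occur at this hop
```

An empty result means the CoS 5 class is lossless end to end; any named hop is where RDMA packets would be dropped.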

```mermaid
flowchart LR
    subgraph Server["GPU Server"]
        VNIC["vNIC<br/>(RoCEv2 Enabled)"]
        AP["Adapter Policy<br/>Queue Pairs, RSS,<br/>Interrupt Coalescing"]
        QP["QoS Policy<br/>(Platinum)"]
    end
    subgraph FI["Fabric Interconnect"]
        SC["QoS System Class<br/>Platinum = CoS 5<br/>No-Drop"]
        PFC1["PFC Enabled<br/>on CoS 5"]
    end
    subgraph Nexus["Upstream Nexus 9000"]
        PFC2["PFC Enabled<br/>on CoS 5"]
        ECN["ECN Configured<br/>for RDMA Class"]
    end
    subgraph Dest["Destination GPU Server"]
        VNIC2["vNIC<br/>(RoCEv2 Enabled)"]
    end
    AP --> VNIC
    QP --> VNIC
    VNIC -->|"CoS 5 Tagged<br/>MTU 9000"| SC
    SC --> PFC1
    PFC1 -->|"Lossless Path"| PFC2
    PFC2 --> ECN
    ECN -->|"Lossless Path"| VNIC2
    style VNIC fill:#4a90d9,color:#fff
    style VNIC2 fill:#4a90d9,color:#fff
    style SC fill:#d94a4a,color:#fff
    style PFC1 fill:#d94a4a,color:#fff
    style PFC2 fill:#d94a4a,color:#fff
    style ECN fill:#d94a4a,color:#fff
```
| QoS System Class | CoS Value | Drop Policy | Typical Use |
| --- | --- | --- | --- |
| Platinum | 5 | No-Drop | RoCEv2 / RDMA for AI training |
| Gold | 4 | Drop | Storage traffic (FC, iSCSI) |
| Silver | 2 | Drop | Standard application traffic |
| Bronze | 1 | Drop | Background / bulk transfers |
| Best Effort | 0 | Drop | Management, default traffic |

Adapter-Level QoS for RoCE

Beyond system-class configuration, the adapter policy on each vNIC must be tuned for RoCEv2. Cisco provides predefined adapter policies, though custom user-defined policies are recommended for Linux RDMA AI training workloads.

| Adapter Policy | RoCE Mode | Use Case |
| --- | --- | --- |
| Win-HPN-SMBd | RoCEv2 Mode 1 | Windows HPN with SMB Direct |
| MQ-SMBd | RoCEv2 Mode 2 | Multi-queue SMB Direct |
| Custom (user-defined) | Configurable | Linux RDMA for AI training (recommended) |

RoCEv2 constraints: RoCEv2 cannot coexist with NVGRE, NetFlow, or VMQ on the same vNIC. It requires VIC 1400 or VIC 15000 series adapters (M5 and later servers). Each adapter supports up to 2 RoCEv2-enabled vNICs and up to 4 virtual ports per adapter interface. Queue pairs range from a minimum of 4 to a maximum of 8192 (platform-dependent).
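These constraints lend themselves to a pre-deployment sanity check. This sketch encodes them directly; the dictionary field names are illustrative, not UCS object attributes:

```python
# Illustrative validation of the RoCEv2 vNIC constraints listed above.

MAX_ROCE_VNICS_PER_ADAPTER = 2
MIN_QUEUE_PAIRS, MAX_QUEUE_PAIRS = 4, 8192
INCOMPATIBLE_FEATURES = {"nvgre", "netflow", "vmq"}

def validate_adapter(vnics):
    """vnics: list of dicts with 'name', 'roce', 'features', 'queue_pairs'."""
    errors = []
    roce_vnics = [v for v in vnics if v["roce"]]
    if len(roce_vnics) > MAX_ROCE_VNICS_PER_ADAPTER:
        errors.append("too many RoCEv2 vNICs on one adapter")
    for v in roce_vnics:
        clash = INCOMPATIBLE_FEATURES & set(v["features"])
        if clash:
            errors.append(f"{v['name']}: RoCEv2 cannot coexist with {sorted(clash)}")
        if not MIN_QUEUE_PAIRS <= v["queue_pairs"] <= MAX_QUEUE_PAIRS:
            errors.append(f"{v['name']}: queue pairs out of range")
    return errors

vnics = [
    {"name": "eth0", "roce": True, "features": ["rss"], "queue_pairs": 1024},
    {"name": "eth1", "roce": True, "features": ["vmq"], "queue_pairs": 2},
]
problems = validate_adapter(vnics)
assert any("vmq" in p for p in problems)
assert any("queue pairs" in p for p in problems)
```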

| Tuning Parameter | AI Recommendation |
| --- | --- |
| Interrupt Coalescing | Static coalescing with tuned intervals for sustained high throughput |
| Adaptive Interrupt Coalescing | Disable for AI workloads at >80% link utilization |
| Receive Side Scaling (RSS) | Enable on all vNICs for high-throughput data pipelines |
| TX/RX Queue Count | Maximize to enable parallel packet processing across CPU cores |
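On the Linux host, this tuning maps to standard ip/ethtool commands. The interface name and values below are placeholders to adapt per NIC, driver, and link utilization:

```
# Illustrative host-side tuning for an RDMA vNIC (adjust per driver)
ip link set dev eth0 mtu 9000                    # jumbo frames end to end
ethtool -C eth0 adaptive-rx off adaptive-tx off  # disable adaptive coalescing
ethtool -C eth0 rx-usecs 100                     # static coalescing interval
ethtool -L eth0 combined 16                      # more queues for parallel processing
```

Note the host MTU must match the vNIC MTU configured in the LAN Connectivity Policy, or jumbo frames will be fragmented or dropped.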

RoCEv2 Configuration Workflow

flowchart TD S1["Step 1: Enable No-Drop System Class
LAN > LAN Cloud > QoS System Class
Platinum, CoS 5, No-Drop"] S2["Step 2: Create QoS Policy
LAN > Policies > QoS Policies
AI-RoCE-QoS - Platinum"] S3["Step 3: Create Adapter Policy
Enable RoCEv2, set queue pairs,
enable RSS, max TX/RX queues"] S4["Step 4: Create LAN Connectivity Policy
Add RDMA vNIC: AI VLAN,
MTU 9000, attach QoS + adapter policy"] S5["Step 5: Verify Upstream Switches
Nexus 9000: PFC on CoS 5,
ECN for same traffic class"] S6["Step 6: Attach to Server Profile Template
Reference LAN Connectivity Policy
in AI-GPU-Node-Template"] S1 --> S2 --> S3 --> S4 --> S5 --> S6 S1 -.->|"System-level config"| FI["Fabric Interconnect"] S4 -.->|"Per-server config"| SP["Server Profile"] S5 -.->|"Network-level config"| NX["Nexus 9000"] style S1 fill:#d94a4a,color:#fff style S2 fill:#d97a4a,color:#fff style S3 fill:#d9b34a,color:#fff style S4 fill:#7bc47f,color:#fff style S5 fill:#4a90d9,color:#fff style S6 fill:#7a4ad9,color:#fff
Animation: End-to-end RoCEv2 packet flow -- trace a tagged CoS 5 packet from GPU server vNIC through the Fabric Interconnect (PFC queuing), across uplink to Nexus 9000 (PFC + ECN), and into the destination GPU server vNIC. Highlight lossless behavior at each hop.
Post-Quiz: LAN Connectivity and QoS

1. What is the first step in configuring RoCEv2 on UCS Manager?

Create the LAN Connectivity Policy
Enable the Platinum no-drop system class with CoS 5
Configure the adapter policy with queue pairs
Verify upstream Nexus switch PFC settings

2. What happens if PFC is configured on UCS but NOT on the upstream Nexus switches?

Traffic falls back to TCP automatically
RDMA packets will be dropped at the mismatch point
The Nexus switches auto-negotiate PFC via DCBX
Only management traffic is affected

3. Why should Adaptive Interrupt Coalescing be disabled for AI workloads at high utilization?

It consumes too much CPU
It provides no latency benefit when link utilization exceeds 80%
It conflicts with RoCEv2
It causes packet drops

4. How many RoCEv2-enabled vNICs can be configured per VIC adapter?

1
2
4
8

5. Which adapter policy type is recommended for Linux RDMA AI training on UCS?

Win-HPN-SMBd
MQ-SMBd
Custom (user-defined)
Default adapter policy

Your Progress

Answer Explanations