Submissions fall into three categories: Available, Preview, and Research/Development/Internal (RDI).
Why Benchmarking Matters
AI workloads stress every infrastructure layer -- compute, networking, storage, and memory. A single poorly performing component can bottleneck an entire distributed training job. Benchmarking establishes quantitative baselines that answer: Can this platform sustain the throughput needed? Where does performance degrade under scale? How does our infrastructure compare to industry peers?
MLPerf Benchmark Suite
```mermaid
graph TD
    A["MLPerf Benchmark Suite (MLCommons)"] --> B["MLPerf Training"]
    A --> C["MLPerf Inference"]
    A --> D["MLPerf Storage"]
    B --> B1["Key Metric: Time-to-Train"]
    B --> B2["Workloads: LLMs, Text-to-Image, Recommenders, GNNs"]
    C --> C1["Key Metrics: Throughput & Latency"]
    C --> C2["Scenarios: Single-Stream, Multistream, Server, Offline"]
    D --> D1["Key Metric: Storage Throughput (GB/s)"]
    D --> D2["Focus: Data Supply Rate to Accelerators"]
    style A fill:#1a5276,color:#fff
    style B fill:#2e86c1,color:#fff
    style C fill:#2e86c1,color:#fff
    style D fill:#2e86c1,color:#fff
```
MLPerf Inference Scenarios

| Scenario | Emulates | Primary Metric |
|---|---|---|
| Single-stream | Mobile device workloads | Latency per query |
| Multistream | Autonomous vehicle workloads | Latency across concurrent streams |
| Server | Cloud-based setups | Throughput and latency (p99) |
| Offline | Batch processing | Maximum throughput |
Key Performance Metrics

| Metric | Unit | Significance |
|---|---|---|
| Throughput | Queries/sec or tokens/sec | Inference requests or training samples processed per unit time |
| Latency | Milliseconds or seconds | Time from request to result; the server scenario enforces p99 |
| Time-to-train | Minutes or hours | Wall-clock time to reach target model accuracy |
| Accuracy | Model-specific (mAP, BLEU) | Output quality -- ensures systems don't trade quality for speed |
| Storage throughput | GB/s | Rate at which storage delivers training data to accelerators |
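To make the server-scenario latency metric concrete, here is a minimal sketch of computing p99 latency from a list of per-query latencies using the nearest-rank method; the sample values are hypothetical, not benchmark data:

```python
def percentile(values, pct):
    """Return the pct-th percentile of values using nearest-rank on sorted data."""
    ranked = sorted(values)
    # Nearest-rank: index of the smallest value covering pct% of the samples
    idx = max(0, int(round(pct / 100.0 * len(ranked))) - 1)
    return ranked[idx]

# Hypothetical per-query latencies in milliseconds; one slow outlier
latencies_ms = [12.1, 11.8, 12.4, 13.0, 55.2, 12.2, 12.9, 11.9, 12.5, 12.3]
p99 = percentile(latencies_ms, 99)  # the outlier dominates the tail
```

Note how a single slow query drives the p99 figure, which is exactly why the server scenario enforces it rather than the mean.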
Cisco UCS and MLPerf

| Platform | Processor Configuration | Benchmark |
|---|---|---|
| Cisco UCS C845A M8 | 8x NVIDIA H200 NVL, 8x NVIDIA L40S PCIe | MLPerf Inference v5.1 Datacenter |
| Cisco UCS C885A M8 HGX | 8x NVIDIA H100, 8x NVIDIA H200 | MLPerf Inference and Training |
| Cisco UCS X210c M8 | Intel Xeon 6 processors | MLPerf Inference v5.1 Datacenter |
| Cisco UCS C240 M8 | Intel Xeon 6 processors | MLPerf Inference Datacenter |
Interpreting Results and Identifying Bottlenecks
- Compare within scenarios -- a system excelling in offline throughput may underperform in the server scenario where latency constraints apply
- Check scaling efficiency -- if doubling GPUs doesn't nearly double throughput, investigate network bandwidth, PCIe saturation, or NVLink topology
- Examine storage results separately -- high GPU throughput is meaningless if storage starves the pipeline
- Consider the full stack -- MLPerf results reflect specific software configurations; ensure your production stack is comparable
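The scaling-efficiency check above can be expressed as a quick calculation; the throughput figures below are hypothetical, not published MLPerf results:

```python
def scaling_efficiency(base_gpus, base_tput, scaled_gpus, scaled_tput):
    """Ratio of achieved speedup to ideal linear speedup (1.0 = perfect scaling)."""
    ideal = scaled_gpus / base_gpus      # e.g. 2x the GPUs -> ideal 2x throughput
    actual = scaled_tput / base_tput
    return actual / ideal

# Hypothetical: 8 GPUs -> 10,000 samples/sec; 16 GPUs -> 17,500 samples/sec
eff = scaling_efficiency(8, 10_000, 16, 17_500)  # 0.875: worth investigating the fabric
```

An efficiency well below 1.0 at larger scales is the signal to look at network bandwidth, PCIe saturation, or NVLink topology rather than the GPUs themselves.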
Key Takeaway: MLPerf provides standardized, reproducible benchmarks across Training, Inference, and Storage that enable objective comparison of AI platforms. Cisco UCS systems demonstrate near-linear multi-server scaling when paired with high-performance network fabrics.
Animation Slot: Interactive MLPerf scenario selector -- choose a scenario and see the metric, deployment pattern, and example workload
Post-Quiz: AI Infrastructure Benchmarking
1. Which MLPerf Inference scenario enforces p99 latency constraints and measures throughput?
A) Single-stream
B) Multistream
C) Server
D) Offline
2. If doubling GPUs does not nearly double throughput, which of these is NOT a likely bottleneck?
A) Network bandwidth
B) PCIe lane saturation
C) Model accuracy threshold
D) NVLink topology
3. What did Cisco demonstrate with the UCS C885A M8 in MLPerf submissions?
A) Lowest power consumption per GPU
B) Near-linear scaling for multi-server, multi-GPU inference
C) Highest single-GPU training speed
D) Best accuracy on text-to-image workloads
4. Why is MLPerf Storage important even when GPU throughput is high?
A) It measures GPU memory allocation efficiency
B) If storage can't feed GPUs fast enough, expensive accelerators sit idle
C) It replaces the need for network benchmarks
D) It only applies to inference workloads
5. An MLPerf "Available" submission means the system contains:
A) Experimental hardware not yet released
B) Only components available for purchase or cloud rental
C) Internal development hardware
D) Components that will be available next quarter
Section 2: Monitoring with Cisco Solutions
Pre-Quiz: Monitoring with Cisco Solutions
1. What is the primary role of Cisco Nexus Dashboard?
A) GPU driver management
B) Unified management console for monitoring, troubleshooting, and automating data center operations
C) Storage provisioning only
D) Cloud application deployment
2. What technology does Nexus Dashboard Insights use to establish performance baselines?
A) Static threshold rules only
B) AI/ML-powered dynamic baselining
C) Manual operator input
D) Vendor-provided default values
3. Which Cisco platform provides cloud-based infrastructure management and enriches Nexus Dashboard with defect/PSIRT data?
A) Cisco DNA Center
B) Cisco Intersight
C) Cisco Meraki
D) Cisco Umbrella
4. What AI-specific monitoring capability does Nexus Dashboard provide?
A) Model training code analysis
B) GPU utilization, memory, temperature, and distributed compute node monitoring
C) Automatic model hyperparameter tuning
D) Dataset quality scoring
5. What is a "threshold band" in Nexus Dashboard Insights?
A) A fixed limit set by the hardware manufacturer
B) The range around a dynamic baseline within which a KPI is considered normal
C) The maximum bandwidth of a network link
D) A frequency range for telemetry collection
Key Points
- Nexus Dashboard is a cloud operational platform providing a single pane of glass for data center observability
- Nexus Dashboard Insights uses AI/ML to create dynamic baselines per KPI, avoiding static threshold limitations
- AI-specific capabilities include GPU performance monitoring, distributed compute node monitoring, AI fabric support, and latency anomaly detection
- Cisco Intersight enriches operational data with known defect databases, field notices, PSIRT alerts, and sustainability metrics
- Dashboard design should follow a hierarchical drill-down: Executive Summary, Fabric Overview, Compute/Network views, Historical views
- Supported fabrics include ACI, NX-OS, VXLAN EVPN, and AI-specific fabrics (routed and VXLAN-based)
Nexus Dashboard Overview
Cisco Nexus Dashboard serves as "mission control" for the data center -- a single pane of glass aggregating data from every fabric, switch, and compute node into actionable intelligence. It hosts Nexus Dashboard Insights, which automates troubleshooting and enables rapid root-cause analysis.
Core capabilities include topology-aware visualization, real-time KPI monitoring, proactive troubleshooting, and predictive analytics with forecasting and optimization recommendations.
AI-Specific Monitoring

| Capability | Description |
|---|---|
| GPU performance monitoring | Deep visibility into GPU utilization, memory, temperature, and AI-specific performance demands |
| Distributed compute node monitoring | Real-time monitoring of network interfaces, NICs, GPUs, and compute nodes in training jobs |
| AI fabric support | Telemetry for routed and VXLAN-based AI fabrics, including rail-optimized and full-mesh topologies |
| Latency anomaly detection | Automatic detection of unusual delay spikes at flow granularity, correlated with burst events |
Dynamic Baselining Process
```mermaid
stateDiagram-v2
    [*] --> ObserveKPIs: Collect KPI data from fabric
    ObserveKPIs --> BuildBaseline: Analyze behavior patterns
    BuildBaseline --> MonitorAgainstBaseline: Baseline established
    MonitorAgainstBaseline --> MonitorAgainstBaseline: Metric within threshold band
    MonitorAgainstBaseline --> AnomalyDetected: Metric crosses threshold band
    AnomalyDetected --> GenerateAlert: Classify severity
    GenerateAlert --> CorrelateWithIntersight: Enrich with defect/PSIRT data
    CorrelateWithIntersight --> OperatorRemediation: Provide actionable guidance
    OperatorRemediation --> MonitorAgainstBaseline: Verify fix, resume monitoring
    MonitorAgainstBaseline --> UpdateBaseline: Network conditions change
    UpdateBaseline --> MonitorAgainstBaseline: Baseline recalculated
```
Rather than relying on static thresholds, Nexus Dashboard Insights creates network-specific baselines for each KPI based on observed behavior patterns, continuously updates them to reflect changing conditions, and generates anomaly alerts when network state crosses the threshold band. Administrators can also configure custom thresholds through global rules for fine-tuning alert sensitivity.
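As an illustration of this loop (a toy model, not Nexus Dashboard Insights' actual algorithm), a rolling mean and standard deviation can define a threshold band, flagging samples that fall outside it while the baseline keeps adapting:

```python
from collections import deque
from statistics import mean, stdev

class DynamicBaseline:
    """Toy rolling baseline: band = mean +/- k * stddev over a sliding window."""
    def __init__(self, window=20, k=3.0):
        self.samples = deque(maxlen=window)
        self.k = k

    def observe(self, value):
        """Return True if value falls outside the current threshold band."""
        anomalous = False
        if len(self.samples) >= 5:  # need some history before judging
            mu, sigma = mean(self.samples), stdev(self.samples)
            band = self.k * max(sigma, 1e-9)
            anomalous = abs(value - mu) > band
        self.samples.append(value)  # baseline adapts to changing conditions
        return anomalous

bl = DynamicBaseline()
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 10.3, 9.8, 45.0]  # latency samples, ms
flags = [bl.observe(r) for r in readings]  # only the final spike is flagged
```

A static 40 ms threshold would also catch the spike here, but would either miss subtler drift on a quiet link or false-alarm constantly on a naturally busier one; the band moves with observed behavior.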
Cisco Intersight Integration
Intersight provides cloud-based infrastructure management extending visibility across compute, storage, and networking. Integration with Nexus Dashboard creates a closed-loop workflow:
1. Nexus Dashboard detects anomalous behavior
2. Intersight correlates it with known defects or security advisories
3. The operator receives actionable remediation guidance
Intersight enriches data with the known defect database, field notices, PSIRT alerts, and sustainability/power metrics.
Dashboard Design Best Practices
- Top level: Aggregate health scores and SLA compliance across all AI clusters
- Fabric level: Topology views, link states, anomaly counts, trend indicators
- Resource level: Detailed drill-downs into GPU utilization, network latency per flow, storage I/O rates
- Historical views: Baseline comparisons showing whether current performance is within normal operating ranges
Key Takeaway: Cisco Nexus Dashboard and Intersight together provide comprehensive AI infrastructure monitoring -- from dynamic baselines powered by AI/ML to proactive security and defect correlation. Dashboard design should follow a hierarchical drill-down model from cluster health to individual resource metrics.
Animation Slot: Interactive dashboard hierarchy -- click through Executive Summary, Fabric Overview, Compute/GPU View, and Network View layers
Post-Quiz: Monitoring with Cisco Solutions
1. What triggers an anomaly alert in Nexus Dashboard Insights?
A) Any change in network configuration
B) When network state crosses the dynamic threshold band around a baseline
C) When a device reboots
D) Every time a new workload starts
2. What does Intersight contribute when integrated with Nexus Dashboard?
A) GPU driver updates
B) Known defect database, field notices, PSIRT alerts, and sustainability metrics
C) Model training acceleration
D) Network topology auto-configuration
3. Why are dynamic baselines preferred over static thresholds for AI fabric monitoring?
A) They require less compute resources
B) They adapt to changing network conditions, reducing false alarms and catching genuine anomalies
C) They are simpler to configure
D) They only monitor GPU metrics
4. Which fabric types does Nexus Dashboard support for AI workloads?
A) Only Cisco ACI fabrics
B) ACI, NX-OS, VXLAN EVPN, routed and VXLAN-based AI fabrics, and external fabrics
C) Only VXLAN-based fabrics
D) Only third-party fabrics via OpenConfig
5. In the hierarchical dashboard design, what belongs at the "Resource level"?
A) Aggregate SLA compliance scores
B) Topology views and anomaly counts
C) Detailed drill-downs into GPU utilization, network latency per flow, and storage I/O rates
D) Executive summary and cluster health
Section 3: Operational Telemetry and System Health
Pre-Quiz: Operational Telemetry and System Health
1. How does Model-Driven Telemetry (MDT) differ from SNMP polling?
A) MDT uses a pull model while SNMP uses push
B) MDT is a push model that streams data from devices; SNMP is a pull model that polls devices
C) There is no difference; they are the same protocol
D) MDT only works with Cisco devices while SNMP is vendor-neutral
2. What is gRPC in the context of streaming telemetry?
A) A data encoding format
B) A high-performance transport protocol used as the primary transport for telemetry
C) A Cisco-proprietary monitoring tool
D) A YANG model specification
3. What does YANG define?
A) The transport protocol for telemetry
B) The structure and semantics of telemetry data
C) The encryption algorithm for gRPC
D) The physical layer protocol for switch interconnects
4. At what GPU temperature does throttling typically trigger?
A) 65 degrees C
B) 75 degrees C
C) 85 degrees C
D) 95 degrees C
5. What does gNMI provide?
A) A proprietary Cisco management interface
B) A standardized, vendor-neutral interface for telemetry using gRPC and YANG
C) A replacement for syslog
D) A GPU monitoring library
Key Points
- Model-Driven Telemetry (MDT) is a push-based approach -- devices stream data continuously, eliminating polling delays
- MDT on NX-OS collects from the DME database using distinguished name (DN) paths
- Two collection modes: frequency-based (periodic) and event-based (change-triggered)
- Primary transport: gRPC with GPB encoding (compact, efficient); alternative: HTTP with JSON
- YANG models define data structure; gNMI enables vendor-neutral telemetry collection
- Critical thresholds: GPU temp > 85C triggers throttling; GPU utilization < 80% during training signals a bottleneck elsewhere; storage latency > 10 ms may starve GPUs
- Alert severities: Critical (red), Major (orange), Minor (yellow), Warning (blue)
SNMP Polling vs. Streaming Telemetry
Traditional SNMP polling is like checking your mailbox every hour. Streaming telemetry is like receiving push notifications -- data arrives when it matters, without waiting for the next polling cycle.
```mermaid
sequenceDiagram
    participant M as Management Station
    participant D as Network Device
    rect rgb(220, 230, 240)
        Note over M,D: SNMP Polling (Pull Model)
        M->>D: SNMP GET Request
        D-->>M: SNMP Response (data)
        Note over M: Wait for next poll interval...
        M->>D: SNMP GET Request
        D-->>M: SNMP Response (data)
    end
    rect rgb(210, 240, 210)
        Note over M,D: Streaming Telemetry (Push Model)
        D->>M: gRPC stream: metrics update (T=0s)
        D->>M: gRPC stream: metrics update (T=10s)
        D->>M: gRPC stream: metrics update (T=20s)
        D->>M: gRPC stream: event-based alert
        Note over M: Near-real-time, no polling delay
    end
```
MDT Data Collection and Transport

| Mode | Behavior | Use Case |
|---|---|---|
| Frequency-based (periodic) | Data collected at regular intervals | Continuous resource utilization monitoring |
| Event-based | Data collected only on change | Interface state changes, threshold violations |

| Transport | Encoding | Notes |
|---|---|---|
| gRPC | GPB | Primary for high-performance telemetry; chunking for payloads > 12 MB |
| HTTP | JSON | Simpler setup; suitable for lower-volume streams |
| TCP dialout | GPB/JSON | Alternative when gRPC is not supported |
YANG Models and gNMI
YANG ("Yet Another Next Generation") defines the structure and semantics of telemetry data. Cisco NX-OS supports two types:
- Device YANG models -- Cisco-specific, map directly to the NX-OS DME object tree
- OpenConfig YANG models -- vendor-neutral for multi-vendor interoperability
gNMI (gRPC Network Management Interface) provides a standardized interface using gRPC transport and YANG-modeled data, enabling vendor-neutral monitoring pipelines across Cisco, Arista, Juniper, and other platforms.
Typical Telemetry Pipeline
```mermaid
flowchart LR
    A["Nexus 9000 NX-OS MDT Sensors"] -->|gRPC / GPB| B["Telegraf or gNMI Collector"]
    B --> C["InfluxDB / Prometheus"]
    C --> D["Grafana Dashboard"]
```
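Downstream of the collector, telemetry updates typically arrive as nested path/value structures. The sketch below flattens a simulated gNMI-style notification into metric rows ready for a time-series database; the payload shape and paths are illustrative, not an exact NX-OS or gNMI wire encoding:

```python
def flatten_notification(notification):
    """Turn a gNMI-style notification dict into (path, value, timestamp) rows."""
    ts = notification["timestamp"]
    prefix = notification.get("prefix", "")
    rows = []
    for upd in notification["updates"]:
        # Join the shared prefix with each update's relative path
        path = f"{prefix}/{upd['path']}".lstrip("/")
        rows.append((path, upd["value"], ts))
    return rows

# Simulated update as a collector might hand it to storage (paths illustrative)
note = {
    "timestamp": 1700000000,
    "prefix": "interfaces/interface[name=eth1/49]",
    "updates": [
        {"path": "state/counters/in-crc-errors", "value": 17},
        {"path": "state/oper-status", "value": "DOWN"},
    ],
}
rows = flatten_notification(note)
```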
System Health Thresholds

| Dimension | Key Metrics | Critical Thresholds |
|---|---|---|
| GPU | Utilization %, memory, temperature, ECC errors | Utilization < 80% during training = bottleneck elsewhere; temp > 85C = throttling |
| Network | Interface utilization, drops, CRC errors, latency | Any drops on AI fabric links; latency spikes > 2x baseline |
| CPU/Memory | System CPU, memory utilization, process counts | CPU > 90% sustained; memory > 85% |
| Storage | IOPS, throughput (GB/s), queue depth, latency | Latency > 10 ms may starve the GPU pipeline |
| Power/Thermal | Power draw (W), inlet temp, fan speed | Power approaching PSU capacity; inlet temp > rated max |
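The GPU thresholds above translate directly into code. This is a sketch, not a shipping health-check API; the function name and return shape are illustrative:

```python
def check_gpu_health(util_pct, temp_c, training_active=True):
    """Evaluate GPU telemetry against the critical thresholds in the table."""
    alerts = []
    if training_active and util_pct < 80:
        alerts.append("GPU util < 80% during training: bottleneck elsewhere")
    if temp_c > 85:
        alerts.append("GPU temp > 85C: thermal throttling likely")
    return alerts

alerts = check_gpu_health(util_pct=62, temp_c=88)  # both thresholds violated
```

Note the utilization check only fires while a training job is active; an idle GPU at 0% utilization is not an anomaly.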
Alert Severity Levels

| Severity | Color | Meaning |
|---|---|---|
| Critical | Red | Service-impacting; immediate action required |
| Major | Orange | Significant degradation; investigation needed within minutes |
| Minor | Yellow | Deviation from baseline; schedule investigation |
| Warning | Blue | Informational; trend approaching threshold |
Alert Configuration Best Practices
- Start with dynamic baselines and tune custom thresholds based on observed false-positive rates
- Set tighter thresholds for AI fabric links where small latency increases cascade into training slowdowns
- Configure event-based telemetry for link state changes on GPU-to-switch connections
- Use frequency-based telemetry at 10-30 second intervals for utilization metrics during training
- Implement alert suppression during planned maintenance windows to avoid alarm fatigue
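Maintenance-window suppression can be as simple as a timestamp check before an alert is forwarded. A minimal sketch, assuming windows are (start, end) pairs in epoch seconds:

```python
def should_forward(alert_ts, maintenance_windows):
    """Suppress alerts whose timestamp falls inside any maintenance window."""
    return not any(start <= alert_ts <= end for start, end in maintenance_windows)

# One-hour planned maintenance window (epoch seconds)
windows = [(1_700_000_000, 1_700_003_600)]
inside = should_forward(1_700_001_000, windows)   # False: suppressed
outside = should_forward(1_700_010_000, windows)  # True: forwarded
```

In practice the suppression decision would also be logged, so that events occurring during maintenance remain available for later correlation even though no alert fired.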
Key Takeaway: Model-Driven Telemetry on Cisco NX-OS provides near-real-time, push-based monitoring using gRPC/GPB transport and YANG data models. Combined with gNMI for vendor-neutral collection, this approach far exceeds traditional SNMP polling for AI infrastructure.
Animation Slot: Animated comparison of SNMP polling intervals vs. streaming telemetry continuous data flow, showing latency difference in anomaly detection
Post-Quiz: Operational Telemetry and System Health
1. What encoding format does gRPC use for high-performance telemetry on NX-OS?
A) XML
B) JSON
C) Google Protocol Buffers (GPB)
D) ASN.1/BER
2. What collection mode should be used for interface state changes on GPU-to-switch connections?
A) Frequency-based at 60-second intervals
B) Event-based telemetry
C) SNMP polling every 5 minutes
D) Manual log review
3. What does GPU utilization below 80% during an active training job typically indicate?
A) The model is too small for the GPU
B) A bottleneck exists elsewhere (network, storage, or data pipeline)
C) The GPU hardware is defective
D) Training is complete
4. What is the advantage of OpenConfig YANG models over Device YANG models?
A) They are faster to process
B) They are vendor-neutral, enabling multi-vendor interoperability
C) They provide more detailed Cisco-specific data
D) They require less bandwidth
5. At what storage latency threshold may the GPU pipeline begin to starve?
A) 1ms
B) 5ms
C) 10ms
D) 100ms
Section 4: Log Correlation and Performance Analysis
Pre-Quiz: Log Correlation and Performance Analysis
1. How many severity levels does syslog define?
A) 5 (0-4)
B) 8 (0-7)
C) 10 (0-9)
D) 3 (Low, Medium, High)
2. What is the key difference between SNMP traps and informs?
A) Traps use TCP while informs use UDP
B) Informs require manager acknowledgment and are retransmitted if unacknowledged
C) Traps are encrypted but informs are not
D) There is no difference
3. What does log correlation accomplish?
A) It encrypts log messages for security
B) It connects events across multiple devices to reconstruct the full picture of an incident
C) It deletes duplicate log entries
D) It converts logs to a standard format
4. Which SNMP version is required for production data center environments?
A) SNMPv1
B) SNMPv2c
C) SNMPv3
D) Any version is acceptable
5. What metric best indicates training efficiency for AI workloads?
A) Network packet count
B) Throughput (samples/sec)
C) Disk space remaining
D) Number of active processes
Key Points
- Syslog has 8 severity levels (0=Emergency through 7=Debug); production should log at least severity 5 (Notification)
- SNMPv3 is required for production -- provides message integrity, authentication, and encryption
- SNMP Traps are fire-and-forget; Informs require acknowledgment and are retransmitted if unacknowledged
- Log correlation connects events across devices using rules with root-cause messages and timeout timers
- Best practice: deploy streaming telemetry for high-frequency AI fabric metrics AND SNMP for device inventory and legacy integration
- Performance optimization workflow: Establish baselines, Monitor continuously, Detect deviations, Correlate across layers, Remediate and verify
Syslog Severity Levels

| Level | Name | Description | Example |
|---|---|---|---|
| 0 | Emergency | System unusable | Hardware failure |
| 1 | Alert | Immediate action needed | Power supply failure |
| 2 | Critical | Critical conditions | Memory allocation failure |
| 3 | Error | Error conditions | Interface down |
| 4 | Warning | Warning conditions | Temperature approaching limit |
| 5 | Notification | Normal but significant | Interface up/down |
| 6 | Informational | Informational messages | Configuration change |
| 7 | Debug | Debug-level messages | Packet trace output |
Debug-level logging (severity 7) should only be enabled temporarily during troubleshooting as it can generate enormous data volumes and impact device performance.
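On the wire, severity travels in the syslog PRI field as facility * 8 + severity (per RFC 5424). A small decoder makes the mapping concrete:

```python
# Severity names as used in the table above (RFC 5424 calls level 5 "Notice")
SEVERITY_NAMES = [
    "Emergency", "Alert", "Critical", "Error",
    "Warning", "Notification", "Informational", "Debug",
]

def decode_pri(pri):
    """Split a syslog PRI value into (facility, severity_name)."""
    facility, severity = divmod(pri, 8)   # PRI = facility * 8 + severity
    return facility, SEVERITY_NAMES[severity]

# PRI 189 = facility 23 (local7, commonly used by network gear), severity 5
facility, severity = decode_pri(189)  # -> (23, "Notification")
```

Filtering on severity when building a collector is therefore just a modulo operation on PRI, with no need to parse the message body.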
SNMP: Traps vs. Informs

| Type | Acknowledgment | Reliability | Use Case |
|---|---|---|---|
| Traps | None (fire-and-forget) | Lower -- no retry if lost | High-volume, non-critical notifications |
| Informs | Manager must acknowledge | Higher -- retransmitted if unacknowledged | Critical alerts requiring guaranteed delivery |
Example SNMPv3 host configuration:
```
snmp-server host 192.0.2.1 informs version 3 auth NMS
```
Streaming Telemetry vs. SNMP Comparison

| Characteristic | Streaming Telemetry (MDT) | SNMP |
|---|---|---|
| Collection model | Push (device initiates) | Pull (manager polls) |
| Latency | Near-real-time (seconds) | Polling-interval dependent (minutes) |
| Scalability | High -- no polling overhead | Degrades with device count |
| Data richness | Full YANG model paths | MIB-constrained |
| Encoding | GPB (efficient) or JSON | ASN.1/BER |
| Best for | High-frequency AI fabric monitoring | Device discovery, capacity planning, legacy systems |
Log Correlation Workflow
```mermaid
flowchart TD
    A["Correlation Rule Configured (root-cause + related messages)"] --> B["Correlator Captures Matching Message"]
    B --> C["Start Timeout Timer"]
    C --> D["Continue Capturing Matching Messages"]
    D --> E{"Timer Expired?"}
    E -- No --> D
    E -- Yes --> F{"Root-Cause Message Received?"}
    F -- Yes --> G["Correlation Confirmed: Group All Related Messages"]
    G --> H["Suppress Duplicates & Highlight Root Cause"]
    H --> I["Deliver Correlated Alert to Operator"]
    F -- No --> J["No Correlation: Forward Messages Individually"]
    style G fill:#27ae60,color:#fff
    style J fill:#e74c3c,color:#fff
```
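The workflow above amounts to time-windowed grouping: hold matching messages for the timeout period and confirm the correlation only if the root-cause message arrives. This is a simplified model of that logic, not the actual correlator implementation:

```python
def correlate(messages, root_cause, window_s):
    """Group messages within window_s of the first match; confirm the group
    only if the root-cause message appears. messages: list of (ts, text)."""
    if not messages:
        return None
    start = messages[0][0]                       # first match starts the timer
    captured = [(ts, txt) for ts, txt in messages if ts - start <= window_s]
    if any(txt == root_cause for _, txt in captured):
        return {"root_cause": root_cause, "related": captured}
    return None  # timer expired without root cause: forward individually

# Illustrative event stream (timestamps in seconds from first match)
events = [
    (0, "ETH1/49 CRC errors"),
    (2, "ETH1/49 link flap"),      # the designated root-cause message
    (5, "NCCL timeout rank 12"),
]
group = correlate(events, root_cause="ETH1/49 link flap", window_s=10)
```

With the root cause present, the operator sees one correlated alert instead of three separate ones; without it, each message would be forwarded on its own.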
Cross-System Correlation Example
A distributed training job stalls. The correlation timeline:
| Time | Device | Event | Severity |
|---|---|---|---|
| 10:01 | Nexus 9364C | Interface Eth1/49 CRC errors | Warning |
| 10:01 | Nexus 9364C | Interface Eth1/49 flap | Error |
| 10:01 | GPU Server 3 | NCCL timeout on rank 12 | Error |
| 10:02 | Nexus Dashboard | Latency anomaly: Spine-Leaf 3 | Major |
| 10:02 | GPU Servers 1-8 | Training job checkpoint fail | Critical |
| 10:03 | Intersight | Known defect CSCxx12345 match | Info |
Root cause: a failing transceiver on Eth1/49 caused CRC errors, triggering a link flap, disrupting the NCCL collective operation, and stalling the entire training job. Intersight identified a matching known defect.
AI/ML Performance Optimization Workflow
```mermaid
flowchart LR
    A["1. Establish Baselines"] --> B["2. Monitor Continuously"]
    B --> C["3. Detect Deviations"]
    C --> D["4. Correlate Across Layers"]
    D --> E["5. Remediate & Verify"]
    E --> B
    style A fill:#1a5276,color:#fff
    style B fill:#2e86c1,color:#fff
    style C fill:#e67e22,color:#fff
    style D fill:#8e44ad,color:#fff
    style E fill:#27ae60,color:#fff
```
AI/ML Workload Performance Metrics

| Metric | What It Measures | Optimization Signal |
|---|---|---|
| GPU utilization | % of GPU compute cycles in use | Low utilization = data pipeline or network bottleneck |
| GPU memory utilization | % of GPU HBM in use | Near 100% = batch size at maximum; OOM = oversized model |
| Iteration time | Time per training step | Increasing = degradation in compute, network, or storage |
| Throughput (samples/sec) | Training samples per second | Primary measure of training efficiency |
| Network throughput per GPU | Bandwidth for collective ops | Should approach theoretical max during all-reduce |
| Data loading time | Time waiting for next batch | High = storage or data pipeline bottleneck |
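A quick derived signal from these metrics is the fraction of each training step spent waiting on data; the step timings below are hypothetical:

```python
def data_wait_fraction(iteration_s, data_loading_s):
    """Fraction of a training step spent waiting for the next batch."""
    return data_loading_s / iteration_s

# Hypothetical step timings: 0.50 s per iteration, 0.15 s waiting on data
frac = data_wait_fraction(0.50, 0.15)  # 0.30: storage/data pipeline suspect
```

A fraction near zero means the input pipeline keeps up with the GPUs; anything substantial points the investigation at storage or data loading rather than compute or the fabric.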
Key Takeaway: Effective AI infrastructure monitoring combines syslog for event logging, SNMP for structured queries, and streaming telemetry for real-time metrics. Log correlation across systems transforms isolated events into actionable incident narratives for rapid root-cause analysis.
Animation Slot: Interactive timeline showing cross-system log correlation -- click events to see how CRC errors cascade through the stack to training job failure
Post-Quiz: Log Correlation and Performance Analysis
1. In the log correlation process, what happens if the root-cause message is NOT received before the timeout expires?
A) All messages are suppressed
B) The timer restarts
C) No correlation occurs; messages are forwarded individually
D) A critical alert is generated
2. What syslog severity level should be the minimum for production switch logging?
A) 3 (Error)
B) 5 (Notification)
C) 7 (Debug)
D) 0 (Emergency)
3. In the cross-system correlation example, what was the root cause of the training job failure?
A) GPU memory overflow
B) A failing transceiver causing CRC errors and link flap
C) Storage subsystem latency
D) NCCL software bug
4. Which SNMPv3 security feature ensures packets have not been tampered with in transit?
A) Encryption
B) Authentication
C) Message integrity
D) Access control lists
5. What does increasing iteration time during a training job signal?
A) The model is converging faster
B) Degradation in compute, network, or storage performance