Configure GPU scheduling and resource requests in Kubernetes pods
Deploy and manage NVIDIA device plugins and GPU operators
Implement GPU sharing strategies including MIG and time-slicing
Troubleshoot common GPU allocation and driver issues
Section 1: GPU Fundamentals for Kubernetes
Pre-Quiz — What do you already know?
1. What is the primary role of a Kubernetes device plugin for GPUs?
It compiles CUDA code inside containers at runtime
It discovers GPU hardware, advertises it as extended resources, and handles device allocation to pods
It replaces the container runtime to support GPU workloads
It installs NVIDIA drivers directly onto worker node operating systems
2. When a pod requests nvidia.com/gpu: 1, what component is responsible for mounting the actual GPU device files into the container?
The Kubernetes API server
The kube-scheduler
The NVIDIA Container Toolkit, after the device plugin allocates the device
The etcd database
3. How does the NVIDIA device plugin communicate with the kubelet?
Through the Kubernetes API server using REST calls
Via gRPC over a Unix socket at /var/lib/kubelet/device-plugins/
Through environment variables set in the pod spec
By writing directly to the container filesystem
4. What is the key advantage of the Container Device Interface (CDI) over the traditional device plugin approach?
CDI provides faster GPU computation speeds
CDI decouples device injection from the container runtime, improving portability across containerd and CRI-O
CDI automatically installs GPU drivers on new nodes
CDI replaces the need for CUDA libraries entirely
5. Why does the NVIDIA Container Toolkit inject driver libraries at runtime rather than requiring them in every container image?
Container images cannot contain binary libraries
It decouples the workload image from the host driver version, so a single image works across nodes with different driver versions
Runtime injection is faster than pre-baked libraries
NVIDIA licensing prevents including drivers in container images
The GPU Landscape for AI
GPUs were originally designed for rendering pixels in parallel. That same parallel architecture — thousands of smaller compute cores working simultaneously — is exactly what neural network training and inference require. Three major vendors compete in the Kubernetes AI workload space:
| Vendor | Key Products | Kubernetes Support | Notes |
| --- | --- | --- | --- |
| NVIDIA | A100, H100, L40, RTX 40-series | Mature (device plugin + GPU Operator) | Dominant; CUDA ecosystem widely supported |
| AMD | Instinct MI300, RX 7900 XTX | Stable (ROCm device plugin) | Growing adoption; ROCm is AMD's CUDA equivalent |
| Intel | Gaudi 2/3, Arc GPUs | Emerging (Intel Device Plugin) | Strong for data center inference; OpenVINO |
NVIDIA dominates Kubernetes AI workloads because the CUDA (Compute Unified Device Architecture) ecosystem has been the default target for nearly every major AI framework (PyTorch, TensorFlow, JAX). However, the architectural patterns — device plugins, resource requests, sharing strategies — apply across all vendors.
Device Plugin Architecture and Discovery
Kubernetes was designed around a generic resource model. GPUs are not built-in concepts — they are extended resources: arbitrary named quantities a node advertises and a pod can request. The device plugin framework bridges physical GPU hardware and the Kubernetes resource model.
A device plugin is a small program running as a DaemonSet on each GPU node. It communicates with the kubelet over a Unix socket using gRPC and performs three core functions:
Discovery — Detects available devices using NVML (NVIDIA Management Library)
Advertisement — Registers devices as extended resources (e.g., nvidia.com/gpu: 4) so the scheduler is aware of them
Allocation — When a pod needs a GPU, the plugin injects the correct device files and environment variables into the container
Animation: Device Plugin Architecture — Discovery, Advertisement, and Allocation
Container Runtime GPU Passthrough
Even with a device plugin advertising GPU resources, the container itself still needs to use those GPUs. The NVIDIA Container Toolkit (formerly nvidia-docker2) hooks into the container runtime (containerd or CRI-O) and, at container launch time, injects the correct GPU devices and driver libraries without requiring those libraries in every container image.
What happens when a GPU pod starts:
Pod spec requests nvidia.com/gpu: 1
Scheduler finds a node with available nvidia.com/gpu capacity
kubelet calls the NVIDIA device plugin to allocate one GPU
Device plugin returns device paths (/dev/nvidia0) and environment variables
NVIDIA Container Toolkit intercepts the container start, mounts /dev/nvidia0, and injects CUDA libraries
Container starts with full GPU access
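The first step in that sequence — the pod spec — can be as minimal as the following sketch (the image tag and pod name are illustrative; any CUDA-enabled image works):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04  # illustrative CUDA base image
      command: ["nvidia-smi"]                             # prints GPU info if passthrough works
      resources:
        limits:
          nvidia.com/gpu: 1   # extended resource; triggers the allocation flow above
```

If everything in the chain is working, the pod's logs show the `nvidia-smi` table for the single allocated GPU.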
Key Takeaway
GPUs enter Kubernetes as extended resources via the device plugin framework
The NVIDIA device plugin runs as a DaemonSet, discovers GPUs with NVML, and advertises them as nvidia.com/gpu
The device plugin communicates with kubelet via gRPC over a Unix socket
The Container Toolkit handles runtime injection of GPU drivers and device files into containers — decoupling workload images from host driver versions
CDI (Container Device Interface) is an emerging standard for more portable device injection
Post-Quiz — What did you learn?
1. What is the primary role of a Kubernetes device plugin for GPUs?
It compiles CUDA code inside containers at runtime
It discovers GPU hardware, advertises it as extended resources, and handles device allocation to pods
It replaces the container runtime to support GPU workloads
It installs NVIDIA drivers directly onto worker node operating systems
2. When a pod requests nvidia.com/gpu: 1, what component is responsible for mounting the actual GPU device files into the container?
The Kubernetes API server
The kube-scheduler
The NVIDIA Container Toolkit, after the device plugin allocates the device
The etcd database
3. How does the NVIDIA device plugin communicate with the kubelet?
Through the Kubernetes API server using REST calls
Via gRPC over a Unix socket at /var/lib/kubelet/device-plugins/
Through environment variables set in the pod spec
By writing directly to the container filesystem
4. What is the key advantage of the Container Device Interface (CDI) over the traditional device plugin approach?
CDI provides faster GPU computation speeds
CDI decouples device injection from the container runtime, improving portability across containerd and CRI-O
CDI automatically installs GPU drivers on new nodes
CDI replaces the need for CUDA libraries entirely
5. Why does the NVIDIA Container Toolkit inject driver libraries at runtime rather than requiring them in every container image?
Container images cannot contain binary libraries
It decouples the workload image from the host driver version, so a single image works across nodes with different driver versions
Runtime injection is faster than pre-baked libraries
NVIDIA licensing prevents including drivers in container images
Section 2: NVIDIA GPU Operator
Pre-Quiz — What do you already know?
1. What problem does the NVIDIA GPU Operator solve that manual GPU management does not?
It provides GPU hardware at a lower cost
It automates the entire GPU software lifecycle — drivers, runtime config, device plugin, monitoring — using a reconciliation loop
It increases the number of CUDA cores available per GPU
It replaces Kubernetes scheduling entirely for GPU workloads
2. Why does the GPU Operator deploy NVIDIA drivers as a container rather than installing them directly on the host?
Containerized drivers run faster than native kernel modules
It avoids host OS version lock-in and simplifies node upgrades by keeping driver lifecycle independent of the host
Host-installed drivers are not compatible with Kubernetes
Container images are required by NVIDIA licensing
3. What happens if the CUDA Validator component in the GPU Operator's provisioning workflow fails?
The GPU Operator ignores the failure and proceeds to schedule workloads
The node is immediately drained and removed from the cluster
The node remains unschedulable for GPU workloads; the Operator logs the error and retries provisioning
All other GPU nodes in the cluster are also paused until the issue is resolved
4. What is the purpose of GPU Feature Discovery (GFD)?
It discovers new GPU hardware models before they are released
It automatically labels nodes with detailed GPU metadata (model, memory, CUDA version, MIG capability) for scheduling
It discovers and installs missing CUDA libraries on worker nodes
It monitors GPU temperature and automatically throttles workloads
5. A pod requires an 80 GB A100 GPU with MIG support. How can this requirement be expressed in Kubernetes without manually maintaining node pools?
By setting a CPU request equal to the number of GPU cores needed
By using node affinity rules that match GFD-generated labels like nvidia.com/gpu.product and nvidia.com/mig.capable
By deploying a separate scheduler exclusively for A100 nodes
By adding a ConfigMap that maps pod names to specific nodes
The Operator Pattern for GPU Management
Managing GPU nodes manually — installing drivers, configuring the container toolkit, deploying the device plugin, setting up monitoring — is a fragile, error-prone process. The NVIDIA GPU Operator encodes this operational knowledge into a Kubernetes Operator: a controller that continuously watches cluster state and reconciles it toward the desired GPU software configuration.
Components Managed by the GPU Operator
| Component | Purpose |
| --- | --- |
| NVIDIA Drivers | Kernel module enabling CUDA; deployed as a privileged container |
| Container Toolkit | Hooks the container runtime to inject GPU access into pods |
| Device Plugin | Advertises nvidia.com/gpu extended resources to the scheduler |
| GPU Feature Discovery | Labels nodes with GPU metadata (model, memory, CUDA version) |
| DCGM Exporter | Exposes GPU metrics (utilization, temperature, memory) for Prometheus |
| MIG Manager | Configures MIG partitioning on supported hardware |
| CUDA Validator | Runs a test workload to confirm CUDA is functional before scheduling |
Installation via Helm
```shell
# Add the NVIDIA Helm repository
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

# Create the namespace and label it for privileged pod security
kubectl create namespace gpu-operator
kubectl label --overwrite namespace gpu-operator \
  pod-security.kubernetes.io/enforce=privileged

# Install the GPU Operator
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --wait
```
Automated Provisioning Workflow
For each new GPU node, the Operator follows a three-step workflow:
Discovery — Identifies nodes with NVIDIA GPUs using labels and hardware detection
Installation and Configuration — Deploys the driver container, configures the Container Toolkit, starts the device plugin and monitoring stack
Validation — The CUDA Validator runs a test workload; only after it passes does the node accept GPU workloads
The validation gate is critical: it prevents misconfigured nodes from silently accepting GPU pods that would then fail at runtime.
Node Labeling and GPU Feature Discovery
GPU Feature Discovery (GFD) automatically labels nodes with rich GPU metadata, enabling fine-grained scheduling decisions. Example labels on a node with an A100:
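The exact label set varies with the GFD and driver versions, but on an A100 node it typically includes entries like these (the values shown are illustrative):

```yaml
nvidia.com/gpu.product: NVIDIA-A100-SXM4-80GB
nvidia.com/gpu.memory: "81920"        # MiB
nvidia.com/gpu.count: "8"
nvidia.com/cuda.driver.major: "550"   # host driver major version
nvidia.com/mig.capable: "true"
```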
With these labels, pods can express hardware requirements as node affinity rules rather than relying on manually maintained node pools.
Key Takeaway
The GPU Operator automates the entire GPU software lifecycle — drivers, runtime, device plugin, monitoring — using the Kubernetes Operator pattern
Drivers are deployed as containerized privileged workloads, avoiding host OS version lock-in
The CUDA Validator is a critical gate that prevents misconfigured nodes from receiving workloads
GFD labels provide rich node metadata (GPU model, memory, CUDA version, MIG capability) enabling intelligent scheduling
Post-Quiz — What did you learn?
1. What problem does the NVIDIA GPU Operator solve that manual GPU management does not?
It provides GPU hardware at a lower cost
It automates the entire GPU software lifecycle — drivers, runtime config, device plugin, monitoring — using a reconciliation loop
It increases the number of CUDA cores available per GPU
It replaces Kubernetes scheduling entirely for GPU workloads
2. Why does the GPU Operator deploy NVIDIA drivers as a container rather than installing them directly on the host?
Containerized drivers run faster than native kernel modules
It avoids host OS version lock-in and simplifies node upgrades by keeping driver lifecycle independent of the host
Host-installed drivers are not compatible with Kubernetes
Container images are required by NVIDIA licensing
3. What happens if the CUDA Validator component in the GPU Operator's provisioning workflow fails?
The GPU Operator ignores the failure and proceeds to schedule workloads
The node is immediately drained and removed from the cluster
The node remains unschedulable for GPU workloads; the Operator logs the error and retries provisioning
All other GPU nodes in the cluster are also paused until the issue is resolved
4. What is the purpose of GPU Feature Discovery (GFD)?
It discovers new GPU hardware models before they are released
It automatically labels nodes with detailed GPU metadata (model, memory, CUDA version, MIG capability) for scheduling
It discovers and installs missing CUDA libraries on worker nodes
It monitors GPU temperature and automatically throttles workloads
5. A pod requires an 80 GB A100 GPU with MIG support. How can this requirement be expressed in Kubernetes without manually maintaining node pools?
By setting a CPU request equal to the number of GPU cores needed
By using node affinity rules that match GFD-generated labels like nvidia.com/gpu.product and nvidia.com/mig.capable
By deploying a separate scheduler exclusively for A100 nodes
By adding a ConfigMap that maps pod names to specific nodes
Section 3: GPU Scheduling and Resource Requests
Pre-Quiz — What do you already know?
1. A pod spec requests nvidia.com/gpu: 1 but does not include a toleration for the nvidia.com/gpu:NoSchedule taint. What happens?
The pod is scheduled on the GPU node but runs without GPU access
The pod remains Pending because it cannot be scheduled on any tainted GPU node
The scheduler automatically adds the missing toleration
The pod is placed on a CPU-only node with emulated GPU access
2. Why must GPU resource requests and limits be equal in a Kubernetes pod spec?
Because GPUs are too expensive to overcommit — Kubernetes enforces 1:1 allocation for extended resources
This is just a recommendation, not a requirement
Because GPU memory cannot be measured in fractional units
Because the device plugin does not support the requests field
3. A cluster has both A100 nodes (for training) and T4 nodes (for inference). What mechanism prevents a training pod from landing on a T4 node?
The Kubernetes scheduler automatically detects training workloads and routes them to powerful GPUs
Node affinity rules matching GFD labels like nvidia.com/gpu.product to the required GPU model
Setting the GPU request to a number larger than T4 node capacity
Using a ConfigMap that lists approved nodes per workload type
4. NVLink provides up to 900 GB/s bandwidth between GPUs. Why does this matter for multi-GPU training jobs?
NVLink increases the number of CUDA cores available per GPU
NVLink reduces the time to copy gradient tensors between GPUs during distributed training, up to roughly 28x faster than PCIe 4.0
NVLink allows GPUs to share the same VRAM pool
NVLink eliminates the need for a scheduler by connecting GPUs directly
5. What limitation of the standard Kubernetes scheduler makes topology-aware scheduling necessary?
The scheduler cannot count GPU resources correctly
The scheduler treats extended resources as opaque integer counts and cannot reason about GPU topology, memory bandwidth, or NUMA locality
The scheduler does not support multi-GPU pods
The scheduler requires a GPU-specific API extension to function
Requesting GPUs in Pod Specs
Requesting a GPU follows the same resources.limits pattern as CPU and memory, with two constraints: GPU requests and limits must match, and the values must be whole numbers (no fractional GPUs by default).
The tolerations section is critical: GPU nodes are typically tainted with nvidia.com/gpu:NoSchedule to prevent non-GPU workloads from consuming expensive GPU node resources. A pod must explicitly tolerate this taint.
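Putting both pieces together, a training pod that requests one GPU and tolerates the GPU-node taint might look like this sketch (pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: train-job
spec:
  containers:
    - name: trainer
      image: pytorch/pytorch:latest   # illustrative training image
      resources:
        limits:
          nvidia.com/gpu: 1           # must equal the request; whole numbers only
  tolerations:
    - key: nvidia.com/gpu             # matches the taint applied to GPU nodes
      operator: Exists
      effect: NoSchedule
```

Without the tolerations block, this pod would remain Pending if every GPU node in the cluster carries the nvidia.com/gpu:NoSchedule taint.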
Node Affinity and GPU Topology Awareness
When a cluster has multiple GPU types, raw GPU count is insufficient. Node affinity rules allow targeting specific GPU models using GFD-generated labels:
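A pod that must land on an A100 node can pin itself to the GFD product label with a standard node affinity rule — a sketch, assuming the label value emitted for an 80 GB SXM4 A100:

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: nvidia.com/gpu.product   # GFD-generated label
              operator: In
              values:
                - NVIDIA-A100-SXM4-80GB
```

Because the label is applied automatically by GFD, this rule keeps working as A100 nodes are added or removed — no manually curated node pool required.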
Animation: GPU Scheduling Flow — Pod Spec to GPU Allocation
GPU Interconnect Bandwidth Comparison
| Interconnect | Bandwidth | Typical Use Case |
| --- | --- | --- |
| NVLink (4th gen, H100) | 900 GB/s | Multi-GPU LLM training on a single node |
| NVLink (3rd gen, A100) | 600 GB/s | Distributed training, large model sharding |
| PCIe 4.0 x16 | 32 GB/s | Inference, single-GPU training |
| InfiniBand HDR (inter-node) | 200 Gb/s | Multi-node distributed training |
Extended Resources and Custom Schedulers
The standard Kubernetes scheduler treats extended resources as opaque integer counts. It can count and subtract, but it cannot reason about GPU topology, memory bandwidth, or NUMA locality. For topology-aware placement, the GPU Operator integrates with the Topology Manager and NUMA-aware scheduler to co-locate GPU and CPU resources on the same NUMA node.
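On the kubelet side, NUMA alignment is controlled by the Topology Manager policy in the KubeletConfiguration. A minimal sketch, using the strictest policy:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
topologyManagerPolicy: single-numa-node   # reject pods whose resources cannot be NUMA-aligned
cpuManagerPolicy: static                  # CPU pinning, so CPUs can be co-located with the GPU
```

With single-numa-node, a pod requesting a GPU plus pinned CPUs is admitted only if both can be satisfied from the same NUMA domain, avoiding cross-socket PCIe traffic.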
Key Takeaway
GPU resources use nvidia.com/gpu in pod specs; requests and limits must match
Tolerations are required to schedule on tainted GPU nodes
Topology-aware scheduling maximizes NVLink bandwidth (up to 28x faster than PCIe) for multi-GPU training
The Topology Manager and NUMA-aware scheduler co-locate GPU and CPU on the same NUMA domain
Post-Quiz — What did you learn?
1. A pod spec requests nvidia.com/gpu: 1 but does not include a toleration for the nvidia.com/gpu:NoSchedule taint. What happens?
The pod is scheduled on the GPU node but runs without GPU access
The pod remains Pending because it cannot be scheduled on any tainted GPU node
The scheduler automatically adds the missing toleration
The pod is placed on a CPU-only node with emulated GPU access
2. Why must GPU resource requests and limits be equal in a Kubernetes pod spec?
Because GPUs are too expensive to overcommit — Kubernetes enforces 1:1 allocation for extended resources
This is just a recommendation, not a requirement
Because GPU memory cannot be measured in fractional units
Because the device plugin does not support the requests field
3. A cluster has both A100 nodes (for training) and T4 nodes (for inference). What mechanism prevents a training pod from landing on a T4 node?
The Kubernetes scheduler automatically detects training workloads and routes them to powerful GPUs
Node affinity rules matching GFD labels like nvidia.com/gpu.product to the required GPU model
Setting the GPU request to a number larger than T4 node capacity
Using a ConfigMap that lists approved nodes per workload type
4. NVLink provides up to 900 GB/s bandwidth between GPUs. Why does this matter for multi-GPU training jobs?
NVLink increases the number of CUDA cores available per GPU
NVLink reduces the time to copy gradient tensors between GPUs during distributed training, up to roughly 28x faster than PCIe 4.0
NVLink allows GPUs to share the same VRAM pool
NVLink eliminates the need for a scheduler by connecting GPUs directly
5. What limitation of the standard Kubernetes scheduler makes topology-aware scheduling necessary?
The scheduler cannot count GPU resources correctly
The scheduler treats extended resources as opaque integer counts and cannot reason about GPU topology, memory bandwidth, or NUMA locality
The scheduler does not support multi-GPU pods
The scheduler requires a GPU-specific API extension to function
Section 4: GPU Sharing and Multi-Tenancy
Pre-Quiz — What do you already know?
1. What is the fundamental difference between MIG partitioning and time-slicing?
MIG is software-based while time-slicing uses hardware isolation
MIG creates hardware-enforced partitions with dedicated compute and memory, while time-slicing shares everything via rapid context switching
MIG works on any GPU while time-slicing requires Ampere hardware
MIG and time-slicing are identical in isolation; they differ only in configuration
2. An A100 is configured with 7 MIG instances at the 1g.10gb profile, and each instance is time-sliced into 4 replicas. How many total pod allocations does this produce?
7
11
28
56
3. Why is MPS (Multi-Process Service) preferred over time-slicing for replicated inference workloads?
MPS provides hardware-level isolation between processes
MPS funnels multiple CUDA processes through a single shared context, reducing context-switching overhead and improving throughput
MPS can split a GPU into more partitions than time-slicing
MPS is the only sharing method that works on pre-Ampere hardware
4. Which combination of GPU sharing strategies is NOT supported?
MIG + time-slicing
MIG + MPS on the same GPU
Time-slicing on a non-MIG GPU
MIG alone without time-slicing
5. A naively scheduled GPU cluster typically achieves about 13% GPU utilization. With advanced GPU sharing strategies, what utilization level can production clusters achieve?
About 25%
About 45%
Over 80%
Close to 100%
The GPU Utilization Problem
A single NVIDIA A100 costs $10,000–$30,000. Running one small inference workload at 5% utilization is financially untenable. GPU sharing runs multiple workloads on a single GPU to increase utilization and reduce per-workload cost. NVIDIA provides three distinct sharing mechanisms:
| Strategy | Isolation | Hardware Req. | Latency Impact | Best For |
| --- | --- | --- | --- | --- |
| MIG | Hardware (hard) | Ampere+ (A100, H100) | None | Production inference with SLAs |
| Time-Slicing | None (soft) | Any NVIDIA GPU | Context switch jitter | Dev/test, bursty workloads |
| MPS | Process-level (soft) | Any NVIDIA GPU | Low overhead | Throughput-focused inference replicas |
MIG (Multi-Instance GPU) Partitioning
MIG is a hardware feature on Ampere+ architecture that divides one physical GPU into up to 7 independent instances, each with dedicated compute engines, memory bandwidth, L2 cache, and DRAM. One workload in a MIG instance cannot read memory from another instance or interfere with its performance.
Animation: MIG Partitioning — One A100 GPU, Seven Isolated Instances
MIG Partition Profiles (A100 80GB)
| Profile | Compute Fraction | Memory | Max Instances |
| --- | --- | --- | --- |
| 1g.10gb | 1/7 | ~10 GB | 7 |
| 2g.20gb | 2/7 | ~20 GB | 3 |
| 3g.40gb | 3/7 | ~40 GB | 2 |
| 4g.40gb | 4/7 | ~40 GB | 1 |
| 7g.80gb | 7/7 | ~80 GB | 1 (whole GPU) |
Enabling MIG with the GPU Operator
```shell
# Label the node with the desired MIG strategy
kubectl label node <gpu-node> nvidia.com/mig.config=all-1g.10gb

# The GPU Operator's MIG Manager detects the label change and
# reconfigures the GPU automatically. Verify the result:
kubectl describe node <gpu-node> | grep nvidia.com/mig
```
After configuration, the node advertises nvidia.com/mig-1g.10gb: 7 as an extended resource. Pods request MIG slices explicitly:
```yaml
resources:
  limits:
    nvidia.com/mig-1g.10gb: 1
```
Time-Slicing GPUs Across Pods
Time-slicing works on any NVIDIA GPU. It configures the device plugin to advertise each physical GPU as multiple virtual GPUs via a ConfigMap. The GPU rapidly context-switches between active workloads.
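A typical sharing ConfigMap looks like the following sketch (the data key name and the replica count are illustrative; the GPU Operator selects a key via its device plugin configuration settings):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4   # each physical GPU is advertised as 4 allocatable GPUs
```

Once applied, a node with one physical GPU reports nvidia.com/gpu: 4, and up to four pods can be scheduled onto it concurrently.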
Critical limitations: No memory isolation (all pods share VRAM), no compute isolation, latency jitter from context switching, no fair scheduling guarantees. Time-slicing is appropriate for dev/test clusters and bursty workloads, not production inference with SLAs.
MPS (Multi-Process Service)
MPS funnels multiple CUDA processes through a single shared CUDA context via a server daemon, reducing context-switching overhead. Ideal for replicated inference: running multiple copies of the same model with higher aggregate throughput than time-slicing.
Constraints: MPS and time-slicing cannot coexist. MPS is not supported on MIG-enabled devices. A crash in one CUDA process can destabilize the shared context — best for trusted workloads only.
Combining Strategies: MIG + Time-Slicing
For maximum density on A100/H100 hardware, MIG and time-slicing can be combined: 7 MIG instances × 4 time-sliced replicas = 28 pod allocations from a single physical GPU. Each MIG instance provides hardware memory isolation, while time-slicing adds density within each partition.
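In the combined setup, the time-slicing entry targets the MIG resource name rather than nvidia.com/gpu — a sketch, assuming the all-1g.10gb profile shown earlier and an illustrative replica count:

```yaml
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/mig-1g.10gb
        replicas: 4   # 7 MIG instances x 4 replicas = 28 allocations per A100
```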
Cost Optimization Impact
Production clusters using advanced GPU sharing can move utilization from the typical 13% baseline to over 80%. The decision framework: isolation requirements first, hardware capabilities second, workload patterns third.
Key Takeaway
MIG provides hardware-enforced isolation (dedicated compute + memory) — right for production workloads with SLAs; requires Ampere+ hardware
Time-slicing is simplest but offers no isolation and introduces latency jitter — best for dev/test
MPS reduces sharing overhead for throughput-oriented inference replicas but provides weaker fault isolation
MIG + time-slicing can produce up to 28 allocations per A100
MIG + MPS is not supported; time-slicing + MPS cannot coexist
Proper GPU sharing can increase cluster utilization from 13% to over 80%
Post-Quiz — What did you learn?
1. What is the fundamental difference between MIG partitioning and time-slicing?
MIG is software-based while time-slicing uses hardware isolation
MIG creates hardware-enforced partitions with dedicated compute and memory, while time-slicing shares everything via rapid context switching
MIG works on any GPU while time-slicing requires Ampere hardware
MIG and time-slicing are identical in isolation; they differ only in configuration
2. An A100 is configured with 7 MIG instances at the 1g.10gb profile, and each instance is time-sliced into 4 replicas. How many total pod allocations does this produce?
7
11
28
56
3. Why is MPS (Multi-Process Service) preferred over time-slicing for replicated inference workloads?
MPS provides hardware-level isolation between processes
MPS funnels multiple CUDA processes through a single shared context, reducing context-switching overhead and improving throughput
MPS can split a GPU into more partitions than time-slicing
MPS is the only sharing method that works on pre-Ampere hardware
4. Which combination of GPU sharing strategies is NOT supported?
MIG + time-slicing
MIG + MPS on the same GPU
Time-slicing on a non-MIG GPU
MIG alone without time-slicing
5. A naively scheduled GPU cluster typically achieves about 13% GPU utilization. With advanced GPU sharing strategies, what utilization level can production clusters achieve?
About 25%
About 45%
Over 80%
Close to 100%