Chapter 7: MLOps and ML Pipelines on Kubernetes

Running AI Workloads on Kubernetes — Interactive Study Guide

Learning Objectives

Pre-Study Check — What do you already know?

1. What type of computation graph do ML pipelines use to represent step dependencies?

2. What does KFP v2's pipeline caching avoid when component inputs have not changed?

3. In MLflow's Model Registry, what does the "champion" alias indicate?

4. What problem does a feature store like Feast primarily solve?

5. In a GitOps workflow, what is the single source of truth for cluster state?

6. Which KFP v2 decorator wraps a non-Python tool or shell script into a pipeline step?

7. Why should ML pipeline steps pin container images by digest rather than tag?

8. What is the role of Argo Workflows in the Kubeflow Pipelines architecture?

9. Which Feast component provides low-latency feature lookups during model serving?

10. What does a validation gate do in an ML CI/CD pipeline?

11. Which tool is a general-purpose Kubernetes CI/CD framework often used alongside ArgoCD in MLOps?

12. In Feast, what process moves computed features from the offline store to Redis for serving?

Section 1: ML Pipeline Orchestration

An ML pipeline is a directed acyclic graph (DAG) of computational steps. Each node is a containerized task; edges carry data artifacts or parameter values from one task to the next. Getting this orchestration right on Kubernetes is the foundation of production MLOps.
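The dependency ordering a DAG implies can be sketched with the standard library's topological sorter (step names are illustrative, mirroring the three-step pipeline later in this section):

```python
from graphlib import TopologicalSorter

# Each key maps a step to the steps it depends on.
deps = {
    "train": {"preprocess"},                 # train consumes preprocess's dataset
    "evaluate": {"preprocess", "train"},     # evaluate needs both upstream outputs
}

# static_order() yields steps so that every dependency runs first.
order = list(TopologicalSorter(deps).static_order())
```

An orchestrator like Argo Workflows does essentially this, but schedules each node as a Kubernetes pod and moves artifacts along the edges.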

Kubeflow Pipelines v2 Architecture and SDK

Kubeflow Pipelines (KFP) provides a Python SDK for defining pipelines plus backend services for running them on any Kubernetes cluster. KFP v2 introduced major improvements over v1:

Feature | KFP v1 | KFP v2
Component decorator | @component (limited) | @dsl.component and @dsl.container_component
Intermediate representation | Argo YAML | Backend-agnostic IR
Artifact visibility | Hidden implementation detail | First-class DAG nodes
Nested pipelines | Not supported | Pipelines as pipeline components
Single component execution | Full pipeline required | Run individual components

The @dsl.component decorator converts a plain Python function into a self-contained pipeline step that KFP can containerize and schedule. The @dsl.container_component decorator gives precise control over the container command — useful when wrapping non-Python tools or shell scripts.

A Three-Step Training Pipeline

from kfp import dsl
from kfp.dsl import Dataset, Model, Output, Input

@dsl.component(base_image="python:3.11-slim",
               packages_to_install=["pandas", "scikit-learn"])
def preprocess(raw_data_path: str, dataset: Output[Dataset]):
    import pandas as pd
    df = pd.read_csv(raw_data_path).dropna()
    df.to_csv(dataset.path, index=False)

@dsl.component(base_image="python:3.11-slim",
               packages_to_install=["pandas", "scikit-learn"])
def train(dataset: Input[Dataset], model: Output[Model],
          n_estimators: int = 100):
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    import joblib
    df = pd.read_csv(dataset.path)
    X, y = df.drop("label", axis=1), df["label"]
    clf = RandomForestClassifier(n_estimators=n_estimators)
    clf.fit(X, y)
    joblib.dump(clf, model.path)

@dsl.component(base_image="python:3.11-slim",
               packages_to_install=["pandas", "scikit-learn"])
def evaluate(dataset: Input[Dataset], model: Input[Model]):
    # Score the trained model; shown on the prep dataset for brevity,
    # a real pipeline would hold out a separate test split.
    import pandas as pd
    import joblib
    from sklearn.metrics import accuracy_score
    df = pd.read_csv(dataset.path)
    X, y = df.drop("label", axis=1), df["label"]
    clf = joblib.load(model.path)
    print("accuracy:", accuracy_score(y, clf.predict(X)))

@dsl.pipeline(name="random-forest-pipeline")
def rf_pipeline(data_path: str, n_estimators: int = 100):
    prep = preprocess(raw_data_path=data_path)
    fit  = train(dataset=prep.outputs["dataset"],
                 n_estimators=n_estimators)
    evaluate(dataset=prep.outputs["dataset"],
             model=fit.outputs["model"])

Notice how Dataset and Model types are declared as Output and Input parameters. KFP v2 surfaces these as first-class artifact nodes in the pipeline visualization, so engineers can trace exactly which model artifact came from which training run.


Figure 7.1 — ML Pipeline DAG: steps execute in dependency order, passing typed artifacts along edges

Argo Workflows for ML DAGs

KFP compiles Python pipelines into an intermediate representation executed by Argo Workflows as the default backend. Argo can also be used directly — teams working in multiple languages or wrapping legacy tools often prefer writing Argo YAML. Argo provides DAG-based execution, parameter passing, S3/GCS artifact management, and configurable retry policies.

Tekton Pipelines for ML CI/CD

Tekton occupies a different niche: it is a general-purpose CI/CD framework on Kubernetes. Its strength in MLOps is in the CI/CD layer — building container images, running linting and unit tests before training, and triggering downstream deployments after model validation. Tekton integrates naturally with ArgoCD to create a full GitOps deployment chain.

Pipeline Parameterization and Caching

Effective pipelines treat all tunable values (data paths, hyperparameters, resource settings) as explicit parameters rather than hard-coded constants, so the same pipeline definition can be reused across datasets, environments, and experiments without code changes.

Pipeline caching is a standout feature: when a component's inputs (parameter values and artifact content hashes) match a previous execution, KFP reuses the cached output. Change n_estimators and only train and evaluate re-execute; preprocess returns instantly from cache.
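The cache-key idea can be sketched in a few lines of plain Python (a simplification of what the KFP backend actually stores):

```python
import hashlib
import json

def cache_key(component_name: str, params: dict, artifact_bytes: bytes) -> str:
    """Key = component identity + parameter values + content hash of inputs.

    If nothing in this tuple changes between runs, the cached output
    can be reused instead of re-executing the step (sketch, not KFP's
    exact key format).
    """
    h = hashlib.sha256()
    h.update(component_name.encode())
    h.update(json.dumps(params, sort_keys=True).encode())
    h.update(hashlib.sha256(artifact_bytes).hexdigest().encode())
    return h.hexdigest()

k_same_1 = cache_key("train", {"n_estimators": 100}, b"dataset-rows")
k_same_2 = cache_key("train", {"n_estimators": 100}, b"dataset-rows")
k_changed = cache_key("train", {"n_estimators": 200}, b"dataset-rows")
```

Identical inputs produce identical keys (cache hit); changing one parameter produces a new key, so only the affected steps re-run.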

Key Takeaways — Section 1

Section 2: Experiment Tracking and Model Registry

MLflow on Kubernetes

Every training run produces metrics, parameters, and artifacts. Without a system to capture them, teams lose track of which configuration produced which result. MLflow is the most widely adopted open-source solution, deployed on Kubernetes as a three-tier architecture:

Kubernetes Object | Purpose
Deployment | Runs the MLflow tracking server as a scalable pod
Service + Ingress | Exposes the UI and REST API
StatefulSet (PostgreSQL) | Persists run metadata and model registry entries
Secret / ConfigMap | Stores database credentials and S3 endpoint configuration

Logging a run against the in-cluster tracking server looks like this:
import mlflow

mlflow.set_tracking_uri(
    "http://mlflow-service.mlops.svc.cluster.local:5000"
)
mlflow.set_experiment("fraud-detection-v2")

with mlflow.start_run():
    mlflow.log_param("n_estimators", 200)
    mlflow.log_param("max_depth", 8)
    mlflow.log_metric("accuracy", 0.942)
    mlflow.log_metric("f1_score", 0.917)
    mlflow.sklearn.log_model(clf, "model")

Model Versioning with the MLflow Registry

The MLflow Model Registry provides a versioned catalog of model artifacts with semantic aliases that communicate production status:

Alias | Meaning
champion | The model currently serving production traffic
challenger | A candidate model undergoing A/B testing
shadow | A model receiving traffic copies for offline evaluation
archived | A retired model retained for audit purposes

Promotion between stages is a deliberate, audited action — not an automatic side-effect of training. Downstream systems reference aliases rather than hard-coded version numbers, enabling safe canary promotions and rollbacks.
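The mechanics can be illustrated with a toy alias map (plain Python, not the MLflow API; version numbers are invented):

```python
# Aliases point at versions; consumers resolve the alias, never the version.
aliases = {"champion": "v12", "challenger": "v13"}

def promote_challenger(aliases: dict) -> dict:
    """Promotion is an explicit alias reassignment.

    Every downstream system that resolves "champion" picks up the new
    version with no configuration change; rollback is the reverse edit.
    """
    updated = dict(aliases)
    updated["champion"] = updated.pop("challenger")
    return updated

new_aliases = promote_challenger(aliases)
```

In MLflow itself, downstream code resolves aliases through model URIs such as `models:/fraud-detection@champion`.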

stateDiagram-v2
    [*] --> Registered : mlflow.register_model()
    Registered --> Challenger : Manual promotion (team review)
    Challenger --> Shadow : A/B test approved
    Shadow --> Champion : Shadow validation passed
    Champion --> Archived : New champion promoted
    Challenger --> Archived : Validation failed

W&B and Neptune Integration

For richer visualization and real-time collaboration, managed services like Weights & Biases and Neptune extend the MLflow pattern. Both integrate via environment variables and lightweight SDK calls. The key trade-off: self-hosted MLflow keeps data governance simple; managed SaaS reduces operational burden.

Key Takeaways — Section 2

Section 3: Feature Stores on Kubernetes

The Training-Serving Skew Problem

Imagine building a fraud detection model. During training, you compute "transactions in the last 24 hours" using a batch SQL query. In production, the same feature must be computed in real time from a live event stream. If these two computations diverge even slightly, the model degrades silently. This is training-serving skew — one of the most common causes of silent model degradation in production.
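A toy example of how even an off-by-one window boundary produces skew (timestamps are invented):

```python
from datetime import datetime, timedelta

# Two transactions for one user; "now" is when the feature is requested.
txns = [datetime(2024, 1, 2, 0, 0), datetime(2024, 1, 2, 12, 0)]
now = datetime(2024, 1, 3, 0, 0)
window_start = now - timedelta(hours=24)

# Batch SQL pipeline: exclusive boundary (t > window_start)
batch_count = sum(1 for t in txns if t > window_start)

# Streaming implementation: inclusive boundary (t >= window_start)
stream_count = sum(1 for t in txns if t >= window_start)
```

The transaction that lands exactly on the boundary is counted by one path but not the other, so training and serving silently disagree on the feature value.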

Feast Architecture

Feast solves this by providing a single versioned feature definition shared between training and serving. Its architecture separates two concerns with distinct storage backends:

Component | Storage Backend | Use Case
Offline store | PostgreSQL, BigQuery, Snowflake | Historical feature retrieval for training
Online store | Redis | Low-latency feature lookup during serving
Registry | PostgreSQL | Feature definitions, versioning, metadata

Figure 7.4 — Feast dual-path architecture: offline store feeds training, online store (Redis) serves inference, both share one feature definition

Online and Offline Feature Serving

The workflow is:

  1. Feature definitions are written as Python FeatureView objects and committed to Git
  2. Materialization runs on a schedule to push computed features from the offline store into Redis
  3. Training pipelines retrieve historical point-in-time features via get_historical_features
  4. Serving infrastructure retrieves features at inference time via get_online_features — using identical definitions
# Training: historical features
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["user_stats:transactions_24h",
              "user_stats:avg_amount_7d"]
).to_df()

# Serving: online features (identical feature names)
online_features = store.get_online_features(
    features=["user_stats:transactions_24h",
              "user_stats:avg_amount_7d"],
    entity_rows=[{"user_id": "u-12345"}]
).to_dict()

Key Takeaways — Section 3

Section 4: CI/CD for Machine Learning

ML CI/CD must do everything traditional CI/CD does, and more: retrain models on new data, validate statistical model quality before promotion, manage large binary artifacts (model weights), and handle the fact that "tests passing" does not guarantee a model will perform well on tomorrow's data distribution.

GitOps with ArgoCD

GitOps makes Git the single source of truth for cluster state. A GitOps controller (ArgoCD or Flux) watches a Git repository and continuously reconciles the cluster to match what is declared there. For ML, this means model configuration changes are committed to Git, ArgoCD detects the drift, and applies the update automatically.
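The reconcile loop at the heart of a GitOps controller can be sketched in plain Python (resource names and specs are hypothetical):

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Return the operations needed to make `actual` match `desired`.

    `desired` is what Git declares; `actual` is what the cluster reports.
    A real controller runs this continuously and applies the diff.
    """
    ops = []
    for name, spec in desired.items():
        if actual.get(name) != spec:
            ops.append(("apply", name, spec))      # create or update drifted resource
    for name in actual.keys() - desired.keys():
        ops.append(("delete", name))               # prune resources not in Git
    return ops

desired = {"fraud-model": {"image": "model@sha256:abc", "replicas": 3}}
actual = {"fraud-model": {"image": "model@sha256:old", "replicas": 3},
          "stale-model": {"image": "model@sha256:xyz", "replicas": 1}}
ops = reconcile(desired, actual)
```

Committing a new image digest to Git changes `desired`, and the next reconcile pass rolls the cluster forward; reverting the commit rolls it back.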


Figure 7.5 — GitOps ML deployment loop: commit to Git triggers CI, ArgoCD syncs to cluster, monitoring feeds back into retraining

Automated Model Validation Gates

A validation gate blocks promotion unless a model meets defined quality thresholds:

Gate Type | Example Criterion | Action on Failure
Accuracy threshold | Accuracy >= 0.90 on held-out test set | Block promotion, alert team
Regression check | F1 score >= 95% of current champion's F1 | Block promotion
Data quality check | Feature distributions within 2 sigma of training distribution | Block promotion
Latency check | p99 inference latency <= 100ms under load | Block promotion
Bias/fairness audit | Equal opportunity difference <= 0.05 | Block promotion
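A minimal gate covering three of these criteria might look like this (metric names and sample values are invented):

```python
def validation_gate(candidate: dict, champion: dict):
    """Return (passed, failures) for a candidate model's metrics."""
    checks = {
        "accuracy_threshold": candidate["accuracy"] >= 0.90,
        "regression_check": candidate["f1"] >= 0.95 * champion["f1"],
        "latency_check": candidate["p99_latency_ms"] <= 100,
    }
    failures = [name for name, ok in checks.items() if not ok]
    return (not failures, failures)

champion = {"f1": 0.90}
candidate_ok = {"accuracy": 0.94, "f1": 0.91, "p99_latency_ms": 80}
candidate_bad = {"accuracy": 0.88, "f1": 0.91, "p99_latency_ms": 80}
```

In a pipeline, a failing gate raises an error so the orchestrator marks the run failed and promotion never happens.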

Container Image Management

ML images have unique challenges: a PyTorch training image with CUDA can exceed 10 GB, dependencies must be mutually compatible (CUDA + cuDNN + PyTorch + Python versions), and exact reproducibility months later is required.

Practice | Implementation
Multi-stage builds | Separate build-time deps from runtime image
Pinned base images | FROM pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime
Image signing | Cosign signatures verified at admission time
Shared base images | One org-wide CUDA base; per-project layers on top

Reproducibility

Reproducibility requires discipline at four layers:

  1. Environment: Docker image digest (not tag) pinned in pipeline
  2. Code versioning: Git SHA pinned in pipeline metadata
  3. Data versioning: DVC or Delta Lake snapshot for training datasets
  4. Random seeds: torch.manual_seed(), numpy.random.seed()

Pinning pipeline steps to image digests (sha256:...) rather than tags is the single most impactful reproducibility practice — digests are content-addressed and immutable, unlike tags which can be overwritten.
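A toy sketch of why digests are immutable while tags are not (layer bytes and tag names are invented):

```python
import hashlib

# Content addressing: the digest is a pure function of the image bytes,
# so the same bytes always yield the same digest.
layer = b"FROM python:3.11-slim\nRUN pip install scikit-learn\n"
digest = "sha256:" + hashlib.sha256(layer).hexdigest()

# A tag is just a mutable name-to-digest pointer in the registry.
registry_tags = {"trainer:latest": digest}

# Someone rebuilds and repoints the tag; "trainer:latest" now means
# different bytes, while the original digest reference never moves.
rebuilt = b"FROM python:3.11-slim\nRUN pip install scikit-learn torch\n"
registry_tags["trainer:latest"] = "sha256:" + hashlib.sha256(rebuilt).hexdigest()
```

A pipeline pinned to `image@sha256:...` is therefore unaffected by tag churn; one pinned to `image:latest` silently changes behavior.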

Key Takeaways — Section 4

Post-Study Check — Test your understanding

1. What type of computation graph do ML pipelines use to represent step dependencies?

2. What does KFP v2's pipeline caching avoid when component inputs have not changed?

3. In MLflow's Model Registry, what does the "champion" alias indicate?

4. What problem does a feature store like Feast primarily solve?

5. In a GitOps workflow, what is the single source of truth for cluster state?

6. Which KFP v2 decorator wraps a non-Python tool or shell script into a pipeline step?

7. Why should ML pipeline steps pin container images by digest rather than tag?

8. What is the role of Argo Workflows in the Kubeflow Pipelines architecture?

9. Which Feast component provides low-latency feature lookups during model serving?

10. What does a validation gate do in an ML CI/CD pipeline?

11. Which tool is a general-purpose Kubernetes CI/CD framework often used alongside ArgoCD in MLOps?

12. In Feast, what process moves computed features from the offline store to Redis for serving?


Answer Explanations