Chapter 7: MLOps and ML Pipelines on Kubernetes

Running AI Workloads on Kubernetes — Interactive Study Guide

Learning Objectives

Pre-Study Check — What do you already know?

1. What type of computation graph do ML pipelines use to represent step dependencies?

2. What does KFP v2's pipeline caching avoid when component inputs have not changed?

3. In MLflow's Model Registry, what does the "champion" alias indicate?

4. What problem does a feature store like Feast primarily solve?

5. In a GitOps workflow, what is the single source of truth for cluster state?

6. Which KFP v2 decorator wraps a non-Python tool or shell script into a pipeline step?

7. Why should ML pipeline steps pin container images by digest rather than tag?

8. What is the role of Argo Workflows in the Kubeflow Pipelines architecture?

9. Which Feast component provides low-latency feature lookups during model serving?

10. What does a validation gate do in an ML CI/CD pipeline?

11. Which tool is a general-purpose Kubernetes CI/CD framework often used alongside ArgoCD in MLOps?

12. In Feast, what process moves computed features from the offline store to Redis for serving?

Section 1: ML Pipeline Orchestration

An ML pipeline is a directed acyclic graph (DAG) of computational steps. Each node is a containerized task; edges carry data artifacts or parameter values from one task to the next. Getting this orchestration right on Kubernetes is the foundation of production MLOps.
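The dependency ordering a DAG implies can be sketched with the standard library's topological sorter (step names are illustrative, mirroring the three-step pipeline later in this section):

```python
from graphlib import TopologicalSorter

# Each key maps a step to the steps it depends on.
deps = {
    "train": {"preprocess"},                 # train consumes preprocess's dataset
    "evaluate": {"preprocess", "train"},     # evaluate needs both upstream outputs
}

# static_order() yields steps so that every dependency runs first.
order = list(TopologicalSorter(deps).static_order())
```

An orchestrator like Argo Workflows does essentially this, but schedules each node as a Kubernetes pod and moves artifacts along the edges.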

Kubeflow Pipelines v2 Architecture and SDK

Kubeflow Pipelines (KFP) provides a Python SDK for defining pipelines plus backend services for running them on any Kubernetes cluster. KFP v2 introduced major improvements over v1:

Feature | KFP v1 | KFP v2
Component decorator | @component (limited) | @dsl.component and @dsl.container_component
Intermediate representation | Argo YAML | Backend-agnostic IR
Artifact visibility | Hidden implementation detail | First-class DAG nodes
Nested pipelines | Not supported | Pipelines as pipeline components
Single component execution | Full pipeline required | Run individual components

The @dsl.component decorator converts a plain Python function into a self-contained pipeline step that KFP can containerize and schedule. The @dsl.container_component decorator gives precise control over the container command — useful when wrapping non-Python tools or shell scripts.

A Three-Step Training Pipeline

from kfp import dsl
from kfp.dsl import Dataset, Model, Output, Input

@dsl.component(base_image="python:3.11-slim",
               packages_to_install=["pandas", "scikit-learn"])
def preprocess(raw_data_path: str, dataset: Output[Dataset]):
    import pandas as pd
    df = pd.read_csv(raw_data_path).dropna()
    df.to_csv(dataset.path, index=False)

@dsl.component(base_image="python:3.11-slim",
               packages_to_install=["pandas", "scikit-learn"])
def train(dataset: Input[Dataset], model: Output[Model],
          n_estimators: int = 100):
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    import joblib
    df = pd.read_csv(dataset.path)
    X, y = df.drop("label", axis=1), df["label"]
    clf = RandomForestClassifier(n_estimators=n_estimators)
    clf.fit(X, y)
    joblib.dump(clf, model.path)

@dsl.component(base_image="python:3.11-slim",
               packages_to_install=["pandas", "scikit-learn"])
def evaluate(dataset: Input[Dataset], model: Input[Model]):
    # Score the trained model; shown on the prep dataset for brevity,
    # a real pipeline would hold out a separate test split.
    import pandas as pd
    import joblib
    from sklearn.metrics import accuracy_score
    df = pd.read_csv(dataset.path)
    X, y = df.drop("label", axis=1), df["label"]
    clf = joblib.load(model.path)
    print("accuracy:", accuracy_score(y, clf.predict(X)))

@dsl.pipeline(name="random-forest-pipeline")
def rf_pipeline(data_path: str, n_estimators: int = 100):
    prep = preprocess(raw_data_path=data_path)
    fit  = train(dataset=prep.outputs["dataset"],
                 n_estimators=n_estimators)
    evaluate(dataset=prep.outputs["dataset"],
             model=fit.outputs["model"])

Notice how Dataset and Model types are declared as Output and Input parameters. KFP v2 surfaces these as first-class artifact nodes in the pipeline visualization, so engineers can trace exactly which model artifact came from which training run.


Figure 7.1 — ML Pipeline DAG: steps execute in dependency order, passing typed artifacts along edges

Argo Workflows for ML DAGs

KFP compiles Python pipelines into an intermediate representation executed by Argo Workflows as the default backend. Argo can also be used directly — teams working in multiple languages or wrapping legacy tools often prefer writing Argo YAML. Argo provides DAG-based execution, parameter passing, S3/GCS artifact management, and configurable retry policies.

Tekton Pipelines for ML CI/CD

Tekton occupies a different niche: it is a general-purpose CI/CD framework on Kubernetes. Its strength in MLOps is in the CI/CD layer — building container images, running linting and unit tests before training, and triggering downstream deployments after model validation. Tekton integrates naturally with ArgoCD to create a full GitOps deployment chain.

Pipeline Parameterization and Caching

Effective pipelines treat all tunable values (data paths, hyperparameters, resource settings) as explicit parameters rather than hard-coded constants, so the same pipeline definition can be reused across datasets, environments, and experiments without code changes.

Pipeline caching is a standout feature: when a component's inputs (parameter values and artifact content hashes) match a previous execution, KFP reuses the cached output. Change n_estimators and only train and evaluate re-execute; preprocess returns instantly from cache.
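The cache-key idea can be sketched in a few lines of plain Python (a simplification of what the KFP backend actually stores):

```python
import hashlib
import json

def cache_key(component_name: str, params: dict, artifact_bytes: bytes) -> str:
    """Key = component identity + parameter values + content hash of inputs.

    If nothing in this tuple changes between runs, the cached output
    can be reused instead of re-executing the step (sketch, not KFP's
    exact key format).
    """
    h = hashlib.sha256()
    h.update(component_name.encode())
    h.update(json.dumps(params, sort_keys=True).encode())
    h.update(hashlib.sha256(artifact_bytes).hexdigest().encode())
    return h.hexdigest()

k_same_1 = cache_key("train", {"n_estimators": 100}, b"dataset-rows")
k_same_2 = cache_key("train", {"n_estimators": 100}, b"dataset-rows")
k_changed = cache_key("train", {"n_estimators": 200}, b"dataset-rows")
```

Identical inputs produce identical keys (cache hit); changing one parameter produces a new key, so only the affected steps re-run.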

Key Takeaways — Section 1

Section 2: Experiment Tracking and Model Registry

MLflow on Kubernetes

Every training run produces metrics, parameters, and artifacts. Without a system to capture them, teams lose track of which configuration produced which result. MLflow is the most widely adopted open-source solution, deployed on Kubernetes as a three-tier architecture:

Kubernetes Object | Purpose
Deployment | Runs the MLflow tracking server as a scalable pod
Service + Ingress | Exposes the UI and REST API
StatefulSet (PostgreSQL) | Persists run metadata and model registry entries
Secret / ConfigMap | Stores database credentials and S3 endpoint configuration

Logging a run against the in-cluster tracking server looks like this:
import mlflow

mlflow.set_tracking_uri(
    "http://mlflow-service.mlops.svc.cluster.local:5000"
)
mlflow.set_experiment("fraud-detection-v2")

with mlflow.start_run():
    mlflow.log_param("n_estimators", 200)
    mlflow.log_param("max_depth", 8)
    mlflow.log_metric("accuracy", 0.942)
    mlflow.log_metric("f1_score", 0.917)
    mlflow.sklearn.log_model(clf, "model")

Model Versioning with the MLflow Registry

The MLflow Model Registry provides a versioned catalog of model artifacts with semantic aliases that communicate production status:

Alias | Meaning
champion | The model currently serving production traffic
challenger | A candidate model undergoing A/B testing
shadow | A model receiving traffic copies for offline evaluation
archived | A retired model retained for audit purposes

Promotion between stages is a deliberate, audited action — not an automatic side-effect of training. Downstream systems reference aliases rather than hard-coded version numbers, enabling safe canary promotions and rollbacks.
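The mechanics can be illustrated with a toy alias map (plain Python, not the MLflow API; version numbers are invented):

```python
# Aliases point at versions; consumers resolve the alias, never the version.
aliases = {"champion": "v12", "challenger": "v13"}

def promote_challenger(aliases: dict) -> dict:
    """Promotion is an explicit alias reassignment.

    Every downstream system that resolves "champion" picks up the new
    version with no configuration change; rollback is the reverse edit.
    """
    updated = dict(aliases)
    updated["champion"] = updated.pop("challenger")
    return updated

new_aliases = promote_challenger(aliases)
```

In MLflow itself, downstream code resolves aliases through model URIs such as `models:/fraud-detection@champion`.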

stateDiagram-v2
    [*] --> Registered : mlflow.register_model()
    Registered --> Challenger : Manual promotion (team review)
    Challenger --> Shadow : A/B test approved
    Shadow --> Champion : Shadow validation passed
    Champion --> Archived : New champion promoted
    Challenger --> Archived : Validation failed

W&B and Neptune Integration

For richer visualization and real-time collaboration, managed services like Weights & Biases and Neptune extend the MLflow pattern. Both integrate via environment variables and lightweight SDK calls. The key trade-off: self-hosted MLflow keeps data governance simple; managed SaaS reduces operational burden.

Key Takeaways — Section 2

Section 3: Feature Stores on Kubernetes

The Training-Serving Skew Problem

Imagine building a fraud detection model. During training, you compute "transactions in the last 24 hours" using a batch SQL query. In production, the same feature must be computed in real time from a live event stream. If these two computations diverge even slightly, the model degrades silently. This is training-serving skew — one of the most common causes of silent model degradation in production.
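A toy example of how even an off-by-one window boundary produces skew (timestamps are invented):

```python
from datetime import datetime, timedelta

# Two transactions for one user; "now" is when the feature is requested.
txns = [datetime(2024, 1, 2, 0, 0), datetime(2024, 1, 2, 12, 0)]
now = datetime(2024, 1, 3, 0, 0)
window_start = now - timedelta(hours=24)

# Batch SQL pipeline: exclusive boundary (t > window_start)
batch_count = sum(1 for t in txns if t > window_start)

# Streaming implementation: inclusive boundary (t >= window_start)
stream_count = sum(1 for t in txns if t >= window_start)
```

The transaction that lands exactly on the boundary is counted by one path but not the other, so training and serving silently disagree on the feature value.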

Feast Architecture

Feast solves this by providing a single versioned feature definition shared between training and serving. Its architecture separates two concerns with distinct storage backends:

Component | Storage Backend | Use Case
Offline store | PostgreSQL, BigQuery, Snowflake | Historical feature retrieval for training
Online store | Redis | Low-latency feature lookup during serving
Registry | PostgreSQL | Feature definitions, versioning, metadata

Figure 7.4 — Feast dual-path architecture: offline store feeds training, online store (Redis) serves inference, both share one feature definition

Online and Offline Feature Serving

The workflow is:

  1. Feature definitions are written as Python FeatureView objects and committed to Git
  2. Materialization runs on a schedule to push computed features from the offline store into Redis
  3. Training pipelines retrieve historical point-in-time features via get_historical_features
  4. Serving infrastructure retrieves features at inference time via get_online_features — using identical definitions
# Training: historical features
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["user_stats:transactions_24h",
              "user_stats:avg_amount_7d"]
).to_df()

# Serving: online features (identical feature names)
online_features = store.get_online_features(
    features=["user_stats:transactions_24h",
              "user_stats:avg_amount_7d"],
    entity_rows=[{"user_id": "u-12345"}]
).to_dict()

Key Takeaways — Section 3

Section 4: CI/CD for Machine Learning

ML CI/CD must do everything traditional CI/CD does, and more: retrain models on new data, validate statistical model quality before promotion, manage large binary artifacts (model weights), and handle the fact that "tests passing" does not guarantee a model will perform well on tomorrow's data distribution.

GitOps with ArgoCD

GitOps makes Git the single source of truth for cluster state. A GitOps controller (ArgoCD or Flux) watches a Git repository and continuously reconciles the cluster to match what is declared there. For ML, this means model configuration changes are committed to Git, ArgoCD detects the drift, and applies the update automatically.
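The reconcile loop at the heart of a GitOps controller can be sketched in plain Python (resource names and specs are hypothetical):

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Return the operations needed to make `actual` match `desired`.

    `desired` is what Git declares; `actual` is what the cluster reports.
    A real controller runs this continuously and applies the diff.
    """
    ops = []
    for name, spec in desired.items():
        if actual.get(name) != spec:
            ops.append(("apply", name, spec))      # create or update drifted resource
    for name in actual.keys() - desired.keys():
        ops.append(("delete", name))               # prune resources not in Git
    return ops

desired = {"fraud-model": {"image": "model@sha256:abc", "replicas": 3}}
actual = {"fraud-model": {"image": "model@sha256:old", "replicas": 3},
          "stale-model": {"image": "model@sha256:xyz", "replicas": 1}}
ops = reconcile(desired, actual)
```

Committing a new image digest to Git changes `desired`, and the next reconcile pass rolls the cluster forward; reverting the commit rolls it back.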


Figure 7.5 — GitOps ML deployment loop: commit to Git triggers CI, ArgoCD syncs to cluster, monitoring feeds back into retraining

Automated Model Validation Gates

A validation gate blocks promotion unless a model meets defined quality thresholds:

Gate Type | Example Criterion | Action on Failure
Accuracy threshold | Accuracy >= 0.90 on held-out test set | Block promotion, alert team
Regression check | F1 score >= 95% of current champion's F1 | Block promotion
Data quality check | Feature distributions within 2 sigma of training distribution | Block promotion
Latency check | p99 inference latency <= 100ms under load | Block promotion
Bias/fairness audit | Equal opportunity difference <= 0.05 | Block promotion
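A minimal gate covering three of these criteria might look like this (metric names and sample values are invented):

```python
def validation_gate(candidate: dict, champion: dict):
    """Return (passed, failures) for a candidate model's metrics."""
    checks = {
        "accuracy_threshold": candidate["accuracy"] >= 0.90,
        "regression_check": candidate["f1"] >= 0.95 * champion["f1"],
        "latency_check": candidate["p99_latency_ms"] <= 100,
    }
    failures = [name for name, ok in checks.items() if not ok]
    return (not failures, failures)

champion = {"f1": 0.90}
candidate_ok = {"accuracy": 0.94, "f1": 0.91, "p99_latency_ms": 80}
candidate_bad = {"accuracy": 0.88, "f1": 0.91, "p99_latency_ms": 80}
```

In a pipeline, a failing gate raises an error so the orchestrator marks the run failed and promotion never happens.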

Container Image Management

ML images have unique challenges: a PyTorch training image with CUDA can exceed 10 GB, dependencies must be mutually compatible (CUDA + cuDNN + PyTorch + Python versions), and exact reproducibility months later is required.

Practice | Implementation
Multi-stage builds | Separate build-time deps from runtime image
Pinned base images | FROM pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime
Image signing | Cosign signatures verified at admission time
Shared base images | One org-wide CUDA base; per-project layers on top

Reproducibility

Reproducibility requires discipline at four layers:

  1. Environment: Docker image digest (not tag) pinned in pipeline
  2. Code versioning: Git SHA pinned in pipeline metadata
  3. Data versioning: DVC or Delta Lake snapshot for training datasets
  4. Random seeds: torch.manual_seed(), numpy.random.seed()

Pinning pipeline steps to image digests (sha256:...) rather than tags is the single most impactful reproducibility practice — digests are content-addressed and immutable, unlike tags which can be overwritten.
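A toy sketch of why digests are immutable while tags are not (layer bytes and tag names are invented):

```python
import hashlib

# Content addressing: the digest is a pure function of the image bytes,
# so the same bytes always yield the same digest.
layer = b"FROM python:3.11-slim\nRUN pip install scikit-learn\n"
digest = "sha256:" + hashlib.sha256(layer).hexdigest()

# A tag is just a mutable name-to-digest pointer in the registry.
registry_tags = {"trainer:latest": digest}

# Someone rebuilds and repoints the tag; "trainer:latest" now means
# different bytes, while the original digest reference never moves.
rebuilt = b"FROM python:3.11-slim\nRUN pip install scikit-learn torch\n"
registry_tags["trainer:latest"] = "sha256:" + hashlib.sha256(rebuilt).hexdigest()
```

A pipeline pinned to `image@sha256:...` is therefore unaffected by tag churn; one pinned to `image:latest` silently changes behavior.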

Key Takeaways — Section 4

Post-Study Check — Test your understanding

1. What type of computation graph do ML pipelines use to represent step dependencies?

2. What does KFP v2's pipeline caching avoid when component inputs have not changed?

3. In MLflow's Model Registry, what does the "champion" alias indicate?

4. What problem does a feature store like Feast primarily solve?

5. In a GitOps workflow, what is the single source of truth for cluster state?

6. Which KFP v2 decorator wraps a non-Python tool or shell script into a pipeline step?

7. Why should ML pipeline steps pin container images by digest rather than tag?

8. What is the role of Argo Workflows in the Kubeflow Pipelines architecture?

9. Which Feast component provides low-latency feature lookups during model serving?

10. What does a validation gate do in an ML CI/CD pipeline?

11. Which tool is a general-purpose Kubernetes CI/CD framework often used alongside ArgoCD in MLOps?

12. In Feast, what process moves computed features from the offline store to Redis for serving?


Answer Explanations