Study Guide: Chapter 7 — Distributed Tracing with OpenTelemetry

In a monolith, a stack trace tells you what happened. In a cloud-native system, a single click might cross a dozen services, two brokers, three databases, and a handful of language runtimes. The stack trace is gone; what replaces it is the distributed trace: a stitched-together view of how one request flowed through the system, who called whom, how long each hop took, and where things went wrong. OpenTelemetry (OTel) is the open standard that makes those traces portable.

Part 1 — Trace Data Model & Context Propagation

Pre-Reading Check — Part 1

1. Which identifier in a SpanContext is shared by every span in a single logical request?

SpanId
TraceId
TraceFlags
parent_span_id

2. A worker reads a Kafka message, processes it, and finishes. Which SpanKind best describes the worker's top-level span?

SERVER
CLIENT
PRODUCER
CONSUMER

3. In the header traceparent: 00-4bf92f35...4736-00f067aa0ba902b7-01, what does 00f067aa0ba902b7 represent?

The TraceId
The version
The sender's span-id (parent of any span the receiver creates)
A vendor-specific tracestate value

4. You want the user ID to be available on every span emitted by every service downstream of the edge. Which OTel facility should carry it?

A span attribute on the edge SERVER span
A span event
W3C Baggage
A span link

5. When composite propagators are configured with W3C, B3, and Jaeger formats, what happens on an outbound HTTP request?

Only the first format that succeeded on extract is injected
Every enabled propagator writes its format, so multiple headers go out together
The SDK randomly picks one format per request
Only W3C is injected; the others are extract-only

7.1 Trace Data Model

A trace is a directed acyclic graph of spans that share a common TraceId. Each span represents one unit of work — an HTTP handler, a database query, a queue publish — and carries a name, a start/end timestamp, a parent reference, attributes, events, status, and a kind. The mental model: a span is to a trace what a stack frame is to a stack trace, except spans cross process boundaries and overlap in time when work happens in parallel.

SpanContext: the wire envelope

TraceId — 128-bit (32 hex chars), globally unique per trace; must not be all zeros.
SpanId — 64-bit (16 hex chars), unique within a trace; the parent's SpanId becomes the child's parent_span_id.
TraceFlags — 8 bits; only bit 0 is defined (01 = sampled).

Analogy: TraceId = conference badge color (everyone shares it), SpanId = individual badge number, TraceFlags = whether the photographer may publish your photo.
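
The three fields can be modelled in a few lines. This is a minimal sketch using only the standard library; the class and function names (SpanContextSketch, new_root, new_child) are illustrative assumptions, not the OpenTelemetry API.

```python
import secrets
from dataclasses import dataclass

SAMPLED_FLAG = 0x01  # bit 0 of TraceFlags


@dataclass(frozen=True)
class SpanContextSketch:
    """Illustrative model of the three fields; not the OpenTelemetry class."""

    trace_id: int     # 128-bit, shared by every span in the trace
    span_id: int      # 64-bit, unique within the trace
    trace_flags: int  # 8 bits; only bit 0 (sampled) is defined

    def is_valid(self) -> bool:
        # All-zero identifiers are reserved to mean "invalid".
        return 0 < self.trace_id < 2**128 and 0 < self.span_id < 2**64

    @property
    def sampled(self) -> bool:
        return bool(self.trace_flags & SAMPLED_FLAG)

    def hex_ids(self) -> tuple:
        # Wire form: 32 and 16 lowercase hex characters.
        return format(self.trace_id, "032x"), format(self.span_id, "016x")


def new_root(sampled: bool = True) -> SpanContextSketch:
    # "or 1" guards the astronomically unlikely all-zero draw.
    return SpanContextSketch(
        trace_id=secrets.randbits(128) or 1,
        span_id=secrets.randbits(64) or 1,
        trace_flags=SAMPLED_FLAG if sampled else 0,
    )


def new_child(parent: SpanContextSketch) -> SpanContextSketch:
    # Same TraceId and flags; only the SpanId is fresh.
    return SpanContextSketch(parent.trace_id, secrets.randbits(64) or 1, parent.trace_flags)
```

Calling new_child(new_root()) yields a context whose trace_id equals the root's, which is the property every backend relies on to stitch spans together.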

Span kinds

Kind	Role	Example
`SERVER`	Inbound RPC handler	HTTP handler, gRPC server method
`CLIENT`	Outbound RPC	http.Client.Do, JDBC query
`PRODUCER`	Async send to queue	kafka.Producer.Send
`CONSUMER`	Async receive/process	Kafka consumer loop
`INTERNAL`	Local work, no network hop	validate_cart, JSON parse

The SERVER/CLIENT pairing is what lets Tempo's service-graph processor build a dependency map automatically — without correct kinds, the map is guesswork.

Status, events, links

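Status is UNSET, OK, or ERROR. The tracing API itself does not map HTTP codes to status; that mapping is done by HTTP instrumentation following the semantic conventions: a 5xx is ERROR on both sides, a 4xx is left UNSET on a SERVER span (the client sent a bad request), and a 4xx is ERROR from a CLIENT span's perspective. For anything beyond that, such as a business failure behind a 200, you must set it deliberately.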
Events are timestamped annotations within a span. recordException(e) adds an exception event with exception.type, exception.message, and exception.stacktrace attributes.
Links reference other related SpanContexts that are not the strict parent — the right tool for fan-in (one batch span with 1,000 links instead of 1,000 parent references).
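
The 4xx asymmetry between SERVER and CLIENT spans is easy to get wrong, so here it is as a small function. The 5xx branch follows the OTel HTTP semantic conventions; the function name and the string return values are illustrative assumptions, not SDK calls.

```python
def status_for_http(span_kind: str, http_status: int) -> str:
    """Return "ERROR" or "UNSET" for an HTTP span of the given kind."""
    if http_status >= 500:
        # Server failures are errors from both perspectives.
        return "ERROR"
    if 400 <= http_status < 500 and span_kind == "CLIENT":
        # The caller did not get what it asked for.
        return "ERROR"
    # 1xx-3xx, and 4xx on a SERVER span: leave the status alone.
    return "UNSET"
```

For example, status_for_http("SERVER", 404) leaves the span UNSET, while status_for_http("CLIENT", 404) marks it ERROR.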

Figure 7.1 — Parent-child span tree

flowchart TD
    A["SERVER<br/>POST /checkout<br/>checkout-svc<br/>0 - 480ms"] --> B["CLIENT<br/>payment.charge<br/>checkout-svc<br/>20 - 310ms"]
    A --> C["CLIENT<br/>inventory.reserve<br/>checkout-svc<br/>20 - 180ms"]
    B --> D["SERVER<br/>POST /charge<br/>payments-svc<br/>30 - 300ms"]
    C --> E["SERVER<br/>POST /reserve<br/>inventory-svc<br/>30 - 170ms"]
    D --> F["CLIENT<br/>db.query users<br/>payments-svc<br/>50 - 110ms"]
    D --> G["CLIENT<br/>POST gateway<br/>payments-svc<br/>120 - 290ms"]
    E --> H["CLIENT<br/>db.update stock<br/>inventory-svc<br/>40 - 160ms"]
    classDef server fill:#1f3a5f,stroke:#58a6ff,color:#fff
    classDef client fill:#3a2f5f,stroke:#a78bfa,color:#fff
    class A,D,E server
    class B,C,F,G,H client
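
A backend receives spans as flat records and rebuilds the tree from parent references. The sketch below does that for the spans of Figure 7.1 and computes each span's self time (its duration minus the time covered by its children). The record layout and helper names are assumptions made for illustration; the timings are the ones in the figure.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Span(NamedTuple):
    span_id: str
    parent_id: Optional[str]
    kind: str
    name: str
    start_ms: int
    end_ms: int


# Flat records as a backend would receive them (values from Figure 7.1).
SPANS = [
    Span("A", None, "SERVER", "POST /checkout", 0, 480),
    Span("B", "A", "CLIENT", "payment.charge", 20, 310),
    Span("C", "A", "CLIENT", "inventory.reserve", 20, 180),
    Span("D", "B", "SERVER", "POST /charge", 30, 300),
    Span("E", "C", "SERVER", "POST /reserve", 30, 170),
    Span("F", "D", "CLIENT", "db.query users", 50, 110),
    Span("G", "D", "CLIENT", "POST gateway", 120, 290),
    Span("H", "E", "CLIENT", "db.update stock", 40, 160),
]


def index_children(spans):
    """Group spans by parent_id; the root sits under the key None."""
    children = defaultdict(list)
    for span in spans:
        children[span.parent_id].append(span)
    return children


def covered_ms(intervals):
    """Length of the union of (start, end) intervals, so parallel children are not double counted."""
    total, cursor = 0, None
    for start, end in sorted(intervals):
        if cursor is None or start > cursor:
            total += end - start
            cursor = end
        elif end > cursor:
            total += end - cursor
            cursor = end
    return total


def self_time_ms(span, children):
    child_intervals = [(c.start_ms, c.end_ms) for c in children.get(span.span_id, [])]
    return (span.end_ms - span.start_ms) - covered_ms(child_intervals)


def render(span, children, depth=0):
    """Return indented text lines, one per span, in start-time order."""
    lines = [f"{'  ' * depth}{span.kind} {span.name} self={self_time_ms(span, children)}ms"]
    for child in sorted(children.get(span.span_id, []), key=lambda s: s.start_ms):
        lines.extend(render(child, children, depth + 1))
    return lines
```

Printing "\n".join(render(root, children)) for the root span gives a text waterfall; the spans with the largest self time are usually the first place to look.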

Animation — Span Tree Expansion (Gantt View)

Top span (root, 200ms) appears first; child spans cascade in at proper time offsets.

7.2 Context Propagation

A trace only works if every service in the path reads, preserves, and forwards the SpanContext. That cross-process handoff is context propagation, implemented by propagators that inject context into outbound carriers and extract it from inbound carriers.

W3C Trace Context — traceparent

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             |  |                                |                |
             |  |                                |                +-- trace-flags (01 = sampled)
             |  |                                +-- parent span-id (16 hex)
             |  +-- trace-id (32 hex, globally unique per trace)
             +-- version (currently "00")

The optional tracestate header carries up to 32 vendor-specific key=value entries. The most recently updated entry goes leftmost, implementations should propagate at least 512 characters of the combined header, and values may not contain commas or equals signs.
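
To make the field layout concrete, here is a minimal parser for the version 00 traceparent layout, using only the standard library. The names TraceParent and parse_traceparent are illustrative; real services should rely on the SDK's propagator rather than hand-rolled parsing.

```python
import re
from typing import NamedTuple, Optional

# version(2 hex) - trace-id(32 hex) - parent-id(16 hex) - trace-flags(2 hex), lowercase only
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed fields, or None if the header is malformed."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero ids are invalid.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))
```

A receiver that gets None back should start a new trace instead of trusting the header.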

Animation — traceparent propagation across three hops

trace-id is invariant; only the parent span-id field rotates as the request crosses service boundaries.
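
The rotation can be sketched as a function that takes the inbound header and produces the header for the next outbound call. It assumes a well-formed version 00 header; next_hop_traceparent is a hypothetical helper, not an SDK function.

```python
import secrets


def next_hop_traceparent(incoming: str) -> str:
    """Keep version, trace-id, and flags; replace the parent span-id with a fresh one."""
    version, trace_id, _caller_span_id, flags = incoming.split("-")
    # 8 random bytes -> 16 hex chars; a production generator would also reject all zeros.
    new_span_id = secrets.token_hex(8)
    return f"{version}-{trace_id}-{new_span_id}-{flags}"
```

Applying it three times to the same starting header produces three different span-id fields and one unchanged trace-id, which is exactly what the animation shows.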

Legacy formats: B3 and Jaeger

Aspect	W3C Trace Context	B3 single-header	Jaeger
Header(s)	`traceparent` + `tracestate`	`b3`	`uber-trace-id`
Separator	`-`	`-`	`:`
Sampled flag	trace-flags bit 0 (mask 0x01)	3rd field: 1/0/d	flags mask 0x01
Debug flag	not defined	3rd field: `d`	flags mask 0x02
Vendor data	`tracestate`	none	none

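Composite propagators let one service emit and accept all three formats simultaneously. On extract, the propagators run in their configured order, so the order decides which format takes precedence when a request carries more than one (in the Python SDK, a later propagator that finds its header overrides an earlier one); on inject, every enabled propagator writes its own header. That redundancy is the migration trick: roll out W3C alongside B3/Jaeger, let downstream services read whichever they understand, then drop the legacy formats.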
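
The inject side of that migration trick can be sketched with plain dictionaries standing in for HTTP headers. The header layouts follow the three formats in the table above; the function names and the dict-based context are assumptions for illustration, not the SDK's propagator interface.

```python
def inject_w3c(ctx: dict, carrier: dict) -> None:
    flags = "01" if ctx["sampled"] else "00"
    carrier["traceparent"] = f"00-{ctx['trace_id']}-{ctx['span_id']}-{flags}"


def inject_b3_single(ctx: dict, carrier: dict) -> None:
    # b3: {trace-id}-{span-id}-{sampling-state}
    carrier["b3"] = f"{ctx['trace_id']}-{ctx['span_id']}-{'1' if ctx['sampled'] else '0'}"


def inject_jaeger(ctx: dict, carrier: dict) -> None:
    # uber-trace-id: {trace-id}:{span-id}:{parent-span-id}:{flags}; parent 0 means unused.
    carrier["uber-trace-id"] = f"{ctx['trace_id']}:{ctx['span_id']}:0:{1 if ctx['sampled'] else 0}"


INJECTORS = [inject_w3c, inject_b3_single, inject_jaeger]


def composite_inject(ctx: dict, carrier: dict, injectors=INJECTORS) -> dict:
    """Every enabled propagator writes its own header into the same carrier."""
    for inject in injectors:
        inject(ctx, carrier)
    return carrier


def trace_id_seen_by_w3c_only(carrier: dict):
    header = carrier.get("traceparent")
    return header.split("-")[1] if header else None


def trace_id_seen_by_b3_only(carrier: dict):
    header = carrier.get("b3")
    return header.split("-")[0] if header else None
```

Because all three headers carry the same trace-id, a downstream that understands only B3 and one that understands only W3C both join the same trace.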

Baggage — cross-cutting request data

baggage: user.id=12345, tenant=acme-corp, feature.checkout_v2=enabled

Aspect	Span attribute	Baggage
Lives on	A single span	The context (independent of any span)
Propagated downstream	No	Yes — auto-injected on every outbound call
Use	Describe this operation (`db.statement`)	Request-scoped data (`user.id`, `tenant.id`)
Auto-copied to spans?	n/a	No — instrumentation must opt in

Security note: untrusted clients can send any baggage they want. Sanitize at edges with an allowlist, strip internal baggage on outbound calls to third parties, and never put secrets, tokens, or PII in baggage.
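
An edge allowlist can be sketched in a few lines. The allowlisted keys below are hypothetical, and the parser handles only the simple key=value form (it drops any optional ";property" suffix); treat it as an illustration of the idea rather than a full W3C Baggage implementation.

```python
from urllib.parse import quote, unquote

# Hypothetical allowlist: keys this edge is willing to accept from outside callers.
ALLOWED_KEYS = {"tenant", "feature.checkout_v2"}


def parse_baggage(header: str) -> dict:
    """Parse 'k1=v1, k2=v2' into a dict, percent-decoding values."""
    entries = {}
    for member in header.split(","):
        key_value = member.split(";", 1)[0]  # ignore optional properties
        if "=" not in key_value:
            continue
        key, value = key_value.split("=", 1)
        entries[key.strip()] = unquote(value.strip())
    return entries


def sanitize(entries: dict, allowed=ALLOWED_KEYS) -> dict:
    """Keep only allowlisted keys; everything else is silently dropped."""
    return {k: v for k, v in entries.items() if k in allowed}


def format_baggage(entries: dict) -> str:
    return ",".join(f"{k}={quote(v, safe='')}" for k, v in entries.items())
```

Running sanitize at the edge means a client-supplied entry such as internal.admin=true never reaches internal services, regardless of what instrumentation later copies baggage onto spans.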

Figure 7.3 — Composite propagation sequence

sequenceDiagram
    participant C as Client
    participant A as Service A<br/>(W3C + B3)
    participant B as Service B<br/>(W3C only)
    participant D as Service C<br/>(B3 only)
    C->>A: HTTP request<br/>traceparent: 00-{trace-id}-{span-C}-01
    A->>A: extract context,<br/>start SERVER span {span-A}
    A->>B: HTTP request<br/>traceparent: 00-{trace-id}-{span-A}-01<br/>b3: {trace-id}-{span-A}-1<br/>baggage: user.id=12345
    B->>B: extract traceparent,<br/>start SERVER span {span-B}
    B->>D: HTTP request<br/>traceparent: 00-{trace-id}-{span-B}-01<br/>b3: {trace-id}-{span-B}-1
    D->>D: extract b3 header,<br/>start SERVER span {span-D}
    D-->>B: response
    B-->>A: response
    A-->>C: response
    Note over C,D: Same trace-id flows through<br/>all four hops despite mixed formats

Post-Reading Check — Part 1

1. Which identifier in a SpanContext is shared by every span in a single logical request?

SpanId
TraceId
TraceFlags
parent_span_id

2. A worker reads a Kafka message, processes it, and finishes. Which SpanKind best describes the worker's top-level span?

SERVER
CLIENT
PRODUCER
CONSUMER

3. In the header traceparent: 00-4bf92f35...4736-00f067aa0ba902b7-01, what does 00f067aa0ba902b7 represent?

The TraceId
The version
The sender's span-id (parent of any span the receiver creates)
A vendor-specific tracestate value

4. You want the user ID to be available on every span emitted by every service downstream of the edge. Which OTel facility should carry it?

A span attribute on the edge SERVER span
A span event
W3C Baggage
A span link

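5. When composite propagators are configured with W3C, B3, and Jaeger formats, what happens on an outbound HTTP request?

Only the first format that succeeded on extract is injected
Every enabled propagator writes its format, so multiple headers go out together
The SDK randomly picks one format per request
Only W3C is injected; the others are extract-only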

Part 2 — Building Useful Traces & Visualization

Pre-Reading Check — Part 2

6. Which HTTP server span name is correct under OTel semantic conventions?

GET /users/12345
GET /users/{id}
GET https://api.example.com/users/12345?ref=home
get_user_by_id with the URL in the name

7. A worker processes 5,000 Kafka messages per poll. Which instrumentation pattern is healthiest?

One span per message so every message is searchable
One parent span per batch, events for noteworthy items, counters for "how many processed"
One span per message but with sampling at 1%
No spans, just logs

8. A payments service handler receives a malformed request and returns HTTP 400. What should the SERVER span's status be?

ERROR because the response was non-2xx
OK or UNSET: the server worked correctly; the client sent a bad request
ERROR because all 4xx and 5xx are errors
It depends on the operation, not on the status

9. Which Tempo metrics-generator processor produces metrics keyed by the (caller, callee) edge of a dependency?

span_metrics
service_graphs
spanmetrics connector in the Collector
histogram_quantile

10. Why are trace-derived metrics generally not the right source of truth for an SLO?

They lack the necessary HTTP status labels
PromQL cannot compute quantiles over them
Sampling distorts rate; tail-sampling that keeps errors/slow traces inflates the error rate
Grafana cannot render them

7.3 Building Useful Traces

Auto-instrumentation will produce spans for every HTTP request and DB call out of the box. The difference between a noisy trace and a debuggable one comes down to names, attributes, error recording, and knowing when not to create a span.

Naming spans for searchability

HTTP server: route template, not raw URL. GET /users/{id}, not GET /users/12345.
HTTP client: method plus route or host. POST api.payments.svc.
Database: operation + target. SELECT users, full statement in db.statement.
Messaging: <destination> <operation>. orders.created publish.
Internal: stable verb-noun. validate_cart, compute_shipping_quote.

Test: imagine 10,000 spans. How many distinct names should appear? Tens or low hundreds, not thousands. If your span name embeds a UUID, it is too specific.
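
That test can be partly automated with a lint-style check. The heuristic below is an assumption of this guide, not an OTel rule: it flags names that contain a query string or a path segment that is purely numeric or UUID-shaped.

```python
import re

_UUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def looks_high_cardinality(span_name: str) -> bool:
    """Heuristic check for span names that embed per-request values."""
    if "?" in span_name:
        return True  # query strings vary per request
    for segment in re.split(r"[/\s]", span_name):
        if segment.isdigit() or _UUID.fullmatch(segment):
            return True
    return False
```

Run over a sample of exported span names, a check like this catches raw URLs before they multiply the number of distinct operations in the backend.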

Attributes vs. events vs. status

Information shape	Use	Example
Stable property of the operation	Attribute	`http.method=GET`, `db.system=postgresql`
Bounded-cardinality filter	Attribute	`http.status_code=503`
Timestamped moment within the span	Event	`cache.miss`, `retry.attempt`
Exception / error	Event + Status	`recordException(e)` + `setStatus(ERROR, "...")`
Pass/fail outcome	Status	`OK` / `ERROR`

Recording exceptions

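from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# gateway, card, amount, and PaymentDeclined come from the surrounding application.
with tracer.start_as_current_span("charge_payment") as span:
    span.set_attribute("payment.amount_cents", amount)
    try:
        gateway.charge(card, amount)
    except PaymentDeclined as e:
        # Adds an "exception" event carrying type, message, and stacktrace.
        span.record_exception(e)
        span.set_status(trace.StatusCode.ERROR, "payment declined")
        raise  # re-raise so the caller still sees the failure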

Record then re-raise — swallowing without re-raising hides bugs.
Status ERROR is what backends color red and what Tempo's metrics-generator counts in RED metrics.
An HTTP 4xx is not automatically an error on the SERVER span; on the CLIENT span that received it, it usually is.

Span explosion vs. discipline

The most common mistake is a span per loop iteration. Better options: one span per batch + events for failures; sample inside the loop; use counters for counts; use links for fan-in references.

flowchart TB
    subgraph wrong["Wrong: 5001 spans per batch"]
        W1["process_batch CONSUMER"]
        W2["process_message x 5000<br/>uniform child spans<br/>blows up trace storage"]
        W1 --> W2
    end
    subgraph right["Right: 1 span + events + metrics"]
        R1["process_batch CONSUMER<br/>messaging.batch.size=5000"]
        R2["handle_failed_message<br/>(child span, only on error) x 3"]
        R3["events: cache.miss,<br/>retry.attempt, dlq.send"]
        R4["counter: messages_processed_total<br/>(metrics, not spans)"]
        R1 --> R2
        R1 -.-> R3
        R1 -.-> R4
    end
    classDef bad fill:#5f1f1f,stroke:#f87171,color:#fff
    classDef good fill:#1f5f3a,stroke:#34d399,color:#fff
    classDef neutral fill:#1f3a5f,stroke:#58a6ff,color:#fff
    class W1,W2 bad
    class R1,R2 good
    class R3,R4 neutral

Rule of thumb: a span should represent a unit of work big enough that you might one day look at it in a UI.
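
The batch pattern can be sketched without any tracing library. SpanSketch and the Counter below are hypothetical stand-ins for a real span and a real metrics instrument; the point is the shape: one span for the batch, an event per failure, and counters for volume.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SpanSketch:
    """Hypothetical stand-in for a tracing span; a real worker would use the OTel API."""

    name: str
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)

    def add_event(self, name: str, attributes: dict = None) -> None:
        self.events.append((name, attributes or {}))


metrics = Counter()  # stand-in for counter instruments


def process_batch(messages, handle) -> SpanSketch:
    # One span for the whole batch, sized by an attribute instead of child spans.
    span = SpanSketch("process_batch", {"messaging.batch.size": len(messages)})
    for offset, message in enumerate(messages):
        try:
            handle(message)
            metrics["messages_processed_total"] += 1
        except Exception as exc:
            # Only the noteworthy items leave a mark on the trace.
            span.add_event("message.failed", {"offset": offset, "exception.type": type(exc).__name__})
            metrics["messages_failed_total"] += 1
    return span
```

With 5,000 messages and two failures, the trace holds one span with two events, while the counters still answer "how many were processed".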

7.4 Trace Visualization and Analysis

Jaeger and Grafana Tempo dominate the open-source tracing backends. Both ingest OTLP, both render Gantt waterfalls, and both produce RED-style metrics — but they differ in storage and integration.

Jaeger UI

Search by service, operation, time, duration, tags (http.status_code=500, error=true), or free-form trace-id lookup.
Trace timeline — the Gantt chart with attributes, events, and stack traces from record_exception.
Trace graph — nodes/edges of one trace.
System architecture — aggregated dependency graph.
Service Performance Monitoring (SPM): RED metrics, typically produced by the OTel Collector's spanmetrics connector (the successor to the deprecated spanmetrics processor).

Storage: Cassandra, Elasticsearch, or OpenSearch.

Grafana Tempo

Tempo stores spans in object storage (S3/GCS/Azure Blob) and indexes only the TraceId — per-trace lookup is cheap; full-text search is expensive. The bet: most queries come from exemplars (a metric or log line already gives you the TraceId).

metrics_generator:
  processor:
    service_graphs:
      enabled: true
      wait: 10s
      max_items: 10000
      peer_attributes:
        - peer.service
        - db.name
        - messaging.system
    span_metrics:
      enabled: true
      dimensions:
        - http.method
        - http.status_code
        - rpc.system
      include_span_kinds:
        - server
        - consumer

Animation — Spans → spanmetrics → RED metrics

Spans pour into the spanmetrics connector; Rate, Errors, Duration come out as Prometheus time series.

Service maps

An accurate service map requires three things: consistent service.name resource attribute, correct SpanKind, and peer.service (or db.name, messaging.system) on outbound spans so uninstrumented downstreams can still be inferred.
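
How those three ingredients combine into edges can be shown with a simplified pairing routine. This is a sketch of the idea only, not Tempo's implementation; the Span record and service_edges function are assumptions for illustration.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Span(NamedTuple):
    span_id: str
    parent_id: Optional[str]
    service: str  # the service.name resource attribute
    kind: str     # SERVER, CLIENT, ...
    peer_service: Optional[str] = None


def service_edges(spans) -> Counter:
    """Count (caller, callee) edges from CLIENT/SERVER pairs, falling back to peer.service."""
    by_id = {s.span_id: s for s in spans}
    # CLIENT span ids that were answered by an instrumented SERVER span.
    answered = {s.parent_id for s in spans if s.kind == "SERVER"}
    edges = Counter()
    for s in spans:
        parent = by_id.get(s.parent_id)
        if s.kind == "SERVER" and parent is not None and parent.kind == "CLIENT":
            edges[(parent.service, s.service)] += 1
        elif s.kind == "CLIENT" and s.span_id not in answered and s.peer_service:
            # Uninstrumented downstream: infer the callee from peer.service.
            edges[(s.service, s.peer_service)] += 1
    return edges
```

If a span carries the wrong kind, or an outbound call to a database lacks peer.service, the corresponding edge simply never appears.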

flowchart LR
    web["web<br/>SERVER"]
    gw["api-gateway<br/>SERVER + CLIENT"]
    orders["orders<br/>SERVER + CLIENT"]
    pay["payments<br/>SERVER + CLIENT"]
    inv["inventory<br/>SERVER + CLIENT"]
    db[("db<br/>peer.service")]
    web -->|"rate 920/s<br/>err 0.1%<br/>p95 95ms"| gw
    gw -->|"rate 880/s<br/>err 0.2%<br/>p95 180ms"| orders
    gw ===>|"rate 412/s<br/>err 3.4%<br/>p95 820ms"| pay
    orders -->|"rate 720/s<br/>err 0.1%<br/>p95 60ms"| inv
    orders -->|"rate 720/s<br/>err 0.0%<br/>p95 40ms"| db
    pay -->|"rate 410/s<br/>err 0.1%<br/>p95 35ms"| db
    inv -->|"rate 720/s<br/>err 0.0%<br/>p95 30ms"| db
    classDef ok fill:#1f5f3a,stroke:#34d399,color:#fff
    classDef hot fill:#5f1f1f,stroke:#f87171,color:#fff
    classDef store fill:#3a2f5f,stroke:#a78bfa,color:#fff
    class web,gw,orders,inv ok
    class pay hot
    class db store

RED via PromQL

sum by (service) (rate(traces_spanmetrics_calls_total{span_kind="SPAN_KIND_SERVER"}[5m]))

sum by (service) (rate(traces_spanmetrics_calls_total{span_kind="SPAN_KIND_SERVER", status_code="STATUS_CODE_ERROR"}[5m]))

histogram_quantile(0.95,
  sum by (service, le) (
    rate(traces_spanmetrics_latency_bucket{span_kind="SPAN_KIND_SERVER"}[5m])
  )
)

Caveats for trace-derived metrics

Sampling distorts rate. Head sampling at 10% reports ~1/10 of true rate. Tail sampling that preferentially keeps errors/slow traces over-represents errors. Treat trace-derived metrics as a correlation tool; keep direct application metrics as the SLO source of truth.
Cardinality. Don't add user_id or request_id as span_metrics dimensions: every distinct value creates a new time series, which can exhaust your TSDB's memory. Stick to bounded labels: service, operation, status, method, coarse path.
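
The sampling distortion is simple arithmetic. The sketch below uses made-up numbers (1,000 requests per second with a 2% true error ratio) and a hypothetical helper to compare head sampling with an error-biased tail sampler.

```python
def observed(true_rps: float, true_error_ratio: float, keep_ok: float, keep_error: float):
    """Return (observed rate, observed error ratio) after sampling.

    keep_ok and keep_error are the fractions of successful and failed traces retained.
    """
    errors = true_rps * true_error_ratio
    oks = true_rps - errors
    kept_errors = errors * keep_error
    kept_oks = oks * keep_ok
    kept = kept_errors + kept_oks
    return kept, kept_errors / kept


# Head sampling at 10%: rate shrinks to about 100/s, error ratio stays at about 2%.
head_rate, head_errors = observed(1000, 0.02, keep_ok=0.10, keep_error=0.10)

# Tail sampling that keeps every error but 10% of successes:
# about 118/s observed, and the error ratio jumps to about 17%.
tail_rate, tail_errors = observed(1000, 0.02, keep_ok=0.10, keep_error=1.0)
```

Head sampling can be corrected by dividing by the sampling probability; the tail-sampled error ratio cannot be recovered without knowing how many successes were dropped, which is why direct application metrics remain the SLO source.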

Key Points — 7.4

Jaeger: Cassandra/ES storage, search-oriented UI, SPM via Collector spanmetrics.
Tempo: object-storage backed, trace-id indexed; built-in metrics-generator with span_metrics (per service) and service_graphs (per edge) processors.
RED via PromQL: rate of the span-metrics calls counter, error rate filtered on the ERROR status code (a status_code!="OK" filter would also count UNSET spans), p95 via histogram_quantile over the latency buckets.
Service maps need consistent service.name, correct SpanKind, and peer.service attributes on outbound spans.
Sampling and cardinality limits mean trace-derived metrics are great for correlation but not for SLO accounting.

Post-Reading Check — Part 2