Chapter 10: The OpenTelemetry Collector in Depth

Learning Objectives

Pre-Study Assessment

1. Which Collector component is the contract that actually wires receivers, processors, and exporters into runnable pipelines?

The receivers top-level block
The service.pipelines block
The extensions block
The connectors block

2. Which processor should always be listed first in a Collector pipeline?

batch
transform
memory_limiter
k8sattributes

3. What is a connector in the OpenTelemetry Collector?

An extension that exposes Prometheus metrics on port 8888
A processor that drops spans matching a predicate
A hybrid component that acts as an exporter on one pipeline and a receiver on another
A YAML anchor used to share config across pipelines

4. Why is the batch processor recommended in essentially every production pipeline?

It enriches spans with Kubernetes pod metadata
It groups telemetry into larger payloads, dramatically reducing per-call overhead at the exporter
It applies tail-sampling policies based on latency and error status
It encrypts data in flight to the backend

5. For tail_sampling policies to actually have spans to evaluate, the SDK must export traces using which sampler?

An aggressive traceidratiobased(0.01) head sampler
always_off — let the Collector decide everything
always_on (or parentbased(always_on)) so unsampled spans reach the Collector
No sampler is needed; tail sampling reconstructs dropped spans

6. Which receiver is most commonly used to ingest container logs from /var/log/pods on a Kubernetes node?

otlp
prometheus
hostmetrics
filelog

7. Which exporter is the standard choice for fanning metrics out into the Prometheus / Mimir / Cortex / Thanos ecosystem?

loki
otlp
prometheusremotewrite
debug

8. What is the role of the file_storage extension on a gateway Collector?

It stores Collector binary releases for rolling upgrades
It backs the exporter sending_queue with disk so accepted telemetry survives restarts
It writes a debug log of every span to a flat file
It mounts a ConfigMap into the Collector pod

9. Which extension exposes a live, in-process view of pipelines and exporter queues for incident triage?

health_check
pprof
zpages
file_storage

10. In the recommended two-tier topology, where does tail_sampling belong?

On the DaemonSet agent, close to each application pod
In the SDK, before spans ever leave the application
On the centralized gateway Deployment, where it can buffer whole traces
On the backend (Tempo), not in the Collector

Section 1: Pipeline Architecture

The Collector is not a single black box — it is a configurable pipeline engine built from four kinds of components, plus optional extensions. Every piece of telemetry flowing through takes the same conceptual journey: it enters through a receiver, traverses a chain of processors, and leaves through one or more exporters. Pipelines are declared per signal type (traces, metrics, logs), and the service block is what actually wires components together into runnable pipelines.

The four component types (plus extensions)

ComponentRoleExamples
ReceiverAccepts data in (push) or pulls from a sourceotlp, prometheus, hostmetrics, filelog, kafka
ProcessorMutates, filters, batches, samples, enriches in flightmemory_limiter, batch, transform, tail_sampling
ExporterSends data to one or more backendsotlp, prometheusremotewrite, loki, debug
ConnectorJoins two pipelines — exporter on one side, receiver on the otherspanmetrics, routing, forward
ExtensionNon-pipeline capabilities (health, profiling, debugging)health_check, pprof, zpages, file_storage

Connectors are the cleanest way to derive one signal from another — for example, generating RED metrics (Rate, Errors, Duration) from spans via a spanmetrics connector that exits a traces pipeline and re-enters a separate metrics pipeline.

Mermaid: Canonical pipeline anatomy (Figure 10.0)

flowchart LR R1[OTLP receiver] --> P1[memory_limiter] R2[Prometheus receiver] --> P1 P1 --> P2[k8sattributes / resource] P2 --> P3[filter / tail_sampling] P3 --> P4[transform OTTL] P4 --> P5[batch] P5 --> E1[OTLP exporter] P5 --> E2[prometheusremotewrite] P5 --> E3[debug]

A minimal three-signal config

service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [loki]

A component defined in receivers, processors, or exporters but not referenced under service.pipelines is silently ignored. This is the most common source of "my config does nothing" surprises.

Processor order is decisive

Processors run in the order listed. This is one of the most consequential, and most commonly overlooked, properties of Collector configuration:

  1. memory_limiter always first — back-pressure kicks in before later, more expensive processors waste CPU
  2. Enrichment processors next (e.g., k8sattributes, resource) so downstream filters see full context
  3. Filter / sampling next — drop unwanted data before transforms touch it
  4. transform / scrubbing — reshape what is left
  5. batch last — coalesce into large outbound batches just before the exporter
Figure A — Collector pipeline conveyor belt (data packet journey)
Telemetry flows left to right through ordered processors receiver otlp memory_limiter first — gates back-pressure k8sattributes enrich pod metadata transform OTTL scrub PII tail_sampling drops low-value traces batch coalesce exporter otlp/tempo remotewrite Order matters — what each stage is doing in this animation: 1. receiver accepts span — 2. memory_limiter checks heap, accepts — 3. k8sattributes adds pod name/namespace 4. transform redacts user.email — 5. tail_sampling waits for full trace and may drop (red packet) — 6. batch — 7. exporter Red packet: trace failed policies (no errors, fast latency, not premium tenant) — dropped at tail_sampling Blue packets: matched a keep policy or hit the 1% probabilistic backstop — flow on to the backend

Section 1 Takeaway

Section 2: Key Processors

memory_limiter + batch — the mandatory pair

memory_limiter samples Collector memory on a fixed interval and, when usage crosses configured thresholds, refuses new data by returning errors to receivers. That refusal is what creates back-pressure: upstream senders see failures, retry, and slow down — instead of the Collector dying from an out-of-memory kill.

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 800       # ~80% of container memory limit
    spike_limit_mib: 200 # tolerance for short bursts
  batch:
    timeout: 5s
    send_batch_size: 512
    send_batch_max_size: 4096

Analogy: batch is a hotel shuttle that waits up to five minutes (or until full) before driving to the airport — far more efficient than calling a taxi for every guest. memory_limiter is the bouncer at the lobby door who turns guests away when the lobby is full, so the building never collapses.

transform with OTTL

For richer mutations — conditional logic, regex substitution, cross-field arithmetic — reach for the transform processor, which uses the OpenTelemetry Transformation Language (OTTL). OTTL statements look like set(target, value) where <boolean> and run inside a context (span, metric, datapoint, log, resource, or scope).

processors:
  transform:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # Collapse user IDs in URL paths so cardinality stays bounded
          - replace_pattern(attributes["http.target"], "/users/[0-9]+", "/users/:id") where attributes["http.target"] != nil
          # Remove PII before exporting
          - delete_key(attributes, "user.email")
          - delete_key(attributes, "user.id")
          # Whitelist what is allowed to leave
          - keep_keys(attributes, ["http.method", "http.target", "http.status_code", "service.name"])
          # Mark anything from the checkout service
          - set(attributes["env"], "prod") where resource.attributes["service.name"] == "checkout-service"

error_mode: ignore matters: the default in some versions is propagate, which can fail an entire batch when a single statement errors. A close cousin, filter, uses OTTL conditions to drop data outright (e.g., dropping /healthz spans).

tail_sampling vs probabilistic_sampler

The probabilistic_sampler is cheap and stateless — it picks (say) 5% based on trace ID, but it cannot prefer error or slow traces. The tail_sampling processor is fundamentally different: it buffers all spans for a trace, keyed by trace ID, and decides keep/drop only after decision_wait seconds or all spans have arrived. Because it sees the whole trace, it can sample on end-to-end latency, final status, or attributes that appear only on a leaf span.

processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 50000
    expected_new_traces_per_sec: 2000
    policies:
      - name: main
        type: composite
        composite:
          max_total_spans_per_second: 1000
          policy_order: [error-traces, slow-traces, premium-tenants, baseline]
          sub_policies:
            error-traces:    { type: status_code, status_code: { status_codes: [ERROR] } }
            slow-traces:     { type: latency,      latency:      { threshold_ms: 4000 } }
            premium-tenants: { type: string_attribute, string_attribute: { key: tenant.tier, values: ["gold","platinum"] } }
            baseline:        { type: probabilistic, probabilistic: { sampling_percentage: 1.0 } }

Crucial gotcha: tail sampling only works if SDKs export spans unsampled (always_on or parentbased(always_on)). If the SDK already dropped spans, no policy can resurrect them.

Mermaid: Tail sampling decision flow (Figure 10.2)

sequenceDiagram participant SDK as SDK always_on participant Col as Collector tail_sampling participant Buf as Trace buffer participant Pol as Policy evaluator participant BE as Backend Tempo SDK->>Col: Span A trace T1 root Col->>Buf: Buffer T1 spans SDK->>Col: Span B trace T1 child Col->>Buf: Buffer T1 spans SDK->>Col: Span C trace T1 error Col->>Buf: Buffer T1 spans Note over Col,Buf: Wait decision_wait 10s Col->>Pol: Evaluate composite policies Pol->>Pol: error-traces matches? YES Pol-->>Col: KEEP trace T1 Col->>BE: Export all T1 spans

Head vs tail sampling at a glance

AspectHead / parent-based samplertail_sampling processor
Decision pointRoot span startAfter decision_wait in Collector
Sees full traceNoYes
Can prefer errors / slow tracesNoYes
SDK overheadLow (drops at source)High (must export everything)
Collector memory & CPUMinimalSubstantial (buffers spans)

k8sattributes — Kubernetes enrichment

The k8sattributes processor watches the Kubernetes API and decorates telemetry with metadata about the sending pod (namespace, deployment, node, labels). It identifies the sender either by inbound connection IP or by an explicit k8s.pod.ip resource attribute. Run it on the agent (DaemonSet), never on a central gateway — the gateway only sees the agent's IP, not the application pod's. Limit extract.metadata to fields you actually query on; each one multiplies cardinality and API-server load.

Figure B — Tail sampling buffers a trace, then evaluates policies
Trace T1 arrives span-by-span → buffered → policies evaluate after decision_wait SDK always_on Span A root /checkout Span B db.query Span C status=ERROR tail_sampling buffer keyed by trace_id = T1 T1 / A: root T1 / B: db.query T1 / C: status=ERROR decision_wait 10s policies (composite) error-traces — MATCH latency > 4s premium-tenants baseline 1% probabilistic Decision: KEEP T1 Tempo trace store kept Trace T2 — fast, no errors, no premium tag matches no policy → outside 1% bucket dropped before export SDK must use always_on — the Collector can't keep what the SDK never sent

Section 2 Takeaway

Section 3: Key Receivers and Exporters

Workhorse receivers

ReceiverWhat it ingestsTypical use
otlpOTLP gRPC and OTLP HTTP (traces, metrics, logs)Default for SDK and Collector-to-Collector traffic
prometheusScrapes Prometheus /metrics endpointsMigration from Prometheus; scraping exporters
hostmetricsOS-level CPU, memory, disk, network, filesystem, processNode-agent monitoring
filelogTails log files with multiline and parser supportContainer logs from /var/log/pods on the node
kafkaOTLP-encoded data from Kafka topicsDecoupling ingest from processing
jaeger, zipkinLegacy span formatsBrownfield environments mid-migration

A common DaemonSet receiver block:

receivers:
  otlp:
    protocols:
      grpc: { endpoint: 0.0.0.0:4317 }
      http: { endpoint: 0.0.0.0:4318 }
  hostmetrics:
    collection_interval: 30s
    scrapers: { cpu: {}, memory: {}, disk: {}, filesystem: {}, network: {}, load: {} }
  filelog:
    include: ["/var/log/pods/*/*/*.log"]
    start_at: end
    operators:
      - type: container

The prometheus receiver is worth a special mention: it accepts native Prometheus scrape config, so an existing prometheus.yml can be lifted into the Collector almost verbatim — a powerful migration path.

Workhorse exporters

ExporterDestinationNotes
otlpAny OTLP-compatible backend (Tempo, Jaeger, vendors)Default; gRPC and HTTP
prometheusremotewritePrometheus, Mimir, Cortex, ThanosMetrics fan-out into Prometheus ecosystem
lokiGrafana LokiLogs only; attribute-to-label mapping configurable
debugCollector stdoutReplaces the older logging exporter
kafkaKafka topic, OTLP-encodedPairs with the kafka receiver
fileLocal file (JSON)Disaster-recovery sink, offline replay
exporters:
  otlp/tempo:
    endpoint: tempo-distributor.observability:4317
    tls: { insecure: true }
    sending_queue: { enabled: true, num_consumers: 10, queue_size: 2000 }
    retry_on_failure: { enabled: true, initial_interval: 5s, max_interval: 60s, max_elapsed_time: 0 }
  prometheusremotewrite:
    endpoint: http://mimir.observability:8080/api/v1/push
    resource_to_telemetry_conversion: { enabled: true }
  loki:
    endpoint: http://loki-gateway/loki/api/v1/push
  debug:
    verbosity: basic

resource_to_telemetry_conversion: true on prometheusremotewrite promotes OTLP resource attributes (like service.name, k8s.pod.name) into Prometheus labels so they become queryable in PromQL.

Connectors — the inter-pipeline glue

A connector behaves as an exporter on one pipeline and a receiver on another. The canonical example is spanmetrics: it consumes spans and emits aggregated RED metrics — derived signals produced once, near the source, rather than re-derived in each backend.

connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s]
    dimensions:
      - name: http.method
      - name: http.status_code

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [spanmetrics, otlp/tempo]      # spanmetrics is an EXPORTER here
    metrics/spans:
      receivers: [spanmetrics]                  # ... and a RECEIVER here
      processors: [batch]
      exporters: [prometheusremotewrite]

Other useful connectors include routing (split traffic by attribute) and forward (chain pipelines together).

Mermaid: Multi-signal pipeline topology (Figure 10.1)

graph TD subgraph Extensions HC[health_check] PP[pprof] ZP[zpages] end subgraph traces_pipeline[traces pipeline] TR[otlp receiver] --> TML[memory_limiter] TML --> TK8S[k8sattributes] TK8S --> TB[batch] TB --> TE[otlp/tempo] end subgraph metrics_pipeline[metrics pipeline] MR[otlp receiver] --> MML[memory_limiter] MML --> MK8S[k8sattributes] MK8S --> MB[batch] MB --> ME[prometheusremotewrite] end subgraph logs_pipeline[logs pipeline] LR[otlp receiver] --> LML[memory_limiter] LML --> LK8S[k8sattributes] LK8S --> LB[batch] LB --> LE[loki] end

Section 3 Takeaway

Section 4: Reliability and Operations

Persistent queue and retry on failure

Most exporters support a sending_queue and retry_on_failure. By default the sending queue is in memory: fast, but lost on restart. Pairing it with the file_storage extension makes the queue durable, so an OOM kill or rolling deployment doesn't drop telemetry that has already been accepted from upstream.

extensions:
  file_storage:
    directory: /var/lib/otelcol/storage
    timeout: 1s

exporters:
  otlp/tempo:
    endpoint: tempo-distributor.observability:4317
    sending_queue:
      enabled: true
      storage: file_storage   # makes the queue durable across restarts
      num_consumers: 10
      queue_size: 2000
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 60s
      max_elapsed_time: 0     # retry forever

Sizing tips:

Mermaid: memory_limiter, batch, queue, retry interplay (Figure 10.4)

flowchart LR IN[Incoming spans / metrics / logs] --> ML{memory_limiter
over threshold?} ML -- "yes: refuse" --> REJ[Return error
upstream backs off] ML -- "no: accept" --> BAT[batch] BAT --> SQ[(sending_queue
file_storage backed)] SQ --> EXP[Exporter consumer pool
num_consumers] EXP -- success --> BE[Backend] EXP -- failure --> RET{retry_on_failure
exponential backoff} RET -- retry --> SQ RET -- queue full --> DROP[Drop oldest
otelcol_exporter_send_failed]

Extensions for operability

ExtensionEndpointUse
health_check:13133/Kubernetes liveness / readiness probes
pprof:1777/debug/pprof/CPU and heap profiling under load
zpages:55679/debug/Live in-process view: pipelines, exporter queues, recent spans
file_storagefilesystemBacking store for persistent queues
extensions:
  health_check: { endpoint: 0.0.0.0:13133 }
  pprof:        { endpoint: 0.0.0.0:1777 }
  zpages:       { endpoint: 0.0.0.0:55679 }

# pod spec:
livenessProbe:  { httpGet: { path: /, port: 13133 }, initialDelaySeconds: 10 }
readinessProbe: { httpGet: { path: /, port: 13133 }, periodSeconds: 5 }

zpages is especially useful during incidents — per-component counters and sampled recent traces come directly from the Collector's process, so you can answer "is data flowing? is anything dropping?" without leaving the cluster.

Mermaid: Two-tier agent + gateway topology (Figure 10.3)

flowchart LR P1[App pod] --> A1[Agent Collector
DaemonSet
memory_limiter
k8sattributes
light batch] P2[App pod] --> A1 P3[App pod] --> A2[Agent Collector] A1 --> GW[Gateway Collector
Deployment + HPA
tail_sampling
transform
persistent queue] A2 --> GW GW --> TEMPO[(Tempo)] GW --> MIMIR[(Mimir)] GW --> LOKI[(Loki)]

Sizing, throughput, and memory tuning

RoleCPU reqCPU limitMem reqMem limit
Agent (DaemonSet)100-250m500-750m256-512 Mi512 Mi-1 Gi
Gateway (Deployment)500m-1 vCPU2-4 vCPU1-2 Gi2-4 Gi

The memory_limiter should target 70-80% of the container memory limit, with spike_limit_mib covering the largest plausible batch. Agents are not HPA-scaled (they scale with node count via DaemonSet); the gateway runs an HPA on CPU at 60-70% target utilization with minReplicas: 2 for graceful scaling and HA.

Tail-sampling capacity follows a simple rule of thumb: num_traces ≥ expected_new_traces_per_sec × decision_wait × 2. At 2,000 traces/sec and a 10s decision_wait, that is num_traces: 40000 minimum — round up to 50,000 for headroom.

Monitor the Collector with… itself: every Collector exposes its internal metrics on port 8888. Alert on:

Figure C — Back-pressure: exporter stalls, memory_limiter refuses, recovery
Backend slows → memory fills → memory_limiter refuses upstream → recovery Upstream SDKs / agents batch △ batch ◯ batch □ × memory_limiter check_interval 1s heap limit_mib batch timeout 5s sending_queue file_storage queue 70% retry on failure exporter otlp/tempo Tempo backend Backend slow → queue grows → heap rises → memory_limiter refuses upstream (back-pressure) Backend recovers → queue drains → heap settles → limiter accepts again Without memory_limiter the Collector would OOM-kill; with it, errors propagate up so SDKs retry/back off.

Section 4 Takeaway

Post-Study Assessment

1. Which Collector component is the contract that actually wires receivers, processors, and exporters into runnable pipelines?

The receivers top-level block
The service.pipelines block
The extensions block
The connectors block

2. Which processor should always be listed first in a Collector pipeline?

batch
transform
memory_limiter
k8sattributes

3. What is a connector in the OpenTelemetry Collector?

An extension that exposes Prometheus metrics on port 8888
A processor that drops spans matching a predicate
A hybrid component that acts as an exporter on one pipeline and a receiver on another
A YAML anchor used to share config across pipelines

4. Why is the batch processor recommended in essentially every production pipeline?

It enriches spans with Kubernetes pod metadata
It groups telemetry into larger payloads, dramatically reducing per-call overhead at the exporter
It applies tail-sampling policies based on latency and error status
It encrypts data in flight to the backend

5. For tail_sampling policies to actually have spans to evaluate, the SDK must export traces using which sampler?

An aggressive traceidratiobased(0.01) head sampler
always_off — let the Collector decide everything
always_on (or parentbased(always_on)) so unsampled spans reach the Collector
No sampler is needed; tail sampling reconstructs dropped spans

6. Which receiver is most commonly used to ingest container logs from /var/log/pods on a Kubernetes node?

otlp
prometheus
hostmetrics
filelog

7. Which exporter is the standard choice for fanning metrics out into the Prometheus / Mimir / Cortex / Thanos ecosystem?

loki
otlp
prometheusremotewrite
debug

8. What is the role of the file_storage extension on a gateway Collector?

It stores Collector binary releases for rolling upgrades
It backs the exporter sending_queue with disk so accepted telemetry survives restarts
It writes a debug log of every span to a flat file
It mounts a ConfigMap into the Collector pod

9. Which extension exposes a live, in-process view of pipelines and exporter queues for incident triage?

health_check
pprof
zpages
file_storage

10. In the recommended two-tier topology, where does tail_sampling belong?

On the DaemonSet agent, close to each application pod
In the SDK, before spans ever leave the application
On the centralized gateway Deployment, where it can buffer whole traces
On the backend (Tempo), not in the Collector

Your Progress

Answer Explanations