Study Guide: Chapter 6 — Instrumentation: Manual, Automatic, and Zero-Code

Instrumentation is the act of teaching your code to talk about itself. OpenTelemetry recognizes three strategies for producing signals: manual (developers explicitly emit spans, metrics, and logs), automatic (libraries are patched at runtime), and zero-code (an external observer — eBPF in the kernel or a Kubernetes Operator injecting agents — generates telemetry with no application awareness). Think of them as a memoir, a transcriptionist, and a hidden ceiling microphone: each captures the story differently; each has a place.

Part 1 Pre-Quiz — Sections 1 & 2

Answer first, then read. You'll re-answer the same questions after the reading to measure improvement.

Pre-Reading Check — Part 1

1. Which instrumentation approach is the only one that can reliably capture a tenant.id or payment.outcome attribute?

Automatic instrumentation
Manual instrumentation
eBPF zero-code instrumentation
The OpenTelemetry Collector

2. You need to count active WebSocket connections (which go up and down). Which OpenTelemetry instrument fits?

Counter
Histogram
UpDownCounter
Observable Gauge from a synchronous callback

3. How does the Java OpenTelemetry agent inject instrumentation into your app?

It monkey-patches imports at runtime like Python.
It registers a premain via the Java Instrumentation API and rewrites class bytes at load time.
It runs as a sidecar process and proxies network calls.
It uses eBPF kprobes in the Linux kernel.

4. A Python service is auto-instrumented but produces no spans. The most common root cause is…

The Collector is using gRPC and the agent uses HTTP.
The library was imported before opentelemetry-instrument patched it.
The JVM was started without the -javaagent flag.
Python doesn't support OpenTelemetry.

5. Which environment variable convention is shared across Java, Python, and Node.js auto-instrumentation?

Each language uses a unique prefix: JAVA_OTEL_, PY_OTEL_, NODE_OTEL_.
All three accept the same OTEL_* variables: OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT, etc.
Only Java honors environment variables; the others require code config.
There is no shared contract; each is configured by SDK API only.

Section 1: Manual Instrumentation

Manual instrumentation puts the developer in direct control. You acquire a Tracer, Meter, or Logger from the SDK and explicitly start spans, record measurements, or write structured log events. It is the only way to express domain context: concepts like order.id, tenant.id, payment.status, and feature_flag.variant that no auto-instrumenter could ever guess.

Acquiring Tracers and Meters

The SDK exposes TracerProvider, MeterProvider, and LoggerProvider. From each you obtain a named, optionally versioned instance scoped to your module, conventionally named after the package or import path. That name (and version) is recorded as the instrumentation scope on every signal, letting backends filter by which code produced the data.

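// Java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.metrics.Meter;
import io.opentelemetry.api.trace.Tracer;

Tracer tracer = GlobalOpenTelemetry.getTracer("com.acme.payments", "1.4.0");
Meter  meter  = GlobalOpenTelemetry.getMeter("com.acme.payments");

# Python
from opentelemetry import metrics, trace

tracer = trace.get_tracer("acme.payments", "1.4.0")
meter  = metrics.get_meter("acme.payments")

// Node.js
const { metrics, trace } = require('@opentelemetry/api');

const tracer = trace.getTracer('acme-payments', '1.4.0');
const meter  = metrics.getMeter('acme-payments');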
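To make the scope idea concrete, here is a toy model in plain Python. It is not the OpenTelemetry API, and every class name in it is hypothetical: each tracer stamps its (name, version) scope onto the spans it produces, so the data can later be filtered by the module that emitted it. This toy provider also hands back the same tracer for the same (name, version); real SDKs are free to handle that differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Scope:
    """Identity of the code that produced a signal (hypothetical toy type)."""
    name: str
    version: Optional[str] = None


@dataclass
class FinishedSpan:
    name: str
    scope: Scope


class ToyTracer:
    def __init__(self, scope: Scope, sink: List[FinishedSpan]) -> None:
        self.scope = scope
        self._sink = sink

    def record(self, span_name: str) -> None:
        # Every span produced by this tracer carries the tracer's scope.
        self._sink.append(FinishedSpan(span_name, self.scope))


class ToyTracerProvider:
    def __init__(self) -> None:
        self.finished: List[FinishedSpan] = []
        self._tracers: Dict[Tuple[str, Optional[str]], ToyTracer] = {}

    def get_tracer(self, name: str, version: Optional[str] = None) -> ToyTracer:
        # This toy returns the same tracer for the same (name, version).
        key = (name, version)
        if key not in self._tracers:
            self._tracers[key] = ToyTracer(Scope(name, version), self.finished)
        return self._tracers[key]


provider = ToyTracerProvider()
payments = provider.get_tracer("acme.payments", "1.4.0")
inventory = provider.get_tracer("acme.inventory")
payments.record("authorize_payment")
inventory.record("reserve_stock")

# A backend-style filter: which spans came from the payments module?
from_payments = [s.name for s in provider.finished if s.scope.name == "acme.payments"]
print(from_payments)  # ['authorize_payment']
```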

Creating Spans and Recording Attributes

A span is a named, timed operation with attributes, events, and a status. The idiomatic pattern wraps a unit of work so the span closes even on exceptions:

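# Python: idiomatic context-manager span
from opentelemetry import trace

tracer = trace.get_tracer("acme.payments", "1.4.0")

# tenant_id, gateway and amount come from the surrounding request handler.
# record_exception=False: the except block records the exception itself,
# so the context manager should not add a second, duplicate exception event.
with tracer.start_as_current_span("authorize_payment", record_exception=False) as span:
    span.set_attribute("payment.method", "card")
    span.set_attribute("tenant.id", tenant_id)
    try:
        approved = gateway.authorize(amount)
        span.set_attribute("payment.outcome",
                           "approved" if approved else "declined")
    except Exception as exc:
        span.record_exception(exc)
        span.set_status(trace.StatusCode.ERROR, str(exc))
        raise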

Use dot-namespaced keys (payment.method, not paymentMethod); follow semantic conventions where one exists; treat your custom namespace (acme.*) like a public API — once a dashboard depends on it, you cannot freely rename it.
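Why the context-manager pattern matters can be shown without any SDK at all. The sketch below is a toy span in plain Python (the ToySpan class, start_span helper and authorize function are all hypothetical stand-ins, not OpenTelemetry code): the finally clause ends the span on success and on failure, and only a raised exception marks it as an error, while a declined payment is still a successful operation.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class ToySpan:
    name: str
    start: float
    end: Optional[float] = None
    status: str = "UNSET"
    attributes: Dict[str, Any] = field(default_factory=dict)
    events: List[str] = field(default_factory=list)

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


FINISHED: List[ToySpan] = []


@contextmanager
def start_span(name: str) -> Iterator[ToySpan]:
    span = ToySpan(name=name, start=time.monotonic())
    try:
        yield span
    except Exception as exc:
        # Record the exception and mark the span as failed, then re-raise.
        span.events.append(f"exception: {type(exc).__name__}: {exc}")
        span.status = "ERROR"
        raise
    finally:
        # Runs on success and on exception, so the span always ends.
        span.end = time.monotonic()
        FINISHED.append(span)


def authorize(amount: int) -> bool:
    """Hypothetical stand-in for gateway.authorize."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    return amount < 1000


for amount in (50, 5000, -1):
    try:
        with start_span("authorize_payment") as span:
            span.set_attribute("payment.method", "card")
            approved = authorize(amount)
            span.set_attribute("payment.outcome", "approved" if approved else "declined")
    except ValueError:
        pass

print([(s.attributes.get("payment.outcome"), s.status) for s in FINISHED])
# [('approved', 'UNSET'), ('declined', 'UNSET'), (None, 'ERROR')]
```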

Picking the Right Metric Instrument

Instrument	Direction	Aggregation	Typical Use
`Counter`	Monotonic up	Sum	Total requests, errors, bytes sent
`UpDownCounter`	Up or down	Sum	Active connections, queue depth, pool size
`Histogram`	Observations	Bucketed distribution	Request latency, payload size
`Gauge` (observable)	Sampled	Last value	CPU utilization, current temperature

Three rules: declare units (s, By, 1); let the SDK pick histogram buckets unless you really know the latency profile; remember that every attribute multiplies cardinality — an idea we revisit in Section 4.

Key Points — Section 1

Manual is the only path to business-meaningful telemetry. No auto-instrumenter can invent tenant.id, order.id, or feature_flag.variant.
Acquire named, versioned tracers and meters once per module; the name identifies the instrumentation scope.
Wrap units of work in context-managed spans so they close even on exceptions; on failure, record the exception and set an error status.
Match the instrument to the semantics: Counter (monotonic), UpDownCounter (ebb & flow), Histogram (distribution), Gauge (current value).
Set units, choose attribute keys carefully, and recognize that each label dimension multiplies the time-series count.

Section 2: Automatic Instrumentation

Auto-instrumentation answers "How do I get traces from libraries I did not write?" The mechanism differs by runtime because each language exposes different hooks.

Bytecode Injection: Java and .NET

The Java agent attaches via the -javaagent flag: the JVM calls the agent's premain method with an Instrumentation instance, and the agent uses Byte Buddy to rewrite classes as the classloader loads them:

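# Port 4317 is OTLP/gRPC. Java agent 2.x defaults to http/protobuf (port 4318),
# so set the protocol explicitly when targeting 4317.
OTEL_SERVICE_NAME=checkout-service \
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317 \
OTEL_EXPORTER_OTLP_PROTOCOL=grpc \
OTEL_TRACES_EXPORTER=otlp \
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=prod,service.version=2.3.1 \
java -javaagent:/opt/otel/opentelemetry-javaagent.jar -jar /app/checkout.jar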

The agent ships with instrumentation modules for Servlet, Spring MVC/WebFlux, JAX-RS, gRPC, OkHttp, JDBC, R2DBC, Hibernate, MongoDB, Cassandra, Kafka, RabbitMQ, JMS, and more. Because rewriting happens at class load, no source change is needed; however, the agent must attach at JVM start, and custom classloaders sometimes need extra configuration. .NET uses a conceptually similar mechanism: a CLR profiler activated by CORECLR_ENABLE_PROFILING=1.

Monkey-Patching: Python and Node.js

Dynamic languages let you replace functions at runtime. Python's opentelemetry-instrument CLI bootstraps every installed opentelemetry-instrumentation-* package, which monkey-patches its target library at import:

pip install opentelemetry-distro opentelemetry-exporter-otlp \
            opentelemetry-instrumentation-requests \
            opentelemetry-instrumentation-psycopg2 \
            opentelemetry-instrumentation-flask

OTEL_SERVICE_NAME=orders-api \
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317 \
opentelemetry-instrument gunicorn orders.wsgi:application

The requests instrumentation replaces requests.Session.send with a wrapper that opens a client span, records HTTP attributes, calls the original, captures the response, and ends the span. Patching must happen before application code imports the target library: a module that already ran from lib import func holds a reference to the original, unpatched function. Node.js relies on a require hook via require-in-the-middle:

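// tracing.js: must be loaded before any application code
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

const sdk = new NodeSDK({
  // With no options, the exporter reads OTEL_EXPORTER_OTLP_* from the environment.
  traceExporter: new OTLPTraceExporter({}),
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();

// Flush buffered spans when the process is asked to stop.
process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0));
});

// run: NODE_OPTIONS="--require ./tracing.js" node app.js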
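The import-order pitfall described for Python can be reproduced in a few lines. The sketch below builds a hypothetical in-memory module named fakehttp, binds its get function once before patching and once after, and shows that only call sites that look the function up after the patch are traced. The appended strings stand in for real client spans.

```python
import functools
import sys
import types
from typing import Callable, List

TRACED: List[str] = []

# A hypothetical third-party HTTP module, created in memory for the demo.
fakehttp = types.ModuleType("fakehttp")


def _get(url: str) -> int:
    return 200


fakehttp.get = _get
sys.modules["fakehttp"] = fakehttp

# Application code that binds the function BEFORE instrumentation runs.
from fakehttp import get as early_get


def instrument(module: types.ModuleType) -> None:
    """Monkey-patch module.get with a wrapper (stand-in for an instrumentor)."""
    original: Callable[[str], int] = module.get

    @functools.wraps(original)
    def traced_get(url: str) -> int:
        # Stand-in for a client span: call the original, then record the call.
        status = original(url)
        TRACED.append(f"GET {url} -> {status}")
        return status

    module.get = traced_get


instrument(fakehttp)

from fakehttp import get as late_get

early_get("http://inventory/items")     # stale reference: not traced
late_get("http://inventory/orders")     # bound after patching: traced
fakehttp.get("http://inventory/stock")  # looked up at call time: traced
print(TRACED)
# ['GET http://inventory/orders -> 200', 'GET http://inventory/stock -> 200']
```

This is why opentelemetry-instrument has to wrap the whole process start-up rather than being called from somewhere inside the application.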

Java Agent Lifecycle (Figure 6.2)

Cross-Language Comparison

Aspect	Java	Python	Node.js
Primary mechanism	`-javaagent` bytecode rewrite	Monkey-patching at import	`require` hook + export patching
Entry point	JVM flag	`opentelemetry-instrument` CLI	`NODE_OPTIONS=--require`
Code changes	None	None	One bootstrap file
Context propagation	Thread-locals + executor wrappers	`contextvars` + async wrappers	`AsyncLocalStorage`-based context manager
Common pitfall	Custom classloaders	Import order before patch	Bundlers/serverless hide `require`

A shared environment-variable contract spans every language: OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_TRACES_EXPORTER, OTEL_METRICS_EXPORTER, OTEL_LOGS_EXPORTER, OTEL_EXPORTER_OTLP_PROTOCOL, OTEL_TRACES_SAMPLER, OTEL_PROPAGATORS, OTEL_RESOURCE_ATTRIBUTES. One ConfigMap, one vocabulary, every workload.

Debugging axioms: No traces at all usually means an exporter is set to none, the protocol is mismatched (grpc vs. http/protobuf), or the bootstrap loaded too late. Duplicate spans almost always mean a library is being captured by both auto- and manual instrumentation — disable one.
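The first two "no traces" causes can be checked mechanically before a rollout. The function below is a hypothetical heuristic, not an official tool. It assumes the conventional OTLP ports (4317 for gRPC, 4318 for HTTP) and stays silent when no protocol is set, because the default protocol differs between SDKs and agent versions.

```python
from typing import Dict, List
from urllib.parse import urlparse


def check_otlp_config(env: Dict[str, str]) -> List[str]:
    """Flag common 'no traces' misconfigurations (heuristic sketch)."""
    problems: List[str] = []
    if env.get("OTEL_TRACES_EXPORTER", "otlp").strip().lower() == "none":
        problems.append("OTEL_TRACES_EXPORTER=none disables trace export")

    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    protocol = env.get("OTEL_EXPORTER_OTLP_PROTOCOL")
    if endpoint and protocol:
        port = urlparse(endpoint).port
        # Conventional OTLP ports: 4317 for gRPC, 4318 for HTTP.
        if protocol == "grpc" and port == 4318:
            problems.append("protocol grpc but endpoint uses the OTLP/HTTP port 4318")
        if protocol.startswith("http/") and port == 4317:
            problems.append(f"protocol {protocol} but endpoint uses the OTLP/gRPC port 4317")
    return problems


env = {
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://otel-collector:4317",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
}
print(check_otlp_config(env))
# ['protocol http/protobuf but endpoint uses the OTLP/gRPC port 4317']
```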

Part 1 Post-Quiz — Sections 1 & 2

Same questions, now with context. Don't peek: the explanations are revealed at the end.

Post-Reading Check — Part 1

1. Which instrumentation approach is the only one that can reliably capture a tenant.id or payment.outcome attribute?

Automatic instrumentation
Manual instrumentation
eBPF zero-code instrumentation
The OpenTelemetry Collector

2. You need to count active WebSocket connections (which go up and down). Which OpenTelemetry instrument fits?

Counter
Histogram
UpDownCounter
Observable Gauge from a synchronous callback

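3. How does the Java OpenTelemetry agent inject instrumentation into your app?

It monkey-patches imports at runtime like Python.
It registers a premain via the Java Instrumentation API and rewrites class bytes at load time.
It runs as a sidecar process and proxies network calls.
It uses eBPF kprobes in the Linux kernel.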

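4. A Python service is auto-instrumented but produces no spans. The most common root cause is…

The Collector is using gRPC and the agent uses HTTP.
The library was imported before opentelemetry-instrument patched it.
The JVM was started without the -javaagent flag.
Python doesn't support OpenTelemetry.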

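5. Which environment variable convention is shared across Java, Python, and Node.js auto-instrumentation?

Each language uses a unique prefix: JAVA_OTEL_, PY_OTEL_, NODE_OTEL_.
All three accept the same OTEL_* variables: OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT, etc.
Only Java honors environment variables; the others require code config.
There is no shared contract; each is configured by SDK API only.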

Part 2 Pre-Quiz — Sections 3 & 4

Pre-Reading Check — Part 2

6. What does the OpenTelemetry Operator's mutating webhook do when a pod is annotated with instrumentation.opentelemetry.io/inject-java?

It rebuilds the application container image with the agent baked in.
It injects an init container that copies the Java agent to a shared volume and sets JAVA_TOOL_OPTIONS=-javaagent:... plus OTEL_* env vars.
It loads an eBPF program into the host kernel to trace the pod.
It restarts the kubelet so it picks up the agent.

7. eBPF zero-code instrumentation is great at HTTP/gRPC coverage but loses in which scenario?

When the workload is written in Go.
When you need to attribute requests to tenant=acme-corp and order=ORD-9182.
When the workload runs on Linux.
When tracing a Python web service.

8. Why are semantic conventions important?

They let the Collector compress signals more efficiently.
A dashboard written against http.response.status_code works regardless of whether the signal came from a Java agent, a Python monkey-patch, or an eBPF probe.
They are required to use the OTLP wire protocol.
They reduce span size on the wire by half.

9. Which attribute is safe to use as a metric label dimension on an HTTP latency histogram?

user.id (one per customer)
url.full including query string
http.route (templated, e.g. /orders/{id})
trace.id

10. Where do attributes like service.name, service.version, and k8s.pod.name belong?

On every span as ordinary attributes: repeat them per call.
In the OTLP Resource, set once at SDK startup; they describe the emitter of every signal.
Only in trace exporters: metrics do not carry resource identity.
In the Collector config: SDKs cannot set them.

Section 3: Zero-Code Instrumentation

"Zero-code" goes further than auto-instrumentation: the developer doesn't write tracing code and the build artifact isn't modified. Two distinct technologies live under this umbrella: eBPF agents that observe processes from the Linux kernel, and the OpenTelemetry Operator that injects auto-instrumentation agents into Kubernetes pods at admission time without changing container images.

eBPF-Based Auto-Instrumentation

eBPF lets you load safe, sandboxed bytecode into the Linux kernel and attach it to kernel events — syscalls, function entry/exit, tracepoints, network events — at runtime, without recompiling the kernel. An eBPF observability agent typically:

Attaches kprobes to kernel networking functions (e.g., tcp_sendmsg, tcp_cleanup_rbuf) and tracepoints on syscalls such as sendto and recvfrom (sys_enter_sendto, sys_enter_recvfrom) to observe data crossing TCP sockets.
Attaches uprobes to user-space functions (SSL_read/SSL_write in libssl, and net/http and crypto/tls functions compiled into Go binaries) to see data before encryption or after decryption.
Writes structured records into eBPF maps drained at high frequency by a user-space agent.
Reconstructs requests — pairing sends/recvs, parsing HTTP headers, gRPC HTTP/2 framing — to produce L7 metrics and OTLP spans.

Because the hooks live in the kernel, in shared libraries, and in well-known runtime functions, eBPF can observe workloads in almost any language on the host (Go, Rust, Java, Python, Node, C++, even closed-source binaries) without touching their code. The output is typically RED metrics (request rate, errors, duration) plus distributed traces for common protocols.
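The "reconstruct requests" step can be sketched in user space. Everything here is an assumption for illustration: the SocketEvent record layout, plaintext HTTP/1.x only, and one in-flight request per connection. Real agents also handle pipelining, HTTP/2 framing, partial reads and server-side pairing. The sketch pairs a client's request write with the next response read on the same connection and turns the pair into a span-like record with method, path, status and duration.

```python
from dataclasses import dataclass
from typing import Dict, List

HTTP_METHODS = (b"GET", b"POST", b"PUT", b"PATCH", b"DELETE", b"HEAD", b"OPTIONS")


@dataclass
class SocketEvent:
    """Hypothetical record a user-space agent might drain from an eBPF map."""
    conn_id: int      # e.g. (pid, fd) folded into one id
    direction: str    # "send" or "recv"
    ts_ns: int
    payload: bytes


@dataclass
class ReconstructedSpan:
    method: str
    path: str
    status_code: int
    duration_ms: float


def reconstruct(events: List[SocketEvent]) -> List[ReconstructedSpan]:
    pending: Dict[int, SocketEvent] = {}
    spans: List[ReconstructedSpan] = []
    for ev in sorted(events, key=lambda e: e.ts_ns):
        if ev.direction == "send" and ev.payload.split(b" ", 1)[0] in HTTP_METHODS:
            pending[ev.conn_id] = ev  # client wrote a request line
        elif ev.direction == "recv" and ev.payload.startswith(b"HTTP/1.") and ev.conn_id in pending:
            req = pending.pop(ev.conn_id)
            method, path = req.payload.split(b" ", 2)[:2]
            status = int(ev.payload.split(b" ", 2)[1].decode())
            spans.append(ReconstructedSpan(method.decode(), path.decode(), status,
                                           (ev.ts_ns - req.ts_ns) / 1e6))
    return spans


events = [
    SocketEvent(7, "send", 1_000_000, b"GET /orders/42 HTTP/1.1\r\nHost: api\r\n\r\n"),
    SocketEvent(9, "send", 1_500_000, b"POST /pay HTTP/1.1\r\n\r\n"),
    SocketEvent(7, "recv", 4_000_000, b"HTTP/1.1 200 OK\r\n\r\n"),
    SocketEvent(9, "recv", 9_500_000, b"HTTP/1.1 503 Service Unavailable\r\n\r\n"),
]
for s in reconstruct(events):
    print(s)
```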

Tool	Focus	Output
Grafana Beyla	Zero-code OTel auto-instrumentation for HTTP/gRPC/DB	OTLP traces + RED metrics
Pixie	K8s deep debugging, full request bodies, PxL scripts	In-cluster live data
Cilium Tetragon	Runtime security and policy enforcement	Process/file/network events; can block
Odigos	eBPF + SDK hybrid OTel platform	OTLP routed by policy

OpenTelemetry Operator and the `Instrumentation` CRD

For Kubernetes workloads, the Operator offers a different flavor of zero-code: the cluster itself injects the SDK auto-instrumentation agents we saw in Section 2, without changing your container images.

# 1. The Instrumentation CRD: a reusable recipe per language
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: default-instrumentation
  namespace: production
spec:
  exporter:
    endpoint: http://otel-collector.observability:4317
  propagators: [tracecontext, baggage]
  sampler:
    type: parentbased_traceidratio
    argument: "0.1"
  resource:
    resourceAttributes:
      deployment.environment: prod
      service.namespace: payments
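  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
    env:
      # Java agent 2.x exports over http/protobuf by default,
      # so send to the OTLP/HTTP port 4318 instead of 4317.
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: http://otel-collector.observability:4318
  python:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
    env:
      # Python auto-instrumentation also uses http/protobuf by default.
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: http://otel-collector.observability:4318
  nodejs:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:latest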

# 2. The Deployment opts in via annotations on its pod template
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-java: "production/default-instrumentation"

When an annotated pod is created, the Operator's mutating webhook injects an init container that copies the agent JAR into a shared volume, then patches the application container with JAVA_TOOL_OPTIONS=-javaagent:/otel-auto-instrumentation/javaagent.jar and the appropriate OTEL_* environment variables. Python (via PYTHONPATH) and Node.js (via NODE_OPTIONS) use analogous mechanisms. Write the CRD once and annotate each workload with its language; every new pod is then born observable. Because the webhook acts at pod creation, existing pods pick up instrumentation only when they are recreated, for example by a rollout restart.
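The mutation itself is easy to picture as a function over a pod spec. The sketch below is a toy, not the Operator's code: the volume name, init-container command and the /javaagent.jar source path inside the agent image are assumptions, and the pod is a plain dict rather than a Kubernetes object. It adds the shared volume and init container, mounts the volume into the application container, and appends the -javaagent flag to any existing JAVA_TOOL_OPTIONS instead of overwriting it.

```python
import copy
from typing import Any, Dict

AGENT_DIR = "/otel-auto-instrumentation"   # mount path quoted in the text above
AGENT_JAR = f"{AGENT_DIR}/javaagent.jar"
VOLUME = "otel-agent"                       # hypothetical volume name


def inject_java(pod: Dict[str, Any], agent_image: str,
                otel_env: Dict[str, str]) -> Dict[str, Any]:
    """Toy version of the webhook mutation; the real Operator differs in detail."""
    # Work on a copy: a real webhook sends back the difference as a JSON patch.
    pod = copy.deepcopy(pod)
    spec = pod["spec"]
    spec.setdefault("volumes", []).append({"name": VOLUME, "emptyDir": {}})
    spec.setdefault("initContainers", []).append({
        "name": "opentelemetry-auto-instrumentation",
        "image": agent_image,
        # Copy the agent out of its image into the shared volume, then exit.
        # The source path /javaagent.jar is an assumption about the image layout.
        "command": ["cp", "/javaagent.jar", AGENT_JAR],
        "volumeMounts": [{"name": VOLUME, "mountPath": AGENT_DIR}],
    })

    app = spec["containers"][0]  # assume the first container is the JVM app
    app.setdefault("volumeMounts", []).append({"name": VOLUME, "mountPath": AGENT_DIR})
    env = app.setdefault("env", [])
    flag = f"-javaagent:{AGENT_JAR}"
    existing = next((e for e in env if e["name"] == "JAVA_TOOL_OPTIONS"), None)
    if existing is not None:
        existing["value"] = f"{existing['value']} {flag}"  # keep user JVM flags
    else:
        env.append({"name": "JAVA_TOOL_OPTIONS", "value": flag})
    env.extend({"name": k, "value": v} for k, v in otel_env.items())
    return pod


pod = {"metadata": {"name": "checkout"},
       "spec": {"containers": [{"name": "app", "image": "acme/checkout:2.3.1",
                                "env": [{"name": "JAVA_TOOL_OPTIONS", "value": "-Xmx512m"}]}]}}
mutated = inject_java(
    pod, "ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest",
    {"OTEL_SERVICE_NAME": "checkout"})
env = {e["name"]: e["value"] for e in mutated["spec"]["containers"][0]["env"]}
print(env["JAVA_TOOL_OPTIONS"])  # -Xmx512m -javaagent:/otel-auto-instrumentation/javaagent.jar
```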

Limits of Zero-Code — Hybrid Is Production-Grade

Capability	Manual	Auto (SDK)	eBPF
HTTP/gRPC/DB calls	If coded	Yes, broad	Yes, broad
Business attrs (`order_id`)	Yes	No	No
Closed-source binaries	No	No	Yes
TLS-encrypted in-process	Yes	Yes	Only via `libssl` uprobes
Windows/macOS	Yes	Yes	Linux only
Rollout effort	High	Low	Very low (DaemonSet)
Privilege required	App identity	App identity	`CAP_SYS_ADMIN`/`CAP_BPF`

Hybrid wins. Run eBPF for horizontal, polyglot baseline coverage; use the Operator to inject SDK auto-instrumentation on every K8s pod; add manual instrumentation on the critical business flows where you need tenant_id, feature_flag, and payment.outcome.

Section 4: Semantic Conventions in Practice

Instrumentation nobody can query is expensive noise. Semantic conventions are OpenTelemetry's contract that names things the same way everywhere, so a dashboard against http.response.status_code works whether the data came from a Java agent, a Python monkey-patch, a Beyla eBPF probe, or your own manual code.

Common Stable Attributes

Domain	Attribute	Example Value	Use
HTTP	`http.request.method`	`GET`, `POST`	Method dim. on RED metrics
HTTP	`http.response.status_code`	`200`, `503`	Error rate, SLO burn
HTTP	`http.route`	`/orders/{id}`	Path grouping w/o ID explosion
HTTP	`server.address`	`api.acme.io`	Backend grouping
RPC	`rpc.system`	`grpc`	Filter by RPC family
DB	`db.system`	`postgresql`	Engine breakdown
DB	`db.operation`	`SELECT`	Latency by operation
Messaging	`messaging.system`	`kafka`	Broker breakdown
Messaging	`messaging.destination.name`	`orders.events`	Per-topic throughput

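Not every row above is Stable yet: the HTTP attributes are, while the RPC and messaging conventions are still evolving, and the database conventions have renamed db.operation to db.operation.name. Check the semantic-conventions registry before anchoring a dashboard on a key.

A query like "p95 HTTP latency by route and status, last 30 minutes" is just groupby(http.route, http.response.status_code) of histogram(http.server.request.duration): the same panel, across the fleet, across vendors.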
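The route-and-status query depends only on shared keys, which a small sketch can show. The samples below are hypothetical durations tagged with three different emitters; because they all use http.route and http.response.status_code, one groupby covers all of them. For simplicity this uses raw samples and the nearest-rank percentile, whereas a real backend would typically estimate p95 from histogram buckets.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple


def p95_by(samples: List[Dict[str, Any]],
           keys: Sequence[str]) -> Dict[Tuple[Any, ...], float]:
    """Group by the given attribute keys and take the nearest-rank p95 of 'duration'."""
    groups: Dict[Tuple[Any, ...], List[float]] = defaultdict(list)
    for s in samples:
        groups[tuple(s[k] for k in keys)].append(s["duration"])
    result: Dict[Tuple[Any, ...], float] = {}
    for group, values in groups.items():
        values.sort()
        rank = -(-95 * len(values) // 100)  # ceil(0.95 * n) in integer math
        result[group] = values[rank - 1]
    return result


# Hypothetical samples (seconds) from three emitters that share the same keys.
emitters = ["java-agent", "python-monkeypatch", "beyla-ebpf"]
samples = [
    {"emitter": emitters[i % 3], "http.route": "/orders/{id}",
     "http.response.status_code": 200, "duration": i / 100}
    for i in range(1, 21)
]
samples.append({"emitter": "beyla-ebpf", "http.route": "/orders/{id}",
                "http.response.status_code": 503, "duration": 1.5})

print(p95_by(samples, ["http.route", "http.response.status_code"]))
# {('/orders/{id}', 200): 0.19, ('/orders/{id}', 503): 1.5}
```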

Resource Attributes — Identity of the Emitter

Where attributes describe a single signal, resource attributes describe the emitter. Set them once at SDK init via OTEL_RESOURCE_ATTRIBUTES:

export OTEL_SERVICE_NAME=checkout-api
export OTEL_RESOURCE_ATTRIBUTES=\
service.namespace=payments,\
service.version=2.3.1,\
service.instance.id=checkout-api-7d4f-x9w2,\
deployment.environment=prod,\
k8s.namespace.name=production,\
k8s.deployment.name=checkout-api,\
k8s.pod.name=checkout-api-7d4f-x9w2,\
cloud.provider=aws,\
cloud.region=us-east-1

The OpenTelemetry Operator can fill many of these from the pod's downward API — you should rarely set K8s resource attributes by hand.
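How an SDK turns the environment into a resource can be sketched briefly. The function below is a simplified, hypothetical parser: it splits key=value pairs on commas, percent-decodes values (the variable's format requires percent-encoding for characters outside the allowed set), simply skips entries without an equals sign, and lets OTEL_SERVICE_NAME override any service.name found in OTEL_RESOURCE_ATTRIBUTES. The team.owner key is a made-up example.

```python
from typing import Dict, Mapping
from urllib.parse import unquote


def resource_from_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Build a flat resource-attribute dict from OTEL_* variables (simplified sketch)."""
    attrs: Dict[str, str] = {}
    for pair in env.get("OTEL_RESOURCE_ATTRIBUTES", "").split(","):
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            continue  # this sketch simply skips malformed or empty entries
        attrs[key.strip()] = unquote(value.strip())
    # OTEL_SERVICE_NAME takes precedence over service.name in OTEL_RESOURCE_ATTRIBUTES.
    if env.get("OTEL_SERVICE_NAME"):
        attrs["service.name"] = env["OTEL_SERVICE_NAME"]
    return attrs


env = {
    "OTEL_SERVICE_NAME": "checkout-api",
    "OTEL_RESOURCE_ATTRIBUTES": (
        "service.name=legacy-name,service.namespace=payments,"
        "deployment.environment=prod,team.owner=Payments%20Core"
    ),
}
print(resource_from_env(env))
```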

Operator Workflow (Figure 6.4)

Cardinality — The Silent Killer

Every unique combination of attribute values produces a distinct time series. Unbounded cardinality drives up cost, shortens affordable retention, slows queries, and can destabilize the metrics backend. Before attaching an attribute, ask "How many distinct values can this take?"

Candidate attribute	Cardinality	Use as metric label?
`http.request.method`	~10	Yes
`http.response.status_code`	~60	Yes
`http.route` (templated)	~hundreds	Yes
`service.version`	~tens	Yes
`tenant.id` (large SaaS)	~thousands+	Carefully — often spans only
`url.full` with raw path	~unbounded	No on metrics; redact on spans
`user.id`	~unbounded	Span attribute only
`trace.id` / `request.id`	per-request	Span only — never a metric label
`db.statement` raw	per-call	Span only; parameterize/redact
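The multiplicative effect is simple arithmetic. The sketch below takes rough figures from the table (assuming 200 routes for "hundreds" and 20 service versions for "tens") and computes the worst-case series count for one metric as the product of each label's distinct values. Real series counts are usually lower because not every combination occurs, but the bound shows how one unbounded label dominates everything else.

```python
from math import prod
from typing import Dict


def series_upper_bound(distinct_values: Dict[str, int]) -> int:
    """Worst-case series count for one metric: product of each label's distinct values."""
    return prod(distinct_values.values())


# Rough figures from the table; 200 routes and 20 versions are assumed values.
safe = {
    "http.request.method": 10,
    "http.response.status_code": 60,
    "http.route": 200,
    "service.version": 20,
}
risky = {**safe, "user.id": 1_000_000}  # assumed one million users

print(series_upper_bound(safe))   # 2400000
print(series_upper_bound(risky))  # 2400000000000
```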

Three mitigations:

Templates, not raw values. Use http.route=/orders/{id} on metrics; keep url.full for span-only debugging.
Drop or hash at the Collector. The attributes, transform, and redaction processors can drop, truncate, or one-way-hash values before export.
Separate metric and span schemas. A span can carry tenant.id and order.id; the derived metric should only carry tenant.tier and payment.method. Spans are sampled, but metrics record every event, so each new label value becomes a long-lived series.

These same mitigations double as PII hygiene: url.full, url.query, client.address, and db.statement may contain personal data — redact at the SDK, allow-list at the Collector, hash where you need cardinality without identity.
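Three of these mitigations fit in a few helper functions. The sketch below is illustrative and every name in it is hypothetical: the ID-detection regex (all-digit segments and UUIDs) is a heuristic, the framework's own route template should be preferred when available, and the HMAC key stands in for a secret you would manage separately. Note that hashing keeps distinct values distinct; it removes identity, not cardinality.

```python
import hashlib
import hmac
import re
from urllib.parse import urlsplit, urlunsplit

# Heuristic: treat all-digit segments and UUIDs as IDs.
_ID_SEGMENT = re.compile(
    r"^(?:\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$"
)


def template_route(path: str) -> str:
    """Collapse ID-like path segments to {id}."""
    return "/".join("{id}" if _ID_SEGMENT.match(seg) else seg for seg in path.split("/"))


def keyed_hash(value: str, key: bytes) -> str:
    """One-way keyed hash: equal inputs stay equal, identity is not exposed."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]


def redact_query(url: str) -> str:
    """Replace the query string (and drop the fragment), where personal data often hides."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       "REDACTED" if parts.query else "", ""))


print(template_route("/orders/42/items/7"))                          # /orders/{id}/items/{id}
print(redact_query("https://api.acme.io/search?q=alice%40example.com"))  # https://api.acme.io/search?REDACTED
```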

Stability and Evolution

Conventions evolve through Experimental, Stable, and Deprecated stages. The migration from http.method to http.request.method is a real example: during the transition both names exist and the new one is preferred. Instrumentations can emit the old names, the new names, or both (OTEL_SEMCONV_STABILITY_OPT_IN=http/dup), and Collector transform processors can normalize older signals so dashboards survive. Anchor dashboards on Stable attributes; treat Experimental ones as opt-in.
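The normalization a Collector transform rule performs during such a migration can be sketched in plain Python. This is an illustration of the logic only (in the Collector you would express it in the transform processor's own statement language), and the rename table covers just two well-known HTTP keys: old keys are renamed, and when an instrumentation already sent both names, the new key's value is kept.

```python
from typing import Any, Dict

# Old -> new HTTP attribute names (a small subset of the migration).
RENAMES = {
    "http.method": "http.request.method",
    "http.status_code": "http.response.status_code",
}


def normalize(attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Rename deprecated keys; if both old and new are present, keep the new value."""
    out = dict(attrs)
    for old, new in RENAMES.items():
        if old in out:
            value = out.pop(old)
            out.setdefault(new, value)
    return out


print(normalize({"http.method": "GET", "http.status_code": 200, "http.route": "/orders/{id}"}))
# {'http.route': '/orders/{id}', 'http.request.method': 'GET', 'http.response.status_code': 200}
```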

Key Points — Section 4

Semantic conventions are what make OpenTelemetry portable — the same attribute keys appear from Java agents, Python monkey-patches, eBPF probes, and manual code.
Resource attributes describe the emitter (service.name, service.version, k8s.pod.name); set once via OTEL_RESOURCE_ATTRIBUTES.
Cardinality is multiplicative: user.id on a metric label can produce billions of series.
Use templated routes, the Collector's processors, and a separate span/metric schema to keep cardinality and PII in check.
Anchor on Stable attributes; the Collector's transform processor can normalize older signals during convention migrations.

Part 2 Post-Quiz — Sections 3 & 4

Post-Reading Check — Part 2

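6. What does the OpenTelemetry Operator's mutating webhook do when a pod is annotated with instrumentation.opentelemetry.io/inject-java?

It rebuilds the application container image with the agent baked in.
It injects an init container that copies the Java agent to a shared volume and sets JAVA_TOOL_OPTIONS=-javaagent:... plus OTEL_* env vars.
It loads an eBPF program into the host kernel to trace the pod.
It restarts the kubelet so it picks up the agent.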

7. eBPF zero-code instrumentation is great at HTTP/gRPC coverage but loses in which scenario?

When the workload is written in Go.
When you need to attribute requests to tenant=acme-corp and order=ORD-9182.
When the workload runs on Linux.
When tracing a Python web service.

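8. Why are semantic conventions important?

They let the Collector compress signals more efficiently.
A dashboard written against http.response.status_code works regardless of whether the signal came from a Java agent, a Python monkey-patch, or an eBPF probe.
They are required to use the OTLP wire protocol.
They reduce span size on the wire by half.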

9. Which attribute is safe to use as a metric label dimension on an HTTP latency histogram?

user.id (one per customer)
url.full including query string
http.route (templated, e.g. /orders/{id})
trace.id

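10. Where do attributes like service.name, service.version, and k8s.pod.name belong?

On every span as ordinary attributes: repeat them per call.
In the OTLP Resource, set once at SDK startup; they describe the emitter of every signal.
Only in trace exporters: metrics do not carry resource identity.
In the Collector config: SDKs cannot set them.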