Question

Your platform handles 50,000 requests per second, and tracing every one of them is blowing up the observability bill. How do you approach sampling, and what is the tradeoff between head and tail sampling?

Accepted Answer

First decide what questions the traces need to answer, because that drives the strategy.

Head sampling decides at the root of the trace, before the request has done any work: a parent-based ratio sampler keeps, say, 1% of traces, and the decision rides along in the sampled flag of the traceparent header so every downstream service agrees. It is cheap and simple, but it is blind: the slow request and the 500 error are dropped with the same probability as the boring 200.

Tail sampling flips that. Every span is exported, buffered in the Collector gateway, and the keep-or-drop decision happens after the trace completes (in practice, after a configured wait window), with full knowledge: keep 100% of traces with errors, 100% of traces slower than two seconds, and 1% of everything else as a healthy baseline. That is exactly what you want for debugging, but you pay for it: the gateway buffers every in-flight trace in memory, and all spans of a given trace must land on the same Collector instance, which means a load-balancing tier that routes by trace ID in front of your sampling tier. At 50k rps the buffer grows with span throughput multiplied by how long you wait for a trace to complete, and once the sampling tier runs more than one instance the loadbalancing exporter is not optional.

In practice I combine them: light head sampling or rate limiting at the SDK to take the edge off volume, tail sampling at the gateway for the keep-the-interesting-ones logic. The cost of combining them is that the gateway can only keep what the SDK let through, so an error inside a head-dropped trace is gone for good.

Two things people forget. First, any metrics derived from spans, like RED dashboards from the spanmetrics connector, must be generated before sampling: tail sampling deliberately over-represents slow and failed requests, so latency percentiles computed afterwards are fiction. Second, write down that trace-based alerting now sees sampled data, so the absence of a trace no longer proves the absence of a problem.
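
The tail-sampling policy described above (keep errors, keep anything slower than two seconds, keep 1% of the rest) can be sketched in a few lines of Python. This is an illustration of the decision logic only, not the Collector's implementation: the `Span` type, the `keep_trace` function and its parameter names are hypothetical, and it assumes all spans of the trace have already been buffered in one place.

```python
import random
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical minimal span record; real spans carry far more."""
    trace_id: str
    start_s: float          # start time, seconds
    end_s: float            # end time, seconds
    is_error: bool = False


def keep_trace(spans, slow_threshold_s=2.0, baseline_ratio=0.01,
               rng=random.random):
    """Return the reason a completed trace is kept, or None to drop it.

    Policies are checked in order: errors, then latency, then a
    probabilistic baseline for otherwise unremarkable traces.
    """
    if any(s.is_error for s in spans):
        return "error"
    # Trace duration: earliest span start to latest span end.
    duration = max(s.end_s for s in spans) - min(s.start_s for s in spans)
    if duration > slow_threshold_s:
        return "slow"
    if rng() < baseline_ratio:
        return "baseline"
    return None
```

Because the function needs every span of the trace before it can answer, it makes the memory and routing cost concrete: nothing can be dropped until the whole trace has arrived on the instance that runs the policy.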

Sampling Strategies at Scale

Code examples
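
Both the head-sampling decision and the routing tier in front of a tail sampler depend on the same idea: derive the answer deterministically from the trace ID so that every participant agrees. The sketch below shows that under stated assumptions. The function names are hypothetical, the ratio check compares the low 64 bits of the trace ID against a threshold (one common approach, not necessarily what your SDK does), and `pick_collector` uses plain modulo hashing as a simplification where a production load balancer would typically use a consistent-hash ring.

```python
import hashlib

SAMPLED_FLAG = 0x01  # W3C Trace Context: lowest bit of trace-flags


def head_decision(trace_id_hex, ratio):
    """Ratio decision from the trace ID alone, so it is repeatable."""
    low64 = int(trace_id_hex, 16) & ((1 << 64) - 1)
    return low64 < round(ratio * (1 << 64))


def make_traceparent(trace_id_hex, span_id_hex, sampled):
    """Encode the root's decision into the traceparent header value."""
    flags = SAMPLED_FLAG if sampled else 0
    return f"00-{trace_id_hex}-{span_id_hex}-{flags:02x}"


def parent_says_sampled(traceparent):
    """Downstream services honour the parent's flag instead of re-deciding."""
    _version, _trace_id, _span_id, flags = traceparent.split("-")
    return bool(int(flags, 16) & SAMPLED_FLAG)


def pick_collector(trace_id_hex, backends):
    """Send every span of one trace to the same sampling instance.

    Simplified: modulo hashing reshuffles most traces when the backend
    list changes, which a consistent-hash ring would avoid.
    """
    digest = hashlib.sha256(bytes.fromhex(trace_id_hex)).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]
```

A root service would call `head_decision` once and propagate the result with `make_traceparent`; every other service only reads the flag. The routing function is what makes tail sampling possible with more than one gateway instance, since a policy can only judge a trace it sees in full.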
