Traces and Spans Explained

junior
beginner
Observability
Question

A request hits your API gateway, which calls two backend services, and one of those queries a database. Walk me through what that looks like as a distributed trace. What is a span, and how do spans connect to each other?

Answer

The whole request is one trace, identified by a single trace ID that follows the request everywhere it goes. Each unit of work along the way is a span: one span for the gateway handling the request, one for each backend call, and one for the database query. Every span records its own span ID, a name, start and end timestamps, a status, and key-value attributes such as http.method or db.statement. Spans link together through parent-child relationships: the gateway span is the root, the two backend spans are its children, and the database span is a child of the span for the backend that made the query. A child knows its parent because the parent's span ID travels with the request alongside the trace ID, and the child stores it as its parent span ID. When a tracing backend (the tool that stores and displays traces) renders this, you get a waterfall view showing where the 800ms went: maybe 50ms in the gateway, 100ms in service A, and 650ms stuck in that database query. That is the core value: instead of grepping logs across four services and guessing, you see the full request path and its timing in one place.
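To make the parent-child structure concrete, here is a minimal sketch in plain Python with no tracing library. The `Span` class, the short span IDs, and the timings are illustrative assumptions that loosely follow the 800ms scenario; the self-time calculation also assumes child spans run one after another rather than in parallel.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    trace_id: str                  # shared by every span in the trace
    span_id: str                   # unique per span
    parent_span_id: Optional[str]  # None marks the root span
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


TRACE_ID = "4bf92f3577b34da6a3ce929d0e0e4736"  # made-up example value

# Illustrative spans: gateway -> service-a, gateway -> service-b -> db query.
spans = [
    Span(TRACE_ID, "a1", None, "gateway", 0, 800),
    Span(TRACE_ID, "b1", "a1", "service-a", 20, 120),
    Span(TRACE_ID, "c1", "a1", "service-b", 120, 780),
    Span(TRACE_ID, "d1", "c1", "db query", 125, 775),
]


def children_of(all_spans: List[Span], parent_id: Optional[str]) -> List[Span]:
    """Return the direct children of a span, ordered by start time."""
    kids = [s for s in all_spans if s.parent_span_id == parent_id]
    return sorted(kids, key=lambda s: s.start_ms)


def self_time_ms(all_spans: List[Span], span: Span) -> int:
    """Time spent in the span itself, excluding its (sequential) children."""
    child_total = sum(c.duration_ms for c in children_of(all_spans, span.span_id))
    return span.duration_ms - child_total


def render_waterfall(all_spans: List[Span]) -> List[str]:
    """Walk the tree from the root and indent each span by its depth."""
    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for s in children_of(all_spans, parent_id):
            indent = "  " * depth
            lines.append(f"{indent}{s.name}: {s.start_ms}-{s.end_ms}ms ({s.duration_ms}ms)")
            walk(s.span_id, depth + 1)

    walk(None, 0)
    return lines


for line in render_waterfall(spans):
    print(line)

slowest = max(spans, key=lambda s: self_time_ms(spans, s))
print("most self time:", slowest.name)
```

Note that the tree is rebuilt purely from `parent_span_id`. The trace ID only says which spans belong together, not how they nest.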

Why This Matters

This is the entry-level question for any observability conversation. You are checking whether the candidate has the mental model right: trace as the whole journey, span as one hop, parent-child links forming a tree. Strong candidates explain it through a concrete scenario and mention what spans actually contain (attributes, status, timestamps). Weak candidates recite 'tracing tracks requests' without being able to describe how two spans end up connected. If they bring up the waterfall view and what it tells you about latency, that is a good sign they have used tracing, not just read about it.

Code Examples

Creating a parent and child span manually

python

Simplified span as exported via OTLP

json
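As a rough illustration, the Python sketch below builds and prints a single span in the shape used by the OTLP JSON encoding. All IDs, timestamps, and attribute values are made up, and the structure is simplified: a real OTLP payload nests spans under `resourceSpans` and `scopeSpans`, which are omitted here.

```python
import json

START_NS = 1_700_000_000_000_000_000  # made-up start time, nanoseconds since epoch
DURATION_NS = 650 * 1_000_000         # the 650ms database query from the scenario


def string_attr(key: str, value: str) -> dict:
    """OTLP encodes attributes as a list of key / typed-value pairs."""
    return {"key": key, "value": {"stringValue": value}}


span = {
    "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",  # 16 bytes, hex-encoded
    "spanId": "00f067aa0ba902b7",                   # 8 bytes, hex-encoded
    "parentSpanId": "b7ad6b7169203331",             # span ID of the calling service's span
    "name": "SELECT orders",
    "kind": 3,                                      # SPAN_KIND_CLIENT
    "startTimeUnixNano": str(START_NS),             # 64-bit integers are sent as strings
    "endTimeUnixNano": str(START_NS + DURATION_NS),
    "attributes": [
        string_attr("db.system", "postgresql"),
        string_attr("db.statement", "SELECT * FROM orders WHERE user_id = ?"),
    ],
    "status": {"code": 1},                          # STATUS_CODE_OK
}

span_json = json.dumps(span, indent=2)
print(span_json)
```

The fields map directly onto the answer: `traceId` ties the span to the trace, `parentSpanId` places it in the tree, the two timestamps give the waterfall bar, and `attributes` carry the details.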
Common Mistakes
  • Confusing traces with logs, describing tracing as 'centralized logging with request IDs' and missing the timing and hierarchy aspects
  • Saying spans connect through the trace ID alone, when the parent-child link comes from the parent span ID
  • Not being able to name a single thing a span contains beyond 'the request'
Follow-up Questions
Interviewers often ask these as follow-up questions
  • How does the trace ID get from service A to service B in the first place?
  • What is the difference between a span attribute and a span event, and when would you use each?
  • If a service publishes to a queue and a consumer picks it up an hour later, should that be one trace or two?
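For the first follow-up, one common answer is the W3C Trace Context `traceparent` HTTP header, which carries the trace ID and the caller's span ID on each outgoing request. The sketch below is a hand-rolled illustration of that header's version `00` format; the function and class names are hypothetical, and in practice a tracing library's propagator does this for you.

```python
import re
from typing import NamedTuple


class TraceParent(NamedTuple):
    trace_id: str        # 32 lowercase hex characters
    parent_span_id: str  # 16 lowercase hex characters: the caller's span ID
    sampled: bool        # lowest bit of the trace-flags byte


# version "00" - trace-id - parent-id - trace-flags
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def build_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Format the header value a caller attaches to an outgoing request."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str) -> TraceParent:
    """Extract the trace ID and parent span ID on the receiving side."""
    match = _TRACEPARENT.match(header)
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are defined as invalid by the specification.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        raise ValueError("traceparent IDs must not be all zeros")
    return TraceParent(trace_id, parent_id, bool(int(flags, 16) & 0x01))


# Service A (span 00f067aa0ba902b7) calls service B with this header.
header = build_traceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
incoming = parse_traceparent(header)

# Service B reuses the trace ID and records A's span ID as its parent span ID.
print(header)
print(incoming)
```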
Tags
opentelemetry
distributed-tracing
observability
spans