Traces and Spans Explained
A request hits your API gateway, which calls two backend services, and one of those queries a database. Walk me through what that looks like as a distributed trace. What is a span, and how do spans connect to each other?
The whole request is one trace, identified by a single trace ID that follows the request everywhere it goes. Each unit of work along the way is a span: one span for the gateway handling the request, one for each backend call, and one for the database query. Every span carries the trace ID plus its own span ID, and records a name, start and end timestamps, a status, and key-value attributes like http.method or db.statement. Spans link together through parent-child relationships: the gateway span is the root, the two backend spans are its children, and the database span is a child of the backend span that made the query. The child knows its parent because the parent's span ID travels with the request and is stored on the child as its parent span ID. When a tracing backend visualizes this, you get a waterfall view showing exactly where the 800ms went: maybe 50ms in the gateway, 100ms in service A, and 650ms stuck in that database query. That is the core value: instead of grepping logs across four services and guessing, you see the full request path and its timing in one place.
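To make the parent-child linking concrete, here is a minimal sketch of how a flat list of spans could be turned back into the waterfall described above. It uses only the Python standard library; the `Span` class, the `render_waterfall` helper, and every ID, name, and timing are hypothetical values chosen to match the 800ms example, not output from a real tracing backend.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_span_id: Optional[str]  # None marks the root span
    name: str
    start_ms: int
    end_ms: int


# Illustrative data only: IDs, names, and timings are invented.
SPANS = [
    Span("a1", None, "gateway: GET /orders", 0, 800),
    Span("b1", "a1", "service-a: GET /users", 25, 125),
    Span("c1", "a1", "service-b: GET /inventory", 125, 775),
    Span("d1", "c1", "db: SELECT inventory", 125, 775),
]


def render_waterfall(spans):
    """Return one indented line per span, walking the tree from the root down."""
    # Group spans by the ID of their parent; the tree is implied by these links.
    children = {}
    for span in spans:
        children.setdefault(span.parent_span_id, []).append(span)

    lines = []

    def walk(span, depth):
        duration = span.end_ms - span.start_ms
        lines.append(f"{'  ' * depth}{span.name} [{span.start_ms}-{span.end_ms} ms] {duration} ms")
        for child in sorted(children.get(span.span_id, []), key=lambda s: s.start_ms):
            walk(child, depth + 1)

    for root in children.get(None, []):
        walk(root, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(render_waterfall(SPANS)))
```

Note that the trace ID plays no part in building the tree here: it only says which spans belong together, while the parent span ID decides where each one hangs.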
This is the entry-level question for any observability conversation. You are checking whether the candidate has the mental model right: trace as the whole journey, span as one hop, parent-child links forming a tree. Strong candidates explain it through a concrete scenario and mention what spans actually contain (attributes, status, timestamps). Weak candidates recite 'tracing tracks requests' without being able to describe how two spans end up connected. If they bring up the waterfall view and what it tells you about latency, that is a good sign they have used tracing, not just read about it.
Creating a parent and child span manually
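A minimal sketch of what manual span creation does under the hood, written against the Python standard library only rather than a real tracing SDK. The `Span` class, the `start_span` helper, and the `FINISHED` list are hypothetical names for illustration; a production tracer does considerably more (sampling, exporting, cross-process propagation).

```python
import contextvars
import secrets
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional

# Holds the span that is "current" for the running task or thread.
_current_span = contextvars.ContextVar("current_span", default=None)

FINISHED = []  # stand-in for an exporter: finished spans are appended here


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    start_ns: int = 0
    end_ns: int = 0
    attributes: dict = field(default_factory=dict)


@contextmanager
def start_span(name):
    parent = _current_span.get()
    span = Span(
        name=name,
        # A child inherits the trace ID; a root span mints a new 16-byte one.
        trace_id=parent.trace_id if parent else secrets.token_hex(16),
        span_id=secrets.token_hex(8),
        # This is the link that forms the tree.
        parent_span_id=parent.span_id if parent else None,
        start_ns=time.time_ns(),
    )
    token = _current_span.set(span)
    try:
        yield span
    finally:
        span.end_ns = time.time_ns()
        _current_span.reset(token)  # the parent becomes current again
        FINISHED.append(span)


with start_span("handle-request") as parent:
    with start_span("query-database") as child:
        child.attributes["db.statement"] = "SELECT 1"  # example attribute

# prints: True True
print(child.trace_id == parent.trace_id, child.parent_span_id == parent.span_id)
```

With the OpenTelemetry Python API the same shape comes from nesting two `tracer.start_as_current_span(...)` blocks: the inner span picks up the outer one as its parent from the current context.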
Simplified span as exported via OTLP
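As a rough illustration, the dictionary below mirrors the shape of a single span in OTLP's JSON encoding, printed with Python's `json` module. It is hand-written and heavily simplified: the resource and scope wrappers are left out, and every value (IDs, timestamps, the SQL statement) is made up for the example.

```python
import json

# Simplified, hand-written span in the shape of OTLP/JSON.
# All values are invented; resource and scope wrappers are omitted.
span = {
    "traceId": "5b8efff798038103d269b633813fc60c",  # 16 bytes as hex, shared by every span in the trace
    "spanId": "eee19b7ec3c1b174",                   # 8 bytes as hex, unique to this span
    "parentSpanId": "eee19b7ec3c1b173",             # the caller's spanId; empty for the root span
    "name": "SELECT inventory",
    "kind": 3,                                      # SPAN_KIND_CLIENT
    "startTimeUnixNano": "1700000000125000000",     # 64-bit integers are encoded as strings
    "endTimeUnixNano": "1700000000775000000",
    "attributes": [
        {"key": "db.statement", "value": {"stringValue": "SELECT * FROM inventory WHERE sku = ?"}},
    ],
    "status": {"code": 1},                          # STATUS_CODE_OK
}

duration_ms = (int(span["endTimeUnixNano"]) - int(span["startTimeUnixNano"])) / 1_000_000

print(json.dumps(span, indent=2))
print(duration_ms)  # 650.0
```

Everything the waterfall view needs is in this one record: the two timestamps give the bar's position and length, and `parentSpanId` says which bar it nests under.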
- Confusing traces with logs: describing tracing as 'centralized logging with request IDs' and missing the timing and hierarchy aspects
- Saying spans connect through the trace ID alone, when the parent-child link comes from the parent span ID
- Not being able to name a single thing a span contains beyond 'the request'
- How does the trace ID get from service A to service B in the first place?
- What is the difference between a span attribute and a span event, and when would you use each?
- If a service publishes to a queue and a consumer picks it up an hour later, should that be one trace or two?