Traces and Spans Explained

A request hits your API gateway, which calls two backend services, and one of those queries a database. Walk me through what that looks like as a distributed trace. What is a span, and how do spans connect to each other?

Junior questions

Observability · beginner

// interview question

Sample answer

The whole request is one trace, identified by a single trace ID that follows the request everywhere it goes. Each unit of work along the way is a span: one span for the gateway handling the request, one for each backend call, and one for the database query. Every span records a name, start and end timestamps, a status, and key-value attributes such as http.method or db.statement. Spans link together through parent-child relationships: the gateway span is the root, the two backend spans are its children, and the database span is a child of whichever backend made the query. A child knows its parent because the parent's span ID travels with the request, in the active context within a process and in request headers between services. When a tracing backend such as Jaeger renders this, you get a waterfall view showing exactly where the 800ms went: maybe 50ms in the gateway itself, 100ms in service A, and 650ms stuck in service B's database query. That is the core value: instead of grepping logs across four services and guessing, you see the full request path and its timing in one place.
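The parent-child wiring described above can be sketched with plain data. This is a minimal, stdlib-only model, not how any tracing backend is implemented: it assumes each span carries only an ID, an optional parent ID, a name, and start/end offsets in milliseconds, and the span names and timings are made up to roughly mirror the 800ms example (service B issues the query). It rebuilds the tree from parent span IDs alone and computes each span's self time, its duration minus its direct children's, which is what the waterfall lets you spot by eye.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


# Hypothetical spans from one trace (the shared trace ID is omitted for brevity).
SPANS = [
    Span("gw", None, "gateway GET /checkout", 0, 800),
    Span("a", "gw", "service-a call", 20, 120),
    Span("b", "gw", "service-b call", 120, 780),
    Span("db", "b", "SELECT orders", 125, 775),
]


def children_by_parent(spans):
    """Group spans under their parent span ID; these links form the tree."""
    tree = {}
    for span in spans:
        tree.setdefault(span.parent_id, []).append(span)
    return tree


def self_time_ms(span, tree):
    """Time spent in the span itself rather than in its direct children.

    Assumes children do not overlap each other (sequential calls).
    """
    return span.duration_ms - sum(c.duration_ms for c in tree.get(span.span_id, []))


def print_waterfall(spans):
    tree = children_by_parent(spans)

    def walk(span, depth):
        print(f"{'  ' * depth}{span.name}: starts +{span.start_ms}ms, "
              f"total {span.duration_ms}ms, self {self_time_ms(span, tree)}ms")
        for child in sorted(tree.get(span.span_id, []), key=lambda s: s.start_ms):
            walk(child, depth + 1)

    for root in tree.get(None, []):
        walk(root, 0)


print_waterfall(SPANS)
```

The self times (40, 100, 10 and 650ms here) add back up to the root's 800ms, so the largest self-time bar is the first place to look. With concurrent children this simple subtraction undercounts, which is one reason real backends draw the timeline instead of just summing.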

Why this matters

This is the entry-level question for any observability conversation. You are checking whether the candidate has the mental model right: trace as the whole journey, span as one hop, parent-child links forming a tree. Strong candidates explain it through a concrete scenario and mention what spans actually contain (attributes, status, timestamps). Weak candidates recite 'tracing tracks requests' without being able to describe how two spans end up connected. If they bring up the waterfall view and what it tells you about latency, that is a good sign they have used tracing, not just read about it.

Code examples

Creating a parent and child span manually

python

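from opentelemetry import trace

# Until an SDK TracerProvider (opentelemetry-sdk) is configured, spans from
# this tracer are no-ops: the code runs, but nothing is recorded or exported.
tracer = trace.get_tracer("checkout-service")

def process_order(order_id, items, charge):
    # charge is your payment call; assumed to return an object with .ok and .error
    with tracer.start_as_current_span("process_order") as parent:
        parent.set_attribute("order.id", order_id)
        parent.set_attribute("order.items", len(items))

        # Started while process_order is the active span, so it becomes its child
        with tracer.start_as_current_span("charge_card") as child:
            child.set_attribute("payment.provider", "stripe")
            result = charge(order_id)
            if not result.ok:
                child.set_status(trace.StatusCode.ERROR, result.error)
        return result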

Simplified span as exported via OTLP (real OTLP/JSON encodes attributes as a list of key/value objects and enum fields as integers)

json

{
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
  "spanId": "00f067aa0ba902b7",
  "parentSpanId": "b7ad6b7169203331",
  "name": "SELECT orders",
  "kind": "SPAN_KIND_CLIENT",
  "startTimeUnixNano": "1717500000000000000",
  "endTimeUnixNano": "1717500000650000000",
  "attributes": {
    "db.system": "postgresql",
    "db.statement": "SELECT * FROM orders WHERE id = $1"
  },
  "status": { "code": "STATUS_CODE_OK" }
}
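Within one process, as with the database span above, the parent span ID comes from the active context. Across services it has to travel with the request. OpenTelemetry's default propagator uses the W3C Trace Context traceparent header for this, formatted as version-traceId-parentSpanId-flags. The sketch below hand-rolls that round trip with the standard library purely to show the mechanics; PropagatedContext, to_headers, from_headers and start_child are hypothetical names, not OpenTelemetry's API, and in real code the SDK's propagators do this for you.

```python
import re
import secrets
from typing import NamedTuple, Optional

# W3C traceparent, version 00: version-traceid-parentid-flags, lowercase hex.
# This sketch rejects other versions; real propagators handle future versions.
TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class PropagatedContext(NamedTuple):  # hypothetical, not OpenTelemetry's SpanContext
    trace_id: str
    span_id: str
    sampled: bool


def to_headers(ctx: PropagatedContext) -> dict:
    """Caller side: put the current span's IDs on the outgoing request."""
    flags = "01" if ctx.sampled else "00"
    return {"traceparent": f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"}


def from_headers(headers: dict) -> Optional[PropagatedContext]:
    """Callee side: recover the caller's context, or None if missing/invalid."""
    match = TRACEPARENT_RE.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_span_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_span_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return PropagatedContext(trace_id, parent_span_id, bool(int(flags, 16) & 0x01))


def start_child(parent: PropagatedContext) -> dict:
    """Same trace ID, fresh span ID, caller's span ID recorded as the parent."""
    return {
        "traceId": parent.trace_id,
        "spanId": secrets.token_hex(8),  # 8 random bytes = 16 hex chars
        "parentSpanId": parent.span_id,
    }


# Hypothetical caller span making an HTTP call to another service.
caller = PropagatedContext("4bf92f3577b34da6a3ce929d0e0e4736", "b7ad6b7169203331", True)
headers = to_headers(caller)
print(headers["traceparent"])
print(start_child(from_headers(headers)))
```

The trace ID is copied unchanged while the span ID is freshly generated, which is exactly why the trace ID alone cannot tell you which span called which: the parentSpanId carries that link.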

Common mistakes to avoid

Confusing traces with logs, describing tracing as 'centralized logging with request IDs' and missing the timing and hierarchy aspects
Saying spans connect through the trace ID alone, when the parent-child link comes from the parent span ID
Not being able to name a single thing a span contains beyond 'the request'

Likely follow-ups

How does the trace ID get from service A to service B in the first place?
What is the difference between a span attribute and a span event, and when would you use each?
If a service publishes to a queue and a consumer picks it up an hour later, should that be one trace or two?



#opentelemetry #distributed-tracing #observability #spans
