Auto vs Manual Instrumentation
You need to roll out tracing across 40 services owned by six different teams. Do you go with auto-instrumentation or manual instrumentation, and how do you decide?
Start with auto-instrumentation for breadth, then add manual spans where the business logic lives. Auto-instrumentation hooks into well-known libraries: HTTP servers and clients, database drivers, gRPC, and queue clients. In Java you attach the agent jar with the -javaagent JVM flag and get traces with zero code changes; in Python you wrap the process with opentelemetry-instrument. That gets all 40 services emitting spans in days, not quarters, and you immediately see the service map and cross-service latency.
What auto-instrumentation cannot give you is domain context. It shows 'POST /api/checkout took 2s' but not which step of checkout was slow or which customer tier was affected. So the second phase is manual: teams add spans around meaningful operations and attach attributes such as order.value or tenant.id to the spans that auto-instrumentation already created.
Language matters too. Java and Python have mature runtime agents. Go has no agent in the traditional sense, because it compiles to a native binary with nothing comparable to the JVM's agent hook or Python's monkey-patching, so you either instrument in code or evaluate eBPF-based approaches. Either way, your Go services need more team buy-in up front.
The practical rollout: the platform team ships a base image or init container with the agent preconfigured to export to a standard endpoint, and each team touches code only when it wants richer spans. Forcing six teams to hand-instrument everything before they see any value is how tracing rollouts die.
This question tests judgment, not API knowledge. The trap is treating it as either-or: strong candidates immediately say both, in sequence, and explain why breadth-first wins for adoption. You also learn whether they have dealt with organizational reality: getting six teams to do anything is harder than any technical step, so answers that include a platform-level rollout path (base images, agents injected at deploy time, sane defaults) signal real experience. Bonus points for knowing the per-language differences, especially that Go cannot just attach an agent.
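The platform-level rollout path and the per-language split can be sketched as a small helper that decides, per service, how instrumentation is applied at deploy time. This is a hypothetical illustration: `plan_for`, `launch_command`, `InstrumentationPlan`, and the agent path `/otel/opentelemetry-javaagent.jar` are invented names, not part of any tool. The environment variables are real, though: `OTEL_SERVICE_NAME` and `OTEL_EXPORTER_OTLP_ENDPOINT` are standard OpenTelemetry SDK settings, and the JVM reads `JAVA_TOOL_OPTIONS` at startup.

```python
from dataclasses import dataclass, field

# Hypothetical location where a base image or init container places the Java agent.
JAVA_AGENT_PATH = "/otel/opentelemetry-javaagent.jar"


@dataclass
class InstrumentationPlan:
    env: dict[str, str] = field(default_factory=dict)
    command_prefix: list[str] = field(default_factory=list)
    needs_code_changes: bool = False


def plan_for(service: str, language: str, endpoint: str) -> InstrumentationPlan:
    """Platform defaults: every service exports to the same endpoint."""
    # Standard OpenTelemetry SDK environment variables, shared across languages.
    env = {
        "OTEL_SERVICE_NAME": service,
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
    }
    lang = language.lower()
    if lang == "java":
        # The JVM picks up JAVA_TOOL_OPTIONS, so the agent attaches without
        # editing the service's start script. A real rollout would append to
        # any existing value rather than overwrite it.
        env["JAVA_TOOL_OPTIONS"] = f"-javaagent:{JAVA_AGENT_PATH}"
        return InstrumentationPlan(env=env)
    if lang == "python":
        # Wrap the process: opentelemetry-instrument python app.py
        return InstrumentationPlan(env=env, command_prefix=["opentelemetry-instrument"])
    # Go and anything else without a runtime agent: export settings still apply,
    # but spans appear only after in-code instrumentation or eBPF tooling.
    return InstrumentationPlan(env=env, needs_code_changes=True)


def launch_command(plan: InstrumentationPlan, command: list[str]) -> list[str]:
    """Prefix the service's own command with whatever wrapper the plan needs."""
    return plan.command_prefix + command


if __name__ == "__main__":
    plan = plan_for("checkout", "python", "http://otel-collector:4317")
    print(launch_command(plan, ["python", "app.py"]))
    # ['opentelemetry-instrument', 'python', 'app.py']
```

A real platform would express this in a base image, a Helm chart, or an operator rather than a Python function; the point is that the defaults live in one place, and only the Go branch asks a team to do anything.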
Common mistakes
- Picking one approach for everything: all-manual stalls the rollout for months, while all-auto leaves you with traces that carry no business meaning
- Assuming every language works like Java, then discovering the Go services emit nothing because nobody planned for in-code instrumentation
- Ignoring semantic conventions, so every team invents its own attribute names and cross-service queries become impossible
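One cheap guard against the last pitfall is a CI check that compares attribute keys against a shared registry. The sketch below is hypothetical: `APPROVED_KEYS` and `check_attribute_keys` are invented names, and the registry holds only the example keys used in this answer. It folds away case and separators so that variants like `tenantId` or `tenant_id` are reported with the approved `tenant.id` as a suggestion.

```python
import re
from typing import Optional

# Hypothetical shared registry of approved business attribute keys.
# In practice this would live in a schema file or package every team consumes.
APPROVED_KEYS = frozenset({"tenant.id", "order.value", "customer.tier"})


def _fold(key: str) -> str:
    """Lowercase and drop separators so tenantId, tenant_id and tenant.id compare equal."""
    return re.sub(r"[._-]", "", key).lower()


# Map each folded form back to its approved spelling.
_CANONICAL = {_fold(k): k for k in APPROVED_KEYS}


def check_attribute_keys(keys: list[str]) -> list[tuple[str, Optional[str]]]:
    """Return (key, suggestion) for every key not in the registry.

    suggestion is the approved spelling when the key differs only in case or
    separators; None means an unregistered key that needs human review.
    """
    problems = []
    for key in keys:
        if key not in APPROVED_KEYS:
            problems.append((key, _CANONICAL.get(_fold(key))))
    return problems


if __name__ == "__main__":
    for key, suggestion in check_attribute_keys(["tenant.id", "tenantId", "order_value", "cart.size"]):
        hint = f"use {suggestion}" if suggestion else "not in registry"
        print(f"{key}: {hint}")
```

Folding instead of rewriting keys matters because the official OpenTelemetry semantic conventions mix dots and underscores (for example `http.response.status_code`), so a blanket "replace underscores with dots" rule would produce wrong names.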
Follow-up questions
- How would you handle the Go services in that fleet, where attaching a runtime agent is not an option?
- Auto-instrumentation just doubled your span volume and your tracing bill. What do you do?
- How do you enforce consistent attribute naming, like tenant.id versus tenantId, across six teams?