Auto vs Manual Instrumentation

mid
intermediate
Observability
Question

You need to roll out tracing across 40 services owned by six different teams. Do you go with auto-instrumentation or manual instrumentation, and how do you decide?

Answer

Start with auto-instrumentation for breadth, then add manual spans where the business logic lives. Auto-instrumentation hooks into known libraries: HTTP servers and clients, database drivers, gRPC, queue clients. In Java you attach the agent jar with a JVM flag and get traces with zero code changes; in Python you wrap the process with opentelemetry-instrument. That gets all 40 services emitting spans in days, not quarters, and you immediately see the service map and cross-service latency.

What auto-instrumentation cannot give you is domain context. It shows 'POST /api/checkout took 2s' but not which step of checkout was slow or which customer tier was affected. So the second phase is manual: teams add spans around meaningful operations and attach attributes like order.value or tenant.id to the spans that auto-instrumentation already created.

Language matters too. Java and Python have strong runtime agents. Go has no agent in the traditional sense, so you instrument in code or look at eBPF-based approaches, which means your Go services need more team buy-in up front.

The practical rollout: the platform team ships a base image or init container with the agent preconfigured, exporting to a standard endpoint, and each team only touches code when they want richer spans. Forcing six teams to hand-instrument everything before seeing any value is how tracing rollouts die.
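One way to make the breadth-first phase concrete is a small planning script that a platform team might run over its service inventory. The sketch below is illustrative only: the Service record, the agent jar path, and the collector endpoint are assumptions, not part of the original answer. The environment variable names (OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT, JAVA_TOOL_OPTIONS) and the opentelemetry-instrument wrapper are the standard ones.

```python
from dataclasses import dataclass

# Assumed values for illustration; neither comes from the original answer.
AGENT_JAR = "/otel/opentelemetry-javaagent.jar"  # hypothetical path in the base image
OTLP_ENDPOINT = "http://otel-collector:4317"     # hypothetical collector address


@dataclass(frozen=True)
class Service:
    name: str
    language: str  # e.g. "java", "python", "go"
    team: str


def plan(service: Service) -> dict:
    """Return a phase-one instrumentation plan for a single service."""
    # Standard OpenTelemetry environment variables, shared by every runtime.
    env = {
        "OTEL_SERVICE_NAME": service.name,
        "OTEL_EXPORTER_OTLP_ENDPOINT": OTLP_ENDPOINT,
    }
    if service.language == "java":
        # The JVM reads JAVA_TOOL_OPTIONS at startup, so the agent attaches
        # without editing the service's own launch command.
        env["JAVA_TOOL_OPTIONS"] = f"-javaagent:{AGENT_JAR}"
        return {"mode": "agent", "env": env,
                "command_prefix": [], "code_changes": False}
    if service.language == "python":
        # opentelemetry-instrument wraps the original start command.
        return {"mode": "wrapper", "env": env,
                "command_prefix": ["opentelemetry-instrument"],
                "code_changes": False}
    # Go (and anything without a runtime agent) needs in-code work
    # or an eBPF-based approach, so the owning team must be involved.
    return {"mode": "in-code", "env": env,
            "command_prefix": [], "code_changes": True}


def teams_needing_code_changes(fleet: list) -> dict:
    """Group the services that cannot be covered by an agent, per team."""
    result = {}
    for svc in fleet:
        if plan(svc)["code_changes"]:
            result.setdefault(svc.team, []).append(svc.name)
    return result
```

Running teams_needing_code_changes over the full inventory gives the list of teams that need a conversation before rollout, which is usually the Go owners.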

Why This Matters

This question tests judgment, not API knowledge. The trap is treating it as either-or: strong candidates immediately say both, in sequence, and explain why breadth-first wins for adoption. You also learn whether they have dealt with organizational reality: getting six teams to do anything is harder than any technical step, so answers that include a platform-level rollout path (base images, agents injected at deploy time, sane defaults) signal real experience. Bonus points for knowing the per-language differences, especially that Go cannot just attach an agent.

Code Examples

Zero-code instrumentation for Java and Python services

bash

Kubernetes Operator injecting auto-instrumentation via annotation

yaml

Enriching auto-created spans with business context

python
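A minimal sketch of the enrichment step, assuming the opentelemetry-api package is installed and that a framework's auto-instrumentation has already opened a span for the request. The order dictionary, its keys, the customer.tier attribute name, and the checkout.charge_card span name are hypothetical; order.value and tenant.id are the attribute names used in the answer above.

```python
from typing import Any, Mapping


def enrich_checkout_span(span: Any, order: Mapping[str, Any]) -> None:
    """Attach business attributes to a span that already exists.

    `span` only needs a set_attribute(key, value) method, which keeps
    this helper testable without the OpenTelemetry packages.
    """
    span.set_attribute("order.value", float(order["total"]))
    span.set_attribute("tenant.id", str(order["tenant_id"]))
    span.set_attribute("customer.tier", str(order["tier"]))


def handle_checkout(order: Mapping[str, Any]) -> None:
    """Hypothetical request handler running inside an auto-created span."""
    # Imported here so the module loads even where opentelemetry-api
    # is not installed; calling this function does require it.
    from opentelemetry import trace

    # Phase one span: opened by the HTTP server auto-instrumentation.
    enrich_checkout_span(trace.get_current_span(), order)

    # Phase two span: a manual child around one meaningful step,
    # so a slow checkout can be attributed to a specific stage.
    tracer = trace.get_tracer("checkout")
    with tracer.start_as_current_span("checkout.charge_card"):
        pass  # the payment call would go here
```

Without an SDK configured, the API returns a non-recording span and these calls are no-ops, so the same code is safe in services where tracing is not yet switched on.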
Common Mistakes
  • Picking one approach for everything: all-manual stalls the rollout for months, while all-auto leaves you with traces that lack any business meaning
  • Assuming every language works like Java, then discovering the Go services emit nothing because nobody planned for in-code instrumentation
  • Ignoring semantic conventions, so every team invents its own attribute names and cross-service queries become impossible
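The naming mistake in the last bullet is the kind of thing a small check in CI can catch. The sketch below is one possible approach, not an official OpenTelemetry tool: the registry contents and the function name are hypothetical, and the pattern encodes the general convention of lowercase, dot-separated namespaces.

```python
import re

# Hypothetical shared registry that the platform team would maintain.
ALLOWED_ATTRIBUTES = {"tenant.id", "order.value", "customer.tier"}

# Lowercase segments separated by dots, snake_case inside a segment.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+$")


def check_attribute(name: str) -> list:
    """Return a list of problems with an attribute name (empty if fine)."""
    if not NAME_PATTERN.match(name):
        return ["not lowercase dot-namespaced"]
    if name not in ALLOWED_ATTRIBUTES:
        return ["not in shared registry"]
    return []


if __name__ == "__main__":
    for candidate in ("tenant.id", "tenantId", "tenant.identifier"):
        print(candidate, check_attribute(candidate))
```

A check like this flags both style drift (tenantId) and well-formed names that nobody agreed on (tenant.identifier), which is what keeps cross-service queries workable.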
Follow-up Questions
Interviewers often ask these as follow-up questions
  • How would you handle the Go services in that fleet, where attaching a runtime agent is not an option?
  • Auto-instrumentation just doubled your span volume and your tracing bill. What do you do?
  • How do you enforce consistent attribute naming, like tenant.id versus tenantId, across six teams?
Tags
opentelemetry
distributed-tracing
instrumentation
observability