Question

You need to roll out tracing across 40 services owned by six different teams. Do you go with auto-instrumentation or manual instrumentation, and how do you decide?

Accepted Answer

Start with auto-instrumentation for breadth, then add manual spans where the business logic lives.

Auto-instrumentation hooks into known libraries: HTTP servers and clients, database drivers, gRPC, queue clients. In Java you attach the agent jar with the -javaagent JVM flag and get traces with zero code changes; in Python you wrap the process with opentelemetry-instrument. That gets all 40 services emitting spans in days, not quarters, and you immediately see the service map and cross-service latency.

What auto-instrumentation cannot give you is domain context. It shows 'POST /api/checkout took 2s' but not which step of checkout was slow or which customer tier was affected. So the second phase is manual: teams add child spans around meaningful operations, and attach attributes like order.value or tenant.id to the spans that auto-instrumentation already created.

Language matters too. Java and Python have strong runtime agents. Go has no agent in the traditional sense, so you instrument in code or look at eBPF-based approaches, which means your Go services need more team buy-in up front.

The practical rollout: the platform team ships a base image or init container with the agent preconfigured and exporting to a standard endpoint, and each team only touches code when it wants richer spans. Forcing six teams to hand-instrument everything before seeing any value is how tracing rollouts die.

Auto vs Manual Instrumentation

