Istio Circuit Breakers and Outlier Detection
How do you implement a circuit breaker in Istio? Explain the difference between the connection pool limits and outlier detection.
Circuit breakers in Istio are configured on the DestinationRule under `trafficPolicy`. There are two parts, and they do different jobs.
`connectionPool` is the load-shedding part: it caps concurrent connections, pending requests, and requests per connection. When a caller exceeds those caps, Envoy rejects the request immediately with a 503 instead of queueing it indefinitely. That's how you protect a slow downstream from getting buried in pending work.
`outlierDetection` is the eject-the-bad-pod part: Envoy tracks consecutive 5xx errors per endpoint, and when a pod crosses the threshold (say 5 consecutive failures), it's ejected from the load-balancing pool for `baseEjectionTime`. Healthy pods get the traffic instead. After the ejection time, the pod is allowed back.
The two work together. Connection pool limits cap how many in-flight requests a client can pile onto one slow service. Outlier detection isolates a single bad pod so you stop sending it traffic until it recovers.
A common mistake is setting `consecutive5xxErrors: 1`. That's too aggressive: any transient blip ejects pods and causes flapping. Start at 5 and tune from there. Also, `maxEjectionPercent` matters: if it's too high, you can eject most of your pods during a real outage and make things worse.
This is a senior-level Istio question. The interviewer wants the candidate to distinguish two mechanisms that both get lumped under "circuit breaker" in casual conversation. A strong answer separates the load-shedding role from the endpoint-ejection role, knows the key fields, and can articulate sensible defaults. Watch for candidates who confuse Istio circuit breakers with Hystrix-style application circuit breakers. The concepts are related, but Istio's runs in the sidecar proxy at the endpoint level rather than wrapping individual calls in application code.
Full DestinationRule with connection pool and outlier detection
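The two mechanisms described above might be combined in a single DestinationRule along these lines. This is a minimal sketch, not a recommended production configuration: the resource name, namespace, and `host` are hypothetical, and every numeric limit is an illustrative starting point to tune against your own traffic. Only `consecutive5xxErrors: 5` comes from the guidance above.

```yaml
# Sketch only: name, namespace, host, and all limits are assumptions.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-circuit-breaker        # hypothetical name
  namespace: default                   # hypothetical namespace
spec:
  host: reviews.default.svc.cluster.local   # hypothetical service
  trafficPolicy:
    # Load shedding: cap the work a client sidecar sends to this service.
    connectionPool:
      tcp:
        maxConnections: 100            # concurrent TCP connections
      http:
        http1MaxPendingRequests: 50    # requests allowed to wait for a connection
        http2MaxRequests: 200          # concurrent requests to the destination
        maxRequestsPerConnection: 10   # requests before a connection is recycled
    # Endpoint ejection: remove individual pods that keep failing.
    outlierDetection:
      consecutive5xxErrors: 5          # failures in a row before ejection
      interval: 10s                    # how often endpoints are evaluated
      baseEjectionTime: 30s            # minimum time an ejected pod stays out
      maxEjectionPercent: 50           # never eject more than half the pool
```

Keeping `maxEjectionPercent` well below 100 reflects the point made earlier: during a real outage, some capacity stays in the pool instead of every pod being ejected at once.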
Verify ejections in the sidecar stats
Watch a pod get ejected in real time
- Setting `consecutive5xxErrors: 1` and causing pod flapping on transient errors
- Confusing Istio's endpoint-level circuit breaker with Hystrix-style per-call circuit breakers in code
- Raising `maxEjectionPercent` without considering `minHealthPercent`, and ejecting most of the upstream during a real outage
- What happens if you set `maxEjectionPercent: 100` and all your pods start failing?
- How would you tune these settings differently for a database-backed service vs a stateless API?
- What signals would you use to decide that your `consecutive5xxErrors` threshold is wrong?