Weighted Canary Rollout with Istio

mid
intermediate
Service Mesh
Question

Walk me through how you'd canary a new version of a service with Istio. Say you want to start at 5% traffic to v2 and ramp up.

Answer

First, both versions need to be running with distinct labels: `version: v1` on the stable pods and `version: v2` on the new ones. Then I write a DestinationRule that declares both subsets so Istio knows which pods v1 and v2 refer to. Next comes a VirtualService with a weighted route: 95 to v1, 5 to v2. To ramp, I bump the weight values and re-apply. Many teams script this or wire it into Flagger or Argo Rollouts so the ramp is automated based on success-rate and latency metrics from Prometheus.

The key thing to verify after each step is that the subset labels actually match real pods. A common failure mode is shifting weight to v2 when the v2 subset selects zero healthy pods, for example because the pods are labeled `app: reviews-v2` instead of `version: v2`. Requests routed to the empty subset fail, so you'll see 503s on roughly the percentage of traffic you shifted.

Also remember that weights are percentages of the traffic that matches that route rule, not of all traffic to the service. If the VirtualService has a header match ahead of the weighted route, the weights only apply to requests that fall through to the weighted rule.
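The point about match rules and weights can be sketched as a VirtualService with two `http` rules. This is an illustrative fragment, not a production config: the `reviews` host, the subset names, and the `x-canary` header are assumptions, and the `apiVersion` should be whichever one your Istio release serves.

```yaml
# Sketch only: host, subsets, and the x-canary header are hypothetical.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  # Rule 1 is evaluated first. Requests carrying the header go
  # straight to v2 and never reach the weighted rule below.
  - match:
    - headers:
        x-canary:
          exact: "true"
    route:
    - destination:
        host: reviews
        subset: v2
  # Rule 2 sees only the requests that did not match rule 1,
  # so 95/5 is a split of the remainder, not of all traffic.
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 95
    - destination:
        host: reviews
        subset: v2
      weight: 5
```

As a worked example with made-up numbers: if 10% of requests carry the header, v2 receives 10% plus 5% of the remaining 90%, which is 14.5% of total traffic rather than 5%.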

Why This Matters

This is one of the most common real-world Istio use cases. The interviewer wants to hear that the candidate understands the two-CRD pattern (DestinationRule plus VirtualService) and the label-matching gotcha, and ideally that they know weights alone are not an SLO-aware progressive delivery system; that is what Flagger or Argo Rollouts add on top. Bonus points for mentioning sticky sessions or session affinity, since a weighted route picks a version per request and the same user can land on v1 for one call and v2 for the next.

Code Examples

Initial 95/5 split with DestinationRule subsets
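A minimal sketch of the split named in this heading, assuming a service called `reviews` in the current namespace whose pods carry `version: v1` and `version: v2` labels. The resource names, host, and `apiVersion` are illustrative assumptions rather than values from a specific cluster.

```yaml
# Sketch only: names and host are hypothetical.
# The DestinationRule maps subset names to pod labels.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1   # must match the label on the stable pods
  - name: v2
    labels:
      version: v2   # must match the label on the canary pods
---
# The VirtualService splits traffic between the two subsets.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 95
    - destination:
        host: reviews
        subset: v2
      weight: 5
```

To ramp, change the two `weight` values (for example to 90/10, then 75/25) and re-apply the VirtualService. Keeping the weights summing to 100 lets them read directly as percentages.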


Verify the split is hitting both subsets

bash

Flagger Canary for automated ramp
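A hedged sketch of what an automated ramp might look like with Flagger, assuming a Deployment named `reviews` listening on port 9080 and Flagger installed with Prometheus metrics available. The step sizes and thresholds are placeholder values to tune. Note that Flagger generates and manages the Istio routing objects itself and shifts traffic between a primary and a canary workload it creates, so this is an alternative to hand-maintaining weights, not a layer over a hand-written VirtualService.

```yaml
# Sketch only: target name, port, and all thresholds are assumptions.
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: reviews
spec:
  provider: istio          # may be omitted if set at Flagger install time
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: reviews
  service:
    port: 9080
  analysis:
    interval: 1m           # how often to run the checks
    threshold: 5           # failed checks before rollback
    stepWeight: 5          # start at 5% and add 5% per passing interval
    maxWeight: 50          # promote once the canary reaches this weight
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99            # percent of non-5xx responses
      interval: 1m
    - name: request-duration
      thresholdRange:
        max: 500           # milliseconds
      interval: 1m
```

With settings like these, each passing interval raises the canary weight by `stepWeight`, and repeated failed checks trigger a rollback to the primary.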

Common Mistakes
  • Forgetting to label the v2 pods with `version: v2`, so traffic shifts to an empty subset
  • Manually editing weights in production instead of using Flagger or Argo Rollouts
  • Assuming weights apply globally when an earlier match rule already routed the request
Follow-up Questions
Interviewers often ask these as follow-up questions
  • How would you stick a specific user or header to v2 while everyone else stays on v1?
  • What metrics would you watch during the ramp, and where would you get them?
  • How does this change if your service holds in-memory session state?
Tags
istio
service-mesh
traffic-management
canary
progressive-delivery