SRE Interview Questions
Practice 10 SRE interview questions spanning junior, mid, and senior levels. Think through each one before checking a model answer.
SLO vs SLI vs SLA Differences
Your team just launched a new API service. Your manager asks you to set up SLOs for it. Can you walk me through what SLOs, SLIs, and SLAs are, and how they relate to each other?
Level: Junior (beginner)
Application Performance Optimization
How do you identify and resolve performance bottlenecks in a production application?
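One concrete first step is to look at latency percentiles rather than averages, because a slow path that affects a few percent of requests barely moves the mean. The sketch below assumes hypothetical in-memory latency samples and uses the nearest-rank percentile definition; production systems usually compute this from histograms in a metrics backend instead.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies (ms): 95 fast requests and 5 that hit a slow code path.
latencies_ms = [20.0] * 95 + [900.0] * 5

mean = sum(latencies_ms) / len(latencies_ms)
print(f"mean={mean:.1f} ms, p50={percentile(latencies_ms, 50)} ms, "
      f"p99={percentile(latencies_ms, 99)} ms")
```

Here the mean (64 ms) looks healthy while p99 exposes the 900 ms slow path, which is where profiling effort tends to pay off.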
Level: Mid (intermediate)
SLI, SLO, and SLA Definitions
Explain the difference between SLI, SLO, and SLA with examples.
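As a small illustration of how the three terms relate: the SLI is the measured value, the SLO is the internal target for it, and the SLA is the (usually looser) contractual commitment. The targets and request counts below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Objectives:
    slo: float  # internal target for the SLI (hypothetical: 99.9%)
    sla: float  # contractual commitment, typically looser than the SLO (hypothetical: 99.5%)


def availability_sli(good: int, total: int) -> float:
    """SLI: the measured fraction of valid requests that succeeded."""
    if total == 0:
        return 1.0  # assumption: a window with no traffic counts as fully available
    return good / total


def evaluate(sli: float, obj: Objectives) -> str:
    if sli < obj.sla:
        return "SLA breached: contractual consequences such as service credits"
    if sli < obj.slo:
        return "SLO missed: internal signal to prioritize reliability work"
    return "within SLO"


obj = Objectives(slo=0.999, sla=0.995)
sli = availability_sli(good=998_700, total=1_000_000)
print(f"SLI={sli:.4f} -> {evaluate(sli, obj)}")
```

Keeping the SLO stricter than the SLA leaves a margin: the team reacts to a missed SLO well before customers are owed anything.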
Level: Mid (intermediate)
Choosing the Right SLIs
You're joining a team that runs a checkout service for an e-commerce platform. There are no SLOs yet. How would you decide which SLIs to track?
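One common way to frame a candidate SLI is as a ratio of good events to valid events. The sketch below assumes a hypothetical `Request` record with an HTTP status and server-side latency, a hypothetical 500 ms latency threshold, and the design choice of excluding 4xx responses from the denominator; whether that exclusion fits a real checkout flow is something to agree on with the team.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical per-request record, e.g. parsed from access logs."""
    status: int        # HTTP status code
    latency_ms: float  # server-side latency in milliseconds


LATENCY_THRESHOLD_MS = 500  # hypothetical threshold for "fast enough" checkout calls


def is_valid(r: Request) -> bool:
    # Design choice (assumption): 4xx responses reflect bad client input,
    # so they are excluded from the denominator rather than counted as failures.
    return not 400 <= r.status < 500


def availability_sli(reqs: list[Request]) -> float:
    """Good events (non-5xx) divided by valid events."""
    valid = [r for r in reqs if is_valid(r)]
    good = [r for r in valid if r.status < 500]
    return len(good) / len(valid) if valid else 1.0


def latency_sli(reqs: list[Request]) -> float:
    """Fraction of valid, successful requests served under the latency threshold."""
    served = [r for r in reqs if is_valid(r) and r.status < 500]
    fast = [r for r in served if r.latency_ms <= LATENCY_THRESHOLD_MS]
    return len(fast) / len(served) if served else 1.0


sample = [
    Request(200, 120.0),
    Request(200, 800.0),
    Request(503, 30.0),
    Request(404, 10.0),
]
print(f"availability={availability_sli(sample):.3f} latency={latency_sli(sample):.3f}")
```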
Level: Mid (intermediate)
Error Budget Management
Your service has a 99.9% availability SLO over a 30-day window. How much downtime does that give you, and what do you actually do with that error budget day-to-day?
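The arithmetic for the first part: 30 days × 24 h × 60 min = 43,200 minutes, and 0.1% of that is 43.2 minutes. The sketch below computes that, plus the remaining fraction of a request-based budget, using hypothetical traffic numbers.

```python
def error_budget_minutes(slo: float, window_days: float) -> float:
    """Minutes of full unavailability a time-based availability SLO allows per window."""
    window_minutes = window_days * 24 * 60
    return (1 - slo) * window_minutes


def budget_remaining(slo: float, good: int, total: int) -> float:
    """Fraction of a request-based error budget still unspent (negative means overspent)."""
    allowed_bad = (1 - slo) * total
    actual_bad = total - good
    return 1 - actual_bad / allowed_bad


print(f"{error_budget_minutes(0.999, 30):.1f} minutes per 30 days")
# Hypothetical traffic: 10M requests so far this window, 4,000 of them failed.
print(f"{budget_remaining(0.999, good=9_996_000, total=10_000_000):.0%} of budget left")
```

Day to day, a number like the remaining fraction can drive policy, for example slowing risky deploys once it drops below an agreed level; the exact thresholds are a team decision.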
Level: Mid (intermediate)
Capacity Planning and Scaling
How do you approach capacity planning for a growing production system? What metrics and strategies do you use?
Level: Senior (advanced)
Chaos Engineering Practices
What is chaos engineering and how would you implement it safely in a production environment?
Level: Senior (advanced)
Incident Postmortems
Describe a production incident you handled and how you structured the postmortem. What makes a good blameless postmortem?
Level: Senior (advanced)
Error Budget Burn Investigation
It's Monday morning. You check the dashboard and see that your service burned 80% of its monthly error budget over the weekend. Walk me through how you'd investigate this and what you'd do next.
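A useful first number is the burn rate: the fraction of budget consumed multiplied by the SLO window length, divided by the elapsed time. Assuming a 48-hour weekend and a 30-day (720-hour) window, 80% consumed gives 0.8 × 720 / 48 = 12, and the remaining 20% would last about 12 hours at that rate. The 99.9% SLO used for the implied error ratio is an assumption; the question does not state one.

```python
def burn_rate(budget_fraction_used: float, elapsed_hours: float,
              window_hours: float = 30 * 24) -> float:
    """How many times faster than 'exactly on budget' the error budget was consumed.

    A burn rate of 1 spends the whole budget in exactly one SLO window.
    """
    return budget_fraction_used * window_hours / elapsed_hours


def hours_to_exhaustion(budget_fraction_left: float, rate: float,
                        window_hours: float = 30 * 24) -> float:
    """Hours until the remaining budget is gone if the current burn rate continues."""
    return budget_fraction_left * window_hours / rate


def implied_error_ratio(rate: float, slo: float) -> float:
    """Average error ratio that corresponds to a given burn rate."""
    return rate * (1 - slo)


# Assumptions: "the weekend" = 48 hours; the SLO is 99.9% over 30 days.
rate = burn_rate(0.80, elapsed_hours=48)
print(f"burn rate ~ {rate:.0f}x")
print(f"remaining 20% lasts ~ {hours_to_exhaustion(0.20, rate):.0f} h at this rate")
print(f"implied average error ratio ~ {implied_error_ratio(rate, 0.999):.2%}")
```

The time-to-exhaustion figure decides urgency: if the burn is still ongoing, mitigation comes before root-cause analysis.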
Level: Senior (advanced)
SLO-Based Alerting and Burn Rates
Traditional alerting fires when the error rate crosses a static threshold, such as 'alert if errors > 1%'. What's wrong with that approach, and how would you set up SLO-based alerting instead?
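To see the problem with a static threshold: for a 99.9% SLO the budget is a 0.1% error ratio, so `errors > 1%` corresponds to a burn rate of 10 (a 30-day budget gone in 3 days), while a steady 0.5% error rate never fires yet exhausts the budget in 6 days. The sketch below evaluates multiwindow, multi-burn-rate rules. The thresholds and window pairs (14.4 over 1 h checked against 5 min, 6 over 6 h against 30 min, 1 over 3 days against 6 h) follow the values commonly cited from the Google SRE Workbook, where 14.4 means 2% of a 30-day budget spent in one hour (0.02 × 720 / 1). The per-window error ratios are hypothetical inputs that would come from a metrics system.

```python
from dataclasses import dataclass

SLO = 0.999          # assumption: 99.9% over a 30-day window
BUDGET = 1 - SLO     # allowed error ratio


@dataclass(frozen=True)
class BurnRateRule:
    long_window: str
    short_window: str
    threshold: float
    severity: str


RULES = [
    BurnRateRule("1h", "5m", 14.4, "page"),
    BurnRateRule("6h", "30m", 6.0, "page"),
    BurnRateRule("3d", "6h", 1.0, "ticket"),
]


def burn_rate(error_ratio: float) -> float:
    return error_ratio / BUDGET


def firing_alerts(error_ratios: dict[str, float]) -> list[str]:
    """error_ratios maps a window name to the observed error ratio over that window."""
    alerts = []
    for rule in RULES:
        long_br = burn_rate(error_ratios[rule.long_window])
        short_br = burn_rate(error_ratios[rule.short_window])
        # Both windows must exceed the threshold: the long window shows enough budget
        # was spent to matter, the short one lets the alert clear soon after recovery.
        if long_br > rule.threshold and short_br > rule.threshold:
            alerts.append(f"{rule.severity}: burn rate {long_br:.1f} over {rule.long_window}")
    return alerts


print(firing_alerts({"5m": 0.02, "1h": 0.016, "30m": 0.018, "6h": 0.004, "3d": 0.0008}))
```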
Level: Senior (advanced)