SLO vs SLI vs SLA Differences
Your team just launched a new API service. Your manager asks you to set up SLOs for it. Can you walk me through what SLOs, SLIs, and SLAs are, and how they relate to each other?
An SLI (Service Level Indicator) is a measurement of your service's behavior -- something you can actually observe and put a number on. For example, the percentage of HTTP requests that return in under 300ms, or the ratio of successful responses to total responses.
An SLO (Service Level Objective) is a target you set for an SLI. You're saying "we want this SLI to meet this threshold over this time window." For example, "99.9% of requests should succeed over a rolling 30-day window."
An SLA (Service Level Agreement) is a contract with your customers that says what happens if you miss your targets. It has business consequences -- refunds, credits, or penalties. Not every service needs an SLA, but every production service should have SLOs.
The relationship flows like this: you pick SLIs (what to measure), set SLOs (what target to hit), and if you make a promise to customers, that becomes an SLA. Your SLOs should always be tighter than your SLAs. If your SLA promises 99.9% availability, your internal SLO should be something like 99.95%. That gap gives you a buffer to catch problems before they become contract violations.
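To put numbers on that buffer, here is a minimal Python sketch that converts an availability target into allowed downtime. It assumes a time-based availability SLI over a 30-day window (the request-based example above would count failed requests instead), and the function name is made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(target: float, window_days: int = 30) -> float:
    """Minutes of full outage a time-based availability target tolerates."""
    return (1.0 - target) * window_days * MINUTES_PER_DAY


# Targets taken from the example above: external SLA vs. internal SLO.
sla_budget = allowed_downtime_minutes(0.999)
slo_budget = allowed_downtime_minutes(0.9995)

print(f"SLA 99.9%:  {sla_budget:.1f} min per 30 days")
print(f"SLO 99.95%: {slo_budget:.1f} min per 30 days")
print(f"Buffer:     {sla_budget - slo_budget:.1f} min")
```

Under these assumptions the SLA tolerates about 43.2 minutes of downtime per 30 days and the SLO about 21.6, so breaching the internal target still leaves roughly 21.6 minutes before the contract is violated.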
This is a foundational SRE question. You're testing whether the candidate understands the relationship between these three concepts or just memorized definitions. Strong candidates will explain the hierarchy (SLI feeds SLO, SLO is stricter than SLA) and give concrete examples. Weak candidates mix up SLOs and SLAs or can't give real numbers.
SLO definition in a config file (OpenSLO format)
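The sketch below builds a 99.9% availability SLO as a Python dictionary and prints it as JSON. The field names follow my reading of the OpenSLO v1 schema and should be checked against the specification before use. The service name, metric name, and label values are hypothetical. OpenSLO documents are normally written in YAML; the JSON here only shows the structure.

```python
import json

# Hypothetical names: swap in your own service and metric labels.
SERVICE = "example-api"
GOOD_QUERY = 'sum(http_requests_total{job="example-api",code!~"5.."})'
TOTAL_QUERY = 'sum(http_requests_total{job="example-api"})'


def prometheus_source(query: str) -> dict:
    """Wrap a PromQL query in an OpenSLO-style metricSource block."""
    return {"metricSource": {"type": "Prometheus", "spec": {"query": query}}}


slo = {
    "apiVersion": "openslo/v1",
    "kind": "SLO",
    "metadata": {"name": f"{SERVICE}-availability"},
    "spec": {
        "service": SERVICE,
        "description": "99.9% of requests succeed over a rolling 30 days",
        # The SLI: ratio of good requests to all requests.
        "indicator": {
            "metadata": {"name": f"{SERVICE}-success-ratio"},
            "spec": {
                "ratioMetric": {
                    "counter": True,  # queries return raw counters
                    "good": prometheus_source(GOOD_QUERY),
                    "total": prometheus_source(TOTAL_QUERY),
                }
            },
        },
        # The SLO: target and time window for that SLI.
        "timeWindow": [{"duration": "30d", "isRolling": True}],
        "budgetingMethod": "Occurrences",
        "objectives": [{"displayName": "Availability", "target": 0.999}],
    },
}

print(json.dumps(slo, indent=2))
```

Note how the definition keeps the SLI (the `indicator` block) separate from the SLO (the `objectives` target and `timeWindow`), mirroring the hierarchy described in the answer.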
Quick availability calculation from logs
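As a rough illustration, the following Python sketch computes an availability SLI from access-log lines. The log format (`method path status latency_ms`) and the sample data are invented for the example. It also assumes that only 5xx responses count as failures, so client errors such as 404 are treated as successful from the service's point of view.

```python
from typing import Iterable, Optional

# Hypothetical access-log lines: "<method> <path> <status> <latency_ms>"
SAMPLE_LOG = """\
GET /v1/orders 200 120
GET /v1/orders 200 95
POST /v1/orders 201 210
GET /v1/orders/42 404 30
GET /v1/orders 503 5000
-- log rotated --
GET /v1/orders 200 110
GET /v1/health 200 3
POST /v1/orders 201 250
"""


def availability(lines: Iterable[str]) -> Optional[float]:
    """Return good / total requests, or None if no request lines were seen."""
    total = good = 0
    for line in lines:
        parts = line.split()
        # Skip lines that do not carry a numeric status in the third field.
        if len(parts) < 3 or not parts[2].isdigit():
            continue
        total += 1
        if int(parts[2]) < 500:  # assumption: only 5xx counts as a failure
            good += 1
    return good / total if total else None


ratio = availability(SAMPLE_LOG.splitlines())
print(f"availability: {ratio:.2%}")
print("meets 99.9% SLO" if ratio >= 0.999 else "below 99.9% SLO")
```

With the sample data, seven of eight requests are good, so the SLI is 87.50% and falls below the 99.9% target. A real calculation would cover the full SLO window rather than a handful of lines.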
- Treating SLOs and SLAs as the same thing. SLAs are contracts with business consequences; SLOs are internal engineering targets.
- Setting SLOs at 100%. No service should target 100% reliability -- it leaves zero room for deployments, maintenance, or any changes at all.
- Picking SLIs that don't reflect user experience, like CPU usage instead of request latency or error rate.
- Why should your SLO always be stricter than your SLA?
- Can you have SLOs without SLAs? When would you do that?
- What happens if you set your SLO target too high, like 99.99% for an internal tool?