SLI, SLO, and SLA Definitions
Explain the difference between SLI, SLO, and SLA with examples.
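An SLI (Service Level Indicator) is a quantitative measure of service behavior as users experience it, such as request latency or error rate. An SLO (Service Level Objective) is an internal target for an SLI over a time window, such as 99.9% availability over 30 days. An SLA (Service Level Agreement) is an external contract with customers that specifies consequences, typically service credits, for missing the agreed targets. Example: SLI = request latency; SLO = 95% of requests complete in under 200 ms; SLA = credits if monthly availability drops below 99.5%. SLA thresholds are usually set looser than the internal SLOs, so a team can miss an SLO without breaching the contract.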
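The latency example can be sketched in a few lines of Python. This is a minimal illustration, not a production measurement pipeline: the function names, the sample latencies, and the choice to count a request as "good" when it finishes in strictly under 200 ms are all assumptions made for the example.

```python
from typing import Sequence


def latency_sli(latencies_ms: Sequence[float], threshold_ms: float = 200.0) -> float:
    """Return the fraction of requests that finished faster than threshold_ms."""
    if not latencies_ms:
        raise ValueError("no requests observed, the SLI is undefined")
    good = sum(1 for latency in latencies_ms if latency < threshold_ms)
    return good / len(latencies_ms)


def meets_slo(sli: float, target: float = 0.95) -> bool:
    """Compare a measured SLI against the SLO target (95% by default)."""
    return sli >= target


# Hypothetical sample: 8 of these 10 requests finish in under 200 ms.
sample_latencies_ms = [120, 180, 90, 250, 150, 199, 300, 110, 130, 170]
measured = latency_sli(sample_latencies_ms)
print(f"SLI = {measured:.2f}, SLO met: {meets_slo(measured)}")
```

Here the SLI is the measured ratio, the SLO is the `target` it is compared against, and an SLA would sit outside the code entirely as the contractual consequence of a sustained miss.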
These concepts form the foundation of reliability engineering. SLIs measure user experience, SLOs balance reliability against feature velocity, and SLAs formalize business commitments. The gap between an SLO and 100% is the error budget: the amount of unreliability the team may spend on releases and experiments. Without clear SLOs, teams either over-engineer for reliability nobody needs or ship too fast and burn out on incidents.
SLO definition example
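One way to write an SLO down as data is a small Python dataclass. This is a sketch under stated assumptions: the class name `SLO`, its field names, and the `checkout_latency` instance are hypothetical and do not come from any particular SLO tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """A hypothetical SLO record: which SLI, what target, over which window."""
    name: str
    sli: str            # human-readable description of the indicator
    target: float       # required fraction of good events, e.g. 0.95
    window_days: int    # length of the evaluation window

    def __post_init__(self) -> None:
        # A 100% target leaves no room for change, so it is rejected here.
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")
        if self.window_days <= 0:
            raise ValueError("window_days must be positive")

    def is_met(self, good_events: int, total_events: int) -> bool:
        """Check measured event counts for the window against the target."""
        if total_events <= 0:
            raise ValueError("total_events must be positive")
        return good_events / total_events >= self.target


checkout_latency = SLO(
    name="checkout-latency",
    sli="share of requests completing in under 200 ms",
    target=0.95,
    window_days=30,
)
print(checkout_latency.is_met(good_events=960, total_events=1000))
```

Keeping the target and window together with the SLI description makes the objective reviewable like any other configuration.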
SLI measurement in Prometheus
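A common way to express an availability SLI in Prometheus is the ratio of non-5xx request rate to total request rate. The Python helper below only builds that PromQL string; the metric name `http_requests_total`, the `job` and `code` labels, and the function name are assumptions that depend on how a given service is instrumented.

```python
def availability_sli_query(
    job: str,
    window: str = "5m",
    metric: str = "http_requests_total",
) -> str:
    """Build a PromQL expression for the share of requests without a 5xx status."""
    # Numerator: per-second rate of requests whose status code is not 5xx.
    good = f'sum(rate({metric}{{job="{job}",code!~"5.."}}[{window}]))'
    # Denominator: per-second rate of all requests for the same job.
    total = f'sum(rate({metric}{{job="{job}"}}[{window}]))'
    return f"{good} / {total}"


print(availability_sli_query("api"))
```

The resulting expression can be pasted into the Prometheus expression browser or used in a recording rule; a longer window such as `30d` gives the SLI over a full SLO period.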
Common mistakes
- Setting SLOs at 100% (unattainable, and it leaves no error budget for any change)
- Confusing SLOs (internal targets) with SLAs (customer contracts)
- Not tracking the error budget, which leaves the SLO with no influence on release decisions
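Tracking the error budget comes down to simple arithmetic on event counts. The sketch below assumes an event-based SLO (good events over total events); the function name and the example numbers are hypothetical.

```python
def error_budget_remaining(target: float, good_events: int, total_events: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent)."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed_bad = (1.0 - target) * total_events   # failures the SLO tolerates
    actual_bad = total_events - good_events       # failures actually observed
    return 1.0 - actual_bad / allowed_bad


# With a 99.9% target and 1,000,000 requests, about 1,000 failures are allowed.
# 600 observed failures leave roughly 40% of the budget.
remaining = error_budget_remaining(0.999, good_events=999_400, total_events=1_000_000)
print(f"{remaining:.0%} of the error budget remains")
```

A value near zero is a signal to slow releases and prioritize reliability work; a negative value means the SLO has already been missed for the window.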
Follow-up questions
- How do you handle situations when you're close to exhausting your error budget?
- What's the difference between availability and reliability?
- How do you choose appropriate SLO targets?