junior
beginner
Monitoring

Four Golden Signals of Monitoring

Question

What are the four golden signals of monitoring and why are they important?

Answer

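The Four Golden Signals, described in Google's Site Reliability Engineering book, are Latency (how long it takes to serve a request), Traffic (how much demand the system is receiving, such as HTTP requests per second), Errors (the rate of requests that fail), and Saturation (how full the service is, measured on its most constrained resource such as CPU, memory, or I/O). Together they give a quick read on service health and help narrow down where a problem lies. They are a sensible minimum set of metrics for any user-facing service.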

Why This Matters

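These signals come from Google's Site Reliability Engineering practices. Each one answers a different question: latency reflects what users experience, traffic reflects demand, errors reflect reliability, and saturation reflects how close the service is to its capacity limits. A problem in a user-facing service usually shows up in at least one of them, which makes the four a good baseline. Start with these before adding more service-specific metrics.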

Code Examples

Prometheus alerting rules

yaml
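The rules below are a minimal sketch of one alert per golden signal, not a production ruleset. The metric names `http_requests_total` and `http_request_duration_seconds_bucket`, the `status` label, and every threshold and `for` duration are assumptions chosen for illustration. `node_cpu_seconds_total` is the CPU metric exposed by node_exporter. Substitute whatever your service actually exports.

```yaml
groups:
  - name: golden-signals
    rules:
      # Latency: p99 request duration stays above 500 ms (assumed threshold).
      - alert: HighRequestLatency
        expr: |
          histogram_quantile(
            0.99,
            sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
          ) > 0.5
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "p99 latency above 500 ms"

      # Traffic: request rate falls below 1 req/s (assumed floor).
      # If the series is missing entirely this expression returns nothing,
      # so a separate absent() alert may be needed for that case.
      - alert: TrafficDrop
        expr: sum(rate(http_requests_total[5m])) < 1
        for: 10m
        labels:
          severity: ticket
        annotations:
          summary: "Request rate is unusually low"

      # Errors: more than 5% of requests return a 5xx status (assumed label and threshold).
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            /
          sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "More than 5% of requests are failing"

      # Saturation: average CPU utilization per instance above 90% (assumed threshold).
      - alert: HighCpuSaturation
        expr: |
          1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.9
        for: 15m
        labels:
          severity: ticket
        annotations:
          summary: "CPU utilization above 90% on {{ $labels.instance }}"
```

The `for` clause keeps a brief spike from firing an alert: the condition has to hold for the whole duration first. CPU is used here as the saturation signal only as an example. The right resource is whichever one your service exhausts first.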
Common Mistakes
  • Only monitoring one signal (e.g., just CPU usage)
  • Setting alert thresholds too sensitive (causing alert fatigue) or too loose (missing real issues)
  • Not measuring latency at different percentiles (p50, p95, p99), since an average alone hides slow tail requests
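As a sketch of the percentile point above, the following Prometheus recording rules precompute p50, p95, and p99 latency from a histogram. The metric name `http_request_duration_seconds_bucket` and the recorded series names are assumptions for illustration and should be replaced with the names your service exports.

```yaml
groups:
  - name: latency-percentiles
    rules:
      # Median latency: what a typical request experiences.
      - record: job:http_request_duration_seconds:p50
        expr: |
          histogram_quantile(
            0.50,
            sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))
          )

      # p95: the slowest 5% of requests take at least this long.
      - record: job:http_request_duration_seconds:p95
        expr: |
          histogram_quantile(
            0.95,
            sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))
          )

      # p99: tail latency, often where user-visible problems appear first.
      - record: job:http_request_duration_seconds:p99
        expr: |
          histogram_quantile(
            0.99,
            sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))
          )
```

Graphing the three series side by side makes it easy to see when the median looks healthy while the tail is degrading.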
Follow-up Questions
Interviewers often ask these as follow-up questions
  • How do you distinguish between latency for successful vs failed requests?
  • What's the difference between metrics, logs, and traces?
  • How do you set appropriate alert thresholds?
Tags
monitoring
observability
sre
metrics
fundamentals