senior · advanced · SRE

Capacity Planning and Scaling

Question

How do you approach capacity planning for a growing production system? What metrics and strategies do you use?

Answer

Capacity planning ensures systems can handle both current and future load. A typical process:

1. Establish baselines: current CPU, memory, disk, and network utilization, plus request rates.
2. Understand growth patterns: historical trends, seasonality, planned campaigns.
3. Define headroom: typically a 30-40% buffer for unexpected spikes.
4. Model scenarios: what happens at 2x, 5x, 10x traffic?
5. Identify bottlenecks: database connections, API rate limits, stateful components.
6. Plan the scaling strategy: vertical vs. horizontal, auto-scaling policies.
7. Load test regularly.

Review capacity at least quarterly.
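The headroom and growth-modeling steps can be sketched as a quick projection. All inputs here are hypothetical (1200 RPS baseline, 8% monthly growth, 40% headroom); substitute your own measurements:

```shell
#!/usr/bin/env bash
# Project peak load forward under compound growth, then add headroom.
# Every number below is an illustrative assumption, not a recommendation.
awk -v rps=1200 -v growth=0.08 -v months=12 -v headroom=1.4 'BEGIN {
  projected = rps * (1 + growth) ^ months   # compound monthly growth
  printf "projected peak in %d months: %.0f RPS\n", months, projected
  printf "provision for: %.0f RPS (%.0f%% headroom)\n",
         projected * headroom, (headroom - 1) * 100
}'
```

Running the same model at pessimistic and optimistic growth rates gives a range, which is usually more honest than a single number.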

Why This Matters

Capacity planning is both art and science. Too much capacity wastes money; too little causes outages. Cloud auto-scaling helps but doesn't solve everything - databases, third-party APIs, and stateful services often can't scale horizontally. Senior engineers must think about bottlenecks that aren't obvious and plan for Black Friday scenarios before they happen.

Code Examples

Horizontal Pod Autoscaler

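A minimal `autoscaling/v2` manifest might look like the following. The deployment name `web-api` and the specific thresholds are illustrative, not prescriptive:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api                # hypothetical deployment name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 4               # floor covers baseline load even if metrics misbehave
  maxReplicas: 40
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # scale out early, leaving ~40% headroom per pod
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # avoid flapping on brief dips
```

Note that the target utilization encodes the headroom policy: scaling at 60% CPU rather than 90% buys time for pods to start before existing ones saturate.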

Capacity analysis queries

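A sketch of the kinds of queries worth running, assuming Prometheus with node_exporter metrics and an `http_requests_total` counter (adapt the metric names to whatever your stack actually exports). The script only prints the PromQL, so it can be pasted into your own tooling:

```shell
#!/usr/bin/env bash
# Capacity-analysis PromQL sketches. Metric names assume the Prometheus
# node_exporter and a generic request counter; adjust to your environment.

queries=(
  # p95 CPU busy fraction per instance over the last 7 days
  'quantile_over_time(0.95, (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))[7d:5m])'
  # Days until each filesystem fills, based on the last week of growth
  'node_filesystem_avail_bytes / -deriv(node_filesystem_avail_bytes[7d]) / 86400'
  # Peak request rate over the last 30 days
  'max_over_time(sum(rate(http_requests_total[5m]))[30d:5m])'
)

for q in "${queries[@]}"; do
  printf '%s\n\n' "$q"
done
```

Percentiles over a long window are deliberately used instead of averages: averages hide exactly the peaks capacity planning exists to absorb.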

Common Mistakes

  • Only planning for average load, not peak load
  • Forgetting about dependent services that may become bottlenecks
  • Not accounting for the time it takes to scale (cold start, provisioning)
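The peak-versus-average mistake is easy to quantify from traffic data you already have. A sketch, where the input format (one epoch-second timestamp per request) and the inlined sample data are made up for the demo:

```shell
#!/usr/bin/env bash
# Peak-to-average request ratio from per-second request counts.
# Hypothetical input: one epoch-second timestamp per request line,
# e.g. cut from an access log. Sample data inlined for demonstration.
stats=$(awk '{ c[$1]++ }
END {
  for (t in c) { total += c[t]; n++; if (c[t] > peak) peak = c[t] }
  printf "avg %.1f rps, peak %d rps, peak/avg %.1fx", total/n, peak, peak/(total/n)
}' <<'EOF'
1700000000
1700000000
1700000000
1700000001
EOF
)
echo "$stats"
```

If the peak/avg ratio is, say, 3x, then capacity sized from average load is undersized by a factor of three before any growth is even considered.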

Follow-up Questions

Interviewers often ask these as follow-up questions:
  • How do you handle capacity planning for stateful services like databases?
  • What is the difference between scaling up and scaling out?
  • How do you account for third-party API rate limits in capacity planning?

Tags

capacity-planning · scaling · sre · performance · architecture