Resource Management and Scaling

Learn how to optimize resource allocation and implement efficient scaling strategies in Kubernetes

Effective resource management is crucial for running efficient, cost-effective, and reliable Kubernetes clusters. In this section, we'll explore how to allocate compute resources to your workloads, implement various scaling strategies, and optimize your cluster's overall performance.

Understanding Kubernetes Resource Management

Kubernetes schedules pods based on available resources. When you specify resource requirements for your containers, the scheduler can make better decisions about which nodes to place your pods on.

Resource Requests and Limits

The two primary resource controls in Kubernetes are requests and limits:

  • Requests: The minimum amount of resources a container needs
  • Limits: The maximum amount of resources a container can use

apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests:
          memory: '64Mi'
          cpu: '250m'
        limits:
          memory: '128Mi'
          cpu: '500m'

In this example:

  • The container requests 64MiB of memory and 0.25 CPU cores
  • The container is limited to 128MiB of memory and 0.5 CPU cores

CPU Resources

CPU resources are measured in CPU units:

  • 1 CPU unit = 1 physical or virtual core
  • Can be expressed as decimal values
  • Commonly specified in millicores (m), where 1000m = 1 CPU

Examples:

  • 100m = 0.1 CPU (10% of a core)
  • 500m = 0.5 CPU (50% of a core)
  • 2 = 2 CPUs (2 full cores)

Memory Resources

Memory is measured in bytes, and you can use the following suffixes:

  • Ki: Kibibytes (1024 bytes)
  • Mi: Mebibytes (1024 Ki)
  • Gi: Gibibytes (1024 Mi)
  • Ti: Tebibytes (1024 Gi)

The single-letter suffixes (k, M, G, T) are decimal units, not synonyms: M is 1,000,000 bytes while Mi is 1,048,576 bytes, so 400M is slightly less than 400Mi.

Examples:

  • 256Mi = 256 Mebibytes
  • 1Gi = 1 Gibibyte

Resource Behavior

How Kubernetes handles resource requests and limits:

  1. CPU (Compressible Resource):
    • If a container exceeds its CPU limit, it's throttled but not terminated
    • CPU requests guarantee a minimum amount of CPU time
  2. Memory (Incompressible Resource):
    • If a container exceeds its memory limit, it may be terminated (OOMKilled)
    • Memory requests are used for scheduling decisions
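
You can see the memory behavior directly: when a container is OOMKilled, the reason is recorded in its last termination state. The pod name here is a placeholder:

kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# Prints "OOMKilled" if the container was terminated for exceeding its memory limit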

Quality of Service (QoS) Classes

Kubernetes assigns a QoS class to each pod based on its resource specifications:

Guaranteed

Pods receive the Guaranteed QoS class when:

  • Every container has both memory and CPU limits and requests set
  • The limits equal the requests

resources:
  requests:
    memory: '128Mi'
    cpu: '500m'
  limits:
    memory: '128Mi'
    cpu: '500m'

These pods are least likely to be evicted under resource pressure.

Burstable

Pods receive the Burstable QoS class when:

  • At least one container has a memory or CPU request or limit set
  • The pod doesn't qualify as Guaranteed

resources:
  requests:
    memory: '64Mi'
    cpu: '250m'
  limits:
    memory: '128Mi'
    cpu: '500m'

These pods may be evicted under memory pressure, but only after any BestEffort pods have been evicted.

BestEffort

Pods receive the BestEffort QoS class when:

  • No container has any memory or CPU requests or limits set

resources: {}

These pods are the first to be evicted under resource pressure.
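
You can confirm which class Kubernetes assigned by reading the pod's status (the pod name is a placeholder):

kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'
# Prints Guaranteed, Burstable, or BestEffort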

Resource Quotas

Resource Quotas restrict the total resource consumption within a namespace:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: '10'
    requests.memory: 20Gi
    limits.cpu: '20'
    limits.memory: 40Gi
    pods: '10'

This quota limits the namespace to:

  • 10 CPU cores in total requests
  • 20 GiB of memory in total requests
  • 20 CPU cores in total limits
  • 40 GiB of memory in total limits
  • 10 pods maximum

You can also set quotas for object counts:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-quota
  namespace: team-a
spec:
  hard:
    configmaps: '10'
    persistentvolumeclaims: '5'
    services: '5'
    secrets: '10'
    services.loadbalancers: '1'
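
Once a quota is in place, you can check how much of it the namespace has consumed:

kubectl describe resourcequota compute-quota -n team-a
# Lists each tracked resource with its Used and Hard values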

Limit Ranges

Limit Ranges set default resource limits and enforce minimum and maximum constraints on resources:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-a
spec:
  limits:
    - type: Container
      default:
        cpu: 500m
        memory: 256Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      max:
        cpu: 2
        memory: 2Gi
      min:
        cpu: 50m
        memory: 64Mi

This LimitRange:

  • Sets default limits for containers that don't specify them
  • Sets default requests for containers that don't specify them
  • Enforces maximum and minimum resource boundaries
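
To verify the defaults are applied, create a throwaway pod without a resources section and inspect what the admission controller injected (the pod name is arbitrary):

kubectl run limits-test --image=nginx -n team-a
kubectl get pod limits-test -n team-a -o jsonpath='{.spec.containers[0].resources}'
# Shows the default requests and limits filled in by the LimitRange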

Horizontal Pod Autoscaling

Horizontal Pod Autoscaler (HPA) automatically scales the number of pods based on observed CPU utilization or other metrics.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

This HPA:

  • Targets the nginx deployment
  • Maintains between 2 and 10 replicas
  • Aims to keep average CPU utilization at 70% of the requested CPU
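
The same autoscaler can be created imperatively. Either way, the HPA needs the metrics-server add-on installed to read CPU usage:

kubectl autoscale deployment nginx --min=2 --max=10 --cpu-percent=70
kubectl get hpa
# Shows current metric values, targets, and replica counts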

Custom Metrics

You can scale based on custom metrics by installing a metrics adapter:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-processor
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-processor
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: messages_in_queue
        target:
          type: AverageValue
          averageValue: 100

This HPA scales based on a custom metric messages_in_queue, aiming for an average of 100 messages per pod.

External Metrics

You can also scale based on metrics from external systems:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sqs-processor
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sqs-processor
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: sqs_messages
          selector:
            matchLabels:
              queue: my-queue
        target:
          type: AverageValue
          averageValue: 30

This HPA scales based on the number of messages in an SQS queue.

Vertical Pod Autoscaling

Vertical Pod Autoscaler (VPA) adjusts the CPU and memory requests and limits for pods:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 1
          memory: 1Gi
        controlledResources: ['cpu', 'memory']

This VPA:

  • Targets the nginx deployment
  • Automatically updates the pods' resource requests
  • Sets minimum and maximum resource boundaries

VPA update modes:

  • Auto: Currently equivalent to Recreate; intended to apply updates in place once Kubernetes supports that
  • Recreate: Evicts and recreates pods whenever the recommended resources change significantly
  • Initial: Only applies to new pods
  • Off: Only provides recommendations without making changes

Note: VPA is not part of standard Kubernetes and requires additional setup.
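
A common pattern is to start with updateMode: Off and treat VPA purely as an advisor, reviewing its suggestions before acting on them:

kubectl describe vpa nginx-vpa
# The Status section lists per-container Target, Lower Bound, and Upper Bound recommendations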

Cluster Autoscaling

Cluster Autoscaler adds nodes when pods can't be scheduled due to insufficient resources and removes nodes that stay underutilized. You can inspect its decisions through the status ConfigMap it maintains in the kube-system namespace:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-status
  namespace: kube-system
data:
  status: |
    Cluster-autoscaler status at 2023-05-17 12:00:00.
    Cluster is balanced.
    No scale-up required.
    No scale-down candidates.

Cluster Autoscaler works with many cloud providers:

  • AWS EKS
  • GKE
  • Azure AKS
  • DigitalOcean DOKS

On DigitalOcean, you can enable autoscaling when creating a cluster or by modifying existing node pools. Specify the minimum and maximum node count, and the cluster will automatically scale based on resource demands.
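
For example, with the doctl CLI you can enable autoscaling on an existing node pool; the cluster and pool names below are placeholders:

doctl kubernetes cluster node-pool update <cluster-name> <pool-name> \
  --auto-scale --min-nodes 1 --max-nodes 5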

Analyzing Resource Usage

To optimize resource allocation, you need to understand how your applications use resources:

Viewing Current Resource Allocation

Check resource requests and limits for pods:

kubectl get pods -o custom-columns=NAME:.metadata.name,REQUESTS:.spec.containers[0].resources.requests,LIMITS:.spec.containers[0].resources.limits

Get resource allocation per node:

kubectl describe nodes | grep -A 5 "Allocated resources"

Monitoring Resource Usage

Use kubectl to check current resource usage:

kubectl top nodes
kubectl top pods

For more comprehensive monitoring, deploy tools like:

  • Prometheus + Grafana
  • Datadog
  • New Relic
  • Dynatrace

Advanced Scheduling

Node Affinity

Node Affinity controls which nodes your pods can run on based on node labels:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values:
                  - ssd
  containers:
    - name: nginx
      image: nginx

This pod will only run on nodes with the label disktype=ssd.

Types of node affinity:

  • requiredDuringSchedulingIgnoredDuringExecution: Hard requirement
  • preferredDuringSchedulingIgnoredDuringExecution: Soft preference
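
A soft preference uses the same matchExpressions syntax but adds a weight (1-100) that the scheduler uses to rank candidate nodes. This snippet shows only the affinity stanza:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80
        preference:
          matchExpressions:
            - key: disktype
              operator: In
              values:
                - ssd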

Pod Affinity and Anti-Affinity

Pod Affinity and Anti-Affinity control how pods are scheduled relative to other pods:

apiVersion: v1
kind: Pod
metadata:
  name: web-pod
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - cache
          topologyKey: 'kubernetes.io/hostname'
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values:
                    - web
            topologyKey: 'kubernetes.io/hostname'
  containers:
    - name: web
      image: nginx

This pod:

  • Must run on the same node as pods with the label app=cache
  • Prefers to avoid nodes that already have pods with the label app=web

Taints and Tolerations

Taints mark nodes as unsuitable for certain pods, while tolerations allow pods to run on tainted nodes:

# Taint a node
kubectl taint nodes node1 key=value:NoSchedule

# Pod with toleration
apiVersion: v1
kind: Pod
metadata:
  name: nginx-toleration
spec:
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
  containers:
  - name: nginx
    image: nginx

Taint effects:

  • NoSchedule: Pods won't be scheduled on the node unless they have a matching toleration
  • PreferNoSchedule: Kubernetes tries to avoid scheduling pods without matching tolerations
  • NoExecute: Pods without matching tolerations will be evicted if already running
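
To remove a taint, re-run the same command with a trailing hyphen:

kubectl taint nodes node1 key=value:NoSchedule-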

Pod Topology Spread Constraints

Topology Spread Constraints control how pods are distributed across nodes:

apiVersion: v1
kind: Pod
metadata:
  name: topology-spread-pod
  labels:
    app: web
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: web
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: web
  containers:
    - name: nginx
      image: nginx

This ensures pods with the label app=web are distributed evenly across availability zones and nodes.

Resource Optimization Strategies

Right-sizing Containers

Follow these steps to properly size your containers:

  1. Start with educated guesses based on application requirements

  2. Monitor actual usage:

    kubectl top pods
    
  3. Analyze historical metrics using Prometheus or other monitoring tools

  4. Adjust based on observed usage patterns

Request vs. Limit Ratio

A common practice is to set limits higher than requests to allow for bursting:

  • CPU: Limit = 2 × Request (e.g., request: 250m, limit: 500m)
  • Memory: Limit = 1.5 × Request (e.g., request: 256Mi, limit: 384Mi)

This approach:

  • Guarantees minimum resources
  • Allows bursting when resources are available
  • Prevents a single pod from consuming all resources
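
Expressed as a container spec, those ratios look like this:

resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 500m # 2 x the CPU request
    memory: 384Mi # 1.5 x the memory request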

Using Pod Disruption Budgets

Pod Disruption Budgets (PDBs) ensure high availability during voluntary disruptions:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: 2 # or maxUnavailable: 1
  selector:
    matchLabels:
      app: nginx

This PDB ensures at least 2 pods with the label app=nginx are available during disruptions like node drains.

CPU Throttling Detection

Detect CPU throttling by monitoring the kubelet's cAdvisor throttling metrics:

  • In Prometheus: container_cpu_cfs_throttled_seconds_total
  • If a container is throttled frequently, consider increasing its CPU limit
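
One way to quantify throttling in Prometheus is the fraction of CFS periods in which a container was throttled. This query is a sketch and assumes the standard cAdvisor metrics are scraped:

sum(rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m])) by (pod)
  / sum(rate(container_cpu_cfs_periods_total{container!=""}[5m])) by (pod)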

Memory Optimization

Memory optimization techniques:

  • Set appropriate JVM heap sizes for Java applications (e.g., -Xms256m -Xmx512m)
  • Use memory-efficient containers (Alpine-based images)
  • Monitor for Out-Of-Memory (OOM) events
  • Use initContainers for memory-intensive setup tasks
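
For the JVM case, keep the container's memory limit comfortably above the maximum heap so metaspace, thread stacks, and other off-heap memory have room. The image name and sizes below are illustrative:

containers:
  - name: java-app
    image: registry.example.com/java-app:1.0 # hypothetical image
    env:
      - name: JAVA_TOOL_OPTIONS
        value: '-Xms256m -Xmx512m'
    resources:
      requests:
        memory: 640Mi
      limits:
        memory: 768Mi # max heap (512Mi) plus off-heap headroom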

Implementing Resource Policies

Create comprehensive resource policies:

  1. Define namespace quotas for teams:

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: team-quota
    spec:
      hard:
        requests.cpu: '10'
        requests.memory: 20Gi
        limits.cpu: '20'
        limits.memory: 40Gi
    
  2. Set default limits with LimitRanges:

    apiVersion: v1
    kind: LimitRange
    metadata:
      name: default-limits
    spec:
      limits:
        - default:
            cpu: 500m
            memory: 256Mi
          defaultRequest:
            cpu: 100m
            memory: 64Mi
          type: Container
    
  3. Establish clear naming conventions and tagging strategies

  4. Create documentation and training for teams

Cost Optimization

Resource Efficiency

Implement these strategies to optimize costs:

  1. Utilize node auto-scaling: Automatically adjust cluster size based on demand

  2. Use spot/preemptible instances: For non-critical or batch workloads

  3. Implement cluster hibernation: Scale down during off-hours

  4. Consolidate underutilized pods: Use affinity rules to pack pods efficiently

Resource Monitoring and Reporting

Set up comprehensive monitoring:

  1. Monitor resource usage with Prometheus and Grafana

  2. Create dashboards showing:

    • Cluster utilization
    • Cost per namespace/team
    • Request vs. usage ratio
  3. Set up alerts for:

    • Low resource efficiency
    • Quota approaching limits
    • Cost anomalies

Cost Allocation

Implement cost allocation strategies:

  1. Use namespaces for team or project separation

  2. Label resources with cost centers:

    metadata:
      labels:
        cost-center: team-a
        environment: production
        project: website
    
  3. Use tools like Kubecost or CloudHealth for detailed cost analysis

Advanced Scaling Patterns

Predictive Scaling

For workloads with predictable load patterns, tune how quickly the HPA scales up and how conservatively it scales down using the behavior field:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: predictive-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: website
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
    scaleDown:
      stabilizationWindowSeconds: 300

For more advanced predictive scaling, use external scaling solutions like KEDA (Kubernetes Event-driven Autoscaling).
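
As a sketch of schedule-based pre-scaling, KEDA's cron trigger can raise the replica count ahead of a known traffic window. The deployment name, timezone, and schedule here are illustrative:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: business-hours-scaler
spec:
  scaleTargetRef:
    name: website
  minReplicaCount: 3
  triggers:
    - type: cron
      metadata:
        timezone: Europe/London
        start: 0 8 * * 1-5 # scale up at 08:00, Monday to Friday
        end: 0 18 * * 1-5 # return to the floor at 18:00
        desiredReplicas: '15'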

Event-Driven Scaling

KEDA enables scaling based on event sources like queues, stream processing, and scheduled times:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-scaler
spec:
  scaleTargetRef:
    name: consumer
    kind: Deployment
  pollingInterval: 15
  cooldownPeriod: 30
  minReplicaCount: 1
  maxReplicaCount: 30
  triggers:
    - type: rabbitmq
      metadata:
        queueName: taskQueue
        host: amqp://user:password@rabbitmq:5672/vhost
        queueLength: '50'

This scales the consumer deployment based on RabbitMQ queue length.

Multi-Dimensional Scaling

A single HPA can evaluate several metrics at once; avoid creating multiple HPAs that target the same Deployment, since they will fight over the replica count:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 3
  maxReplicas: 20
  metrics:
    # CPU-based metric
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Custom metric (requires a metrics adapter)
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: 1000

The HPA controller computes a desired replica count for each metric and scales to the highest of them.

Best Practices for Resource Management

  1. Always set resource requests and limits: Prevents resource starvation and noisy neighbor issues

  2. Analyze actual usage: Monitor resource usage and adjust requests and limits accordingly

  3. Implement namespace quotas: Control resource consumption by teams

  4. Set default limits with LimitRanges: Ensure all containers have reasonable defaults

  5. Use appropriate QoS classes: Match QoS to application importance

  6. Implement Pod Disruption Budgets: Ensure high availability during disruptions

  7. Configure HPA with appropriate metrics: Choose metrics that accurately reflect application load

  8. Set up proper alert thresholds: Be notified before resource issues become critical

  9. Document resource decisions: Maintain documentation explaining resource choices

  10. Regularly review and optimize: Resource management is an ongoing process

Using DigitalOcean for Kubernetes Resource Management

DigitalOcean Kubernetes provides a cost-effective platform with built-in autoscaling capabilities:

  • Node auto-scaling based on pod resource requests
  • Simple, transparent pricing model
  • Prometheus-compatible metrics API
  • Integration of DOKS metrics with Datadog, Prometheus, and other monitoring tools

In the next section, we'll explore monitoring and logging in Kubernetes, which are essential for maintaining visibility into your applications and infrastructure.
