How to Fix Pods Stuck in Terminating Status in Kubernetes
When Kubernetes pods get stuck in the "Terminating" status, it can disrupt your application deployments and consume cluster resources. This issue occurs when pods cannot gracefully shut down within the expected timeframe or when finalizers prevent deletion. This guide covers the causes and solutions for resolving stuck terminating pods.
Prerequisites
You'll need kubectl configured to access your Kubernetes cluster with sufficient permissions to delete pods and modify resources. Basic understanding of Kubernetes concepts like pods, namespaces, and finalizers is helpful.
Understanding the Terminating Status
When a pod is stuck in "Terminating" status, it means Kubernetes has initiated the deletion process but the pod hasn't been fully removed. You can identify these pods using:
kubectl get pods --all-namespaces | grep Terminating
Or for a specific namespace:
kubectl get pods -n my-namespace | grep Terminating
To see more details about a stuck pod:
kubectl describe pod <pod-name> -n <namespace>
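Under the hood, a pod shows as Terminating once its metadata.deletionTimestamp has been set. One quick way to confirm this (pod and namespace names are placeholders):
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.metadata.deletionTimestamp}{"\n"}'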
Method 1: Force Delete the Pod
The quickest solution is often to force delete the pod, bypassing the graceful termination period:
kubectl delete pod <pod-name> -n <namespace> --force --grace-period=0
This command immediately removes the pod object from the API server without waiting for the kubelet to confirm termination. Be aware that the containers may keep running on the node for a while afterwards; this matters especially for StatefulSets, where two pods with the same identity could briefly exist at once.
Force delete multiple pods:
kubectl delete pods <pod1> <pod2> <pod3> -n <namespace> --force --grace-period=0
Force delete all terminating pods in a namespace:
kubectl get pods -n <namespace> | grep Terminating | awk '{print $1}' | xargs kubectl delete pod -n <namespace> --force --grace-period=0
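After a force delete, it is worth confirming the object is really gone; --ignore-not-found keeps the command quiet if it is:
kubectl get pod <pod-name> -n <namespace> --ignore-not-found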
Method 2: Remove Finalizers
Finalizers can block pod deletion until the controller that set them finishes its cleanup, so only remove them when you are confident that cleanup will never complete. Check if the pod has finalizers:
kubectl get pod <pod-name> -n <namespace> -o yaml | grep finalizers -A 5
If finalizers are present, remove them by patching the pod:
kubectl patch pod <pod-name> -n <namespace> -p '{"metadata":{"finalizers":null}}'
An equivalent merge patch that explicitly sets an empty list:
kubectl patch pod <pod-name> -n <namespace> --type='merge' -p='{"metadata":{"finalizers":[]}}'
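If you only want to drop a single finalizer rather than clearing the whole list, a JSON patch by index is one option (this removes the first entry; adjust the index to match the finalizer you saw in the YAML output):
kubectl patch pod <pod-name> -n <namespace> --type='json' -p='[{"op":"remove","path":"/metadata/finalizers/0"}]'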
Method 3: Check and Restart Node Components
Sometimes the issue is with the node's kubelet. Check node status:
kubectl get nodes
kubectl describe node <node-name>
If the node is having issues, you might need to restart the kubelet service (this varies by cluster setup):
# For systemd-based systems (SSH to the node)
sudo systemctl restart kubelet
# Check kubelet logs
sudo journalctl -u kubelet -f
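If the kubelet looks healthy, it can also help to check whether the container runtime still holds the pod's sandbox; this assumes a CRI runtime with crictl installed on the node:
# List pod sandboxes known to the container runtime
sudo crictl pods | grep <pod-name>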
Method 4: Investigate Pod Events and Logs
Before force deleting, investigate what's causing the termination delay:
# Check pod events
kubectl describe pod <pod-name> -n <namespace>
# Check pod logs
kubectl logs <pod-name> -n <namespace>
# Check previous container logs if pod restarted
kubectl logs <pod-name> -n <namespace> --previous
Look for error messages that might indicate why the pod cannot terminate gracefully.
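Filtering cluster events down to the affected pod often makes the relevant message easier to spot:
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>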
Method 5: Handle Persistent Volume Issues
Pods with persistent volumes might get stuck if there are storage issues:
# Check persistent volume claims
kubectl get pvc -n <namespace>
# Check persistent volumes
kubectl get pv
# Describe problematic PVC
kubectl describe pvc <pvc-name> -n <namespace>
If storage is the issue, you might need to:
# Force delete the PVC (be careful: with a Delete reclaim policy this removes the underlying data)
kubectl delete pvc <pvc-name> -n <namespace> --force --grace-period=0
# Or patch to remove finalizers
kubectl patch pvc <pvc-name> -n <namespace> -p '{"metadata":{"finalizers":null}}'
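If a CSI driver is failing to detach a volume, the problem is sometimes visible at the cluster level as a lingering VolumeAttachment object:
kubectl get volumeattachments
kubectl describe volumeattachment <volumeattachment-name>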
Automated Script for Bulk Operations
Create a script to handle multiple stuck pods automatically:
#!/bin/bash
# fix-terminating-pods.sh
NAMESPACE=${1:-default}
echo "Checking for terminating pods in namespace: $NAMESPACE"
# Get all terminating pods
TERMINATING_PODS=$(kubectl get pods -n "$NAMESPACE" | grep Terminating | awk '{print $1}')
if [ -z "$TERMINATING_PODS" ]; then
  echo "No terminating pods found in namespace $NAMESPACE"
  exit 0
fi
echo "Found terminating pods:"
echo "$TERMINATING_PODS"
echo "Attempting to force delete pods..."
for pod in $TERMINATING_PODS; do
  echo "Force deleting pod: $pod"
  kubectl delete pod "$pod" -n "$NAMESPACE" --force --grace-period=0
  # Wait a moment and check whether the pod is gone
  sleep 2
  if kubectl get pod "$pod" -n "$NAMESPACE" >/dev/null 2>&1; then
    echo "Pod $pod still exists, trying to remove finalizers..."
    kubectl patch pod "$pod" -n "$NAMESPACE" -p '{"metadata":{"finalizers":null}}'
  else
    echo "Pod $pod successfully deleted"
  fi
done
echo "Cleanup complete!"
Make it executable and run:
chmod +x fix-terminating-pods.sh
./fix-terminating-pods.sh my-namespace
Prevention Strategies
Set appropriate grace periods in your deployment manifests:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      terminationGracePeriodSeconds: 30  # Adjust based on your app's needs
      containers:
        - name: my-app
          image: my-app:latest
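To double-check the grace period a running pod actually ended up with (pod and namespace names are placeholders):
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.terminationGracePeriodSeconds}{"\n"}'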
Implement proper signal handling in your applications:
// Example for a Node.js app
process.on('SIGTERM', () => {
  console.log('Received SIGTERM, shutting down gracefully');
  server.close(() => {
    process.exit(0);
  });
});
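A simple way to verify the handler fires in-cluster is to follow the pod's logs while triggering a normal, graceful delete (placeholders as before):
# In one terminal, follow the pod's logs
kubectl logs <pod-name> -n <namespace> -f
# In another terminal, trigger a graceful delete and watch for the shutdown message
kubectl delete pod <pod-name> -n <namespace>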
Add readiness and liveness probes to ensure proper health checks:
spec:
  containers:
    - name: my-app
      livenessProbe:
        httpGet:
          path: /health
          port: 8080
        periodSeconds: 10
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
Monitoring and Alerting
Set up monitoring to detect stuck pods early:
# Check for terminating pods across all namespaces
# Note: "Terminating" is not an actual pod phase (pods show it once metadata.deletionTimestamp is set),
# so field selectors on status.phase will not match; grep the default output instead
kubectl get pods --all-namespaces | grep Terminating
# Create a monitoring script
cat << 'EOF' > monitor-stuck-pods.sh
#!/bin/bash
STUCK_PODS=$(kubectl get pods --all-namespaces | grep Terminating | wc -l)
if [ "$STUCK_PODS" -gt 0 ]; then
  echo "WARNING: $STUCK_PODS pods stuck in Terminating status"
  kubectl get pods --all-namespaces | grep Terminating
fi
EOF
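Make the script executable, and if you want the check to run on a schedule, a cron entry is one option (the path is just an example):
chmod +x monitor-stuck-pods.sh
# Example crontab entry: run the check every 5 minutes
*/5 * * * * /opt/scripts/monitor-stuck-pods.sh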
Advanced Troubleshooting
Check for resource quotas or limits:
kubectl describe resourcequota -n <namespace>
kubectl describe limitrange -n <namespace>
Investigate cluster-level issues:
# Check cluster events
kubectl get events --sort-by=.metadata.creationTimestamp
# Check cluster resource usage
kubectl top nodes
kubectl top pods --all-namespaces
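Note that kubectl top only works when a metrics pipeline is installed; if those commands fail, check for it (this assumes the common metrics-server deployment in kube-system):
kubectl get deployment metrics-server -n kube-system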
Examine etcd if you have access:
# This requires cluster admin access
kubectl get pods -n kube-system | grep etcd
kubectl logs etcd-<node-name> -n kube-system
Emergency Cluster Recovery
If many pods are stuck and affecting cluster stability:
# Drain a problematic node
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --force
# Restart the node (method depends on your infrastructure)
# Then uncordon the node
kubectl uncordon <node-name>
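To list everything scheduled on a given node, before or after the drain (node name is a placeholder):
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name>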
Working with Different Pod Controllers
Pods managed by different controllers may need controller-level follow-up once the stuck pods are removed:
For Deployment pods:
kubectl rollout restart deployment <deployment-name> -n <namespace>
For StatefulSet pods:
kubectl delete sts <statefulset-name> -n <namespace> --cascade=orphan
# Then recreate the StatefulSet
kubectl apply -f statefulset.yaml
For DaemonSet pods:
kubectl rollout restart daemonset <daemonset-name> -n <namespace>
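After any of these steps, you can confirm the controller has converged; for a Deployment, for example:
kubectl rollout status deployment <deployment-name> -n <namespace>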
Creating Recovery Procedures
Document your recovery procedures for team reference:
# Create a recovery runbook
cat << 'EOF' > pod-termination-runbook.md
# Pod Termination Recovery Runbook
## Quick Commands
- List terminating pods: `kubectl get pods --all-namespaces | grep Terminating`
- Force delete: `kubectl delete pod <name> -n <ns> --force --grace-period=0`
- Remove finalizers: `kubectl patch pod <name> -n <ns> -p '{"metadata":{"finalizers":null}}'`
## Investigation Steps
1. Check pod describe output
2. Review pod logs
3. Check node status
4. Investigate storage issues
5. Review cluster events
## Escalation Criteria
- More than 10 pods stuck for >5 minutes
- Critical system pods affected
- Cluster performance degraded
EOF
Next Steps
Now that you can resolve stuck terminating pods, consider learning about:
- Implementing proper application shutdown procedures
- Setting up cluster monitoring and alerting
- Understanding Kubernetes networking and service mesh
- Configuring resource limits and quotas
- Planning disaster recovery procedures