How to Fix Pods Stuck in Terminating Status in Kubernetes
When Kubernetes pods get stuck in the "Terminating" status, it can disrupt your application deployments and consume cluster resources. This issue occurs when pods cannot gracefully shut down within the expected timeframe or when finalizers prevent deletion. This guide covers the causes and solutions for resolving stuck terminating pods.
Prerequisites
You'll need kubectl configured to access your Kubernetes cluster with sufficient permissions to delete pods and modify resources. Basic understanding of Kubernetes concepts like pods, namespaces, and finalizers is helpful.
Understanding the Terminating Status
When a pod is stuck in "Terminating" status, it means Kubernetes has initiated the deletion process but the pod hasn't been fully removed. You can identify these pods using:
kubectl get pods --all-namespaces | grep Terminating
Or for a specific namespace:
kubectl get pods -n my-namespace | grep Terminating
To see more details about a stuck pod:
kubectl describe pod <pod-name> -n <namespace>
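Under the hood, a pod shows as Terminating once its metadata.deletionTimestamp has been set. One quick way to confirm this (pod and namespace names are placeholders):
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.metadata.deletionTimestamp}{"\n"}'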
Method 1: Force Delete the Pod
The quickest solution is often to force delete the pod, bypassing the graceful termination period:
kubectl delete pod <pod-name> -n <namespace> --force --grace-period=0
This command immediately removes the pod object from the API server without waiting for the kubelet to confirm termination. Be aware that the containers may keep running on the node for a while afterwards; this matters especially for StatefulSets, where two pods with the same identity could briefly exist at once.
Force delete multiple pods:
kubectl delete pods <pod1> <pod2> <pod3> -n <namespace> --force --grace-period=0
Force delete all terminating pods in a namespace:
kubectl get pods -n <namespace> | grep Terminating | awk '{print $1}' | xargs kubectl delete pod -n <namespace> --force --grace-period=0
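After a force delete, it is worth confirming the object is really gone; --ignore-not-found keeps the command quiet if it is:
kubectl get pod <pod-name> -n <namespace> --ignore-not-found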
Method 2: Remove Finalizers
Finalizers can block pod deletion until the controller that set them finishes its cleanup, so only remove them when you are confident that cleanup will never complete. Check if the pod has finalizers:
kubectl get pod <pod-name> -n <namespace> -o yaml | grep finalizers -A 5
If finalizers are present, remove them by patching the pod:
kubectl patch pod <pod-name> -n <namespace> -p '{"metadata":{"finalizers":null}}'
An equivalent merge patch that explicitly sets an empty list:
kubectl patch pod <pod-name> -n <namespace> --type='merge' -p='{"metadata":{"finalizers":[]}}'
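If you only want to drop a single finalizer rather than clearing the whole list, a JSON patch by index is one option (this removes the first entry; adjust the index to match the finalizer you saw in the YAML output):
kubectl patch pod <pod-name> -n <namespace> --type='json' -p='[{"op":"remove","path":"/metadata/finalizers/0"}]'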
Method 3: Check and Restart Node Components
Sometimes the issue is with the node's kubelet. Check node status:
kubectl get nodes
kubectl describe node <node-name>
If the node is having issues, you might need to restart the kubelet service (this varies by cluster setup):
# For systemd-based systems (SSH to the node)
sudo systemctl restart kubelet
# Check kubelet logs
sudo journalctl -u kubelet -f
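If the kubelet looks healthy, it can also help to check whether the container runtime still holds the pod's sandbox; this assumes a CRI runtime with crictl installed on the node:
# List pod sandboxes known to the container runtime
sudo crictl pods | grep <pod-name>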
Method 4: Investigate Pod Events and Logs
Before force deleting, investigate what's causing the termination delay:
# Check pod events
kubectl describe pod <pod-name> -n <namespace>
# Check pod logs
kubectl logs <pod-name> -n <namespace>
# Check previous container logs if pod restarted
kubectl logs <pod-name> -n <namespace> --previous
Look for error messages that might indicate why the pod cannot terminate gracefully.
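Filtering cluster events down to the affected pod often makes the relevant message easier to spot:
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>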
Method 5: Handle Persistent Volume Issues
Pods with persistent volumes might get stuck if there are storage issues:
# Check persistent volume claims
kubectl get pvc -n <namespace>
# Check persistent volumes
kubectl get pv
# Describe problematic PVC
kubectl describe pvc <pvc-name> -n <namespace>
If storage is the issue, you might need to:
# Force delete the PVC (be careful: with a Delete reclaim policy this removes the underlying data)
kubectl delete pvc <pvc-name> -n <namespace> --force --grace-period=0
# Or patch to remove finalizers
kubectl patch pvc <pvc-name> -n <namespace> -p '{"metadata":{"finalizers":null}}'
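If a CSI driver is failing to detach a volume, the problem is sometimes visible at the cluster level as a lingering VolumeAttachment object:
kubectl get volumeattachments
kubectl describe volumeattachment <volumeattachment-name>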
Automated Script for Bulk Operations
Create a script to handle multiple stuck pods automatically:
#!/bin/bash
# fix-terminating-pods.sh
NAMESPACE=${1:-default}
echo "Checking for terminating pods in namespace: $NAMESPACE"
# Get all terminating pods
TERMINATING_PODS=$(kubectl get pods -n "$NAMESPACE" | grep Terminating | awk '{print $1}')
if [ -z "$TERMINATING_PODS" ]; then
  echo "No terminating pods found in namespace $NAMESPACE"
  exit 0
fi
echo "Found terminating pods:"
echo "$TERMINATING_PODS"
echo "Attempting to force delete pods..."
for pod in $TERMINATING_PODS; do
  echo "Force deleting pod: $pod"
  kubectl delete pod "$pod" -n "$NAMESPACE" --force --grace-period=0
  # Wait a moment and check whether the pod is gone
  sleep 2
  if kubectl get pod "$pod" -n "$NAMESPACE" >/dev/null 2>&1; then
    echo "Pod $pod still exists, trying to remove finalizers..."
    kubectl patch pod "$pod" -n "$NAMESPACE" -p '{"metadata":{"finalizers":null}}'
  else
    echo "Pod $pod successfully deleted"
  fi
done
echo "Cleanup complete!"
Make it executable and run:
chmod +x fix-terminating-pods.sh
./fix-terminating-pods.sh my-namespace
Prevention Strategies
Set appropriate grace periods in your deployment manifests:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      terminationGracePeriodSeconds: 30  # Adjust based on your app's needs
      containers:
        - name: my-app
          image: my-app:latest
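To double-check the grace period a running pod actually ended up with (pod and namespace names are placeholders):
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.terminationGracePeriodSeconds}{"\n"}'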
Implement proper signal handling in your applications:
// Example for a Node.js app
process.on('SIGTERM', () => {
  console.log('Received SIGTERM, shutting down gracefully');
  server.close(() => {
    process.exit(0);
  });
});
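A simple way to verify the handler fires in-cluster is to follow the pod's logs while triggering a normal, graceful delete (placeholders as before):
# In one terminal, follow the pod's logs
kubectl logs <pod-name> -n <namespace> -f
# In another terminal, trigger a graceful delete and watch for the shutdown message
kubectl delete pod <pod-name> -n <namespace>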
Add readiness and liveness probes to ensure proper health checks:
spec:
  containers:
    - name: my-app
      livenessProbe:
        httpGet:
          path: /health
          port: 8080
        periodSeconds: 10
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
Monitoring and Alerting
Set up monitoring to detect stuck pods early:
# Check for terminating pods across all namespaces
# Note: "Terminating" is not an actual pod phase (pods show it once metadata.deletionTimestamp is set),
# so field selectors on status.phase will not match; grep the default output instead
kubectl get pods --all-namespaces | grep Terminating
# Create a monitoring script
cat << 'EOF' > monitor-stuck-pods.sh
#!/bin/bash
STUCK_PODS=$(kubectl get pods --all-namespaces | grep Terminating | wc -l)
if [ "$STUCK_PODS" -gt 0 ]; then
  echo "WARNING: $STUCK_PODS pods stuck in Terminating status"
  kubectl get pods --all-namespaces | grep Terminating
fi
EOF
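Make the script executable, and if you want the check to run on a schedule, a cron entry is one option (the path is just an example):
chmod +x monitor-stuck-pods.sh
# Example crontab entry: run the check every 5 minutes
*/5 * * * * /opt/scripts/monitor-stuck-pods.sh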
Advanced Troubleshooting
Check for resource quotas or limits:
kubectl describe resourcequota -n <namespace>
kubectl describe limitrange -n <namespace>
Investigate cluster-level issues:
# Check cluster events
kubectl get events --sort-by=.metadata.creationTimestamp
# Check cluster resource usage
kubectl top nodes
kubectl top pods --all-namespaces
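Note that kubectl top only works when a metrics pipeline is installed; if those commands fail, check for it (this assumes the common metrics-server deployment in kube-system):
kubectl get deployment metrics-server -n kube-system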
Examine etcd if you have access:
# This requires cluster admin access
kubectl get pods -n kube-system | grep etcd
kubectl logs etcd-<node-name> -n kube-system
Emergency Cluster Recovery
If many pods are stuck and affecting cluster stability:
# Drain a problematic node
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --force
# Restart the node (method depends on your infrastructure)
# Then uncordon the node
kubectl uncordon <node-name>
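To list everything scheduled on a given node, before or after the drain (node name is a placeholder):
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name>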
Working with Different Pod Controllers
Pods managed by different controllers may need controller-level follow-up once the stuck pods are removed:
For Deployment pods:
kubectl rollout restart deployment <deployment-name> -n <namespace>
For StatefulSet pods:
kubectl delete sts <statefulset-name> -n <namespace> --cascade=orphan
# Then recreate the StatefulSet
kubectl apply -f statefulset.yaml
For DaemonSet pods:
kubectl rollout restart daemonset <daemonset-name> -n <namespace>
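After any of these steps, you can confirm the controller has converged; for a Deployment, for example:
kubectl rollout status deployment <deployment-name> -n <namespace>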
Creating Recovery Procedures
Document your recovery procedures for team reference:
# Create a recovery runbook
cat << 'EOF' > pod-termination-runbook.md
# Pod Termination Recovery Runbook
## Quick Commands
- List terminating pods: `kubectl get pods --all-namespaces | grep Terminating`
- Force delete: `kubectl delete pod <name> -n <ns> --force --grace-period=0`
- Remove finalizers: `kubectl patch pod <name> -n <ns> -p '{"metadata":{"finalizers":null}}'`
## Investigation Steps
1. Check pod describe output
2. Review pod logs
3. Check node status
4. Investigate storage issues
5. Review cluster events
## Escalation Criteria
- More than 10 pods stuck for >5 minutes
- Critical system pods affected
- Cluster performance degraded
EOF
Next Steps
Now that you can resolve stuck terminating pods, consider learning about:
- Implementing proper application shutdown procedures
- Setting up cluster monitoring and alerting
- Understanding Kubernetes networking and service mesh
- Configuring resource limits and quotas
- Planning disaster recovery procedures