Question

I hand you a fresh cluster with a demo nginx deployment. Take me from nothing to a controlled pod-delete experiment. What are the steps, and how do you keep it from turning into an outage?

Accepted Answer

Six steps. First, install Litmus, which gives you the chaos-operator and the CRDs (the Helm chart or the operator manifest both work). Second, install the pod-delete ChaosExperiment from the ChaosHub into the target namespace. Third, set up RBAC: a dedicated ServiceAccount with a Role and RoleBinding scoped to that namespace, granting only what pod-delete needs. Do not reach for the bundled litmus-admin account here. Fourth, create a ChaosEngine pointing at the target: appns=default, applabel=app=nginx, appkind=deployment, plus the service account from step three. Fifth, keep the blast radius small in that ChaosEngine: set PODS_AFFECTED_PERC to 50 at most, or better, target a single pod; use a short TOTAL_CHAOS_DURATION such as 30 seconds; and set FORCE=false so pods are deleted gracefully rather than force-killed. Sixth, watch it: tail the runner and experiment pod logs, watch the ChaosResult verdict, and keep an eye on the app's availability the whole time.

The safety rules that matter: start in staging; target a Deployment so the ReplicaSet actually replaces the deleted pod (pod-delete against a bare pod just leaves you down); kill one replica at a time; have a probe, or at least a live dashboard, so you observe steady state rather than assume it; and know the abort before you start: kubectl delete chaosengine, or set engineState to stop.
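To make steps four and five concrete, here is a minimal sketch that builds the ChaosEngine as a plain dictionary and refuses settings that would widen the blast radius. The engine name nginx-chaos, the service account name pod-delete-sa, the helper names, and the guardrail limits are all assumptions for illustration; the field names follow the litmuschaos.io/v1alpha1 ChaosEngine layout, so check them against the Litmus version you actually install.

```python
import json

# Guardrail limits for a first experiment. These numbers are assumptions,
# not Litmus defaults; tune them to your own risk tolerance.
MAX_DURATION_SECONDS = 60
MAX_PODS_AFFECTED_PERC = 50


def build_pod_delete_engine(name, namespace, app_label, service_account,
                            duration_seconds=30, pods_affected_perc=50,
                            force=False):
    """Return a ChaosEngine manifest (as a dict) for a bounded pod-delete run."""
    if not 0 < duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError("TOTAL_CHAOS_DURATION is outside the guardrail")
    if not 0 <= pods_affected_perc <= MAX_PODS_AFFECTED_PERC:
        raise ValueError("PODS_AFFECTED_PERC is outside the guardrail")

    # Litmus reads experiment tunables as string-valued environment variables.
    env = {
        "TOTAL_CHAOS_DURATION": str(duration_seconds),
        "PODS_AFFECTED_PERC": str(pods_affected_perc),
        "FORCE": "true" if force else "false",
    }
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "engineState": "active",
            "appinfo": {
                "appns": namespace,
                "applabel": app_label,
                "appkind": "deployment",
            },
            "chaosServiceAccount": service_account,
            "experiments": [{
                "name": "pod-delete",
                "spec": {"components": {"env": [
                    {"name": key, "value": value} for key, value in env.items()
                ]}},
            }],
        },
    }


def abort_patch():
    """Merge-patch body that flips a running engine to the stopped state."""
    return {"spec": {"engineState": "stop"}}


if __name__ == "__main__":
    # Hypothetical names: "nginx-chaos" and "pod-delete-sa" are placeholders.
    engine = build_pod_delete_engine(
        "nginx-chaos", "default", "app=nginx", "pod-delete-sa")
    print(json.dumps(engine, indent=2))
```

Because kubectl accepts JSON manifests, the printed output can be piped into kubectl apply -f -. The abort body maps onto kubectl patch chaosengine nginx-chaos -n default --type merge -p '{"spec":{"engineState":"stop"}}', with the same placeholder names.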

Running Your First Pod-Delete Experiment Safely

