Litmus Building Blocks: ChaosEngine vs ChaosExperiment
You install Litmus on a cluster and want to kill a pod to see what happens. Walk me through the pieces Litmus gives you, and what is the actual difference between a ChaosExperiment and a ChaosEngine?
Litmus is built around a few custom resources plus an operator that watches them. A ChaosExperiment is the reusable template: it describes one fault (pod-delete, pod-network-latency, node-drain), the litmus-go container image that runs it, the default tunables, and the RBAC permissions that fault needs. You install these from the ChaosHub, and on their own they do nothing. Think of ChaosExperiment as the recipe for what could go wrong.
A ChaosEngine is the order ticket: it binds one or more ChaosExperiments to a specific target through appinfo (appns, applabel, appkind), names the chaosServiceAccount to run as, lets you override env, and attaches probes. Applying a ChaosEngine is what actually triggers a run. The chaos-operator reconciles that ChaosEngine, spins up a chaos-runner pod, and the runner launches the experiment job that injects the fault.
The third resource is ChaosResult, which holds the verdict: Pass, Fail, or Awaited, plus probeSuccessPercentage and the failStep. That is your source of truth for whether the hypothesis held, not just whether the pod came back.
So the short version: ChaosExperiment is the recipe, ChaosEngine is the order, ChaosResult is the review.
This question quickly tells you whether someone has actually run Litmus or only read about chaos engineering in the abstract. The ChaosExperiment versus ChaosEngine split is the distinction beginners most often muddle. Listen for the operator and runner flow, and for the candidate naming ChaosResult as the place you read the outcome. If they say 'the pod rescheduled so it passed' without mentioning the verdict resource, they have done the tutorial once and stopped thinking.
A minimal ChaosEngine that runs the pod-delete experiment
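A sketch of what such an engine could look like, written to a file from the shell so it can be applied with kubectl. The engine name nginx-chaos, the default namespace, the app=nginx label, the pod-delete-sa service account, and the env values are all assumptions chosen to fit a demo nginx deployment; adjust them to your cluster.

```shell
# Write a minimal ChaosEngine manifest to chaosengine.yaml.
# All names and values below are illustrative assumptions.
cat > chaosengine.yaml <<'EOF'
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos              # assumed engine name
  namespace: default             # assumed namespace
spec:
  engineState: active            # "stop" halts a run in progress
  appinfo:
    appns: default               # namespace of the target app
    applabel: app=nginx          # label selecting the target pods
    appkind: deployment
  chaosServiceAccount: pod-delete-sa   # assumed; must exist with the RBAC the fault needs
  experiments:
    - name: pod-delete           # must match an installed ChaosExperiment
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION   # seconds of chaos overall
              value: "30"
            - name: CHAOS_INTERVAL         # seconds between pod deletions
              value: "10"
            - name: FORCE                  # "false" asks for graceful deletion
              value: "false"
EOF
```

Running kubectl apply -f chaosengine.yaml is what starts the run, provided the pod-delete ChaosExperiment and the service account already exist in the same namespace.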
Install the experiment definition, then read the verdict
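One way to wrap these two steps, sketched as small shell functions around kubectl. The function names are hypothetical helpers, and the engine name, experiment name, and namespace are assumptions matching a demo nginx setup. The sketch relies on Litmus naming the ChaosResult as the engine name, a hyphen, then the experiment name.

```shell
# Assumed names; change them to match your ChaosEngine and namespace.
ENGINE=nginx-chaos
EXPERIMENT=pod-delete
NS=default

# Install a ChaosExperiment definition from a manifest file
# that you downloaded from the ChaosHub (path passed as $1).
install_experiment() {
  kubectl apply -n "$NS" -f "$1"
}

# Print the verdict (Pass, Fail, or Awaited) from the ChaosResult,
# which is named <engine-name>-<experiment-name>.
read_verdict() {
  kubectl get chaosresult "${ENGINE}-${EXPERIMENT}" -n "$NS" \
    -o jsonpath='{.status.experimentStatus.verdict}'
}
```

A typical sequence would be install_experiment with the path to your downloaded manifest, then applying the ChaosEngine, then read_verdict once the run has had time to finish. For probe details and the fail step, kubectl describe chaosresult on the same resource name shows the full status.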
- Thinking that installing Litmus alone runs chaos; you still need the experiment definition, a service account with the right RBAC, and a ChaosEngine to trigger anything
- Using ChaosEngine and ChaosExperiment as if they were the same object
- Treating a rescheduled pod as a pass instead of reading the ChaosResult verdict
- Where does the ChaosExperiment definition actually come from, and what is the ChaosHub?
- What is the chaos-runner pod doing, and how is it different from the chaos-operator?
- If the deleted pod comes back but ChaosResult never reaches Pass, what would you go look at first?