Litmus Building Blocks: ChaosEngine vs ChaosExperiment

junior
beginner
Chaos Engineering
Question

You install Litmus on a cluster and want to kill a pod to see what happens. Walk me through the pieces Litmus gives you, and what is the actual difference between a ChaosExperiment and a ChaosEngine?

Answer

Litmus is built around a few custom resources plus an operator that watches them.

A ChaosExperiment is the reusable template: it describes one fault (pod-delete, pod-network-latency, node-drain), the litmus-go container image that runs it, the default tunables, and the RBAC permissions that fault needs. You install these from the ChaosHub, and on their own they do nothing. Think of a ChaosExperiment as the recipe for what could go wrong.

A ChaosEngine is the order ticket: it binds one or more ChaosExperiments to a specific target through appinfo (appns, applabel, appkind), names the chaosServiceAccount to run as, lets you override the experiment's default env values, and attaches probes. Applying a ChaosEngine is what actually triggers a run: the chaos-operator reconciles it, spins up a chaos-runner pod, and the runner launches the experiment job that injects the fault.

The third resource is ChaosResult, which holds the verdict: Pass, Fail, or Awaited (while the run is still in progress), plus probeSuccessPercentage and the failStep. That is your source of truth for whether the hypothesis held, not just whether the pod came back.

So the short version: ChaosExperiment is the recipe, ChaosEngine is the order, ChaosResult is the review.
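
To make the "recipe" side concrete, here is an abridged sketch of what a pod-delete ChaosExperiment roughly looks like. It is written from memory of the ChaosHub manifest, so treat the image tag, the permission list, and the default values as assumptions; the copy you pull from the ChaosHub for your Litmus version is the authoritative one.

```yaml
# Abridged, illustrative ChaosExperiment; the real ChaosHub manifest is longer.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: pod-delete
spec:
  definition:
    scope: Namespaced
    # RBAC rules the fault needs (trimmed here; assumption, check the hub copy)
    permissions:
      - apiGroups: [""]
        resources: ["pods"]
        verbs: ["create", "delete", "get", "list", "patch", "update", "deletecollection"]
    # Experiment image built from the litmus-go repository (tag is an assumption)
    image: "litmuschaos/go-runner:latest"
    command:
      - /bin/bash
    args:
      - -c
      - ./experiments -name pod-delete
    # Default tunables; a ChaosEngine can override these per run
    env:
      - name: TOTAL_CHAOS_DURATION
        value: "15"
      - name: CHAOS_INTERVAL
        value: "5"
      - name: FORCE
        value: "true"
```

Nothing in this object names a target application, which is why applying it alone never injects a fault.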

Why This Matters

This question quickly shows whether someone has actually run Litmus or only read about chaos engineering in the abstract. The ChaosExperiment versus ChaosEngine split is the distinction beginners most often muddle. Listen for the operator and runner flow, and for the candidate naming ChaosResult as the place you read the outcome. If they say 'the pod rescheduled so it passed' without mentioning the verdict resource, they have done the tutorial once and stopped thinking.

Code Examples

A minimal ChaosEngine that runs the pod-delete experiment

yaml
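
A minimal sketch of such a ChaosEngine follows. The engine name `nginx-chaos`, the `app=nginx` label, the `default` namespace, and the `pod-delete-sa` service account are all hypothetical; substitute the values from your own cluster. It assumes the pod-delete ChaosExperiment and the service account already exist in the same namespace.

```yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos            # hypothetical engine name
  namespace: default
spec:
  engineState: active          # set to "stop" to abort a run
  # Target selection: which workload the fault is injected into
  appinfo:
    appns: default
    applabel: app=nginx        # hypothetical label on the target pods
    appkind: deployment
  # Service account carrying the RBAC the experiment needs (hypothetical name)
  chaosServiceAccount: pod-delete-sa
  experiments:
    - name: pod-delete         # must match an installed ChaosExperiment
      spec:
        components:
          # Overrides for the experiment's default tunables
          env:
            - name: TOTAL_CHAOS_DURATION
              value: "30"
            - name: CHAOS_INTERVAL
              value: "10"
            - name: FORCE
              value: "false"
```

Probes are left out to keep the example short; without them the verdict reflects only whether the experiment itself completed its steps.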

Install the experiment definition, then read the verdict

bash
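
The commands below sketch that flow as small shell functions so nothing touches a cluster until you call them. The file names `pod-delete-experiment.yaml` (a local copy of the ChaosHub manifest) and `chaosengine.yaml` are hypothetical, as are the engine name and namespace. The sketch relies on Litmus naming the ChaosResult `<engine-name>-<experiment-name>`.

```bash
# Names below are illustrative assumptions; match them to your own manifests.
NAMESPACE="default"
ENGINE_NAME="nginx-chaos"
EXPERIMENT_NAME="pod-delete"
# Litmus names the ChaosResult <engine-name>-<experiment-name>.
RESULT_NAME="${ENGINE_NAME}-${EXPERIMENT_NAME}"

# Install the ChaosExperiment (the recipe), then apply the ChaosEngine (the order).
# Both files are hypothetical local copies; the service account is assumed to exist.
trigger_chaos() {
  kubectl apply -n "$NAMESPACE" -f pod-delete-experiment.yaml
  kubectl apply -n "$NAMESPACE" -f chaosengine.yaml
}

# Print only the verdict: Awaited while the run is in progress, then Pass or Fail.
read_verdict() {
  kubectl get chaosresult "$RESULT_NAME" -n "$NAMESPACE" \
    -o jsonpath='{.status.experimentStatus.verdict}'
}

# Show the full result, including probe success percentage and failure details.
describe_result() {
  kubectl describe chaosresult "$RESULT_NAME" -n "$NAMESPACE"
}
```

With kubectl pointed at a cluster where Litmus is installed, run `trigger_chaos`, wait for the chaos duration to elapse, then run `read_verdict`. If the verdict is not Pass, `describe_result` is the next place to look.
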
Common Mistakes
  • Thinking that installing Litmus alone runs chaos; you still need the experiment definition, an RBAC service account, and a ChaosEngine to trigger anything
  • Using ChaosEngine and ChaosExperiment as if they are the same object
  • Treating a rescheduled pod as a pass instead of reading the ChaosResult verdict
Follow-up Questions
Interviewers often ask these as follow-up questions
  • Where does the ChaosExperiment definition actually come from, and what is the ChaosHub?
  • What is the chaos-runner pod doing, and how is it different from the chaos-operator?
  • If the deleted pod comes back but ChaosResult never reaches Pass, what would you go look at first?
Tags
chaos-engineering
litmus
kubernetes
resilience