Running Your First Chaos Engineering Experiment with Litmus

How to install Litmus on Kubernetes and run a controlled failure experiment from a written hypothesis to a verdict you can act on, without breaking production by accident.

14 items

Chaos Engineering · Beginner

chaos-engineering, litmus, kubernetes, resilience

Write down a hypothesis and a steady-state metric before touching anything

Critical
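
One way to make the hypothesis concrete is to record it as data with a numeric threshold, so the verdict after the run is a comparison rather than an opinion. The sketch below is illustrative only: the `Hypothesis` class, the `http_success_ratio` metric name, the checkout workload, and the 99% threshold are all assumptions, not part of Litmus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A written hypothesis paired with one steady-state metric (hypothetical values)."""

    statement: str
    metric: str
    min_success_ratio: float  # lowest acceptable fraction of successful requests

    def holds(self, successes: int, total: int) -> bool:
        """Return True when the observed success ratio meets the threshold."""
        if total <= 0:
            # No traffic means the metric was never measured, so nothing is proven.
            return False
        return successes / total >= self.min_success_ratio


hypothesis = Hypothesis(
    statement="Deleting one checkout pod keeps the request success ratio at or above 99%",
    metric="http_success_ratio",
    min_success_ratio=0.99,
)

# Example counts are made up; in practice they come from your monitoring system.
baseline_ok = hypothesis.holds(successes=9990, total=10000)      # before chaos
during_chaos_ok = hypothesis.holds(successes=9850, total=10000)  # while chaos runs
print(baseline_ok, during_chaos_ok)
```

Checking the baseline first matters: if the metric does not hold before any fault is injected, the experiment cannot tell you anything about resilience.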

Run the first experiment in staging on a single stateless workload

Critical

Install Litmus in its own namespace with Helm

Critical

Confirm the chaos CRDs and operator are installed and healthy

Critical

Create a ServiceAccount with only the permissions the experiment needs

Critical

Install the pod-delete ChaosExperiment from ChaosHub

Add probes so the experiment knows what 'healthy' means

Critical

Write the ChaosEngine with exact label selectors and a short duration

Critical
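
A minimal sketch of what such a ChaosEngine might contain, built as a plain Python dictionary and printed as JSON. The field names follow the `litmuschaos.io/v1alpha1` ChaosEngine schema as commonly documented for the pod-delete experiment; verify them against the Litmus version you installed. The `staging` namespace, the `app=checkout` label, the `pod-delete-sa` ServiceAccount, and the `build_pod_delete_engine` helper are hypothetical names chosen for illustration.

```python
import json


def build_pod_delete_engine(namespace: str, app_label: str, service_account: str,
                            duration_s: int = 30, interval_s: int = 10) -> dict:
    """Build a pod-delete ChaosEngine manifest as a dict (illustrative helper)."""
    if "=" not in app_label:
        # Insist on an exact key=value selector so the target set is unambiguous.
        raise ValueError("app_label must look like key=value, for example app=checkout")
    if interval_s > duration_s:
        raise ValueError("interval_s must not exceed duration_s")
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {"name": "checkout-pod-delete", "namespace": namespace},
        "spec": {
            "engineState": "active",
            "appinfo": {
                "appns": namespace,
                "applabel": app_label,
                "appkind": "deployment",
            },
            "chaosServiceAccount": service_account,
            "experiments": [{
                "name": "pod-delete",
                "spec": {"components": {"env": [
                    # Env values are strings, as in any Kubernetes container spec.
                    {"name": "TOTAL_CHAOS_DURATION", "value": str(duration_s)},
                    {"name": "CHAOS_INTERVAL", "value": str(interval_s)},
                    {"name": "FORCE", "value": "false"},
                ]}},
            }],
        },
    }


engine = build_pod_delete_engine("staging", "app=checkout", "pod-delete-sa")
print(json.dumps(engine, indent=2))
```

Because `kubectl apply -f` accepts JSON as well as YAML, the printed output can be saved to a file and applied as-is. Generating the manifest from a function makes it harder to apply an engine with a loose selector or a long duration by accident.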

Apply the ChaosEngine and tail the runner pod and ChaosResult

Keep your real dashboards and logs open while chaos is running

Critical

Read the ChaosResult, then delete the ChaosEngine
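
As a rough sketch of reading the verdict programmatically, the function below parses the JSON form of a ChaosResult (for example, the output of `kubectl get chaosresult <name> -o json`) and treats only `Pass` as a clean run. The `status.experimentStatus.verdict` path reflects the ChaosResult layout as commonly documented and should be checked against your Litmus version. The `SAMPLE_RESULT` document and the `summarize` helper are hypothetical, not output captured from a real cluster.

```python
import json
from typing import Tuple

# Hypothetical, trimmed ChaosResult used only to exercise the function below.
SAMPLE_RESULT = """
{
  "kind": "ChaosResult",
  "metadata": {"name": "checkout-pod-delete-pod-delete"},
  "status": {
    "experimentStatus": {"phase": "Completed", "verdict": "Pass"}
  }
}
"""


def summarize(result_json: str) -> Tuple[str, bool]:
    """Return (verdict, clean_run) from a ChaosResult serialized as JSON."""
    document = json.loads(result_json)
    status = document.get("status", {}).get("experimentStatus", {})
    # A missing verdict is reported as "Awaited", which never counts as a pass.
    verdict = status.get("verdict", "Awaited")
    return verdict, verdict == "Pass"


verdict, clean_run = summarize(SAMPLE_RESULT)
print(verdict, clean_run)
```

Treating anything other than `Pass` as not clean is a deliberately conservative choice: a run that was stopped or never finished says nothing about whether the hypothesis held.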

Increase blast radius only after a clean run

Write up the run and file tickets for whatever broke

Critical

Schedule a recurring gameday so the system stays tested
