Chaos engineering and resilience

Chaos engineering is the practice of injecting controlled failures into a system to validate that it handles disruptions gracefully. Fault injection is the mechanism that makes this possible: it deliberately introduces faults such as network latency, resource unavailability, or sudden load.
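To make the mechanism concrete, here is a minimal, hypothetical Python sketch of in-process fault injection. It is not how Chaos Studio injects faults, because Chaos Studio acts on Azure resources rather than on application code. The names `inject_faults`, `InjectedFault`, and `fetch_with_fallback` are illustrative assumptions. The sketch covers latency and unavailability only, not sudden load.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class InjectedFault(Exception):
    """Hypothetical error raised when the injector simulates an unavailable dependency."""


def inject_faults(
    call: Callable[[], T],
    *,
    latency_s: float = 0.0,
    failure_rate: float = 0.0,
    rng: Optional[random.Random] = None,
) -> T:
    """Run `call` after optionally adding latency or raising a simulated failure."""
    rng = rng or random.Random()
    if latency_s > 0:
        time.sleep(latency_s)  # simulate network latency before the call
    if rng.random() < failure_rate:  # random() is in [0, 1), so 1.0 always fails
        raise InjectedFault("simulated resource unavailability")
    return call()


def fetch_with_fallback(fetch: Callable[[], str], **fault_opts) -> str:
    """Hypothetical code under test: fall back to a cached value if the dependency fails."""
    try:
        return inject_faults(fetch, **fault_opts)
    except InjectedFault:
        return "cached-value"


# Usage: force the failure path and confirm the fallback handles it.
print(fetch_with_fallback(lambda: "live-value", failure_rate=1.0))  # cached-value
```

Passing a seeded `random.Random` as `rng` makes a probabilistic run reproducible, which helps when a defect only appears at a particular fault rate.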

Azure Chaos Studio applies these principles as a managed service. You can run preconfigured Scenarios through a Workspace, or build custom experiments with fine-grained control over faults, targets, and sequencing.

Why resilience testing matters

Distributed cloud applications depend on infrastructure, services, and networks that can fail independently, for example through a database failover, a DNS outage, or an availability zone going offline. If the application wasn't designed to tolerate such a disruption, a failure in one component can cascade into a system-wide incident.

Resilience is a property of the whole system, not individual components. The only way to know whether your application survives a specific failure pattern is to test it under that condition. Chaos engineering provides a structured way to do this in preproduction and production environments.

How Chaos Studio applies chaos engineering

Chaos Studio injects faults against Azure resources in a controlled, time-bounded manner. An experiment defines which faults to run, against which resources, and in what order. Faults can run in parallel or sequentially. Many continuous faults are time-bounded and remove their temporary changes when the experiment ends. For example, a fault removes the network security group (NSG) rules it added, or restarts the resources it stopped. Check the Fault and action library to verify the cleanup behavior of each fault you use.
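The sequencing and cleanup behavior described above can be modeled locally. The following Python sketch is a hypothetical simulation, not the Chaos Studio API or SDK: steps run one after another, branches within a step run in parallel, actions within a branch run in order, and each continuous fault reverts its change in a `finally` block after its duration elapses. The step and branch terms follow Chaos Studio experiment terminology; the class and function names and the simulated `state` dictionary are assumptions for illustration.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ContinuousFault:
    """Hypothetical time-bounded fault: apply a change, hold it, then revert it."""
    name: str
    apply: Callable[[], None]
    revert: Callable[[], None]
    duration_s: float


@dataclass
class Branch:
    actions: List[ContinuousFault]  # actions in one branch run sequentially


@dataclass
class Step:
    branches: List[Branch]  # branches in one step run in parallel


def run_fault(fault: ContinuousFault) -> None:
    fault.apply()
    try:
        time.sleep(fault.duration_s)  # hold the fault for its bounded duration
    finally:
        fault.revert()  # undo the change, even if an exception occurs during the hold


def run_branch(branch: Branch) -> None:
    for fault in branch.actions:
        run_fault(fault)


def run_experiment(steps: List[Step]) -> None:
    for step in steps:  # steps run one after another
        threads = [threading.Thread(target=run_branch, args=(b,)) for b in step.branches]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # the next step starts only after every branch finishes


# Usage: a simulated environment standing in for real Azure resources.
state = {"nsg_rule_added": False, "vm_running": True}
lock = threading.Lock()


def setter(key: str, value: bool) -> Callable[[], None]:
    def _set() -> None:
        with lock:
            state[key] = value
    return _set


add_rule = ContinuousFault("add-nsg-rule", setter("nsg_rule_added", True),
                           setter("nsg_rule_added", False), duration_s=0.05)
stop_vm = ContinuousFault("stop-vm", setter("vm_running", False),
                          setter("vm_running", True), duration_s=0.05)

# One step with two parallel branches, each holding one fault.
run_experiment([Step([Branch([add_rule]), Branch([stop_vm])])])
print(state)  # {'nsg_rule_added': False, 'vm_running': True}
```

The final state matches the starting state only because every simulated fault reverts its change. In a real experiment, confirm this for each fault in the Fault and action library rather than assuming it.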

For a deeper look at experiment structure, see Chaos experiments in Azure Chaos Studio. For the list of available faults, see the Fault and action library.

Last updated on 2026-06-19