Chaos engineering on AWS

Chaos Imp

Chaos Imp is a framework for creating, running, and automating chaos engineering (CE) experiments on AWS. It provides shorthand syntax for expressing experiment templates, executions, and automations. With just a few lines of YAML and shell script, you can define and model the experiment you want. During deployment, Chaos Imp transforms and expands your YAML and shell scripts into AWS CloudFormation syntax, so you can run chaos experiments faster.

Chaos Imp uses several AWS services under the hood. It glues together the AWS Systems Manager (SSM), AWS Fault Injection Simulator (FIS), Amazon CloudWatch Events, and AWS Lambda APIs into an easy-to-use tool that covers the following parts of the CE process:

  • Defining infrastructure, application, and security failure injection templates.
  • Running CE experiments in a controlled way, using AWS access controls such as IAM roles.
  • Automating experiments to be run continuously.

What benefits does Chaos Imp bring to organizations compared with using SSM, FIS, and Lambda directly?

  • Experiment scripts are decoupled from the YAML, so they are easier to edit and can be reused across multiple experiments.
  • Templates and automations are managed automatically through CloudFormation, which makes them easy to control and clean up.
  • The CLI is minimal: besides config, it has three namespaces, for creating templates, running experiments, and setting up automations. There is no need to glue different services together or untangle IAM permissions yourself.
  • Chaos Imp uses a unified config file syntax. Think of it as AWS SAM for chaos engineering.

Installation

Chaos Imp is a Python package. To install it, run:

pip install chaosimp

Once it is installed, you can use the scripts and classes from the chaosimp package or run the imp CLI. The CLI supports four namespaces:

  • config: get, list, and set operations.
  • templates: list, show, create, update, and delete operations.
  • experiments: get, get-by-id, list, start, and stop operations.
  • automations: list, show, create, update, and delete operations.

For example, to list all of your templates, run:

imp templates list

The CLI is self-documenting, so you can learn about any command by running:

imp <COMMAND_NAME> --help

Creating Experiment Templates

Check out the Chaos Imp example templates, which include resource, network, and state chaos experiments.

Let's create a simple experiment that stresses the CPUs of several EC2 instances.

Experiment Boundaries

You can run experiments on a variety of AWS resources. Chaos Imp automatically translates the resources defined in the YAML experiment template into AWS FIS targets.

For example, to target the subset of EC2 instances tagged with imp: ec2-experiment, define the following target in imp.yml:

Targets:
  - Name: "ec2-instances"
    ResourceType: "aws:ec2:instance"
    ResourceTags:
      - Key: "imp"
        Value: "ec2-experiment"
    SelectionMode: "ALL"
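To make the translation concrete, here is a minimal sketch of how a target like the one above could map onto the Targets map of a CloudFormation AWS::FIS::ExperimentTemplate. It is a hypothetical helper (targets_to_fis is not part of Chaos Imp's API) and assumes the YAML has already been loaded into Python lists and dicts. The main difference it illustrates is that FIS keys targets by name and expects ResourceTags as a flat tag-key to tag-value map rather than a list of Key/Value pairs.

```python
from __future__ import annotations

from typing import Any


def targets_to_fis(targets: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Convert imp.yml-style target entries into a FIS-style Targets map.

    Hypothetical helper, not Chaos Imp's actual implementation.
    """
    fis_targets: dict[str, dict[str, Any]] = {}
    for target in targets:
        fis_target: dict[str, Any] = {
            "ResourceType": target["ResourceType"],
            "SelectionMode": target["SelectionMode"],
        }
        # Flatten [{"Key": k, "Value": v}, ...] into {k: v, ...}.
        tags = target.get("ResourceTags", [])
        if tags:
            fis_target["ResourceTags"] = {tag["Key"]: tag["Value"] for tag in tags}
        # FIS identifies each target by its name.
        fis_targets[target["Name"]] = fis_target
    return fis_targets


if __name__ == "__main__":
    # The target from imp.yml above, as a YAML loader would return it.
    imp_targets = [
        {
            "Name": "ec2-instances",
            "ResourceType": "aws:ec2:instance",
            "ResourceTags": [{"Key": "imp", "Value": "ec2-experiment"}],
            "SelectionMode": "ALL",
        }
    ]
    print(targets_to_fis(imp_targets))
```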

This defines a FIS target that experiment actions can be applied to.

Actions

Now, let's define a custom Chaos Imp action that runs a script using stress-ng to stress the CPUs:

Actions:
  - Name: "stress-cpus"
    Type: "imp:run-script"
    Target: "ec2-instances"
    Parameters:
      Duration: "PT1M"
    Document:
      Path: "stress-cpu.sh"

This defines a Chaos Imp action that is later translated into a FIS action. Besides Chaos Imp's own actions, you can use every FIS action type defined in the official FIS documentation.

Chaos Imp introduces its own namespace and action type: imp:run-script. This action works just like aws:ssm:send-command, except that you reference a local script file instead of a documentArn and documentVersion.
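The relationship between the two action types can be sketched as a rewrite step. This is a hypothetical illustration, not Chaos Imp's code: run_script_to_fis is an invented name, and it assumes the local script has already been registered as an SSM document whose ARN is passed in as document_arn. The documentArn and duration parameters and the Instances target key are those of the FIS aws:ssm:send-command action.

```python
from __future__ import annotations

from typing import Any


def run_script_to_fis(action: dict[str, Any], document_arn: str) -> dict[str, dict[str, Any]]:
    """Rewrite an imp:run-script action as an aws:ssm:send-command FIS action.

    Hypothetical helper; document_arn is assumed to point at an SSM document
    created from the action's local script (Document.Path).
    """
    if action["Type"] != "imp:run-script":
        raise ValueError(f"unsupported action type: {action['Type']}")
    parameters = {"documentArn": document_arn}
    duration = action.get("Parameters", {}).get("Duration")
    if duration is not None:
        # FIS expects an ISO 8601 duration such as PT1M.
        parameters["duration"] = duration
    return {
        action["Name"]: {
            "ActionId": "aws:ssm:send-command",
            "Parameters": parameters,
            # aws:ssm:send-command applies to the "Instances" target key.
            "Targets": {"Instances": action["Target"]},
        }
    }


if __name__ == "__main__":
    stress_cpus = {
        "Name": "stress-cpus",
        "Type": "imp:run-script",
        "Target": "ec2-instances",
        "Parameters": {"Duration": "PT1M"},
        "Document": {"Path": "stress-cpu.sh"},
    }
    # Placeholder ARN for illustration only.
    print(run_script_to_fis(stress_cpus, "arn:aws:ssm:us-east-1:123456789012:document/stress-cpu"))
```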

Next, add the experiment script file, stress-cpu.sh:

#!/bin/bash

sudo yum -y install stress-ng
stress-ng --cpu 0 --cpu-method matrixprod --cpu-load 100 -t 20s

This will install stress-ng and apply 100% load on all CPUs for 20 seconds.
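Because aws:ssm:send-command runs an SSM document, a local script has to end up inside one. The sketch below shows one plausible way to wrap a script in an SSM Command document (schema version 2.2, using the aws:runShellScript plugin). It is an assumption about the general approach, not a description of how Chaos Imp builds its documents, and script_to_ssm_document is a hypothetical name.

```python
from __future__ import annotations

import json
from typing import Any


def script_to_ssm_document(script: str, description: str) -> dict[str, Any]:
    """Wrap a shell script in an SSM Command document (schemaVersion 2.2)."""
    # Each non-blank line becomes one runCommand entry; the SSM Agent
    # writes them to a script file on the instance and executes it.
    commands = [line for line in script.splitlines() if line.strip()]
    return {
        "schemaVersion": "2.2",
        "description": description,
        "mainSteps": [
            {
                "action": "aws:runShellScript",
                "name": "runScript",
                "inputs": {"runCommand": commands},
            }
        ],
    }


if __name__ == "__main__":
    # Contents of stress-cpu.sh from above.
    stress_cpu_sh = (
        "#!/bin/bash\n"
        "\n"
        "sudo yum -y install stress-ng\n"
        "stress-ng --cpu 0 --cpu-method matrixprod --cpu-load 100 -t 20s\n"
    )
    print(json.dumps(script_to_ssm_document(stress_cpu_sh, "Stress all CPUs"), indent=2))
```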

Creating a Template

Before creating a template, you need an IAM role that FIS can assume, with a policy that allows FIS to run the template's actions.

You can pass this role to every template creation call with --role-arn, but it's much more convenient to store it in the local config:

imp config set TemplateRoleArn <ROLE_ARN>
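The role needs a trust policy naming the FIS service principal plus a permissions policy for the actions it runs. The sketch below builds both as Python dicts ready for json.dumps. The trust policy is the standard one for fis.amazonaws.com; the permission list is an assumed minimum for aws:ssm:send-command (and therefore imp:run-script), so check the FIS documentation for the actions you actually use and narrow "Resource": "*" before relying on it.

```python
import json

# Trust policy: lets the FIS service assume the role.
FIS_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "fis.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Permissions policy: assumed minimum for SSM send-command actions.
# Other FIS actions need their own permissions; scope Resource down in practice.
FIS_SSM_PERMISSIONS = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ssm:SendCommand", "ssm:ListCommands", "ssm:CancelCommand"],
            "Resource": "*",
        }
    ],
}

if __name__ == "__main__":
    # Save these as JSON files and pass them to the AWS CLI, e.g.
    #   aws iam create-role --role-name <ROLE_NAME> \
    #       --assume-role-policy-document file://fis-trust.json
    print(json.dumps(FIS_TRUST_POLICY, indent=2))
    print(json.dumps(FIS_SSM_PERMISSIONS, indent=2))
```

The ARN of the resulting role is what goes into TemplateRoleArn.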

We are finally ready to create our first template:

imp templates create --path . cpu-stress
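Since Chaos Imp deploys templates through CloudFormation, the end result of this command is presumably a stack containing an AWS::FIS::ExperimentTemplate resource. The sketch below assembles such a CloudFormation template from already-translated Targets and Actions maps. The helper names, the logical-ID scheme, the description text, and the Name tag are all hypothetical choices for illustration; StopConditions with Source "none" means no CloudWatch alarm can halt the experiment early.

```python
from __future__ import annotations

import json
import re
from typing import Any


def logical_id(name: str) -> str:
    """Turn a template name like 'cpu-stress' into 'CpuStressTemplate'."""
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", name) if p]
    return "".join(p[0].upper() + p[1:] for p in parts) + "Template"


def experiment_template_stack(
    name: str,
    role_arn: str,
    targets: dict[str, Any],
    actions: dict[str, Any],
) -> dict[str, Any]:
    """Build a CloudFormation template with one AWS::FIS::ExperimentTemplate."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            logical_id(name): {
                "Type": "AWS::FIS::ExperimentTemplate",
                "Properties": {
                    "Description": f"Chaos Imp template {name}",
                    "RoleArn": role_arn,
                    "Targets": targets,
                    "Actions": actions,
                    # No CloudWatch alarm guards this experiment.
                    "StopConditions": [{"Source": "none"}],
                    "Tags": {"Name": name},
                },
            }
        },
    }


if __name__ == "__main__":
    stack = experiment_template_stack(
        "cpu-stress",
        "arn:aws:iam::123456789012:role/imp-fis-role",  # placeholder ARN
        {"ec2-instances": {"ResourceType": "aws:ec2:instance", "SelectionMode": "ALL"}},
        {"stress-cpus": {"ActionId": "aws:ssm:send-command"}},
    )
    print(json.dumps(stack, indent=2))
```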

Running Experiments

Before running an experiment on EC2 instances, those instances must have an IAM role (attached through an instance profile) with a policy that allows them to interact with SSM. This is required for all FIS SSM actions as well as for Chaos Imp's custom actions such as imp:run-script.
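A common way to satisfy this requirement is an instance role that EC2 can assume, with the AWS-managed AmazonSSMManagedInstanceCore policy attached. The sketch below defines the trust policy and the managed policy ARN; the AWS CLI sequence in the comments uses hypothetical role and profile names that you would replace with your own.

```python
import json

# Trust policy: lets EC2 instances assume the role (via an instance profile).
EC2_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# AWS-managed policy with the core permissions the SSM Agent needs.
SSM_CORE_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"

if __name__ == "__main__":
    print(json.dumps(EC2_TRUST_POLICY, indent=2))
    # Example AWS CLI sequence (names are hypothetical):
    #   aws iam create-role --role-name imp-instance-role \
    #       --assume-role-policy-document file://ec2-trust.json
    #   aws iam attach-role-policy --role-name imp-instance-role \
    #       --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
    #   aws iam create-instance-profile --instance-profile-name imp-instance-profile
    #   aws iam add-role-to-instance-profile \
    #       --instance-profile-name imp-instance-profile --role-name imp-instance-role
    #   aws ec2 associate-iam-instance-profile --instance-id <INSTANCE_ID> \
    #       --iam-instance-profile Name=imp-instance-profile
```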

Once instances are ready, you can run an experiment based on the template we created:

imp experiments start --template cpu-stress my-cpu-experiment
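Under the hood, starting an experiment comes down to the FIS StartExperiment API, which takes a template ID rather than a name. The sketch below uses the boto3 FIS client's list_experiment_templates and start_experiment calls. It assumes, hypothetically, that templates carry a Name tag matching their Chaos Imp name; that is not confirmed to be how Chaos Imp resolves names. The client is passed in so the logic can be exercised without AWS access.

```python
from __future__ import annotations

import uuid
from typing import Any


def find_template_id(client: Any, name: str) -> str:
    """Return the ID of the FIS template tagged Name=<name> (assumed convention)."""
    kwargs: dict[str, Any] = {}
    while True:
        page = client.list_experiment_templates(**kwargs)
        for template in page.get("experimentTemplates", []):
            if template.get("tags", {}).get("Name") == name:
                return template["id"]
        # Follow pagination until no nextToken is returned.
        token = page.get("nextToken")
        if not token:
            raise LookupError(f"no experiment template tagged Name={name}")
        kwargs["nextToken"] = token


def start_experiment(client: Any, template_name: str, experiment_name: str) -> str:
    """Start a FIS experiment from a named template and return its ID."""
    response = client.start_experiment(
        clientToken=str(uuid.uuid4()),  # idempotency token for the request
        experimentTemplateId=find_template_id(client, template_name),
        tags={"Name": experiment_name},
    )
    return response["experiment"]["id"]


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   fis = boto3.client("fis")
#   print(start_experiment(fis, "cpu-stress", "my-cpu-experiment"))
```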

This creates and runs an experiment in FIS. If you run several experiments under the same name, you can list all of their executions by running:

imp experiments get my-cpu-experiment

To inspect a specific execution of an experiment, run:

imp experiments get-by-id <EXPERIMENT_ID>
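Looking up a single execution maps naturally onto the FIS GetExperiment API, whose response includes state.status. The sketch below polls it with the boto3 FIS client's get_experiment call until the experiment reaches a terminal state. The helper name, polling interval, and timeout are assumptions; the client and sleep function are injectable so the loop can be tested offline.

```python
from __future__ import annotations

import time
from typing import Any, Callable

# FIS states after which an experiment no longer changes.
TERMINAL_STATES = {"completed", "stopped", "failed", "cancelled"}


def wait_for_experiment(
    client: Any,
    experiment_id: str,
    poll_seconds: float = 10.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_experiment until the experiment reaches a terminal state."""
    waited = 0.0
    while True:
        experiment = client.get_experiment(id=experiment_id)["experiment"]
        status = experiment["state"]["status"]
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"{experiment_id} still {status} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   print(wait_for_experiment(boto3.client("fis"), "<EXPERIMENT_ID>"))
```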

Automating Experiments

Experiment automation is a work in progress. Chaos Imp uses a combination of CloudWatch Events rules and Lambda functions to create automations.

Unfortunately, the AWS SDK bundled with the Lambda runtime is out of date and doesn't support FIS yet, so you'll have to build a Docker image that contains an updated AWS SDK.
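If you want to check whether a given runtime's bundled SDK already knows about FIS, boto3's Session.get_available_services() lists the service names the installed botocore can talk to. The diagnostic below is a hypothetical sketch, not part of Chaos Imp: deploying the handler to a stock Python runtime and invoking it would tell you whether the custom image is still needed.

```python
from __future__ import annotations

from typing import Any, Optional


def sdk_supports_fis(session: Optional[Any] = None) -> bool:
    """Report whether the installed boto3/botocore includes the FIS API."""
    if session is None:
        import boto3  # imported lazily so the check can be tested without boto3

        session = boto3.Session()
    return "fis" in session.get_available_services()


def handler(event: dict, context: Any) -> dict:
    # Hypothetical Lambda entry point for a one-off runtime check.
    return {"fisSupported": sdk_supports_fis()}
```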

First, download the Dockerfile and app.py to your machine. Then run the following commands to create and push an image to your private Amazon ECR registry:

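# Create the repository once (skip if it already exists)
aws ecr create-repository --repository-name imp-automation
aws ecr get-login-password --region <REGION> | docker login --username AWS --password-stdin <AWS_ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com
docker build -t imp-automation .
docker tag imp-automation:latest <AWS_ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/imp-automation:latest
docker push <AWS_ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/imp-automation:latest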

This step will become unnecessary once the Lambda runtime ships an AWS SDK that supports FIS.

To create an automation, run:

imp automations create \
    --schedule="rate(30 minutes)" \
    --template="cpu-stress" \
    --image=<AWS_ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/imp-automation:latest \
    cpu-stress-automation

This creates a CloudWatch Events rule that invokes a Lambda function every 30 minutes; the function starts a FIS experiment from the cpu-stress template.
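The --schedule value uses the CloudWatch Events (EventBridge) rate syntax: rate(value unit), where the value is a positive integer and the unit is singular only when the value is 1. The hypothetical helper below, which is not part of Chaos Imp, validates such an expression and converts it to seconds. It does not handle cron(...) expressions.

```python
import re

# rate(<positive integer> <unit>); the unit is singular only for a value of 1.
_RATE = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")
_UNIT_SECONDS = {"minute": 60, "hour": 3600, "day": 86400}


def rate_to_seconds(expression: str) -> int:
    """Return the interval, in seconds, of a rate(...) schedule expression."""
    match = _RATE.match(expression.strip())
    if match is None:
        raise ValueError(f"not a rate expression: {expression!r}")
    value, unit = int(match.group(1)), match.group(2)
    if value <= 0:
        raise ValueError("rate value must be a positive integer")
    singular = unit.rstrip("s")  # "minutes" -> "minute"
    if (value == 1) != (unit == singular):
        raise ValueError("use a singular unit only when the value is 1")
    return value * _UNIT_SECONDS[singular]


if __name__ == "__main__":
    print(rate_to_seconds("rate(30 minutes)"))  # 1800
```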


Download files

Source Distribution

chaosimp-0.1.0.tar.gz (4.7 kB)

Uploaded Source

Built Distribution

chaosimp-0.1.0-py3-none-any.whl (7.8 kB)

Uploaded Python 3

File details

Details for the file chaosimp-0.1.0.tar.gz.

File metadata

  • Download URL: chaosimp-0.1.0.tar.gz
  • Upload date:
  • Size: 4.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.6

File hashes

Hashes for chaosimp-0.1.0.tar.gz
Algorithm Hash digest
SHA256 aad3e4e57d270dc6866b8c266d72fdebe7b509a8e49187768389da99d49a912d
MD5 d26212503cf62c338c39f83a7e0dae06
BLAKE2b-256 53745e048f9093bd971318c35a40d423aef8af856ba823896e11654249ae3ddb

File details

Details for the file chaosimp-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: chaosimp-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 7.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.6

File hashes

Hashes for chaosimp-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2f94267ddfe2ec65fa17f432a3f89e3142d7900fb0982a483a796cbf0af58c4a
MD5 63f938edf39c4af72a9693ff3a0cbe6d
BLAKE2b-256 af9186597cbca9361726cda5ba5f96bf4c34d4329acceb306ee07aefc3349a33
