Custom Chaos Toolkit extension to simulate AZ failure on AWS resources
Chaos Toolkit AZ Failure Extension for AWS
This project is a collection of actions, gathered as an extension to the Chaos Toolkit to test the resiliency of your applications hosted on AWS.
Install
This package requires Python 3.5+
To be used from your experiment, this package must be installed in the Python environment where chaostoolkit already lives.
Install via pip
$ pip install -U aws-az-failure-chaostoolkit
Usage
To use the probes and actions from this package, add the following to your experiment file (replace Key1 and Value1 with the key-value pair you tagged your resources with):
This action removes subnets belonging to the target AZ in all tagged ASGs and suspends the AZRebalance process if it's running, or sets the min, max and desired capacity of the ASG to 0 if it's only configured for one AZ:
- type: action
  name: Simulate AZ Failure for ASG
  provider:
    type: python
    module: azchaosaws.asg.actions
    func: fail_az
    arguments:
      az: "ap-southeast-1a"
      tags:
        - Key: "Key1"
          Value: "Value1"
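The decision described for this action can be sketched in plain Python. This is an illustration only, not the extension's code: `plan_asg_failure` and the dictionary keys `subnets` and `suspended_processes` are hypothetical names, and no AWS calls are made.

```python
def plan_asg_failure(asg, subnets_in_target_az):
    """Return the changes a simulated AZ failure would make to one ASG.

    asg: dict with the hypothetical keys "subnets" (list of subnet ids)
         and "suspended_processes" (list of process names).
    subnets_in_target_az: set of subnet ids located in the failing AZ.
    """
    remaining = [s for s in asg["subnets"] if s not in subnets_in_target_az]
    if len(remaining) == len(asg["subnets"]):
        # The ASG has no subnet in the target AZ, so nothing changes.
        return {"change": "none"}
    if not remaining:
        # Every subnet is in the target AZ: scale the group to zero instead.
        return {"change": "scale_to_zero", "min": 0, "max": 0, "desired": 0}
    plan = {"change": "remove_subnets", "subnets": remaining}
    # Suspend AZRebalance only if it is not already suspended.
    plan["suspend_az_rebalance"] = "AZRebalance" not in asg["suspended_processes"]
    return plan
```

A multi-AZ group keeps its remaining subnets, while a single-AZ group is scaled to zero, which mirrors the two branches in the description above.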
This action with network failure will affect tagged/filtered subnets in the target AZ by replacing the current NACL association with a newly created blackhole NACL:
- type: action
  name: Simulate AZ Failure for EC2
  provider:
    type: python
    module: azchaosaws.ec2.actions
    func: fail_az
    arguments:
      az: "ap-southeast-1a"
      failure_type: "network"
      filters:
        - Name: tag:TagKey1
          Values:
            - "TagValue1"
This action with instance failure will affect tagged/filtered instances in the target AZ that are in pending/running state by stopping/terminating normal/spot instances:
- type: action
  name: Simulate AZ Failure for EC2
  provider:
    type: python
    module: azchaosaws.ec2.actions
    func: fail_az
    arguments:
      az: "ap-southeast-1a"
      failure_type: "instance"
      filters:
        - Name: tag:TagKey1
          Values:
            - "TagValue1"
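The `filters` argument has the same Name/Values shape as filters in the EC2 Describe APIs. Assuming it follows the same semantics (values within one filter are alternatives, and every filter must match), the selection can be illustrated locally. `matches_filters` is a hypothetical helper that handles only `tag:` filters; it is not part of the extension.

```python
def matches_filters(resource_tags, filters):
    """Return True if a resource's tags satisfy every tag filter.

    resource_tags: dict of tag key -> tag value for one resource.
    filters: list of {"Name": "tag:<key>", "Values": [...]} dicts.
    """
    for f in filters:
        name = f["Name"]
        if not name.startswith("tag:"):
            # Only tag filters are handled in this sketch.
            raise ValueError("unsupported filter: {}".format(name))
        key = name[len("tag:"):]
        # A missing tag yields None, which never appears in Values.
        if resource_tags.get(key) not in f["Values"]:
            return False
    return True
```

With the filter from the examples above, a subnet or instance tagged `TagKey1=TagValue1` is selected, and one without that tag is left alone.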
This action removes subnets in the target AZ from tagged application load balancers:
- type: action
  name: Simulate AZ Failure for ALB
  provider:
    type: python
    module: azchaosaws.elbv2.actions
    func: fail_az
    arguments:
      az: "ap-southeast-1a"
      tags:
        - Key: "Key1"
          Value: "Value1"
This action detaches classic load balancers from subnets belonging to the target AZ if they are in a non-default VPC, and disables the target AZ on classic load balancers that are in a default VPC:
- type: action
  name: Simulate AZ Failure for CLB
  provider:
    type: python
    module: azchaosaws.elb.actions
    func: fail_az
    arguments:
      az: "ap-southeast-1a"
      tags:
        - Key: "Key1"
          Value: "Value1"
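The two branches for classic load balancers correspond to two different Elastic Load Balancing API operations. The sketch below only chooses between them and builds the request parameters. `plan_clb_failure` is a hypothetical helper, and whether the extension issues exactly these calls is an assumption; the operation names are those of the boto3 `elb` client.

```python
def plan_clb_failure(lb_name, in_default_vpc, az, subnets_in_az):
    """Pick the Classic Load Balancer operation for a simulated AZ failure.

    Returns (operation_name, kwargs); nothing is sent to AWS.
    """
    if in_default_vpc:
        # Default VPC: the load balancer is managed per Availability Zone.
        return (
            "disable_availability_zones_for_load_balancer",
            {"LoadBalancerName": lb_name, "AvailabilityZones": [az]},
        )
    # Non-default VPC: the load balancer is attached to specific subnets.
    return (
        "detach_load_balancer_from_subnets",
        {"LoadBalancerName": lb_name, "Subnets": sorted(subnets_in_az)},
    )
```

With boto3 installed and credentials configured, the returned pair could be applied as `getattr(boto3.client("elb"), operation)(**kwargs)`.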
This action forces RDS to reboot and fail over to another AZ:
- type: action
  name: Simulate AZ Failure for RDS
  provider:
    type: python
    module: azchaosaws.rds.actions
    func: fail_az
    arguments:
      az: "ap-southeast-1a"
      tags:
        - Key: "Key1"
          Value: "Value1"
This action forces ElastiCache (cluster mode disabled) to fail over primary nodes if they exist in the target AZ:
- type: action
  name: Simulate AZ Failure for ElastiCache (cluster mode disabled)
  provider:
    type: python
    module: azchaosaws.elasticache.actions
    func: fail_az
    arguments:
      az: "ap-southeast-1a"
      tags:
        - Key: "Key1"
          Value: "Value1"
This action forces ElastiCache (cluster mode enabled) to fail over the shards provided as cache cluster IDs (sequentially, if multiple shards belong to the same cluster). Replace ReplicationGroup1, CacheClusterId1 and CacheClusterId2 as needed:
- type: action
  name: Simulate AZ Failure for ElastiCache (cluster mode enabled)
  provider:
    type: python
    module: azchaosaws.elasticache.actions
    func: fail_az
    arguments:
      az: "ap-southeast-1a"
      tags:
        - Key: "Key1"
          Value: "Value1"
      replication_groups:
        - replication_group_id: ReplicationGroup1
          cache_cluster_ids:
            - CacheClusterId1
            - CacheClusterId2
This action removes subnets belonging to the target AZ in all nodegroup ASGs that are part of the tagged EKS clusters and suspends the AZRebalance process if it's running. Network failure affects the nodegroup subnets in the target AZ by replacing their existing NACL associations with a newly created blackhole NACL:
- type: action
  name: Simulate AZ Failure for EKS Clusters
  provider:
    type: python
    module: azchaosaws.eks.actions
    func: fail_az
    arguments:
      az: "ap-southeast-1a"
      failure_type: "network"
      tags:
        - Key1: "Value1"
This action removes subnets belonging to the target AZ in all nodegroup ASGs that are part of the tagged EKS clusters and suspends the AZRebalance process if it's running. Instance failure affects nodegroup instances in the target AZ that are in the pending or running state by stopping normal/spot instances:
- type: action
  name: Simulate AZ Failure for EKS Clusters
  provider:
    type: python
    module: azchaosaws.eks.actions
    func: fail_az
    arguments:
      az: "ap-southeast-1a"
      failure_type: "instance"
      tags:
        - Key1: "Value1"
This action reboots the specified brokers if they are tagged, or all tagged brokers if broker_ids is not specified:
- type: action
  name: Simulate AZ Failure for Amazon MQ (ActiveMQ)
  provider:
    type: python
    module: azchaosaws.mq.actions
    func: fail_az
    arguments:
      az: "ap-southeast-1a"
      tags:
        - "Key1": "Value1"
      broker_ids:
        - BrokerId1
        - BrokerId2
To roll back the changes made by the fail_az action, you can use recover_az in your experiment template. The recover_az function reads the generated state file and rolls back the changes if the service is supported.
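As a rough sketch, a fail_az action and its rollback can be paired in one experiment by building the experiment as a Python dictionary and writing it out as JSON, which Chaos Toolkit also accepts. The helper `az_action`, the title and description strings, the assumption that recover_az lives in the same `azchaosaws.<service>.actions` module as fail_az, and the choice to pass it no arguments are all illustrative and not confirmed by this README.

```python
import json


def az_action(name, service, func, arguments=None):
    """Build one Chaos Toolkit action that calls into this extension."""
    provider = {
        "type": "python",
        # Assumption: recover_az is exposed by the same module as fail_az.
        "module": "azchaosaws.{}.actions".format(service),
        "func": func,
    }
    if arguments:
        provider["arguments"] = arguments
    return {"type": "action", "name": name, "provider": provider}


experiment = {
    "title": "Simulate AZ failure for ASG",
    "description": "Remove the target AZ from tagged ASGs, then recover.",
    "method": [
        az_action(
            "Simulate AZ Failure for ASG",
            "asg",
            "fail_az",
            {"az": "ap-southeast-1a", "tags": [{"Key": "Key1", "Value": "Value1"}]},
        )
    ],
    # Rollbacks run after the method; recover_az reads the state file.
    "rollbacks": [az_action("Recover AZ for ASG", "asg", "recover_az")],
}

experiment_json = json.dumps(experiment, indent=2)
```

Writing `experiment_json` to a file such as `experiment.json` gives a document that can be passed to `chaos run`. Check the output of `chaos discover` for the arguments recover_az actually accepts.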
Please explore the code to see existing probes, actions and supported capabilities.
Alternatively, you can run chaos discover aws-az-failure-chaostoolkit to view the list of supported actions and probes, along with their required and optional arguments for each service, in the generated discovery.json file.
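The discovery file can also be summarised with a few lines of Python. The sketch below assumes the file has a top-level `activities` list whose entries carry `type`, `name`, `mod` and `arguments` keys, and that an argument without a `default` entry is required. That layout is an assumption about the Chaos Toolkit discovery format, so inspect your own discovery.json before relying on it. `summarize_activities` is a hypothetical helper.

```python
def summarize_activities(discovery):
    """Map "<module>.<function>" to its required and optional arguments."""
    summary = {}
    for activity in discovery.get("activities", []):
        required, optional = [], []
        for arg in activity.get("arguments", []):
            # Assumption: an argument without a "default" entry is required.
            (optional if "default" in arg else required).append(arg["name"])
        key = "{}.{}".format(activity["mod"], activity["name"])
        summary[key] = {
            "type": activity["type"],
            "required": required,
            "optional": optional,
        }
    return summary
```

To use it, load the file with `json.load(open("discovery.json"))` and pass the result to `summarize_activities`.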
Also note that, by default, the dry_run argument of these functions is set to True. Set it to False if you want the actions to make changes to your resources.
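Because dry_run defaults to True, one cautious workflow is to keep a single experiment definition and set the flag explicitly just before a real run. The sketch below assumes the experiment has been loaded into a Python dictionary shaped like the examples above; `with_dry_run` is a hypothetical helper and only touches activities in the method section whose module belongs to this extension.

```python
import copy


def with_dry_run(experiment, dry_run):
    """Return a copy of the experiment with dry_run set on azchaosaws actions."""
    updated = copy.deepcopy(experiment)  # leave the caller's dict untouched
    for activity in updated.get("method", []):
        provider = activity.get("provider", {})
        if provider.get("module", "").startswith("azchaosaws."):
            provider.setdefault("arguments", {})["dry_run"] = dry_run
    return updated
```

Calling `with_dry_run(experiment, False)` produces the variant that actually changes resources, while the original dictionary keeps the safe default.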
Configuration
Develop
If you wish to develop on this project, make sure to install the development dependencies. But first, create a virtual environment and then install those dependencies.
$ pip install -r requirements-dev.txt -r requirements.txt
Then, point your environment to this directory:
$ pip install -e .
Now you can edit the files and your environment will pick up the changes automatically, even when running the chaos command locally.
Test
To run the tests for the project execute the following:
$ pytest
Security
See CONTRIBUTING for more information.
License
This project is licensed under the Apache-2.0 License.
Hashes for aws-az-failure-chaostoolkit-0.1.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 6e01f4f5ec101d906acddc002f8db908707a5d6ac40eb2d78917f9b224b8a495
MD5 | e7fc0cc5bbdb18cb540b4386622f792a
BLAKE2b-256 | 881de7d01d384c38dadeb3f309c9de20b183e9288f5669752c89e6f536dcaea3
Hashes for aws_az_failure_chaostoolkit-0.1.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 94cce96df196ba6e353d183741cbe575ea24042860011ab71d8745d35d513f50
MD5 | 266ee37b59ac3907dbe5233f37a9326d
BLAKE2b-256 | 7d48d4e0a275c960b25762b0587e75e1e56aa7f21b4436fa584f97e466f76290