MCP Server for Chaos Engineering with Chaos Mesh on EKS

Chaos Mesh MCP Server

An MCP server that enables AI agents to perform chaos engineering through Chaos Mesh on EKS clusters.

Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   AI Agent      │───▶│   MCP Server     │───▶│  EKS Cluster    │
│                 │    │                  │    │                 │
│ - Failure       │    │ - OIDC Auth      │    │ - Chaos Mesh    │
│   Scenarios     │    │ - K8s API Calls  │    │ - Workloads     │
│ - Experiment    │    │ - Experiment     │    │ - Monitoring    │
│   Planning      │    │   Management     │    │ - Resource Info │
│ - Result        │    │ - Resource Query │    │                 │
│   Analysis      │    │                  │    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘

Key Features

1. Authentication and Authorization Management

  • OIDC-based EKS cluster authentication
  • RBAC permission validation
  • Token renewal and management

2. Chaos Mesh Experiment Management

  • Experiment creation and execution
  • Experiment status monitoring
  • Experiment termination and cleanup

3. Chaos Engineering Tools

  • Pod failure injection
  • Network failure simulation
  • Storage failure testing
  • Time and stress testing

4. EKS Cluster Resource Management

  • Node information retrieval
  • Namespace listing and details
  • Deployment status monitoring
  • Service discovery
  • Pod status and health checks
  • Cluster summary information

Available MCP Tools

Cluster Management

  • add_remote_cluster - Add EKS cluster to management
  • list_remote_clusters - List all managed clusters
  • install_chaos_mesh - Install Chaos Mesh on cluster

Resource Information

  • get_cluster_nodes - Get detailed node information
  • get_cluster_namespaces - Get namespace information
  • get_cluster_deployments - Get deployment status (all or specific namespace)
  • get_cluster_services - Get service information (all or specific namespace)
  • get_cluster_pods - Get pod information (all or specific namespace)
  • get_cluster_resource_summary - Get cluster resource overview and summary

Chaos Experiments

  • create_pod_chaos_experiment - Create pod failure experiments
  • create_network_chaos_experiment - Create network failure experiments
  • create_stress_chaos_experiment - Create stress testing experiments
  • create_io_chaos_experiment - Create I/O failure experiments
  • create_dns_chaos_experiment - Create DNS failure experiments
  • create_time_chaos_experiment - Create time manipulation experiments

Installation and Setup

  1. Install Chaos Mesh on EKS cluster
  2. Configure OIDC provider
  3. Set up RBAC permissions
  4. Deploy MCP server

Security Considerations

  • Apply principle of least privilege
  • Limit experiment scope
  • Record audit logs
  • Implement safety mechanisms
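As an illustration of "limit experiment scope" and "implement safety mechanisms", the sketch below rejects experiments that target a namespace outside an allow-list or that run longer than a configured ceiling. The names SafetyPolicy and duration_seconds are hypothetical and are not part of this package; the duration format is assumed to be the Go-style strings used elsewhere in this document, such as "30s".

```python
import re
from dataclasses import dataclass

# Units accepted in Go-style duration strings such as "30s" or "1m30s".
_DURATION_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
_DURATION_RE = re.compile(r"(\d+(?:\.\d+)?)(ms|s|m|h)")


def duration_seconds(text: str) -> float:
    """Convert a duration such as '30s' or '1m30s' to seconds."""
    pos, total = 0, 0.0
    for match in _DURATION_RE.finditer(text):
        if match.start() != pos:
            break  # unparsed characters between two components
        total += float(match.group(1)) * _DURATION_UNITS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"unrecognized duration: {text!r}")
    return total


@dataclass(frozen=True)
class SafetyPolicy:
    """Hypothetical guard applied before an experiment is created."""

    allowed_namespaces: frozenset
    max_duration_seconds: float = 300.0

    def check(self, namespace: str, duration: str) -> None:
        """Raise if the experiment falls outside the allowed scope."""
        if namespace not in self.allowed_namespaces:
            raise PermissionError(f"namespace {namespace!r} is not allow-listed")
        if duration_seconds(duration) > self.max_duration_seconds:
            raise ValueError(f"duration {duration!r} exceeds the configured limit")


policy = SafetyPolicy(frozenset({"staging"}), max_duration_seconds=60)
policy.check("staging", "30s")  # passes silently
```

A guard like this would sit in front of the experiment tools, so that a request for kube-system or for a multi-hour outage fails before anything reaches the cluster.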

Usage

Before configuring a cluster, give the IAM role that the server uses access to the cluster with system:masters permission. To do this, add a mapping from the IAM role to a Kubernetes user and groups in the cluster's Kubernetes RBAC configuration. With AWS CDK, see Masters Role and addRoleMapping.

You can register clusters at startup through the CLUSTERS_CONFIG environment variable, or add them manually with the add_remote_cluster tool.
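The CLUSTERS_CONFIG value shown in the configurations in this section is a comma-separated list of name:region pairs. The sketch below shows one way such a string could be parsed and validated before use. It is an illustration only: parse_clusters_config and ClusterRef are hypothetical names, and the server's own parsing may differ.

```python
import os
from typing import NamedTuple


class ClusterRef(NamedTuple):
    """One 'name:region' entry from CLUSTERS_CONFIG."""

    name: str
    region: str


def parse_clusters_config(raw: str) -> list[ClusterRef]:
    """Split 'name:region,name2:region2' into ClusterRef entries."""
    clusters = []
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas and blank entries
        name, sep, region = entry.partition(":")
        name, region = name.strip(), region.strip()
        if not sep or not name or not region:
            raise ValueError(f"expected 'name:region', got {entry!r}")
        clusters.append(ClusterRef(name, region))
    return clusters


# Fall back to a placeholder value when the variable is not set.
raw_config = os.environ.get("CLUSTERS_CONFIG", "demo-cluster:us-east-1")
for cluster in parse_clusters_config(raw_config):
    print(f"{cluster.name} -> {cluster.region}")
```

Validating the string up front gives a clear error for a malformed entry such as a missing region, instead of a failure later when the cluster is first contacted.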

*.json file

{
  "mcpServers": {
    "chaos-mesh": {
      "command": "uvx",
      "args": ["chaos-mesh-mcp-server@latest"],
      "env": {
        "CLUSTERS_CONFIG": "{my-cluster-name}:{my-cluster-region},{my-cluster-2-name}:{my-cluster-2-region}",
        "AWS_ACCESS_KEY": "KEY",
        "AWS_SECRET_ID": "SECRET"
      }
    }
  }
}

Strands Agent SDK

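from mcp import StdioServerParameters, stdio_client
from strands import Agent
from strands.tools.mcp import MCPClient

# Launch the MCP server as a subprocess and talk to it over stdio.
chaos_mesh_mcp_client = MCPClient(
    lambda: stdio_client(
        StdioServerParameters(
            command="uvx",
            args=["chaos-mesh-mcp-server@latest"],
            env={
                "CLUSTERS_CONFIG": "{my-cluster-name}:{my-cluster-region},{my-cluster-2-name}:{my-cluster-2-region}",
                "AWS_ACCESS_KEY": "KEY",
                "AWS_SECRET_ID": "SECRET",
            },
        )
    )
)

# The client must be started before its tools can be listed.
chaos_mesh_mcp_client.start()

# `model` and `system_prompt` are assumed to be defined by the caller.
agent = Agent(
    model=model,
    system_prompt=system_prompt,
    tools=chaos_mesh_mcp_client.list_tools_sync(),
)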

Example Usage

Get Cluster Information

# Get cluster summary
cluster_summary = await get_cluster_resource_summary("my-cluster")

# Get specific resource information
nodes = await get_cluster_nodes("my-cluster")
namespaces = await get_cluster_namespaces("my-cluster")
deployments = await get_cluster_deployments("my-cluster", "default")

Create Chaos Experiments

# Create pod chaos experiment
result = await create_pod_chaos_experiment(
    cluster_name="my-cluster",
    namespace="default",
    target_app="my-app",
    action="pod-kill",
    duration="30s"
)
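For orientation, the sketch below builds the kind of Chaos Mesh PodChaos manifest that a call like the one above could correspond to. It is a hedged illustration, not this server's implementation: build_pod_chaos_manifest is a hypothetical helper, and mapping target_app to an app label selector is an assumption. The apiVersion, kind, action names, mode, and selector fields follow the Chaos Mesh PodChaos resource.

```python
import json

# Actions defined by the Chaos Mesh PodChaos resource.
POD_CHAOS_ACTIONS = {"pod-failure", "pod-kill", "container-kill"}


def build_pod_chaos_manifest(name, namespace, target_app,
                             action="pod-kill", duration="30s", mode="one"):
    """Return a PodChaos manifest as a plain dict (hypothetical helper)."""
    if action not in POD_CHAOS_ACTIONS:
        raise ValueError(f"unsupported PodChaos action: {action!r}")
    spec = {
        "action": action,
        "mode": mode,  # "one" selects a single matching pod
        "selector": {
            "namespaces": [namespace],
            # Assumption: the target app is identified by its "app" label.
            "labelSelectors": {"app": target_app},
        },
    }
    if duration:
        spec["duration"] = duration
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "PodChaos",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


manifest = build_pod_chaos_manifest("pod-kill-demo", "default", "my-app")
print(json.dumps(manifest, indent=2))
```

Reviewing the generated manifest before it is applied makes the selector explicit, which helps confirm that an experiment only reaches the intended pods.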
