MCP Server for Chaos Engineering with Chaos Mesh on EKS

Chaos Mesh MCP Server

An MCP server that enables AI agents to perform chaos engineering through Chaos Mesh on EKS clusters.

Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   AI Agent      │───▶│   MCP Server     │───▶│  EKS Cluster    │
│                 │    │                  │    │                 │
│ - Failure       │    │ - OIDC Auth      │    │ - Chaos Mesh    │
│   Scenarios     │    │ - K8s API Calls  │    │ - Workloads     │
│ - Experiment    │    │ - Experiment     │    │ - Monitoring    │
│   Planning      │    │   Management     │    │ - Resource Info │
│ - Result        │    │ - Resource Query │    │                 │
│   Analysis      │    │                  │    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘

Key Features

1. Authentication and Authorization Management

  • OIDC-based EKS cluster authentication
  • RBAC permission validation
  • Token renewal and management

2. Chaos Mesh Experiment Management

  • Experiment creation and execution
  • Experiment status monitoring
  • Experiment termination and cleanup

3. Chaos Engineering Tools

  • Pod failure injection
  • Network failure simulation
  • Storage failure testing
  • Time and stress testing

4. EKS Cluster Resource Management

  • Node information retrieval
  • Namespace listing and details
  • Deployment status monitoring
  • Service discovery
  • Pod status and health checks
  • Cluster summary information

Available MCP Tools

Cluster Management

  • add_remote_cluster - Add EKS cluster to management
  • list_remote_clusters - List all managed clusters
  • install_chaos_mesh - Install Chaos Mesh on cluster

Resource Information

  • get_cluster_nodes - Get detailed node information
  • get_cluster_namespaces - Get namespace information
  • get_cluster_deployments - Get deployment status (all or specific namespace)
  • get_cluster_services - Get service information (all or specific namespace)
  • get_cluster_pods - Get pod information (all or specific namespace)
  • get_cluster_resource_summary - Get cluster resource overview and summary

Chaos Experiments

  • create_pod_chaos_experiment - Create pod failure experiments
  • create_network_chaos_experiment - Create network failure experiments
  • create_stress_chaos_experiment - Create stress testing experiments
  • create_io_chaos_experiment - Create I/O failure experiments
  • create_dns_chaos_experiment - Create DNS failure experiments
  • create_time_chaos_experiment - Create time manipulation experiments
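Each tool above lines up with one Chaos Mesh custom resource kind. The lookup below is a minimal sketch of that correspondence. The kind names are Chaos Mesh's own, but the `TOOL_TO_KIND` table and the `kind_for_tool` helper are hypothetical and are not taken from the server's source.

```python
# Assumed correspondence between MCP tool names and Chaos Mesh kinds.
# The table itself is illustrative, not the server's actual implementation.
TOOL_TO_KIND = {
    "create_pod_chaos_experiment": "PodChaos",
    "create_network_chaos_experiment": "NetworkChaos",
    "create_stress_chaos_experiment": "StressChaos",
    "create_io_chaos_experiment": "IOChaos",
    "create_dns_chaos_experiment": "DNSChaos",
    "create_time_chaos_experiment": "TimeChaos",
}


def kind_for_tool(tool_name: str) -> str:
    """Return the Chaos Mesh kind that a tool name is assumed to create."""
    try:
        return TOOL_TO_KIND[tool_name]
    except KeyError:
        raise ValueError(f"unknown chaos tool: {tool_name}") from None
```

Knowing the kind is useful when inspecting the cluster afterwards, for example to find the resources an agent has created.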

Installation and Setup

  1. Install Chaos Mesh on EKS cluster
  2. Configure OIDC provider
  3. Set up RBAC permissions
  4. Deploy MCP server
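Step 3 might look like the namespaced Role sketched below, built as a plain Python dict so it can be dumped to YAML or passed to a Kubernetes client. The permissions the server actually needs are not documented here, so the rule set, the `build_chaos_role` helper, and the default role name are assumptions to adapt.

```python
# Chaos Mesh resource names as registered in the chaos-mesh.org API group.
CHAOS_RESOURCES = [
    "podchaos",
    "networkchaos",
    "stresschaos",
    "iochaos",
    "dnschaos",
    "timechaos",
]


def build_chaos_role(namespace: str, name: str = "chaos-experimenter") -> dict:
    """Build a Role manifest limited to chaos resources in one namespace.

    The verbs and the extra read-only pod rule are assumptions, not the
    server's documented requirements.
    """
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": ["chaos-mesh.org"],
                "resources": CHAOS_RESOURCES,
                "verbs": ["get", "list", "watch", "create", "delete"],
            },
            {
                # Read-only access so targets can be inspected.
                "apiGroups": [""],
                "resources": ["pods"],
                "verbs": ["get", "list"],
            },
        ],
    }
```

A Role (rather than a ClusterRole) keeps the grant scoped to a single namespace. The cluster-wide resource tools such as get_cluster_nodes would need additional cluster-scoped read permissions.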

Security Considerations

  • Apply principle of least privilege
  • Limit experiment scope
  • Record audit logs
  • Implement safety mechanisms
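One way to limit experiment scope is a guard that runs before any experiment is created. The sketch below rejects namespaces outside an allowlist and durations above a cap. `ALLOWED_NAMESPACES`, `MAX_DURATION`, and both functions are hypothetical and are not part of the server, and the duration parser handles only simple Go-style strings such as `30s` or `1m30s`.

```python
import re
from datetime import timedelta

# Hypothetical policy values; adjust to your environment.
ALLOWED_NAMESPACES = {"default", "staging"}
MAX_DURATION = timedelta(minutes=5)

_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600}
_PART = re.compile(r"(\d+)(ms|s|m|h)")


def parse_duration(text: str) -> timedelta:
    """Parse a simple Go-style duration such as '30s' or '1m30s'."""
    if not re.fullmatch(r"(?:\d+(?:ms|s|m|h))+", text):
        raise ValueError(f"unsupported duration: {text!r}")
    seconds = sum(int(n) * _UNIT_SECONDS[unit] for n, unit in _PART.findall(text))
    return timedelta(seconds=seconds)


def check_experiment(namespace: str, duration: str) -> None:
    """Raise if the experiment falls outside the allowed scope."""
    if namespace not in ALLOWED_NAMESPACES:
        raise PermissionError(f"namespace {namespace!r} is not allowed")
    if parse_duration(duration) > MAX_DURATION:
        raise ValueError(f"duration {duration!r} exceeds {MAX_DURATION}")
```

Calling `check_experiment("default", "30s")` returns silently, while `check_experiment("kube-system", "30s")` raises `PermissionError`.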

Usage

Before adding a cluster, grant the IAM role used by the server access to it: map the IAM role to a Kubernetes user and groups, and add it to an RBAC group with system:masters permission. You can do this with Kubernetes RBAC configuration. With AWS CDK, see Masters Role and addRoleMapping.

You can initialize clusters through the CLUSTERS_CONFIG environment variable or add them manually with the add_remote_cluster tool.

*.json file

{
  "mcpServers": {
    "sequential-thinking": {
      "command": "uvx",
      "args": ["chaos-mesh-mcp-server@latest"],
      "env": {
        "CLUSTERS_CONFIG": "{my-cluster-name}:{my-cluster-region},{my-cluster-2-name}:{my-cluster-2-region}",
        "AWS_ACCESS_KEY": "KEY",
        "AWS_SECRET_ID": "SECRET"
      }
    }
  }
}
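The CLUSTERS_CONFIG value shown above is a comma-separated list of name:region pairs. The parser below is a minimal sketch of how such a string could be split. It assumes the braces in the example are placeholder markers rather than literal syntax, and `parse_clusters_config` is a hypothetical helper, not the server's own code.

```python
def parse_clusters_config(value: str) -> dict[str, str]:
    """Split 'name:region,name2:region2' into a {name: region} mapping."""
    clusters: dict[str, str] = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas and empty segments
        name, sep, region = entry.partition(":")
        name, region = name.strip(), region.strip()
        if not sep or not name or not region:
            raise ValueError(f"expected 'name:region', got {entry!r}")
        clusters[name] = region
    return clusters
```

For example, `parse_clusters_config("prod:us-east-1,dev:ap-northeast-2")` yields one entry per cluster, keyed by cluster name.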

Strands Agent SDK

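from mcp import StdioServerParameters, stdio_client
from strands import Agent
from strands.tools.mcp import MCPClient

# Launch the MCP server over stdio through uvx.
chaos_mesh_mcp_client = MCPClient(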
    lambda: stdio_client(
        StdioServerParameters(
            command="uvx",
            args=["chaos-mesh-mcp-server@latest"],
            env={
                "CLUSTERS_CONFIG": "{my-cluster-name}:{my-cluster-region},{my-cluster-2-name}:{my-cluster-2-region}",
                "AWS_ACCESS_KEY": "KEY",
                "AWS_SECRET_ID": "SECRET",
            },
        )
    )
)

chaos_mesh_mcp_client.start()

# `model` and `system_prompt` are defined elsewhere in your application.
agent = Agent(
    model=model,
    system_prompt=system_prompt,
    tools=chaos_mesh_mcp_client.list_tools_sync(),
)

Example Usage

Get Cluster Information

# Get cluster summary
cluster_summary = await get_cluster_resource_summary("my-cluster")

# Get specific resource information
nodes = await get_cluster_nodes("my-cluster")
namespaces = await get_cluster_namespaces("my-cluster")
deployments = await get_cluster_deployments("my-cluster", "default")

Create Chaos Experiments

# Create pod chaos experiment
result = await create_pod_chaos_experiment(
    cluster_name="my-cluster",
    namespace="default",
    target_app="my-app",
    action="pod-kill",
    duration="30s"
)
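A call like the one above would plausibly result in a Chaos Mesh PodChaos resource. The builder below sketches such a manifest as a Python dict. How the server maps `target_app` to a selector is not documented here, so the `app` label key, the fixed `mode` of `one`, and the `build_pod_chaos` helper with its `name` parameter are all assumptions.

```python
def build_pod_chaos(
    name: str,
    namespace: str,
    target_app: str,
    action: str,
    duration: str,
) -> dict:
    """Build a PodChaos manifest targeting pods labelled app=<target_app>."""
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "PodChaos",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "action": action,      # e.g. "pod-kill" or "pod-failure"
            "mode": "one",         # assumption: affect one matching pod
            "duration": duration,
            "selector": {
                "namespaces": [namespace],
                # Assumption: the target is selected by an "app" label.
                "labelSelectors": {"app": target_app},
            },
        },
    }
```

The resulting dict can be serialized to YAML for review before anything is applied to a cluster.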
