An AWS Labs Model Context Protocol (MCP) server for dataprocessing

Amazon Data Processing MCP Server

The AWS DataProcessing MCP server provides AI code assistants with comprehensive data processing tools and real-time pipeline visibility across AWS Glue and Amazon EMR-EC2. This integration equips large language models (LLMs) with essential data engineering capabilities and contextual awareness, enabling AI code assistants to streamline data processing workflows through intelligent guidance — from initial data discovery and cataloging through complex ETL pipeline orchestration and big data analytics optimization.

Integrating the DataProcessing MCP server into AI code assistants supports data engineering workflows across all phases. It simplifies data catalog management with automated schema discovery and data quality validation, streamlines ETL job creation with intelligent code generation and best practice recommendations, accelerates big data processing through automated EMR cluster provisioning and workload optimization, and enhances troubleshooting through intelligent debugging tools and operational insights. All of these capabilities are available through natural language interactions in AI code assistants.

Key features

AWS Glue Integration

  • Data Catalog Management: Enables users to explore, create, and manage databases, tables, and partitions through natural language requests, automatically translating them into appropriate AWS Glue Data Catalog operations.

Prerequisites

Setup

Add these IAM policies to the IAM role or user that you use to manage your Glue, EMR-EC2 or Athena resources.

Read-Only Operations Policy

For read operations, the following permissions are required:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase*",
        "glue:GetTable*",
        "glue:GetPartition*",
        "glue:GetConnection*",
        "glue:GetDatabases",
        "glue:GetTables",
        "glue:SearchTables",
        "cloudwatch:GetMetricData",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams"
      ],
      "Resource": "*"
    }
  ]
}
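Before attaching a policy intended for read-only use, it can help to check that it grants no mutating actions. The following is a minimal sketch using only the Python standard library. The helper name mutating_actions and the verb list in READ_ONLY_VERBS are assumptions made for illustration; they are not an AWS-defined classification and are not part of the MCP server.

```python
import json

# Verbs treated as non-mutating in this sketch (an assumption, not an AWS-defined list).
READ_ONLY_VERBS = ("Get", "List", "Search", "Describe", "BatchGet")


def mutating_actions(policy: dict) -> list:
    """Return every allowed action whose verb is not in READ_ONLY_VERBS."""
    flagged = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM accepts a single statement object
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # IAM accepts a single action string
            actions = [actions]
        for action in actions:
            # "glue:GetTable*" -> "GetTable*"
            verb = action.split(":", 1)[-1]
            if not verb.startswith(READ_ONLY_VERBS):
                flagged.append(action)
    return flagged


# json.loads also rejects malformed documents, such as a trailing comma in a list.
policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["glue:GetDatabase*", "glue:SearchTables", "logs:DescribeLogGroups"],
      "Resource": "*"
    }
  ]
}
""")

print(mutating_actions(policy))  # prints []
```

A wildcard action such as "*" or "glue:*" is flagged by this check, because it does not start with one of the listed verbs.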

Write Operations Policy

For write operations, we recommend the following IAM policies:

  • AWSGlueServiceRole: Enables Glue service operations including job execution, crawler runs, and data catalog modifications

Important Security Note: Users should exercise caution when --allow-write and --allow-sensitive-data-access modes are enabled with these broad permissions, as this combination grants significant privileges to the MCP server. Only enable these flags when necessary and in trusted environments.

Resource Management Limitation: The DataProcessing MCP Server can only update or delete resources that were originally created through it. Resources created by other means cannot be modified or deleted using the DataProcessing MCP Server.

Quickstart

This quickstart guide walks you through the steps to configure the Amazon Data Processing MCP Server for use with both the Cursor IDE and the Amazon Q Developer CLI. By following these steps, you'll set up your development environment to use the Data Processing MCP Server's tools for managing your Glue, EMR, and Athena resources.

Set up Cursor

  1. Open Cursor.
  2. Click the gear icon (⚙️) in the top right to open the settings panel, then click MCP and Add new global MCP server.
  3. Paste your MCP server definition. The following example configures the Data Processing MCP Server and enables mutating actions by adding the --allow-write flag to the server arguments:
{
  "mcpServers": {
    "awslabs.aws-dataprocessing-mcp-server": {
      "autoApprove": [],
      "disabled": false,
      "command": "uvx",
      "args": [
        "awslabs.aws-dataprocessing-mcp-server@latest",
        "--allow-write"
      ],
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR",
        "AWS_REGION": "us-east-1"
      },
      "transportType": "stdio"
    }
  }
}

After a few minutes, you should see a green indicator if your MCP server definition is valid.

  4. Open a chat panel in Cursor (e.g., Ctrl/⌘ + L). In your Cursor chat window, enter your prompt. For example, "Look at all the tables from my account federated across GDC"

Set up the Amazon Q Developer CLI

  1. Install the Amazon Q Developer CLI.
  2. The Q Developer CLI supports MCP servers for tools and prompts out-of-the-box. Edit your Q developer CLI's MCP configuration file named mcp.json following these instructions. For example:
{
  "mcpServers": {
    "awslabs.aws-dataprocessing-mcp-server": {
      "command": "uvx",
      "args": ["awslabs.aws-dataprocessing-mcp-server@latest"],
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR"
      },
      "autoApprove": [],
      "disabled": false
    }
  }
}
  3. Verify your setup by running the /tools command in the Q Developer CLI to see the available Data Processing MCP tools.
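If you prefer to script the configuration rather than edit it by hand, the server definition can be generated and merged into an existing file. The following is a minimal sketch using only the Python standard library. The helpers server_entry and upsert_server are hypothetical names, the server name uses the awslabs. prefix documented under Arguments, and the demo writes to a temporary directory because the real location of mcp.json depends on your MCP client.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

SERVER_NAME = "awslabs.aws-dataprocessing-mcp-server"


def server_entry(allow_write: bool = False, region: Optional[str] = None) -> dict:
    """Build one MCP server definition using the fields shown in this quickstart."""
    args = [f"{SERVER_NAME}@latest"]
    if allow_write:
        args.append("--allow-write")
    env = {"FASTMCP_LOG_LEVEL": "ERROR"}
    if region:
        env["AWS_REGION"] = region
    return {
        "command": "uvx",
        "args": args,
        "env": env,
        "autoApprove": [],
        "disabled": False,
    }


def upsert_server(config_path: Path, entry: dict) -> dict:
    """Add or replace the server entry, keeping any other servers in the file."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    config.setdefault("mcpServers", {})[SERVER_NAME] = entry
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo against a throwaway file; substitute the path your MCP client actually reads.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "mcp.json"
    config = upsert_server(path, server_entry(allow_write=True, region="us-east-1"))
    print(json.dumps(config, indent=2))
```

Because upsert_server only touches the entry for this server, other MCP servers already defined in the file are left in place.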

Note that this is a basic quickstart. You can enable additional capabilities, such as running MCP servers in containers or combining more MCP servers like the AWS Documentation MCP Server into a single MCP server definition. To view an example, see the Installation and Setup guide in AWS MCP Servers on GitHub. To view a real-world implementation with application code in context with an MCP server, see the Server Developer guide in Anthropic documentation.

Configurations

Arguments

The args field in the MCP server definition specifies the command-line arguments passed to the server when it starts. These arguments control how the server is executed and configured. For example:

{
  "mcpServers": {
    "awslabs.aws-dataprocessing-mcp-server": {
      "command": "uvx",
      "args": [
        "awslabs.aws-dataprocessing-mcp-server@latest",
        "--allow-write",
        "--allow-sensitive-data-access"
      ],
      "env": {
        "AWS_PROFILE": "your-profile",
        "AWS_REGION": "us-east-1"
      }
    }
  }
}

awslabs.aws-dataprocessing-mcp-server@latest (required)

Specifies the package and version that the MCP client launches. The @latest suffix requests the most recent published release.

  • Enables MCP server startup and tool registration.

--allow-write (optional)

Enables write access mode, which allows mutating operations (e.g., creating, updating, or deleting resources).

  • Default: false (The server runs in read-only mode by default)
  • Example: Add --allow-write to the args list in your MCP server definition.

--allow-sensitive-data-access (optional)

Enables access to sensitive data such as logs and events.

  • Default: false (Access to sensitive data is restricted by default)
  • Example: Add --allow-sensitive-data-access to the args list in your MCP server definition.
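To illustrate the default-off behavior of these flags, here is a minimal sketch of how two boolean switches can be parsed with Python's argparse. It is an illustration only and is not taken from the server's source code; the function name build_parser and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with two opt-in switches; both are False unless passed explicitly."""
    parser = argparse.ArgumentParser(description="Illustrative flag parsing")
    parser.add_argument(
        "--allow-write",
        action="store_true",
        help="permit mutating operations (create, update, delete)",
    )
    parser.add_argument(
        "--allow-sensitive-data-access",
        action="store_true",
        help="permit access to sensitive data such as logs and events",
    )
    return parser


# Equivalent to an args list that contains only "--allow-write" after the package name.
args = build_parser().parse_args(["--allow-write"])
print(args.allow_write, args.allow_sensitive_data_access)  # prints: True False
```

With store_true, omitting a flag leaves it False, which matches the read-only and restricted defaults described above.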

Environment variables

The env field in the MCP server definition allows you to configure environment variables that control the behavior of the DataProcessing MCP server. For example:

{
  "mcpServers": {
    "awslabs.aws-dataprocessing-mcp-server": {
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR",
        "AWS_PROFILE": "my-profile",
        "AWS_REGION": "us-west-2"
      }
    }
  }
}

FASTMCP_LOG_LEVEL (optional)

Sets the logging level verbosity for the server.

  • Valid values: "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"
  • Default: "WARNING"
  • Example: "FASTMCP_LOG_LEVEL": "ERROR"

AWS_PROFILE (optional)

Specifies the AWS profile to use for authentication.

  • Default: None (If not set, uses default AWS credentials).
  • Example: "AWS_PROFILE": "my-profile"

AWS_REGION (optional)

Specifies the AWS Region where your Glue, EMR, or Athena resources are managed. This Region is used for all AWS service operations.

  • Default: None (If not set, uses default AWS region).
  • Example: "AWS_REGION": "us-west-2"
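The defaults described above can be summarized in a small resolver. This is a minimal sketch, assuming the documented valid values and defaults; the function resolve_settings and the returned keys are hypothetical names, and the server's own handling of invalid values may differ.

```python
import os
from typing import Mapping

VALID_LOG_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def resolve_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Apply the documented defaults to the three environment variables."""
    level = env.get("FASTMCP_LOG_LEVEL", "WARNING")
    if level not in VALID_LOG_LEVELS:
        raise ValueError(f"FASTMCP_LOG_LEVEL must be one of {VALID_LOG_LEVELS}, got {level!r}")
    return {
        "log_level": level,
        # None means "fall back to the default AWS credentials / Region".
        "aws_profile": env.get("AWS_PROFILE"),
        "aws_region": env.get("AWS_REGION"),
    }


settings = resolve_settings({"FASTMCP_LOG_LEVEL": "ERROR", "AWS_REGION": "us-west-2"})
print(settings)
# prints {'log_level': 'ERROR', 'aws_profile': None, 'aws_region': 'us-west-2'}
```

Passing a plain dictionary instead of os.environ makes it easy to check an env block from an MCP server definition before saving it.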

Tools

Glue Data Catalog Handler Tools

| Tool Name | Description | Key Operations | Requirements |
|---|---|---|---|
| manage_aws_glue_databases | Manage AWS Glue Data Catalog databases | create-database, delete-database, get-database, list-databases, update-database | --allow-write flag for create/delete/update operations, appropriate AWS permissions |
| manage_aws_glue_tables | Manage AWS Glue Data Catalog tables | create-table, delete-table, get-table, list-tables, update-table, search-tables | --allow-write flag for create/delete/update operations, database must exist, appropriate AWS permissions |
| manage_aws_glue_connections | Manage AWS Glue Data Catalog connections | create-connection, delete-connection, get-connection, list-connections, update-connection | --allow-write flag for create/delete/update operations, appropriate AWS permissions |
| manage_aws_glue_partitions | Manage AWS Glue Data Catalog partitions | create-partition, delete-partition, get-partition, list-partitions, update-partition | --allow-write flag for create/delete/update operations, database and table must exist, appropriate AWS permissions |
| manage_aws_glue_catalog | Manage AWS Glue Data Catalog | create-catalog, delete-catalog, get-catalog, list-catalogs, import-catalog-to-glue | --allow-write flag for create/delete/import operations, appropriate AWS permissions |
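MCP clients invoke these tools with a JSON-RPC tools/call request. The following is a minimal sketch of what such a request might look like, plus a check that mirrors the --allow-write requirement listed for each tool. The "operation" argument key and the helper names are assumptions for illustration; consult the tool schemas reported by your MCP client for the actual parameter names.

```python
import json

# Operation prefixes that the Requirements column ties to the --allow-write flag.
WRITE_PREFIXES = ("create-", "delete-", "update-", "import-")


def requires_allow_write(operation: str) -> bool:
    """True for operations that the tools list marks as needing --allow-write."""
    return operation.startswith(WRITE_PREFIXES)


def tool_call_request(request_id: int, tool: str, operation: str, **arguments) -> dict:
    """Build an MCP tools/call JSON-RPC request for one tool invocation."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool,
            # "operation" as the argument key is an assumption for illustration.
            "arguments": {"operation": operation, **arguments},
        },
    }


request = tool_call_request(1, "manage_aws_glue_databases", "list-databases")
print(json.dumps(request, indent=2))
print(requires_allow_write("list-databases"))   # prints False
print(requires_allow_write("create-database"))  # prints True
```

In practice the AI code assistant builds these requests for you; the sketch only shows why a read operation such as list-databases works in the default read-only mode while create-database does not.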

Version

Current MCP server version: 0.1.0
