AWS DMS Troubleshooting MCP Server - Root Cause Analysis tool for AWS Database Migration Service replication issues
AWS DMS Troubleshooting MCP Server
A Model Context Protocol (MCP) server for AWS Database Migration Service (DMS) troubleshooting and Root Cause Analysis (RCA). This server helps customers diagnose and resolve DMS replication issues through automated analysis of replication tasks, CloudWatch logs, and endpoint configurations.
Overview
The AWS DMS Troubleshooting MCP Server is designed to assist with post-migration troubleshooting, particularly for:
- Failed or stopped replication tasks
- CDC (Change Data Capture) replication issues
- Connection and authentication problems
- Network connectivity and security group issues
- VPC routing and configuration problems
- Performance and latency issues
- Configuration errors
Features
- Replication Task Management
  - List all DMS replication tasks with status filtering
  - Get detailed task information including statistics and configuration
  - Analyze task performance and progress
- CloudWatch Logs Analysis
  - Retrieve and filter DMS task logs
  - Identify error patterns and frequencies
  - Search logs by time range and severity
- Endpoint Analysis
  - Validate source and target endpoint configurations
  - Test endpoint connectivity
  - Identify common configuration issues
- Network Diagnostics
  - Analyze security group rules for DMS connectivity
  - Diagnose network connectivity issues between replication instances and endpoints
  - Check VPC routing, network ACLs, and connectivity options
  - Identify VPC peering and Transit Gateway configurations
- Root Cause Analysis
  - Comprehensive diagnosis of failed tasks
  - Pattern-based error identification
  - Network-level diagnostics integration
  - Actionable recommendations based on AWS best practices
- Documentation Integration
  - Context-aware troubleshooting recommendations
  - Links to relevant AWS documentation
  - Best practice guidance
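To make "identify error patterns and frequencies" concrete, here is a minimal sketch of pattern-based counting over log lines. The pattern table, the function name `count_error_patterns`, and the sample log lines are all invented for illustration; the server's actual patterns and log handling are not documented here and may differ.

```python
import re
from collections import Counter

# Hypothetical pattern table; the real server's patterns are an unknown.
ERROR_PATTERNS = {
    "connection_timeout": re.compile(r"timed? ?out", re.IGNORECASE),
    "authentication": re.compile(
        r"access denied|authentication failed|login failed", re.IGNORECASE
    ),
    "missing_object": re.compile(r"does not exist", re.IGNORECASE),
}


def count_error_patterns(log_lines):
    """Return a Counter mapping pattern name -> number of matching lines."""
    counts = Counter()
    for line in log_lines:
        for name, pattern in ERROR_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Made-up log lines, only loosely shaped like DMS task logs.
sample_logs = [
    "[SOURCE_CAPTURE] E: Connection timed out",
    "[TARGET_LOAD] E: Login failed for user 'dms'",
    "[SOURCE_CAPTURE] E: connection timeout while reading",
    "[TASK_MANAGER] I: Task is running",
]
summary = count_error_patterns(sample_logs)
print(summary.most_common())
```

Sorting by frequency puts the most common failure class first, which is usually the most useful place to start an investigation.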
Installation
Using uv (recommended)
The package is published on PyPI. The simplest way to run it is with uvx, which downloads and runs the server without a manual install:
uvx aws-dms-troubleshoot-mcp@latest
Alternatively, install it into your environment with uv or pip:
uv pip install aws-dms-troubleshoot-mcp
# or
pip install aws-dms-troubleshoot-mcp
Both expose the aws-dms-troubleshoot-mcp console command.
From Source
cd aws-dms-troubleshoot-mcp-server
uv sync
Configuration
Environment Variables
# Required: AWS region where your DMS resources are located
export AWS_REGION=us-east-1
# Optional: AWS CLI profile to use
export AWS_PROFILE=default
# Optional: Logging level
export FASTMCP_LOG_LEVEL=INFO
AWS Credentials
The server uses the standard AWS credential chain:
- Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
- AWS credentials file (~/.aws/credentials)
- IAM role (when running on EC2, ECS, or Lambda)
MCP Client Configuration
Add the server to your MCP client configuration (e.g., Claude Desktop, Kiro). The recommended setup uses uvx to run the published package directly:
{
  "mcpServers": {
    "aws-dms-troubleshoot": {
      "command": "uvx",
      "args": ["aws-dms-troubleshoot-mcp@latest"],
      "env": {
        "AWS_REGION": "us-east-1",
        "AWS_PROFILE": "default",
        "FASTMCP_LOG_LEVEL": "INFO"
      }
    }
  }
}
To run from a local clone instead (useful for development), point uv at the project directory:
{
  "mcpServers": {
    "aws-dms-troubleshoot": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/aws-dms-troubleshoot-mcp-server",
        "run",
        "aws-dms-troubleshoot-mcp"
      ],
      "env": {
        "AWS_REGION": "us-east-1",
        "AWS_PROFILE": "default",
        "FASTMCP_LOG_LEVEL": "INFO"
      }
    }
  }
}
Note on AWS_PROFILE: If your environment authenticates with environment-variable credentials (AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY) or an IAM role rather than a named profile in ~/.aws/credentials, omit AWS_PROFILE so the default credential chain is used. Setting it to default when no such profile exists will cause boto3 to fail.
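The AWS_PROFILE caveat can be handled when generating the client configuration. The sketch below builds the `env` block and includes AWS_PROFILE only if the named profile appears as a section in credentials-file text. The helper names (`profile_exists`, `build_server_env`, `build_client_config`) are hypothetical, and the check is deliberately narrow: it ignores profiles defined in ~/.aws/config.

```python
import configparser
import json


def profile_exists(credentials_text, profile):
    """Return True if `profile` is a section in credentials-file text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    return parser.has_section(profile)


def build_server_env(region, profile=None, credentials_text=""):
    """Build the env block, omitting AWS_PROFILE when the profile is absent."""
    env = {"AWS_REGION": region, "FASTMCP_LOG_LEVEL": "INFO"}
    if profile and profile_exists(credentials_text, profile):
        env["AWS_PROFILE"] = profile
    return env


def build_client_config(env):
    """Wrap the env block in the uvx-based client configuration."""
    return {
        "mcpServers": {
            "aws-dms-troubleshoot": {
                "command": "uvx",
                "args": ["aws-dms-troubleshoot-mcp@latest"],
                "env": env,
            }
        }
    }


# Placeholder credentials text; in practice this would be read from disk.
creds = "[default]\naws_access_key_id = EXAMPLE\naws_secret_access_key = EXAMPLE\n"
with_profile = build_server_env("us-east-1", "default", creds)
without_profile = build_server_env("us-east-1", "default", "")
print(json.dumps(build_client_config(without_profile), indent=2))
```

With an empty credentials file the generated configuration carries no AWS_PROFILE key, so the default credential chain applies.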
AWS Permissions Required
The IAM user or role needs the following permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dms:DescribeReplicationTasks",
        "dms:DescribeReplicationInstances",
        "dms:DescribeEndpoints",
        "dms:TestConnection"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogStreams",
        "logs:GetLogEvents",
        "logs:FilterLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:log-group:dms-tasks-*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSecurityGroupRules",
        "ec2:DescribeSubnets",
        "ec2:DescribeRouteTables",
        "ec2:DescribeNetworkAcls",
        "ec2:DescribeVpcs",
        "ec2:DescribeVpcPeeringConnections",
        "ec2:DescribeTransitGatewayAttachments",
        "ec2:DescribeNatGateways",
        "ec2:DescribeInternetGateways"
      ],
      "Resource": "*"
    }
  ]
}
Note: Network diagnostic features require EC2 read permissions. If these permissions are not available, the server will still function but network diagnostic tools will return permission errors.
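Before wiring up the server, it can help to check an existing policy document against the actions listed above. The following is a rough sketch, not an IAM evaluator: it only looks at `Action` entries in `Allow` statements and ignores `Deny`, `NotAction`, conditions, and resource scoping. The function names are hypothetical.

```python
from fnmatch import fnmatchcase

# Actions taken from the policy shown in this section.
REQUIRED_ACTIONS = [
    "dms:DescribeReplicationTasks",
    "dms:DescribeReplicationInstances",
    "dms:DescribeEndpoints",
    "dms:TestConnection",
    "logs:DescribeLogStreams",
    "logs:GetLogEvents",
    "logs:FilterLogEvents",
    "ec2:DescribeSecurityGroups",
    "ec2:DescribeSecurityGroupRules",
    "ec2:DescribeSubnets",
    "ec2:DescribeRouteTables",
    "ec2:DescribeNetworkAcls",
    "ec2:DescribeVpcs",
    "ec2:DescribeVpcPeeringConnections",
    "ec2:DescribeTransitGatewayAttachments",
    "ec2:DescribeNatGateways",
    "ec2:DescribeInternetGateways",
]


def allowed_action_patterns(policy):
    """Collect Action entries (possibly with wildcards) from Allow statements."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    patterns = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        patterns.extend(actions)
    return patterns


def missing_actions(policy, required=REQUIRED_ACTIONS):
    """Return required actions not matched by any allowed pattern."""
    # IAM action names are case-insensitive, so compare in lowercase.
    patterns = [p.lower() for p in allowed_action_patterns(policy)]
    return [
        action
        for action in required
        if not any(fnmatchcase(action.lower(), p) for p in patterns)
    ]


example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dms:Describe*", "logs:GetLogEvents"],
            "Resource": "*",
        }
    ],
}
missing = missing_actions(example_policy)
print(missing)
```

A non-empty result for the `ec2:` actions would explain why the network diagnostic tools return permission errors while the other tools work.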
Available Tools
1. list_replication_tasks
List all DMS replication tasks with their current status.
Parameters:
- region (string, optional): AWS region (default: from environment)
- aws_profile (string, optional): AWS profile to use
- status_filter (string, optional): Filter by status (running, stopped, failed, etc.)
Example:
await list_replication_tasks(
    region="us-east-1",
    status_filter="failed"
)
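One plausible way to apply a status filter is on the client side, after fetching tasks. The sketch below filters a hand-written dictionary shaped like a DMS DescribeReplicationTasks response (`ReplicationTasks`, `ReplicationTaskIdentifier`, `Status`). Whether the server filters this way is an assumption, and `filter_tasks_by_status` is a hypothetical name.

```python
def filter_tasks_by_status(response, status_filter=None):
    """Return tasks whose Status matches status_filter (case-insensitive)."""
    tasks = response.get("ReplicationTasks", [])
    if status_filter is None:
        return list(tasks)
    wanted = status_filter.lower()
    return [task for task in tasks if task.get("Status", "").lower() == wanted]


# Hand-written sample data, not real API output.
sample_response = {
    "ReplicationTasks": [
        {"ReplicationTaskIdentifier": "orders-full-load", "Status": "stopped"},
        {"ReplicationTaskIdentifier": "orders-cdc", "Status": "failed"},
        {"ReplicationTaskIdentifier": "customers-cdc", "Status": "running"},
    ]
}
failed = filter_tasks_by_status(sample_response, "failed")
print([task["ReplicationTaskIdentifier"] for task in failed])
```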
2. get_replication_task_details
Get comprehensive details about a specific replication task.
Parameters:
- task_identifier (string, required): Task identifier or ARN
- region (string, optional): AWS region
- aws_profile (string, optional): AWS profile to use
Example:
await get_replication_task_details(
    task_identifier="my-replication-task",
    region="us-east-1"
)
3. get_task_cloudwatch_logs
Retrieve CloudWatch logs for a replication task.
Parameters:
- task_identifier (string, required): Task identifier
- region (string, optional): AWS region
- aws_profile (string, optional): AWS profile to use
- hours_back (integer, optional): Hours of logs to retrieve (default: 24)
- filter_pattern (string, optional): Log filter pattern (e.g., "ERROR")
- max_events (integer, optional): Maximum events to return (default: 100)
Example:
await get_task_cloudwatch_logs(
    task_identifier="my-replication-task",
    hours_back=48,
    filter_pattern="ERROR",
    max_events=100
)
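The CloudWatch Logs FilterLogEvents API expresses its time range as `startTime` and `endTime` in milliseconds since the Unix epoch, so a parameter like `hours_back` has to be converted at some point. The sketch below shows one way to do that conversion; `log_time_window_ms` is a hypothetical helper and is not taken from the server's code.

```python
from datetime import datetime, timedelta, timezone


def log_time_window_ms(hours_back, now=None):
    """Return (start_ms, end_ms) covering the last `hours_back` hours."""
    if hours_back <= 0:
        raise ValueError("hours_back must be positive")
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(hours=hours_back)
    return int(start.timestamp() * 1000), int(now.timestamp() * 1000)


# A fixed reference time keeps the example reproducible.
fixed_now = datetime(2024, 1, 3, tzinfo=timezone.utc)
start_ms, end_ms = log_time_window_ms(48, now=fixed_now)
print(start_ms, end_ms)
```

Keeping the window narrow, together with a filter pattern such as "ERROR", reduces the number of events that have to be scanned.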
4. analyze_endpoint
Analyze a DMS endpoint configuration for potential issues.
Parameters:
- endpoint_arn (string, required): Endpoint ARN
- region (string, optional): AWS region
- aws_profile (string, optional): AWS profile to use
Example:
await analyze_endpoint(
    endpoint_arn="arn:aws:dms:us-east-1:123456789012:endpoint:ABCDEFG",
    region="us-east-1"
)
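Because the ARN already encodes a region, a mismatch between the ARN and the `region` argument is an easy mistake to catch before making a call. This is a small sketch of splitting a DMS ARN into its parts; `DmsArn` and `parse_dms_arn` are hypothetical names and are not part of the server's API.

```python
from typing import NamedTuple


class DmsArn(NamedTuple):
    partition: str
    region: str
    account_id: str
    resource_type: str
    resource_id: str


def parse_dms_arn(arn):
    """Split arn:partition:dms:region:account:type:id into named fields."""
    parts = arn.split(":", 6)
    if len(parts) != 7 or parts[0] != "arn" or parts[2] != "dms":
        raise ValueError(f"not a DMS ARN: {arn!r}")
    _, partition, _, region, account_id, resource_type, resource_id = parts
    return DmsArn(partition, region, account_id, resource_type, resource_id)


endpoint = parse_dms_arn("arn:aws:dms:us-east-1:123456789012:endpoint:ABCDEFG")
requested_region = "us-east-1"
if endpoint.region != requested_region:
    print(f"ARN is in {endpoint.region}, but region={requested_region} was requested")
else:
    print(f"{endpoint.resource_type} {endpoint.resource_id} is in {endpoint.region}")
```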
5. diagnose_replication_issue
Perform comprehensive Root Cause Analysis for a replication task.
Parameters:
- task_identifier (string, required): Task identifier to diagnose
- region (string, optional): AWS region
- aws_profile (string, optional): AWS profile to use
Example:
await diagnose_replication_issue(
    task_identifier="my-failing-task",
    region="us-east-1"
)
6. get_troubleshooting_recommendations
Get recommendations based on error patterns.
Parameters:
- error_pattern (string, required): Error message or pattern
Example:
await get_troubleshooting_recommendations(
    error_pattern="connection timeout"
)
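Conceptually, a tool like this maps an error string to a set of suggestions. The sketch below does that with a plain keyword lookup. The table contents are illustrative, drawn loosely from the checks mentioned elsewhere in this README, and are not the server's actual recommendation data; `recommend` is a hypothetical name.

```python
# Hypothetical keyword -> tips table, for illustration only.
RECOMMENDATIONS = {
    "timeout": [
        "Verify security group rules between the replication instance and the endpoint",
        "Check VPC routing and network ACLs",
    ],
    "access denied": [
        "Verify the endpoint credentials and database permissions",
    ],
    "binlog": [
        "Review the binlog configuration on the source database",
    ],
}

FALLBACK = ["No specific recommendation; review the task logs for more context"]


def recommend(error_pattern):
    """Return tips for every keyword found in the error text."""
    text = error_pattern.lower()
    tips = []
    for keyword, keyword_tips in RECOMMENDATIONS.items():
        if keyword in text:
            tips.extend(keyword_tips)
    return tips or list(FALLBACK)


for tip in recommend("connection timeout"):
    print("-", tip)
```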
7. analyze_security_groups
Analyze security group rules for DMS replication instance connectivity.
Parameters:
- replication_instance_arn (string, required): DMS Replication Instance ARN
- region (string, optional): AWS region
- aws_profile (string, optional): AWS profile to use
Example:
await analyze_security_groups(
    replication_instance_arn="arn:aws:dms:us-east-1:123456789012:rep:ABCDEFG",
    region="us-east-1"
)
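To illustrate the kind of check involved, the sketch below tests whether a security group's egress rules allow TCP traffic to a given address and port. It works on a hand-written dictionary shaped like an EC2 DescribeSecurityGroups entry (`IpPermissionsEgress`, `IpProtocol`, `FromPort`, `ToPort`, `IpRanges`). It is a simplification under stated assumptions: only IPv4 CIDR ranges are considered, and rules that reference other security groups or prefix lists are ignored. The function names are hypothetical.

```python
import ipaddress


def rule_allows(rule, port, destination_ip):
    """Return True if one egress rule permits TCP to destination_ip:port."""
    protocol = rule.get("IpProtocol")
    if protocol not in ("-1", "tcp"):  # "-1" means all protocols
        return False
    if protocol == "tcp":
        if not rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535):
            return False
    destination = ipaddress.ip_address(destination_ip)
    return any(
        destination in ipaddress.ip_network(ip_range["CidrIp"])
        for ip_range in rule.get("IpRanges", [])
    )


def egress_allows(security_group, port, destination_ip):
    """Return True if any egress rule in the group permits the connection."""
    return any(
        rule_allows(rule, port, destination_ip)
        for rule in security_group.get("IpPermissionsEgress", [])
    )


# Hand-written sample: egress limited to MySQL inside 10.0.0.0/16.
sample_group = {
    "GroupId": "sg-0123456789abcdef0",
    "IpPermissionsEgress": [
        {
            "IpProtocol": "tcp",
            "FromPort": 3306,
            "ToPort": 3306,
            "IpRanges": [{"CidrIp": "10.0.0.0/16"}],
        }
    ],
}
print(egress_allows(sample_group, 3306, "10.0.5.20"))
print(egress_allows(sample_group, 5432, "10.0.5.20"))
```

The same check applies in the other direction: the endpoint's security group needs an ingress rule that admits the replication instance on the database port.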
8. diagnose_network_connectivity
Perform comprehensive network connectivity diagnostics for a DMS task.
Parameters:
- task_identifier (string, required): Task identifier to diagnose
- region (string, optional): AWS region
- aws_profile (string, optional): AWS profile to use
Example:
await diagnose_network_connectivity(
    task_identifier="my-replication-task",
    region="us-east-1"
)
9. check_vpc_configuration
Analyze VPC routing, network ACLs, and connectivity configuration.
Parameters:
- vpc_id (string, required): VPC ID to analyze
- region (string, optional): AWS region
- aws_profile (string, optional): AWS profile to use
Example:
await check_vpc_configuration(
    vpc_id="vpc-12345678",
    region="us-east-1"
)
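VPC route tables select the most specific matching route (longest prefix match), and that is the core of any routing check. The sketch below applies it to a hand-written dictionary shaped like an EC2 DescribeRouteTables entry. It covers IPv4 CIDR destinations only and skips blackhole routes; `find_route` and `route_target` are hypothetical helpers, not the server's implementation.

```python
import ipaddress

# Route fields that can name the next hop in a DescribeRouteTables entry.
TARGET_KEYS = (
    "GatewayId",
    "NatGatewayId",
    "TransitGatewayId",
    "VpcPeeringConnectionId",
)


def find_route(route_table, destination_ip):
    """Return the most specific active route covering destination_ip, or None."""
    destination = ipaddress.ip_address(destination_ip)
    best_route, best_prefix = None, -1
    for route in route_table.get("Routes", []):
        cidr = route.get("DestinationCidrBlock")
        if cidr is None or route.get("State") == "blackhole":
            continue
        network = ipaddress.ip_network(cidr)
        if destination in network and network.prefixlen > best_prefix:
            best_route, best_prefix = route, network.prefixlen
    return best_route


def route_target(route):
    """Return the next-hop identifier of a route, or None if unrecognized."""
    for key in TARGET_KEYS:
        if key in route:
            return route[key]
    return None


# Hand-written sample route table with placeholder identifiers.
sample_table = {
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
        {"DestinationCidrBlock": "172.16.0.0/12", "VpcPeeringConnectionId": "pcx-11112222", "State": "active"},
        {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-33334444", "State": "active"},
    ]
}
for ip in ("10.0.1.5", "172.16.4.10", "8.8.8.8"):
    print(ip, "->", route_target(find_route(sample_table, ip)))
```

If no route matches an endpoint's address, or the best match is a blackhole, traffic from the replication instance cannot reach that endpoint regardless of security group rules.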
Common Use Cases
Post-Migration Troubleshooting
When a replication task fails after migration:
- Use list_replication_tasks to identify failed tasks
- Run diagnose_replication_issue for comprehensive RCA
- Review get_task_cloudwatch_logs for detailed error context
- Use diagnose_network_connectivity to check for network issues
- Use get_troubleshooting_recommendations for specific errors
- Apply recommended fixes and monitor results
Network Connectivity Issues
When experiencing connection timeouts or network errors:
- Run diagnose_network_connectivity for the failing task
- Use analyze_security_groups to verify security group rules
- Run check_vpc_configuration to validate VPC routing
- Verify DNS resolution for endpoint hostnames
- Ensure proper VPC peering or Transit Gateway configuration
- Validate NAT gateway or internet gateway setup
CDC Replication Issues
For Change Data Capture problems:
- Check task status with get_replication_task_details
- Analyze logs for CDC-specific errors
- Verify endpoint configurations support CDC
- Review recommendations for binlog/WAL configuration
- Use diagnose_network_connectivity to ensure continuous connectivity
- Check network connectivity and permissions
Performance Optimization
To investigate slow replication:
- Review task statistics from get_replication_task_details
- Check CloudWatch logs for warnings
- Analyze endpoint configurations for optimization opportunities
- Use diagnose_network_connectivity to identify network bottlenecks
- Get performance-related recommendations
Troubleshooting
Server Won't Start
Issue: Server fails to start or authenticate
Solution:
- Verify AWS credentials are configured correctly
- Check IAM permissions match requirements
- Ensure AWS_REGION is set
- Review logs with FASTMCP_LOG_LEVEL=DEBUG
No Tasks Found
Issue: list_replication_tasks returns empty
Solution:
- Verify you're using the correct AWS region
- Check that DMS tasks exist in the specified region
- Confirm IAM permissions include dms:DescribeReplicationTasks
Log Group Not Found
Issue: CloudWatch logs cannot be retrieved
Solution:
- Verify task has been started (logs only exist after task runs)
- Check task identifier is correct
- Ensure CloudWatch Logs permissions are granted
- Confirm the log retention period hasn't expired
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Support
Authors & Support Contacts
- Mike Revitt - revittmk@amazon.com
- Hardik Panchal - hkvp@amazon.com
- Wei Chen - wchemz@amazon.com
Resources
Related Resources
Changelog
See CHANGELOG.md for version history and release notes.
Download files
File details
Details for the file aws_dms_troubleshoot_mcp-1.0.1.tar.gz.
File metadata
- Download URL: aws_dms_troubleshoot_mcp-1.0.1.tar.gz
- Upload date:
- Size: 136.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 821b3ce47db60cdab456f5ce560fbef203461e9e34f5ba5559855a474bb7772a |
| MD5 | a7fb397e4a2f581de23cbbaa804fed26 |
| BLAKE2b-256 | edda69218a6c302124ad4473d36ee391a718fea2464564a610f6a91e9881f0c5 |
File details
Details for the file aws_dms_troubleshoot_mcp-1.0.1-py3-none-any.whl.
File metadata
- Download URL: aws_dms_troubleshoot_mcp-1.0.1-py3-none-any.whl
- Upload date:
- Size: 25.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1e4006031ed96d08fbcbac593884e1fb4e4f6301e1eb3e97d43b04b23a2cd6ec |
| MD5 | 47d8356001d6ca8d0b48e56c849493b1 |
| BLAKE2b-256 | d47ae397165ff06facbcdf2ac8e9eb11b7603af28970e7450f45cdc7040c6214 |