A comprehensive Spark event log analysis MCP server with performance monitoring and optimization recommendations

Spark EventLog MCP Server

Chinese version | English

A comprehensive Spark event log analysis MCP server built on FastMCP 2.0 and FastAPI, providing in-depth performance analysis, resource monitoring, and optimization recommendations.

Features

  • 🌐 FastMCP & FastAPI Integration: MCP protocol support through FastMCP, with analysis report APIs served by FastAPI
  • 📊 Performance Analysis: Shuffle analysis, resource utilization monitoring, task execution analysis
  • 📈 Visual Reports: Auto-generated interactive HTML reports with direct browser access
  • ☁️ Multiple Data Sources: Support for S3, HTTP URLs, and local files
  • 💡 Intelligent Optimization: Automated optimization recommendations based on analysis results

Quick Start

MCP Client Integration

stdio Mode (Recommended for Local Development)

{
  "mcpServers": {
    "spark-eventlog": {
      "command": "uv",
      "args": ["run", "python", "/path/to/spark-eventlog-mcp/start.py"],
      "env": {
        "MCP_TRANSPORT": "stdio"
      }
    }
  }
}

HTTP Mode

1. Start HTTP Server:

export MCP_TRANSPORT=streamable-http
export MCP_HOST=localhost
export MCP_PORT=7799

uv run python start.py

2. Configure Remote MCP:

{
  "mcpServers": {
    "spark-eventlog": {
      "url": "http://localhost:7799/mcp",
      "type": "http"
    }
  }
}

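3. Access Services: with the settings above, the MCP endpoint is http://localhost:7799/mcp. The endpoints listed under RESTful API Endpoints (for example /docs and /health) are served from the same host and port.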

Analysis Examples

emr-serverless-small-job

emr-eks-big-job

emr-eks-big-job-sub-01

emr-eks-big-job-sub-02

Project Structure

spark-eventlog-mcp/
├── src/spark_eventlog_mcp/
│   ├── server.py              # FastAPI + MCP integrated server
│   ├── core/
│   │   └── mature_data_loader.py    # Data loader (S3/URL/Local)
│   ├── tools/
│   │   ├── mature_analyzer.py       # Event log analyzer
│   │   └── mature_report_generator.py  # HTML report generator
│   ├── models/
│   │   ├── schemas.py        # Pydantic data models
│   │   └── mature_models.py  # Analysis result models
│   └── utils/
│       └── helpers.py         # Utility functions and logging config
├── report_data/               # Generated reports storage
├── start.py                   # Launch script
├── README.md                 # This file (English)
└── README_zh.md              # Chinese version

MCP Tools

Tool Name                    | Description
-----------------------------|----------------------------------
parse_eventlog               | Parse event logs (S3/URL/Local)
analyze_performance          | Execute performance analysis
generate_report              | Generate visual reports
get_optimization_suggestions | Get optimization recommendations
get_analysis_status          | Query current analysis status
clear_session                | Clear session cache
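
MCP clients invoke these tools through the protocol's JSON-RPC `tools/call` method. The sketch below builds the request bodies for one plausible sequence (parse, analyze, report, suggestions). It assumes that `parse_eventlog` accepts the `source_type` and `path` fields shown under Data Source Support and that the other tools can be called without arguments; neither is confirmed here. `tool_call_request` and `workflow_requests` are hypothetical helper names, and a real client would also perform the MCP initialize handshake before sending these.

```python
from typing import Any, Dict, List, Optional


def tool_call_request(
    request_id: int, name: str, arguments: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Build an MCP `tools/call` JSON-RPC 2.0 request body."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


def workflow_requests(source_type: str, path: str) -> List[Dict[str, Any]]:
    """Request bodies for an assumed parse -> analyze -> report -> suggest flow."""
    steps = [
        # Assumption: argument names mirror the data source objects in this README.
        ("parse_eventlog", {"source_type": source_type, "path": path}),
        # Assumption: the remaining tools work on the session's parsed data.
        ("analyze_performance", None),
        ("generate_report", None),
        ("get_optimization_suggestions", None),
    ]
    return [
        tool_call_request(request_id, name, arguments)
        for request_id, (name, arguments) in enumerate(steps, start=1)
    ]
```

Each returned dictionary can be serialized with `json.dumps` and sent by whichever MCP transport is in use.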

RESTful API Endpoints

Basic Endpoints

  • GET / - Service information
  • GET /health - Health check
  • GET /docs - API documentation (Swagger UI)

Report Management

  • GET /api/reports - List all reports
  • GET /api/reports/{filename} - View HTML report
  • GET /reports/{filename} - Direct access to report files
  • DELETE /api/reports/{filename} - Delete report

MCP Tool Calls

  • POST /mcp - MCP protocol endpoint
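
The endpoints above can be reached from any HTTP client. As a minimal sketch using only the Python standard library, the helpers below build the documented URLs and fetch a response body. `BASE_URL` assumes HTTP mode with the Quick Start host and port; `endpoint`, `report_url`, and `fetch` are hypothetical names, and because the response formats are not documented here, `fetch` returns raw bytes rather than parsing them.

```python
from urllib.parse import quote, urljoin
from urllib.request import Request, urlopen

# Assumption: the server runs in HTTP mode with the Quick Start settings.
BASE_URL = "http://localhost:7799"


def endpoint(path: str, base_url: str = BASE_URL) -> str:
    """Join a documented path such as "/health" onto the server base URL."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def report_url(filename: str, base_url: str = BASE_URL) -> str:
    """URL for GET or DELETE /api/reports/{filename}, filename percent-encoded."""
    return endpoint("api/reports/" + quote(filename, safe=""), base_url)


def fetch(url: str, method: str = "GET", timeout: float = 10.0) -> bytes:
    """Send a request to a running server and return the raw response body."""
    request = Request(url, method=method)
    with urlopen(request, timeout=timeout) as response:
        return response.read()


# Example calls against a running server (not executed here):
#   fetch(endpoint("/health"))
#   fetch(endpoint("/api/reports"))
#   fetch(report_url("some-report.html"), method="DELETE")
```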

Configuration

Environment Variables

# Server Configuration
MCP_TRANSPORT=streamable-http  # stdio or streamable-http
MCP_HOST=0.0.0.0           # HTTP mode listen address
MCP_PORT=7799              # HTTP mode port
LOG_LEVEL=INFO             # Log level

# AWS S3 Configuration (Optional)
# Not needed if AWS CLI is configured or running on EC2 with appropriate IAM role
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx
AWS_DEFAULT_REGION=us-east-1

# Cache Configuration
CACHE_ENABLED=true
CACHE_TTL=300

# Default Data Source
DEFAULT_SOURCE_TYPE=s3  # s3, url, or local
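
To illustrate how these variables fit together, here is a minimal sketch that reads them into a typed settings object. It is not the server's actual loader: `ServerSettings` and `load_settings` are hypothetical names, and the fallback defaults are assumptions taken from the sample values above (with `stdio` assumed for the transport). Only the three documented source types are validated.

```python
import os
from dataclasses import dataclass
from typing import Mapping

TRUE_VALUES = {"1", "true", "yes", "on"}
SOURCE_TYPES = {"s3", "url", "local"}  # the values listed for DEFAULT_SOURCE_TYPE


@dataclass(frozen=True)
class ServerSettings:
    transport: str
    host: str
    port: int
    log_level: str
    cache_enabled: bool
    cache_ttl: int
    default_source_type: str


def load_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Read the documented variables; every default here is an assumption."""
    source_type = env.get("DEFAULT_SOURCE_TYPE", "s3")
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"DEFAULT_SOURCE_TYPE must be one of {sorted(SOURCE_TYPES)}")
    return ServerSettings(
        transport=env.get("MCP_TRANSPORT", "stdio"),
        host=env.get("MCP_HOST", "0.0.0.0"),
        port=int(env.get("MCP_PORT", "7799")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        cache_enabled=env.get("CACHE_ENABLED", "true").strip().lower() in TRUE_VALUES,
        cache_ttl=int(env.get("CACHE_TTL", "300")),
        default_source_type=source_type,
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests, for example `load_settings({"MCP_PORT": "9090"})`.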

Log Format

Logs contain detailed debugging information:

2025-12-05 10:30:45 - INFO     - [server.py:243:generate_report] - spark-eventlog-mcp - Generating html report

Format: Timestamp - Level - [Filename:Line:Function] - Logger Name - Message
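
The layout above maps directly onto a standard `logging` format string. The sketch below shows one format string that reproduces the sample line, plus a regular expression that splits such a line back into its fields. Both are reconstructions from the sample, not the project's actual configuration in `helpers.py`; `LOG_FORMAT`, `LOG_LINE`, `format_record`, and `parse_line` are hypothetical names.

```python
import logging
import re
from typing import Any, Dict

# Level is left-aligned and padded to 8 characters, as in the sample line.
LOG_FORMAT = (
    "%(asctime)s - %(levelname)-8s - "
    "[%(filename)s:%(lineno)d:%(funcName)s] - %(name)s - %(message)s"
)
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"

LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - "
    r"(?P<level>[A-Z]+)\s+- "
    r"\[(?P<filename>[^:\]]+):(?P<line>\d+):(?P<function>[^\]]+)\] - "
    r"(?P<logger>\S+) - (?P<message>.*)$"
)


def format_record(record: logging.LogRecord) -> str:
    """Render a log record in the documented layout."""
    return logging.Formatter(fmt=LOG_FORMAT, datefmt=DATE_FORMAT).format(record)


def parse_line(line: str) -> Dict[str, Any]:
    """Split one log line into its named fields; the line number becomes an int."""
    match = LOG_LINE.match(line)
    if match is None:
        raise ValueError(f"not in the expected log format: {line!r}")
    fields: Dict[str, Any] = match.groupdict()
    fields["line"] = int(fields["line"])
    return fields
```

A parser like this is handy for filtering server logs by function or level when debugging a slow analysis run.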

Data Source Support

S3

{
    "source_type": "s3",
    "path": "s3://bucket-name/path/to/eventlogs/"
}

HTTP URL

{
    "source_type": "url",
    "path": "https://example.com/eventlog.zip"
}

Local File

{
    "source_type": "local",
    "path": "/path/to/local/eventlog.zip"
}
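
Since the three descriptors differ only in `source_type`, the type can usually be inferred from the path itself. The following sketch does that by inspecting the URL scheme. `describe_source` is a hypothetical helper for callers, not part of the server, and treating every path without an `s3`, `http`, or `https` scheme as local is an assumption.

```python
from typing import Dict
from urllib.parse import urlparse


def describe_source(path: str) -> Dict[str, str]:
    """Build a data source descriptor, inferring source_type from the path."""
    scheme = urlparse(path).scheme.lower()
    if scheme == "s3":
        source_type = "s3"
    elif scheme in ("http", "https"):
        source_type = "url"
    else:
        # Assumption: anything else is a path on the local filesystem.
        source_type = "local"
    return {"source_type": source_type, "path": path}
```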

Report Features

Generated HTML reports include:

  • 📊 Application Overview (task counts, success rate, duration)
  • 💻 Executor Resource Usage Distribution
  • 🔄 Shuffle Performance Analysis
  • ⚖️ Data Skew Detection
  • 💡 Intelligent Optimization Recommendations
  • 📈 Interactive Visualizations

Troubleshooting

Port Already in Use

# Change port
MCP_PORT=9090 uv run python start.py
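
Before choosing a replacement port, it can help to check whether a candidate is free. A minimal sketch using the standard `socket` module follows; `port_is_free` is a hypothetical helper, and checking `127.0.0.1` by default is an assumption that should match whatever `MCP_HOST` is set to.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": another process holds the port.
            return False
        return True
```

For example, `port_is_free(7799)` returning False suggests another process is already listening on the default port.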

Missing Dependencies

# Reinstall dependencies
uv pip install -e .

AWS Credentials Issues

# Check AWS configuration
aws configure list

# Or configure in .env
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx

Debug Logging

# Enable DEBUG logs
LOG_LEVEL=DEBUG uv run python start.py

Tech Stack

  • FastMCP 2.0: MCP protocol support
  • FastAPI: RESTful API framework
  • Pydantic: Data validation and serialization
  • Plotly: Interactive charts
  • boto3: AWS S3 integration
  • aiofiles: Async file operations

Development

# Clone repository
git clone <repository-url>
cd spark-eventlog-mcp

# Install development dependencies
uv pip install -e .

# MCP Inspector - stdio mode
MCP_TRANSPORT="stdio" npx @modelcontextprotocol/inspector uv run python start.py

# MCP Inspector - HTTP mode (start the server in one terminal, run the inspector in another)
MCP_TRANSPORT="streamable-http" uv run python start.py
npx @modelcontextprotocol/inspector --cli http://localhost:7799/mcp --transport http --method tools/list

Support

  • Documentation: See the API documentation served at /docs
  • Issues: Submit GitHub Issues
  • Reference: FastMCP Documentation


Source Distribution

spark_eventlog_mcp-0.1.0.tar.gz (54.8 kB)

Built Distribution

spark_eventlog_mcp-0.1.0-py3-none-any.whl (53.6 kB)

File details

Details for the file spark_eventlog_mcp-0.1.0.tar.gz.

File metadata

  • Download URL: spark_eventlog_mcp-0.1.0.tar.gz
  • Upload date:
  • Size: 54.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for spark_eventlog_mcp-0.1.0.tar.gz
Algorithm Hash digest
SHA256 0db1877f17772d94f6e2579f27d8600e9a264dd0ec1562fa320fa86c39d38b18
MD5 bb7682e6c55ecfeafebbb0d289b3afc7
BLAKE2b-256 a1aeafdf2b67e557eaf295973d07029d80fb44e4436c2129816ca85cdfc05abc

Provenance

The following attestation bundles were made for spark_eventlog_mcp-0.1.0.tar.gz:

Publisher: publish-to-pypi.yml on yhyyz/spark-eventlog-mcp

File details

Details for the file spark_eventlog_mcp-0.1.0-py3-none-any.whl.

File hashes

Hashes for spark_eventlog_mcp-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 87e0250b9e8b84ecb62068b5d1c99d7a6686d6ef02e538b0819897d6f385c399
MD5 1069d0eb047519fa0e24f9c5e8c367f1
BLAKE2b-256 c5fb59799c35e7857093d84205247aa8aaa8dc7ec3b69984c3173c8c714af254

Provenance

The following attestation bundles were made for spark_eventlog_mcp-0.1.0-py3-none-any.whl:

Publisher: publish-to-pypi.yml on yhyyz/spark-eventlog-mcp
