A comprehensive Spark event log analysis MCP server with performance monitoring and optimization recommendations
Spark EventLog MCP Server
Chinese version | English
A comprehensive Spark event log analysis MCP server built on FastMCP 2.0 and FastAPI, providing in-depth performance analysis, resource monitoring, and optimization recommendations.
Features
- 🌐 FastMCP & FastAPI Integration: MCP protocol support through FastMCP, with analysis report APIs served by FastAPI
- 📊 Performance Analysis: Shuffle analysis, resource utilization monitoring, task execution analysis
- 📈 Visual Reports: Auto-generated interactive HTML reports with direct browser access
- ☁️ Multiple Data Sources: Support for S3, HTTP URLs, and local files
- 💡 Intelligent Optimization: Automated optimization recommendations based on analysis results
Quick Start
MCP Client Integration
stdio Mode (Recommended for Local Development)
{
  "mcpServers": {
    "spark-eventlog": {
      "command": "uv",
      "args": ["run", "python", "/path/to/spark-eventlog-mcp/start.py"],
      "env": {
        "MCP_TRANSPORT": "stdio"
      }
    }
  }
}
HTTP Mode
1. Start HTTP Server:
export MCP_TRANSPORT=streamable-http
export MCP_HOST=localhost
export MCP_PORT=7799
uv run python start.py
2. Configure Remote MCP:
{
"mcpServers": {
"spark-eventlog": {
"url": "http://localhost:7799/mcp",
"type": "http"
}
}
}
3. Access Services:
- API Documentation: http://localhost:7799/docs
- Health Check: http://localhost:7799/health
- Reports List: http://localhost:7799/api/reports
- MCP Endpoint: http://localhost:7799/mcp
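A quick way to confirm the HTTP server is reachable is to request the health endpoint. The sketch below uses only the Python standard library. The helper names `service_urls` and `check_health` are illustrative and not part of this project, and because the body of the `/health` response is not described here, only the HTTP status code is checked.

```python
from urllib.error import URLError
from urllib.request import urlopen


def service_urls(host: str = "localhost", port: int = 7799) -> dict[str, str]:
    """Build the service URLs listed above for a given host and port."""
    base = f"http://{host}:{port}"
    return {
        "docs": f"{base}/docs",
        "health": f"{base}/health",
        "reports": f"{base}/api/reports",
        "mcp": f"{base}/mcp",
    }


def check_health(host: str = "localhost", port: int = 7799, timeout: float = 3.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    try:
        with urlopen(service_urls(host, port)["health"], timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, or a non-2xx response
        return False
```

Calling `check_health()` after `uv run python start.py` should return `True` once the server has finished starting in HTTP mode.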
Analysis Examples
Project Structure
spark-eventlog-mcp/
├── src/spark_eventlog_mcp/
│ ├── server.py # FastAPI + MCP integrated server
│ ├── core/
│ │ └── mature_data_loader.py # Data loader (S3/URL/Local)
│ ├── tools/
│ │ ├── mature_analyzer.py # Event log analyzer
│ │ └── mature_report_generator.py # HTML report generator
│ ├── models/
│ │ ├── schemas.py # Pydantic data models
│ │ └── mature_models.py # Analysis result models
│ └── utils/
│ └── helpers.py # Utility functions and logging config
├── report_data/ # Generated reports storage
├── start.py # Launch script
├── README.md # This file (English)
└── README_zh.md # Chinese version
MCP Tools
| Tool Name | Description |
|---|---|
| `parse_eventlog` | Parse event logs (S3/URL/Local) |
| `analyze_performance` | Execute performance analysis |
| `generate_report` | Generate visual reports |
| `get_optimization_suggestions` | Get optimization recommendations |
| `get_analysis_status` | Query current analysis status |
| `clear_session` | Clear session cache |
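For clients that speak MCP directly, a tool invocation is a JSON-RPC 2.0 `tools/call` request. The sketch below builds one for `parse_eventlog`. The argument names `source_type` and `path` are borrowed from the data source examples in this README; whether the tool accepts them in exactly this shape is an assumption. A real session also needs the MCP initialization handshake first, so this only illustrates the message shape.

```python
import json
from itertools import count

_request_ids = count(1)  # simple incrementing JSON-RPC request ids


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Assumed argument names, taken from the data source examples in this README.
request = build_tool_call(
    "parse_eventlog",
    {"source_type": "local", "path": "/path/to/local/eventlog.zip"},
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library would construct and send these messages; building one by hand is mainly useful for debugging.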
RESTful API Endpoints
Basic Endpoints
- `GET /`: Service information
- `GET /health`: Health check
- `GET /docs`: API documentation (Swagger UI)
Report Management
- `GET /api/reports`: List all reports
- `GET /api/reports/{filename}`: View HTML report
- `GET /reports/{filename}`: Direct access to report files
- `DELETE /api/reports/{filename}`: Delete report
MCP Tool Calls
- `POST /mcp`: MCP protocol endpoint
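The report management endpoints can be scripted with the standard library. This is a minimal sketch: `report_request` and `list_reports` are hypothetical helper names, the base URL assumes the default HTTP mode address, and decoding the `/api/reports` body as JSON is an assumption about the response format.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:7799"  # assumes the default HTTP mode address


def report_request(filename: str, method: str = "GET", base_url: str = BASE_URL) -> Request:
    """Build a request for /api/reports/{filename} (GET to view, DELETE to remove)."""
    # quote() percent-encodes characters such as spaces in the file name
    return Request(f"{base_url}/api/reports/{quote(filename)}", method=method)


def list_reports(base_url: str = BASE_URL, timeout: float = 5.0):
    """Call GET /api/reports and decode the body as JSON (assumed response format)."""
    with urlopen(f"{base_url}/api/reports", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Passing the result of `report_request(name, method="DELETE")` to `urlopen` would issue the delete call against a running server.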
Configuration
Environment Variables
# Server Configuration
MCP_TRANSPORT=streamable-http # stdio or streamable-http
MCP_HOST=0.0.0.0 # HTTP mode listen address
MCP_PORT=7799 # HTTP mode port
LOG_LEVEL=INFO # Log level
# AWS S3 Configuration (Optional)
# Not needed if AWS CLI is configured or running on EC2 with appropriate IAM role
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx
AWS_DEFAULT_REGION=us-east-1
# Cache Configuration
CACHE_ENABLED=true
CACHE_TTL=300
# Default Data Source
DEFAULT_SOURCE_TYPE=s3 # s3, url, or local
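To make the variables above concrete, here is a sketch of how they could be read into a typed settings object. `ServerSettings` and `load_settings` are hypothetical names, not this project's code. The fallback values mirror the sample values shown above (with `stdio` picked for the transport), which is an assumption rather than the server's confirmed defaults, as is treating `CACHE_TTL` as seconds.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSettings:
    transport: str
    host: str
    port: int
    log_level: str
    cache_enabled: bool
    cache_ttl: int
    default_source_type: str


def load_settings(env=None) -> ServerSettings:
    """Read the documented variables from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    truthy = {"1", "true", "yes"}
    return ServerSettings(
        transport=env.get("MCP_TRANSPORT", "stdio"),
        host=env.get("MCP_HOST", "0.0.0.0"),
        port=int(env.get("MCP_PORT", "7799")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        cache_enabled=env.get("CACHE_ENABLED", "true").strip().lower() in truthy,
        cache_ttl=int(env.get("CACHE_TTL", "300")),  # unit assumed to be seconds
        default_source_type=env.get("DEFAULT_SOURCE_TYPE", "s3"),
    )
```

Accepting a plain mapping instead of always reading `os.environ` keeps the loader easy to exercise with different configurations.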
Log Format
Logs contain detailed debugging information:
2025-12-05 10:30:45 - INFO - [server.py:243:generate_report] - spark-eventlog-mcp - Generating html report
Format: Timestamp - Level - [Filename:Line:Function] - Logger Name - Message
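Because the format is fixed, log lines can be split into fields with a regular expression. The pattern below is a sketch derived from the single sample line above, so real logs with other timestamp precisions or multi-line messages may need adjustments; `parse_log_line` is an illustrative helper name.

```python
import re

LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - "
    r"(?P<level>[A-Z]+) - "
    r"\[(?P<filename>[^:\]]+):(?P<line>\d+):(?P<function>[^\]]+)\] - "
    r"(?P<logger>\S+) - "
    r"(?P<message>.*)$"
)


def parse_log_line(line: str):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["line"] = int(record["line"])  # line number as an integer
    return record


sample = (
    "2025-12-05 10:30:45 - INFO - [server.py:243:generate_report] - "
    "spark-eventlog-mcp - Generating html report"
)
print(parse_log_line(sample))
```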
Data Source Support
S3
{
"source_type": "s3",
"path": "s3://bucket-name/path/to/eventlogs/"
}
HTTP URL
{
"source_type": "url",
"path": "https://example.com/eventlog.zip"
}
Local File
{
"source_type": "local",
"path": "/path/to/local/eventlog.zip"
}
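The three descriptors above differ only in `source_type`, which can usually be inferred from the path itself. The helper below is a hypothetical convenience, not part of this project: it treats `s3://` paths as `s3`, `http://` and `https://` paths as `url`, and everything else as `local`.

```python
from urllib.parse import urlparse


def build_source(path: str) -> dict:
    """Infer `source_type` from the path's scheme and return the descriptor."""
    scheme = urlparse(path).scheme.lower()
    if scheme == "s3":
        source_type = "s3"
    elif scheme in ("http", "https"):
        source_type = "url"
    else:
        # No scheme (or an unrecognized one): treat as a local file path
        source_type = "local"
    return {"source_type": source_type, "path": path}


print(build_source("s3://bucket-name/path/to/eventlogs/"))
```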
Report Features
Generated HTML reports include:
- 📊 Application Overview (task counts, success rate, duration)
- 💻 Executor Resource Usage Distribution
- 🔄 Shuffle Performance Analysis
- ⚖️ Data Skew Detection
- 💡 Intelligent Optimization Recommendations
- 📈 Interactive Visualizations
Troubleshooting
Port Already in Use
# Change port
MCP_PORT=9090 uv run python start.py
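Before switching ports, it can help to confirm that the configured one is really taken. The sketch below tries a TCP connection on the loopback interface; `port_in_use` is an illustrative helper, and it only detects listeners reachable at the given host.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0
```

For example, `port_in_use(7799)` returning `True` while the server is stopped suggests another process holds the port.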
Missing Dependencies
# Reinstall dependencies
uv pip install -e .
AWS Credentials Issues
# Check AWS configuration
aws configure list
# Or configure in .env
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx
Debug Logging
# Enable DEBUG logs
LOG_LEVEL=DEBUG uv run python start.py
Tech Stack
- FastMCP 2.0: MCP protocol support
- FastAPI: RESTful API framework
- Pydantic: Data validation and serialization
- Plotly: Interactive charts
- boto3: AWS S3 integration
- aiofiles: Async file operations
Development
# Clone repository
git clone <repository-url>
cd spark-eventlog-mcp
# Install development dependencies
uv pip install -e .
# MCP Inspector - stdio mode
MCP_TRANSPORT="stdio" npx @modelcontextprotocol/inspector uv run python start.py
# MCP Inspector - HTTP mode
MCP_TRANSPORT="streamable-http" uv run python start.py
npx @modelcontextprotocol/inspector --cli http://localhost:7799/mcp --transport http --method tools/list
Support
- Documentation: Check the `/docs` API documentation
- Issues: Submit GitHub Issues
- Reference: FastMCP Documentation