A comprehensive Model Context Protocol (MCP) server that enables LLMs to execute pandas code through a standardized workflow for data analysis and visualization.
Project description
Pandas-MCP Server
A comprehensive Model Context Protocol (MCP) server that enables LLMs to execute pandas code through a standardized workflow for data analysis and visualization.
๐ฏ MCP Server Overview
The Pandas-MCP Server is designed as a Model Context Protocol (MCP) server that provides LLMs with powerful data processing capabilities. MCP is a standardized protocol that allows AI models to interact with external tools and services in a secure, structured way.
๐ ๏ธ Installation
Prerequisites
- Python 3.8+
- pip package manager
- Git (for cloning the repository)
Step 1: Clone the Repository
git clone <repository-url>
cd pandas-mcp-server
Step 2: Install Dependencies
pip install -r requirements.txt
Step 3: Verify Installation
# Test the CLI interface
python cli.py
# Or test the MCP server directly
python server.py
Dependencies
- pandas>=2.0.0 - Data manipulation and analysis
- fastmcp>=1.0.0 - MCP server framework
- chardet>=5.0.0 - Character encoding detection
- psutil - System monitoring for memory optimization
Claude Desktop Configuration
Add this configuration to your Claude Desktop settings:
{
"mcpServers": {
"pandas-server": {
"type": "stdio",
"command": "python",
"args": ["/path/to/your/pandas-mcp-server/server.py"]
}
}
}
Note: Replace /path/to/your/pandas-mcp-server/server.py with the actual path where you cloned the repository.
Example paths:
- Windows:
"C:\\Users\\YourName\\pandas-mcp-server\\server.py" - macOS/Linux:
"/home/username/pandas-mcp-server/server.py"
Configuration File Location
- Windows:
%APPDATA%\Claude\claude_desktop_config.json - macOS:
~/Library/Application Support/Claude/claude_desktop_config.json - Linux:
~/.config/Claude/claude_desktop_config.json
Verification
After configuration, restart Claude Desktop. The server should appear in the MCP tools list with three available tools:
read_metadata_tool- File analysisrun_pandas_code_tool- Code executiongenerate_chartjs_tool- Chart generation
๐ Workflow
The pandas MCP server follows a structured three-step workflow for data analysis and visualization:
Step 1: Read File Metadata
LLM calls read_metadata_tool to understand the file structure:
- Extract file type, size, encoding, and column information
- Get data types, sample values, and statistical summaries
- Receive data quality warnings and suggested operations
- Understand the dataset structure before processing
Step 2: Execute Pandas Operations
LLM calls run_pandas_code_tool based on metadata analysis:
- Formulate pandas operations using the understood file structure
- Execute data processing, filtering, aggregation, or analysis
- Receive results in DataFrame, Series, or dictionary format
- Get optimized output with memory management
Step 3: Generate Visualizations
LLM calls generate_chartjs_tool to create interactive charts:
- Transform processed data into Chart.js compatible format
- Generate interactive HTML charts with customization controls
- Create bar, line, or pie charts based on data characteristics
- Output responsive visualizations for analysis presentation
๐ MCP Server Tools
The server exposes three main tools for LLM integration:
1. read_metadata_tool - File Analysis
Extract comprehensive metadata from Excel and CSV files including:
- File type, size, encoding, and structure
- Column names, data types, and sample values
- Statistical summaries (null counts, unique values, min/max/mean)
- Data quality warnings and suggested operations
- Memory-optimized processing for large files
MCP Tool Usage:
{
"tool": "read_metadata_tool",
"args": {
"file_path": "/path/to/sales_data.xlsx"
}
}
2. run_pandas_code_tool - Secure Code Execution
Execute pandas operations with:
- Security filtering against malicious code
- Memory optimization for large datasets
- Comprehensive error handling and debugging
- Support for DataFrame, Series, and dictionary results
MCP Tool Usage:
{
"tool": "run_pandas_code_tool",
"args": {
"code": "import pandas as pd\ndf = pd.read_excel('/path/to/data.xlsx')\nresult = df.groupby('Region')['Sales'].sum()"
}
}
3. generate_chartjs_tool - Interactive Visualizations
Generate interactive charts with Chart.js:
- Bar charts - For categorical comparisons
- Line charts - For trend analysis
- Pie charts - For proportional data
- Interactive HTML templates with customization controls
MCP Tool Usage:
{
"tool": "generate_chartjs_tool",
"args": {
"data": {
"columns": [
{
"name": "Region",
"type": "string",
"examples": ["North", "South", "East", "West"]
},
{
"name": "Sales",
"type": "number",
"examples": [15000, 12000, 18000, 9000]
}
]
},
"chart_types": ["bar"],
"title": "Sales by Region"
}
}
๐ Usage
CLI Interface (Testing & Development)
The cli.py provides a convenient command-line interface for testing the MCP server functionality without requiring an MCP client:
Interactive Mode
python cli.py
Launches a guided menu system with:
- Step-by-step workflow guidance
- Automatic input validation
- Clear error messages
- Support for file paths with spaces
Command-Line Mode
# Read metadata
python cli.py metadata data.xlsx
# Execute pandas code
python cli.py execute analysis.py
# Generate charts
python cli.py chart data.json --type bar --title "Sales Analysis"
๐ Code Logic & Architecture
Core Components
1. Server Architecture (server.py)
- FastMCP Integration: Uses FastMCP framework for MCP protocol implementation
- Logging System: Unified logging with rotation and memory tracking
- Tool Registration: Exposes three main tools with proper error handling
- Memory Monitoring: Tracks memory usage before/after operations
2. Metadata Processing (core/metadata.py)
Key Logic:
- File validation (existence, size limits)
- Encoding detection for CSV files
- Memory-optimized data processing (100-row samples)
- Comprehensive statistical analysis
- Data quality assessment and warnings
Memory Optimization:
- Uses
categorydtype for string columns with low cardinality - Converts float64 to float32 for memory efficiency
- Processes only first 100 rows for metadata extraction
- Forces garbage collection after processing
3. Code Execution (core/execution.py)
Security Features:
- Blacklist filtering for dangerous operations
- Sandboxed execution environment
- Output capture and error handling
- Memory monitoring for large results
Execution Flow:
- Security check against BLACKLIST patterns
- Syntax validation through compilation
- Code execution in isolated environment
- Result formatting and memory optimization
- Output capture and error reporting
4. Chart Generation (core/visualization.py)
Architecture:
- Template-based HTML generation
- Chart.js integration via CDN
- Interactive controls for customization
- Automatic file naming and organization
Chart Types:
- Bar Charts: Categorical data with bar width and Y-axis controls
- Line Charts: Trend analysis with line styling options
- Pie Charts: Proportional data with donut hole and percentage display
5. Chart Generators (core/chart_generators/)
Base Class (base.py):
- Abstract base class for all chart generators
- Template management and file I/O
- Common chart configuration
Specific Generators:
BarChartGenerator: Bar charts with interactive controlsLineChartGenerator: Line charts with tension and stylingPieChartGenerator: Pie charts with legend and percentage options
Data Flow Architecture
User Input โ Security Check โ Processing โ Result โ Output
โ โ โ โ โ
CLI/MCP โ BLACKLIST โ Memory Opt โ Format โ Log/Display
Memory Management Strategy
- Chunked Processing: Large files processed in 10KB chunks
- Type Optimization: Automatic dtype conversion (float64โfloat32, objectโcategory)
- Limited Sampling: Only first 100 rows processed for metadata
- Garbage Collection: Forced cleanup after major operations
- Memory Monitoring: PSutil integration for tracking usage
๐ Project Structure
pandas-mcp-server/
โโโ server.py # MCP server implementation
โโโ cli.py # CLI interface for testing
โโโ requirements.txt # Python dependencies
โโโ core/ # Core functionality
โ โโโ config.py # Configuration and constants
โ โโโ data_types.py # Data type utilities
โ โโโ metadata.py # File metadata extraction
โ โโโ execution.py # Pandas code execution
โ โโโ visualization.py # Chart generation orchestration
โ โโโ chart_generators/ # Chart-specific implementations
โ โโโ __init__.py
โ โโโ base.py # Base chart generator
โ โโโ bar.py # Bar chart generator
โ โโโ line.py # Line chart generator
โ โโโ pie.py # Pie chart generator
โ โโโ templates/ # HTML templates for charts
โโโ charts/ # Generated chart files
โโโ logs/ # Application logs
โโโ csv_metadata_format.md # CSV metadata documentation
โโโ test_*.py # Test files
๐ง Configuration
Core Configuration (core/config.py)
- MAX_FILE_SIZE: 100MB file size limit
- BLACKLIST: Security restrictions for code execution
- CHARTS_DIR: Directory for generated charts
- Logging: Comprehensive logging with rotation
Security Features
- Code execution sandboxing
- Blacklisted operations (file system, network, eval)
- Memory usage monitoring
- Input validation and sanitization
๐ Chart Generation Details
Template System
Charts are generated using HTML templates with:
- Chart.js integration via CDN
- Interactive controls for customization
- Responsive design with mobile support
- Real-time parameter adjustment
Chart Types
Bar Charts
- Interactive controls for bar width and Y-axis scaling
- Responsive design with zoom capabilities
- Data labels and tooltips
- Multiple dataset support
Line Charts
- Multiple line series support
- Adjustable line tension and styling
- Point size and style customization
- Stepped line options
Pie Charts
- Interactive donut hole adjustment
- Percentage/value toggle display
- Legend positioning and styling
- Border width and color controls
๐งช Testing
Running Tests
# Test metadata extraction
python test_metadata.py
# Test pandas code execution
python test_execution.py
# Test chart generation
python test_generate_barchart.py
# Test all chart types
python test_generate_pyecharts.py
Test Data Requirements
- Excel files (.xlsx) with multiple sheets
- CSV files with various encodings
- JSON files with structured data for chart generation
๐ Performance Optimization
Memory Management
- Chunked processing for large files
- Automatic garbage collection
- Memory usage logging
- Dataset size limits
File Processing
- Optimized dtype inference
- Category encoding for string columns
- Float32 precision for numeric data
- Streaming CSV reading
๐ Logging
Log Structure
- mcp_server.log: Main application log
- memory_usage: Memory consumption tracking
- metadata: File processing details
Log Levels
- DEBUG: Detailed processing information
- INFO: General operation status
- WARNING: Non-critical issues
- ERROR: Processing failures
๐ Troubleshooting
Common Issues
MCP Connection Issues
- Verify server path in Claude Desktop configuration
- Check Python environment and dependencies
- Ensure server.py is executable
- Review MCP server logs for connection errors
File Not Found
- Verify file path is absolute
- Check file permissions
- Ensure file exists before processing
Memory Issues
- Reduce file size or use chunked processing
- Monitor memory usage in logs
- Consider data sampling for large datasets
Chart Generation Errors
- Verify data structure matches expected format
- Check for required columns (string + numeric)
- Ensure Chart.js CDN accessibility
Debug Mode
Enable debug logging by setting environment variable:
export LOG_LEVEL=DEBUG
python server.py
๐ค Contributing
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
๐ Additional Documentation
- CSV Metadata Format: See
csv_metadata_format.mdfor detailed CSV processing documentation
๐ License
This project is licensed under the MIT License - see the LICENSE file for details.
๐ Support
For issues and questions:
- Check the troubleshooting section
- Review log files in the
logs/directory - Open an issue on GitHub with reproduction steps
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file iflow_mcp_pandas_mcp_server-0.1.0.tar.gz.
File metadata
- Download URL: iflow_mcp_pandas_mcp_server-0.1.0.tar.gz
- Upload date:
- Size: 24.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
42590b237cfa7fa0d6958cd32208a45ba62a34cee7e5e8e51c3153bfd9c99343
|
|
| MD5 |
45107724f0b28813134a832c5d1b7b83
|
|
| BLAKE2b-256 |
b6c9ea9e592fe10c4d5ef47d07991d373e1287c3753ea439c7a12efe631ecd13
|
File details
Details for the file iflow_mcp_pandas_mcp_server-0.1.0-py3-none-any.whl.
File metadata
- Download URL: iflow_mcp_pandas_mcp_server-0.1.0-py3-none-any.whl
- Upload date:
- Size: 48.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d03f89e771f760f32646dcc39e0032a76699ba485afaa477a59016090d467f87
|
|
| MD5 |
dbab48903acedda34e6ccc47fbdd308c
|
|
| BLAKE2b-256 |
c19afe08a8539840e9a56b0637ec64d51d5f6d2521dc882745604642532e8394
|