DataBeak: MCP server for comprehensive CSV file operations with pandas-based tools
DataBeak
AI-Powered CSV Processing via Model Context Protocol
Transform how AI assistants work with CSV data. DataBeak provides 40+ specialized tools for data manipulation, analysis, and validation through the Model Context Protocol (MCP).
Features
- 🔄 Complete Data Operations - Load, transform, analyze, and export CSV data
- 📊 Advanced Analytics - Statistics, correlations, outlier detection, data profiling
- ✅ Data Validation - Schema validation, quality scoring, anomaly detection
- 🎯 Stateless Design - Clean MCP architecture with external context management
- ⚡ High Performance - Handles large datasets with streaming and chunking
- 🔒 Session Management - Multi-user support with isolated sessions
- 🌟 Code Quality - Zero ruff violations, 100% mypy compliance, perfect MCP documentation standards, comprehensive test coverage
Getting Started
The fastest way to use DataBeak is with uvx (no installation required):
For Claude Desktop
Add this to your MCP Settings file:
{
  "mcpServers": {
    "databeak": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/jonpspri/databeak.git",
        "databeak"
      ]
    }
  }
}
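If you would rather script this change than edit the file by hand, the entry above can be merged into an existing settings file. The following is a minimal sketch, not part of DataBeak: the helper names `add_databeak_server` and `update_settings_file` are hypothetical, and the settings file path is left as a parameter because it depends on your client and operating system.

```python
import copy
import json
from pathlib import Path

# The server entry from the configuration shown above.
DATABEAK_ENTRY = {
    "command": "uvx",
    "args": ["--from", "git+https://github.com/jonpspri/databeak.git", "databeak"],
}


def add_databeak_server(config: dict) -> dict:
    """Return a copy of `config` with the DataBeak entry under "mcpServers".

    Servers that are already configured are preserved. (Hypothetical helper.)
    """
    merged = copy.deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers["databeak"] = copy.deepcopy(DATABEAK_ENTRY)
    return merged


def update_settings_file(path: Path) -> None:
    """Read the settings file at `path` (if present), add DataBeak, write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    updated = add_databeak_server(config)
    path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```

Merging rather than overwriting keeps any other MCP servers you have already configured.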
For Other AI Clients
DataBeak works with Continue, Cline, Windsurf, and Zed. See the installation guide for specific configuration examples.
HTTP Mode (Advanced)
For HTTP-based AI clients or custom deployments:
# Run in HTTP mode
uv run databeak --transport http --host 0.0.0.0 --port 8000
# Access server at http://localhost:8000/mcp
# Health check at http://localhost:8000/health
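Once the server is running in HTTP mode, a quick way to confirm it is reachable is to poll the health endpoint. The sketch below uses only the Python standard library. It assumes the health endpoint answers with HTTP 200 when the server is up; the response body is not described here, so it is ignored. The function names are hypothetical.

```python
import http.client
import urllib.request


def endpoint_urls(host: str = "localhost", port: int = 8000) -> dict:
    """Build the MCP and health URLs for a server started with --host/--port."""
    base = f"http://{host}:{port}"
    return {"mcp": f"{base}/mcp", "health": f"{base}/health"}


def is_healthy(health_url: str, timeout: float = 2.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200 (assumed success code)."""
    try:
        with urllib.request.urlopen(health_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or a malformed reply.
        return False


if __name__ == "__main__":
    urls = endpoint_urls()
    print(urls["health"], "reachable:", is_healthy(urls["health"]))
```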
Quick Test
Once configured, ask your AI assistant:
"Load a CSV file and show me basic statistics"
"Remove duplicate rows and export as Excel"
"Find outliers in the price column"
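To make these prompts concrete, here is roughly what two of them ask for, sketched with only the Python standard library. This is not DataBeak's implementation: DataBeak is pandas-based and its outlier method is not described here. The 1.5 × IQR rule, the function names, and the sample data are all assumptions chosen for illustration.

```python
import csv
import io
import statistics


def drop_duplicate_rows(rows: list) -> list:
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def iqr_outliers(values: list, k: float = 1.5) -> list:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule, not DataBeak's)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Made-up sample data: one duplicated row and one suspicious price.
SAMPLE_CSV = """item,price
pen,10
ink,12
pad,11
cap,13
ink,12
tag,12
kit,95
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
unique_rows = drop_duplicate_rows(rows)
prices = [float(row["price"]) for row in unique_rows]

print(f"{len(rows)} rows loaded, {len(unique_rows)} after removing duplicates")
print("outliers in price:", iqr_outliers(prices))
```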
Documentation
- Installation Guide - Setup for all AI clients
- Quick Start Tutorial - Learn in 10 minutes
- API Reference - All 40+ tools documented
- Architecture - Technical details
Environment Variables
| Variable | Default | Description |
|---|---|---|
| DATABEAK_MAX_FILE_SIZE_MB | 1024 | Maximum file size (MB) |
| DATABEAK_CSV_HISTORY_DIR | "." | History storage location |
| DATABEAK_SESSION_TIMEOUT | 3600 | Session timeout (seconds) |
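The variables and defaults in this table can be read as follows. This is a minimal sketch for illustration only; it is not DataBeak's own settings code, and the `Settings` class and `load_settings` function are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Defaults mirror the table above (hypothetical container, not DataBeak's)."""

    max_file_size_mb: int = 1024
    csv_history_dir: str = "."
    session_timeout: int = 3600  # seconds


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the DATABEAK_* variables from `env`, falling back to the defaults."""
    env = os.environ if env is None else env
    defaults = Settings()
    return Settings(
        max_file_size_mb=int(env.get("DATABEAK_MAX_FILE_SIZE_MB", defaults.max_file_size_mb)),
        csv_history_dir=env.get("DATABEAK_CSV_HISTORY_DIR", defaults.csv_history_dir),
        session_timeout=int(env.get("DATABEAK_SESSION_TIMEOUT", defaults.session_timeout)),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary instead of `os.environ` makes the lookup easy to try without changing your shell environment.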
Known Limitations
DataBeak is designed for interactive CSV processing with AI assistants. Be aware of these constraints:
- File Size: Maximum 1024MB per file (configurable via DATABEAK_MAX_FILE_SIZE_MB)
- Session Management: Maximum 100 concurrent sessions, 1-hour timeout (configurable)
- Memory: Large datasets may require significant memory; monitor with the system_info tool
- CSV Dialects: Assumes standard CSV format; complex dialects may require pre-processing
- Concurrency: Single-threaded processing per session; parallel sessions supported
- Data Types: Automatic type inference; complex types may need explicit conversion
- URL Loading: HTTPS only; blocks loopback and private network addresses (127.0.0.1, 192.168.x.x, 10.x.x.x) for security
For production deployments with larger datasets, consider adjusting environment variables and monitoring resource usage.
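The URL loading restriction can be pictured with a short check like the one below. This is a sketch of the general technique, not DataBeak's actual validation code: the function name `is_url_allowed` is hypothetical, and the sketch only inspects IP literals. A complete check would also resolve hostnames and test every resolved address.

```python
import ipaddress
from urllib.parse import urlparse


def is_url_allowed(url: str) -> bool:
    """Allow HTTPS URLs only, and reject loopback, private, and link-local IPs."""
    parts = urlparse(url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    try:
        address = ipaddress.ip_address(parts.hostname)
    except ValueError:
        # Not an IP literal. This sketch does no DNS lookup (an assumption made
        # to keep it offline); it only rejects the obvious "localhost" name.
        return parts.hostname != "localhost"
    return not (address.is_private or address.is_loopback or address.is_link_local)


if __name__ == "__main__":
    for candidate in (
        "https://example.com/data.csv",
        "http://example.com/data.csv",
        "https://192.168.1.10/data.csv",
    ):
        print(candidate, "allowed:", is_url_allowed(candidate))
```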
Contributing
We welcome contributions! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes with tests
- Run quality checks: uv run -m pytest
- Submit a pull request
Note: All changes must go through pull requests. Direct commits to main are blocked by pre-commit hooks.
Development
# Setup development environment
git clone https://github.com/jonpspri/databeak.git
cd databeak
uv sync
# Run the server locally
uv run databeak
# Run tests
uv run -m pytest tests/unit/ # Unit tests (primary)
uv run -m pytest # All tests
# Run quality checks
uv run ruff check
uv run mypy src/databeak/
Testing Structure
DataBeak implements comprehensive unit and integration testing:
- Unit Tests (tests/unit/) - 940+ fast, isolated module tests
- Integration Tests (tests/integration/) - 43 FastMCP Client-based protocol tests across 7 test files
- E2E Tests (tests/e2e/) - Planned: Complete workflow validation
Test Execution:
uv run pytest -n auto tests/unit/ # Run unit tests (940+ tests)
uv run pytest -n auto tests/integration/ # Run integration tests (43 tests)
uv run pytest -n auto --cov=src/databeak # Run with coverage analysis
See Testing Guide for comprehensive testing details.
License
Apache 2.0 - see LICENSE file.
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: jonpspri.github.io/databeak
File details
Details for the file databeak-0.1.1.tar.gz.
File metadata
- Download URL: databeak-0.1.1.tar.gz
- Upload date:
- Size: 189.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5fda72ff4cdf23ab34f9a72a173df77e31792e4f3087d56287b5360a54a9495e |
| MD5 | 3a1fc7ba3e9cc9156bd2e3ec9b8c8af8 |
| BLAKE2b-256 | 1d1e717a611b33c91c9271f29ec92cd8c25712642ab5468cd6343382bf12f061 |
Provenance
The following attestation bundles were made for databeak-0.1.1.tar.gz:
Publisher: publish.yml on jonpspri/databeak
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: databeak-0.1.1.tar.gz
- Subject digest: 5fda72ff4cdf23ab34f9a72a173df77e31792e4f3087d56287b5360a54a9495e
- Sigstore transparency entry: 585365315
- Sigstore integration time:
- Permalink: jonpspri/databeak@392c90180345ba79ae866f0599072ddd9cfa0cd4
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/jonpspri
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@392c90180345ba79ae866f0599072ddd9cfa0cd4
- Trigger Event: release
File details
Details for the file databeak-0.1.1-py3-none-any.whl.
File metadata
- Download URL: databeak-0.1.1-py3-none-any.whl
- Upload date:
- Size: 102.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 92bce9c8dca9a93618c36893c8bd2e79c03aaeb1979eab46aa276ddbaae28ea8 |
| MD5 | da6edf9af9f04e7b08083a679b08505e |
| BLAKE2b-256 | 0d54b3e8151064d7bda00378fa644b2f113d70f76edd2acd9d3c9f7d81b1f49c |
Provenance
The following attestation bundles were made for databeak-0.1.1-py3-none-any.whl:
Publisher: publish.yml on jonpspri/databeak
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: databeak-0.1.1-py3-none-any.whl
- Subject digest: 92bce9c8dca9a93618c36893c8bd2e79c03aaeb1979eab46aa276ddbaae28ea8
- Sigstore transparency entry: 585365334
- Sigstore integration time:
- Permalink: jonpspri/databeak@392c90180345ba79ae866f0599072ddd9cfa0cd4
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/jonpspri
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@392c90180345ba79ae866f0599072ddd9cfa0cd4
- Trigger Event: release