DataBeak: MCP server for comprehensive CSV file operations with pandas-based tools
DataBeak
AI-Powered CSV Processing via Model Context Protocol
Transform how AI assistants work with CSV data. DataBeak provides 40+ specialized tools for data manipulation, analysis, and validation through the Model Context Protocol (MCP).
Features
- 🔄 Complete Data Operations - Load, transform, analyze, and export CSV data
- 📊 Advanced Analytics - Statistics, correlations, outlier detection, data profiling
- ✅ Data Validation - Schema validation, quality scoring, anomaly detection
- 🎯 Stateless Design - Clean MCP architecture with external context management
- ⚡ High Performance - Handles large datasets with streaming and chunking
- 🔒 Session Management - Multi-user support with isolated sessions
- 🌟 Code Quality - Zero ruff violations, 100% mypy compliance, tool documentation that follows MCP standards, comprehensive test coverage
Getting Started
The fastest way to use DataBeak is with uvx (no installation required):
For Claude Desktop
Add this to your MCP Settings file:
{
  "mcpServers": {
    "databeak": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/jonpspri/databeak.git",
        "databeak"
      ]
    }
  }
}
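If the settings file already lists other MCP servers, the entry can be merged in rather than pasted over them. The following is a minimal sketch, assuming the file is plain JSON with a top-level `mcpServers` object as shown above. The helper names (`merge_databeak_entry`, `update_settings_file`) are hypothetical and not part of DataBeak, and the location of the settings file depends on your client.

```python
import json
from pathlib import Path

# Server entry copied from the configuration shown above.
DATABEAK_ENTRY = {
    "command": "uvx",
    "args": ["--from", "git+https://github.com/jonpspri/databeak.git", "databeak"],
}


def merge_databeak_entry(settings: dict) -> dict:
    """Return a copy of settings with a databeak entry under mcpServers."""
    merged = dict(settings)
    servers = dict(merged.get("mcpServers", {}))  # keep any existing servers
    servers["databeak"] = dict(DATABEAK_ENTRY)
    merged["mcpServers"] = servers
    return merged


def update_settings_file(path: Path) -> None:
    """Read, merge, and rewrite a settings file; the caller supplies the path."""
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    existing = json.loads(text) if text.strip() else {}
    updated = merge_databeak_entry(existing)
    path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```

Calling `update_settings_file(Path(...))` with the path to your client's settings file leaves other server entries untouched and replaces only the `databeak` key.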
For Other AI Clients
DataBeak works with Continue, Cline, Windsurf, and Zed. See the installation guide for specific configuration examples.
Docker Deployment
For production deployments or HTTP-based AI clients:
# Quick start with Docker Compose
docker-compose up -d
# Access server at http://localhost:8000/mcp
# Health check at http://localhost:8000/health
See the Docker deployment guide for production configuration, scaling, and security considerations.
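To confirm the container is up before pointing a client at it, the health endpoint listed above can be polled from a script. This is a rough sketch using only the standard library. It assumes that a healthy server answers `GET /health` with HTTP 200; the response body format is not documented here, so the sketch ignores it. The function name `is_healthy` is hypothetical.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str = "http://localhost:8000", timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200.

    Assumption: status 200 means healthy. The response body is not inspected.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx status.
        return False


if __name__ == "__main__":
    print("healthy" if is_healthy() else "not reachable")
```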
Quick Test
Once configured, ask your AI assistant:
"Load a CSV file and show me basic statistics"
"Remove duplicate rows and export as Excel"
"Find outliers in the price column"
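For a sense of what these prompts ask for, here is a small standalone sketch of the same operations: loading a CSV, dropping duplicate rows, computing basic statistics, and flagging outliers. It is not DataBeak's implementation. DataBeak is pandas-based, while this sketch uses only the standard library so it runs anywhere, and the interquartile-range rule is one common outlier method chosen here as an assumption. The sample data and function names are made up for illustration, and Excel export is left out because it needs a third-party library.

```python
import csv
import io
import statistics

# Hypothetical sample data; the row "d,12" appears twice on purpose.
SAMPLE_CSV = """item,price
a,10
b,11
c,11
d,12
d,12
e,12
f,12
g,13
h,95
"""


def load_rows(text: str) -> list[dict]:
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def drop_duplicates(rows: list[dict]) -> list[dict]:
    """Keep the first occurrence of each fully identical row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    rows = drop_duplicates(load_rows(SAMPLE_CSV))
    prices = [float(row["price"]) for row in rows]
    print("rows:", len(rows))
    print("mean:", statistics.mean(prices), "median:", statistics.median(prices))
    print("outliers:", iqr_outliers(prices))
```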
Documentation
- Installation Guide - Setup for all AI clients
- Quick Start Tutorial - Learn in 10 minutes
- API Reference - All 40+ tools documented
- Architecture - Technical details
Environment Variables
| Variable | Default | Description |
|---|---|---|
| DATABEAK_MAX_FILE_SIZE_MB | 1024 | Maximum file size (MB) |
| DATABEAK_CSV_HISTORY_DIR | "." | History storage location |
| DATABEAK_SESSION_TIMEOUT | 3600 | Session timeout (seconds) |
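The sketch below shows one way a launcher script could read these variables with the documented defaults. It illustrates the variable names and default values from the table only. How DataBeak itself parses them is not shown here, and `DataBeakSettings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DataBeakSettings:
    """Hypothetical container; defaults mirror the table above."""
    max_file_size_mb: int = 1024
    csv_history_dir: str = "."
    session_timeout: int = 3600  # seconds


def load_settings(environ=None) -> DataBeakSettings:
    """Read the three variables, falling back to the documented defaults."""
    env = os.environ if environ is None else environ
    return DataBeakSettings(
        max_file_size_mb=int(env.get("DATABEAK_MAX_FILE_SIZE_MB", "1024")),
        csv_history_dir=env.get("DATABEAK_CSV_HISTORY_DIR", "."),
        session_timeout=int(env.get("DATABEAK_SESSION_TIMEOUT", "3600")),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test, for example `load_settings({"DATABEAK_SESSION_TIMEOUT": "600"})`.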
Contributing
We welcome contributions! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes with tests
- Run quality checks: uv run -m pytest
- Submit a pull request
Note: All changes must go through pull requests. Direct commits to main are blocked by pre-commit hooks.
Development
# Setup development environment
git clone https://github.com/jonpspri/databeak.git
cd databeak
uv sync
# Run the server locally
uv run databeak
# Run tests
uv run -m pytest tests/unit/ # Unit tests (primary)
uv run -m pytest # All tests
# Run quality checks
uv run ruff check
uv run mypy src/databeak/
Testing Structure
DataBeak currently relies on comprehensive unit testing; integration and E2E tests are planned:
- Unit Tests (tests/unit/) - Fast, isolated module tests (current focus)
- Integration Tests (tests/integration/) - Future: FastMCP Client-based testing
- E2E Tests (tests/e2e/) - Future: Complete workflow validation
Current Test Execution:
uv run pytest -n auto tests/unit/ # Run unit tests (primary)
uv run pytest -n auto --cov=src/databeak # Run with coverage analysis
See Testing Guide for comprehensive testing details.
License
Apache 2.0 - see LICENSE file.
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: jonpspri.github.io/databeak
File details
Details for the file databeak-0.0.4.tar.gz.
File metadata
- Download URL: databeak-0.0.4.tar.gz
- Upload date:
- Size: 206.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3585a3834d23c6af75e92ae361a35ccfeda28f802662c117e03ae7e8eec9693a |
| MD5 | 133870e9730a7caf9e4fc4ac039de196 |
| BLAKE2b-256 | af31a7b1b6d61d2ad5d22b18260099fb0bb84a8496a97e1254cc0f4407652b77 |
Provenance
The following attestation bundles were made for databeak-0.0.4.tar.gz:
Publisher: publish.yml on jonpspri/databeak
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: databeak-0.0.4.tar.gz
- Subject digest: 3585a3834d23c6af75e92ae361a35ccfeda28f802662c117e03ae7e8eec9693a
- Sigstore transparency entry: 572169894
- Sigstore integration time:
- Permalink: jonpspri/databeak@d2de17bead9d4935fe7498ae12077ff1e63077d3
- Branch / Tag: refs/tags/v0.0.4
- Owner: https://github.com/jonpspri
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d2de17bead9d4935fe7498ae12077ff1e63077d3
- Trigger Event: release
File details
Details for the file databeak-0.0.4-py3-none-any.whl.
File metadata
- Download URL: databeak-0.0.4-py3-none-any.whl
- Upload date:
- Size: 115.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 31f788cc08c012d36d58b333c688001934b25e0522623b3923da938c89fbf7da |
| MD5 | 0893fc8dd28379d48d70670b1c750934 |
| BLAKE2b-256 | e9209956b64744598b8854c532490e098b86caa0f4b4461bf03e70f7ea172052 |
Provenance
The following attestation bundles were made for databeak-0.0.4-py3-none-any.whl:
Publisher: publish.yml on jonpspri/databeak
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: databeak-0.0.4-py3-none-any.whl
- Subject digest: 31f788cc08c012d36d58b333c688001934b25e0522623b3923da938c89fbf7da
- Sigstore transparency entry: 572169950
- Sigstore integration time:
- Permalink: jonpspri/databeak@d2de17bead9d4935fe7498ae12077ff1e63077d3
- Branch / Tag: refs/tags/v0.0.4
- Owner: https://github.com/jonpspri
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d2de17bead9d4935fe7498ae12077ff1e63077d3
- Trigger Event: release