
DataBeak: MCP server for comprehensive CSV file operations with pandas-based tools


DataBeak

AI-Powered CSV Processing via Model Context Protocol

Transform how AI assistants work with CSV data. DataBeak provides 40+ specialized tools for data manipulation, analysis, and validation through the Model Context Protocol (MCP).

Features

  • 🔄 Complete Data Operations - Load, transform, analyze, and export CSV data
  • 📊 Advanced Analytics - Statistics, correlations, outlier detection, data profiling
  • Data Validation - Schema validation, quality scoring, anomaly detection
  • 🎯 Stateless Design - Clean MCP architecture with external context management
  • High Performance - Handles large datasets with streaming and chunking
  • 🔒 Session Management - Multi-user support with isolated sessions
  • 🌟 Code Quality - Zero ruff violations, 100% mypy compliance, adherence to MCP documentation standards, comprehensive test coverage

Getting Started

The fastest way to use DataBeak is with uvx, which runs the server without a separate installation step.

For Claude Desktop

Add this to your MCP Settings file:

{
  "mcpServers": {
    "databeak": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/jonpspri/databeak.git",
        "databeak"
      ]
    }
  }
}
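If the settings file already lists other MCP servers, the entry has to be merged in rather than pasted over the whole file. The following is a minimal sketch of that merge, not part of DataBeak itself. The helper names `add_databeak` and `update_settings_file` are hypothetical, and the location of the settings file depends on your client, so it is passed in as a parameter.

```python
import json
from pathlib import Path

# Server entry copied from the configuration shown above.
DATABEAK_ENTRY = {
    "command": "uvx",
    "args": ["--from", "git+https://github.com/jonpspri/databeak.git", "databeak"],
}


def add_databeak(settings: dict) -> dict:
    """Return a copy of `settings` with the databeak server registered.

    Hypothetical helper: existing entries under "mcpServers" are preserved.
    """
    merged = dict(settings)
    servers = dict(merged.get("mcpServers", {}))
    servers["databeak"] = DATABEAK_ENTRY
    merged["mcpServers"] = servers
    return merged


def update_settings_file(path: Path) -> None:
    """Merge the entry into the JSON file at `path`, creating it if absent."""
    text = path.read_text() if path.exists() else ""
    settings = json.loads(text) if text.strip() else {}
    path.write_text(json.dumps(add_databeak(settings), indent=2) + "\n")
```

Back up the settings file before calling `update_settings_file` on it, and restart the client afterwards so it rereads the configuration.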

For Other AI Clients

DataBeak works with Continue, Cline, Windsurf, and Zed. See the installation guide for specific configuration examples.

Docker Deployment

For production deployments or HTTP-based AI clients:

# Quick start with Docker Compose
docker-compose up -d

# Access server at http://localhost:8000/mcp
# Health check at http://localhost:8000/health

See the Docker deployment guide for production configuration, scaling, and security considerations.
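After the container starts, a short script can confirm that the server is reachable. This is a sketch under assumptions: it treats an HTTP 200 from the health endpoint as "healthy" and does not inspect the response body, whose format is not described here. The function names are hypothetical.

```python
import urllib.error
import urllib.request


def server_urls(host: str = "localhost", port: int = 8000) -> dict:
    """Build the two endpoint URLs listed above for a given host and port."""
    base = f"http://{host}:{port}"
    return {"mcp": f"{base}/mcp", "health": f"{base}/health"}


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200.

    Assumption: a 200 status means healthy. Connection failures, timeouts,
    and HTTP error statuses all count as unhealthy.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    urls = server_urls()
    print("healthy" if is_healthy(urls["health"]) else "not reachable")
```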

Quick Test

Once configured, ask your AI assistant:

"Load a CSV file and show me basic statistics"
"Remove duplicate rows and export as Excel"
"Find outliers in the price column"
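To make the first and third prompts concrete, here is a rough illustration of the kind of computation they describe. It is not DataBeak's implementation: the tool's actual outlier method is not stated here, so the common 1.5 × IQR rule is used as an assumption, and the standard library stands in for pandas to keep the sketch self-contained. The sample data and function names are invented for illustration.

```python
import csv
import io
import statistics

# Invented sample data with one obviously unusual price.
SAMPLE_CSV = """item,price
a,10
b,12
c,11
d,13
e,12
f,95
"""


def load_column(csv_text: str, column: str) -> list[float]:
    """Parse CSV text and return one column as floats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[column]) for row in reader]


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


prices = load_column(SAMPLE_CSV, "price")
print("mean:", statistics.mean(prices), "median:", statistics.median(prices))
print("outliers:", iqr_outliers(prices))
```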

Documentation

📚 Complete Documentation

Environment Variables

Variable                   Default  Description
DATABEAK_MAX_FILE_SIZE_MB  1024     Maximum file size (MB)
DATABEAK_CSV_HISTORY_DIR   "."      History storage location
DATABEAK_SESSION_TIMEOUT   3600     Session timeout (seconds)
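The sketch below shows how these variables and their defaults could be read from a launcher or wrapper script. It does not reflect how DataBeak parses them internally; the `DataBeakEnv` class and `read_env` function are hypothetical names, and only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DataBeakEnv:
    """Hypothetical container mirroring the documented defaults."""

    max_file_size_mb: int = 1024
    csv_history_dir: str = "."
    session_timeout: int = 3600  # seconds


def read_env(environ=None) -> DataBeakEnv:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if environ is None else environ
    return DataBeakEnv(
        max_file_size_mb=int(env.get("DATABEAK_MAX_FILE_SIZE_MB", 1024)),
        csv_history_dir=env.get("DATABEAK_CSV_HISTORY_DIR", "."),
        session_timeout=int(env.get("DATABEAK_SESSION_TIMEOUT", 3600)),
    )
```

Passing a plain dictionary instead of `os.environ`, for example `read_env({"DATABEAK_SESSION_TIMEOUT": "600"})`, makes it easy to check an override without touching the real environment.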

Contributing

We welcome contributions! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes with tests
  4. Run the tests and quality checks: uv run -m pytest, uv run ruff check, uv run mypy src/databeak/
  5. Submit a pull request

Note: All changes must go through pull requests. Direct commits to main are blocked by pre-commit hooks.

Development

# Setup development environment
git clone https://github.com/jonpspri/databeak.git
cd databeak
uv sync

# Run the server locally
uv run databeak

# Run tests
uv run -m pytest tests/unit/          # Unit tests (primary)
uv run -m pytest                      # All tests

# Run quality checks
uv run ruff check
uv run mypy src/databeak/

Testing Structure

DataBeak currently focuses on comprehensive unit testing; integration and E2E tests are planned:

  • Unit Tests (tests/unit/) - Fast, isolated module tests (current focus)
  • Integration Tests (tests/integration/) - Future: FastMCP Client-based testing
  • E2E Tests (tests/e2e/) - Future: Complete workflow validation

Current Test Execution:

uv run pytest -n auto tests/unit/          # Run unit tests (primary)
uv run pytest -n auto --cov=src/databeak   # Run with coverage analysis

See Testing Guide for comprehensive testing details.

License

Apache 2.0 - see LICENSE file.


