
DataBeak: MCP server for comprehensive CSV file operations with pandas-based tools


DataBeak

AI-Powered CSV Processing via Model Context Protocol

Transform how AI assistants work with CSV data. DataBeak provides 40+ specialized tools for data manipulation, analysis, and validation through the Model Context Protocol (MCP).

Features

  • 🔄 Complete Data Operations - Load, transform, analyze, and export CSV data
  • 📊 Advanced Analytics - Statistics, correlations, outlier detection, data profiling
  • Data Validation - Schema validation, quality scoring, anomaly detection
  • 💾 Auto-Save & History - Never lose work, with configurable auto-save strategies and undo/redo
  • High Performance - Handles large datasets with streaming and chunking
  • 🔒 Session Management - Multi-user support with isolated sessions
  • 🌟 Production Quality - Zero ruff violations, 100% mypy compliance, comprehensive test coverage
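The undo/redo half of the history feature can be pictured as a pair of snapshot stacks. The sketch below assumes that every operation stores a full copy of the DataFrame. The `History` class and its `apply`, `undo` and `redo` methods are hypothetical names, not DataBeak's API, and the real implementation may store diffs or persist history to disk instead.

```python
import pandas as pd


class History:
    """Hypothetical snapshot-based undo/redo stack for a single DataFrame."""

    def __init__(self, df: pd.DataFrame) -> None:
        self._undo: list[pd.DataFrame] = []
        self._redo: list[pd.DataFrame] = []
        self.current = df.copy()

    def apply(self, new_df: pd.DataFrame) -> None:
        # Push the current state, move forward, and invalidate any redo branch.
        self._undo.append(self.current)
        self._redo.clear()
        self.current = new_df.copy()

    def undo(self) -> bool:
        # Returns False when there is nothing to undo.
        if not self._undo:
            return False
        self._redo.append(self.current)
        self.current = self._undo.pop()
        return True

    def redo(self) -> bool:
        # Returns False when there is nothing to redo.
        if not self._redo:
            return False
        self._undo.append(self.current)
        self.current = self._redo.pop()
        return True


h = History(pd.DataFrame({"a": [1, 1, 2]}))
h.apply(h.current.drop_duplicates())
print(len(h.current))  # 2
h.undo()
print(len(h.current))  # 3
h.redo()
print(len(h.current))  # 2
```

Clearing the redo stack on every new edit is the usual convention: once you branch off from an undone state, the old "future" no longer applies.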

Getting Started

The fastest way to use DataBeak is with uvx, which fetches and runs it on demand (no separate installation step required):

For Claude Desktop

Add this to your Claude Desktop MCP settings file (claude_desktop_config.json):

{
  "mcpServers": {
    "databeak": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/jonpspri/databeak.git",
        "databeak"
      ]
    }
  }
}
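If the settings file already lists other servers, the databeak entry has to be merged into the existing mcpServers object rather than overwriting the file. Here is a minimal Python sketch of that merge. `add_databeak` is a hypothetical helper, and because the file's location depends on the client and platform, the path is passed in rather than assumed.

```python
import json
from pathlib import Path

# The same server entry as the JSON above, as a Python dict.
DATABEAK_ENTRY = {
    "command": "uvx",
    "args": [
        "--from",
        "git+https://github.com/jonpspri/databeak.git",
        "databeak",
    ],
}


def add_databeak(config_path: Path) -> dict:
    """Add or replace the "databeak" server while keeping any other configured servers."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    config.setdefault("mcpServers", {})["databeak"] = DATABEAK_ENTRY
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call it with the path to your client's settings file, then restart the client, since MCP clients typically read this file at startup.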

For Other AI Clients

DataBeak works with Continue, Cline, Windsurf, and Zed. See the installation guide for specific configuration examples.

Quick Test

Once configured, ask your AI assistant:

"Load a CSV file and show me basic statistics"
"Remove duplicate rows and export as Excel"
"Find outliers in the price column"
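DataBeak's tools are pandas-based, so each of these prompts corresponds roughly to a few pandas calls. The sketch below shows plain pandas equivalents for illustration only. It does not call DataBeak. The 1.5×IQR outlier rule is an assumed method and not necessarily the one DataBeak's outlier tool uses.

```python
import io

import pandas as pd

# A small inline CSV stands in for a real file.
CSV = """id,price
1,10.0
2,12.0
2,12.0
3,11.0
4,13.0
5,250.0
"""

df = pd.read_csv(io.StringIO(CSV))

# "Show me basic statistics": count, mean, std, min, quartiles, max.
print(df.describe())

# "Remove duplicate rows": keeps the first occurrence of each fully duplicated row.
deduped = df.drop_duplicates()

# "Find outliers in the price column" with the 1.5 * IQR rule (an assumption).
q1, q3 = deduped["price"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = (deduped["price"] < q1 - 1.5 * iqr) | (deduped["price"] > q3 + 1.5 * iqr)
outliers = deduped[mask]
print(outliers)  # only the row with id 5 (price 250.0)
```

The Excel export step would be `deduped.to_excel(...)`, which needs an Excel writer engine such as openpyxl installed.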

Documentation

📚 Complete Documentation

Environment Variables

Variable                     Default   Description
DATABEAK_MAX_FILE_SIZE_MB    1024      Maximum file size (MB)
DATABEAK_CSV_HISTORY_DIR     "."       History storage directory
DATABEAK_SESSION_TIMEOUT     3600      Session timeout (seconds)
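These variables are read from the environment of the server process. The Python sketch below shows how the defaults in the table combine with overrides. `Settings` and `load_settings` are hypothetical names, and DataBeak's own configuration loader may validate or structure these values differently.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container mirroring the environment variable table."""

    max_file_size_mb: int
    csv_history_dir: str
    session_timeout_s: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read DATABEAK_* variables, falling back to the documented defaults."""
    if env is None:
        env = os.environ
    return Settings(
        max_file_size_mb=int(env.get("DATABEAK_MAX_FILE_SIZE_MB", "1024")),
        csv_history_dir=env.get("DATABEAK_CSV_HISTORY_DIR", "."),
        session_timeout_s=int(env.get("DATABEAK_SESSION_TIMEOUT", "3600")),
    )


# Override only the timeout; the other two settings keep their defaults.
s = load_settings({"DATABEAK_SESSION_TIMEOUT": "600"})
print(s)  # Settings(max_file_size_mb=1024, csv_history_dir='.', session_timeout_s=600)
```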

Contributing

We welcome contributions! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes with tests
  4. Run the tests and quality checks: uv run -m pytest, uv run ruff check, and uv run mypy
  5. Submit a pull request

Note: All changes must go through pull requests. Direct commits to main are blocked by pre-commit hooks.

Development

# Setup development environment
git clone https://github.com/jonpspri/databeak.git
cd databeak
uv sync

# Run the server locally
uv run databeak

# Run tests
uv run -m pytest tests/unit/          # Unit tests (primary)
uv run -m pytest                      # All tests

# Run quality checks
uv run ruff check
uv run mypy

Testing Structure

DataBeak's testing currently focuses on comprehensive unit tests, with integration and end-to-end (E2E) tests planned:

  • Unit Tests (tests/unit/) - Fast, isolated module tests (current focus)
  • Integration Tests (tests/integration/) - Future: FastMCP Client-based testing
  • E2E Tests (tests/e2e/) - Future: Complete workflow validation
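A test in tests/unit/ exercises one module in isolation, without starting the MCP server. The example below shows that shape in pytest style. `dedupe_rows` is a hypothetical stand-in defined inline so the file runs on its own. It is not a function from DataBeak's codebase.

```python
from typing import Optional

import pandas as pd


def dedupe_rows(df: pd.DataFrame, subset: Optional[list[str]] = None) -> pd.DataFrame:
    """Hypothetical helper: drop duplicate rows and renumber the index."""
    return df.drop_duplicates(subset=subset).reset_index(drop=True)


def test_dedupe_removes_exact_duplicates() -> None:
    df = pd.DataFrame({"id": [1, 1, 2], "price": [9.5, 9.5, 3.0]})
    result = dedupe_rows(df)
    assert result["id"].tolist() == [1, 2]
    assert list(result.index) == [0, 1]


def test_dedupe_with_subset_keeps_first() -> None:
    # Rows share an id but differ in price; the first occurrence wins.
    df = pd.DataFrame({"id": [1, 1], "price": [9.5, 8.0]})
    result = dedupe_rows(df, subset=["id"])
    assert result["price"].tolist() == [9.5]


def test_dedupe_empty_frame() -> None:
    df = pd.DataFrame({"id": []})
    assert len(dedupe_rows(df)) == 0
```

Saved under a hypothetical name such as tests/unit/test_dedupe.py, pytest would collect these functions automatically when you run uv run -m pytest tests/unit/.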

Current test commands (-n auto runs tests in parallel via the pytest-xdist plugin):

uv run pytest -n auto tests/unit/          # Run unit tests (primary)
uv run pytest -n auto --cov=src/databeak   # Run all tests with coverage analysis

See Testing Guide for comprehensive testing details.

License

Apache 2.0 - see the LICENSE file.

Support



Download files


Source Distribution

databeak-0.0.2.tar.gz (238.5 kB view details)

Uploaded Source

Built Distribution


databeak-0.0.2-py3-none-any.whl (133.6 kB view details)

Uploaded Python 3

File details

Details for the file databeak-0.0.2.tar.gz.

File metadata

  • Download URL: databeak-0.0.2.tar.gz
  • Upload date:
  • Size: 238.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for databeak-0.0.2.tar.gz
Algorithm     Hash digest
SHA256        b2dc7292bd2ac20cfa108087833f3d2b5e2d710ef05f001eb465ea5178a31008
MD5           83d053c46d362b60d2e6aa3be43e872a
BLAKE2b-256   d2b9981cef073f4bb1afce8b80b09c318474b254e3bf02b90602580b962c3859
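To check a downloaded file against its published SHA256 digest, compute the digest locally and compare the two. Here is a minimal sketch that uses only the Python standard library. The filename and expected digest are the source distribution values listed above, and the sketch assumes the file is in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    return sha256_of(path) == expected_hex.lower()


EXPECTED = "b2dc7292bd2ac20cfa108087833f3d2b5e2d710ef05f001eb465ea5178a31008"
# Usage: verify(Path("databeak-0.0.2.tar.gz"), EXPECTED)
```

pip can enforce the same check at install time with hash-pinned requirements (--hash=sha256:... entries together with --require-hashes).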


Provenance

The following attestation bundles were made for databeak-0.0.2.tar.gz:

Publisher: publish.yml on jonpspri/databeak

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file databeak-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: databeak-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 133.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for databeak-0.0.2-py3-none-any.whl
Algorithm     Hash digest
SHA256        b0f357b0406ef7d5505e7f459b7493c0fcd52be5f1f36ab437da0955a5ee01ff
MD5           194acfc6433c98075687fd9a662923b1
BLAKE2b-256   eb926e2318dd97d2b4bc6679532d2e7a781b40dbcf18f12063cbc84cea778c83


Provenance

The following attestation bundles were made for databeak-0.0.2-py3-none-any.whl:

Publisher: publish.yml on jonpspri/databeak

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
