DataBeak: MCP server for comprehensive CSV file operations with pandas-based tools

DataBeak

Tests codecov Python 3.12+ License Code style: ruff

AI-Powered CSV Processing via Model Context Protocol

Transform how AI assistants work with CSV data. DataBeak provides 40+ specialized tools for data manipulation, analysis, and validation through the Model Context Protocol (MCP).

Features

  • 🔄 Complete Data Operations - Load, transform, analyze, and export CSV data
  • 📊 Advanced Analytics - Statistics, correlations, outlier detection, data profiling
  • Data Validation - Schema validation, quality scoring, anomaly detection
  • 🎯 Stateless Design - Clean MCP architecture with external context management
  • High Performance - Handles large datasets with streaming and chunking
  • 🔒 Session Management - Multi-user support with isolated sessions
  • 🌟 Code Quality - Zero ruff violations, 100% mypy compliance, documentation that follows MCP standards, comprehensive test coverage

Getting Started

The fastest way to use DataBeak is with uvx, which runs the server without a separate installation step.

For Claude Desktop

Add this to your MCP Settings file:

{
  "mcpServers": {
    "databeak": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/jonpspri/databeak.git",
        "databeak"
      ]
    }
  }
}
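
If the settings file already lists other servers, the entry has to be merged in rather than pasted over the whole file. The following is a minimal sketch of that merge, not part of DataBeak itself. The names add_databeak and update_settings_file are hypothetical, and the location of the settings file is left to the caller because it depends on the client and platform.

```python
import json
from pathlib import Path

# Server entry copied from the configuration shown above.
DATABEAK_ENTRY = {
    "command": "uvx",
    "args": ["--from", "git+https://github.com/jonpspri/databeak.git", "databeak"],
}


def add_databeak(settings: dict) -> dict:
    """Return a copy of `settings` with the DataBeak entry under "mcpServers".

    Existing server entries are preserved; an existing "databeak" entry is replaced.
    """
    merged = dict(settings)
    servers = dict(merged.get("mcpServers", {}))
    servers["databeak"] = DATABEAK_ENTRY
    merged["mcpServers"] = servers
    return merged


def update_settings_file(path: Path) -> None:
    """Merge the entry into the JSON file at `path` (hypothetical helper).

    The file is created if it does not exist yet.
    """
    settings = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(add_databeak(settings), indent=2) + "\n", encoding="utf-8")
```

Calling update_settings_file(Path("path/to/your/settings.json")) leaves any other configured servers untouched. Restarting the client afterwards is usually needed for the change to take effect.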

For Other AI Clients

DataBeak works with Continue, Cline, Windsurf, and Zed. See the installation guide for specific configuration examples.

HTTP Mode (Advanced)

For HTTP-based AI clients or custom deployments:

# Run in HTTP mode
uv run databeak --transport http --host 0.0.0.0 --port 8000

# Access server at http://localhost:8000/mcp
# Health check at http://localhost:8000/health
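
Before pointing a client at the HTTP endpoint, it can help to confirm the server is reachable. The sketch below uses only the Python standard library. It assumes the health endpoint answers with HTTP 200 when the server is up; the source gives the URL but not the response format, so treat that as an assumption. The function names are illustrative.

```python
import urllib.request


def endpoint_urls(host: str = "localhost", port: int = 8000) -> dict[str, str]:
    """Build the MCP and health-check URLs for a server started in HTTP mode."""
    base = f"http://{host}:{port}"
    return {"mcp": f"{base}/mcp", "health": f"{base}/health"}


def is_healthy(health_url: str, timeout: float = 2.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200.

    Assumption: a 200 status means the server is ready. Connection errors,
    timeouts, and HTTP error statuses all count as "not healthy".
    """
    try:
        with urllib.request.urlopen(health_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # urllib.error.URLError and socket timeouts are OSError subclasses
        return False


if __name__ == "__main__":
    urls = endpoint_urls()
    state = "up" if is_healthy(urls["health"]) else "not reachable"
    print(f"{urls['health']}: {state}")
```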

Quick Test

Once configured, ask your AI assistant:

"Load a CSV file and show me basic statistics"
"Remove duplicate rows and export as Excel"
"Find outliers in the price column"

Documentation

📚 Complete Documentation

Environment Variables

Variable                    Default   Description
DATABEAK_MAX_FILE_SIZE_MB   1024      Maximum file size (MB)
DATABEAK_CSV_HISTORY_DIR    "."       History storage location
DATABEAK_SESSION_TIMEOUT    3600      Session timeout (seconds)

Known Limitations

DataBeak is designed for interactive CSV processing with AI assistants. Be aware of these constraints:

  • File Size: Maximum 1024MB per file (configurable via DATABEAK_MAX_FILE_SIZE_MB)
  • Session Management: Maximum 100 concurrent sessions; 1-hour timeout (configurable via DATABEAK_SESSION_TIMEOUT)
  • Memory: Large datasets may require significant memory; monitor with system_info tool
  • CSV Dialects: Assumes standard CSV format; complex dialects may require pre-processing
  • Concurrency: Single-threaded processing per session; parallel sessions supported
  • Data Types: Automatic type inference; complex types may need explicit conversion
  • URL Loading: HTTPS only; blocks loopback and private network addresses (e.g. 127.0.0.1, 192.168.x.x, 10.x.x.x) for security
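
For the CSV Dialects constraint above, one possible pre-processing step is to rewrite a non-standard file as plain comma-delimited CSV before loading it. The sketch below uses the standard library's csv.Sniffer to guess the delimiter. The normalize_csv name and the candidate delimiter set are assumptions for illustration, and this is not something DataBeak does for you.

```python
import csv
import io


def normalize_csv(text: str, sample_size: int = 4096) -> str:
    """Re-emit `text` as comma-delimited CSV with minimal quoting.

    The delimiter is guessed from the first `sample_size` characters.
    csv.Sniffer raises csv.Error if it cannot determine a delimiter.
    """
    dialect = csv.Sniffer().sniff(text[:sample_size], delimiters=";\t|,")
    rows = csv.reader(io.StringIO(text, newline=""), dialect)
    out = io.StringIO(newline="")
    # Fields that contain commas (e.g. decimal commas) are quoted by the writer.
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

Sniffing is a heuristic; for files with a known layout, passing the delimiter to csv.reader explicitly is more predictable.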

For production deployments with larger datasets, consider adjusting environment variables and monitoring resource usage.
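
The URL Loading constraint listed above can be approximated with a pre-flight check on the client side. This is a sketch under stated assumptions, not DataBeak's own validation: it handles IP-literal hosts and the name "localhost" only, and does not resolve DNS names, so a hostname that resolves to a private address would pass. The is_allowed_url name is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def is_allowed_url(url: str) -> bool:
    """Return True for HTTPS URLs whose host is not a loopback or private address."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    try:
        address = ipaddress.ip_address(parts.hostname)
    except ValueError:
        # Host is a DNS name, not an IP literal. A stricter check would
        # resolve it and test every returned address.
        return parts.hostname != "localhost"
    return not (address.is_private or address.is_loopback or address.is_link_local)
```

Running the check before asking the assistant to load a URL avoids a round trip that the server would reject anyway.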

Contributing

We welcome contributions! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes with tests
  4. Run the tests and quality checks: uv run -m pytest, uv run ruff check, uv run mypy src/databeak/
  5. Submit a pull request

Note: All changes must go through pull requests. Direct commits to main are blocked by pre-commit hooks.

Development

# Setup development environment
git clone https://github.com/jonpspri/databeak.git
cd databeak
uv sync

# Run the server locally
uv run databeak

# Run tests
uv run -m pytest tests/unit/          # Unit tests (primary)
uv run -m pytest                      # All tests

# Run quality checks
uv run ruff check
uv run mypy src/databeak/

Testing Structure

DataBeak implements comprehensive unit and integration testing:

  • Unit Tests (tests/unit/) - 940+ fast, isolated module tests
  • Integration Tests (tests/integration/) - 43 FastMCP Client-based protocol tests across 7 test files
  • E2E Tests (tests/e2e/) - Planned: Complete workflow validation

Test Execution:

uv run pytest -n auto tests/unit/          # Run unit tests (940+ tests)
uv run pytest -n auto tests/integration/   # Run integration tests (43 tests)
uv run pytest -n auto --cov=src/databeak   # Run with coverage analysis

See Testing Guide for comprehensive testing details.

License

Apache 2.0 - see LICENSE file.
