chdb-mcp

An MCP server for chDB, the in-process SQL OLAP engine powered by ClickHouse. Lets agents (Claude Desktop, Cursor, VS Code, Codex CLI, Cline, …) query Parquet, CSV, JSON, and pandas DataFrames with one tool — no separate server, no Docker.

Why chdb-mcp?

  • Full ClickHouse engine, in-process. 1000+ functions (windowFunnel, quantilesTDigest, geoToH3, the -If/-State/-Merge combinators), typed JSON with O(1) sub-column reads, native vectors, MergeTree storage.
  • Drop-in pandas API. import datastore as pd covers ~300 pandas-shaped methods compiled to ClickHouse SQL. v1.0 adds dataframe_query() for zero-copy Python(df).
  • ~80 formats and 12+ source connectors in core. Parquet, CSV, JSON, Avro, ORC, Arrow, Protobuf, plus s3(), mongodb(), postgresql(), mysql(), iceberg(), deltaLake() — no INSTALL/LOAD chain.
  • Federate to remote ClickHouse in one statement. (v0.5) remoteSecure('cluster:9440', 'db.table', ...) joins local Parquet with a production ClickHouse cluster in one optimised plan.
  • Same SQL as your warehouse. Copy-paste ClickHouse production queries into the agent prompt — no dialect bridge.

Install

pip install chdb-mcp

Connect

Claude Desktop — add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{ "mcpServers": { "chdb": { "command": "chdb-mcp" } } }
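Editing the file by hand can clobber servers that are already registered. A minimal sketch of merging the entry programmatically, assuming a hypothetical helper `add_chdb_server` (not shipped with chdb-mcp) that is handed the config path for your platform:

```python
import json
from pathlib import Path


def add_chdb_server(config_path: Path) -> dict:
    """Merge the chdb entry into an MCP client config, keeping other servers.

    Hypothetical helper for illustration; chdb-mcp does not provide it.
    """
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)

    # Only the "chdb" key is touched; existing servers stay as they are.
    servers = config.setdefault("mcpServers", {})
    servers["chdb"] = {"command": "chdb-mcp"}

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

On macOS this could be called as `add_chdb_server(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json")`; restart Claude Desktop afterwards so it rereads the file.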

Cursor / VS Code — same JSON in ~/.cursor/mcp.json etc.; one-click badges land in v0.2.

Codex CLI / Claude Code / Copilot / Droid — use the cross-IDE bundle chdb-agent-plugin.

Tools (v0.1)

| Tool | Description |
| --- | --- |
| query(sql, format) | Run any read-only SQL on the in-process session |
| list_databases() | Enumerate visible databases |
| list_tables(database) | List tables in a database |
| describe_table(database, table) | Column types for a table |
| query_file(path, sql, format) | Query a Parquet/CSV/JSON file via the {file} placeholder |
| get_sample_data(database, table, limit) | First N rows of a table |
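An MCP client invokes these tools with JSON-RPC `tools/call` requests. The sketch below builds such a request body, assuming a hypothetical helper `tool_call`; the tool and argument names come from the table above, while the request IDs and the example argument values are illustrative only.

```python
import json
from itertools import count

# Monotonic request IDs; any unique value per request would do.
_request_ids = count(1)


def tool_call(name: str, **arguments) -> str:
    """Serialise one MCP tools/call request (hypothetical helper)."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Example payloads an agent might send over the stdio transport.
list_request = tool_call("list_tables", database="default")
sample_request = tool_call(
    "get_sample_data", database="default", table="events", limit=5
)
```

In practice the MCP client library inside the agent host produces these messages; the sketch only shows what travels over the wire.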

Read-only by default — SET readonly=2 blocks INSERT/CREATE/DROP/ALTER while keeping file()/url()/s3() usable. Set CHDB_MCP_WRITE=1 to drop the guard. See Security model.

In query_file, {file} is replaced with file('path', 'format') before execution:

query_file(
    path="/data/sales.parquet",
    sql="SELECT region, sum(revenue) FROM {file} GROUP BY region",
    format="Parquet",
)
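The substitution can be pictured as a plain string replacement. This is a sketch under assumptions, not the server's actual code: `render_file_query` is a hypothetical name, and the quoting rule (backslash-escaping `\` and `'`) follows ClickHouse string-literal syntax.

```python
def render_file_query(sql: str, path: str, fmt: str) -> str:
    """Replace {file} with a file('path', 'format') table function call.

    Hypothetical illustration of the placeholder expansion described above.
    """
    if "{file}" not in sql:
        raise ValueError("sql must contain the {file} placeholder")

    def quote(value: str) -> str:
        # Escape backslashes first, then single quotes, so the value stays
        # inside one ClickHouse string literal.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    return sql.replace("{file}", f"file({quote(path)}, {quote(fmt)})")


rendered = render_file_query(
    "SELECT region, sum(revenue) FROM {file} GROUP BY region",
    "/data/sales.parquet",
    "Parquet",
)
```

For the call shown earlier, `rendered` is `SELECT region, sum(revenue) FROM file('/data/sales.parquet', 'Parquet') GROUP BY region`.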

Configuration

| Variable | Default | Effect |
| --- | --- | --- |
| CHDB_MCP_WRITE | unset | If 1, allows INSERT/CREATE/DROP/ALTER |
| CHDB_MCP_MAX_RESULT_BYTES | 1048576 | Per-tool result truncation threshold |
| CHDB_MCP_FILE_ALLOWLIST | empty | :-separated path prefixes for query_file(); symlinks resolved on both sides. Advisory; see Security model. |
| CHDB_MCP_SESSION_PATH | empty | Persistent session directory (default: ephemeral) |
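To make the defaults concrete, here is a sketch of how these variables could be parsed and how a byte threshold might be applied to a result. The `Settings` class, `load_settings`, `truncate_result`, and the `[truncated]` marker are all hypothetical; the real server's parsing and truncation format may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class Settings:
    write_enabled: bool
    max_result_bytes: int
    file_allowlist: Tuple[str, ...]
    session_path: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    prefixes = env.get("CHDB_MCP_FILE_ALLOWLIST", "").split(":")
    return Settings(
        write_enabled=env.get("CHDB_MCP_WRITE") == "1",
        max_result_bytes=int(env.get("CHDB_MCP_MAX_RESULT_BYTES", "1048576")),
        file_allowlist=tuple(p for p in prefixes if p),
        session_path=env.get("CHDB_MCP_SESSION_PATH") or None,
    )


def truncate_result(text: str, max_bytes: int) -> str:
    """Cut a result at max_bytes of UTF-8 and append an assumed marker."""
    raw = text.encode("utf-8")
    if len(raw) <= max_bytes:
        return text
    # errors="ignore" drops a multi-byte character split at the boundary.
    return raw[:max_bytes].decode("utf-8", errors="ignore") + "\n[truncated]"
```

Passing a plain dict instead of `os.environ` keeps the parser easy to exercise, for example `load_settings({"CHDB_MCP_WRITE": "1"})`.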

Security model

Protects against: accidental writes (readonly=2), runaway result sizes (per-tool truncation), SQL-identifier injection in list_tables / describe_table / get_sample_data arguments (whitelist regex + escaping).
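The identifier guard can be sketched as a whitelist check followed by backtick quoting. The exact pattern chdb-mcp uses is not stated here, so the regex, `quote_identifier`, and `sample_query` below are assumptions made for illustration.

```python
import re

# Assumed whitelist: ASCII letters, digits, underscore, not starting with a digit.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name: str) -> str:
    """Validate a database or table name, then wrap it in backticks."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"invalid identifier: {name!r}")
    # The whitelist already excludes backticks, so wrapping is enough here.
    return f"`{name}`"


def sample_query(database: str, table: str, limit: int = 10) -> str:
    """Build the SELECT a get_sample_data-style tool might run (hypothetical)."""
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        f"SELECT * FROM {quote_identifier(database)}."
        f"{quote_identifier(table)} LIMIT {limit}"
    )
```

With this shape, an argument such as `events; DROP TABLE x` is rejected before any SQL is assembled, independently of the readonly guard.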

Does NOT protect against:

  • Filesystem reach. CHDB_MCP_FILE_ALLOWLIST only guards query_file(); the query() tool accepts arbitrary SQL, and chDB exposes file() / url() / s3() / remote() directly. A determined caller bypasses the allowlist. Use OS-level isolation (macOS App Sandbox, Linux namespaces, Docker with a read-only mount) for real sandboxing.
  • SQL audit. Only the readonly guard — no allow/deny list of statements. Treat the agent as having full SELECT access to anything chDB can reach.
  • Resource limits. No memory / CPU / wall-clock caps in v0.1. Use ulimit / cgroups if needed.

For agents acting on untrusted input, run in a throwaway container.
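Short of a container, the ulimit-style caps mentioned under resource limits can be applied from a small Unix-only wrapper. This is a sketch, assuming a hypothetical `launch_limited` helper built on the standard `resource` module (unavailable on Windows). Note that RLIMIT_CPU caps CPU seconds, not wall-clock time.

```python
import resource
import subprocess


def launch_limited(argv, cpu_seconds, memory_bytes=None):
    """Start argv as a child process with CPU and optional address-space caps.

    Hypothetical helper for illustration; chdb-mcp ships no such launcher.
    """

    def apply_limits():
        # Runs in the child just before exec, so the parent keeps its limits.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        if memory_bytes is not None:
            resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))

    # stdin/stdout are inherited, so an MCP client speaking over stdio still
    # reaches the server through this wrapper.
    return subprocess.Popen(argv, preexec_fn=apply_limits)
```

A wrapper script could then call `launch_limited(["chdb-mcp"], cpu_seconds=600, memory_bytes=4 * 1024**3).wait()`; the numbers are placeholders, not recommendations.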

Roadmap

  • v0.5: query_remote_clickhouse() federation tool
  • v1.0: attach_file(), dataframe_query() (zero-copy Python(df)), HTTP/SSE transport with Bearer auth, .mcpb bundle for Claude Desktop one-click install

Troubleshooting

macOS: "Server disconnected" in Claude Desktop

If ~/Library/Logs/Claude/mcp-server-chdb.log shows PermissionError: Operation not permitted on pyvenv.cfg, your venv sits under a TCC-protected directory (~/Downloads, ~/Documents, ~/Desktop) — Claude Desktop subprocesses can't read those paths.

Fix: install outside those directories. The recommended option is uvx (zero-config, isolated under ~/.local/share/uv/):

{ "mcpServers": { "chdb": { "command": "uvx", "args": ["chdb-mcp"] } } }

Or build a venv yourself under ~/.local/share/chdb-mcp/.venv and point Claude Desktop at its chdb-mcp binary.
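A quick way to confirm the diagnosis is to test whether the venv path resolves under one of the three directories named above. The sketch assumes a hypothetical helper `tcc_protected_parent` and checks only those three folders; macOS may protect other locations as well.

```python
from pathlib import Path

# Directories named in this troubleshooting entry.
PROTECTED_DIRS = ("Downloads", "Documents", "Desktop")


def tcc_protected_parent(path, home=None):
    """Return the protected directory containing path, or None.

    Hypothetical diagnostic helper; not part of chdb-mcp.
    """
    home_dir = Path(home) if home is not None else Path.home()
    resolved = Path(path).expanduser().resolve()
    for name in PROTECTED_DIRS:
        root = (home_dir / name).resolve()
        if resolved == root or root in resolved.parents:
            return root
    return None
```

For example, `tcc_protected_parent("~/Downloads/proj/.venv")` returns the Downloads directory, while a path under `~/.local/share` returns `None`.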

query_file returns "path is not under any prefix"

The allowlist resolves symlinks on both sides (so /tmp matches /private/tmp on macOS). If you still hit this, compare the resolved form printed in the error with the output of python -c "from pathlib import Path; print(Path('YOUR_PATH').resolve())".
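The check itself can be reproduced offline. Below is a minimal sketch of a prefix test that resolves both the candidate path and each allowlist entry; `is_allowed` is a hypothetical name, and matching on whole path components (so `/data` does not match `/database`) is a choice of this sketch rather than documented server behaviour.

```python
from pathlib import Path


def is_allowed(path, prefixes) -> bool:
    """Return True if path resolves to a location under any resolved prefix."""
    target = Path(path).resolve()
    for prefix in prefixes:
        root = Path(prefix).resolve()
        # Compare resolved forms so a symlinked prefix and its real
        # directory are treated as the same location.
        if target == root or root in target.parents:
            return True
    return False
```

Calling `is_allowed("/tmp/sales.parquet", ["/private/tmp"])` on macOS shows whether the two sides resolve to the same location on your machine.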

"Cannot execute query in readonly mode"

SET readonly=2 blocks DDL/DML by design. Rewrite as a pure SELECT, or restart with CHDB_MCP_WRITE=1.

Per-server logs

~/Library/Logs/Claude/mcp-server-chdb.log   # startup diagnostics + stderr
~/Library/Logs/Claude/mcp.log                # all servers' JSON-RPC traffic

Development

git clone https://github.com/chdb-io/chdb-mcp && cd chdb-mcp
pip install -e ".[dev]"
pytest && ruff check src tests

License

Apache 2.0 — see LICENSE.
