MCP server for chDB — the in-process SQL OLAP engine powered by ClickHouse. Lets AI agents query Parquet, CSV, JSON, and pandas DataFrames with one tool.
chdb-mcp
An MCP server for chDB, the in-process SQL OLAP engine powered by ClickHouse. Lets agents (Claude Desktop, Cursor, VS Code, Codex CLI, Cline, …) query Parquet, CSV, JSON, and pandas DataFrames with one tool — no separate server, no Docker.
Why chdb-mcp?
- Full ClickHouse engine, in-process. 1000+ functions (`windowFunnel`, `quantilesTDigest`, `geoToH3`, the `-If`/`-State`/`-Merge` combinators), typed `JSON` with O(1) sub-column reads, native vectors, `MergeTree` storage.
- Drop-in pandas API. `import datastore as pd` covers ~300 pandas-shaped methods compiled to ClickHouse SQL. v1.0 adds `dataframe_query()` for zero-copy `Python(df)`.
- ~80 formats and 12+ source connectors in core. Parquet, CSV, JSON, Avro, ORC, Arrow, Protobuf, plus `s3()`, `mongodb()`, `postgresql()`, `mysql()`, `iceberg()`, `deltaLake()`, with no `INSTALL`/`LOAD` chain.
- Federate to remote ClickHouse in one statement. (v0.5) `remoteSecure('cluster:9440', 'db.table', ...)` joins local Parquet with a production ClickHouse cluster in one optimised plan.
- Same SQL as your warehouse. Copy-paste ClickHouse production queries into the agent prompt; no dialect bridge is needed.
Install
pip install chdb-mcp
Connect
Claude Desktop — add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{ "mcpServers": { "chdb": { "command": "chdb-mcp" } } }
Cursor / VS Code — same JSON in ~/.cursor/mcp.json etc.; one-click badges land in v0.2.
Codex CLI / Claude Code / Copilot / Droid — use the cross-IDE bundle chdb-agent-plugin.
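If the client config file already lists other servers, the `chdb` entry has to be merged into the existing `mcpServers` object rather than pasted over the whole file. A minimal sketch of that merge, assuming the config is plain JSON; the helper name `add_chdb_server` is hypothetical and not shipped with chdb-mcp:

```python
import json
from pathlib import Path


def add_chdb_server(config_path: Path, command: str = "chdb-mcp") -> dict:
    """Merge a "chdb" entry into an MCP client config, keeping other servers.

    Hypothetical helper for illustration; not part of chdb-mcp.
    """
    text = config_path.read_text() if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    # Only the "chdb" key is touched; existing entries stay as they are.
    config.setdefault("mcpServers", {})["chdb"] = {"command": command}
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Point `config_path` at whichever file your client reads, for example the Claude Desktop path given above.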
Tools (v0.1)
| Tool | Description |
|---|---|
| `query(sql, format)` | Run any read-only SQL on the in-process session |
| `list_databases()` | Enumerate visible databases |
| `list_tables(database)` | List tables in a database |
| `describe_table(database, table)` | Column types for a table |
| `query_file(path, sql, format)` | Query a Parquet/CSV/JSON file via the `{file}` placeholder |
| `get_sample_data(database, table, limit)` | First N rows of a table |
Read-only by default — SET readonly=2 blocks INSERT/CREATE/DROP/ALTER while keeping file()/url()/s3() usable. Set CHDB_MCP_WRITE=1 to drop the guard. See Security model.
In query_file, {file} is replaced with file('path', 'format') before execution:
query_file(
path="/data/sales.parquet",
sql="SELECT region, sum(revenue) FROM {file} GROUP BY region",
format="Parquet",
)
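The substitution can be pictured as a plain string replacement. The sketch below is an assumption about the mechanism, not the server's actual code: `expand_file_placeholder` and `sql_string_literal` are hypothetical names, and backslash-escaping is just one reasonable way to build a ClickHouse string literal.

```python
def sql_string_literal(value: str) -> str:
    """Quote a Python string as a single-quoted SQL literal (assumed escaping)."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def expand_file_placeholder(sql: str, path: str, fmt: str) -> str:
    """Replace every {file} in the SQL text with a file('path', 'format') call."""
    table_function = f"file({sql_string_literal(path)}, {sql_string_literal(fmt)})"
    return sql.replace("{file}", table_function)


expanded = expand_file_placeholder(
    "SELECT region, sum(revenue) FROM {file} GROUP BY region",
    "/data/sales.parquet",
    "Parquet",
)
print(expanded)
# SELECT region, sum(revenue) FROM file('/data/sales.parquet', 'Parquet') GROUP BY region
```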
Configuration
| Variable | Default | Effect |
|---|---|---|
| `CHDB_MCP_WRITE` | unset | If `1`, allows INSERT/CREATE/DROP/ALTER |
| `CHDB_MCP_MAX_RESULT_BYTES` | `1048576` | Per-tool result truncation threshold |
| `CHDB_MCP_FILE_ALLOWLIST` | empty | `:`-separated path prefixes for `query_file()`; symlinks resolved on both sides. Advisory; see Security model. |
| `CHDB_MCP_SESSION_PATH` | empty | Persistent session directory (default: ephemeral) |
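To show what a byte-based truncation threshold means in practice, here is a minimal sketch that reads `CHDB_MCP_MAX_RESULT_BYTES` and clips an oversized result. The function name `truncate_result`, the trailer text, and the choice to cut on a UTF-8 character boundary are all assumptions for illustration; the server's real behaviour may differ.

```python
import os
from typing import Optional

DEFAULT_MAX_RESULT_BYTES = 1048576  # default from the table above


def truncate_result(text: str, limit: Optional[int] = None) -> str:
    """Clip text to at most `limit` UTF-8 bytes and append a short notice."""
    if limit is None:
        limit = int(os.environ.get("CHDB_MCP_MAX_RESULT_BYTES", DEFAULT_MAX_RESULT_BYTES))
    data = text.encode("utf-8")
    if len(data) <= limit:
        return text
    # errors="ignore" drops a multi-byte character that was cut in half.
    clipped = data[:limit].decode("utf-8", errors="ignore")
    return f"{clipped}\n... [truncated: {len(data)} bytes total, limit {limit}]"
```

Counting bytes rather than characters keeps the cap meaningful for results that contain non-ASCII text.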
Security model
Protects against: accidental writes (readonly=2), runaway result sizes (per-tool truncation), SQL-identifier injection in list_tables / describe_table / get_sample_data arguments (whitelist regex + escaping).
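A whitelist-plus-escaping check for identifier arguments might look like the following. This is a sketch under assumptions: the exact pattern the server uses is not stated here, so the regex below (letters, digits, underscore, no leading digit) and the name `quote_identifier` are hypothetical.

```python
import re

# Assumed whitelist: ASCII letters, digits and underscore, not starting with a digit.
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name: str) -> str:
    """Validate a database or table name, then wrap it in backticks."""
    # fullmatch (not match) so a trailing newline cannot slip through.
    if not _IDENTIFIER_RE.fullmatch(name):
        raise ValueError(f"invalid identifier: {name!r}")
    # Escaping is redundant after the whitelist, kept as defence in depth.
    escaped = name.replace("\\", "\\\\").replace("`", "\\`")
    return f"`{escaped}`"
```

With a check like this, `list_tables("default")` would interpolate `` `default` `` into its SQL, while a value such as `x; DROP TABLE y` would be rejected before any query is built.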
Does NOT protect against:
- Filesystem reach. `CHDB_MCP_FILE_ALLOWLIST` only guards `query_file()`; the `query()` tool accepts arbitrary SQL, and chDB exposes `file()`/`url()`/`s3()`/`remote()` directly. A determined caller bypasses the allowlist. Use OS-level isolation (macOS App Sandbox, Linux namespaces, Docker with a read-only mount) for real sandboxing.
- SQL audit. Only the readonly guard is enforced; there is no allow/deny list of statements. Treat the agent as having full `SELECT` access to anything chDB can reach.
- Resource limits. No memory / CPU / wall-clock caps in v0.1. Use `ulimit`/`cgroups` if needed.
For agents acting on untrusted input, run in a throwaway container.
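One way to apply a `ulimit`-style cap without a shell wrapper is a small POSIX-only launcher that sets the limit and then replaces itself with the server. The sketch below uses only the standard `resource` module; `apply_cpu_limit` is a hypothetical name, and the cap is on CPU seconds, not wall-clock time.

```python
import resource


def apply_cpu_limit(seconds: int) -> int:
    """Lower the soft CPU-time limit of the current process; return the value set.

    Hypothetical helper for illustration. The hard limit is left unchanged, so
    the soft limit can never exceed what the OS already allows.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_CPU)
    soft = seconds if hard == resource.RLIM_INFINITY else min(seconds, hard)
    resource.setrlimit(resource.RLIMIT_CPU, (soft, hard))
    return soft


# In a launcher script, the limit would be applied just before handing over
# to the server, which inherits it across exec:
#
#     import os
#     apply_cpu_limit(600)
#     os.execvp("chdb-mcp", ["chdb-mcp"])
```

Because the launcher ends in `exec`, the server keeps the same stdin/stdout, so the MCP client can point its `command` at the launcher instead of `chdb-mcp`.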
Roadmap
- v0.5: `query_remote_clickhouse()` federation tool
- v1.0: `attach_file()`, `dataframe_query()` (zero-copy `Python(df)`), HTTP/SSE transport with Bearer auth, `.mcpb` bundle for Claude Desktop one-click install
Troubleshooting
macOS: "Server disconnected" in Claude Desktop
If ~/Library/Logs/Claude/mcp-server-chdb.log shows PermissionError: Operation not permitted on pyvenv.cfg, your venv sits under a TCC-protected directory (~/Downloads, ~/Documents, ~/Desktop) — Claude Desktop subprocesses can't read those paths.
Fix: install elsewhere. Recommended is uvx (zero-config, isolated under ~/.local/share/uv/):
{ "mcpServers": { "chdb": { "command": "uvx", "args": ["chdb-mcp"] } } }
Or build a venv yourself under ~/.local/share/chdb-mcp/.venv and point Claude Desktop at its chdb-mcp binary.
query_file returns "path is not under any prefix"
The allowlist resolves symlinks on both sides (so /tmp matches /private/tmp on macOS). If you still hit this, check the resolved form printed in the error against python -c "from pathlib import Path; print(Path('YOUR_PATH').resolve())".
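The "resolve on both sides" rule can be sketched as follows. This is an illustration, not the server's code: `path_is_allowed` is a hypothetical name, and treating an empty allowlist as "no restriction" is an assumption based on the default being empty.

```python
from pathlib import Path


def path_is_allowed(path: str, allowlist: str) -> bool:
    """Check a path against ':'-separated prefixes, resolving symlinks first."""
    if not allowlist:
        return True  # assumption: an empty allowlist means no restriction
    resolved = Path(path).resolve()
    for prefix in allowlist.split(":"):
        if not prefix:
            continue
        root = Path(prefix).resolve()
        # Compare whole path components, so /data does not match /database.
        if resolved == root or root in resolved.parents:
            return True
    return False
```

Resolving both the requested path and each prefix is what makes `/tmp/x.parquet` match a `/private/tmp` prefix on macOS, where `/tmp` is a symlink.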
"Cannot execute query in readonly mode"
SET readonly=2 blocks DDL/DML by design. Rewrite as a pure SELECT, or restart with CHDB_MCP_WRITE=1.
Per-server logs
~/Library/Logs/Claude/mcp-server-chdb.log # startup diagnostics + stderr
~/Library/Logs/Claude/mcp.log # all servers' JSON-RPC traffic
Development
git clone https://github.com/chdb-io/chdb-mcp && cd chdb-mcp
pip install -e ".[dev]"
pytest && ruff check src tests
License
Apache 2.0 — see LICENSE.