Skip to main content

Lightweight build of chdb-core (in-process OLAP SQL engine powered by ClickHouse), trimmed for size-sensitive deployments such as serverless, containers and edge

Project description

Build X86 PyPI Downloads Discord Twitter

chdb-core

chdb-core is the foundational engine of the chDB ecosystem — an in-process SQL OLAP Engine powered by ClickHouse [^1]

Table of Contents


chDB Ecosystem

The chDB project is split into two packages:

Package Role Install
chdb-core (this repo) C++ engine + Session / Connection / DB-API interfaces pip install chdb-core
chDB Pandas-compatible DataStore API built on top of chdb-core pip install chdb
┌───────────────────────────────────────────┐
│          chDB (pip install chdb)          │
│  ┌─────────────────────────────────────┐  │
│  │  DataStore: pandas-like lazy API    │  │
│  │  QueryPlanner / dual-engine exec    │  │
│  └──────────────────┬──────────────────┘  │
│                     │                     │
│  ┌──────────────────▼──────────────────┐  │
│  │  chdb-core (pip install chdb-core)  │  │
│  │  C++ ClickHouse Engine              │  │
│  │  Session / Connection / DB-API      │  │
│  │  query() / UDF / Stream             │  │
│  └─────────────────────────────────────┘  │
└───────────────────────────────────────────┘

chdb-core provides everything you need to run SQL queries with ClickHouse performance — no server required. If you want a higher-level pandas-compatible DataFrame API, install chDB instead.


Features

  • In-process SQL OLAP Engine, powered by ClickHouse
  • No need to install ClickHouse
  • Minimized data copy from C++ to Python with python memoryview
  • Input & Output support Parquet, CSV, JSON, Arrow, ORC and 60+ more formats
  • Session and Connection management with stateful queries
  • Streaming query support for constant-memory processing
  • Python DB-API 2.0 compliance
  • User Defined Functions (UDF) support
  • AI-assisted SQL generation

Architecture


Installation

Currently, chdb-core supports Python 3.9+ on macOS and Linux (x86_64 and ARM64).

pip install chdb-core

Quick Start

import chdb

result = chdb.query("SELECT version()", "Pretty")
print(result)

API Reference

One-shot Query

The simplest way to run SQL — no session or connection needed:

import chdb

# Basic query with CSV output (default)
result = chdb.query("SELECT 1, 'hello'")
print(result)

# Pandas DataFrame output
df = chdb.query("SELECT number, number * 2 AS double FROM numbers(10)", "DataFrame")
print(df)

# Parameterized queries
df = chdb.query(
    "SELECT toDate({base_date:String}) + number AS date "
    "FROM numbers({total_days:UInt64}) "
    "LIMIT {items_per_page:UInt64}",
    "DataFrame",
    params={"base_date": "2025-01-01", "total_days": 10, "items_per_page": 5},
)
print(df)
Query on Files (Parquet, CSV, JSON, Arrow, ORC and 60+)
import chdb

res = chdb.query('SELECT * FROM file("data.parquet", Parquet)', "JSON")
print(res)

res = chdb.query('SELECT * FROM file("data.csv", CSV)', "CSV")
print(res)

# Query result statistics
print(f"SQL read {res.rows_read()} rows, {res.bytes_read()} bytes, "
      f"storage read {res.storage_rows_read()} rows, {res.storage_bytes_read()} bytes, "
      f"elapsed {res.elapsed()} seconds")

# Pandas DataFrame output
chdb.query('SELECT * FROM file("data.parquet", Parquet)', "Dataframe")
Connection API

Connection-based API for cursor-style interaction, supporting both in-memory and file-based databases:

import chdb

conn = chdb.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE test (id UInt32, name String) ENGINE = Memory")
cur.execute("INSERT INTO test VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie')")
cur.execute("SELECT * FROM test ORDER BY id")

print(cur.fetchone())      # (1, 'Alice')
print(cur.fetchmany(2))    # ((2, 'Bob'), (3, 'Charlie'))
print(cur.column_names())  # ['id', 'name']
print(cur.column_types())  # ['UInt32', 'String']

# Use the cursor as an iterator
cur.execute("SELECT number FROM system.numbers LIMIT 3")
for row in cur:
    print(row)

# Always close resources when done
cur.close()
conn.close()

For more details, see examples/connect.py.

# File-based persistent database
conn = chdb.connect("mydata.db")
conn.query("CREATE TABLE IF NOT EXISTS logs (ts DateTime, msg String) ENGINE = MergeTree ORDER BY ts")
conn.query("INSERT INTO logs VALUES (now(), 'started')")
result = conn.query("SELECT * FROM logs", "Pretty")
print(result)
conn.close()
Stateful Session

Sessions provide a higher-level API with automatic resource management:

from chdb import session as chs

sess = chs.Session()
sess.query("CREATE DATABASE IF NOT EXISTS db_xxx ENGINE = Atomic")
sess.query("CREATE TABLE IF NOT EXISTS db_xxx.log_table (x String, y Int) ENGINE = Log")
sess.query("INSERT INTO db_xxx.log_table VALUES ('a', 1), ('b', 3), ('c', 2), ('d', 5)")
sess.query("CREATE VIEW db_xxx.view_xxx AS SELECT * FROM db_xxx.log_table LIMIT 4")

print(sess.query("SELECT * FROM db_xxx.view_xxx", "Pretty"))

see also: test_stateful.py.

Streaming Query

Process large datasets with constant memory usage through chunked streaming:

from chdb import session as chs

sess = chs.Session()

rows_cnt = 0
with sess.send_query("SELECT * FROM numbers(200000)", "CSV") as stream_result:
    for chunk in stream_result:
        rows_cnt += chunk.rows_read()

print(rows_cnt) # 200000

# Example 2: Manual iteration with fetch()
rows_cnt = 0
stream_result = sess.send_query("SELECT * FROM numbers(200000)", "CSV")
while True:
    chunk = stream_result.fetch()
    if chunk is None:
        break
    rows_cnt += chunk.rows_read()

print(rows_cnt) # 200000

For more details, see test_streaming_query.py.

Python DB-API 2.0
import chdb.dbapi as dbapi
print("chdb driver version: {0}".format(dbapi.get_client_info()))

conn1 = dbapi.connect()
cur1 = conn1.cursor()
cur1.execute('select version()')
print("description: ", cur1.description)
print("data: ", cur1.fetchone())
cur1.close()
conn1.close()
Query on Table (Pandas DataFrame, Parquet file/bytes, Arrow bytes)
import chdb.dataframe as cdf
import pandas as pd
# Join 2 DataFrames
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': ["one", "two", "three"]})
df2 = pd.DataFrame({'c': [1, 2, 3], 'd': ["①", "②", "③"]})
ret_tbl = cdf.query(sql="select * from __tbl1__ t1 join __tbl2__ t2 on t1.a = t2.c",
                  tbl1=df1, tbl2=df2)
print(ret_tbl)
# Query on the DataFrame Table
print(ret_tbl.query('select b, sum(a) from __table__ group by b'))
# Pandas DataFrames are automatically registered as temporary tables in ClickHouse
chdb.query("SELECT * FROM Python(df1) t1 JOIN Python(df2) t2 ON t1.a = t2.c").show()
Python Table Engine

Query on Pandas DataFrame

import chdb
import pandas as pd
df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5, 6],
        "b": ["tom", "jerry", "auxten", "tom", "jerry", "auxten"],
    }
)

chdb.query("SELECT b, sum(a) FROM Python(df) GROUP BY b ORDER BY b").show()

Query on Arrow Table

import chdb
import pyarrow as pa
arrow_table = pa.table(
    {
        "a": [1, 2, 3, 4, 5, 6],
        "b": ["tom", "jerry", "auxten", "tom", "jerry", "auxten"],
    }
)

chdb.query("SELECT b, sum(a) FROM Python(arrow_table) GROUP BY b ORDER BY b").show()

see also: test_query_py.py.

User Defined Functions (UDF)

chDB supports native Python UDFs that run in-process with full type safety.

import chdb
from chdb.session import Session
from chdb.sqltypes import INT64, STRING

sess = Session()

# Using the @chdb.func decorator
@chdb.func([INT64, INT64], INT64)
def add(a, b):
    return a + b

print(sess.query("SELECT add(12, 22)"))

# With type annotations (types inferred automatically)
@chdb.func()
def multiply(a: int, b: int) -> int:
    return a * b

print(sess.query("SELECT multiply(3, 7)"))

# Using chdb.create_function directly
chdb.create_function("strlen", len, arg_types=[STRING], return_type=INT64)
print(sess.query("SELECT strlen('hello')"))

# Remove a registered function
chdb.drop_function("strlen")

Key features:

  • Type-safe: supports INT64, FLOAT64, STRING, BOOL, DATETIME64, etc. See chdb.sqltypes for full list.
  • Type inference: automatically inferred from Python annotations (int, str, bool, etc.)
  • NULL handling: on_null=NullHandling.SKIP (default) skips the function call and returns NULL; NullHandling.PASS passes None to the function.
  • Exception handling: on_error=ExceptionHandling.PROPAGATE (default) raises the error to the caller; ExceptionHandling.IGNORE returns NULL for that row and continues.

See also: test_func_udf.py, test_func_udf_types.py.

Query Progress
import chdb

# Auto-detect: TTY progress in terminal, progress bar in notebook
conn = chdb.connect(":memory:?progress=auto")
conn.query("SELECT sum(number) FROM numbers_mt(1e10) GROUP BY number % 10 SETTINGS max_threads=4")

Progress options: progress=auto | progress=tty | progress=err | progress=off

AI-assisted SQL Generation
import chdb

conn = chdb.connect("file::memory:?ai_provider=openai&ai_model=gpt-4o-mini")
conn.query("CREATE TABLE nums (n UInt32) ENGINE = Memory")
conn.query("INSERT INTO nums VALUES (1), (2), (3)")

sql = conn.generate_sql("Select all rows from nums ordered by n desc")
print(sql)  # SELECT * FROM nums ORDER BY n DESC

print(conn.ask("List the numbers table", format="Pretty"))
Command Line

python3 -m chdb SQL [OutputFormat]

python3 -m chdb "SELECT 1,'abc'" Pretty

For more examples, see examples and tests.


Demos and Examples


Benchmark

---

Documentation


AI Coding Agent Skill

chDB provides an AI Skill that teaches AI coding agents (Cursor, Claude Code, etc.) chDB's multi-source data analytics API. Install it so your AI assistant can write correct chDB code out of the box:

curl -sL https://raw.githubusercontent.com/chdb-io/chdb/main/install_skill.sh | bash

Events


Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  • Help test and report bugs
  • Help improve documentation
  • Help improve code quality and performance

Bindings

We welcome bindings for other languages, please refer to bindings for more details.


Version Guide

Please refer to VERSION-GUIDE.md for more details.


Paper


License

Apache 2.0, see LICENSE for more information.


Acknowledgments

chDB is mainly based on ClickHouse [^1] for trade mark and other reasons, I named it chDB.


Contact


[^1]: ClickHouse® is a trademark of ClickHouse Inc. All trademarks, service marks, and logos mentioned or depicted are the property of their respective owners. The use of any third-party trademarks, brand names, product names, and company names does not imply endorsement, affiliation, or association with the respective owners.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

chdb_core_lite-26.5.0-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (51.5 MB view details)

Uploaded CPython 3.9+manylinux: glibc 2.17+ x86-64

chdb_core_lite-26.5.0-cp39-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl (45.8 MB view details)

Uploaded CPython 3.9+manylinux: glibc 2.17+ ARM64

chdb_core_lite-26.5.0-cp39-abi3-macosx_11_0_arm64.whl (42.6 MB view details)

Uploaded CPython 3.9+macOS 11.0+ ARM64

chdb_core_lite-26.5.0-cp39-abi3-macosx_10_15_x86_64.whl (49.3 MB view details)

Uploaded CPython 3.9+macOS 10.15+ x86-64

File details

Details for the file chdb_core_lite-26.5.0-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for chdb_core_lite-26.5.0-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 d49fb1a92a36d9cd1ccc071d6ece97ce51c6764963983c4e6b2bb463378faabf
MD5 67c2601a89f684d7862f12e45fbb3e19
BLAKE2b-256 a0829c8ea554e26d151940d6124130add62f04dad021054eb9c3ba9d90785f22

See more details on using hashes here.

File details

Details for the file chdb_core_lite-26.5.0-cp39-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl.

File metadata

File hashes

Hashes for chdb_core_lite-26.5.0-cp39-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl
Algorithm Hash digest
SHA256 b769ebe731bf43b4dbac4345981c1eb4a4acf1b0e08b6379e71551d72a1cd09d
MD5 33615091a64b010d5d6c7d610eca591c
BLAKE2b-256 4f252bcc11b87188e36d171ead9423eb3181de11396bdb2f9b6a6605e52f28d3

See more details on using hashes here.

File details

Details for the file chdb_core_lite-26.5.0-cp39-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for chdb_core_lite-26.5.0-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e01b408cbb221ad785b8d200581c2017ffd80725529392aaf5482eeaf894fdef
MD5 2a87becd60eae947b3a8439c4df311d4
BLAKE2b-256 43c35224576515f703bdd89db1596c224a04d6f0e1382ebfdde658aa3e281325

See more details on using hashes here.

File details

Details for the file chdb_core_lite-26.5.0-cp39-abi3-macosx_10_15_x86_64.whl.

File metadata

File hashes

Hashes for chdb_core_lite-26.5.0-cp39-abi3-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 d64c3406a09b32385a562b06cdb1c87a93d49dbc64cc05bd90e2281d91eb9962
MD5 682ed2cefd9278682100572b8c2c2563
BLAKE2b-256 32610ecd6fb91167b1aa3fd65698b25a7b9bc20c5a8d52b0194f2e5757313a62

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page