Minimal Cython-based DuckDB Bindings
Project description
bareduckdb
Simplified, Dynamically Linked DuckDB Python Bindings — Fast, simple, and free-threaded.
Overview
bareduckdb provides extensible, easy-to-build Python bindings for DuckDB, written in Cython.
- Simple: ~2k lines of C++ and ~2k lines of Python, easy to extend or customize
- Arrow-first data conversion supporting Polars, PyArrow, and Pandas
- Support for the latest Python features: free threading, subinterpreters, ABI3, and asyncio
- Dynamically linked to DuckDB's official library
- Experimental Enhancements
Experimental Enhancements
- Explicit Stream vs. Materialization Modes: at connection and execution time, choose between materialized arrow_tables and streaming arrow_readers.
- Arrow Deadlock Detection: certain use cases involving reuse of Arrow Readers can cause deadlocks; bareduckdb detects these.
- Table Statistics: extracts table statistics at registration time and passes them to DuckDB.
- Polars Without PyArrow: Polars data can be read and produced without importing or installing PyArrow.
- Native Polars LazyFrame Pushdown: whereas DuckDB calls collect() on LazyFrames before applying pushdown, bareduckdb pushes down native Polars predicates.
- Inline Registration: bareduckdb.execute("query", data={...}) registers data at call time.
- User-Defined Table Functions: extracts UDTF calls at parse time and executes the registered functions.
- Appender (Row by Row): exposes DuckDB's appender API for fast sequential writes to DuckDB databases.
Installation
From PyPI
pip install bareduckdb
From Source
git clone --recurse-submodules https://github.com/paultiq/bareduckdb.git
cd bareduckdb
uv sync -v # or: pip install -e .
Basic Usage
import bareduckdb
# Connect to in-memory database
conn = bareduckdb.connect()
# Execute query and get Arrow Table
result = conn.execute("SELECT 42 as answer").arrow_table()
print(result)
# Convert to Polars/Pandas/PyArrow
df_polars = conn.execute("SELECT * FROM range(100)").pl()
df_pandas = conn.execute("SELECT * FROM range(100)").df()
Async API
import asyncio
from bareduckdb.aio import connect_async
async def run_query():
    async with await connect_async() as conn:
        result = await conn.execute("SELECT * FROM generate_series(1, 1000)")
        return result
result = asyncio.run(run_query())
Polars Integration
import bareduckdb
import polars as pl
conn = bareduckdb.connect()
# Polars -> DuckDB (Arrow Capsule protocol)
df = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
conn.register("my_table", df)
# DuckDB -> Polars (direct conversion)
result = conn.execute("SELECT * FROM my_table", output_type="polars")
Architecture
Design Principles
- Keep it in Python — Business logic lives in Python, not Cython/C++
- No GIL interaction from DuckDB threads — All Python operations happen before/after query execution
- Semantic Versioning — Strict stability guarantees
- Arrow-first — All data types map through Arrow's type system
Why Arrow-First?
By forcing all conversions through Arrow, bareduckdb achieves:
- Consistent type mappings across Polars/Pandas/PyArrow
- Reduced code complexity (no per-library conversion paths)
- Better memory efficiency (zero-copy where possible)
- Future-proof (Arrow is the lingua franca for columnar data)
Thread Safety & Free-Threading
Free-threading support (Python 3.13+):
- No global locks in critical paths
- DuckDB threads never acquire the GIL
- Safe concurrent query execution in --disable-gil mode
- Atomic operations for Arrow stream coordination
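To confirm that a program is actually running on a free-threaded interpreter (for example, one of the cp314t wheels), CPython itself exposes two probes. This is a minimal sketch using only standard-library facilities, not a bareduckdb API; `gil_status` is a hypothetical helper name.

```python
import sys
import sysconfig


def gil_status() -> dict:
    """Report whether this interpreter is a free-threaded build and whether the GIL is active."""
    # Py_GIL_DISABLED is 1 on free-threaded ("t") builds; 0 or None otherwise.
    free_threaded_build = sysconfig.get_config_var("Py_GIL_DISABLED") == 1
    # sys._is_gil_enabled() exists on 3.13+; older interpreters always run with the GIL.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
    return {"free_threaded_build": free_threaded_build, "gil_enabled": bool(is_gil_enabled())}


if __name__ == "__main__":
    print(gil_status())
```

Note that a free-threaded build can still re-enable the GIL at runtime (for example when an extension module does not declare free-threading support), which is why both values are reported.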
APIs
bareduckdb provides multiple API layers for different use cases:
1. Core API (bareduckdb.core)
Minimal, no-frills interface for maximum performance.
from bareduckdb.core import Connection
conn = Connection()
result = conn.execute("SELECT 1")
2. Async API (bareduckdb.aio)
Non-blocking operations with async/await.
import asyncio
from bareduckdb.aio import connect_async
async def main():
    conn = await connect_async()
    return await conn.execute("SELECT 1")
result = asyncio.run(main())
3. Compatibility API (bareduckdb.compat)
Familiar interface similar to duckdb-python (with intentional differences).
import bareduckdb
conn = bareduckdb.connect()
result = conn.sql("SELECT 1") # Eager execution
4. DBAPI 2.0 (bareduckdb.dbapi)
Standard Python database interface for compatibility with tools like SQLAlchemy.
from bareduckdb.dbapi import connect
conn = connect()
cursor = conn.cursor()
cursor.execute("SELECT 1")
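Because DB-API 2.0 (PEP 249) standardizes the connection and cursor protocol, code written against it is driver-agnostic. The sketch below uses the standard-library `sqlite3` driver as a stand-in so it runs anywhere; `first_value` is a hypothetical helper, and passing `bareduckdb.dbapi.connect` instead assumes its cursor implements `fetchone()` and its connection `close()` as PEP 249 requires, which is not verified here.

```python
import sqlite3
from typing import Any, Callable


def first_value(connect: Callable[[], Any], query: str) -> Any:
    """Run `query` through any PEP 249 driver; return row 1, column 1 (or None if no rows)."""
    conn = connect()
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        row = cursor.fetchone()  # a sequence, or None when the result set is empty
        return None if row is None else row[0]
    finally:
        conn.close()


# The stdlib sqlite3 driver stands in here so this runs without bareduckdb installed.
print(first_value(lambda: sqlite3.connect(":memory:"), "SELECT 1"))
```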
Key Differences
Experimental Features
When pyarrow is installed, two experimental features are available:
Arrow Statistics and Cardinality
In duckdb-python, Arrow Tables, Readers, and Capsules are all converted to Streams via Dataset -> Scanner -> Reader. These Streams carry neither cardinality (number of rows) nor statistics (such as min/max, number of distinct values, or whether a column contains nulls).
Cardinality is used to decide whether to use TopN, which significantly speeds up "ORDER BY x LIMIT N" queries (and reduces memory use) when N is small relative to the size of the table. Statistics are used by the optimizer for query planning.
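The TopN idea itself is easy to illustrate outside DuckDB. This Python sketch contrasts a full sort with a bounded heap; it shows the general technique behind a TopN operator, not DuckDB's actual C++ implementation, and the function names are hypothetical.

```python
import heapq
import random


def top_n_sorted(values, n):
    """Baseline: sort every row (O(rows log rows) time, O(rows) extra memory), then cut."""
    return sorted(values)[:n]


def top_n_heap(values, n):
    """TopN-style: one pass that keeps only the n smallest values seen so far.

    heapq.nsmallest maintains a bounded heap of size n, so memory stays O(n)
    and time is roughly O(rows log n), a large win when n is much smaller than rows.
    """
    return heapq.nsmallest(n, values)


rng = random.Random(0)
rows = [rng.randint(0, 1_000_000) for _ in range(100_000)]
# Equivalent of: SELECT x FROM t ORDER BY x LIMIT 10
assert top_n_heap(rows, 10) == top_n_sorted(rows, 10)
```

The planner can only judge whether N is "small relative to the table" when it knows the row count, which is why missing cardinality on Streams matters.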
In bareduckdb, Arrow Tables are registered directly (as Tables, not Streams) and scanned by arrow_scan_dataset, which can then retrieve cardinality and column-level statistics.
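As a rough illustration of what "column-level statistics" means, the pure-Python sketch below computes them for a single column. `column_statistics` is a hypothetical helper; the real extraction happens inside bareduckdb's Arrow scan and may cover a different set of statistics.

```python
from typing import Any


def column_statistics(values: list[Any]) -> dict[str, Any]:
    """Compute cardinality, null info, min/max, and distinct count for one column."""
    non_null = [v for v in values if v is not None]
    return {
        "row_count": len(values),  # cardinality
        "null_count": len(values) - len(non_null),
        "has_nulls": len(non_null) < len(values),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "distinct_count": len(set(non_null)),
    }


stats = column_statistics([3, 7, None, 7, 1])
# An optimizer that sees max == 7 can tell that "WHERE x > 10" matches no rows without scanning.
print(stats)
```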
Statistics Options:
The register() method accepts a statistics parameter to control which columns have statistics computed:
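import datetime
import bareduckdb
import pyarrow as pa
conn = bareduckdb.connect()
# Example data (illustrative); any Arrow-compatible table works here
df = pa.table({
    "id": [1, 2, 3],
    "customer_id": [10, 20, 10],
    "price": [9.99, 5.25, 12.50],
    "date": [datetime.date(2024, 1, d) for d in (1, 2, 3)],
})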
# No statistics (fastest registration, default)
conn.register("table", df, statistics=None)
# Numeric columns only (recommended for most use cases)
conn.register("table", df, statistics="numeric")
# All columns (slowest - includes string min/max)
conn.register("table", df, statistics=True)
# Specific columns by name
conn.register("table", df, statistics=["id", "price", "date"])
# Regex pattern to match column names
conn.register("table", df, statistics=".*_id") # all columns ending with _id
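The options above amount to a mapping from the statistics argument to a set of columns. The sketch below spells that mapping out in plain Python. It is hypothetical: `resolve_statistics_columns` is not part of bareduckdb, the membership of "numeric" is assumed (the text groups dates with IDs and prices), and whether a regex must match the whole column name is also an assumption (this sketch uses re.fullmatch).

```python
import re
from typing import Sequence, Union

# Assumed membership of the "numeric" category.
NUMERIC_TYPES = {"int", "float", "decimal", "date", "timestamp"}


def resolve_statistics_columns(
    spec: Union[None, bool, str, Sequence[str]], schema: dict[str, str]
) -> list[str]:
    """Return the columns that would get statistics; `schema` maps name -> simple type label."""
    if spec is None or spec is False:
        return []  # no statistics
    if spec is True:
        return list(schema)  # every column
    if spec == "numeric":
        return [c for c, t in schema.items() if t in NUMERIC_TYPES]
    if isinstance(spec, str):
        return [c for c in schema if re.fullmatch(spec, c)]  # regex on column names
    # Explicit list: this sketch silently skips unknown names; the real method may raise.
    return [c for c in spec if c in schema]


schema = {"id": "int", "customer_id": "int", "price": "float", "name": "string"}
print(resolve_statistics_columns(".*_id", schema))
```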
Setting a Default:
Configure the default statistics mode at connection level:
# All register() calls will use numeric statistics by default
conn = bareduckdb.connect(default_statistics="numeric")
conn.register("table1", df1) # uses numeric stats
conn.register("table2", df2) # uses numeric stats
conn.register("table3", df3, statistics=False) # override: no stats
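The override rule above (a per-call statistics argument beats default_statistics) is commonly implemented with a sentinel default, so an explicit None or False can be told apart from an omitted argument. This is a generic Python pattern, not bareduckdb's code; the class and method names are hypothetical.

```python
_UNSET = object()  # sentinel: distinguishes "argument omitted" from an explicit None/False


class StatisticsDefaults:
    """Hypothetical stand-in for a connection that stores a default statistics mode."""

    def __init__(self, default_statistics=None):
        self.default_statistics = default_statistics

    def effective_statistics(self, statistics=_UNSET):
        # Omitted -> fall back to the connection default; any explicit value wins.
        return self.default_statistics if statistics is _UNSET else statistics


cfg = StatisticsDefaults(default_statistics="numeric")
print(cfg.effective_statistics())                  # like register("table1", df1)
print(cfg.effective_statistics(statistics=False))  # like register("table3", df3, statistics=False)
```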
Performance Impact (500K rows, 2 numeric + 2 string columns):
| Mode | Registration Time | Use Case |
|---|---|---|
| None | ~0.4 ms | No filter pushdown needed |
| "numeric" | ~10 ms | JOIN/filter on numeric columns |
| True | ~22 ms | Filter pushdown on all columns |
The "numeric" option provides the best balance: fast registration with statistics for the columns most commonly used in filters and JOINs (IDs, dates, prices).
Arrow Pushdown
Arrow projection and filter pushdowns are implemented using the Arrow C++ library. Pushdown is currently implemented only for Tables.
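Conceptually, projection pushdown means the scan returns only the columns the query references, and filter pushdown means rows are discarded inside the scan rather than after it. The toy scan below illustrates both on a dict-of-lists table. It is a hypothetical teaching sketch; bareduckdb does the real work with the Arrow C++ library on registered Tables.

```python
from typing import Any, Callable, Optional

Table = dict[str, list[Any]]  # toy columnar table: column name -> values


def scan(
    table: Table,
    columns: list[str],
    filter_column: Optional[str] = None,
    keep: Callable[[Any], bool] = lambda value: True,
) -> Table:
    """Toy scan with projection and filter pushdown applied inside the scan itself."""
    n_rows = len(next(iter(table.values()), []))
    if filter_column is None:
        selected = list(range(n_rows))
    else:
        # Filter pushdown: rows are dropped here, before anything is handed upstream.
        selected = [i for i, v in enumerate(table[filter_column]) if keep(v)]
    # Projection pushdown: only the requested columns are ever copied.
    return {name: [table[name][i] for i in selected] for name in columns}


t = {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40], "c": ["w", "x", "y", "z"]}
# Roughly the work requested by: SELECT b FROM t WHERE a > 2
print(scan(t, columns=["b"], filter_column="a", keep=lambda v: v > 2))  # {'b': [30, 40]}
```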
Relational API
- Use Ibis
Replacement Scans
Automatically discover Arrow tables in the caller's scope without explicit registration:
import bareduckdb
import pyarrow as pa
conn = bareduckdb.connect(enable_replacement_scan=True)
my_data = pa.table({"a": [1, 2, 3], "b": [4, 5, 6]})
result = conn.execute("SELECT * FROM my_data").arrow_table()
Customization: Override the _get_replacement(name) method to implement custom discovery logic (e.g., loading from disk or fetching from an API).
Manual Registration: Use .register() for explicit control or .execute(..., data={"name": df}) for inline registration.
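Caller-scope discovery of this kind typically relies on frame inspection. The sketch below shows the idea with the standard `inspect` module; `find_in_caller_scope` is a hypothetical helper, not bareduckdb's implementation, and a custom _get_replacement(name) could use something like it or load data from elsewhere instead. A real replacement scan starts from the frame that called execute(), so it would need a larger depth than this example.

```python
import inspect
from typing import Any, Optional


def find_in_caller_scope(name: str, depth: int = 1) -> Optional[Any]:
    """Return `name` from the locals (then globals) of the frame `depth` levels up, else None.

    depth=1 is the direct caller of this function.
    """
    frame = inspect.currentframe()
    try:
        for _ in range(depth):
            if frame is None:
                return None
            frame = frame.f_back
        if frame is None:
            return None
        if name in frame.f_locals:
            return frame.f_locals[name]
        return frame.f_globals.get(name)
    finally:
        del frame  # avoid keeping frames (and their locals) alive via a reference cycle


def run_query():
    my_data = {"a": [1, 2, 3], "b": [4, 5, 6]}  # stands in for pa.table(...)
    return find_in_caller_scope("my_data")


print(run_query())
```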
Not (Yet?) Supported
- No Python UDFs (scalar functions)
- No fsspec integration
User Defined Table Functions
Table functions execute in Python before query execution, enabling data generation and connection injection without GIL interaction:
import bareduckdb
import pyarrow as pa
def generate_data(rows: int, multiplier: int = 1) -> pa.Table:
    return pa.table({
        "id": list(range(rows)),
        "value": [i * multiplier for i in range(rows)]
    })
conn = bareduckdb.connect()
conn.register_udtf("generate_data", generate_data)
result = conn.execute("""
SELECT * FROM generate_data(100, 10)
WHERE value > 500
""").arrow_table()
Features:
- AST-based query preprocessing, in pure Python
- Connection injection: add a conn parameter to access the connection during execution
- Supports any Arrow-compatible return value: PyArrow Table, Polars DataFrame, Pandas DataFrame
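Two of the features above, resolving a table-function call and injecting a connection only into functions that ask for it, can be sketched in plain Python. Everything here is hypothetical: bareduckdb's preprocessing works on the query's AST, whereas this sketch parses just the call fragment with Python's own `ast` module; `parse_table_call` and `invoke_udtf` are invented names, and `generate_data` returns a plain dict instead of a pa.Table to stay standard-library only.

```python
import ast
import inspect
from typing import Any, Callable


def parse_table_call(call_text: str) -> tuple[str, list[Any]]:
    """Split "generate_data(100, 10)" into ("generate_data", [100, 10]); literal args only."""
    node = ast.parse(call_text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a simple function call: {call_text!r}")
    return node.func.id, [ast.literal_eval(arg) for arg in node.args]


def invoke_udtf(func: Callable[..., Any], args: list[Any], conn: Any = None) -> Any:
    """Call a registered table function, passing `conn` only if it declares that parameter."""
    if "conn" in inspect.signature(func).parameters:
        return func(*args, conn=conn)
    return func(*args)


def generate_data(rows: int, multiplier: int = 1) -> dict:
    return {"id": list(range(rows)), "value": [i * multiplier for i in range(rows)]}


registry = {"generate_data": generate_data}
name, args = parse_table_call("generate_data(3, 10)")
result = invoke_udtf(registry[name], args)
print(result)  # {'id': [0, 1, 2], 'value': [0, 10, 20]}
```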
Arrow Enhancements
- Deadlock detection
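This README does not describe how bareduckdb's deadlock detection works. As a generic illustration of the fail-fast idea only, the sketch below guards a single-pass batch reader with a non-blocking lock, so that a second concurrent consumer gets an error instead of waiting forever. `SingleConsumerReader` is hypothetical and is not bareduckdb's mechanism.

```python
import threading


class SingleConsumerReader:
    """Wrap an iterator of batches so a second concurrent consumer fails fast."""

    def __init__(self, batches):
        self._batches = iter(batches)  # single-pass, like an Arrow stream reader
        self._claim = threading.Lock()

    def __iter__(self):
        # Try to claim the reader without waiting; refuse instead of blocking indefinitely.
        if not self._claim.acquire(blocking=False):
            raise RuntimeError("reader is already being consumed; refusing to block")
        try:
            yield from self._batches
        finally:
            self._claim.release()


reader = SingleConsumerReader([1, 2, 3])
first = iter(reader)
print(next(first))  # 1; the first consumer now holds the claim
```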
Type Mappings
All types convert through Arrow:
- UUIDs: Returned as strings (Arrow doesn't have a native UUID type)
- Decimals: Arrow Decimal128/Decimal256
- Timestamps: Arrow Timestamp with timezone preservation
- Nested Types: Struct/List/Map fully supported
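Because UUIDs arrive as strings, code that needs real `uuid.UUID` objects has to convert them after the fact. A minimal sketch using only the standard library, assuming the column has already been pulled into a Python list (for example via a PyArrow column's to_pylist()); `parse_uuid_column` is a hypothetical helper.

```python
import uuid


def parse_uuid_column(values: list) -> list:
    """Convert UUID strings back to uuid.UUID objects, keeping None for SQL NULLs."""
    return [None if v is None else uuid.UUID(v) for v in values]


column = ["c0b7a4a8-3f1e-4c2a-9b8e-5d6f7a8b9c0d", None]
parsed = parse_uuid_column(column)
print(parsed)
```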
Development
Building from Source
# Clone with submodules (sparse checkout is automatic)
git clone --recurse-submodules https://github.com/iqmo-org/bareduckdb.git
cd bareduckdb
# Install development dependencies
uv sync
# Build in development mode
pip install -e .
* Note 1: The DuckDB submodule version must match the library version.
* Note 2: The PyArrow version used at build time must match the runtime version for Table registration and pushdown.
Disclaimer
For official Python bindings, see: https://github.com/duckdb/duckdb-python
License
bareduckdb is licensed under the MIT License. See LICENSE for details.
All original copyrights are retained by their respective owners, including DuckDB and DuckDB-Python.
Project details
Download files
Built Distributions
File details
Details for the file bareduckdb-0.8.144-cp314-cp314t-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: bareduckdb-0.8.144-cp314-cp314t-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 32.6 MB
- Tags: CPython 3.14t, manylinux: glibc 2.26+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3ddfaad603e0a8910b874f840a7c0691f574e1952f1594de36152f4d58cf74d9 |
| MD5 | e6c2a85d126c267b552d72cf29012615 |
| BLAKE2b-256 | 293c20b2ae1ca18cdc538d3e5f2ddd5a76a441d823a0ab02d1902c310c780c8c |
Provenance
The following attestation bundles were made for bareduckdb-0.8.144-cp314-cp314t-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl:
Publisher: build_wheels.yml on iqmo-org/bareduckdb
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bareduckdb-0.8.144-cp314-cp314t-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl
- Subject digest: 3ddfaad603e0a8910b874f840a7c0691f574e1952f1594de36152f4d58cf74d9
- Sigstore transparency entry: 865514766
- Sigstore integration time:
- Permalink: iqmo-org/bareduckdb@b6154a057c51f48a59d6ab3b13dcd45415e4ab70
- Branch / Tag: refs/tags/v0.8.144
- Owner: https://github.com/iqmo-org
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build_wheels.yml@b6154a057c51f48a59d6ab3b13dcd45415e4ab70
- Trigger Event: workflow_dispatch
File details
Details for the file bareduckdb-0.8.144-cp314-cp314t-macosx_11_0_arm64.whl.
File metadata
- Download URL: bareduckdb-0.8.144-cp314-cp314t-macosx_11_0_arm64.whl
- Upload date:
- Size: 34.6 MB
- Tags: CPython 3.14t, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 129014bcfaca969e0a9b00e6dae6ce17eafa96b3394aa93f69b433b5f7aad34e |
| MD5 | 836ac3babd0972fa3e3f7acd6e874108 |
| BLAKE2b-256 | 4a4f1fa830abc9f2d688fce4e1b8287cbb722aff96049f0b4c080ca23f696f93 |
Provenance
The following attestation bundles were made for bareduckdb-0.8.144-cp314-cp314t-macosx_11_0_arm64.whl:
Publisher: build_wheels.yml on iqmo-org/bareduckdb
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bareduckdb-0.8.144-cp314-cp314t-macosx_11_0_arm64.whl
- Subject digest: 129014bcfaca969e0a9b00e6dae6ce17eafa96b3394aa93f69b433b5f7aad34e
- Sigstore transparency entry: 865514666
- Sigstore integration time:
- Permalink: iqmo-org/bareduckdb@b6154a057c51f48a59d6ab3b13dcd45415e4ab70
- Branch / Tag: refs/tags/v0.8.144
- Owner: https://github.com/iqmo-org
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build_wheels.yml@b6154a057c51f48a59d6ab3b13dcd45415e4ab70
- Trigger Event: workflow_dispatch
File details
Details for the file bareduckdb-0.8.144-cp312-abi3-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: bareduckdb-0.8.144-cp312-abi3-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 32.5 MB
- Tags: CPython 3.12+, manylinux: glibc 2.26+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e0ba23efea1a82bb0e11b590fed8afc136d4e31a4c63cc425e9c923148754bbd |
| MD5 | f6c86a7a245943176b96d0c730b93697 |
| BLAKE2b-256 | e19b95fd860d73c76d781e884f4f49321ade4034e266e713433f9b3bcd486e85 |
Provenance
The following attestation bundles were made for bareduckdb-0.8.144-cp312-abi3-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl:
Publisher: build_wheels.yml on iqmo-org/bareduckdb
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bareduckdb-0.8.144-cp312-abi3-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl
- Subject digest: e0ba23efea1a82bb0e11b590fed8afc136d4e31a4c63cc425e9c923148754bbd
- Sigstore transparency entry: 865514579
- Sigstore integration time:
- Permalink: iqmo-org/bareduckdb@b6154a057c51f48a59d6ab3b13dcd45415e4ab70
- Branch / Tag: refs/tags/v0.8.144
- Owner: https://github.com/iqmo-org
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build_wheels.yml@b6154a057c51f48a59d6ab3b13dcd45415e4ab70
- Trigger Event: workflow_dispatch
File details
Details for the file bareduckdb-0.8.144-cp312-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: bareduckdb-0.8.144-cp312-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 34.5 MB
- Tags: CPython 3.12+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0234f4e0abd9975fc326be2dc2bb8056f134fd982fb9af5c8b7bb956c5859d32 |
| MD5 | 08cc4fbbac0a8203bb1215d6cee65825 |
| BLAKE2b-256 | 13563c51269f386111ed52037898491486ea799e0b0fe857b1d847b1096002aa |
Provenance
The following attestation bundles were made for bareduckdb-0.8.144-cp312-abi3-macosx_11_0_arm64.whl:
Publisher: build_wheels.yml on iqmo-org/bareduckdb
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bareduckdb-0.8.144-cp312-abi3-macosx_11_0_arm64.whl
- Subject digest: 0234f4e0abd9975fc326be2dc2bb8056f134fd982fb9af5c8b7bb956c5859d32
- Sigstore transparency entry: 865514829
- Sigstore integration time:
- Permalink: iqmo-org/bareduckdb@b6154a057c51f48a59d6ab3b13dcd45415e4ab70
- Branch / Tag: refs/tags/v0.8.144
- Owner: https://github.com/iqmo-org
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build_wheels.yml@b6154a057c51f48a59d6ab3b13dcd45415e4ab70
- Trigger Event: workflow_dispatch