Minimal Cython-based DuckDB Bindings

bareduckdb

Simplified, Dynamically Linked DuckDB Python Bindings — Fast, simple, and free-threaded.

PyPI version Python 3.12+ License: MIT

Overview

bareduckdb provides extensible and easy to build Python bindings to DuckDB using Cython.

  • Simple: ~2k lines of C++ and ~2k lines of Python, easy to extend or customize
  • Arrow-first data conversion supporting Polars, PyArrow, and Pandas
  • Support for the latest Python features: free threading, subinterpreters, ABI3, and asyncio
  • Dynamically linked to DuckDB's official library
  • Experimental enhancements, described below

Experimental Enhancements

  • Explicit Stream vs Materialization Modes - At connection and execution time, select whether you want materialized arrow_tables or streaming arrow_readers.
  • Arrow Deadlock Detection - Detects reuse of Arrow Readers, which in certain use cases can cause deadlocks.
  • Table Statistics - Extracts and passes table statistics at registration time.
  • Polars - No PyArrow Required - Polars data can be read and produced without importing or installing PyArrow.
  • Polars - Native LazyFrame Pushdown - Whereas DuckDB calls collect() on LazyFrames before pushdown, bareduckdb pushes down native Polars predicates.
  • Inline Registration - bareduckdb.execute("query", data={...}) allows registration at call time.
  • User Defined Table Functions - Extracts UDTFs at parse time and executes registered functions.
  • Appender - Row by Row - Exposes DuckDB's appender API for fast sequential writes to DuckDB databases.

Installation

From PyPI

pip install bareduckdb

From Source

git clone --recurse-submodules https://github.com/paultiq/bareduckdb.git
cd bareduckdb
uv sync -v # or: pip install -e .

Basic Usage

import bareduckdb

# Connect to in-memory database
conn = bareduckdb.connect()

# Execute query and get Arrow Table
result = conn.execute("SELECT 42 as answer").arrow_table()
print(result)

# Convert to Polars or Pandas
df_polars = conn.execute("SELECT * FROM range(100)").pl()
df_pandas = conn.execute("SELECT * FROM range(100)").df()

Async API

import asyncio
from bareduckdb.aio import connect_async

async def run_query():
    async with await connect_async() as conn:
        result = await conn.execute("SELECT * FROM generate_series(1, 1000)")
        return result

result = asyncio.run(run_query())


Polars Integration

import bareduckdb
import polars as pl

conn = bareduckdb.connect()

# Polars -> DuckDB (Arrow Capsule protocol)
df = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
conn.register("my_table", df)

# DuckDB -> Polars (direct conversion)
result = conn.execute("SELECT * FROM my_table", output_type="polars")

Architecture

Design Principles

  1. Keep it in Python — Business logic lives in Python, not Cython/C++
  2. No GIL interaction from DuckDB threads — All Python operations happen before/after query execution
  3. Semantic Versioning — Strict stability guarantees
  4. Arrow-first — All data types map through Arrow's type system

Why Arrow-First?

By forcing all conversions through Arrow, bareduckdb achieves:

  • Consistent type mappings across Polars/Pandas/PyArrow
  • Reduced code complexity (no per-library conversion paths)
  • Better memory efficiency (zero-copy where possible)
  • Future-proof (Arrow is the lingua franca for columnar data)

Thread Safety & Free-Threading

Free-threading support (Python 3.13+):

  • No global locks in critical paths
  • DuckDB threads never acquire the GIL
  • Safe concurrent query execution in --disable-gil mode
  • Atomic operations for Arrow stream coordination

APIs

bareduckdb provides multiple API layers for different use cases:

1. Core API (bareduckdb.core)

Minimal, no-frills interface for maximum performance.

from bareduckdb.core import Connection
conn = Connection()
result = conn.execute("SELECT 1")

2. Async API (bareduckdb.aio)

Non-blocking operations with async/await.

from bareduckdb.aio import connect_async

# Run inside a coroutine; top-level await only works in an async REPL
conn = await connect_async()
result = await conn.execute("SELECT 1")

3. Compatibility API (bareduckdb.compat)

Familiar interface similar to duckdb-python (with intentional differences).

import bareduckdb
conn = bareduckdb.connect()
result = conn.sql("SELECT 1")  # Eager execution

4. DBAPI 2.0 (bareduckdb.dbapi)

Standard Python database interface for compatibility with tools like SQLAlchemy.

from bareduckdb.dbapi import connect
conn = connect()
cursor = conn.cursor()
cursor.execute("SELECT 1")
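
The snippet above stops before reading results. The connect/cursor/execute/fetch sequence is defined by PEP 249 for any compliant driver, so the sketch below shows the remaining step using the standard library's sqlite3 module, chosen only because it runs anywhere. Whether bareduckdb.dbapi cursors expose fetchall() in exactly this form is an assumption, not something this README confirms.

```python
import sqlite3

# sqlite3 is a stand-in PEP 249 driver here, not bareduckdb.dbapi.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("SELECT 1")

# fetchall() returns the remaining rows as a list of tuples.
rows = cursor.fetchall()
conn.close()
```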

Key Differences

Experimental Features

When pyarrow is installed, two experimental features are available:

Arrow Statistics and Cardinality

In duckdb-python, Arrow Tables, Readers and Capsules are all converted to Streams via DataSet->Scanner->Reader. These Streams carry neither cardinality (number of rows) nor statistics (such as min/max, number of distinct values, or whether nulls are present).

Cardinality is used to decide whether to use TopN, which significantly speeds up "ORDER BY X LIMIT N" queries, and reduces their memory use, when N is small relative to the size of the table. Statistics are used by the optimizer for query planning.

In bareduckdb, Arrow Tables are registered directly (as Tables, not Streams) and used by arrow_scan_dataset, which can then retrieve cardinality and column-level statistics.
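
To illustrate why a TopN strategy pays off for "ORDER BY X LIMIT N", the sketch below compares a full sort with a bounded selection in plain Python. It shows the general technique only and is not DuckDB's implementation; the data and variable names are made up for the example.

```python
import heapq
import random

random.seed(0)
values = [random.randint(0, 1_000_000) for _ in range(100_000)]
n = 5

# Full sort: orders all 100,000 values, then throws most of them away.
full_sort_result = sorted(values)[:n]

# Top-N: a single pass that keeps only n candidates in memory at a time.
top_n_result = heapq.nsmallest(n, values)

# Both strategies give the same answer; only the work done differs.
assert top_n_result == full_sort_result
```

A planner can only pick the bounded strategy confidently when it knows how large the input is relative to N, which is why the missing cardinality matters.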

Statistics Options:

The register() method accepts a statistics parameter to control which columns have statistics computed:

import bareduckdb

conn = bareduckdb.connect()

# No statistics (fastest registration, default)
conn.register("table", df, statistics=None)

# Numeric columns only (recommended for most use cases)
conn.register("table", df, statistics="numeric")

# All columns (slowest - includes string min/max)
conn.register("table", df, statistics=True)

# Specific columns by name
conn.register("table", df, statistics=["id", "price", "date"])

# Regex pattern to match column names
conn.register("table", df, statistics=".*_id")  # all columns ending with _id
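
One way to picture how these options select columns is the sketch below. It is an illustration only: resolve_statistics_columns, the dict-based schema, and the NUMERIC_TYPES set are hypothetical and do not come from bareduckdb, whose actual rules (for example, which types count as numeric) may differ.

```python
import re

# Assumed set of "numeric" type names for this sketch only.
NUMERIC_TYPES = {"int32", "int64", "float32", "float64"}

def resolve_statistics_columns(schema: dict[str, str], statistics) -> list[str]:
    """Return the column names that would get statistics (hypothetical helper)."""
    if statistics is None or statistics is False:
        return []
    if statistics is True:
        return list(schema)
    if statistics == "numeric":
        return [name for name, dtype in schema.items() if dtype in NUMERIC_TYPES]
    if isinstance(statistics, str):
        # Any other string is treated as a regex over column names.
        pattern = re.compile(statistics)
        return [name for name in schema if pattern.fullmatch(name)]
    # Otherwise: an explicit list of column names.
    wanted = set(statistics)
    return [name for name in schema if name in wanted]

schema = {"id": "int64", "user_id": "int64", "price": "float64", "name": "string"}
numeric_cols = resolve_statistics_columns(schema, "numeric")
regex_cols = resolve_statistics_columns(schema, ".*_id")
listed_cols = resolve_statistics_columns(schema, ["id", "price"])
```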

Setting a Default:

Configure the default statistics mode at connection level:

# All register() calls will use numeric statistics by default
conn = bareduckdb.connect(default_statistics="numeric")
conn.register("table1", df1)  # uses numeric stats
conn.register("table2", df2)  # uses numeric stats
conn.register("table3", df3, statistics=False)  # override: no stats

Performance Impact (500K rows, 2 numeric + 2 string columns):

| Mode      | Registration Time | Use Case                       |
|-----------|-------------------|--------------------------------|
| None      | ~0.4ms            | No filter pushdown needed      |
| "numeric" | ~10ms             | JOIN/filter on numeric columns |
| True      | ~22ms             | Filter pushdown on all columns |

The "numeric" option provides the best balance: fast registration with statistics for the columns most commonly used in filters and JOINs (IDs, dates, prices).

Arrow Pushdown

Arrow projection and filter pushdowns are implemented using the Arrow C++ library. Pushdowns are currently implemented only for Tables.

Relational API

Replacement Scans

Automatically discover Arrow tables in the caller's scope without explicit registration:

import bareduckdb
import pyarrow as pa

conn = bareduckdb.connect(enable_replacement_scan=True)
my_data = pa.table({"a": [1, 2, 3], "b": [4, 5, 6]})

result = conn.execute("SELECT * FROM my_data").arrow_table()

Customization: Override the _get_replacement(name) method for custom discovery logic (e.g., loading from disk or fetching from an API).

Manual Registration: Use .register() for explicit control or .execute(..., data={"name": df}) for inline registration.
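
A replacement scan has to resolve a table name such as my_data to an object in the calling code. The sketch below shows one general way to do that lookup with Python's frame introspection. find_in_caller_scope is a hypothetical helper written for this illustration, and a plain dict stands in for an Arrow table; it is not bareduckdb's implementation.

```python
import inspect

def find_in_caller_scope(name: str, depth: int = 1):
    """Look up `name` in the locals, then globals, of a calling frame."""
    frame = inspect.currentframe()
    try:
        # Walk up `depth` frames from this function to reach the caller.
        for _ in range(depth):
            frame = frame.f_back
        if name in frame.f_locals:
            return frame.f_locals[name]
        return frame.f_globals.get(name)
    finally:
        # Drop the frame reference to avoid reference cycles.
        del frame

def run_report():
    my_data = {"a": [1, 2, 3]}  # stands in for an Arrow table
    return find_in_caller_scope("my_data")

found = run_report()
missing = find_in_caller_scope("no_such_name")  # unknown names resolve to None
```

A custom discovery hook could fall back to another source, such as a file on disk, when the scope lookup returns nothing.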

Not (Yet?) Supported

  • No Python UDFs (scalar functions)
  • No fsspec integration

User Defined Table Functions

Table functions execute in Python before query execution, enabling data generation and connection injection without GIL interaction:

import bareduckdb
import pyarrow as pa

def generate_data(rows: int, multiplier: int = 1) -> pa.Table:
    return pa.table({
        "id": range(rows),
        "value": [i * multiplier for i in range(rows)]
    })

conn = bareduckdb.connect()
conn.register_udtf("generate_data", generate_data)

result = conn.execute("""
    SELECT * FROM generate_data(100, 10)
    WHERE value > 500
""").arrow_table()

Features:

  • AST-based query preprocessing - pure Python
  • Connection injection: add a conn parameter to access the connection during execution
  • Supports any Arrow-compatible object: PyArrow Table, Polars DataFrame, Pandas DataFrame
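
The "run Python first, then execute the query" flow can be sketched as follows. This is a simplified stand-in, not the library's preprocessing: it uses a regular expression rather than a SQL AST, so it only handles simple calls with literal arguments. expand_table_functions and the _udtf_N naming are hypothetical, and lists of dicts stand in for Arrow tables.

```python
import ast
import re

def expand_table_functions(sql: str, registry: dict):
    """Call registered functions found in `sql`; return rewritten SQL and results."""
    data = {}

    def replace(match: re.Match) -> str:
        name, raw_args = match.group(1), match.group(2)
        if name not in registry:
            return match.group(0)  # leave ordinary SQL functions untouched
        # Parse literal arguments such as "100, 10" into a Python tuple.
        args = ast.literal_eval(f"({raw_args},)") if raw_args.strip() else ()
        table_name = f"_udtf_{len(data)}"
        data[table_name] = registry[name](*args)  # runs before any query executes
        return table_name

    rewritten = re.sub(r"\b(\w+)\(([^()]*)\)", replace, sql)
    return rewritten, data

def generate_data(rows: int, multiplier: int = 1):
    return [{"id": i, "value": i * multiplier} for i in range(rows)]

sql, data = expand_table_functions(
    "SELECT * FROM generate_data(3, 10) WHERE value > 5",
    {"generate_data": generate_data},
)
```

After this step the query refers only to ordinary table names, so the engine never needs to call back into Python while it runs. The rewritten SQL and the collected results could then be handed over together, for instance through the inline data={...} registration described earlier.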

Arrow Enhancements

  • Deadlock detection

Type Mappings

All types convert through Arrow:

  • UUIDs: Returned as strings (Arrow doesn't have a native UUID type)
  • Decimals: Arrow Decimal128/Decimal256
  • Timestamps: Arrow Timestamp with timezone preservation
  • Nested Types: Struct/List/Map fully supported
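
Because UUID columns come back as strings, code that needs real UUID objects can convert them after the query. A minimal sketch using only the standard library, with a made-up sample value in place of an actual query result:

```python
import uuid

# Sample value standing in for a string returned from a UUID column.
raw_ids = ["123e4567-e89b-12d3-a456-426614174000"]

# uuid.UUID parses the canonical hyphenated form and validates it.
parsed = [uuid.UUID(value) for value in raw_ids]
```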

Development

Building from Source

# Clone with submodules (sparse checkout is automatic)
git clone --recurse-submodules https://github.com/iqmo-org/bareduckdb.git
cd bareduckdb

# Install development dependencies
uv sync

# Build in development mode
pip install -e .

* Note 1: DuckDB submodule version must match the library version.
* Note 2: PyArrow version must match the runtime version for Table registration / Pushdown.

Disclaimer

For official Python bindings, see: https://github.com/duckdb/duckdb-python

License

bareduckdb is licensed under the MIT License. See LICENSE for details.

All original copyrights are retained by their respective owners, including DuckDB and DuckDB-Python.
