Skip to main content

Minimal Cython-based DuckDB Bindings

Project description

bareduckdb

Simplified, Dynamically Linked DuckDB Python Bindings — Fast, simple, and free-threaded.

Python 3.12+ License: MIT


Overview

bareduckdb provides extensible and easy to build Python bindings to DuckDB using Cython.

  • Simple ~2k lines of C++ and ~2k lines of Python - easy to extend or customize
  • Arrow-first data conversion supporting Polars, PyArrow, and Pandas
  • Support for latest Python features Free threading, subinterpreters, ABI3 and asyncio
  • Dynamically linked to DuckDB's official library
  • Experimental Enhancements

Experimental Enhancements

  • Explicit Stream vs Materialization Modes - At connection & execution time, select whether you want materialized arrow_tables or streaming arrow_readers.
  • Arrow Deadlock Detection - certain use cases involving reuse of Arrow Readers can cause deadlocks
  • Table Statistics - Extracts and passes table statistics at registration time
  • Polars - No PyArrow Required - Polars can be read and produced without importing / installing PyArrow
  • Polars - Native LazyFrame Pushdown - whereas DuckDB collects() LazyFrames before pushdown, bareduckdb pushes down native Polars predicates
  • Inline Registration - bareduckdb.execute("query", data={...}) allows registration at call time
  • User Defined Table Functions - extracts UDTFs at parse time and executes registered functions
  • **Appender - Row by Row ** Exposes DuckDB's appender API for fast sequential writes to duckdb databases

Installation

From PyPI

pip install bareduckdb

From Source

git clone --recurse-submodules https://github.com/paultiq/bareduckdb.git
cd bareduckdb
uv sync -v # or: pip install -e .

Basic Usage

import bareduckdb

# Connect to in-memory database
conn = bareduckdb.connect()

# Execute query and get Arrow Table
result = conn.execute("SELECT 42 as answer").arrow_table()
print(result)

# Convert to Polars/Pandas/PyArrow
df_polars = conn.execute("SELECT * FROM range(100)").pl()
df_pandas = conn.execute("SELECT * FROM range(100)").df()

Async API

import asyncio
from bareduckdb.aio import connect_async

async def run_query():
    async with await connect_async() as conn:
        result = await conn.execute("SELECT * FROM generate_series(1, 1000)")
        return result

result = asyncio.run(run_query())

Polars Integration

import bareduckdb
import polars as pl

conn = bareduckdb.connect()

# Polars -> DuckDB (Arrow Capsule protocol)
df = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
conn.register("my_table", df)

# DuckDB -> Polars (direct conversion)
result = conn.execute("SELECT * FROM my_table", output_type="polars")

Architecture

Design Principles

  1. Keep it in Python — Business logic lives in Python, not Cython/C++
  2. No GIL interaction from DuckDB threads — All Python operations happen before/after query execution
  3. Semantic Versioning — Strict stability guarantees
  4. Arrow-first — All data types map through Arrow's type system

Why Arrow-First?

By forcing all conversions through Arrow, bareduckdb achieves:

  • Consistent type mappings across Polars/Pandas/PyArrow
  • Reduced code complexity (no per-library conversion paths)
  • Better memory efficiency (zero-copy where possible)
  • Future-proof (Arrow is the lingua franca for columnar data)

Thread Safety & Free-Threading

Free-threading support (Python 3.13+):

  • No global locks in critical paths
  • DuckDB threads never acquire the GIL
  • Safe concurrent query execution in --disable-gil mode
  • Atomic operations for Arrow stream coordination

APIs

bareduckdb provides multiple API layers for different use cases:

1. Core API (bareduckdb.core)

Minimal, no-frills interface for maximum performance.

from bareduckdb.core import Connection
conn = Connection()
result = conn.execute("SELECT 1")

2. Async API (bareduckdb.aio)

Non-blocking operations with async/await.

from bareduckdb.aio import connect_async
conn = await connect_async()
result = await conn.execute("SELECT 1")

3. Compatibility API (bareduckdb.compat)

Familiar interface similar to duckdb-python (with intentional differences).

import bareduckdb
conn = bareduckdb.connect()
result = conn.sql("SELECT 1")  # Eager execution

4. DBAPI 2.0 (bareduckdb.dbapi)

Standard Python database interface for compatibility with tools like SQLAlchemy.

from bareduckdb.dbapi import connect
conn = connect()
cursor = conn.cursor()
cursor.execute("SELECT 1")

Key Differences

Experimental Features

When pyarrow is installed, two experimental features are available -

Arrow Statistics and Cardinality

In duckdb-python, Arrow Tables, Readers and Capsules are all converted to Streams via DataSet->Scanner->Reader. These Streams have no cardinality (number of rows) nor statistics (such as: min max, number of distinct values, contains nulls).

Cardinality is used at determining whether to use TopN, which significantly speeds up (w/ less memory) "order by X limit N" queries when N is small relative to size of table. Statistics are used for query planning by the optimizer.

In bareduckdb, Arrow Tables are registered directly (as Tables, not Streams) and used by arrow_scan_dataset which can then retrieve cardinality and column level statistics.

Statistics Options:

The register() method accepts a statistics parameter to control which columns have statistics computed:

import bareduckdb

conn = bareduckdb.connect()

# No statistics (fastest registration, default)
conn.register("table", df, statistics=None)

# Numeric columns only (recommended for most use cases)
conn.register("table", df, statistics="numeric")

# All columns (slowest - includes string min/max)
conn.register("table", df, statistics=True)

# Specific columns by name
conn.register("table", df, statistics=["id", "price", "date"])

# Regex pattern to match column names
conn.register("table", df, statistics=".*_id")  # all columns ending with _id

Setting a Default:

Configure the default statistics mode at connection level:

# All register() calls will use numeric statistics by default
conn = bareduckdb.connect(default_statistics="numeric")
conn.register("table1", df1)  # uses numeric stats
conn.register("table2", df2)  # uses numeric stats
conn.register("table3", df3, statistics=False)  # override: no stats

Performance Impact (500K rows, 2 numeric + 2 string columns):

Mode Registration Time Use Case
None ~0.4ms No filter pushdown needed
"numeric" ~10ms JOIN/filter on numeric columns
True ~22ms Filter pushdown on all columns

The "numeric" option provides the best balance: fast registration with statistics for the columns most commonly used in filters and JOINs (IDs, dates, prices).

Arrow Pushdown

Arrow projection and filter pushdowns are implemented using the Arrow C++ library. Pushdowns are only implemented for Tables currently.

Relational API

Replacement Scans

Automatically discover Arrow tables in the caller's scope without explicit registration:

import bareduckdb
import pyarrow as pa

conn = bareduckdb.connect(enable_replacement_scan=True)
my_data = pa.table({"a": [1, 2, 3], "b": [4, 5, 6]})

result = conn.execute("SELECT * FROM my_data").arrow_table()

Customization: Override _get_replacement(name) method for custom discovery logic (e.g., loading from disk, fetching from API).

Manual Registration: Use .register() for explicit control or .execute(..., data={"name": df}) for inline registration.

Not (Yet?) Supported

  • No Python UDFs (scalar functions)
  • No fsspec integration

User Defined Table Functions

Table functions execute in Python before query execution, enabling data generation and connection injection without GIL interaction:

import bareduckdb
import pyarrow as pa

def generate_data(rows: int, multiplier: int = 1) -> pa.Table:
    return pa.table({
        "id": range(rows),
        "value": [i * multiplier for i in range(rows)]
    })

conn = bareduckdb.connect()
conn.register_udtf("generate_data", generate_data)

result = conn.execute("""
    SELECT * FROM generate_data(100, 10)
    WHERE value > 500
""").arrow_table()

Features:

  • AST-based query preprocessing - pure Python
  • Connection injection: Add conn parameter to access connection during execution
  • Supports any Arrow-compatible object: PyArrow Table, Polars DataFrame, Pandas DataFrame

Arrow Enhancements

  • Deadlock detection

Type Mappings

All types convert through Arrow:

  • UUIDs: Returned as strings (Arrow doesn't have native UUID type)
  • Decimals: Arrow Decimal128/Decimal256
  • Timestamps: Arrow Timestamp with timezone preservation
  • Nested Types: Struct/List/Map fully supported

Development

Building from Source

# Clone with submodules (sparse checkout is automatic)
git clone --recurse-submodules https://github.com/iqmo-org/bareduckdb.git
cd bareduckdb

# Install development dependencies
uv sync

# Build in development mode
pip install -e .

* Note 1: DuckDB submodule version must match the library version. * Note 2: PyArrow version must match the runtime version for Table registration / Pushdown

Disclaimer

For official Python bindings, see: https://github.com/duckdb/duckdb-python

License

bareduckdb is licensed under the MIT License. See LICENSE for details.

All original copyrights are retained by their respective owners, including DuckDB and DuckDB-Python

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

bareduckdb-0.6.143-cp314-cp314t-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl (32.8 MB view details)

Uploaded CPython 3.14tmanylinux: glibc 2.26+ x86-64manylinux: glibc 2.28+ x86-64

bareduckdb-0.6.143-cp314-cp314t-macosx_11_0_arm64.whl (34.7 MB view details)

Uploaded CPython 3.14tmacOS 11.0+ ARM64

bareduckdb-0.6.143-cp312-abi3-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl (32.6 MB view details)

Uploaded CPython 3.12+manylinux: glibc 2.26+ x86-64manylinux: glibc 2.28+ x86-64

bareduckdb-0.6.143-cp312-abi3-macosx_11_0_arm64.whl (34.6 MB view details)

Uploaded CPython 3.12+macOS 11.0+ ARM64

File details

Details for the file bareduckdb-0.6.143-cp314-cp314t-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for bareduckdb-0.6.143-cp314-cp314t-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 a03d0905349aa8cc65778023998c05e538e7c036b8c69790600b7dfe803104d9
MD5 0e5dcd63a89fa9b90c9c7fff8075a7a2
BLAKE2b-256 b01b5effd761617ffb4725b2149ae0b159236ab33099ce5dcdded28467f8f153

See more details on using hashes here.

Provenance

The following attestation bundles were made for bareduckdb-0.6.143-cp314-cp314t-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl:

Publisher: build_wheels.yml on iqmo-org/bareduckdb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file bareduckdb-0.6.143-cp314-cp314t-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for bareduckdb-0.6.143-cp314-cp314t-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 f3e3d69aef6f402886bd9b063d206e346b7266a78c4f9e97aced79fb625c9c03
MD5 87e7c97a2ba1e9043ff1693f52b8618e
BLAKE2b-256 16be92d28f32e1786c044ce23437b788c20fc9b8a0652359e2b6b996fb20700d

See more details on using hashes here.

Provenance

The following attestation bundles were made for bareduckdb-0.6.143-cp314-cp314t-macosx_11_0_arm64.whl:

Publisher: build_wheels.yml on iqmo-org/bareduckdb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file bareduckdb-0.6.143-cp312-abi3-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for bareduckdb-0.6.143-cp312-abi3-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 a377d4b435464acf6429d6158c58c2c605d486e6515829cfa736976a5c770f86
MD5 04bf78014691f6889ce9803ffb2818ab
BLAKE2b-256 5fd866c3353fa8d931ba3f267a8b3b26e4ea1a2443a0b03bd1c65b215996c1ee

See more details on using hashes here.

Provenance

The following attestation bundles were made for bareduckdb-0.6.143-cp312-abi3-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl:

Publisher: build_wheels.yml on iqmo-org/bareduckdb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file bareduckdb-0.6.143-cp312-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for bareduckdb-0.6.143-cp312-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 97eed78b44e3480124079c571b1dbe1318f4e3132e7fcacd0407347c0b6a461d
MD5 3e27736187b73e8f32a91687ab065bfe
BLAKE2b-256 7a7736c20907281b4ab75058573635b0ab75ab78394a3a8511d011918527bd74

See more details on using hashes here.

Provenance

The following attestation bundles were made for bareduckdb-0.6.143-cp312-abi3-macosx_11_0_arm64.whl:

Publisher: build_wheels.yml on iqmo-org/bareduckdb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page