
Pythonic, typed wrappers for DuckDB relations and tables (Duck+).


Duck+ (duckplus)

Duck+ is a user-friendly companion to DuckDB for Python projects that want typed helpers, predictable joins, and safe table operations. It wraps DuckDB relations so you can compose analytics pipelines with readable Python while still generating explicit SQL under the hood.


What you get

  • Typed relational wrappers – DuckRel keeps transformations immutable and chainable.
  • Safe table workflows – DuckTable owns inserts, appends, and idempotent ingestion strategies.
  • Explicit joins and casing rules – column names stay intact, projections are deliberate, and collisions fail loudly unless you opt in to suffixes.
  • Optional helpers – secrets management, a read-only CLI, and HTML previews stay in extras so the core package remains lightweight.

Install in seconds

Duck+ targets Python 3.12+ and DuckDB 1.3.0 or newer.

uv pip install duckplus

For development, clone the repository and run uv sync to create the managed environment with test and typing dependencies. Build the documentation locally with uv run sphinx-build -b html docs/source docs/_build/html, then open docs/_build/html/index.html in your browser to preview the site.


Quickstart

from duckplus import DuckRel, col, connect

with connect() as conn:
    rel = DuckRel(
        conn.raw.sql(
            """
            SELECT *
            FROM (VALUES
                (1, 'Alpha', 10),
                (2, 'Beta', 5),
                (3, 'Gamma', 8)
            ) AS t(id, name, score)
            """
        )
    )

    top_scores = (
        rel
        .filter(col("score") >= 8)
        .project({"id": col("id"), "name": col("name"), "score": col("score")})
        .order_by((col("score"), "desc"))
    )

    print(top_scores.materialize().require_table().to_pylist())

This snippet opens an in-memory DuckDB connection, builds a relation, filters rows with fluent column expressions, and materializes results safely.


Core workflows

Connect and manage context

from duckplus import connect

with connect(path="analytics.duckdb") as conn:
    rel = conn.relation("SELECT 42 AS answer")
    print(rel.to_df())

Connections default to in-memory databases. Pass path for file-backed workloads; Duck+ keeps them read-only by default.

Transform relations with DuckRel

deduped = (
    rel
    .distinct()
    .project({"score": "AVG(score)"}, group_by=["name"])
    .order_by(score="desc")
)

DuckRel methods always return new relations and validate column names with case-aware lookups.
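The immutable-chaining contract can be pictured with a pure-Python sketch; this illustrates the pattern only and is not duckplus's implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rel:
    """Illustrative immutable relation wrapper (not duckplus's DuckRel)."""
    sql: str

    def filter(self, predicate: str) -> "Rel":
        # Each step wraps the previous SQL and returns a new object.
        return Rel(f"SELECT * FROM ({self.sql}) WHERE {predicate}")

    def order_by(self, clause: str) -> "Rel":
        return Rel(f"SELECT * FROM ({self.sql}) ORDER BY {clause}")

base = Rel("SELECT * FROM scores")
top = base.filter("score >= 8").order_by("score DESC")
assert base.sql == "SELECT * FROM scores"  # earlier stages are never mutated
```

Because every method returns a fresh object, you can branch a pipeline from any intermediate step without worrying about shared state.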

Compose with column expressions

The duckplus.col helper returns a ColumnExpression that validates column references before rendering SQL. Provide an optional duck_type=duckplus.ducktypes.* marker to opt into static type hints and runtime validation for aggregates and ordering. Use the helper to build filters, projections, aggregates, and ordering clauses without hand-writing identifiers:

from duckplus import AggregateExpression, col, ducktypes

team = col("team", duck_type=ducktypes.Varchar)
score = col("score", duck_type=ducktypes.Integer)

ranked = (
    rel
    .filter(score > 0)
    .add_columns(double_score=score)
    .aggregate(
        team,
        total_score=AggregateExpression.sum(score),
        peak_score=AggregateExpression.max(score),
    )
    .order_by((col("total_score", duck_type=ducktypes.Integer), "desc"))
)

Typed column expressions propagate through the pipeline. A relation exposes column_type_markers so you can inspect the currently declared DuckDB logical types. Untyped expressions default to duckplus.ducktypes.Unknown, letting you opt into validation incrementally:

typed = rel.project({"team": team, "score": score})
print([marker.describe() for marker in typed.column_type_markers])
# ['VARCHAR', 'INTEGER']

Declaring an incompatible type raises a helpful error before execution. For example, projecting col("score", duck_type=ducktypes.Varchar) on a numeric column fails fast with Projection column 'score' is typed as INTEGER but was declared as VARCHAR.
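The fail-fast check can be sketched in plain Python; check_declared_type and its message format are illustrative stand-ins modeled on the error quoted above, not duckplus internals:

```python
# Hypothetical sketch of a fail-fast declared-type check.
def check_declared_type(column: str, actual: str, declared: str) -> None:
    # UNKNOWN declarations opt out of validation, matching the
    # incremental-typing behaviour described above.
    if declared != "UNKNOWN" and declared != actual:
        raise TypeError(
            f"Projection column {column!r} is typed as {actual} "
            f"but was declared as {declared}"
        )

check_declared_type("score", "INTEGER", "INTEGER")   # compatible: no error
check_declared_type("score", "INTEGER", "UNKNOWN")   # untyped: no error
```

The key property is that the error fires while the projection is being built, before any query executes.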

Typed metadata also powers DuckRel.fetch_typed(), which turns the current relation into Python tuples using the declared duckplus.ducktypes markers as type hints. The method always returns every projected column, so you can rely on the stored schema. Columns without explicit typing fall back to Any; Python hints are derived from the declared DuckDB markers, so there is a single source of truth for both runtime validation and static analysis:

rows = typed.fetch_typed()
# list[tuple[str, int]]

totals = (
    typed
    .aggregate(team, total=AggregateExpression.sum(score))
    .order_by((team, "asc"))
)
print(totals.fetch_typed())
# [('alpha', 200), ('beta', 80)]
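The marker-to-hint derivation can be pictured as a simple lookup; the mapping table below is an assumption for illustration, not duckplus's internal one:

```python
from typing import Any

# Illustrative mapping from DuckDB logical-type markers to Python hints.
PYTHON_HINTS: dict[str, type] = {
    "VARCHAR": str,
    "INTEGER": int,
    "BOOLEAN": bool,
}

def hint_for(marker: str) -> Any:
    # Untyped columns fall back to Any, as fetch_typed() does.
    return PYTHON_HINTS.get(marker, Any)

print([hint_for(m) for m in ("VARCHAR", "INTEGER", "UNKNOWN")])
# [<class 'str'>, <class 'int'>, typing.Any]
```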

Explore typed pipelines

End-to-end demos that exercise typed projections, aggregates, and fetches live in duckplus.examples.typed_pipeline_demos. They build a small in-memory orders dataset, carry DuckDB markers through transformations, and surface the results with fetch_typed() so consumers receive precise Python annotations.

from duckplus import connect
from duckplus.examples import typed_pipeline_demos

with connect() as conn:
    orders = typed_pipeline_demos.typed_orders_demo_relation(conn)

    print(typed_pipeline_demos.priority_order_snapshot(orders))
    # [(1, 'north', 'Alice', 120, 5, datetime.date(2024, 1, 1), True), ...]

    print(typed_pipeline_demos.regional_revenue_summary(orders))
    # [('east', 1, 15), ('north', 3, 365), ('south', 1, 98), ('west', 1, 155)]

    taxed = typed_pipeline_demos.apply_manual_tax_projection(orders)
    print(typed_pipeline_demos.describe_markers(taxed))
    # ['INTEGER', 'VARCHAR', 'VARCHAR', 'UNKNOWN', 'INTEGER', 'DATE', 'BOOLEAN']

The helper functions double as living documentation: the automated tests execute them to ensure the guides stay accurate as the API evolves.

Release 0.0.7 reliability demos

Release 0.0.7 introduced duckplus.examples.reliability_demos, a collection of production-grade helpers that demonstrate how to combine typed column dictionaries, DuckTable idempotent writes, and Arrow materialization for resilient pipelines:

  • priority_dispatch_payload builds an ordered alert payload with explicit revenue thresholds.
  • incremental_fact_ingest shows how to hydrate a fact table while inserting only unseen keys.
  • customer_spike_detector and regional_order_kpis provide ready-made guardrails that leverage aggregate filters without losing type metadata.
  • arrow_priority_snapshot materializes a cache-friendly Arrow table for fast fan-out.
  • lean_projection_shortcut keeps curated projections short by reusing typed expressions even after string-based transforms.

Each helper returns plain Python structures so orchestration layers can assert on counts, payloads, or schema drift with zero additional wiring.

Aggregate with AggregateExpression

import duckdb
from duckplus import AggregateExpression, DuckRel, col

with duckdb.connect() as conn:
    sales = DuckRel(
        conn.sql(
            """
            SELECT *
            FROM (VALUES
                ('north', 50, DATE '2024-01-03'),
                ('north', 60, DATE '2024-01-02'),
                ('south', 30, DATE '2024-01-01'),
                ('east', 20, DATE '2024-01-04'),
                ('west', 70, DATE '2024-01-05')
            ) AS t(region, amount, sale_date)
            """
        )
    )

    rollup = (
        sales.aggregate(
            col("region"),
            total_amount=AggregateExpression.sum(col("amount")),
            non_north=AggregateExpression.sum(col("amount")).with_filter(col("region") != "north"),
            first_sale_amount=(
                AggregateExpression.function("first", col("amount")).with_order_by((col("sale_date"), "asc"))
            ),
        )
        .order_by((col("region"), "asc"))
    )

    print(rollup.relation.fetchall())

This produces alphabetized totals, a filtered sum, and the first sale per region without hand-writing aggregate SQL. See docs/source/aggregate_demos.rst for a larger set of tested aggregate examples.

Promote to tables with DuckTable

materialized = deduped.materialize().require_table()
table = materialized.to_table("scores")
table.insert_antijoin(deduped, keys=["name"])

Table wrappers provide append/insert helpers that guard against duplicates and respect column names.
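The anti-join guarantee (only rows whose keys are unseen get appended, so re-running an ingest is a no-op) can be sketched in plain Python. duckplus performs this in SQL; the function below merely illustrates the semantics:

```python
# Illustrative sketch of idempotent anti-join insertion (not duckplus's
# implementation, which generates SQL against the table).
def insert_antijoin(table: list[dict], incoming: list[dict], keys: list[str]) -> int:
    existing = {tuple(row[k] for k in keys) for row in table}
    added = 0
    for row in incoming:
        key = tuple(row[k] for k in keys)
        if key not in existing:
            table.append(row)
            existing.add(key)
            added += 1
    return added

scores = [{"name": "Alpha", "score": 10}]
incoming = [{"name": "Alpha", "score": 10}, {"name": "Beta", "score": 5}]
assert insert_antijoin(scores, incoming, keys=["name"]) == 1  # only Beta is new
assert insert_antijoin(scores, incoming, keys=["name"]) == 0  # re-run is a no-op
```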

Join with confidence

from duckplus import JoinProjection, JoinSpec, column

spec = JoinSpec(equal_keys=[("order_id", "id")])

# Collisions fail loudly with allow_collisions=False.
projection = JoinProjection(allow_collisions=False)
joined = orders.natural_join(customers, project=projection)

# Add additional join predicates with column comparisons when needed.
currency_safe = orders.left_outer(
    customers,
    JoinSpec(
        equal_keys=[("order_id", "id")],
        predicates=[column("order_date") >= column("customer_since")],
    ),
    project=projection,
)

# Opt in to suffixed duplicates instead of raising on collisions.
suffixes = JoinProjection(allow_collisions=True)
safe = orders.left_outer(customers, spec, project=suffixes)

Join helpers project columns explicitly, drop duplicate right-side keys, and raise when collisions would occur. Opt into suffixes through JoinProjection(allow_collisions=True) when needed, and use column() to declare predicates that compare two columns without writing raw SQL.
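The collision rule can be sketched in plain Python; the function name and the `_right` suffix below are illustrative assumptions, not duckplus's actual naming scheme:

```python
# Illustrative sketch of collide-or-suffix column projection for a join.
def project_joined(left: list[str], right: list[str], allow_collisions: bool) -> list[str]:
    shared = set(left) & set(right)
    if shared and not allow_collisions:
        # Fail loudly: the caller must opt in to suffixing.
        raise ValueError(f"column collisions: {sorted(shared)}")
    out = list(left)
    for name in right:
        out.append(f"{name}_right" if name in shared else name)
    return out

print(project_joined(["id", "total"], ["id", "region"], allow_collisions=True))
# ['id', 'total', 'id_right', 'region']
```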


Extras worth knowing

DataFrame interop

Install optional extras when you want pandas or Polars integration:

uv pip install "duckplus[pandas]"      # pandas DataFrame support
uv pip install "duckplus[polars]"      # Polars DataFrame support

Once installed, relations expose familiar helpers:

df = rel.df()            # pandas.DataFrame
pl_frame = rel.pl()      # polars.DataFrame

from duckplus import DuckRel
rel_from_df = DuckRel.from_pandas(df)
rel_from_pl = DuckRel.from_polars(pl_frame)

Attempting to call these helpers without the matching extra raises a clear ModuleNotFoundError explaining how to install the dependency.
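That guard pattern can be sketched as a lazy import with an install hint; require_extra and its message are assumptions for illustration, not duckplus's API:

```python
import importlib

# Illustrative optional-dependency guard: import lazily and re-raise
# with an actionable install hint when the extra is missing.
def require_extra(module: str, extra: str):
    try:
        return importlib.import_module(module)
    except ModuleNotFoundError as exc:
        raise ModuleNotFoundError(
            f"{module} is required for this helper; "
            f"install it with: uv pip install 'duckplus[{extra}]'"
        ) from exc
```

A DataFrame helper would then call something like `require_extra("pandas", "pandas")` on first use, so the core package never imports optional dependencies eagerly.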

Command line interface

uv run duckplus sql "SELECT 42 AS answer"
uv run duckplus schema "SELECT 1 AS id, 'alpha' AS label"
uv run duckplus --repl

The CLI provides read-only helpers for quick exploration. Point it at a DuckDB file with --database path/to/file.duckdb when needed.

HTML previews

from duckplus import DuckRel, connect, to_html

with connect() as conn:
    rel = DuckRel(conn.raw.sql("SELECT 1 AS id, 'Alice & Bob' AS name"))

html = to_html(rel, max_rows=10, null_display="∅", class_="preview")

to_html renders safe, escaped previews with optional styling hooks.
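The escaping behaviour can be illustrated with a minimal cell renderer built on the standard library; render_cell is a sketch of the idea, not part of duckplus (to_html also handles row limits and CSS hooks):

```python
import html

# Minimal sketch of escaped HTML cell rendering with a null placeholder.
def render_cell(value: object, null_display: str = "∅") -> str:
    text = null_display if value is None else str(value)
    # html.escape neutralises &, <, and > so user data cannot inject markup.
    return f"<td>{html.escape(text)}</td>"

print(render_cell("Alice & Bob"))
# <td>Alice &amp; Bob</td>
```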


Documentation workflow

The documentation site is published automatically to GitHub Pages by the Docs workflow. Every push to main and each pull request runs uv sync, builds the Sphinx project, and uploads the generated HTML as a GitHub Pages artifact. Pages serves the most recent deployment at https://isaacnfairplay.github.io/duck/, and workflow summaries include preview links you can share for review.

If a deployment fails:

  1. Open the Actions → Docs run for the failing commit or pull request.
  2. Review the build logs, especially the uv run sphinx-build step for Sphinx warnings promoted to errors.
  3. Re-run the job from the Actions UI after fixing the problem to publish an updated preview.

For a local preview outside CI, run:

uv sync
uv run sphinx-build -b html docs/source docs/_build/html
python -m webbrowser docs/_build/html/index.html  # optional helper to open the preview

Learn more

  • Review the API reference for detailed method docs and typing information.
  • Explore unit tests under tests/ to see edge cases and best practices.

If you run into questions or want to suggest improvements, open an issue or pull request. We welcome contributions that keep Duck+ reliable for the long haul.


License

Duck+ is available under the MIT License.
