Pythonic, typed wrappers for DuckDB relations and tables (Duck+).

Duck+ (duckplus)

Duck+ is a user-friendly companion to DuckDB for Python projects that want typed helpers, predictable joins, and safe table operations. It wraps DuckDB relations so you can compose analytics pipelines with readable Python while still generating explicit SQL under the hood.


What you get

  • Typed relational wrappers – DuckRel keeps transformations immutable and chainable.
  • Safe table workflows – DuckTable owns inserts, appends, and idempotent ingestion strategies.
  • Explicit joins and casing rules – column names stay intact, projections are deliberate, and collisions fail loudly unless you opt in to suffixes.
  • Optional helpers – secrets management, a read-only CLI, and HTML previews stay in extras so the core package remains lightweight.

Install in seconds

Duck+ targets Python 3.12+ and DuckDB 1.3.0 or newer.

uv pip install duckplus

For development, clone the repository and run uv sync to create the managed environment with test and typing dependencies. Build the documentation locally with uv run sphinx-build -b html docs/source docs/_build/html, then open docs/_build/html/index.html in your browser to preview the site.


Quickstart

from duckplus import DuckRel, col, connect

with connect() as conn:
    rel = DuckRel(
        conn.raw.sql(
            """
            SELECT *
            FROM (VALUES
                (1, 'Alpha', 10),
                (2, 'Beta', 5),
                (3, 'Gamma', 8)
            ) AS t(id, name, score)
            """
        )
    )

    top_scores = (
        rel
        .filter(col("score") >= 8)
        .project({"id": col("id"), "name": col("name"), "score": col("score")})
        .order_by((col("score"), "desc"))
    )

    print(top_scores.materialize().require_table().to_pylist())

This snippet opens an in-memory DuckDB connection, builds a relation, filters rows with fluent column expressions, and materializes results safely.


Core workflows

Connect and manage context

from duckplus import connect

with connect(path="analytics.duckdb") as conn:
    rel = conn.relation("SELECT 42 AS answer")
    print(rel.to_df())

Connections default to in-memory databases. Pass path for file-backed workloads; Duck+ keeps them read-only by default.
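
The context-manager shape above can be sketched in plain Python. This is a minimal illustration of the pattern only, using the standard-library sqlite3 module as a stand-in for DuckDB; the real connect() signature may differ beyond the in-memory and read-only defaults described here:

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def connect(path=None, read_only=True):
    """Sketch of a connect() helper: in-memory by default, read-only files."""
    if path is None:
        conn = sqlite3.connect(":memory:")
    else:
        # SQLite URI syntax stands in for DuckDB's read_only flag here.
        mode = "ro" if read_only else "rw"
        conn = sqlite3.connect(f"file:{path}?mode={mode}", uri=True)
    try:
        yield conn
    finally:
        conn.close()

with connect() as conn:
    answer = conn.execute("SELECT 42 AS answer").fetchone()[0]
```

The context manager guarantees the handle is closed even if a query raises mid-block.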

Transform relations with DuckRel

deduped = (
    rel
    .distinct()
    .project({"score": "AVG(score)"}, group_by=["name"])
    .order_by(score="desc")
)

DuckRel methods always return new relations and validate column names with case-aware lookups.
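
The immutability guarantee can be illustrated with a toy wrapper. This sketch is not Duck+'s internals, just the return-a-new-object pattern the paragraph describes:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rel:
    """Immutable relation sketch: every method wraps the old SQL in new SQL."""
    sql: str

    def filter(self, predicate: str) -> "Rel":
        return Rel(f"SELECT * FROM ({self.sql}) WHERE {predicate}")

    def order_by(self, column: str, direction: str = "asc") -> "Rel":
        return Rel(f"SELECT * FROM ({self.sql}) ORDER BY {column} {direction.upper()}")

base = Rel("SELECT * FROM scores")
filtered = base.filter("score >= 8").order_by("score", "desc")
# The original relation is untouched; chaining composes new SQL each time.
```

Because each step returns a fresh object, intermediate relations can be reused safely across branches of a pipeline.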

Compose with column expressions

The duckplus.col helper returns a ColumnExpression that validates column references before rendering SQL. Provide an optional duck_type=duckplus.ducktypes.* marker to opt into static type hints and runtime validation for aggregates and ordering. Use the helper to build filters, projections, aggregates, and ordering clauses without hand-writing identifiers:

from duckplus import AggregateExpression, col, ducktypes

team = col("team", duck_type=ducktypes.Varchar)
score = col("score", duck_type=ducktypes.Integer)

ranked = (
    rel
    .filter(score > 0)
    .add_columns(double_score=score)
    .aggregate(
        team,
        total_score=AggregateExpression.sum(score),
        peak_score=AggregateExpression.max(score),
    )
    .order_by((col("total_score", duck_type=ducktypes.Integer), "desc"))
)

Typed column expressions propagate through the pipeline. A relation exposes column_type_markers so you can inspect the currently declared DuckDB logical types. Untyped expressions default to duckplus.ducktypes.Unknown, letting you opt into validation incrementally:

typed = rel.project({"team": team, "score": score})
print([marker.describe() for marker in typed.column_type_markers])
# ['VARCHAR', 'INTEGER']

Declaring an incompatible type raises a helpful error before execution. For example, projecting col("score", duck_type=ducktypes.Varchar) on a numeric column fails fast with Projection column 'score' is typed as INTEGER but was declared as VARCHAR.
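
A minimal sketch of that fail-fast check, with an illustrative helper name (check_projection is not a duckplus API):

```python
def check_projection(column: str, actual: str, declared: str) -> None:
    """Raise before execution when a declared marker contradicts the schema."""
    if declared != "UNKNOWN" and declared != actual:
        raise TypeError(
            f"Projection column '{column}' is typed as {actual} "
            f"but was declared as {declared}"
        )

check_projection("score", "INTEGER", "INTEGER")   # matching declaration passes
check_projection("name", "VARCHAR", "UNKNOWN")    # untyped columns always pass
try:
    check_projection("score", "INTEGER", "VARCHAR")
except TypeError as exc:
    message = str(exc)
```

Validating at projection time, rather than at fetch time, keeps the error close to the line that introduced the bad declaration.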

Typed metadata also powers DuckRel.fetch_typed(), which turns the current relation into Python tuples using the declared duckplus.ducktypes markers as type hints. The method always returns every projected column so you can rely on the stored schema. Columns without explicit typing fall back to Any; Python hints are derived from the declared DuckDB markers so there's a single source of truth for both runtime validation and static analysis:

rows = typed.fetch_typed()
# list[tuple[str, int]]

totals = (
    typed
    .aggregate(team, total=AggregateExpression.sum(score))
    .order_by((team, "asc"))
)
print(totals.fetch_typed())
# [('alpha', 200), ('beta', 80)]
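
Conceptually, fetch_typed() needs a mapping from DuckDB logical-type markers to Python hints. The table below is a hypothetical stand-in for what duckplus.ducktypes actually provides:

```python
from typing import Any

# Hypothetical marker-to-hint table; the real mapping lives in
# duckplus.ducktypes and may cover many more DuckDB types.
PYTHON_HINTS = {
    "VARCHAR": str,
    "INTEGER": int,
    "BOOLEAN": bool,
    "UNKNOWN": Any,
}

def row_hint(markers):
    """Derive the per-row tuple hint from declared column markers."""
    return tuple(PYTHON_HINTS.get(m, Any) for m in markers)

hint = row_hint(["VARCHAR", "INTEGER"])  # corresponds to tuple[str, int]
```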

Explore typed pipelines

End-to-end demos that exercise typed projections, aggregates, and fetches live in duckplus.examples.typed_pipeline_demos. They build a small in-memory orders dataset, carry DuckDB markers through transformations, and surface the results with fetch_typed() so consumers receive precise Python annotations.

from duckplus import connect
from duckplus.examples import typed_pipeline_demos

with connect() as conn:
    orders = typed_pipeline_demos.typed_orders_demo_relation(conn)

    print(typed_pipeline_demos.priority_order_snapshot(orders))
    # [(1, 'north', 'Alice', 120, 5, datetime.date(2024, 1, 1), True), ...]

    print(typed_pipeline_demos.regional_revenue_summary(orders))
    # [('east', 1, 15), ('north', 3, 365), ('south', 1, 98), ('west', 1, 155)]

    taxed = typed_pipeline_demos.apply_manual_tax_projection(orders)
    print(typed_pipeline_demos.describe_markers(taxed))
    # ['INTEGER', 'VARCHAR', 'VARCHAR', 'UNKNOWN', 'INTEGER', 'DATE', 'BOOLEAN']

The helper functions double as living documentation—the automated tests execute them to ensure guides stay accurate as the API evolves.

Release 0.0.7 reliability demos

Release 0.0.7 introduced duckplus.examples.reliability_demos, a collection of production-grade helpers that demonstrate how to combine typed column dictionaries, DuckTable idempotent writes, and Arrow materialization for resilient pipelines:

  • priority_dispatch_payload builds an ordered alert payload with explicit revenue thresholds.
  • incremental_fact_ingest shows how to hydrate a fact table while inserting only unseen keys.
  • customer_spike_detector and regional_order_kpis provide ready-made guardrails that leverage aggregate filters without losing type metadata.
  • arrow_priority_snapshot materializes a cache-friendly Arrow table for fast fan-out.
  • lean_projection_shortcut keeps curated projections short by reusing typed expressions even after string-based transforms.

Each helper returns plain Python structures so orchestration layers can assert on counts, payloads, or schema drift with zero additional wiring.

Aggregate with AggregateExpression

import duckdb
from duckplus import AggregateExpression, DuckRel, col

with duckdb.connect() as conn:
    sales = DuckRel(
        conn.sql(
            """
            SELECT *
            FROM (VALUES
                ('north', 50, DATE '2024-01-03'),
                ('north', 60, DATE '2024-01-02'),
                ('south', 30, DATE '2024-01-01'),
                ('east', 20, DATE '2024-01-04'),
                ('west', 70, DATE '2024-01-05')
            ) AS t(region, amount, sale_date)
            """
        )
    )

    rollup = (
        sales.aggregate(
            col("region"),
            total_amount=AggregateExpression.sum(col("amount")),
            non_north=AggregateExpression.sum(col("amount")).with_filter(col("region") != "north"),
            first_sale_amount=(
                AggregateExpression.function("first", col("amount")).with_order_by((col("sale_date"), "asc"))
            ),
        )
        .order_by((col("region"), "asc"))
    )

    print(rollup.relation.fetchall())

This produces alphabetized totals, a filtered sum, and the first sale per region without hand-writing aggregate SQL. See docs/source/aggregate_demos.rst for a tested, larger set of aggregate examples.
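
The FILTER and ORDER BY clauses those helpers render are standard DuckDB aggregate SQL. A hypothetical string-building sketch (not the duckplus renderer) shows the shape:

```python
def render_filtered(func: str, column: str, predicate: str) -> str:
    """Filtered aggregate: only rows matching the predicate are summed."""
    return f"{func}({column}) FILTER (WHERE {predicate})"

def render_ordered(func: str, column: str, order_col: str, direction: str = "ASC") -> str:
    """Ordered aggregate: e.g. first() picks the row with the smallest key."""
    return f"{func}({column} ORDER BY {order_col} {direction})"

non_north = render_filtered("sum", "amount", "region != 'north'")
first_sale = render_ordered("first", "amount", "sale_date")
```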

Promote to tables with DuckTable

materialized = deduped.materialize().require_table()
table = materialized.to_table("scores")
table.insert_antijoin(deduped, keys=["name"])

Table wrappers provide append/insert helpers that guard against duplicates and respect column names.
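
An anti-join insert of this kind typically renders to SQL like the following. The builder below is illustrative, not duckplus code; DuckDB supports ANTI JOIN natively:

```python
def insert_antijoin_sql(table: str, source_sql: str, keys: list[str]) -> str:
    """Insert only source rows whose key is absent from the target table."""
    on = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    return (
        f"INSERT INTO {table} "
        f"SELECT s.* FROM ({source_sql}) AS s "
        f"ANTI JOIN {table} AS t ON {on}"
    )

sql = insert_antijoin_sql("scores", "SELECT * FROM staging", ["name"])
```

Because the anti-join excludes already-present keys, re-running the same ingestion is idempotent.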

Join with confidence

from duckplus import JoinProjection, JoinSpec, column

spec = JoinSpec(equal_keys=[("order_id", "id")])

# Collisions raise by default, so projections stay explicit.
projection = JoinProjection(allow_collisions=False)
joined = orders.natural_join(customers, project=projection)

# Add additional join predicates with column comparisons when needed.
currency_safe = orders.left_outer(
    customers,
    JoinSpec(
        equal_keys=[("order_id", "id")],
        predicates=[column("order_date") >= column("customer_since")],
    ),
    project=projection,
)

# Opt into suffixed duplicates when collisions are expected.
suffixes = JoinProjection(allow_collisions=True)
safe = orders.left_outer(customers, spec, project=suffixes)

Join helpers project columns explicitly, drop duplicate right-side keys, and raise when collisions would occur. Opt into suffixes through JoinProjection(allow_collisions=True) when needed, and use column() to declare predicates that compare two columns without writing raw SQL.
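
The collision policy can be sketched in a few lines of plain Python. The "_left"/"_right" suffix spelling here is an assumption for illustration, not necessarily what JoinProjection emits:

```python
def project_joined(left: list[str], right: list[str], allow_collisions: bool = False):
    """Merge two column lists, failing loudly or suffixing on shared names."""
    shared = set(left) & set(right)
    if shared and not allow_collisions:
        raise ValueError(f"column collisions: {sorted(shared)}")
    out = [f"{c}_left" if c in shared else c for c in left]
    out += [f"{c}_right" if c in shared else c for c in right]
    return out

cols = project_joined(["id", "name"], ["name", "city"], allow_collisions=True)
```

Failing by default keeps silent shadowing out of pipelines; the suffix mode is a deliberate opt-in.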


Extras worth knowing

DataFrame interop

Install optional extras when you want pandas or Polars integration:

uv pip install "duckplus[pandas]"      # pandas DataFrame support
uv pip install "duckplus[polars]"      # Polars DataFrame support

Once installed, relations expose familiar helpers:

df = rel.df()            # pandas.DataFrame
pl_frame = rel.pl()      # polars.DataFrame

from duckplus import DuckRel
rel_from_df = DuckRel.from_pandas(df)
rel_from_pl = DuckRel.from_polars(pl_frame)

Attempting to call these helpers without the matching extra raises a clear ModuleNotFoundError explaining how to install the dependency.
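
That guarded-import behaviour usually follows a standard pattern. This sketch uses an illustrative helper name and error message, not the actual duckplus code:

```python
import importlib

def require_extra(module: str, extra: str):
    """Import an optional dependency, or fail with install instructions."""
    try:
        return importlib.import_module(module)
    except ModuleNotFoundError as exc:
        raise ModuleNotFoundError(
            f'{module} is required; install it with '
            f'`uv pip install "duckplus[{extra}]"`'
        ) from exc

try:
    require_extra("a_module_that_is_not_installed", "pandas")
except ModuleNotFoundError as exc:
    message = str(exc)
```

Chaining `from exc` preserves the original import error for debugging while surfacing the actionable message.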

Command line interface

uv run duckplus sql "SELECT 42 AS answer"
uv run duckplus schema "SELECT 1 AS id, 'alpha' AS label"
uv run duckplus --repl

The CLI provides read-only helpers for quick exploration. Point it at a DuckDB file with --database path/to/file.duckdb when needed.

HTML previews

from duckplus import DuckRel, connect, to_html

with connect() as conn:
    rel = DuckRel(conn.raw.sql("SELECT 1 AS id, 'Alice & Bob' AS name"))

html = to_html(rel, max_rows=10, null_display="∅", class_="preview")

to_html renders safe, escaped previews with optional styling hooks.
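
The escaping guarantee rests on standard HTML escaping; a minimal sketch with an illustrative render_cell helper:

```python
import html

def render_cell(value, null_display: str = "∅") -> str:
    """Escape one cell value; substitute a display marker for NULLs."""
    if value is None:
        return null_display
    return html.escape(str(value))

cell = render_cell("Alice & Bob")  # ampersand becomes an entity
```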


Documentation workflow

The documentation site is published automatically to GitHub Pages by the Docs workflow. Every push to main and each pull request runs uv sync, builds the Sphinx project, and deploys the generated HTML to the gh-pages branch. The latest deployment is always available at https://isaacnfairplay.github.io/duck/, and workflow summaries include preview links you can share for review.

If a deployment fails:

  1. Open the Actions → Docs run for the failing commit or pull request.
  2. Review the build logs, especially the uv run sphinx-build step for Sphinx warnings promoted to errors.
  3. Re-run the job from the Actions UI after fixing the problem to publish an updated preview.

For a local preview outside CI, run:

uv sync
uv run sphinx-build -b html docs/source docs/_build/html
python -m webbrowser docs/_build/html/index.html  # optional helper to open the preview

Learn more

  • Review the API reference for detailed method docs and typing information.
  • Explore unit tests under tests/ to see edge cases and best practices.

If you run into questions or want to suggest improvements, open an issue or pull request. We welcome contributions that keep Duck+ reliable for the long haul.


License

Duck+ is available under the MIT License.
