A SQL-based Python dataframe library for ergonomic interactive data analysis and exploration.

Maintainers

ajfriend

Duckboat

Ugly to some, but gets the job done.

Duckboat is a SQL-based Python dataframe library for ergonomic interactive data analysis and exploration.

pip install duckboat

Duckboat allows you to chain SQL snippets (meaning you can usually omit select * and from ...) to incrementally and lazily build up complex queries.

Duckboat is a light wrapper around the DuckDB relational API, so expressions are evaluated lazily and optimized by DuckDB prior to execution. The resulting queries are fast, avoiding the need to materialize intermediate tables or perform data transfers. You can leverage all the SQL syntax improvements provided by DuckDB.

Examples

import duckboat as uck

uck.do(
    'https://raw.githubusercontent.com/allisonhorst/palmerpenguins/main/inst/extdata/penguins.csv',
    "where sex = 'female' ",
    'where year > 2008',
    'select *, cast(body_mass_g as double) as grams',
    'select species, island, avg(grams) as avg_grams group by 1,2',
    'select * replace (round(avg_grams, 1) as avg_grams)',
    'order by avg_grams',
)

┌───────────┬───────────┬───────────┐
│  species  │  island   │ avg_grams │
│  varchar  │  varchar  │  double   │
├───────────┼───────────┼───────────┤
│ Adelie    │ Torgersen │    3193.8 │
│ Adelie    │ Dream     │    3357.5 │
│ Adelie    │ Biscoe    │    3446.9 │
│ Chinstrap │ Dream     │    3522.9 │
│ Gentoo    │ Biscoe    │    4786.3 │
└───────────┴───────────┴───────────┘

To and from other data formats

We can translate to and from other data formats like Pandas DataFrames, Polars, or Arrow Tables.

import pandas as pd

df = pd.DataFrame({'a': [0]})
t = uck.do(df)
t

┌───────┐
│   a   │
│ int64 │
├───────┤
│     0 │
└───────┘

Translate back to a pandas dataframe:

t.do('pandas')

You can mix duckboat with pandas or polars mid-workflow. Do the heavy lifting in SQL, pop into pandas for fiddly column operations, then come back:

# penguins is assumed to be a Table built from the penguins CSV in the first example
pdf = penguins.do('where body_mass_g between 3500 and 4000', 'pandas')
pdf = pdf.rename(columns=str.upper)
result = uck.do(pdf, 'select SPECIES, count(*) as n group by 1')

Chaining expressions

You can chain calls to Table.do():

f = 'select a + 1 as a'
t.do(f).do(f).do(f)

┌───────┐
│   a   │
│ int64 │
├───────┤
│     3 │
└───────┘

Alternatively, Table.do() accepts a sequence of arguments:

t.do(f, f, f)

It also accepts lists of expressions, and will apply them recursively:

fs = [f, f, f]
t.do(fs)

Equivalently, you can unpack the list into separate arguments:

t.do(*fs)

Lists can be nested to group expressions; Duckboat applies them recursively, in order:

t.do(f, [f], [f, [[f, f], f]])

┌───────┐
│   a   │
│ int64 │
├───────┤
│     6 │
└───────┘

Duckboat will also apply functions:

def foo(x):
    return x.do('select a + 2 as a')

# the following are equivalent
foo(t)
t.do(foo)

Of course, you can mix functions, SQL strings, and lists:

uck.do(df, foo, [f, foo])

┌───────┐
│   a   │
│ int64 │
├───────┤
│     5 │
└───────┘
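
The recursive behaviour in this section can be pictured with a small stand-alone sketch. It is not Duckboat's implementation: `apply_steps` is a hypothetical helper, and plain integer functions stand in for the SQL snippets `select a + 1 as a` and `select a + 2 as a`. It only assumes what the examples above show, namely that steps run left to right and that lists are expanded in place.

```python
from typing import Any


def apply_steps(value: Any, *steps: Any) -> Any:
    """Apply steps left to right; a list is expanded recursively in place."""
    for step in steps:
        if isinstance(step, list):
            # A list is a sub-pipeline: run its items in order on the current value.
            value = apply_steps(value, *step)
        elif callable(step):
            value = step(value)
        else:
            raise TypeError(f"unsupported step: {step!r}")
    return value


def f(a: int) -> int:
    """Stand-in for the snippet 'select a + 1 as a'."""
    return a + 1


def foo(a: int) -> int:
    """Stand-in for the function that applies 'select a + 2 as a'."""
    return a + 2


nested = apply_steps(0, f, [f], [f, [[f, f], f]])  # six applications of f
mixed = apply_steps(0, foo, [f, foo])              # +2, then +1, then +2
print(nested, mixed)
```

Starting from `a = 0`, the two calls reproduce the values 6 and 5 shown in the tables above, which is consistent with nesting depth having no effect on the order of application.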

Joins

Pass a dict to register named tables, then write SQL that references them:

orders = pd.DataFrame({'id': [1, 2, 3], 'customer_id': [10, 20, 10], 'amount': [5.0, 12.0, 8.0]})
customers = pd.DataFrame({'id': [10, 20], 'name': ['Alice', 'Bob']})

uck.do(
    {'orders': orders, 'customers': customers},
    '''
    select c.name, sum(o.amount) as total
    from orders o
    join customers c on o.customer_id = c.id
    group by 1
    ''',
)

┌─────────┬────────┐
│  name   │ total  │
│ varchar │ double │
├─────────┼────────┤
│ Alice   │   13.0 │
│ Bob     │   12.0 │
└─────────┴────────┘

You can also join mid-chain. The current table is always available as _:

t1.do(
    'where total_amount > 0',
    {'zones': zones_df},
    'join zones on zid = zones.id',
    'select zone_name, avg(total_amount) group by 1',
)

Since from _ is always prepended, you can also self-join by aliasing both sides directly:

t.do('as a join _ as b using (hexid)')

Or use uck.rename() to give the current table a name and write full SQL:

t.do(
    uck.rename('trips'),
    'from trips as a join trips as b using (hexid)',
)
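
To make the `from _` convention concrete, here is a minimal sketch of the statement each snippet in this section stands for. `expand` is a hypothetical helper written for illustration; the exact text Duckboat hands to DuckDB may differ in whitespace or other details.

```python
def expand(snippet: str) -> str:
    """Return the statement a snippet stands for once `from _` is prepended."""
    return f"from _ {snippet.strip()}"


snippets = [
    "where total_amount > 0",
    "join zones on zid = zones.id",
    "as a join _ as b using (hexid)",
]
expanded = [expand(s) for s in snippets]
for statement in expanded:
    print(statement)
```

The last line prints `from _ as a join _ as b using (hexid)`, which shows why the self-join snippet can start with `as a`: the alias attaches to the prepended `_`.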

Dispatch rules

do() dispatches on the type of each argument.

SQL:

t.do('where x > 5')                # SQL snippet (from _ is prepended)
t.do('queries/transform.sql')      # .sql file path (loaded and executed)

Composition:

t.do(my_func)                      # callable — receives Table, returns Table
t.do([step1, step2, step3])        # list — applied recursively as a pipeline
t.do({'zones': zones_df})          # dict — registers named tables for next step
t.do(uck.rename('trips'))          # rename — gives _ a name, removes auto-wrap

Output:

t.do('select count(*)', int)       # Python int
t.do('select distinct a', list)    # Python list
t.do('limit 1', dict)              # Python dict
t.do('pandas')                     # Pandas DataFrame
t.do('arrow')                      # PyArrow Table

Display:

t.do('hide')                       # suppress repr (useful for large lazy tables)
t.do('show')                       # re-enable repr
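
The rules above imply an ordering problem that is easy to miss: `int`, `list`, and `dict` are themselves callable, and `'pandas'` or `'hide'` are also strings. The sketch below shows one way a dispatcher could order its checks. `classify` and its return labels are hypothetical, the object returned by `uck.rename()` is left out because its type is not described here, and none of this is taken from Duckboat's source.

```python
from typing import Any

OUTPUT_NAMES = {"pandas", "arrow"}   # string keywords that convert the result
DISPLAY_NAMES = {"hide", "show"}     # string keywords that toggle the repr
OUTPUT_TYPES = (int, list, dict)     # type objects used as conversion targets


def classify(arg: Any) -> str:
    """Label which dispatch rule an argument would fall under (assumed order)."""
    if isinstance(arg, str):
        word = arg.strip()
        # Keywords must be checked before treating the string as SQL.
        if word in OUTPUT_NAMES:
            return "output"
        if word in DISPLAY_NAMES:
            return "display"
        if word.endswith(".sql"):
            return "sql file"
        return "sql snippet"
    if isinstance(arg, dict):
        return "register tables"
    if isinstance(arg, list):
        return "pipeline"
    # Type objects are callable, so test for them before the generic callable case.
    if any(arg is target for target in OUTPUT_TYPES):
        return "output"
    if callable(arg):
        return "callable"
    raise TypeError(f"no dispatch rule for {arg!r}")


for example in ["where x > 5", "queries/transform.sql", "pandas", int, len, ["a"], {"zones": None}]:
    print(repr(example), "->", classify(example))
```

Note the difference between the type `dict` (an output target) and a dict instance such as `{'zones': zones_df}` (a table registration); the sketch separates them by checking instances first and type objects by identity.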

Objects

Table

Table wraps a DuckDB DuckDBPyRelation. The easiest way to create one is through do():

t = uck.do('data.parquet')
t = uck.do(pd.DataFrame({'x': [1, 2, 3]}))

You can also use the Table constructor directly:

t = uck.Table('data.parquet')

.do() chains operations and dispatches on argument type (strings, functions, lists, dicts, type conversions). Access the underlying DuckDB relation with t.rel.

Eager evaluation and `hide`/`show`

Calling repr() on a Table triggers query evaluation. In Jupyter, this happens when an object is the last expression in a cell. In IDEs like Positron, the variable explorer proactively inspects objects, which can trigger expensive computations.

Use hide() (or .do('hide')) to suppress evaluation:

big = uck.Table('huge_dataset.parquet').do('hide')
# Positron's variable explorer will see: <Table(..., _hide=True)>
# instead of evaluating the full query

Call show() (or .do('show')) when you're ready to see results.
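
The mechanism can be illustrated without DuckDB. In the sketch below, `LazyResult` is a hypothetical stand-in for a lazy table: its `__repr__` runs the deferred computation unless a hide flag is set, in which case it returns a cheap placeholder. The attribute names and the `evaluations` counter are assumptions made for the demonstration, not Duckboat's API.

```python
from typing import Any, Callable


class LazyResult:
    """Defers a computation until the object is displayed."""

    def __init__(self, compute: Callable[[], Any]) -> None:
        self._compute = compute
        self._hide = False
        self.evaluations = 0  # counts how often repr() ran the computation

    def hide(self) -> "LazyResult":
        self._hide = True
        return self

    def show(self) -> "LazyResult":
        self._hide = False
        return self

    def __repr__(self) -> str:
        if self._hide:
            # Cheap placeholder: nothing is computed while hidden.
            return "<LazyResult(..., _hide=True)>"
        self.evaluations += 1
        return repr(self._compute())


big = LazyResult(lambda: sum(range(10))).hide()
hidden_text = repr(big)          # placeholder only, no evaluation
shown_text = repr(big.show())    # evaluation happens here
print(hidden_text, shown_text, big.evaluations)
```

Anything that calls `repr()` on the hidden object, such as a variable explorer, gets the placeholder and the computation runs only once `show()` has been called.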

Philosophy

Duckboat bets that SQL is already the right language for tabular data manipulation -- you just need a way to compose SQL snippets into pipelines. This results in a mixture of Python and SQL that is semantically similar to Google's Pipe Syntax for SQL.

Strengths:

Zero new API to learn. If you know SQL, you know duckboat. There are no new method chains, expression builders, or DSLs to memorize.
Minimal surface area. The library is essentially Table and .do(). The codebase is small and stays out of your way.
Snippet composability. SQL fragments chain naturally through do(), letting you build complex queries incrementally and interactively.

Tradeoffs:

No IDE autocomplete on column names. Column references live inside SQL strings, so you don't get tab-completion or type checking. Typos surface at runtime, not in your editor.
Discoverability. The do() dispatch conventions (int, list, "pandas", "hide", etc.) are terse but must be learned -- they can't be discovered through autocomplete.

Where duckboat fits best:

Duckboat is ideal for interactive exploration and notebook workflows, especially for teams already fluent in SQL. If you need strong static analysis, IDE support, or production-grade type safety, a fluent API like Polars or Ibis may be a better fit. If some operation is easier in another library, duckboat makes it straightforward to translate between them via Pandas, Arrow, or Polars.

Feedback

I'd love to hear any feedback on the approach here, so feel free to reach out through Issues or Discussions.

Release history

0.21.0 (Mar 27, 2026)
0.20.0 (Mar 26, 2026)
0.19.0 (Mar 26, 2026), this version
0.18.1 (Mar 25, 2026)
0.17.0 (Mar 1, 2025)
0.16.0 (Feb 28, 2025)
0.15.0 (Dec 29, 2024)
0.14.0 (Dec 27, 2024)
0.13.0 (Dec 27, 2024)
0.12.0 (Dec 27, 2024)
0.11.0 (Dec 27, 2024)

Download files

Source Distribution

duckboat-0.19.0.tar.gz (8.7 kB)

Uploaded Mar 26, 2026 Source

Built Distribution

duckboat-0.19.0-py3-none-any.whl (10.6 kB)

Uploaded Mar 26, 2026 Python 3

File details

Details for the file duckboat-0.19.0.tar.gz.

File metadata

Download URL: duckboat-0.19.0.tar.gz
Upload date: Mar 26, 2026
Size: 8.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for duckboat-0.19.0.tar.gz
Algorithm	Hash digest
SHA256	`1db9a3a9f765dffc2717cb4053efeec527610bb1c7d965b7c565844ee7bbe76e`
MD5	`a088161bac5b88683f29f7503026f517`
BLAKE2b-256	`c08a8fe093c9872caa56fff29accf1eec0ce7f975ce7951a9f420cbd3f405be6`

Provenance

The following attestation bundles were made for duckboat-0.19.0.tar.gz:

Publisher: pypi_publish.yml on ajfriend/duckboat

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: duckboat-0.19.0.tar.gz
- Subject digest: 1db9a3a9f765dffc2717cb4053efeec527610bb1c7d965b7c565844ee7bbe76e
- Sigstore transparency entry: 1185081044
- Sigstore integration time: Mar 26, 2026
Source repository:
- Permalink: ajfriend/duckboat@fc24d21141702cfc4abf2a7eb26b819415d8afcd
- Branch / Tag: refs/tags/v0.19.0
- Owner: https://github.com/ajfriend
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi_publish.yml@fc24d21141702cfc4abf2a7eb26b819415d8afcd
- Trigger Event: release

File details

Details for the file duckboat-0.19.0-py3-none-any.whl.

File metadata

Download URL: duckboat-0.19.0-py3-none-any.whl
Upload date: Mar 26, 2026
Size: 10.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for duckboat-0.19.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9ee9c0d9daafcf7a985472a0f9e9aea038cb243403f05abc7841a98e10f1140a`
MD5	`960327f86dc676038e475f66ace19416`
BLAKE2b-256	`7c2d9b0d8c246cd11019fc4554db5cc38ee8e438c749309910be82901a2187b3`

Provenance

The following attestation bundles were made for duckboat-0.19.0-py3-none-any.whl:

Publisher: pypi_publish.yml on ajfriend/duckboat

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: duckboat-0.19.0-py3-none-any.whl
- Subject digest: 9ee9c0d9daafcf7a985472a0f9e9aea038cb243403f05abc7841a98e10f1140a
- Sigstore transparency entry: 1185081047
- Sigstore integration time: Mar 26, 2026
Source repository:
- Permalink: ajfriend/duckboat@fc24d21141702cfc4abf2a7eb26b819415d8afcd
- Branch / Tag: refs/tags/v0.19.0
- Owner: https://github.com/ajfriend
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi_publish.yml@fc24d21141702cfc4abf2a7eb26b819415d8afcd
- Trigger Event: release
