A SQL-based Python dataframe library for ergonomic interactive data analysis and exploration.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

ajfriend

These details have not been verified by PyPI

Project description

Duckboat

Ugly to some, but gets the job done.

GitHub | Docs | PyPI

Duckboat is a SQL-based Python dataframe library for ergonomic interactive data analysis and exploration.

pip install duckboat

Duckboat allows you to chain SQL snippets (meaning you can usually omit select * and from ...) to incrementally and lazily build up complex queries.

Duckboat is a light wrapper around the DuckDB relational API, so expressions are evaluated lazily and optimized by DuckDB prior to execution. The resulting queries are fast, avoiding the need to materialize intermediate tables or perform data transfers. You can leverage all the SQL syntax improvements provided by DuckDB: 1 2 3

Examples

import duckboat as uck

csv = 'https://raw.githubusercontent.com/allisonhorst/palmerpenguins/main/inst/extdata/penguins.csv'

uck.Table(csv).do(
    "where sex = 'female' ",
    'where year > 2008',
    'select *, cast(body_mass_g as double) as grams',
    'select species, island, avg(grams) as avg_grams group by 1,2',
    'select * replace (round(avg_grams, 1) as avg_grams)',
    'order by avg_grams',
)

┌───────────┬───────────┬───────────┐
│  species  │  island   │ avg_grams │
│  varchar  │  varchar  │  double   │
├───────────┼───────────┼───────────┤
│ Adelie    │ Torgersen │    3193.8 │
│ Adelie    │ Dream     │    3357.5 │
│ Adelie    │ Biscoe    │    3446.9 │
│ Chinstrap │ Dream     │    3522.9 │
│ Gentoo    │ Biscoe    │    4786.3 │
└───────────┴───────────┴───────────┘

To and from other data formats

We can translate to and from other data formats like Pandas DataFrames, Polars, or Arrow Tables.

import pandas as pd

df = pd.DataFrame({'a': [0]})
t = uck.Table(df)
t

┌───────┐
│   a   │
│ int64 │
├───────┤
│     0 │
└───────┘

Translate back to a pandas dataframe with any of the following:

t.df()
t.hold('pandas')
t.do('pandas')

Chaining expressions

You can chain calls to Table.do():

f = 'select a + 1 as a'
t.do(f).do(f).do(f)

┌───────┐
│   a   │
│ int64 │
├───────┤
│     3 │
└───────┘

Alternatively, Table.do() accepts a sequence of arguments:

t.do(f, f, f)

It also accepts lists of expressions, and will apply them recursively:

fs = [f, f, f]
t.do(fs)

Note, you could also still call this as:

t.do(*fs)

Use lists to group expressions, which Duckboat will apply recursively:

t.do(f, [f], [f, [[f, f], f]])

┌───────┐
│   a   │
│ int64 │
├───────┤
│     6 │
└───────┘

Duckboat will also apply functions:

def foo(x):
    return x.do('select a + 2 as a')

# the following are equivalent
foo(t)
t.do(foo)

Of course, you can mix functions, SQL strings, and lists:

t.do([foo, f])

Databases and joins

TODO

Extravagant affordances

TODO

Objects

(probably in the docstrings, rather than the readme)

Table

The core functionality comes from .sql, where we allow snippets. "Shell" functionality comes from the duckboat.do() method, allowing for things like...

Database

The core functionality comes from the .sql, which loads only the Tables listed. No other Python objects are loaded. A full duckdb sql expression is expected here, with table names provided explicitly.

"Shell" functionality comes from the duckboat.do() function/method, allowing for things like...

eager

The duckboat library makes some efforts to protect against unintentionally evaluating expressions eagerly, rather than letting them rest lazily. For example, calling the __repr__ method on a Table or Database will trigger an evaluation of that object, which could include pulling a large table, or showing a large intermediate result.

in ipython or jupyterlab, we can typically avoid this by not ending a cell with an object. doing so triggers the objects __repr__. However, these tactics don't work when using an IDE like Positron, which eagerly inspects objects in the namespace to provide insight into what you're working with. This is often useful, but not always what you want when working with large datasets or expensive computations.

Philosophy

This approach results in a mixture of Python and SQL that, I think, is semantically very similar to Google's Pipe Syntax for SQL: We can leverage our existing knowledge of SQL, while making a few small changes to make it more ergonomic and composable.

When doing interactive data analysis, I find this approach easier to read and write than fluent APIs (like in Polars or Ibis) or typical Pandas code. If some operation is easier in other libraries, Duckboat makes it straightforward translate between them, either directly or through Apache Arrow.

Feedback

I'd love to hear any feedback on the approach here, so feel free to reach out through Issues or Discussions.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

ajfriend

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.21.0

Mar 27, 2026

0.20.0

Mar 26, 2026

0.19.0

Mar 26, 2026

0.18.1

Mar 25, 2026

0.17.0

Mar 1, 2025

This version

0.16.0

Feb 28, 2025

0.15.0

Dec 29, 2024

0.14.0

Dec 27, 2024

0.13.0

Dec 27, 2024

0.12.0

Dec 27, 2024

0.11.0

Dec 27, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

duckboat-0.16.0.tar.gz (8.1 kB view details)

Uploaded Feb 28, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

duckboat-0.16.0-py3-none-any.whl (10.8 kB view details)

Uploaded Feb 28, 2025 Python 3

File details

Details for the file duckboat-0.16.0.tar.gz.

File metadata

Download URL: duckboat-0.16.0.tar.gz
Upload date: Feb 28, 2025
Size: 8.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for duckboat-0.16.0.tar.gz
Algorithm	Hash digest
SHA256	`d7cb7d2dc8877bd4eec3662b2b587ca1c183c01aacda36c8a799f4248c3007b9`
MD5	`0461aab83ac94bada841b3868230b553`
BLAKE2b-256	`e2ea6191acd56a328cde0cdca587220ad874a0ad1eda319c6604c412b83bfec9`

See more details on using hashes here.

Provenance

The following attestation bundles were made for duckboat-0.16.0.tar.gz:

Publisher: pypi_publish.yml on ajfriend/duckboat

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: duckboat-0.16.0.tar.gz
- Subject digest: d7cb7d2dc8877bd4eec3662b2b587ca1c183c01aacda36c8a799f4248c3007b9
- Sigstore transparency entry: 175564502
- Sigstore integration time: Feb 28, 2025
Source repository:
- Permalink: ajfriend/duckboat@777f07847d4f567babb86137fda8349e68cfae7d
- Branch / Tag: refs/tags/v0.16.0
- Owner: https://github.com/ajfriend
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi_publish.yml@777f07847d4f567babb86137fda8349e68cfae7d
- Trigger Event: release

File details

Details for the file duckboat-0.16.0-py3-none-any.whl.

File metadata

Download URL: duckboat-0.16.0-py3-none-any.whl
Upload date: Feb 28, 2025
Size: 10.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for duckboat-0.16.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7673b5156d04d109425c9624d1c29efa88096bbaf52fadb152e7a917644bbbe6`
MD5	`89d34b3f9ecd15af151f4b90143f8a9a`
BLAKE2b-256	`c1f94cb62eb4fba664fc1a2ed5b2636d162ab0575936c83e52976ab0b05c7e18`

See more details on using hashes here.

Provenance

The following attestation bundles were made for duckboat-0.16.0-py3-none-any.whl:

Publisher: pypi_publish.yml on ajfriend/duckboat

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: duckboat-0.16.0-py3-none-any.whl
- Subject digest: 7673b5156d04d109425c9624d1c29efa88096bbaf52fadb152e7a917644bbbe6
- Sigstore transparency entry: 175564503
- Sigstore integration time: Feb 28, 2025
Source repository:
- Permalink: ajfriend/duckboat@777f07847d4f567babb86137fda8349e68cfae7d
- Branch / Tag: refs/tags/v0.16.0
- Owner: https://github.com/ajfriend
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi_publish.yml@777f07847d4f567babb86137fda8349e68cfae7d
- Trigger Event: release

duckboat 0.16.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Project description

Duckboat

Examples

To and from other data formats

Chaining expressions

Databases and joins

Extravagant affordances

Objects

Table

Database

eager

Philosophy

Feedback

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance