Duckboat
Unsightly to some, but gets the job done.
Duckboat is a SQL-based Python dataframe library for ergonomic interactive data analysis and exploration.
```shell
pip install git+https://github.com/ajfriend/duckboat
```
Duckboat allows you to chain SQL snippets (often omitting select * and from ...)
to incrementally and lazily build up complex queries.
Duckboat is a light wrapper around the DuckDB relational API, which remains easily accessible if you'd like to use DuckDB more directly. Expressions are evaluated lazily and optimized by DuckDB, so queries run fast and avoid materializing intermediate tables or transferring data unnecessarily.
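The snippet-chaining idea can be illustrated without DuckDB. The sketch below is a toy, not Duckboat's implementation: it uses Python's built-in sqlite3 and hypothetical helpers chain() and splice(), which wrap the query built so far as a subquery and splice each snippet onto it. It assumes two simplified rules: a snippet that doesn't start with select implies select *, and FROM belongs just before the first where/group by/having/order by/limit keyword. Nothing runs until the final string is executed, so SQLite receives one nested query instead of a series of intermediate tables. DuckDB-specific syntax such as select * replace (...) will not work here.

```python
import re
import sqlite3

# Clauses that can follow a select list; FROM has to be spliced in before them.
# Naive rule (an assumption): keywords inside parentheses, e.g. "over (order by ...)",
# would be matched too and break the spliced query.
_CLAUSE = re.compile(r"\b(where|group\s+by|having|order\s+by|limit)\b", re.IGNORECASE)


def splice(prev_sql: str, snippet: str) -> str:
    """Apply one SQL snippet to the query built so far (hypothetical helper)."""
    src = f"({prev_sql}) AS t"  # the previous step becomes a subquery
    s = snippet.strip()
    if not s.lower().startswith("select"):
        # "where ...", "order by ...", etc. imply "select *"
        return f"select * from {src} {s}"
    m = _CLAUSE.search(s)
    if m is None:
        # plain select list: FROM goes at the end
        return f"{s} from {src}"
    # insert FROM just before the first trailing clause
    return f"{s[:m.start()]}from {src} {s[m.start():]}"


def chain(table: str, *snippets: str) -> str:
    """Build (but do not run) one nested query from a chain of snippets."""
    sql = f"select * from {table}"
    for snip in snippets:
        sql = splice(sql, snip)
    return sql


# Small made-up table, just to exercise the sketch.
con = sqlite3.connect(":memory:")
con.execute("create table penguins (species text, sex text, year int, body_mass_g real)")
con.executemany(
    "insert into penguins values (?, ?, ?, ?)",
    [
        ("Adelie", "female", 2009, 3000.0),
        ("Adelie", "female", 2009, 3400.0),
        ("Gentoo", "female", 2009, 4800.0),
        ("Gentoo", "male", 2009, 5500.0),
        ("Adelie", "female", 2007, 9999.0),
    ],
)

sql = chain(
    "penguins",
    "where sex = 'female'",
    "where year > 2008",
    "select species, avg(body_mass_g) as avg_grams group by species",
    "order by avg_grams",
)
rows = con.execute(sql).fetchall()  # the query only runs here
print(rows)  # [('Adelie', 3200.0), ('Gentoo', 4800.0)]
```

Nesting subqueries keeps every snippet independent of the others; it is then up to the database's optimizer to flatten the nesting, which is the job DuckDB does for Duckboat.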
```python
import duckboat as uck

csv = 'https://raw.githubusercontent.com/allisonhorst/palmerpenguins/main/inst/extdata/penguins.csv'

uck.Table(csv).do(
    "where sex = 'female' ",
    'where year > 2008',
    'select *, cast(body_mass_g as double) as grams',
    'select species, island, avg(grams) as avg_grams group by 1,2',
    'select * replace (round(avg_grams, 1) as avg_grams)',
    'order by avg_grams',
)
```
```
┌───────────┬───────────┬───────────┐
│  species  │  island   │ avg_grams │
│  varchar  │  varchar  │  double   │
├───────────┼───────────┼───────────┤
│ Adelie    │ Torgersen │    3193.8 │
│ Adelie    │ Dream     │    3357.5 │
│ Adelie    │ Biscoe    │    3446.9 │
│ Chinstrap │ Dream     │    3522.9 │
│ Gentoo    │ Biscoe    │    4786.3 │
└───────────┴───────────┴───────────┘
```
Philosophy
This approach yields a mix of Python and SQL that is, I think, semantically very similar to Google's Pipe Syntax for SQL: we can leverage our existing knowledge of SQL while adjusting it slightly to be more ergonomic and composable.
When doing interactive data analysis, I find this approach easier to read and write than fluent APIs (like those in Polars or Ibis) or typical Pandas code. If some operation is easier in another library, Duckboat makes it straightforward to translate between them, either directly or through Apache Arrow.
Feedback
I'd love to hear any feedback on the approach here, so feel free to reach out through Issues or Discussions.
Download files
Source Distribution
Built Distribution
File details
Details for the file duckboat-0.12.0.tar.gz.
File metadata
- Download URL: duckboat-0.12.0.tar.gz
- Upload date:
- Size: 5.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.0.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 626302d3e65cfbe18ca976aaa215c647ef0ddbb459e4eb2a5f3f6e04e373e46a |
| MD5 | 18a7f6f4aa5be05bb22de733b1f97cdc |
| BLAKE2b-256 | bb622bba8f2207679d5e223ccd27776d066acd9a8cd708e2a2d64ded174fd799 |
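These digests let you check a downloaded file before installing it. Below is a minimal sketch using only the standard library's hashlib; sha256_of and matches are hypothetical helper names, and EXPECTED is the SHA256 digest listed above for duckboat-0.12.0.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for duckboat-0.12.0.tar.gz (copied from the table above).
EXPECTED = "626302d3e65cfbe18ca976aaa215c647ef0ddbb459e4eb2a5f3f6e04e373e46a"


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks (hypothetical helper)."""
    h = hashlib.sha256()
    with open(Path(path), "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches(path, expected: str = EXPECTED) -> bool:
    """True if the file's SHA256 equals the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

After downloading duckboat-0.12.0.tar.gz, matches("duckboat-0.12.0.tar.gz") should return True if the file is intact. The same check works for the wheel by passing its SHA256 digest as expected.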
Provenance
The following attestation bundles were made for duckboat-0.12.0.tar.gz:
- Publisher: pypi_publish.yml on ajfriend/duckboat
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: duckboat-0.12.0.tar.gz
- Subject digest: 626302d3e65cfbe18ca976aaa215c647ef0ddbb459e4eb2a5f3f6e04e373e46a
- Sigstore transparency entry: 157824483
- Sigstore integration time:
- Permalink: ajfriend/duckboat@592701def3e14b92a5624bb3f61e6b19a9bc6b62
- Branch / Tag: refs/tags/v0.12.0
- Owner: https://github.com/ajfriend
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi_publish.yml@592701def3e14b92a5624bb3f61e6b19a9bc6b62
- Trigger Event: release
File details
Details for the file duckboat-0.12.0-py3-none-any.whl.
File metadata
- Download URL: duckboat-0.12.0-py3-none-any.whl
- Upload date:
- Size: 7.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.0.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a6e3b396dafd4b9115d716f132762cfd303abd2f035c81bca150bcd69e727e42 |
| MD5 | 1c20222f519efe2d634a33df0fee4df7 |
| BLAKE2b-256 | 728c2c3cb96e350efdccc0b0b2e3e2b25285e045c6d8e562da7c4d8ac8466492 |
Provenance
The following attestation bundles were made for duckboat-0.12.0-py3-none-any.whl:
- Publisher: pypi_publish.yml on ajfriend/duckboat
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: duckboat-0.12.0-py3-none-any.whl
- Subject digest: a6e3b396dafd4b9115d716f132762cfd303abd2f035c81bca150bcd69e727e42
- Sigstore transparency entry: 157824484
- Sigstore integration time:
- Permalink: ajfriend/duckboat@592701def3e14b92a5624bb3f61e6b19a9bc6b62
- Branch / Tag: refs/tags/v0.12.0
- Owner: https://github.com/ajfriend
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi_publish.yml@592701def3e14b92a5624bb3f61e6b19a9bc6b62
- Trigger Event: release