A SQL-based Python dataframe library for ergonomic interactive data analysis and exploration.
Project description
Duckboat
Ugly to some, but gets the job done.
Duckboat is a SQL-based Python dataframe library for ergonomic interactive data analysis and exploration.
pip install duckboat
Duckboat allows you to chain SQL snippets (meaning you can usually omit select * and from ...)
to incrementally and lazily build up complex queries.
Duckboat is a light wrapper around the DuckDB relational API, so expressions are evaluated lazily and optimized by DuckDB prior to execution. The resulting queries are fast, avoiding the need to materialize intermediate tables or perform data transfers. You can leverage all the SQL syntax improvements provided by DuckDB: 1 2 3
Examples
import duckboat as uck
uck.do(
'https://raw.githubusercontent.com/allisonhorst/palmerpenguins/main/inst/extdata/penguins.csv',
"where sex = 'female' ",
'where year > 2008',
'select *, cast(body_mass_g as double) as grams',
'select species, island, avg(grams) as avg_grams group by 1,2',
'select * replace (round(avg_grams, 1) as avg_grams)',
'order by avg_grams',
)
┌───────────┬───────────┬───────────┐
│ species │ island │ avg_grams │
│ varchar │ varchar │ double │
├───────────┼───────────┼───────────┤
│ Adelie │ Torgersen │ 3193.8 │
│ Adelie │ Dream │ 3357.5 │
│ Adelie │ Biscoe │ 3446.9 │
│ Chinstrap │ Dream │ 3522.9 │
│ Gentoo │ Biscoe │ 4786.3 │
└───────────┴───────────┴───────────┘
To and from other data formats
We can translate to and from other data formats like Pandas DataFrames, Polars, or Arrow Tables.
import pandas as pd
df = pd.DataFrame({'a': [0]})
t = uck.Table(df)
t
┌───────┐
│ a │
│ int64 │
├───────┤
│ 0 │
└───────┘
Translate back to a pandas dataframe with any of the following:
t.df()
t.hold('pandas')
t.do('pandas')
Chaining expressions
You can chain calls to Table.do():
f = 'select a + 1 as a'
t.do(f).do(f).do(f)
┌───────┐
│ a │
│ int64 │
├───────┤
│ 3 │
└───────┘
Alternatively, Table.do() accepts a sequence of arguments:
t.do(f, f, f)
It also accepts lists of expressions, and will apply them recursively:
fs = [f, f, f]
t.do(fs)
Note, you could also still call this as:
t.do(*fs)
Use lists to group expressions, which Duckboat will apply recursively:
t.do(f, [f], [f, [[f, f], f]])
┌───────┐
│ a │
│ int64 │
├───────┤
│ 6 │
└───────┘
Duckboat will also apply functions:
def foo(x):
return x.do('select a + 2 as a')
# the following are equivalent
foo(t)
t.do(foo)
Of course, you can mix functions, SQL strings, and lists:
uck.do(df, foo, [f, foo])
┌───────┐
│ a │
│ int64 │
├───────┤
│ 5 │
└───────┘
Databases and joins
Use Database to work with multiple named tables and write full SQL (including joins):
import pandas as pd
orders = pd.DataFrame({'id': [1, 2, 3], 'customer_id': [10, 20, 10], 'amount': [5.0, 12.0, 8.0]})
customers = pd.DataFrame({'id': [10, 20], 'name': ['Alice', 'Bob']})
db = uck.Database(orders=orders, customers=customers)
Unlike Table.sql(), Database.sql() requires explicit table names:
db.sql('''
select c.name, sum(o.amount) as total
from orders o
join customers c on o.customer_id = c.id
group by 1
''')
┌─────────┬────────┐
│ name │ total │
│ varchar │ double │
├─────────┼────────┤
│ Alice │ 13.0 │
│ Bob │ 12.0 │
└─────────┴────────┘
Database.do() works the same way, chaining full SQL expressions:
db.do(
'select c.name, o.amount from orders o join customers c on o.customer_id = c.id',
'order by amount desc',
)
You can also create a Database inline via do() by passing a dict:
uck.do(
{'orders': orders, 'customers': customers},
'select c.name, sum(o.amount) as total from orders o join customers c on o.customer_id = c.id group by 1',
)
Or alias a single table into a database for joins with itself:
t.do('alias my_table')
Extravagant affordances
do() dispatches on the type of its arguments, providing some handy shortcuts.
Extract scalars, lists, and dicts:
t.do('select count(*)', int) # returns a Python int
t.do('select distinct a', list) # returns a Python list
t.do('limit 1', dict) # returns a Python dict
Convert to other formats:
t.do('pandas') # returns a Pandas DataFrame
t.do('arrow') # returns a PyArrow Table
Apply functions:
def add_col(t):
return t.do("select *, a + 10 as b")
t.do(add_col, 'select b', int) # 10
Run .sql files:
t.do('queries/transform.sql') # loads and runs the file contents as SQL
Control display with hide and show:
big = uck.Table('huge_dataset.parquet').do('hide') # suppresses repr output
big.do('show') # re-enables repr
Objects
Table
Table wraps a DuckDB DuckDBPyRelation. Create one from a file path, URL, Pandas DataFrame, Arrow Table, or dict:
t = uck.Table('data.parquet')
t = uck.Table(pd.DataFrame({'x': [1, 2, 3]}))
The core method is .sql(), which auto-prepends from <table> so you can write SQL snippets:
t.sql('select x + 1 as x') # becomes: from <table> select x + 1 as x
.do() chains operations and dispatches on argument type (strings, functions, lists, type conversions). Access the underlying DuckDB relation with t.rel.
Database
Database is a named dictionary of tables:
db = uck.Database(orders=orders_df, customers=customers_df)
Unlike Table, Database.sql() requires explicit table names since there's no single table to auto-prepend. Only the tables in the Database are visible to the query. Access individual tables with db['table_name'].
Eager evaluation and hide/show
Calling repr() on a Table or Database triggers query evaluation. In Jupyter, this happens when an object is the last expression in a cell. In IDEs like Positron, the variable explorer proactively inspects objects, which can trigger expensive computations.
Use hide() to suppress evaluation:
big = uck.Table('huge_dataset.parquet').do('hide')
# Positron's variable explorer will see: <Table(..., _hide=True)>
# instead of evaluating the full query
Call show() (or .do('show')) when you're ready to see results.
Philosophy
Duckboat bets that SQL is already the right language for tabular data manipulation -- you just need a way to compose SQL snippets into pipelines. This results in a mixture of Python and SQL that is semantically similar to Google's Pipe Syntax for SQL.
Strengths:
- Zero new API to learn. If you know SQL, you know duckboat. There are no new method chains, expression builders, or DSLs to memorize.
- Minimal surface area. The library is essentially
Table,Database, and.do(). The codebase is small and stays out of your way. - Snippet composability. SQL fragments chain naturally through
do(), letting you build complex queries incrementally and interactively.
Tradeoffs:
- No IDE autocomplete on column names. Column references live inside SQL strings, so you don't get tab-completion or type checking. Typos surface at runtime, not in your editor.
- Discoverability. The
do()dispatch conventions (int,list,"pandas","hide", etc.) are terse but must be learned -- they can't be discovered through autocomplete.
Where duckboat fits best:
Duckboat is ideal for interactive exploration and notebook workflows, especially for teams already fluent in SQL. If you need strong static analysis, IDE support, or production-grade type safety, a fluent API like Polars or Ibis may be a better fit. If some operation is easier in another library, duckboat makes it straightforward to translate between them via Pandas, Arrow, or Polars.
Feedback
I'd love to hear any feedback on the approach here, so feel free to reach out through Issues or Discussions.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file duckboat-0.18.1.tar.gz.
File metadata
- Download URL: duckboat-0.18.1.tar.gz
- Upload date:
- Size: 9.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
36c8ac8b05b74ea4d01e2ea2c3bea200f2f689f6412cf69e09678cab01d3152a
|
|
| MD5 |
02c2917b12d69b55fd1a4ee6885f45d6
|
|
| BLAKE2b-256 |
f90249a0260d21735a259eb12ac2e6e582bee43f4d8d3a13de7c2598f1dc386e
|
Provenance
The following attestation bundles were made for duckboat-0.18.1.tar.gz:
Publisher:
pypi_publish.yml on ajfriend/duckboat
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
duckboat-0.18.1.tar.gz -
Subject digest:
36c8ac8b05b74ea4d01e2ea2c3bea200f2f689f6412cf69e09678cab01d3152a - Sigstore transparency entry: 1182177170
- Sigstore integration time:
-
Permalink:
ajfriend/duckboat@f0332577c9b095d8120cc17972685d0b1267b598 -
Branch / Tag:
refs/tags/v0.18.1 - Owner: https://github.com/ajfriend
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
pypi_publish.yml@f0332577c9b095d8120cc17972685d0b1267b598 -
Trigger Event:
release
-
Statement type:
File details
Details for the file duckboat-0.18.1-py3-none-any.whl.
File metadata
- Download URL: duckboat-0.18.1-py3-none-any.whl
- Upload date:
- Size: 11.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c17180052287581963e1a6f6d13520f6c457971e5c5a6d5a33dcafd6d4f2333b
|
|
| MD5 |
8452fd4209b4c35fbe53a16777cc288f
|
|
| BLAKE2b-256 |
60878e1764229dc3911570a0575a1948866a3f342daea23162ebeb20067aaabc
|
Provenance
The following attestation bundles were made for duckboat-0.18.1-py3-none-any.whl:
Publisher:
pypi_publish.yml on ajfriend/duckboat
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
duckboat-0.18.1-py3-none-any.whl -
Subject digest:
c17180052287581963e1a6f6d13520f6c457971e5c5a6d5a33dcafd6d4f2333b - Sigstore transparency entry: 1182177218
- Sigstore integration time:
-
Permalink:
ajfriend/duckboat@f0332577c9b095d8120cc17972685d0b1267b598 -
Branch / Tag:
refs/tags/v0.18.1 - Owner: https://github.com/ajfriend
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
pypi_publish.yml@f0332577c9b095d8120cc17972685d0b1267b598 -
Trigger Event:
release
-
Statement type: