dplyr for Python: tidy piped verbs over polars and duckdb, with real autocompletion and dplyr-verified semantics.

These details have not been verified by PyPI

Project links

Repository

Project description

dpyr

dplyr for Python. The tidyverse's verbs — filter, mutate, group_by, summarize, joins, across, tidyselect — as Python method chains, executing on polars or duckdb, with real IDE autocompletion and semantics verified against dplyr itself.

pip install dpyr        # or: uv add dpyr

from dpyr import read, col, n, desc

starwars = read("starwars.parquet")   # read() takes anything tabular:
                                      # .parquet/.csv/.arrow/.db paths, dicts,
                                      # polars/pandas frames, arrow tables,
                                      # Hugging Face datasets, numpy/torch/jax

(
    starwars
    .filter(col.height > 180, col.mass < 100)
    .mutate(bmi = col.mass / (col.height / 100) ** 2)
    .group_by(col.species)
    .summarize(
        n = n(),
        mean_bmi = col.bmi.mean(),
    )
    .arrange(desc(col.mean_bmi))
)

Evaluate that in a notebook and you see rows immediately. Mistype a column name and you get the error on that line, with a did-you-mean suggestion. Wrap the same code in a pipeline, call .collect() only at the end, and the whole chain runs as one fused query with predicate pushdown. That combination (schema-eager, data-lazy, display-eager) is the core design.
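
To make "schema-eager, data-lazy" concrete, here is a minimal standard-library sketch of the idea. It is not dpyr's implementation: the `LazyFrame` class, its `filter(column, predicate)` signature and the `UnknownColumn` error are hypothetical names invented for this illustration. Column names are checked the moment a verb is called, while rows are only touched in `collect()`.

```python
import difflib


class UnknownColumn(LookupError):
    """Hypothetical error type for this sketch."""


class LazyFrame:
    """Toy frame: validates column names per verb call, runs rows only on collect()."""

    def __init__(self, rows, schema=None, steps=()):
        self._rows = rows
        # Assumes at least one row when no schema is passed in.
        self.schema = schema if schema is not None else list(rows[0].keys())
        self._steps = steps  # the recorded plan; nothing has executed yet

    def _check(self, name):
        if name not in self.schema:
            hint = difflib.get_close_matches(name, self.schema, n=1)
            suffix = f" Did you mean {hint[0]!r}?" if hint else ""
            raise UnknownColumn(f"Unknown column {name!r}.{suffix}")

    def filter(self, column, predicate):
        self._check(column)  # schema-eager: a bad name fails on this line
        step = ("filter", column, predicate)
        return LazyFrame(self._rows, self.schema, self._steps + (step,))

    def collect(self):
        rows = self._rows
        for _, column, predicate in self._steps:  # data-lazy: rows are read only here
            rows = [r for r in rows if predicate(r[column])]
        return rows


people = LazyFrame([
    {"name": "Luke", "height": 172},
    {"name": "Vader", "height": 202},
])

tall = people.filter("height", lambda h: h > 180)  # plan recorded, no rows touched
print(tall.collect())

try:
    people.filter("hieght", lambda h: h > 180)  # misspelled on purpose
except UnknownColumn as err:
    print(err)
```

The misspelled call fails before any data is read, which is what lets an error point at the offending line even when execution is deferred.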

Two backends, one semantics

import duckdb
from dpyr import read

df  = read({"x": [1, 2, 3], "g": ["a", "a", "b"]})   # polars engine
con = duckdb.connect("warehouse.db")
tbl = read(con, "events")                            # SQL pushdown

Identical chains produce identical results on both engines. Two test layers enforce this: a Hypothesis fuzzer runs random verb chains on both engines and compares the results bit-for-bit, and differential tests check against real dplyr. For the latter, every spec in tests/specs/ is executed by dplyr (via oracle/run_specs.R) to produce a committed golden parquet, then replayed through dpyr on both backends. Where R and the engines genuinely disagree, the decision is documented in docs/SEMANTICS.md, not left to chance.
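
The backend-agreement idea can be sketched with the standard library alone. The example below is an illustration under stated assumptions, not the project's test suite: it uses `random` in place of Hypothesis, and a pure-Python list engine plus `sqlite3` in place of polars and duckdb. The helper names (`random_chain`, `run_python`, `run_sql`) are hypothetical.

```python
import random
import sqlite3

ROWS = [(1, "a"), (2, "a"), (3, "b"), (4, "b"), (5, "c")]
OPS = {">": lambda a, b: a > b, "<": lambda a, b: a < b, "=": lambda a, b: a == b}


def random_chain(rng, length=3):
    """A chain of filter steps, each an (operator, integer constant) pair on column x."""
    return [(rng.choice(list(OPS)), rng.randint(0, 6)) for _ in range(length)]


def run_python(chain):
    """Engine 1: apply each filter to an in-memory list of tuples."""
    rows = ROWS
    for op, value in chain:
        rows = [r for r in rows if OPS[op](r[0], value)]
    return sorted(rows)


def run_sql(con, chain):
    """Engine 2: compile the same chain to one SQL query."""
    where = " AND ".join(f"x {op} ?" for op, _ in chain) or "1"
    params = [value for _, value in chain]
    query = f"SELECT x, g FROM t WHERE {where} ORDER BY x, g"
    return con.execute(query, params).fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER, g TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)", ROWS)

rng = random.Random(0)  # fixed seed so a failure is reproducible
for _ in range(200):
    chain = random_chain(rng)
    assert run_python(chain) == run_sql(con, chain), chain
print("200 random chains agree on both engines")
```

A property-based tool such as Hypothesis adds shrinking on top of this loop, so a failing chain is reduced to a minimal counterexample.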

The dplyr you know

dplyr	dpyr
`filter(df, height > 180)`	`df.filter(col.height > 180)`
`mutate(df, bmi = mass / h^2)`	`df.mutate(bmi = col.mass / col.h ** 2)`
`summarise(df, n = n(), m = mean(x, na.rm = TRUE))`	`df.summarize(n = n(), m = col.x.mean())`
`arrange(df, desc(mass))`	`df.arrange(desc(col.mass))`
`select(df, name, starts_with("h"))`	`df.select(col.name, starts_with("h"))`
`select(df, -mass)`	`df.select(-col.mass)`
`across(where(is.numeric), mean)`	`across(where(is_numeric), "mean")`
`left_join(a, b, by = "k")`	`a.left_join(b, on = col.k)`
`pivot_longer(df, x:y)`	`df.pivot_longer([col.x, col.y])`
`if_else()`, `case_when()`, `n_distinct()`	`if_else()`, `case_when()`, `.n_unique()`
`lag()`, `lead()`, `row_number()`, `min_rank()`	`lag()`, `lead()`, `row_number()`, `min_rank()`
`cumsum()`, `dense_rank()`, `percent_rank()`	`cum_sum()`, `dense_rank()`, `percent_rank()`
`slice_min(x, n)`, `slice_max(x, n)` (ties kept)	`slice_min(col.x, n)`, `slice_max(col.x, n)`
`separate()`, `unite()`, `relocate()`	`separate()`, `unite()`, `relocate()`
`coalesce()`, `replace_na()`	`coalesce()`, `replace_na()`

Grouped mutate and filter are windowed per group, summarize peels off the last grouping level, and joins use .x/.y suffixes and match NAs by default. These are the dplyr behaviors, reproduced deliberately.
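
The NA-matching point is the one most likely to surprise SQL users, so here is a small illustration using `sqlite3` as a stand-in engine (an assumption for this sketch; it does not show how dpyr compiles joins). With plain SQL equality, NULL keys never pair up; a null-safe comparison (`IS` in SQLite, `IS NOT DISTINCT FROM` in engines such as DuckDB) gives the dplyr-style result where missing keys match each other.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a (k TEXT, left_val INTEGER);
    CREATE TABLE b (k TEXT, right_val INTEGER);
    INSERT INTO a VALUES ('x', 1), (NULL, 2);
    INSERT INTO b VALUES ('x', 10), (NULL, 20);
""")

# Plain SQL equality: NULL = NULL is not true, so the NULL keys do not join.
sql_default = con.execute(
    "SELECT a.k, left_val, right_val FROM a JOIN b ON a.k = b.k ORDER BY left_val"
).fetchall()

# Null-safe comparison: NULL keys match each other, like NA keys in a default dplyr join.
na_matching = con.execute(
    "SELECT a.k, left_val, right_val FROM a JOIN b ON a.k IS b.k ORDER BY left_val"
).fetchall()

print(sql_default)  # [('x', 1, 10)]
print(na_matching)  # [('x', 1, 10), (None, 2, 20)]
```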

Autocompletion that actually works

df.c.height — frame-bound proxy: column names complete from the live schema, and the returned expression is typed (.mean() on numerics, .str_detect() on strings; calling .mean() on a string column raises immediately, at build time).
df.filter(lambda c: c.height > 180) — lambda style for the same effect.
dpyr stubgen data/*.parquet -o schemas.py — generates typed schema modules so completion and type-checking work statically in any IDE.
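
A frame-bound proxy of this kind can be built from `__dir__` and `__getattr__`. The sketch below is a guess at the general technique, not dpyr's code: `ColumnProxy`, `NumericExpr` and `StringExpr` are hypothetical classes, and the strings they return merely stand in for real expression objects.

```python
class Expr:
    def __init__(self, name, dtype):
        self.name, self.dtype = name, dtype


class NumericExpr(Expr):
    def mean(self):
        return f"mean({self.name})"


class StringExpr(Expr):
    def str_detect(self, pattern):
        return f"regexp_matches({self.name}, {pattern!r})"


class ColumnProxy:
    """Resolves attribute access against a live schema (column name -> dtype)."""

    def __init__(self, schema):
        self._schema = dict(schema)

    def __dir__(self):
        # REPLs and notebooks call dir() for completion, so column names show up.
        return sorted(self._schema)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            dtype = self._schema[name]
        except KeyError:
            raise AttributeError(f"no column named {name!r}") from None
        if dtype in ("int", "float"):
            return NumericExpr(name, dtype)
        return StringExpr(name, dtype)


c = ColumnProxy({"height": "float", "name": "str"})
print(dir(c))           # ['height', 'name']
print(c.height.mean())  # mean(height)

try:
    c.name.mean()       # string column: fails while building the expression
except AttributeError as err:
    print(err)
```

Because each dtype maps to its own expression class, an invalid method fails when the expression is built, before any query runs.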

The database is a destination, not just a source

db = read("warehouse.db")                 # catalog object: db.tables, db.orders
gold = db.orders.group_by(col.region).summarize(rev = col.amount.sum())
gold.to_table("gold_revenue")             # CREATE TABLE AS <sql>, fully in-engine
gold.to_view("gold_live")                 # the lazy plan as a named view
gold.write("gold.parquet")                # in-engine COPY (extension dispatch)
mem = read({"region": ["east"], "target": [1000.0]})
gold.inner_join(mem, on = col.region)     # in-memory frames bridge into duckdb
                                          # automatically (arrow, zero-copy)
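
The difference between materializing a plan as a table and storing it as a view can be seen with plain SQL. This sketch uses `sqlite3` as a stand-in for duckdb and a hand-written query string as the "plan"; both are assumptions for illustration and say nothing about the SQL dpyr actually generates.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (region TEXT, amount REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 70.0)],
)

# The "lazy plan" here is just a SQL string; nothing has run yet.
plan_sql = "SELECT region, SUM(amount) AS rev FROM orders GROUP BY region"

# Materialize inside the engine: result rows never pass through Python.
con.execute(f"CREATE TABLE gold_revenue AS {plan_sql}")
# Store the plan itself: the view re-runs the query each time it is read.
con.execute(f"CREATE VIEW gold_live AS {plan_sql}")

con.execute("INSERT INTO orders VALUES ('west', 30.0)")

snapshot = con.execute("SELECT * FROM gold_revenue ORDER BY region").fetchall()
live = con.execute("SELECT * FROM gold_live ORDER BY region").fetchall()
print(snapshot)  # [('east', 150.0), ('west', 70.0)]   taken before the insert
print(live)      # [('east', 150.0), ('west', 100.0)]  reflects the new row
```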

Interactive by default, lazy when you need it

df.persist()           # checkpoint: materialize now (duckdb: temp table)
df.lazy()              # this frame never executes implicitly
dpyr.options.interactive = False   # global opt-out for production pipelines

Results are cached by plan hash, so re-displaying a frame in a notebook never recomputes it.
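
Caching by plan hash can be pictured as follows. This is a simplified sketch, not the library's cache: the plan format (a list of `[verb, column, value]` steps), the `filter_gt` verb and the helper names are hypothetical, and the key deliberately ignores the source data, which a real cache would also have to account for.

```python
import hashlib
import json

_cache = {}
executions = 0  # counts how often the plan is actually run


def plan_hash(plan):
    """Stable digest of a plan given as JSON-serializable nested lists."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def execute(plan, rows):
    global executions
    executions += 1
    out = rows
    for verb, column, value in plan:
        if verb == "filter_gt":
            out = [r for r in out if r[column] > value]
    return out


def display(plan, rows):
    """Return the cached result for this plan, executing only on a cache miss."""
    key = plan_hash(plan)
    if key not in _cache:
        _cache[key] = execute(plan, rows)
    return _cache[key]


rows = [{"x": 1}, {"x": 5}]
plan = [["filter_gt", "x", 2]]

display(plan, rows)
display(plan, rows)                     # same plan hash: served from the cache
print(executions)                       # 1
display([["filter_gt", "x", 0]], rows)  # different plan: runs again
print(executions)                       # 2
```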

Documentation

Full guides at maximerivest.github.io/dpyr — get started, grouped data, joins, window functions, column-wise operations, reshaping, expressions & autocompletion, and the backends guide (connecting and operating polars and duckdb).

Project documents

Doc	What it pins down
docs/DESIGN.md	API design, the materialization model, autocompletion strategy, architecture
docs/SEMANTICS.md	Every deliberate decision where R, polars and duckdb disagree
docs/TESTING.md	dplyr-as-oracle goldens, backend-agreement fuzzing, Hypothesis properties
docs/ROADMAP.md	What shipped in 1.0 and what's next

License

MIT © Maxime Rivest

Project details

Release history

Version	Released
1.8.1	Jun 10, 2026
1.8.0	Jun 10, 2026
1.7.1	Jun 10, 2026
1.7.0 (this version)	Jun 10, 2026
1.6.0	Jun 10, 2026
1.5.0	Jun 10, 2026
1.4.0	Jun 10, 2026
1.3.0	Jun 10, 2026
1.2.0	Jun 10, 2026
1.1.0	Jun 10, 2026
1.0.0	Jun 10, 2026
0.0.1	Jun 10, 2026

Download files

Source Distribution

dpyr-1.7.0.tar.gz (134.1 kB)

Uploaded Jun 10, 2026 Source

Built Distribution

dpyr-1.7.0-py3-none-any.whl (54.5 kB)

Uploaded Jun 10, 2026 Python 3

File details

Details for the file dpyr-1.7.0.tar.gz.

File metadata

Download URL: dpyr-1.7.0.tar.gz
Upload date: Jun 10, 2026
Size: 134.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.26 {"installer":{"name":"uv","version":"0.9.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"22.04","id":"jammy","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for dpyr-1.7.0.tar.gz
Algorithm	Hash digest
SHA256	`06bde744eb586d8e5bcb4733c1dc007ee3b51a66443275e6a27ae90ec9d3f603`
MD5	`31b85703cb9cb95a5e0532c65c9d0122`
BLAKE2b-256	`4c48994dcfc55bfe66ad5aefe0bd05f98c4e11fe2ea051c5730d3131a8cae827`

File details

Details for the file dpyr-1.7.0-py3-none-any.whl.

File metadata

Download URL: dpyr-1.7.0-py3-none-any.whl
Upload date: Jun 10, 2026
Size: 54.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.26 {"installer":{"name":"uv","version":"0.9.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"22.04","id":"jammy","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for dpyr-1.7.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`10f2c75098384e3ba061d48d0a818019eb2860e7bbd75c3493092b4f65980b54`
MD5	`ec4e508e5971ab5656f7c59c0900c441`
BLAKE2b-256	`9843f98838304291691e81de6f1d38f52e60e5b3d2de4fbacc6589bfbc8185b1`
