pipda
A framework for data piping in Python.
Inspired by siuba, dfply, plydata, and dplython. Provides simple yet powerful APIs to mimic dplyr and tidyr in Python.
API | Changelog | Documentation
Installation
pip install -U pipda
Usage
Verbs
- A verb is pipeable (it can be called as data >> verb(...))
- A verb is dispatched on the type of its first argument
- A verb evaluates its other arguments using the first one
- A verb passes down its context if none is specified in the arguments
import pandas as pd
from pipda import (
    register_verb,
    register_func,
    register_operator,
    evaluate_expr,
    Operator,
    Symbolic,
    Context
)
f = Symbolic()
df = pd.DataFrame({
    'x': [0, 1, 2, 3],
    'y': ['zero', 'one', 'two', 'three']
})
df
#    x      y
# 0  0   zero
# 1  1    one
# 2  2    two
# 3  3  three
@register_verb(pd.DataFrame)
def head(data, n=5):
    return data.head(n)
df >> head(2)
#    x     y
# 0  0  zero
# 1  1   one
@register_verb(pd.DataFrame, context=Context.EVAL)
def mutate(data, **kwargs):
    data = data.copy()
    for key, val in kwargs.items():
        data[key] = val
    return data
df >> mutate(z=1)
#    x      y  z
# 0  0   zero  1
# 1  1    one  1
# 2  2    two  1
# 3  3  three  1
df >> mutate(z=f.x)
#    x      y  z
# 0  0   zero  0
# 1  1    one  1
# 2  2    two  2
# 3  3  three  3
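The dispatch-by-first-argument property listed above can be illustrated with the standard library's functools.singledispatch. This is only a sketch of the general technique, not pipda's internal code; `first_items` is a hypothetical function that does not exist in pipda.

```python
from functools import singledispatch


@singledispatch
def first_items(data, n=2):
    # Fallback used when no implementation is registered for type(data)
    raise NotImplementedError(f"no implementation for {type(data).__name__}")


@first_items.register(list)
def _(data, n=2):
    # Implementation chosen when the first argument is a list
    return data[:n]


@first_items.register(dict)
def _(data, n=2):
    # Implementation chosen when the first argument is a dict
    return dict(list(data.items())[:n])


from_list = first_items([1, 2, 3])
from_dict = first_items({"a": 1, "b": 2, "c": 3})
```

The same function name resolves to a different implementation depending on the type of the data, which is the behavior register_verb(pd.DataFrame) asks for in the examples above.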
Functions used as verb arguments
# verb can be used as an argument passed to another verb
# dependent=True makes the `data` argument invisible while calling
@register_verb(pd.DataFrame, context=Context.EVAL, dependent=True)
def if_else(data, cond, true, false):
    cond.loc[cond.isin([True]), ] = true
    cond.loc[cond.isin([False]), ] = false
    return cond
# The function is then also a singledispatch generic function
df >> mutate(z=if_else(f.x>1, 20, 10))
#    x      y   z
# 0  0   zero  10
# 1  1    one  10
# 2  2    two  20
# 3  3  three  20
# function without data argument
@register_func
def length(strings):
    return [len(s) for s in strings]
df >> mutate(z=length(f.y))
#    x      y  z
# 0  0   zero  4
# 1  1    one  3
# 2  2    two  3
# 3  3  three  5
Context
The context defines how a reference (f.A, f['A'], f.A.B) is evaluated.
@register_verb(pd.DataFrame, context=Context.SELECT)
def select(df, *columns):
    return df[list(columns)]
df >> select(f.x, f.y)
#    x      y
# 0  0   zero
# 1  1    one
# 2  2    two
# 3  3  three
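To make the role of a context more concrete, here is a small standalone sketch. It assumes, as the select and mutate examples suggest, that a selecting context turns a reference into a column name while an evaluating context turns it into the column's data. `ToyContext` and `resolve` are hypothetical names used only for illustration and are not part of pipda.

```python
from enum import Enum


class ToyContext(Enum):
    # Hypothetical stand-ins for the two contexts used in the examples above
    EVAL = "eval"
    SELECT = "select"


def resolve(name, data, context):
    """Resolve a column reference according to the given context."""
    if context is ToyContext.SELECT:
        # Selecting: the reference stands for the column name itself
        return name
    # Evaluating: the reference stands for the column's values
    return data[name]


table = {"x": [0, 1, 2, 3], "y": ["zero", "one", "two", "three"]}
as_name = resolve("x", table, ToyContext.SELECT)
as_values = resolve("x", table, ToyContext.EVAL)
```

The same reference yields either "x" or the list of values, depending only on the context the verb was registered with.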
How it works
data %>% verb(arg1, ..., key1=kwarg1, ...)
The above is a typical dplyr/tidyr data piping syntax.
The Python counterpart is:
data >> verb(arg1, ..., key1=kwarg1, ...)
To implement this, execution of the verb must be deferred by turning it into a VerbCall object that holds the function and its arguments. The VerbCall is not evaluated until data is piped in via >>. This detection is made possible by the executing package, which inspects the AST to determine whether a function call appears on the right-hand side of a pipe operator.
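The deferral described above can be sketched with Python's reflected right-shift hook. This toy version always returns a deferred object and skips the AST inspection entirely, so it is a simplification rather than pipda's actual mechanism; `DeferredCall`, `toy_verb`, and `take` are hypothetical names.

```python
class DeferredCall:
    """Holds a function and its arguments until data is piped in."""

    def __init__(self, func, args, kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __rrshift__(self, data):
        # Python calls this for `data >> self` when `data` does not
        # handle `>>` itself; the stored call is executed only now.
        return self.func(data, *self.args, **self.kwargs)


def toy_verb(func):
    """Decorator that makes calls to `func` return a DeferredCall."""
    def wrapper(*args, **kwargs):
        return DeferredCall(func, args, kwargs)
    return wrapper


@toy_verb
def take(data, n=1):
    return data[:n]


pending = take(2)                 # nothing is computed yet
result = [10, 20, 30] >> pending  # the call runs when data arrives
```

Because this sketch cannot tell a piped call from a normal one, take(data, 2) would not work here. That limitation is what the AST-based detection mentioned above addresses.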
Arguments that reference columns of the data must also be deferred. For example, in dplyr (R):
data %>% mutate(z = a)
This adds a column z with values from column a. In Python, the equivalent is:
data >> mutate(z=f.a)
Here f.a is a Reference object that captures the column name without immediately fetching the data.
The Symbolic object f acts as a proxy, chaining attribute/item accesses and operator expressions into a single Expression tree. That tree is later evaluated when data and context become available.
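A proxy of this kind can be approximated in a few lines. The sketch below is an assumption-laden illustration, not pipda's implementation: the classes `Expr`, `Sym`, `Ref`, and `BinOp` are hypothetical, only `+` and `>` are supported, and a plain dict stands in for the data.

```python
import operator


class Expr:
    """Base class for deferred expressions (hypothetical)."""

    def __getattr__(self, name):
        # Only reached when normal lookup fails; private names are not proxied
        if name.startswith("_"):
            raise AttributeError(name)
        return Ref(self, name)

    def __add__(self, other):
        return BinOp(operator.add, self, other)

    def __gt__(self, other):
        return BinOp(operator.gt, self, other)


class Sym(Expr):
    def evaluate(self, data):
        # The root symbol evaluates to the data itself
        return data


class Ref(Expr):
    def __init__(self, parent, name):
        self._parent = parent
        self._name = name

    def evaluate(self, data):
        # Look the name up only once data is available
        return self._parent.evaluate(data)[self._name]


class BinOp(Expr):
    def __init__(self, op, left, right):
        self._op, self._left, self._right = op, left, right

    def evaluate(self, data):
        return self._op(_value(self._left, data), _value(self._right, data))


def _value(operand, data):
    # Constants pass through; expressions are evaluated recursively
    return operand.evaluate(data) if isinstance(operand, Expr) else operand


f = Sym()
tree = f.x + f.y  # builds a tree, computes nothing
record = {"x": 3, "y": 4}
total = tree.evaluate(record)
is_big = (f.x > 1).evaluate(record)
```

Building the tree and evaluating it are separate steps, which is what lets a verb decide later which data and context to apply.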
Documentation
https://pwwang.github.io/pipda/
See datar for real-world usage.
Download files
File details
Details for the file pipda-0.14.0.tar.gz.
File metadata
- Download URL: pipda-0.14.0.tar.gz
- Upload date:
- Size: 150.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.8 {"installer":{"name":"uv","version":"0.11.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 03d2f4e9e9e24ed3976342205d6f64c014918fc26acc0b5d9f496a5192e393eb |
| MD5 | af7d120a2d4f27efd07764ce6483a105 |
| BLAKE2b-256 | 789ab3a6deb309a73ee978a02f260284dbdce7a94b76a528278dac5cc1f2d4a9 |
File details
Details for the file pipda-0.14.0-py3-none-any.whl.
File metadata
- Download URL: pipda-0.14.0-py3-none-any.whl
- Upload date:
- Size: 21.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.8 {"installer":{"name":"uv","version":"0.11.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cf16b7d67fc52d5cb1be16688fee41886ffc8044df73a4a4a8fb4b7709c9f3d1 |
| MD5 | f083211b9867aa7756a3b61c1dd9381a |
| BLAKE2b-256 | 19405e34e34d38f1a3f5ba013fdae3789fc2fd3985ac1cb87da6153bb66235d2 |