pipda
A framework for data piping in python
Inspired by `siuba`, `dfply`, `plydata` and `dplython`, but with simple yet powerful APIs to mimic the `dplyr` and `tidyr` packages in Python.
Installation
```shell
pip install -U pipda
```
Usage
Verbs
- A verb is pipeable (it can be called like `data >> verb(...)`)
- A verb is dispatchable by the type of its first argument
- A verb evaluates other arguments using the first one
- A verb passes down the context if it is not specified in the arguments
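The dispatch property can be illustrated with the standard library alone. The sketch below is a simplified model, not pipda's implementation: it assumes a hypothetical stand-alone `head` built with `functools.singledispatch`, which picks an implementation from the type of the first argument. Piping with `>>` is deliberately left out.

```python
from functools import singledispatch


# Hypothetical stand-alone verb: the implementation is picked from the
# type of the first argument (`data`).
@singledispatch
def head(data, n=5):
    raise NotImplementedError(f"head() is not registered for {type(data).__name__}")


@head.register(list)
def _(data, n=5):
    # lists: the first n items
    return data[:n]


@head.register(str)
def _(data, n=5):
    # strings: the first n characters
    return data[:n]


print(head([0, 1, 2, 3], 2))  # [0, 1]
print(head("three", 2))       # th
```

Registering a new type adds behavior without touching existing implementations, which is what makes a verb reusable across data types.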
```python
import pandas as pd
from pipda import (
    register_verb,
    register_func,
    register_operator,
    evaluate_expr,
    Operator,
    Symbolic,
    Context,
)

f = Symbolic()

df = pd.DataFrame({
    'x': [0, 1, 2, 3],
    'y': ['zero', 'one', 'two', 'three']
})

df
#    x      y
# 0  0   zero
# 1  1    one
# 2  2    two
# 3  3  three
```
```python
@register_verb(pd.DataFrame)
def head(data, n=5):
    return data.head(n)

df >> head(2)
#    x     y
# 0  0  zero
# 1  1   one
```
```python
@register_verb(pd.DataFrame, context=Context.EVAL)
def mutate(data, **kwargs):
    # work on a copy so the input DataFrame is not modified
    data = data.copy()
    for key, val in kwargs.items():
        data[key] = val
    return data

df >> mutate(z=1)
#    x      y  z
# 0  0   zero  1
# 1  1    one  1
# 2  2    two  1
# 3  3  three  1

df >> mutate(z=f.x)
#    x      y  z
# 0  0   zero  0
# 1  1    one  1
# 2  2    two  2
# 3  3  three  3
```
Functions used as verb arguments
```python
import numpy as np

# A verb can be used as an argument passed to another verb.
# dep=True makes the `data` argument implicit when calling it.
@register_verb(pd.DataFrame, context=Context.EVAL, dep=True)
def if_else(data, cond, true, false):
    # `cond` arrives already evaluated against `data` (a boolean Series);
    # np.where avoids in-place dtype changes and 0/False mix-ups
    return pd.Series(np.where(cond, true, false), index=cond.index)

# The function is then also a singledispatch generic function
df >> mutate(z=if_else(f.x > 1, 20, 10))
#    x      y   z
# 0  0   zero  10
# 1  1    one  10
# 2  2    two  20
# 3  3  three  20
```
```python
# A function without a data argument
@register_func
def length(strings):
    return [len(s) for s in strings]

df >> mutate(z=length(f.y))
#    x      y  z
# 0  0   zero  4
# 1  1    one  3
# 2  2    two  3
# 3  3  three  5
```
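How a function like `length` can accept `f.y` before any data exists can be sketched with a deferred call node. This is a simplified model, not pipda's `register_func`: `Ref`, `Call`, `deferred` and the dict-based `mutate` are hypothetical names, data is assumed to be a plain dict of lists, and the sketch always defers the call (pipda's own rules for when a registered function runs immediately may differ).

```python
from functools import wraps


class Ref:
    """Hypothetical stand-in for `f.<name>`: a deferred column lookup."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, data):
        return data[self.name]


def evaluate(obj, data):
    # evaluate deferred objects; pass plain values through unchanged
    return obj.evaluate(data) if hasattr(obj, "evaluate") else obj


class Call:
    """Hypothetical deferred call of a plain function."""

    def __init__(self, func, args):
        self.func = func
        self.args = args

    def evaluate(self, data):
        # arguments (which may be Refs or other Calls) are evaluated first
        return self.func(*(evaluate(arg, data) for arg in self.args))


def deferred(func):
    """Hypothetical decorator loosely modelled on `register_func`:
    calling the function records a Call instead of running it."""

    @wraps(func)
    def wrapper(*args):
        return Call(func, args)

    return wrapper


@deferred
def length(strings):
    return [len(s) for s in strings]


def mutate(data, **kwargs):
    # a toy verb: every argument is evaluated against the verb's data
    out = dict(data)
    for key, val in kwargs.items():
        out[key] = evaluate(val, data)
    return out


data = {"x": [0, 1, 2, 3], "y": ["zero", "one", "two", "three"]}
print(mutate(data, z=length(Ref("y")))["z"])  # [4, 3, 3, 5]
```

The verb, not the function, supplies the data, which is why `length` itself needs no `data` parameter.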
Context
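The context defines how a reference (`f.A`, `f['A']`, `f.A.B`) is evaluated. With `Context.SELECT`, a reference such as `f.x` evaluates to the column name `'x'`, so `select` below receives plain strings; with `Context.EVAL`, as in `mutate` above, it evaluates to the column data.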
```python
@register_verb(pd.DataFrame, context=Context.SELECT)
def select(data, *columns):
    # columns arrive as names such as 'x', because of Context.SELECT
    return data[list(columns)]

df >> select(f.x, f.y)
#    x      y
# 0  0   zero
# 1  1    one
# 2  2    two
# 3  3  three
```
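The difference between the two contexts used above can be sketched with a toy reference type, assuming dict-of-lists data. `ToyContext`, `Ref` and the dict-based `select` are hypothetical names for illustration, not pipda objects: under `SELECT` a reference resolves to its name, under `EVAL` to the data it names.

```python
from enum import Enum


class ToyContext(Enum):
    # hypothetical mirror of the two contexts used in this README
    SELECT = "select"  # a reference resolves to the column name
    EVAL = "eval"      # a reference resolves to the column data


class Ref:
    """Hypothetical stand-in for `f.<name>`."""

    def __init__(self, name):
        self.name = name

    def resolve(self, data, context):
        if context is ToyContext.SELECT:
            return self.name
        return data[self.name]


def select(data, *columns):
    # a SELECT-context verb: references become names before the body runs
    names = [col.resolve(data, ToyContext.SELECT) for col in columns]
    return {name: data[name] for name in names}


data = {"x": [0, 1, 2, 3], "y": ["zero", "one", "two", "three"]}
print(Ref("x").resolve(data, ToyContext.SELECT))  # x
print(Ref("x").resolve(data, ToyContext.EVAL))    # [0, 1, 2, 3]
print(select(data, Ref("y")))  # {'y': ['zero', 'one', 'two', 'three']}
```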
How it works
```r
data %>% verb(arg1, ..., key1=kwarg1, ...)
```

The above is a typical `dplyr`/`tidyr` data piping syntax.

The counterpart Python syntax we expect is:

```python
data >> verb(arg1, ..., key1=kwarg1, ...)
```
To implement that, we need to defer the execution of the `verb` by turning it into a `Verb` object, which holds all the information about the function to be executed later. The `Verb` object is not executed until the `data` is piped in. This is made possible by the `executing` package, which lets us determine the AST node where the function is called, so that we can tell whether the function is called in piping mode.
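A minimal sketch of the deferral idea, assuming plain Python lists or strings as data. The `Verb` class and the `verb` decorator here are hypothetical; unlike pipda, this sketch does not use `executing` to detect piping, so calling a decorated function always returns a deferred object. `data >> obj` works because Python falls back to `obj.__rrshift__(data)` when the left operand's type does not implement `>>` for that right operand.

```python
from functools import wraps


class Verb:
    """Hypothetical deferred call: stores a function and its arguments
    until data is piped in with `>>`."""

    def __init__(self, func, args, kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __rrshift__(self, data):
        # reached for `data >> verb` when type(data) has no usable __rshift__
        return self.func(data, *self.args, **self.kwargs)


def verb(func):
    """Hypothetical decorator: calling the decorated function returns a
    Verb instead of running it right away."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        return Verb(func, args, kwargs)

    return wrapper


@verb
def head(data, n=5):
    return data[:n]


pending = head(2)                # nothing is executed yet
print(type(pending).__name__)    # Verb
print([0, 1, 2, 3] >> pending)   # [0, 1]
```

If the left operand's type implements `>>` for arbitrary right operands, it takes precedence and the reflected method is never called.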
If an argument refers to a column of the data and that column is involved in the later computation, then the argument also needs to be deferred. For example, with `dplyr` in R:

```r
data %>% mutate(z=a)
```

adds a column named `z` with the data from column `a`.

In Python, we want to do the same with:

```python
data >> mutate(z=f.a)
```

where `f.a` is a `Reference` object that carries the column information without fetching any data, even though Python evaluates `f.a` immediately.
Here the trick is `f`. Like other packages, we introduced the `Symbolic` object, which connects the parts of an argument and turns the whole argument into an `Expression` object. This object holds the execution information, which is used later once piping is detected.
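A toy version of the `Symbolic`/`Reference`/`Expression` idea, assuming a plain `dict` as data. The class names mirror the prose, but the classes themselves are hypothetical and much simpler than pipda's: attribute or item access on `f` yields a `Reference`, and arithmetic on expressions builds a `BinOp` tree that is only computed by `evaluate` once data is available.

```python
class Expression:
    """Hypothetical base class: operators build a tree instead of computing."""

    def __add__(self, other):
        return BinOp(lambda a, b: a + b, self, other)

    def __radd__(self, other):
        return BinOp(lambda a, b: a + b, other, self)

    def __mul__(self, other):
        return BinOp(lambda a, b: a * b, self, other)

    def __rmul__(self, other):
        return BinOp(lambda a, b: a * b, other, self)


def evaluate(obj, data):
    """Evaluate an Expression against `data`; other values pass through."""
    return obj.evaluate(data) if isinstance(obj, Expression) else obj


class Reference(Expression):
    """`f.name` / `f['name']`: remembers the name, fetches nothing yet."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, data):
        return data[self.name]


class BinOp(Expression):
    """A deferred binary operation whose operands may be expressions."""

    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def evaluate(self, data):
        return self.op(evaluate(self.left, data), evaluate(self.right, data))


class Symbolic:
    """Hypothetical `f`: attribute or item access yields a Reference."""

    def __getattr__(self, name):
        return Reference(name)

    def __getitem__(self, name):
        return Reference(name)


f = Symbolic()
expr = f.a * 2 + 1                 # a BinOp tree; no data touched yet
print(type(expr).__name__)         # BinOp
print(evaluate(expr, {"a": 10}))   # 21
```

Keeping evaluation in a separate step is what lets a verb decide, per context, how each reference is resolved.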
Documentation
https://pwwang.github.io/pipda/
See also `datar` for real-world usage.