A framework for data piping in python

These details have not been verified by PyPI

Project description

pipda

Docs building

A framework for data piping in python

Inspired by siuba, dfply, plydata and dplython, but with simple yet powerful APIs to mimic the dplyr and tidyr packages in python

API | Change Log | Playground

Installation

pip install -U pipda

Usage

Checkout datar for more detailed usages.

Verbs

Verbs are functions next to the piping sign (>>) receiving the data directly.

import pandas as pd
from pipda import (
    register_verb,
    register_func,
    register_operator,
    evaluate_expr,
    Operator,
    Symbolic,
    Context
)

f = Symbolic()

df = pd.DataFrame({
    'x': [0, 1, 2, 3],
    'y': ['zero', 'one', 'two', 'three']
})

df

#      x    y
# 0    0    zero
# 1    1    one
# 2    2    two
# 3    3    three

@register_verb(pd.DataFrame)
def head(data, n=5):
    return data.head(n)

df >> head(2)
#      x    y
# 0    0    zero
# 1    1    one

@register_verb(pd.DataFrame, context=Context.EVAL)
def mutate(data, **kwargs):
    data = data.copy()
    for key, val in kwargs.items():
        data[key] = val
    return data

df >> mutate(z=1)
#    x      y  z
# 0  0   zero  1
# 1  1    one  1
# 2  2    two  1
# 3  3  three  1

df >> mutate(z=f.x)
#    x      y  z
# 0  0   zero  0
# 1  1    one  1
# 2  2    two  2
# 3  3  three  3

# Verbs that don't compile f.a to data, but just the column name
@register_verb(pd.DataFrame, context=Context.SELECT)
def select(data, *columns):
    return data.loc[:, columns]

# f.x won't be compiled as df.x but just 'x'
df >> mutate(z=2*f.x) >> select(f.x, f.z)
#      x    z
# 0    0    0
# 1    1    2
# 2    2    4
# 3    3    6

# Compile the args inside the verb
@register_verb(pd.DataFrame, context=Context.PENDING)
def mutate_existing(data, column, value):
    column = evaluate_expr(column, data, Context.SELECT)
    value = evaluate_expr(value, data, Context.EVAL)
    data = data.copy()
    data[column] = value
    return data

# First f.x compiled as column name, and second as Series data
df2 = df >> mutate_existing(f.x, 10 * f.x)
df2
#      x    y     z
# 0    0    zero  0
# 1    10   one   2
# 2    20   two   4
# 3    30   three 6

# Evaluate the arguments by yourself
@register_verb(pd.DataFrame, context=Context.PENDING)
def mutate_existing2(data, column, value):
    column = evaluate_expr(column, data, Context.SELECT)
    value = evaluate_expr(value, df2, Context.EVAL)
    data = data.copy()
    data[column] = value
    return data

df >> mutate_existing2(f.x, 2 * f.x)
#      x    y
# 0    0    zero
# 1    20   one
# 2    40   two
# 3    60   three

# register for multiple types
@register_verb(int)
def add(data, other):
    return data + other

# add is actually a singledispatch generic function
@add.register(float)
def _(data, other):
    return data * other

1 >> add(1)
# 2
1.1 >> add(1.0)
# 1.1

# As it's a singledispatch generic function, we can do it for multiple types
# with the same logic
@register_verb(context=Context.EVAL)
def mul(data, other):
    raise NotImplementedError # not invalid until types registered

@mul.register(int)
@mul.register(float)
# or you could do @mul.register((int, float))
# context is also supported
def _(data, other):
    return data * other

3 >> mul(2)
# 6
3.2 >> mul(2)
# 6.4

Functions used in verb arguments

@register_func(context=Context.EVAL)
def if_else(data, cond, true, false):
    cond.loc[cond.isin([True]), ] = true
    cond.loc[cond.isin([False]), ] = false
    return cond

# The function is then also a singledispatch generic function

df >> mutate(z=if_else(f.x>1, 20, 10))
#    x      y   z
# 0  0   zero  10
# 1  1    one  10
# 2  2    two  20
# 3  3  three  20

# function without data argument
@register_func(None)
def length(strings):
    return [len(s) for s in strings]

df >> mutate(z=length(f.y))

#    x     y    z
# 0  0  zero    4
# 1  1   one    3
# 2  2   two    3
# 3  3 three    5

# register existing functions
from numpy import vectorize
len = register_func(None, context=Context.EVAL, func=vectorize(len))

# original function still works
print(len('abc'))

df >> mutate(z=len(f.y))

# 3
#   x     y z
# 0 0  zero 4
# 1 1   one 3
# 2 2   two 3
# 3 3 three 5

Operators

You may also redefine the behavior of the operators

@register_operator
class MyOperators(Operator):
    def xor(self, a, b):
        """Inteprete X ^ Y as pow(X, Y)."""
        return a ** b

df >> mutate(z=f.x ^ 2)
#      x    y      z
# 0    0    zero   0
# 1    1    one    1
# 2    2    two    4
# 3    3    three  9

Context

The context defines how a reference (f.A, f['A'], f.A.B is evaluated)

from pipda import ContextBase

class MyContext(ContextBase):
    name = 'my'
    def getattr(self, parent, ref):
        # double it to distinguish getattr
        return getattr(parent, ref)
    def getitem(self, parent, ref):
        return parent[ref] * 2
    @property
    def ref(self):
        # how we evaluate the ref in f[ref]
        return self


@register_verb(context=MyContext())
def mutate_mycontext(data, **kwargs):
    for key, val in kwargs.items():
        data[key] = val
    return data

df >> mutate_mycontext(z=f.x + f['x'])

#   x     y z
# 0 0  zero 0
# 1 1   one 3
# 2 2   two 6
# 3 3 three 9

# when ref in f[ref] is also needed to be evaluated
df = df >> mutate(zero=0, one=1, two=2, three=3)
df

#    x      y  z  zero  one  two  three
# 0  0   zero  0     0    1    2      3
# 1  1    one  3     0    1    2      3
# 2  2    two  6     0    1    2      3
# 3  3  three  9     0    1    2      3

df >> mutate_mycontext(m=f[f.y][:1].values[0])
# f.y returns ['zero', 'one', 'two', 'three']
# f[f.y] gets [[0, 2, 4, 6], [0, 2, 4, 6], [0, 2, 4, 6], [0, 2, 4, 6]]
# f[f.y][:1].values gets [[0, 4, 8, 16]]
# f[f.y][:1].values[0] returns [0, 8, 16, 32]
# Notes that each subscription ([]) will double the values

#    x      y  z  zero  one  two  three   m
# 0  0   zero  0     0    1    2      3   0
# 1  1    one  3     0    1    2      3   8
# 2  2    two  6     0    1    2      3  16
# 3  3  three  9     0    1    2      3  24

Calling rules

Verb calling rules

data >> verb(...)
[PIPING_VERB]
First argument should not be passed, using the data
data >> other_verb(verb(...))
other_verb(data, verb(...))
registered_func(verb(...))
[PIPING]
Try using the first argument to evaluate (FastEvalVerb), if first argument is data. Otherwise, if it is Expression object, works as a non-data Function.
verb(...)
Called independently. The verb will be called regularly anyway. The first argument will be used as data to evaluate the arguments if there are any Expression objects
verb(...) with DataEnv
First argument should not be passed in, will use the DataEnv's data to evaluate the arguments

Data function calling rules

Functions that require first argument as data argument.

data >> verb(func(...)) or verb(data, func(...))
First argument is not used. Will use data
func(...)
Called independently. The function will be called regularly anyway. Similar as Verb calling rule, but first argument will not be used for evaluation
func(...) with DataEnv
First argument not used, passed implicitly with DataEnv.

Non-data function calling rules:

data >> verb(func(...)) or verb(data, func(...))
Return a Function object waiting for evaluation
func(...)
Called regularly anyway
func(...) with DataEnv
Evaluate with DataEnv. For example: mean(f.x)

Caveats

You have to use and_ and or_ for bitwise and/or (&/|) operators, as and and or are python keywords.
Limitations:

Any limitations apply to executing to detect the AST node will apply to pipda. It may not work in some circumstances where other AST magics apply.

What if source code is not available?

executing does not work in the case where source code is not available, as there is no way to detect the AST node to check how the functions (verbs, data functions, non-data functions) are called, either they are called as a piping verb (data >> verb(...)), or they are called as an argument of a verb (data >> verb(func(...))) or even they are called independently/regularly.

In such a case, you can set the option (options.mode="piping") to assume that all registered functions are called in piping mode, so that you can do data >> verb(...) without any changes. Or with options.mode="regular", you can do verb(data, ...)

You can also use this option to enhance the performance by skipping detection of the calling environment.
Use another piping sign
```
from pipda import register_piping
register_piping('^')

# register verbs and functions
df ^ verb1(...) ^ verb2(...)
```
Allowed signs are: +, -, *, @, /, //, %, **, <<, >>, &, ^ and |.

Note that to use the new piping sign, you have to register the verbs after the new piping sign being registered.
The context

The context is only applied to the DirectReference objects or unary operators, like -f.A, +f.A, ~f.A, f.A, f['A'], [f.A, f.B], etc. Any other Expression wrapping those objects or other operators getting involved will turn the context to Context.EVAL

How it works

The verbs

data %>% verb(arg1, ..., key1=kwarg1, ...)

The above is a typical dplyr/tidyr data piping syntax.

The counterpart R syntax we expect is:

data >> verb(arg1, ..., key1=kwarg1, ...)

To implement that, we need to defer the execution of the verb by turning it into a Verb object, which holds all information of the function to be executed later. The Verb object won't be executed until the data is piped in. It all thanks to the executing package to let us determine the ast nodes where the function is called. So that we are able to determine whether the function is called in a piping mode.

If an argument is referring to a column of the data and the column will be involved in the later computation, the it also needs to be deferred. For example, with dplyr in R:

data %>% mutate(z=a)

is trying add a column named z with the data from column a.

In python, we want to do the same with:

data >> mutate(z=f.a)

where f.a is a Reference object that carries the column information without fetching the data while python sees it immmediately.

Here the trick is f. Like other packages, we introduced the Symbolic object, which will connect the parts in the argument and make the whole argument an Expression object. This object is holding the execution information, which we could use later when the piping is detected.

The functions

Then what if we want to use some functions in the arguments of the verb? For example:

data >> select(starts_with('a'))

to select the columns with names start with 'a'.

No doubt that we need to defer the execution of the function, too. The trick is that we let the function return a Function object as well, and evaluate it as the argument of the verb.

The operators

pipda also opens oppotunities to change the behavior of the operators in verb/function arguments. This allows us to mimic something like this:

data >> select(-f.a) # select all columns but `a`

To do that, we turn it into an Operator object. Just like a Verb or a Function object, the execution is deferred. By default, the operators we used are from the python standard library operator. operator.neg in the above example.

You can also define you own by subclassing the Operator class, and then register it to replace the default one by decorating it with register_operator.

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.13.1

Oct 10, 2023

0.13.0

Oct 5, 2023

0.12.0

Apr 13, 2023

0.11.1

Jan 18, 2023

0.11.0

Dec 8, 2022

0.10.0

Dec 1, 2022

0.9.0

Oct 28, 2022

0.8.2

Oct 17, 2022

0.8.1

Oct 15, 2022

0.8.0

Oct 8, 2022

0.7.6

Oct 6, 2022

0.7.5

Oct 5, 2022

0.7.4

Oct 5, 2022

0.7.3

Sep 23, 2022

0.7.2

Sep 20, 2022

0.7.1

Sep 13, 2022

0.7.0

Sep 4, 2022

0.6.0

May 13, 2022

This version

0.5.9

Mar 30, 2022

0.5.8

Mar 17, 2022

0.5.7

Mar 6, 2022

0.5.6

Mar 2, 2022

0.5.5

Mar 1, 2022

0.5.4

Mar 1, 2022

0.5.3

Feb 17, 2022

0.5.2

Feb 15, 2022

0.5.1

Feb 14, 2022

0.5.0

Feb 12, 2022

0.4.5

Aug 4, 2021

0.4.4

Aug 3, 2021

0.4.3

Jul 27, 2021

0.4.2

Jul 16, 2021

0.4.1

Jul 13, 2021

0.4.0

Jul 7, 2021

0.3.0

Jul 1, 2021

0.2.9

Jun 21, 2021

0.2.8

Jun 15, 2021

0.2.7

Jun 11, 2021

0.2.6

May 28, 2021

0.2.5

May 18, 2021

0.2.4

Apr 29, 2021

0.2.3

Apr 10, 2021

0.2.2

Apr 7, 2021

0.2.1

Apr 6, 2021

0.2.0

Mar 30, 2021

0.1.5

Mar 13, 2021

0.1.4

Mar 5, 2021

0.1.3

Mar 2, 2021

0.1.2

Mar 1, 2021

0.1.1

Feb 28, 2021

0.1.0

Feb 17, 2021

0.0.6

Dec 5, 2020

0.0.4

Dec 1, 2020

0.0.3

Nov 30, 2020

0.0.1

Nov 30, 2020

0.0.0

Nov 27, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pipda-0.5.9.tar.gz (25.9 kB view details)

Uploaded Mar 30, 2022 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

pipda-0.5.9-py3-none-any.whl (24.3 kB view details)

Uploaded Mar 30, 2022 Python 3

File details

Details for the file pipda-0.5.9.tar.gz.

File metadata

Download URL: pipda-0.5.9.tar.gz
Upload date: Mar 30, 2022
Size: 25.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.1.13 CPython/3.10.2 Linux/5.11.0-1028-azure

File hashes

Hashes for pipda-0.5.9.tar.gz
Algorithm	Hash digest
SHA256	`0127bd2351bb2b565259106b0bbb90d32f64f9703c8356873c9c946531661572`
MD5	`2bea8174574314a35939bf915c3e0218`
BLAKE2b-256	`0bc0ae02f4eb0911c3f5160d4bf848f407cdf4a904ddcacb139f66aa43e58649`

See more details on using hashes here.

File details

Details for the file pipda-0.5.9-py3-none-any.whl.

File metadata

Download URL: pipda-0.5.9-py3-none-any.whl
Upload date: Mar 30, 2022
Size: 24.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.1.13 CPython/3.10.2 Linux/5.11.0-1028-azure

File hashes

Hashes for pipda-0.5.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fa8dfb2926430704aa6d9e198a7514ffa2e72435c3f05bb52e7f4ef13ac374a3`
MD5	`c3727e547056a261feda982c6912727a`
BLAKE2b-256	`97ad616b35129b58173fe203ddccb3cb7350e18247aaa4a11e17461708298b6e`

See more details on using hashes here.

pipda 0.5.9

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

pipda

Installation

Usage

Verbs

Functions used in verb arguments

Operators

Context

Calling rules

Verb calling rules

Data function calling rules

Non-data function calling rules:

Caveats

How it works

The verbs

The functions

The operators

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes