
A framework for data piping in python

pipda

Inspired by siuba, dfply, plydata and dplython, but with simple yet powerful APIs that mimic the dplyr and tidyr packages in Python.

API | Change Log | Documentation

Installation

pip install -U pipda

Usage

Verbs

Verbs are functions that follow the piping sign (>>) and receive the piped data directly as their first argument.

import pandas as pd
from pipda import (
    register_verb,
    register_func,
    register_operator,
    evaluate_expr,
    Operator,
    Symbolic,
    Context
)

f = Symbolic()

df = pd.DataFrame({
    'x': [0, 1, 2, 3],
    'y': ['zero', 'one', 'two', 'three']
})

df

#    x      y
# 0  0   zero
# 1  1    one
# 2  2    two
# 3  3  three

@register_verb(pd.DataFrame)
def head(data, n=5):
    return data.head(n)

df >> head(2)
#    x     y
# 0  0  zero
# 1  1   one

@register_verb(pd.DataFrame, context=Context.EVAL)
def mutate(data, **kwargs):
    data = data.copy()
    for key, val in kwargs.items():
        data[key] = val
    return data

df >> mutate(z=1)
#    x      y  z
# 0  0   zero  1
# 1  1    one  1
# 2  2    two  1
# 3  3  three  1

df >> mutate(z=f.x)
#    x      y  z
# 0  0   zero  0
# 1  1    one  1
# 2  2    two  2
# 3  3  three  3

Functions used as verb arguments

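# A verb can also be used as an argument passed to another verb.
# dep=True hides the `data` argument from the call, so the verb is
# called without it (see `if_else(f.x>1, 20, 10)` below).
@register_verb(pd.DataFrame, context=Context.EVAL, dep=True)
def if_else(data, cond, true, false):
    # Map each boolean to its replacement instead of overwriting `cond`
    # in place, which would modify the caller's Series and could
    # re-match values that were already replaced.
    return cond.map({True: true, False: false})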

# The function is then also a singledispatch generic function
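For readers unfamiliar with the term, the sketch below shows what a singledispatch generic function is, using only the standard library's functools.singledispatch. It is not pipda code: `describe` and its implementations are hypothetical names chosen for illustration, and the sketch only assumes that the type passed to register_verb plays a similar role to the type passed to `register` here.

```python
from functools import singledispatch


@singledispatch
def describe(data):
    # Fallback for types with no registered implementation
    raise NotImplementedError(f"unsupported type: {type(data).__name__}")


@describe.register(list)
def _(data):
    # Chosen when the first argument is a list
    return f"list of {len(data)} items"


@describe.register(dict)
def _(data):
    # Chosen when the first argument is a dict
    return f"dict with keys {sorted(data)}"


print(describe([1, 2, 3]))         # list of 3 items
print(describe({"b": 1, "a": 2}))  # dict with keys ['a', 'b']
```

The implementation is picked from the type of the first argument, which is how one verb name could serve several data types.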

df >> mutate(z=if_else(f.x>1, 20, 10))
#    x      y   z
# 0  0   zero  10
# 1  1    one  10
# 2  2    two  20
# 3  3  three  20

# A function without a data argument
@register_func
def length(strings):
    return [len(s) for s in strings]

df >> mutate(z=length(f.y))

#    x      y  z
# 0  0   zero  4
# 1  1    one  3
# 2  2    two  3
# 3  3  three  5

Context

The context defines how a reference (f.A, f['A'], f.A.B) is evaluated.

@register_verb(pd.DataFrame, context=Context.SELECT)
def select(df, *columns):
    return df[list(columns)]

df >> select(f.x, f.y)
#    x      y
# 0  0   zero
# 1  1    one
# 2  2    two
# 3  3  three
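To make the role of the context more concrete, here is a minimal pure-Python sketch of the idea, not pipda's implementation. It assumes that a selecting context resolves a reference to the column name (which is what the select example needs for `df[list(columns)]`), while an evaluating context resolves it to the column's data (which is what mutate needs). `Ctx`, `Ref` and `resolve` are hypothetical names, and a dict of lists stands in for a data frame.

```python
from enum import Enum


class Ctx(Enum):
    """Hypothetical stand-in for a context setting."""
    SELECT = "select"
    EVAL = "eval"


class Ref:
    """A deferred reference to a column, like what `f.x` stands for."""

    def __init__(self, name):
        self.name = name


def resolve(ref, data, context):
    """Turn a reference into a value according to the context."""
    if context is Ctx.SELECT:
        return ref.name       # keep only the column name
    return data[ref.name]     # fetch the column values


data = {"x": [0, 1, 2], "y": ["zero", "one", "two"]}

print(resolve(Ref("x"), data, Ctx.SELECT))  # x
print(resolve(Ref("x"), data, Ctx.EVAL))    # [0, 1, 2]
```

The same reference yields two different results, so each verb can declare which interpretation its arguments need.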

How it works

data %>% verb(arg1, ..., key1=kwarg1, ...)

The above is a typical dplyr/tidyr data piping syntax.

The counterpart Python syntax we expect is:

data >> verb(arg1, ..., key1=kwarg1, ...)

To implement that, we need to defer the execution of the verb by turning it into a Verb object, which holds all the information needed to execute the function later. The Verb object is not executed until the data is piped in. This is made possible by the executing package, which lets us find the AST node where the function is called, so that we can determine whether the function is being called in piping mode.
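The deferral described above can be illustrated with a simplified pure-Python sketch. This is not pipda's code: `DeferredVerb` and `simple_verb` are hypothetical names, and the sketch always defers the call rather than inspecting AST nodes with the executing package. It relies only on Python's reflected right-shift hook, `__rrshift__`, which is tried when the left operand does not implement `>>` itself.

```python
from functools import wraps


class DeferredVerb:
    """Holds a function and its arguments until data is piped in."""

    def __init__(self, func, args, kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __rrshift__(self, data):
        # Called for `data >> verb(...)`; only now does the function run,
        # with the piped data as its first argument.
        return self.func(data, *self.args, **self.kwargs)


def simple_verb(func):
    """Hypothetical decorator: calling the verb only records the call."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        return DeferredVerb(func, args, kwargs)

    return wrapper


@simple_verb
def head(data, n=5):
    return data[:n]


deferred = head(2)                 # nothing is executed yet
result = [0, 1, 2, 3] >> deferred  # the list is piped in here
print(result)  # [0, 1]
```

Because this sketch always defers, `head` cannot be called as a normal function. Detecting the calling mode is the part the text attributes to the executing package.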

If an argument refers to a column of the data and that column is involved in the later computation, then it also needs to be deferred. For example, with dplyr in R:

data %>% mutate(z=a)

adds a column named z with the data from column a.

In Python, we want to do the same with:

data >> mutate(z=f.a)

where f.a is a Reference object that carries the column information without fetching the data, even though Python evaluates f.a immediately.

The trick here is f. Like other packages, we introduce a Symbolic object, which connects the parts of an argument and turns the whole argument into an Expression object. This object holds the execution information, which we can use later when piping is detected.
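As a rough illustration of this mechanism, the sketch below builds expression objects through attribute access and operator overloading, then evaluates them once data is available. It is a simplified stand-in, not pipda's implementation: the class names echo the ones used in the text, but `BinaryOp` and the `evaluate` helper are assumptions made for this example, and a plain dict stands in for the data.

```python
import operator


class Expression:
    """Base class: records operations instead of performing them."""

    def __gt__(self, other):
        return BinaryOp(operator.gt, self, other)

    def __add__(self, other):
        return BinaryOp(operator.add, self, other)


class Reference(Expression):
    """Remembers a column name without touching any data."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, data):
        return data[self.name]


class BinaryOp(Expression):
    """Remembers an operator and its two operands."""

    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def evaluate(self, data):
        return self.op(evaluate(self.left, data), evaluate(self.right, data))


def evaluate(value, data):
    """Evaluate expressions against data; pass plain values through."""
    if isinstance(value, Expression):
        return value.evaluate(data)
    return value


class Symbolic:
    """Entry point: `f.a` and `f['a']` both create a Reference."""

    def __getattr__(self, name):
        return Reference(name)

    def __getitem__(self, name):
        return Reference(name)


f = Symbolic()
expr = f.a + 1                    # an Expression; nothing is computed yet
row = {"a": 2}
print(evaluate(expr, row))        # 3
print(evaluate(f["a"] > 1, row))  # True
```

Python still evaluates `f.a + 1` immediately, but the result is only a description of the computation. The actual values appear once `evaluate` receives the data.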

Documentation

https://pwwang.github.io/pipda/

See also datar for real-world usage.
