pipda

A framework for data piping in Python.

Inspired by siuba, dfply, plydata, and dplython. Provides simple yet powerful APIs to mimic dplyr and tidyr in Python.

API | Changelog | Documentation

Installation

pip install -U pipda

Usage

Verbs

  • A verb is pipeable (able to be called like data >> verb(...))
  • A verb is dispatchable by the type of its first argument
  • A verb evaluates other arguments using the first one
  • A verb passes its context down to its arguments unless an argument specifies its own

import pandas as pd
from pipda import (
    register_verb,
    register_func,
    register_operator,
    evaluate_expr,
    Operator,
    Symbolic,
    Context
)

f = Symbolic()

df = pd.DataFrame({
    'x': [0, 1, 2, 3],
    'y': ['zero', 'one', 'two', 'three']
})

df

#      x    y
# 0    0    zero
# 1    1    one
# 2    2    two
# 3    3    three

@register_verb(pd.DataFrame)
def head(data, n=5):
    return data.head(n)

df >> head(2)
#      x    y
# 0    0    zero
# 1    1    one

@register_verb(pd.DataFrame, context=Context.EVAL)
def mutate(data, **kwargs):
    data = data.copy()
    for key, val in kwargs.items():
        data[key] = val
    return data

df >> mutate(z=1)
#    x      y  z
# 0  0   zero  1
# 1  1    one  1
# 2  2    two  1
# 3  3  three  1

df >> mutate(z=f.x)
#    x      y  z
# 0  0   zero  0
# 1  1    one  1
# 2  2    two  2
# 3  3  three  3
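
Dispatching on the type of the first argument resembles what the standard library's functools.singledispatch does. The sketch below uses no pipda code and is not how pipda implements it. It only illustrates the idea with plain lists and dicts. The function name head is reused here purely for illustration.

```python
from functools import singledispatch


@singledispatch
def head(data, n=5):
    # Fallback for types that have no registered implementation
    raise NotImplementedError(f"head() is not implemented for {type(data).__name__}")


@head.register(list)
def _(data, n=5):
    # Implementation chosen when the first argument is a list
    return data[:n]


@head.register(dict)
def _(data, n=5):
    # Treat a dict of equal-length lists as a column-oriented table
    return {key: values[:n] for key, values in data.items()}


first_two = head([10, 20, 30], 2)
table_head = head({"x": [0, 1, 2], "y": ["zero", "one", "two"]}, 2)
```

The same call, head(obj, 2), runs a different implementation depending on the type of obj.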

Functions used as verb arguments

# A verb can be used as an argument passed to another verb.
# dependent=True hides the `data` argument at call time.
@register_verb(pd.DataFrame, context=Context.EVAL, dependent=True)
def if_else(data, cond, true, false):
    # Build a new Series instead of overwriting the boolean `cond` in place,
    # which would mix dtypes and modify the evaluated condition.
    return cond.map(lambda flag: true if flag else false)

# The function is then also a singledispatch generic function

df >> mutate(z=if_else(f.x>1, 20, 10))
#    x      y   z
# 0  0   zero  10
# 1  1    one  10
# 2  2    two  20
# 3  3  three  20

# A function without a data argument
@register_func
def length(strings):
    return [len(s) for s in strings]

df >> mutate(z=length(f.y))

#    x     y    z
# 0  0  zero    4
# 1  1   one    3
# 2  2   two    3
# 3  3 three    5

Context

The context defines how a reference (f.A, f['A'], f.A.B) is evaluated.

@register_verb(pd.DataFrame, context=Context.SELECT)
def select(df, *columns):
    return df[list(columns)]

df >> select(f.x, f.y)
#    x     y
# 0  0  zero
# 1  1   one
# 2  2   two
# 3  3 three

How it works

data %>% verb(arg1, ..., key1=kwarg1, ...)

The above is a typical dplyr/tidyr data piping syntax.

The Python counterpart is:

data >> verb(arg1, ..., key1=kwarg1, ...)

To implement this, execution of the verb must be deferred by turning the call into a VerbCall object that holds the function and its arguments. The VerbCall is not evaluated until data is piped in via >>. Whether a call is being piped is detected with the executing package, which inspects the AST to determine whether the function call appears on the right-hand side of a pipe operator.
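
The deferral can be sketched in a few lines of plain Python. This is a simplified illustration, not pipda's implementation: it always defers the call and skips the AST inspection. The class VerbCall here is a hypothetical stand-in, and simple_verb is an invented decorator name.

```python
class VerbCall:
    """Hypothetical stand-in: stores a function and its arguments for later."""

    def __init__(self, func, args, kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __rrshift__(self, data):
        # Runs for `data >> verb_call` when `data` itself does not handle `>>`
        return self.func(data, *self.args, **self.kwargs)


def simple_verb(func):
    """Hypothetical decorator: calling the verb returns a VerbCall, not a result."""

    def wrapper(*args, **kwargs):
        return VerbCall(func, args, kwargs)

    return wrapper


@simple_verb
def head(data, n=5):
    return data[:n]


deferred = head(2)               # nothing is computed yet
result = [0, 1, 2, 3] >> head(2)  # the list is supplied as `data` here
```

Because this sketch always defers, head(data, 2) cannot be called directly. Telling the piped form apart from a direct call is the part that needs the call-site inspection described above.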

Arguments that reference columns of the data must also be deferred. For example, in dplyr (R):

data %>% mutate(z = a)

This adds a column z with values from column a. In Python, the equivalent is:

data >> mutate(z=f.a)

Here f.a is a Reference object that captures the column name without immediately fetching the data.

The Symbolic object f acts as a proxy, chaining attribute/item accesses and operator expressions into a single Expression tree. That tree is later evaluated when data and context become available.
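
A minimal sketch of such a proxy follows, using dicts of lists in place of data frames. All class names are illustrative and only mirror the terms used above. The real pipda classes support many more operators and nested access such as f.A.B.

```python
import operator


class Expression:
    """Illustrative base class: operators build new nodes instead of computing."""

    def __add__(self, other):
        return BinaryOp(operator.add, self, other)

    def __gt__(self, other):
        return BinaryOp(operator.gt, self, other)


def evaluate(node, data):
    # Plain values evaluate to themselves; expression nodes need the data
    return node.evaluate(data) if isinstance(node, Expression) else node


class Reference(Expression):
    """Remembers a column name without fetching anything."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, data):
        return data[self.name]


class BinaryOp(Expression):
    """One node of the expression tree: an operator and its two operands."""

    def __init__(self, op, left, right):
        self.op = op
        self.left = left
        self.right = right

    def evaluate(self, data):
        left = evaluate(self.left, data)
        right = evaluate(self.right, data)
        if not isinstance(right, list):
            # Broadcast a scalar to the length of the left-hand column
            right = [right] * len(left)
        return [self.op(a, b) for a, b in zip(left, right)]


class Symbolic:
    """Proxy whose attribute and item accesses create Reference nodes."""

    def __getattr__(self, name):
        return Reference(name)

    def __getitem__(self, name):
        return Reference(name)


f = Symbolic()
expr = (f.x + 1) > 2                          # builds a tree, computes nothing
result = expr.evaluate({"x": [0, 1, 2, 3]})   # evaluated once data is available
```

The expression (f.x + 1) > 2 is only a tree of nodes until evaluate is called with data, at which point it yields one boolean per row.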

Documentation

https://pwwang.github.io/pipda/

See datar for real-world usage.
