pipda

A framework for data piping in Python.

Inspired by siuba, dfply, plydata, and dplython. Provides simple yet powerful APIs to mimic dplyr and tidyr in Python.

API | Changelog | Documentation

Installation

pip install -U pipda

Usage

Verbs

  • A verb is pipeable (able to be called like data >> verb(...))
  • A verb is dispatchable by the type of its first argument
  • A verb evaluates other arguments using the first one
  • A verb passes its context down to its arguments unless an argument specifies its own

import pandas as pd
from pipda import (
    register_verb,
    register_func,
    register_operator,
    evaluate_expr,
    Operator,
    Symbolic,
    Context
)

f = Symbolic()

df = pd.DataFrame({
    'x': [0, 1, 2, 3],
    'y': ['zero', 'one', 'two', 'three']
})

df

#      x    y
# 0    0    zero
# 1    1    one
# 2    2    two
# 3    3    three

@register_verb(pd.DataFrame)
def head(data, n=5):
    return data.head(n)

df >> head(2)
#      x    y
# 0    0    zero
# 1    1    one

@register_verb(pd.DataFrame, context=Context.EVAL)
def mutate(data, **kwargs):
    data = data.copy()
    for key, val in kwargs.items():
        data[key] = val
    return data

df >> mutate(z=1)
#    x      y  z
# 0  0   zero  1
# 1  1    one  1
# 2  2    two  1
# 3  3  three  1

df >> mutate(z=f.x)
#    x      y  z
# 0  0   zero  0
# 1  1    one  1
# 2  2    two  2
# 3  3  three  3
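
Dispatching on the type of the first argument, as listed among the verb properties above, is the same idea as the standard library's functools.singledispatch. The following is a minimal sketch of that idea only, using plain lists and dicts instead of DataFrames; it does not use pipda, and the helper names `_head_list` and `_head_dict` are made up for illustration.

```python
from functools import singledispatch


@singledispatch
def head(data, n=5):
    # Fallback for types that have no registered implementation
    raise NotImplementedError(f"head() is not registered for {type(data).__name__}")


@head.register(list)
def _head_list(data, n=5):
    # Lists: keep the first n items
    return data[:n]


@head.register(dict)
def _head_dict(data, n=5):
    # Dicts of columns: keep the first n values of every column
    return {key: values[:n] for key, values in data.items()}


print(head([3, 1, 4, 1, 5], 2))
# [3, 1]
print(head({"x": [0, 1, 2], "y": ["zero", "one", "two"]}, 2))
# {'x': [0, 1], 'y': ['zero', 'one']}
```

The implementation is chosen from the type of `data` alone; the remaining arguments play no part in the dispatch.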

Functions used as verb arguments

# A verb can also be used as an argument passed to another verb.
# dependent=True makes the `data` argument invisible when calling.
@register_verb(pd.DataFrame, context=Context.EVAL, dependent=True)
def if_else(data, cond, true, false):
    # Build a new Series instead of overwriting the boolean `cond` in place
    return cond.map({True: true, False: false})

# The function is then also a singledispatch generic function

df >> mutate(z=if_else(f.x>1, 20, 10))
#    x      y   z
# 0  0   zero  10
# 1  1    one  10
# 2  2    two  20
# 3  3  three  20

# function without data argument
@register_func
def length(strings):
    return [len(s) for s in strings]

df >> mutate(z=length(f.y))

#    x      y  z
# 0  0   zero  4
# 1  1    one  3
# 2  2    two  3
# 3  3  three  5

Context

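The context defines how a reference (f.A, f['A'], f.A.B) is evaluated. Under Context.EVAL, as in mutate above, f.x evaluates to the values of column x; under Context.SELECT, as in select below, it evaluates to the column name.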

@register_verb(pd.DataFrame, context=Context.SELECT)
def select(df, *columns):
    return df[list(columns)]

df >> select(f.x, f.y)
#    x      y
# 0  0   zero
# 1  1    one
# 2  2    two
# 3  3  three
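
To make the role of the context concrete, here is a small stand-alone sketch in which the same reference yields either column values or a column name, depending on the context it is evaluated under. `SketchContext` and `Reference` are hypothetical stand-ins written for this illustration, not pipda's classes, and the data is a plain dict of lists.

```python
from enum import Enum


class SketchContext(Enum):
    """Hypothetical stand-in for an evaluation context (not pipda's Context)."""
    EVAL = "eval"
    SELECT = "select"


class Reference:
    """A deferred column reference, comparable in spirit to f.x."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, data, context):
        if context is SketchContext.SELECT:
            # Selecting: only the column name is needed
            return self.name
        # Evaluating: fetch the column values from the data
        return data[self.name]


data = {"x": [0, 1, 2, 3], "y": ["zero", "one", "two", "three"]}
ref = Reference("x")

print(ref.evaluate(data, SketchContext.EVAL))
# [0, 1, 2, 3]
print(ref.evaluate(data, SketchContext.SELECT))
# x
```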

How it works

data %>% verb(arg1, ..., key1=kwarg1, ...)

The above is a typical dplyr/tidyr data piping syntax.

The Python counterpart is:

data >> verb(arg1, ..., key1=kwarg1, ...)

To implement this, execution of the verb must be deferred: calling verb(...) produces a VerbCall object that holds the function and its arguments, and the VerbCall is not evaluated until data is piped in via >>. To tell a piped call from an ordinary one, pipda uses the executing package, which inspects the AST to determine whether a function call appears on the right-hand side of a pipe operator.
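
The deferral described above can be sketched with Python's reflected right-shift hook, `__rrshift__`. This is a simplified illustration under stated assumptions, not pipda's implementation: the `VerbCall` class and the `register_verb_sketch` decorator here are hypothetical, every call is deferred unconditionally (there is no AST inspection), and the data is a plain list.

```python
from functools import wraps


class VerbCall:
    """Holds a function and its arguments until data is piped in."""

    def __init__(self, func, args, kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __rrshift__(self, data):
        # Python falls back to this for `data >> verb_call`
        # when type(data) does not implement `>>` itself.
        return self.func(data, *self.args, **self.kwargs)


def register_verb_sketch(func):
    """Hypothetical decorator: calling the verb only records the call."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        return VerbCall(func, args, kwargs)
    return wrapper


@register_verb_sketch
def head(data, n=5):
    return data[:n]


@register_verb_sketch
def double(data):
    return [item * 2 for item in data]


call = head(2)               # nothing has run yet; this is a VerbCall
print([1, 2, 3, 4] >> call)
# [1, 2]
print([1, 2, 3, 4] >> head(3) >> double())
# [2, 4, 6]
```

Because this sketch always defers, `head([1, 2, 3, 4], 2)` would not work as a direct call; distinguishing the two cases is where the AST inspection mentioned above comes in.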

Arguments that reference columns of the data must also be deferred. For example, in dplyr (R):

data %>% mutate(z = a)

This adds a column z with values from column a. In Python, the equivalent is:

data >> mutate(z=f.a)

Here f.a is a Reference object that captures the column name without immediately fetching the data.

The Symbolic object f acts as a proxy, chaining attribute/item accesses and operator expressions into a single Expression tree. That tree is later evaluated when data and context become available.
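
As a rough illustration of how a proxy can build such a tree, the sketch below records attribute access, item access, and a few operators, then evaluates the result once data is supplied. All class and function names are hypothetical and simplified (no contexts, no nested references such as f.A.B, only three operators); pipda's own classes are more complete.

```python
import operator


class Expression:
    """Base class: operators build larger expressions instead of computing."""

    def __add__(self, other):
        return BinaryOp(operator.add, self, other)

    def __mul__(self, other):
        return BinaryOp(operator.mul, self, other)

    def __gt__(self, other):
        return BinaryOp(operator.gt, self, other)


class Reference(Expression):
    """Captures a column name without fetching any data."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, data):
        return data[self.name]


class BinaryOp(Expression):
    """A deferred binary operation applied element-wise at evaluation time."""

    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def evaluate(self, data):
        n_rows = len(next(iter(data.values())))
        left = _as_column(evaluate(self.left, data), n_rows)
        right = _as_column(evaluate(self.right, data), n_rows)
        return [self.op(a, b) for a, b in zip(left, right)]


class Symbolic:
    """Proxy: f.x and f['x'] both produce a Reference."""

    def __getattr__(self, name):
        return Reference(name)

    def __getitem__(self, name):
        return Reference(name)


def _as_column(value, n_rows):
    # Broadcast scalars so they can be combined with columns
    return value if isinstance(value, list) else [value] * n_rows


def evaluate(expr, data):
    """Evaluate an Expression against data; plain values pass through."""
    return expr.evaluate(data) if isinstance(expr, Expression) else expr


f = Symbolic()
expr = (f.x + 1) * f["x"] > 2      # builds a tree; nothing is computed yet
data = {"x": [0, 1, 2, 3]}
print(evaluate(expr, data))
# [False, False, True, True]
```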

Documentation

https://pwwang.github.io/pipda/

See datar for real-world usage.
