A Python module for maintaining pipeline syntax of Pandas statements.

Project description

Piper

Piper is a Python package that simplifies data wrangling tasks with pandas. It provides a set of wrapper functions, or 'verbs', that offer a simpler interface to standard pandas functions.

Piper functions accept and return pandas dataframe objects. They can be used as standalone functions, but are more powerful when chained together in a Jupyter notebook cell to form a data pipeline.

Instead of the traditional 'dot notation' (calling methods on the object), piper passes the dataframe from one function to the next using the link operator '>>' inside a cell that starts with the %%piper magic command. For example, in traditional pandas, to see the first 5 rows of a dataframe:

df.head()

The equivalent in piper would be:

%%piper
df >> head()
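
To illustrate the idea of chaining with '>>', here is a minimal pure-Python sketch. It is not piper's implementation (how the %%piper magic works internally is not described here); the Verb class and the head and where helpers below are hypothetical names that operate on plain lists rather than dataframes.

```python
class Verb:
    """Wrap a function so that `data >> verb(args)` calls `func(data, args)`."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __rrshift__(self, data):
        # Python falls back to this method when the left operand
        # (here a list) does not implement `>>` itself.
        return self.func(data, *self.args, **self.kwargs)


def head(n=5):
    """Hypothetical verb: keep the first n items of a sequence."""
    return Verb(lambda data, n: data[:n], n)


def where(predicate):
    """Hypothetical verb: keep the items for which predicate is true."""
    return Verb(lambda data, p: [x for x in data if p(x)], predicate)


rows = [3, 8, 1, 9, 4, 7]
result = rows >> where(lambda x: x > 3) >> head(2)
print(result)  # [8, 9]
```

Each step receives the output of the previous one, so the pipeline reads left to right (or top to bottom) in the order the operations are applied.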

Installation

To install the package, enter the following:

pip install dpiper

Documentation

Piper API documentation is available on Read the Docs.

Quick start

Example #1

A dataframe consisting of two columns A and B.

import pandas as pd
import numpy as np

np.random.seed(42)
df = pd.DataFrame({'A': np.random.randint(10, 1000, 10),
                   'B': np.random.randint(10, 1000, 10)})
df.head()

     A    B
0  112  476
1  445  224
2  870  340
3  280  468
4  116   97

Piper equivalent

# Run the import once, in its own cell
from piper.defaults import *

%%piper
df >> assign(C = lambda x: x.A + x.B,
             D = lambda x: x.C < 1000)
   >> where("~D")

     A    B     C      D
2  870  340  1210  False
8  624  673  1297  False
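
For comparison, the same result can be obtained with plain pandas method chaining. This is a sketch of one possible equivalent, not code taken from piper; it uses .loc with a callable in place of the where verb.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame({'A': np.random.randint(10, 1000, 10),
                   'B': np.random.randint(10, 1000, 10)})

result = (df.assign(C=lambda x: x.A + x.B,     # add the sum column
                    D=lambda x: x.C < 1000)    # flag rows with a sum below 1000
            .loc[lambda x: ~x.D])              # keep rows where D is False
print(result)
```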

Example #2

Suppose you need the following function to trim columnar text data.

def trim_columns(df):
    ''' Trim blanks for given dataframe '''

    str_cols = df.select_dtypes(include='object').columns

    for col in str_cols:
        df[col] = df[col].str.strip()

    return df

import pandas as pd
from piper.factory import sample_data

df = sample_data()

# Select all columns EXCEPT 'dates'
subset_cols = ['order_dates', 'regions', 'countries', 'values_1', 'values_2']

criteria1 = ~df['countries'].isin(['Italy', 'Portugal'])
criteria2 = df['values_1'] > 40
criteria3 = df['values_2'] < 25

df2 = (df[subset_cols][criteria1 & criteria2 & criteria3]
       .pipe(trim_columns)
       .sort_values('countries', ascending=False))
df2.head()
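
To see what trim_columns does in isolation, here is a small sketch on made-up data. The dataframe contents are hypothetical, and dtype=object is set explicitly as an assumption so that select_dtypes(include='object') picks up the text column regardless of the pandas version's default string dtype.

```python
import pandas as pd


def trim_columns(df):
    """Strip leading and trailing blanks from every object (string) column."""
    str_cols = df.select_dtypes(include='object').columns
    for col in str_cols:
        df[col] = df[col].str.strip()
    return df


# Hypothetical data for illustration only
raw = pd.DataFrame({
    'countries': pd.Series(['  Sweden', 'Norway  ', ' Spain '], dtype=object),
    'values_1': [194, 344, 183],
})

# trim_columns modifies the dataframe it is given, so work on a copy
clean = raw.copy().pipe(trim_columns)
print(clean['countries'].tolist())  # ['Sweden', 'Norway', 'Spain']
```

Because the function assigns back into its argument, passing a copy keeps the original dataframe unchanged.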

Piper equivalent

Using the %%piper magic command, piper verbs can be combined with standard Python functions such as trim_columns.

# Run the import once, in its own cell
from piper.defaults import *

%%piper
sample_data()
>> trim_columns()
>> select('-dates')
>> where(""" ~countries.isin(['Italy', 'Portugal']) &
              values_1 > 40 &
              values_2 < 25 """)
>> order_by('-countries')
>> head(5)

Result:

dates       order_dates  countries  ids  values_1  values_2
2020-03-03  2020-03-09   Sweden     E         194        20
2020-05-02  2020-05-08   Sweden     D         322        14
2020-01-20  2020-01-26   Spain      A         183        20
2020-02-01  2020-02-07   Norway     D         344        21
2020-05-06  2020-05-12   Norway     B         135        21
