peperoncino: A library for easy data processing for pandas


$ pip install peperoncino

How to use

Processing DataFrame

import peperoncino as pp

pipeline = pp.Pipeline(
    # query data
    pp.Query("bar <= 3"),
    # assign new feature
    pp.Assign(hoge="foo * bar"),
    # generate combination feature
    pp.Combinations(["foo", "baz"], ["*", "/"]),
    # target encoding
    pp.TargetEncoding(["baz"], "y", ref=0),
    # select features
    pp.Select(
        ["hoge", "*_foo_baz", "TARGET_ENC_baz_BY_y", "y"],
    ),
)

# execute the processing
train_df, val_df, test_df = \
    pipeline.process([train_df, val_df, test_df])
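
For orientation, the first two steps have close counterparts in plain pandas. The sketch below assumes that pp.Query and pp.Assign accept the same expression strings as DataFrame.query and DataFrame.eval; this page does not say how they are implemented, and the sample data is made up.

```python
import pandas as pd

# Made-up sample data using the column names from the pipeline above.
df = pd.DataFrame({"foo": [1, 2, 3, 4], "bar": [1, 2, 3, 4], "baz": [2, 2, 4, 4]})

# Row filter, comparable to the "bar <= 3" condition.
filtered = df.query("bar <= 3")

# Formula feature, comparable to hoge="foo * bar".
result = filtered.assign(hoge=filtered.eval("foo * bar"))

print(result)
```

The benefit of the pipeline form is that the same steps are applied to the train, validation, and test frames in one call.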

Predefined processings

Name            Description
ApplyColumn     Apply a function to a column.
AsCategory      Assign category dtype to columns.
Assign          Assign a feature by a formula.
Combinations    Create combination features.
DropColumns     Drop columns.
DropDuplicates  Drop duplicate rows.
Pipeline        Chain processings.
Query           Query rows by a given condition.
RenameColumns   Rename columns.
Select          Select columns.
StatsEncoding   Encode columns by statistical values of another column.
TargetEncoding  Target encoding with smoothing.
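
The table mentions target encoding with smoothing without giving a formula. One common form, shown here as a sketch and not necessarily what pp.TargetEncoding computes, shrinks each category mean toward the global mean: (n * category_mean + m * global_mean) / (n + m), where n is the category count and m is a smoothing weight. The helper name `smoothed_target_encode` and the parameter `m` are hypothetical. Passing a separate reference frame mirrors what `ref=0` in the example above may mean (statistics taken from the first data frame), which is a guess.

```python
import pandas as pd


def smoothed_target_encode(ref_df, df, col, target, m=1.0):
    """Encode df[col] with smoothed target means computed on ref_df.

    Hypothetical helper using additive smoothing; not peperoncino's code.
    """
    global_mean = ref_df[target].mean()
    stats = ref_df.groupby(col)[target].agg(["mean", "count"])
    smoothed = (stats["count"] * stats["mean"] + m * global_mean) / (stats["count"] + m)
    # Categories unseen in ref_df fall back to the global mean.
    return df[col].map(smoothed).fillna(global_mean)


train = pd.DataFrame({"baz": ["a", "a", "b"], "y": [1.0, 0.0, 1.0]})
test = pd.DataFrame({"baz": ["a", "b", "c"]})

# Statistics come from the training frame only, so test targets are never read.
encoded = smoothed_target_encode(train, test, "baz", "y", m=1.0)
print(encoded.tolist())
```

Computing the statistics on the reference frame alone keeps target values from the other frames out of the encoded feature.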

Define processing

All processings are subclasses of pp.BaseProcessing.
All you need to do is define the _process(self, dfs: List[pd.DataFrame]) -> List[pd.DataFrame] method.

from typing import List
import pandas as pd

class ExampleProcessing(pp.BaseProcessing):
    def _process(self, dfs: List[pd.DataFrame]) -> List[pd.DataFrame]:
        return [df + 1 for df in dfs]

If your processing handles each data frame independently of the others, use pp.SeparatedProcessing.

class ExampleProcessing(pp.SeparatedProcessing):
    def sep_process(self, df: pd.DataFrame) -> pd.DataFrame:
        return df * 2
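
Conceptually, a separated processing lifts a per-frame function to the list-of-frames interface. The stand-in below does not use peperoncino: `SeparatedSketch` and `Double` are hypothetical names, and how pp.SeparatedProcessing actually dispatches to sep_process is an assumption.

```python
from typing import List

import pandas as pd


class SeparatedSketch:
    """Hypothetical stand-in, not peperoncino's class."""

    def process(self, dfs: List[pd.DataFrame]) -> List[pd.DataFrame]:
        # Each frame is handled on its own; no information crosses frames.
        return [self.sep_process(df) for df in dfs]

    def sep_process(self, df: pd.DataFrame) -> pd.DataFrame:
        raise NotImplementedError


class Double(SeparatedSketch):
    def sep_process(self, df: pd.DataFrame) -> pd.DataFrame:
        return df * 2


train_df = pd.DataFrame({"foo": [1, 2]})
test_df = pd.DataFrame({"foo": [10]})
out_train, out_test = Double().process([train_df, test_df])
print(out_train["foo"].tolist(), out_test["foo"].tolist())
```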

If you need to merge all dataframes and then apply your processing, use pp.MergedProcessing.

class ExampleProcessing(pp.MergedProcessing):
    def simul_process(self, df: pd.DataFrame) -> pd.DataFrame:
        return df.assign(col1_mean=df['col1'].mean())
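
The merge-then-apply pattern can be sketched in plain pandas as concatenate, process once, split back. This is an illustration under assumptions, not peperoncino's implementation: `merged_apply` is a hypothetical helper, and the sample frames are made up.

```python
from typing import Callable, List

import pandas as pd


def merged_apply(
    dfs: List[pd.DataFrame],
    func: Callable[[pd.DataFrame], pd.DataFrame],
) -> List[pd.DataFrame]:
    """Hypothetical helper: concatenate, apply func once, split back."""
    # keys= adds an outer index level recording which frame each row came from.
    merged = pd.concat(dfs, keys=list(range(len(dfs))))
    processed = func(merged)
    # .xs(i) selects the rows of frame i and drops the outer level again.
    return [processed.xs(i) for i in range(len(dfs))]


train_df = pd.DataFrame({"col1": [1.0, 3.0]})
test_df = pd.DataFrame({"col1": [5.0]})

out_train, out_test = merged_apply(
    [train_df, test_df],
    lambda df: df.assign(col1_mean=df["col1"].mean()),
)
print(out_train["col1_mean"].tolist(), out_test["col1_mean"].tolist())
```

Because the mean is taken over the concatenated rows, every output frame carries the same col1_mean value, which is the difference from processing each frame separately.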

Files for peperoncino, version 0.0.5
Filename                            Size     File type  Python version
peperoncino-0.0.5-py3-none-any.whl  15.4 kB  Wheel      py3
peperoncino-0.0.5.tar.gz            10.1 kB  Source     None