A package designed to simplify data preprocessing for use with Pandas

pandashape: a simpleish Python package for easy data cleanup and preparation of Pandas dataframes

I made pandashape because I kept repeating the same cleanup steps when doing simple modeling with scikit-learn. I've intentionally designed it to make data preparation expressive, concise, and easily repeatable - just put your use of

Getting started

Just install with pip!

pip install pandashape
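
If you want to double-check that the install worked, here's a quick sketch using the standard library's importlib.metadata. The helper name installed_version is made up for this example and is not part of pandashape:

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if pandashape is installed, otherwise None.
print(installed_version("pandashape"))
```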

Using pandashape

Create your dataframe however you choose - from a CSV, a .txt file, random generation, whatever. Then make a PandaShaper and use the expressive syntax to define a pipeline for cleanup:

# import packages
import numpy as np
import pandas as pd
from pandashape import PandaShaper, Columns
# ModeImputer is used below; it is assumed to live alongside the other transformers
from pandashape.transformers import MassLabelEncoder, ModeImputer, NullColumnsDropper

# create your frame
my_df = pd.read_csv('./my_data.csv')

# wrap it in a shaper
shaper = PandaShaper(my_df)

# create a pipeline of transform operations (these will happen in order)
# and assign the output to a new (transformed) frame!
transformed_df = shaper.transform(
    {
        # for every column: drop those that are at least 80% null,
        # then fill the remaining gaps with each column's mode
        'columns': Columns.All,
        'transformers': [
            NullColumnsDropper(null_values=[np.nan, None, ''], threshold=0.8),
            ModeImputer()
        ]
    },
    {
        # MassLabelEncoder one-hot-encodes targeted categorical columns if they
        # have a number of distinct values ≥ the breakpoint, or label-encodes them otherwise
        'columns': ['Education', 'SES'],
        'transformers': MassLabelEncoder(label_encoding_breakpoint=4)
    }
)

# inspect the new frame to see the fruits of your labors!
transformed_df.head()
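
To make the three steps above a bit more concrete, here's a rough plain-pandas sketch of what they describe. This is not pandashape's implementation, just my reading of the comments in the example. The function names, the tiny sample frame, and the "at least 80% null" interpretation of the threshold are all assumptions made for illustration:

```python
import pandas as pd


def null_mask(df, extra_null_values=("",)):
    """True wherever a cell is NaN/None or one of the extra null markers."""
    return df.isna() | df.isin(list(extra_null_values))


def drop_mostly_null_columns(df, threshold=0.8):
    """Keep only columns whose share of null-like cells is below threshold."""
    null_share = null_mask(df).mean()
    return df.loc[:, null_share < threshold]


def impute_with_mode(df):
    """Replace null-like cells with the most frequent value of each column."""
    result = df.mask(null_mask(df))  # turn every null marker into NaN
    for column in result.columns:
        modes = result[column].mode(dropna=True)
        if not modes.empty:
            result[column] = result[column].fillna(modes.iloc[0])
    return result


def encode_by_breakpoint(df, columns, label_encoding_breakpoint=4):
    """One-hot encode columns with >= breakpoint distinct values, else label encode."""
    result = df.copy()
    for column in columns:
        if result[column].nunique() >= label_encoding_breakpoint:
            result = pd.get_dummies(result, columns=[column], prefix=column)
        else:
            result[column] = result[column].astype("category").cat.codes
    return result


# A made-up frame: "Notes" is 5/6 null-like, so it should be dropped.
frame = pd.DataFrame({
    "Education": ["HS", "BA", "MS", "PhD", "BA", None],
    "SES": ["low", "high", "", "low", "low", "high"],
    "Notes": [None, "", None, None, "x", None],
})

cleaned = impute_with_mode(drop_mostly_null_columns(frame, threshold=0.8))
encoded = encode_by_breakpoint(cleaned, ["Education", "SES"], label_encoding_breakpoint=4)
print(encoded.head())
```

With this sample, Education ends up with four distinct values and gets one-hot encoded, while SES has only two and gets integer codes. The point of pandashape is that you declare these steps once in a pipeline instead of hand-writing helpers like these for every project.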

Download files

Source Distribution: pandashape-0.0.1.tar.gz (3.5 kB)

Built Distribution: pandashape-0.0.1-py3-none-any.whl (6.6 kB, Python 3)
