A package designed to simplify data preprocessing for use with Pandas
pandashape: a simpleish Python package for easy data cleanup and preparation of Pandas dataframes
I made pandashape because I've been finding I do a lot of the same repetitive cleanup for simple modeling with scikit-learn.
I've intentionally designed it to make data preparation expressive, concise, and easily repeatable - just put your use of
Getting started
Just install with pip!
pip install pandashape
Using pandashape
Create your dataframe however you choose: from a CSV, a .txt file, random generation, whatever. Then make a PandaShaper and use the expressive syntax to define a pipeline for cleanup:
# import packages
import numpy as np
import pandas as pd
from pandashape import PandaShaper, Columns
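# ModeImputer is used in the pipeline below; importing it from
# pandashape.transformers alongside the other transformers is an assumption
from pandashape.transformers import MassLabelEncoder, ModeImputer, NullColumnsDropper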
# create your frame
my_df = pd.read_csv('./my_data.csv')
# wrap it in a shaper
shaper = PandaShaper(my_df)
# create a pipeline of transform operations (these will happen in order)
# and assign the output to a new (transformed) frame!
transformed_df = shaper.transform(
    {
        # drop columns that have 80% or less null data
        'columns': Columns.All,
        'transformers': [
            NullColumnsDropper(null_values=[np.nan, None, ''], threshold=0.8),
            ModeImputer()
        ]
    },
    {
        # MassLabelEncoder one-hot-encodes targeted categorical columns if they
        # have a number of values ≥ the breakpoint or label encodes them normally
        'columns': ['Education', 'SES'],
        'transformers': MassLabelEncoder(label_encoding_breakpoint=4)
    }
)
# inspect the new frame to see the fruits of your labors!
transformed_df.head()
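
For comparison, here is a rough sketch of the same kind of cleanup written by hand in plain pandas. It is not pandashape's implementation. The helper names (`drop_mostly_null_columns`, `impute_with_mode`, `encode_categoricals`), the sample data, and the reading of `threshold` as "drop a column when its share of null-like values is at least the threshold" are all assumptions made for illustration.

```python
import pandas as pd


def drop_mostly_null_columns(df, extra_null_markers=("",), threshold=0.8):
    """Drop columns whose share of null-like values is >= threshold (assumed semantics)."""
    # NaN and None are caught by isna(); extra markers such as '' are matched by isin().
    is_null = df.isna() | df.isin(list(extra_null_markers))
    null_share = is_null.mean()  # per-column fraction of null-like cells
    return df.loc[:, null_share < threshold]


def impute_with_mode(df):
    """Fill missing values in each column with that column's most frequent value."""
    out = df.copy()
    for col in out.columns:
        modes = out[col].mode(dropna=True)
        if not modes.empty:
            # mode() returns values sorted, so ties resolve to the first one.
            out[col] = out[col].fillna(modes.iloc[0])
    return out


def encode_categoricals(df, columns, label_encoding_breakpoint=4):
    """One-hot encode columns with >= breakpoint distinct values, label encode the rest."""
    out = df.copy()
    for col in columns:
        if out[col].nunique() >= label_encoding_breakpoint:
            out = pd.get_dummies(out, columns=[col], prefix=col)
        else:
            out[col] = out[col].astype("category").cat.codes
    return out


# Hypothetical sample data, standing in for my_data.csv.
raw = pd.DataFrame({
    "Education": ["HS", "BA", "MS", "PhD", None, "BA"],
    "SES": ["low", "mid", "high", "mid", "low", None],
    "Notes": ["", "", None, "", "", "x"],
})

# Apply the steps in the same order as the pipeline above.
result = drop_mostly_null_columns(raw, threshold=0.8)
result = impute_with_mode(result)
result = encode_categoricals(result, ["Education", "SES"], label_encoding_breakpoint=4)

print(result.columns.tolist())
```

Writing each step as a separate function keeps the order of operations explicit, which is the same idea the ordered list of transformers expresses in the pipeline above.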