A package designed to simplify data preprocessing for use with Pandas
pandashape: a simpleish Python package for easy data cleanup and preparation of Pandas dataframes
I made pandashape because I found myself doing a lot of the same repetitive cleanup for simple modeling with scikit-learn.
I've intentionally designed it to make data preparation expressive, concise, and easily repeatable - just put your use of
Getting started
Just install with pip!
pip install pandashape
Using pandashape
Create your dataframe however you choose - from a CSV or .txt file, random generation, whatever. Then make a PandaShaper and use the expressive syntax to define a cleanup pipeline:
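If you don't have a data file handy, any plain pandas/NumPy construction works. The sketch below shows two options: parsing CSV text and generating random data. The column names, values, and the variable names `csv_df` and `random_df` are made up for illustration and are not part of pandashape.

```python
import io

import numpy as np
import pandas as pd

# Option 1: parse CSV text. pd.read_csv accepts any file-like object,
# so a real path such as './my_data.csv' is handled the same way.
csv_text = """Education,SES,Income,Notes
HS,low,32000,
BA,mid,,
MA,high,71000,
BA,mid,54000,
"""
csv_df = pd.read_csv(io.StringIO(csv_text))

# Option 2: generate random data with NumPy's Generator API.
rng = np.random.default_rng(seed=0)
random_df = pd.DataFrame({
    "Education": rng.choice(["HS", "BA", "MA", "PhD"], size=100),
    "SES": rng.choice(["low", "mid", "high"], size=100),
    "Income": rng.normal(50_000, 15_000, size=100).round(),
})
```

Either frame can then be wrapped with `PandaShaper(...)` in place of `my_df`.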
# import packages
import numpy as np
import pandas as pd
from pandashape import PandaShaper, Columns
from pandashape.transformers import CategoricalEncoder, ModeImputer, NullColumnsDropper
# create your frame
my_df = pd.read_csv('./my_data.csv')
# wrap it in a shaper
shaper = PandaShaper(my_df)
# create a pipeline of transform operations (these will happen in order)
# and assign the output to a new (transformed) frame!
transformed_df = shaper.transform(
{
# drop columns that are at least 80% null, then fill remaining nulls with each column's mode
'columns': Columns.All,
'transformers': [
NullColumnsDropper(null_values=[np.nan, None, ''], threshold=0.8),
ModeImputer()
]
},
{
# CategoricalEncoder one-hot-encodes each targeted categorical column whose
# number of distinct values is ≥ the breakpoint; otherwise it label-encodes it
'columns': ['Education', 'SES'],
'transformers': CategoricalEncoder(label_encoding_breakpoint=4)
}
)
# inspect the new frame to see the fruits of your labors!
transformed_df.head()
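pandashape's internals aren't shown on this page, so the following is only a plain-pandas sketch of what the three steps in the pipeline above plausibly do. It assumes that `threshold=0.8` means "drop a column when at least 80% of its values are null-like", that mode imputation fills each column's missing values with its most frequent value, and that the categorical rule works exactly as the comment in the example states it. The helper names `drop_null_columns`, `impute_mode`, and `encode_categoricals` are hypothetical and are not pandashape's API.

```python
import numpy as np
import pandas as pd


def drop_null_columns(df, null_values=(np.nan, None, ""), threshold=0.8):
    """Hypothetical helper: drop columns whose null-like fraction is >= threshold (assumed semantics)."""
    # NaN and None are already caught by isna(); mask any other sentinels (e.g. "") to NaN.
    sentinels = [v for v in null_values if not pd.isna(v)]
    normalized = df.mask(df.isin(sentinels))
    null_fraction = normalized.isna().mean()
    return normalized.loc[:, null_fraction < threshold]


def impute_mode(df):
    """Hypothetical helper: fill each column's missing values with its most frequent value."""
    filled = df.copy()
    for col in filled.columns:
        modes = filled[col].mode(dropna=True)  # ties come back sorted; take the first
        if not modes.empty:
            filled[col] = filled[col].fillna(modes.iloc[0])
    return filled


def encode_categoricals(df, columns, label_encoding_breakpoint=4):
    """Hypothetical helper: one-hot if distinct values >= breakpoint, else label-encode (rule as stated above)."""
    encoded = df.copy()
    for col in columns:
        if encoded[col].nunique() >= label_encoding_breakpoint:
            dummies = pd.get_dummies(encoded[col], prefix=col)
            encoded = pd.concat([encoded.drop(columns=col), dummies], axis=1)
        else:
            # Map each category (sorted) to an integer code.
            encoded[col] = encoded[col].astype("category").cat.codes
    return encoded


raw = pd.DataFrame({
    "Education": ["HS", "BA", "MA", "PhD", "BA", None],
    "SES": ["low", "mid", "high", "mid", "", "low"],
    "Notes": ["", None, np.nan, "", None, "x"],  # 5 of 6 values are null-like
})

step1 = drop_null_columns(raw, null_values=[np.nan, None, ""], threshold=0.8)
step2 = impute_mode(step1)
result = encode_categoricals(step2, ["Education", "SES"], label_encoding_breakpoint=4)
print(result)
```

Here `Notes` is dropped (5/6 null-like), the missing `Education` and `SES` entries are filled with their modes, `Education` (4 distinct values) is one-hot encoded, and `SES` (3 distinct values) is label-encoded. Running steps in a fixed order like this is what makes the pipeline easy to repeat on new data.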
Download files
Source Distribution
pandashape-0.0.3.tar.gz (6.3 kB)
Built Distribution
pandashape-0.0.3-py3-none-any.whl
Hashes for pandashape-0.0.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 34c5e2e7fe8bd2a71a0dea3f914a6494945df2f6109aef14628aa947b8c180a3
MD5 | ae1c4c9094663bf80c3ef76b742f5808
BLAKE2b-256 | 7a53de6e11e382177cf44642fe3a1c83b85fb0681a569d08f38744ab1ec06800