A package designed to simplify data preprocessing for use with Pandas
pandashape: a simpleish Python package for easy data cleanup and preparation of Pandas dataframes
I made pandashape because I kept repeating the same cleanup steps when preparing data for simple modeling with scikit-learn.
I've intentionally designed it to make data preparation expressive, concise, and easily repeatable - just put your use of
Getting started
Just install with pip!
pip install pandashape
Using pandashape
Create your dataframe however you choose: from a CSV, a .txt file, random generation, whatever. Then make a PandaShaper and use the expressive syntax to define a pipeline for cleanup:
# import packages
import numpy as np
import pandas as pd
from pandashape import PandaShaper, Columns
# NOTE: ModeImputer is used below but was missing from the original import;
# it is assumed here to live alongside the other transformers
from pandashape.transformers import MassLabelEncoder, ModeImputer, NullColumnsDropper

# create your frame
my_df = pd.read_csv('./my_data.csv')

# wrap it in a shaper
shaper = PandaShaper(my_df)

# create a pipeline of transform operations (these will happen in order)
# and assign the output to a new (transformed) frame!
transformed_df = shaper.transform(
    {
        # drop columns that are 80% or more null data, then fill the
        # remaining gaps with each column's mode
        'columns': Columns.All,
        'transformers': [
            NullColumnsDropper(null_values=[np.nan, None, ''], threshold=0.8),
            ModeImputer()
        ]
    },
    {
        # MassLabelEncoder one-hot-encodes targeted categorical columns if they
        # have a number of values ≥ the breakpoint or label encodes them normally
        'columns': ['Education', 'SES'],
        'transformers': MassLabelEncoder(label_encoding_breakpoint=4)
    }
)

# inspect the new frame to see the fruits of your labors!
transformed_df.head()
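For comparison, here is a rough sketch of what the pipeline above describes, written in plain pandas. This is not pandashape's implementation: the helper names (drop_mostly_null_columns, impute_mode, encode_categoricals) and the sample frame are made up for illustration, and the reading of threshold (drop a column once its null fraction reaches 0.8) is an assumption. The encoding rule follows the comment in the example: one-hot when a column has at least label_encoding_breakpoint distinct values, integer label codes otherwise.

```python
import numpy as np
import pandas as pd


def drop_mostly_null_columns(df, null_values=("",), threshold=0.8):
    """Drop columns whose fraction of null-like entries is >= threshold (assumed semantics)."""
    # NaN/None are caught by isna(); extra sentinels such as '' are matched with isin().
    is_null = df.isna() | df.isin(list(null_values))
    null_fraction = is_null.mean()  # per-column share of null-like cells
    return df.loc[:, null_fraction < threshold]


def impute_mode(df):
    """Fill missing values in each column with that column's most frequent value."""
    out = df.copy()
    for col in out.columns:
        modes = out[col].mode(dropna=True)
        if not modes.empty:
            # On ties, mode() returns the candidates sorted; take the first.
            out[col] = out[col].fillna(modes.iloc[0])
    return out


def encode_categoricals(df, columns, label_encoding_breakpoint=4):
    """One-hot encode columns with >= breakpoint distinct values, label encode the rest."""
    out = df.copy()
    for col in columns:
        if out[col].nunique() >= label_encoding_breakpoint:
            dummies = pd.get_dummies(out[col], prefix=col)
            out = pd.concat([out.drop(columns=col), dummies], axis=1)
        else:
            # Integer codes follow the sorted order of the distinct values.
            out[col] = out[col].astype("category").cat.codes
    return out


# Hypothetical sample data, only to exercise the helpers.
df = pd.DataFrame({
    "Education": ["HS", "BS", "MS", "PhD", None],
    "SES": ["low", "high", "low", "mid", "low"],
    "Notes": ["", None, "", np.nan, "x"],
})

cleaned = impute_mode(drop_mostly_null_columns(df))
encoded = encode_categoricals(cleaned, ["Education", "SES"])
print(encoded.head())
```

With this sample, Notes is dropped because four of its five entries are null-like, Education is one-hot encoded because it has four distinct values, and SES is label encoded because it has three.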