Clean and transform data for ML binary classification with ease
Mankey Stats
Mankey_stats is a Python library for quickly and efficiently preparing datasets for ML modeling. It does this through several transformation and statistical analysis methods.
Documentation
Primary functionality includes:
- Detailed analysis of features, including numerical distribution tests
- Analysis and handling of outliers and missing data
- Interactive plotting and data visualization functionality
- Transformation options including one-hot encoding, ordinal transformations, and weight of evidence
- Functionality to prepare date fields for ML models
- Ability to examine and recommend without modifying the underlying data
- Optimized logic to ensure fast execution times, using numpy, scipy, and vectorization techniques
Analysis of features:
- Feature Normality test
- Grubbs' test and Tukey's fences for handling outliers (based on the statistical distribution)
- Missing value analysis (percentage missing and the best fill method: mode, median, or mean)
- Best scaling methods are selected for each numeric feature (min-max scaler or standard scaler)
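The checks listed above can be approximated with plain numpy, scipy, and pandas. The following is a minimal sketch, not the library's implementation: the function names (is_normal, tukey_fences, missing_summary), the choice of the Shapiro-Wilk test, and the rule "mean if normal, median otherwise, mode for non-numeric" are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats


def is_normal(values, alpha=0.05):
    """Shapiro-Wilk test: True when normality is NOT rejected at level alpha."""
    _, p_value = stats.shapiro(values)
    return bool(p_value > alpha)


def tukey_fences(values, k=1.5):
    """Return the (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def missing_summary(df):
    """Percentage of missing values per column and a suggested fill statistic.

    Assumed rule: mode for non-numeric columns, mean for numeric columns that
    pass the normality test, median for the remaining numeric columns.
    """
    rows = []
    for col in df.columns:
        series = df[col]
        if not pd.api.types.is_numeric_dtype(series):
            method = "mode"
        elif is_normal(series.dropna()):
            method = "mean"
        else:
            method = "median"
        rows.append({
            "feature": col,
            "missing_pct": series.isna().mean() * 100,
            "suggested_fill": method,
        })
    return pd.DataFrame(rows)


# One extreme value among otherwise evenly spaced observations
sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)
lower, upper = tukey_fences(sample)
outliers = sample[(sample < lower) | (sample > upper)]
print(lower, upper, outliers)
```

Values outside the fences are only flagged here. Whether to drop, cap, or keep them is a separate decision that depends on the feature.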
Multiple methods to handle categorical features:
- One Hot encoder
- Ordinal encoder
- Weight of Evidence transformations
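As a rough illustration of the weight of evidence idea, the sketch below computes one WoE value per category with pandas. It is not the library's code: the function name weight_of_evidence, the eps smoothing parameter, and the sign convention (log of non-event share over event share, with 1 meaning event) are assumptions, and other references use the inverse ratio.

```python
import numpy as np
import pandas as pd


def weight_of_evidence(df, feature, target, eps=0.5):
    """WoE per category for a binary target (1 = event, 0 = non-event).

    eps is added to every cell count so that a category with no events
    (or no non-events) does not produce log(0).
    """
    grouped = df.groupby(feature)[target].agg(events="sum", total="count")
    events = grouped["events"]
    non_events = grouped["total"] - grouped["events"]
    n_categories = len(grouped)

    event_share = (events + eps) / (events.sum() + eps * n_categories)
    non_event_share = (non_events + eps) / (non_events.sum() + eps * n_categories)
    return np.log(non_event_share / event_share)


toy = pd.DataFrame({
    "grade": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "default": [1, 1, 1, 0, 1, 0, 0, 0],
})
woe = weight_of_evidence(toy, "grade", "default", eps=0.0)
# Replace each category with its WoE value
toy["grade_woe"] = toy["grade"].map(woe)
print(woe)
```

In this toy data, grade A holds three of the four events, so its WoE is negative under the assumed convention, and grade B gets the mirror-image positive value.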
Date manipulation
- Ability to expand date fields to YEAR, MONTH, and/or DAY fields
- Subtract date features to create a "due in days" field
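Both date operations map directly onto pandas datetime functionality. The sketch below shows the idea with plain pandas rather than the library's own transformer; the column names (invoice_date, due_date, due_in_days) are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "invoice_date": pd.to_datetime(["2021-01-15", "2021-02-01"]),
    "due_date": pd.to_datetime(["2021-02-14", "2021-02-15"]),
})

# Expand a date field into separate YEAR, MONTH, and DAY columns
df["invoice_year"] = df["invoice_date"].dt.year
df["invoice_month"] = df["invoice_date"].dt.month
df["invoice_day"] = df["invoice_date"].dt.day

# Subtract two date fields to get a numeric "due in days" feature
df["due_in_days"] = (df["due_date"] - df["invoice_date"]).dt.days

print(df[["invoice_year", "invoice_month", "invoice_day", "due_in_days"]])
```

The resulting integer columns can be fed to a model directly, whereas the raw datetime columns usually cannot.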
Installation
The library is published on PyPI and can be installed with pip:
pip install mankey_stats
Feel free to help us improve: simply clone the repository from GitHub and submit your features :)
git clone https://github.com/mankey_stats/mankey_stats.git
Dependencies:
We rely on proven ML libraries: pandas, seaborn, plotly, numpy, scipy, and scikit-learn
Example Usage
>>> import pandas as pd
>>> # Import path assumed from the transformers.Ordinal_Transformer call below
>>> from mankey_stats import transformers
>>> data = {'type': ['bad', 'average', 'good', 'very good', 'excellent'],
...         'level': [1, 2, 3, 4, 5]}
>>> # Category order for each ordinal feature, lowest to highest
>>> levels_dict = {'type': ['bad', 'average', 'good', 'very good', 'excellent']}
>>> df = pd.DataFrame(data)
>>> print(df)
Out[1]:
        type  level
0        bad      1
1    average      2
2       good      3
3  very good      4
4  excellent      5
>>> t_ord = transformers.Ordinal_Transformer()
>>> t_ord.fit(levels_dict, df, None)
>>> df = t_ord.transform(df, None)
Out[2]:
0 1
1 2
2 3
3 4
4 5
Name: var_A, dtype: int64
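For readers who want to see what an ordinal transformation amounts to without the library, here is a minimal pandas-only sketch. It does not use or reproduce Ordinal_Transformer; the 1-based ranks and the type_ordinal column name are assumptions chosen for the example.

```python
import pandas as pd

# Category order, lowest to highest
levels = ['bad', 'average', 'good', 'very good', 'excellent']
df = pd.DataFrame({'type': ['good', 'bad', 'excellent', 'average', 'very good']})

# An ordered categorical assigns each level its position in `levels`;
# .codes is 0-based, so add 1 to get ranks starting at 1
ordered = pd.Categorical(df['type'], categories=levels, ordered=True)
df['type_ordinal'] = ordered.codes + 1

print(df)
```

Values that are not in the levels list get code -1 from pandas (rank 0 after the shift), so unexpected categories are easy to spot.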
Find more in the documentation.
Documentation
mankey-stats documentation is built using Sphinx and is hosted on Read the Docs.
You can rebuild the docs locally with Sphinx's HTML builder (typically make html from the documentation source directory).
License
MIT
Hashes for mankey_stats-0.1.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ef6d87652525c472a398aae3e1f02c6318ce07c6f5ed59337774ec77f44c2118
MD5 | d7f420bbb33c76bbbf31823c8a30ee3e
BLAKE2b-256 | 2eca413d6c5b8479e913fad8ddb5e716dc2d743053bd5fa6eeef23b7ed47197b