Clean and transform data for ML binary classification with ease

Development Status
- 3 - Alpha
Intended Audience
- Education
License
- OSI Approved :: MIT License
Operating System
- MacOS :: MacOS X
- Microsoft :: Windows
Programming Language
- Python :: 3

Project description

Mankey Stats

mankey_stats is a Python library for quickly and efficiently preparing datasets for ML modeling. It does this through several transformation and statistical analysis methods.

Primary functionality includes:

Detailed analysis of features, including numerical distribution tests
Analysis and handling of outliers and missing data
Interactive plotting and data visualization functionality
Transformation options including one-hot encoding, ordinal transformations, and weight of evidence
Functionality to prepare date fields for ML models
Ability to examine and recommend without modifying the underlying data
Optimized logic to ensure fast execution times, using numpy, scipy, and vectorization techniques

Analysis of features:

Feature Normality test
Grubbs' test and Tukey's fences for handling outliers (chosen based on the statistical distribution)
Missing value analysis (percentage missing and the best method to handle it: mode, median, or mean)
Best scaling methods are selected for each numeric feature (min-max scaler or standard scaler)
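
To illustrate the outlier step above, here is a minimal sketch of Tukey's fences. It uses only the Python standard library and is not the library's own implementation; the function names `tukey_fences` and `flag_outliers` and the multiplier `k=1.5` are assumptions chosen for illustration.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" interpolates linearly between data points and
    # treats the sample minimum and maximum as the 0th and 100th percentiles.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return the values that fall outside the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample with one obviously extreme value
sample = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
lower, upper = tukey_fences(sample)
outliers = flag_outliers(sample)
print(outliers)  # [102]
```

Values outside the fences are only flagged here, not removed, which matches the idea of examining data without modifying it.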

Multiple methods to handle categorical features:

One Hot encoder
Ordinal encoder
Weight of Evidence transformations
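
As a rough illustration of the weight of evidence idea for a binary target, the sketch below maps each category to the log ratio of its share among positives to its share among negatives. Sign conventions for WoE vary between sources, and the function name `weight_of_evidence` and the `smoothing` parameter are assumptions for this example, not the mankey_stats API.

```python
from collections import Counter
from math import log


def weight_of_evidence(categories, targets, smoothing=0.5):
    """Map each category to ln(share among positives / share among negatives)."""
    pos = Counter(c for c, y in zip(categories, targets) if y == 1)
    neg = Counter(c for c, y in zip(categories, targets) if y == 0)
    total_pos = sum(pos.values())
    total_neg = sum(neg.values())
    woe = {}
    for cat in sorted(set(categories)):
        # Additive smoothing keeps the ratio finite when a category
        # has no positives or no negatives.
        p = (pos[cat] + smoothing) / (total_pos + smoothing)
        n = (neg[cat] + smoothing) / (total_neg + smoothing)
        woe[cat] = log(p / n)
    return woe


# Hypothetical data: category "a" is mostly positive, "b" mostly negative
cats = ["a", "a", "a", "b", "b", "b"]
ys = [1, 1, 0, 0, 0, 1]
woe = weight_of_evidence(cats, ys)
encoded = [woe[c] for c in cats]  # numeric replacement for the category column
```

Categories that lean toward the positive class get a positive value and those that lean negative get a negative one, so the encoded column carries the direction and strength of the association.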

Date manipulation

Ability to expand date fields to YEAR, MONTH, and/or DAY fields
Subtract date features to create a "due in days" field
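
The two date operations above can be sketched with the standard library's `datetime` module. This is only an illustration of the idea; the helper names `expand_date` and `days_between` and the sample dates are hypothetical and are not part of mankey_stats.

```python
from datetime import date


def expand_date(d):
    """Split a date into separate YEAR, MONTH, and DAY fields."""
    return {"year": d.year, "month": d.month, "day": d.day}


def days_between(start, end):
    """Number of days from start to end (negative if end is earlier)."""
    return (end - start).days


# Hypothetical invoice record
invoice_date = date(2022, 3, 1)
due_date = date(2022, 3, 15)

parts = expand_date(invoice_date)
due_in_days = days_between(invoice_date, due_date)
print(parts, due_in_days)
```

Both results are plain integers, which is what most ML models need in place of a raw date value.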

Installation

The library is published on PyPI and can be installed with pip:

pip install mankey_stats

Feel free to help us improve it: clone the GitHub repository and submit your features :)

git clone https://github.com/mankey_stats/mankey_stats.git

Dependencies:

We rely on proven ML libraries: pandas, seaborn, plotly, numpy, scipy, and scikit-learn.

Example Usage

>>> import pandas as pd
>>> # Assumption: import path inferred from the `transformers.` prefix used below
>>> from mankey_stats import transformers

>>> data = {'type':  ['bad', 'average', 'good', 'very good', 'excellent'],
...         'level': [1, 2, 3, 4, 5]}

>>> # Ordered levels for each ordinal column, lowest first
>>> levels_dict = {'type': ['bad', 'average', 'good', 'very good', 'excellent']}

>>> df = pd.DataFrame(data)
>>> print(df)

Out[1]:
        type  level
0        bad      1
1    average      2
2       good      3
3  very good      4
4  excellent      5

>>> # Fit the transformer on the level ordering, then encode the DataFrame
>>> t_ord = transformers.Ordinal_Transformer()
>>> t_ord.fit(levels_dict, df, None)

>>> df = t_ord.transform(df, None)

Out[2]:
0       1
1       2
2       3
3       4
4       5
Name: var_A, dtype: int64

Find more in the documentation.

Documentation

mankey-stats documentation is built using Sphinx and is hosted on Read the Docs.

You can rebuild the docs locally with Sphinx's HTML builder (typically `make html` in the docs directory).

License

MIT

Release history

0.1.1 (Mar 15, 2022)

Download files

Source Distribution

mankey_stats-0.1.1.tar.gz (17.4 kB)

Uploaded Mar 15, 2022 Source

Built Distribution

mankey_stats-0.1.1-py3-none-any.whl (17.0 kB)

Uploaded Mar 15, 2022 Python 3

Hashes for mankey_stats-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`481a6e785a6e3071f4ec21b99ff5bab8bf707d6a92ffa289bf637f2a92134a32`
MD5	`fe806947f1eac59f975da4dd582581f0`
BLAKE2b-256	`6a487c8698528ee89112c15f7b3e5ccdd8ba95cab266ec03303c0b0923e5c4f9`

Hashes for mankey_stats-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ef6d87652525c472a398aae3e1f02c6318ce07c6f5ed59337774ec77f44c2118`
MD5	`d7f420bbb33c76bbbf31823c8a30ee3e`
BLAKE2b-256	`2eca413d6c5b8479e913fad8ddb5e716dc2d743053bd5fa6eeef23b7ed47197b`