
Clean and transform data for ML binary classification with ease


Mankey Stats

License: https://github.com/dBlueG/mankey_stats/blob/main/LICENSE.md
Documentation: https://mankey-stats.readthedocs.io/en/main/genindex.html


Mankey_stats is a Python library for quickly and efficiently preparing datasets for ML modeling. It combines several data transformation and statistical analysis methods.

Documentation

Primary functionality includes:

  • Detailed analysis of features, including numerical distribution tests
  • Analysis and handling of outliers and missing data
  • Interactive plotting and data visualization functionality
  • Transformation options including one-hot encoding, ordinal transformations, and weight of evidence
  • Functionality to prepare date fields for ML models
  • Ability to examine and recommend without modifying the underlying data
  • Optimized logic to ensure fast execution times, using numpy, scipy, and vectorization techniques

Analysis of features:

  • Feature Normality test
  • Grubbs' test and Tukey's fences for handling outliers (chosen according to the feature's statistical distribution)
  • Missing value analysis (percentage missing and the best method to fill the gaps: mode, median, or mean)
  • Best scaling methods are selected for each numeric feature (min-max scaler or standard scaler)
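
As an illustration of the outlier analysis listed above, here is a minimal sketch of Tukey's fences written with plain numpy. It is not the mankey_stats implementation: the function names tukey_fences and flag_outliers are hypothetical, and the multiplier k=1.5 is the conventional default rather than a value taken from the library.

```python
import numpy as np

def tukey_fences(values, k=1.5):
    """Return the (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, k=1.5):
    """Return a boolean mask that is True where a value lies outside the fences."""
    values = np.asarray(values, dtype=float)
    lower, upper = tukey_fences(values, k)
    return (values < lower) | (values > upper)

# Toy data (made up for this sketch): one value is far from the rest.
sample = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 95.0])
lower, upper = tukey_fences(sample)
mask = flag_outliers(sample)
print(lower, upper)   # fences computed from the quartiles
print(sample[mask])   # values flagged as outliers
```

Tukey's fences make no normality assumption, which is why they are a common fallback when a feature fails a normality test; Grubbs' test, by contrast, assumes approximately normal data.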

Multiple methods to handle categorical features:

  • One Hot encoder
  • Ordinal encoder
  • Weight of Evidence transformations
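
To make the weight of evidence idea concrete, the sketch below computes it by hand with pandas and numpy for a binary target. It is an assumption-laden illustration, not the library's API: the helper weight_of_evidence is hypothetical, and it uses one common convention, ln(share of positives / share of negatives) per category. Some references use the inverse ratio.

```python
import numpy as np
import pandas as pd

def weight_of_evidence(df, feature, target):
    """Map each category of `feature` to ln(share of positives / share of negatives).

    Assumes `target` is coded 0/1. A category with no positives or no
    negatives yields an infinite value; real code would need smoothing.
    """
    counts = df.groupby(feature)[target].agg(pos="sum", total="count")
    counts["neg"] = counts["total"] - counts["pos"]
    pos_share = counts["pos"] / counts["pos"].sum()
    neg_share = counts["neg"] / counts["neg"].sum()
    return np.log(pos_share / neg_share)

# Toy data (made up for this sketch).
toy = pd.DataFrame({
    "grade":  ["a", "a", "a", "b", "b", "b", "b", "c", "c", "c"],
    "target": [1, 1, 0, 1, 0, 0, 0, 1, 1, 0],
})
woe = weight_of_evidence(toy, "grade", "target")

# Replace the categorical column with its numeric WoE value.
toy["grade_woe"] = toy["grade"].map(woe)
print(woe)
```

Unlike one-hot encoding, this produces a single numeric column per feature, which keeps the feature count low for high-cardinality categoricals.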

Date manipulation

  • Ability to expand date fields to YEAR, MONTH, and/or DAY fields
  • Subtract date features to create a "due in days" field
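
The two date operations above can be sketched directly in pandas. This example does not call mankey_stats; the column names (issued, due, due_in_days) and the data are made up for illustration.

```python
import pandas as pd

# Toy data (made up for this sketch): dates arrive as strings.
invoices = pd.DataFrame({
    "issued": ["2021-01-15", "2021-02-01"],
    "due":    ["2021-02-14", "2021-03-13"],
})
for col in ("issued", "due"):
    invoices[col] = pd.to_datetime(invoices[col])

# Expand one date field into YEAR, MONTH, and DAY columns.
invoices["issued_year"] = invoices["issued"].dt.year
invoices["issued_month"] = invoices["issued"].dt.month
invoices["issued_day"] = invoices["issued"].dt.day

# Subtract two date fields to get a numeric "due in days" feature.
invoices["due_in_days"] = (invoices["due"] - invoices["issued"]).dt.days
print(invoices)
```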

Installation

The library is published on PyPI and can be installed with pip:

pip install mankey_stats

Feel free to help us improve: clone the repository from GitHub and submit your features :)

git clone https://github.com/dBlueG/mankey_stats.git

Dependencies:

We rely on proven ML libraries: pandas, seaborn, plotly, numpy, scipy, and scikit-learn

Example Usage

>>> import pandas as pd
>>> # Assumed import path, inferred from the transformers.Ordinal_Transformer call below
>>> from mankey_stats import transformers

>>> data = {'type':  ['bad', 'average', 'good', 'very good', 'excellent'],
...         'level': [1, 2, 3, 4, 5]}

>>> # Ordered levels for each ordinal column, lowest first
>>> levels_dict = {'type': ['bad', 'average', 'good', 'very good', 'excellent']}

>>> df = pd.DataFrame(data)
>>> print(df)
Out[1]:
        type  level
0        bad      1
1    average      2
2       good      3
3  very good      4
4  excellent      5

>>> # Fit the ordinal transformer on the level ordering, then encode the data
>>> t_ord = transformers.Ordinal_Transformer()
>>> t_ord.fit(levels_dict, df, None)

>>> df = t_ord.transform(df, None)
Out[2]:
0       1
1       2
2       3
3       4
4       5
Name: var_A, dtype: int64

Find more in the documentation.

Documentation

mankey-stats documentation is built using Sphinx and is hosted on Read the Docs.

You can re-build the docs using build html

License

MIT

