
MOB is a statistical approach to transform continuous variables into optimal and monotonic categorical variables.

Project description

Monotonic-Optimal-Binning

MOB is a statistical approach to transform continuous variables into optimal and monotonic categorical variables. In this project, we extend the application so that the user can choose whether to merge bins on a statistical basis or on a bin-size basis, to obtain the optimal result according to the user's expectations.
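
To make the idea of monotonic binning concrete, here is a toy sketch in plain NumPy/pandas. It is not MOBPY's algorithm: it starts from quantile bins and simply merges adjacent bins whenever the event rate moves against the overall trend, which is one common way to enforce monotonicity. The function name toy_monotonic_binning and its parameters are hypothetical.

```python
import numpy as np
import pandas as pd


def toy_monotonic_binning(x, y, n_init_bins=10):
    # Hypothetical illustration, NOT the MOBPY algorithm: quantile pre-binning
    # followed by merging adjacent bins that break monotonicity of the event rate.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Initial bin edges from quantiles; np.unique drops duplicate edges.
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_init_bins + 1)))
    # Bin index i satisfies edges[i] <= x < edges[i + 1]; the max value goes in the last bin.
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(edges) - 2)
    bins = [
        {"lo": edges[i], "hi": edges[i + 1],
         "n": int((idx == i).sum()), "events": float(y[idx == i].sum())}
        for i in range(len(edges) - 1)
    ]
    bins = [b for b in bins if b["n"] > 0]
    # Trend direction: +1 if the event rate should increase with x, else -1.
    direction = 1 if np.corrcoef(x, y)[0, 1] >= 0 else -1

    def rate(b):
        return b["events"] / b["n"]

    merged = True
    while merged and len(bins) > 1:
        merged = False
        for i in range(len(bins) - 1):
            # Violation: the next bin's event rate moves against the trend.
            if direction * (rate(bins[i + 1]) - rate(bins[i])) < 0:
                a, b = bins[i], bins[i + 1]
                bins[i] = {"lo": a["lo"], "hi": b["hi"],
                           "n": a["n"] + b["n"],
                           "events": a["events"] + b["events"]}
                del bins[i + 1]
                merged = True
                break
    return pd.DataFrame(bins).assign(event_rate=lambda d: d["events"] / d["n"])
```

Each merge removes one bin, so the loop always terminates, and when it does no adjacent pair violates the trend.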

Installation

python3 -m pip install MOBPY

Usage

Example:

import pandas as pd
from MOBPY.MOB import MOB


if __name__ == '__main__':
    # Load the test dataset.
    df = pd.read_csv('/data/german_data_credit_cat.csv')

    # The original values in 'default' are 1 and 2. Subtracting 1 recodes them so that
    # 1 (formerly 2) represents the positive (default) class and 0 the other class.
    df['default'] = df['default'] - 1

    # Run the MOB algorithm to discretize the variable 'Durationinmonth'.
    MOB_ALGO = MOB(data=df, var='Durationinmonth', response='default', exclude_value=None)
    # Setting the binning constraints is a required step.
    MOB_ALGO.setBinningConstraints(max_bins=6, min_bins=3,
                                   max_samples=0.4, min_samples=0.05,
                                   min_bads=0.05,
                                   init_pvalue=0.4,
                                   maximize_bins=True)
    # Execute the MOB algorithm.
    SizeBinning = MOB_ALGO.runMOB(mergeMethod='Size')    # merge on a bin-size basis
    StatsBinning = MOB_ALGO.runMOB(mergeMethod='Stats')  # merge on a statistical basis
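
The constraint names above suggest that max_samples, min_samples and min_bads are fractions of the total population. As a hedged illustration (an assumption, not MOBPY's documented semantics), the hypothetical helper below checks a per-bin summary against such limits, treating min_bads as the minimum share of all bads that each bin must hold. The column names n and bads are invented for this example and are not the columns of the DataFrame returned by runMOB.

```python
import pandas as pd


def check_constraints(summary, max_bins, min_bins, max_samples, min_samples, min_bads):
    """Hypothetical post-check of a binning summary (not part of MOBPY).

    Assumes one row per bin with columns 'n' (sample count) and 'bads'
    (event count), and that the sample/bad limits are fractions of totals.
    Returns a list of human-readable constraint violations (empty if none).
    """
    problems = []
    k = len(summary)
    if not (min_bins <= k <= max_bins):
        problems.append(f"{k} bins outside [{min_bins}, {max_bins}]")
    share = summary["n"] / summary["n"].sum()          # population share per bin
    bad_share = summary["bads"] / summary["bads"].sum()  # share of all bads per bin
    for i in range(k):
        if not (min_samples <= share.iloc[i] <= max_samples):
            problems.append(f"bin {i}: sample share {share.iloc[i]:.3f} out of range")
        if bad_share.iloc[i] < min_bads:
            problems.append(f"bin {i}: bad share {bad_share.iloc[i]:.3f} below {min_bads}")
    return problems
```

For instance, with the limits used above, a summary of n = [300, 350, 350] and bads = [60, 120, 120] passes with no violations.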
    

The runMOB method returns a pandas.DataFrame that shows the binning result for the variable together with Weight of Evidence (WoE) summary information for each bin.
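
Weight of Evidence for bin i is commonly defined as WoE_i = ln((goods_i / total goods) / (bads_i / total bads)); some references flip the ratio, so it is worth checking which sign convention the MOBPY output follows. The sketch below computes this formula with pandas for a hypothetical summary table with columns n and bads; it illustrates the definition and is not the code MOBPY runs.

```python
import numpy as np
import pandas as pd


def woe_table(summary):
    # Assumes hypothetical columns 'n' (samples per bin) and 'bads'
    # (events per bin, i.e. response == 1). Not MOBPY's implementation.
    out = summary.copy()
    out["goods"] = out["n"] - out["bads"]
    dist_good = out["goods"] / out["goods"].sum()  # share of all goods in each bin
    dist_bad = out["bads"] / out["bads"].sum()     # share of all bads in each bin
    # WoE = ln(%goods / %bads); a bin with zero goods or zero bads gives -inf/inf.
    out["woe"] = np.log(dist_good / dist_bad)
    return out
```

For two bins with n = [100, 100] and bads = [10, 30], the first bin's WoE is ln(0.5625 / 0.25) = ln(2.25), and the second bin, which holds more bads, gets a negative WoE.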


Once we have the binning result DataFrame, we can visualize the binning summary with MOBPY.plot.MOB_PLOT.plotBinsSummary.

from MOBPY.plot.MOB_PLOT import MOB_PLOT

# Plot the bin summary tables.
print('Bins Size Base')
MOB_PLOT.plotBinsSummary(monoOptBinTable=SizeBinning, var_name='Durationinmonth')

print('Statistical Base')
MOB_PLOT.plotBinsSummary(monoOptBinTable=StatsBinning, var_name='Durationinmonth')


In most cases, the Stats (statistical basis) and Size (bin-size basis) methods produce identical results. When the data is more extreme during binning, however, the two can diverge: the Size method prioritizes keeping each bin's population between the minimum and maximum limits, while the Stats method continues to apply the stricter logic based on the hypothesis-test results.
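
As a rough illustration of a statistics-driven merge, the sketch below repeatedly merges the pair of adjacent bins whose bad rates are least distinguishable under a two-proportion z-test, until every remaining pair differs at the given p-value threshold or the minimum bin count is reached. The choice of test, the merge order, and the names two_prop_pvalue and stats_merge are assumptions for this example; the init_pvalue constraint hints that MOBPY uses a p-value threshold, but this is not its implementation.

```python
import math


def two_prop_pvalue(n1, b1, n2, b2):
    """Two-sided p-value of a two-proportion z-test on bad rates b1/n1 vs b2/n2."""
    p_pool = (b1 + b2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # both bins are all-good or all-bad: indistinguishable
    z = (b1 / n1 - b2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def stats_merge(bins, pvalue_threshold, min_bins):
    """Hypothetical merge loop; bins is a list of (n, bads) tuples ordered by the variable."""
    bins = list(bins)
    while len(bins) > max(min_bins, 1):
        pvals = [two_prop_pvalue(*bins[i], *bins[i + 1]) for i in range(len(bins) - 1)]
        i = max(range(len(pvals)), key=pvals.__getitem__)  # least significant pair
        if pvals[i] <= pvalue_threshold:
            break  # every adjacent pair differs significantly; stop merging
        n1, b1 = bins[i]
        n2, b2 = bins[i + 1]
        bins[i:i + 2] = [(n1 + n2, b1 + b2)]
    return bins
```

With bins (100, 10), (100, 11), (100, 40) and a threshold of 0.4, the first two bins (bad rates 10% and 11%) are merged, while the remaining pair (about 10.5% vs 40%) differs significantly and is kept apart.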

For example, the two methods give different results for the variable 'Creditamount':

# Run the MOB algorithm to discretize the variable 'Creditamount'.
MOB_ALGO = MOB(data=df, var='Creditamount', response='default', exclude_value=None)
# Set the binning constraints (required).
MOB_ALGO.setBinningConstraints(max_bins=6, min_bins=3,
                               max_samples=0.4, min_samples=0.05,
                               min_bads=0.05,
                               init_pvalue=0.4,
                               maximize_bins=True)
# mergeMethod='Size' runs the MOB algorithm on a bin-size basis;
# mergeMethod='Stats' runs it on a statistical basis.
SizeBinning = MOB_ALGO.runMOB(mergeMethod='Size')
StatsBinning = MOB_ALGO.runMOB(mergeMethod='Stats')

# Plot the bin summary tables.
print('Bins Size Base')
MOB_PLOT.plotBinsSummary(monoOptBinTable=SizeBinning, var_name='Creditamount')
print('Statistical Base')
MOB_PLOT.plotBinsSummary(monoOptBinTable=StatsBinning, var_name='Creditamount')

Figure: binning summary for 'Creditamount'. Left: SizeBinning (mergeMethod = 'Size', bin-size basis). Right: StatsBinning (mergeMethod = 'Stats', statistical basis).

The left image shows the result of mergeMethod = 'Size' (bin-size basis), and the right image shows the result of mergeMethod = 'Stats' (statistical basis). The Size method merges the bins that do not meet the minimum sample population, while keeping the number of bins from falling below the minimum bin limit.
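
A bin-size-driven merge can be sketched the same way. In this hypothetical version, any bin holding less than min_samples of the population is merged into its smaller neighbor, and merging stops once the bin count reaches min_bins, mirroring the behavior described above. The function size_merge and its neighbor-selection rule are assumptions, not MOBPY's code.

```python
def size_merge(bins, min_samples, min_bins):
    """Hypothetical size-based merge; bins is a list of (n, bads) tuples ordered by the variable."""
    bins = list(bins)
    total = sum(n for n, _ in bins)
    while len(bins) > max(min_bins, 1):
        # Bins whose population share is below the minimum.
        small = [i for i, (n, _) in enumerate(bins) if n / total < min_samples]
        if not small:
            break
        i = min(small, key=lambda j: bins[j][0])  # smallest offending bin first
        # Merge into the neighbor with fewer samples (edge bins have one neighbor).
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(bins)]
        j = min(neighbors, key=lambda k: bins[k][0])
        lo, hi = sorted((i, j))
        bins[lo:hi + 1] = [(bins[lo][0] + bins[hi][0], bins[lo][1] + bins[hi][1])]
    return bins
```

Stopping at min_bins can leave an undersized bin in place, and merging on size alone can break the monotonicity of the bad rate, so a complete implementation would need to re-check monotonicity after each merge.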

Environment

OS: macOS Ventura

IDE: Visual Studio Code 1.79.2 (Universal)

Language: Python 3.9.7
    - pandas 1.3.4
    - numpy 1.20.3
    - scipy 1.7.1
    - matplotlib 3.7.1

Citation

Reference

Authors

  1. Chen, Ta-Hung (Denny)
  2. Tsai, Yu-Cheng (Darren)

Download files


Source Distribution

MOBPY-1.0.0.tar.gz (14.6 kB)

Uploaded Source

Built Distribution

MOBPY-1.0.0-py3-none-any.whl (14.6 kB)

Uploaded Python 3

File details

Details for the file MOBPY-1.0.0.tar.gz.

File metadata

  • Download URL: MOBPY-1.0.0.tar.gz
  • Upload date:
  • Size: 14.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.7

File hashes

Hashes for MOBPY-1.0.0.tar.gz
Algorithm Hash digest
SHA256 6bdbb7fe8cc8acf0dffd3169d4e0b112e9f1b775109dbd5ca2eb1912ccd8ea02
MD5 941ab1ca273830d8d43e14919a70825c
BLAKE2b-256 7d72c1194f6d2d46bba0b67551ea3b1312ed9b562518d4e1c0b4e056d40b9aa2


File details

Details for the file MOBPY-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: MOBPY-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 14.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.7

File hashes

Hashes for MOBPY-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3d0e0fce49bd59bb8cf4dc72c1b25ac2d8119c37879aa35943b1ec38f68bd13f
MD5 98ff14123f9c1a4b9d3d44d730dffb7a
BLAKE2b-256 5e952aa0316bf7a4ad4d4576792e394ab800831203183d8ad3a4d99751699bdb

