MOB is a statistical approach to transform continuous variables into optimal and monotonic categorical variables.

Monotonic-Optimal-Binning

MOB is a statistical approach to transform continuous variables into optimal and monotonic categorical variables. In this project, we extend the approach so that the user can choose whether to merge bins on a statistical base (Stats) or a bins size base (Size), depending on the result they expect.
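To make "monotonic" concrete, the sketch below pools adjacent bins whenever their event rates break a non-decreasing order (the pool-adjacent-violators idea). It is an illustration in plain Python, not MOBPY's internal code; the function name `pool_monotonic` and the `(count, events)` input format are assumptions made for this example, and every bin is assumed to have a positive count.

```python
def pool_monotonic(bins):
    """Merge adjacent (count, events) bins until event rates are non-decreasing.

    `bins` is a list of (count, events) tuples ordered by the underlying variable.
    """
    merged = []
    for count, events in bins:
        merged.append([count, events])
        # While the last two bins violate the order, pool them together.
        while len(merged) > 1 and (
            merged[-2][1] / merged[-2][0] > merged[-1][1] / merged[-1][0]
        ):
            last_count, last_events = merged.pop()
            merged[-1][0] += last_count
            merged[-1][1] += last_events
    return [tuple(b) for b in merged]


# Event rates 0.10, 0.30, 0.20, 0.40: the middle pair breaks the order.
example = [(100, 10), (100, 30), (100, 20), (100, 40)]
result = pool_monotonic(example)
print(result)
```

In this toy input the second and third bins are pooled, leaving event rates of 0.10, 0.25 and 0.40.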

Installation

python3 -m pip install MOBPY

Usage

Example :

import pandas as pd
from MOBPY.MOB import MOB


if __name__ == '__main__' :
    # import the testing datasets
    df = pd.read_csv('/data/german_data_credit_cat.csv')
    
    # The 'default' column originally holds the values 1 and 2; subtract 1 so that 1 marks the positive class and 0 the other.
    df['default'] = df['default'] - 1

    # run the MOB algorithm to discretize the variable 'Durationinmonth'.
    MOB_ALGO = MOB(data = df, var = 'Durationinmonth', response = 'default', exclude_value = None)
    # A must-do step is to set the binning constraints.
    MOB_ALGO.setBinningConstraints( max_bins = 6, min_bins = 3, 
                                    max_samples = 0.4, min_samples = 0.05, 
                                    min_bads = 0.05, 
                                    init_pvalue = 0.4, 
                                    maximize_bins=True)
    # execute the MOB algorithm.
    SizeBinning = MOB_ALGO.runMOB(mergeMethod='Size') # Run under the bins size base.

    StatsBinning = MOB_ALGO.runMOB(mergeMethod='Stats') # Run under the statistical base. 
    

The runMOB method returns a pandas.DataFrame containing the binning result for the variable together with the WoE summary for each bin.
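For readers unfamiliar with WoE (weight of evidence), here is a minimal sketch of how a per-bin WoE value can be computed from good and bad counts. It assumes the convention WoE = ln(share of goods / share of bads); MOBPY may use the opposite sign or different column names, so `woe_table` and the dictionary keys below are hypothetical.

```python
import math


def woe_table(bins):
    """Return one row per bin with its WoE, given a list of (goods, bads) counts."""
    total_goods = sum(goods for goods, _ in bins)
    total_bads = sum(bads for _, bads in bins)
    rows = []
    for goods, bads in bins:
        good_share = goods / total_goods  # share of all goods falling in this bin
        bad_share = bads / total_bads     # share of all bads falling in this bin
        rows.append({
            "goods": goods,
            "bads": bads,
            "woe": math.log(good_share / bad_share),  # assumed sign convention
        })
    return rows


rows = woe_table([(400, 50), (200, 100), (100, 150)])
for row in rows:
    print(row)
```

Under this convention a positive WoE marks a bin with relatively more goods than bads, and a monotonic binning shows WoE moving in one direction across the bins.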

Image

Once we have the binning result DataFrame, we can visualize the binning summary with MOBPY.plot.MOB_PLOT.plotBinsSummary.

from MOBPY.plot.MOB_PLOT import MOB_PLOT

# plot the bin summary data.
print('Bins Size Base')
MOB_PLOT.plotBinsSummary(monoOptBinTable = SizeBinning, var_name = 'Durationinmonth')

print('Statistical Base')
MOB_PLOT.plotBinsSummary(monoOptBinTable = StatsBinning, var_name = 'Durationinmonth')

Image

Normally, Stats (statistical base) and Size (bins size base) produce identical results. When the data are quite extreme, however, the two diverge: the Size method prefers to keep the population of each bin between the minimum and maximum limits, while the Stats method sticks to the stricter logic driven by the hypothesis test results.
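The hypothesis testing behind the Stats method can be pictured as a two-sample test between adjacent bins: if their event rates are not significantly different, the bins are candidates for merging. The sketch below uses a pooled two-proportion z-test built from the standard library only. Which test MOBPY actually applies, and how init_pvalue enters it, is not stated in this README, so `two_proportion_pvalue` and the threshold are assumptions.

```python
import math


def two_proportion_pvalue(n1, events1, n2, events2):
    """Two-sided p-value of a pooled two-proportion z-test."""
    p1, p2 = events1 / n1, events2 / n2
    pooled = (events1 + events2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # degenerate case: both rates are 0 or both are 1
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


threshold = 0.4  # hypothetical cut-off, borrowed from init_pvalue in the example above
similar = two_proportion_pvalue(200, 40, 180, 38)    # event rates 0.200 vs 0.211
different = two_proportion_pvalue(200, 40, 180, 90)  # event rates 0.200 vs 0.500
print(similar > threshold, different > threshold)
```

With these toy counts the first pair would be treated as similar enough to merge, while the second pair would be kept apart.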

For example,

# run the MOB algorithm to discretize the variable 'Creditamount'.
MOB_ALGO = MOB(data = df, var = 'Creditamount', response = 'default', exclude_value = None) 
# Set Binning Constraints (Must-Do!)
MOB_ALGO.setBinningConstraints( max_bins = 6, min_bins = 3, 
                                max_samples = 0.4, min_samples = 0.05, 
                                min_bads = 0.05, 
                                init_pvalue = 0.4, 
                                maximize_bins=True)
# mergeMethod = 'Size' means to run MOB algorithm under bins size base
SizeBinning = MOB_ALGO.runMOB(mergeMethod='Size')
StatsBinning = MOB_ALGO.runMOB(mergeMethod='Stats')

# plot the bin summary data.
print('Bins Size Base')
MOB_PLOT.plotBinsSummary(monoOptBinTable = SizeBinning, var_name = 'Creditamount')
print('Statistical Base')
MOB_PLOT.plotBinsSummary(monoOptBinTable = StatsBinning, var_name = 'Creditamount')

SizeBinning                              StatsBinning
mergeMethod = 'Size' (bins size base)    mergeMethod = 'Stats' (statistical base)
Image 1                                  Image 2

The left image is the result generated by mergeMethod = 'Size' (bins size base), and the right image is the result generated by mergeMethod = 'Stats' (statistical base). We can see that the Size method merges the bins that do not meet the minimum sample population, while keeping the number of bins from falling below the minimum bins limit.
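The size-based behaviour described above can be sketched as a loop that folds undersized bins into a neighbour and stops once the minimum number of bins is reached. This is a simplified illustration only: it ignores monotonicity and the bad-count constraint, and `merge_small_bins`, `min_share` and the merge-into-smaller-neighbour rule are assumptions rather than MOBPY's actual logic.

```python
def merge_small_bins(counts, min_share, min_bins):
    """Merge bins whose share of the sample is below `min_share`.

    Stops as soon as only `min_bins` bins remain, even if a bin is still too small.
    """
    counts = list(counts)
    total = sum(counts)
    while len(counts) > min_bins:
        smallest = min(range(len(counts)), key=counts.__getitem__)
        if counts[smallest] / total >= min_share:
            break  # every bin already meets the minimum share
        # Merge into the smaller adjacent neighbour (the only one at the edges).
        neighbours = [i for i in (smallest - 1, smallest + 1) if 0 <= i < len(counts)]
        target = min(neighbours, key=counts.__getitem__)
        counts[target] += counts[smallest]
        del counts[smallest]
    return counts


merged = merge_small_bins([20, 300, 30, 400, 250], min_share=0.05, min_bins=3)
print(merged)
```

Calling the function with `[10, 500, 490]` and `min_bins=3` returns the input unchanged, which mirrors the point that the minimum bins limit takes priority over the minimum sample share.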

Environment

OS : macOS Ventura

IDE: Visual Studio Code 1.79.2 (Universal)

Language : Python 3.9.7 
    - pandas 1.3.4
    - numpy 1.20.3
    - scipy 1.7.1
    - matplotlib 3.7.1

Authors

  1. Chen, Ta-Hung (Denny)
  2. Tsai, Yu-Cheng (Darren)
