MOB is a statistical approach to transform continuous variables into optimal and monotonic categorical variables.

Monotonic-Optimal-Binning

Python implementation (MOBPY)

MOB (Monotonic Optimal Binning) is a statistical approach that transforms continuous variables into optimal, monotonic categorical variables. This Python project extends the approach by letting users merge bins based on either statistics or bin size, so that the resulting monotone optimal binning matches their expectations.
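To make the monotonicity requirement concrete, here is a minimal sketch of pool-adjacent-violators style merging on bins that are already sorted by the variable. It only illustrates the general idea and is not MOBPY's implementation; the function name `merge_to_monotone` and the `(count, bads)` tuple layout are assumptions made for this example, and every bin is assumed to hold at least one sample.

```python
from typing import List, Tuple

# Hypothetical layout for this sketch: (number of samples, number of bads).
Bin = Tuple[int, int]


def merge_to_monotone(bins: List[Bin]) -> List[Bin]:
    """Merge adjacent bins until the bad rate is non-decreasing left to right."""
    merged: List[Bin] = []
    for count, bads in bins:
        merged.append((count, bads))
        # While the last two bins violate monotonicity, pool them into one.
        while len(merged) >= 2:
            (n1, b1), (n2, b2) = merged[-2], merged[-1]
            if b1 / n1 <= b2 / n2:
                break
            merged[-2:] = [(n1 + n2, b1 + b2)]
    return merged


# The third bin (bad rate 0.20) breaks the increasing trend set by the second (0.30),
# so the two are pooled into one bin with bad rate 0.25.
bins = [(100, 10), (100, 30), (100, 20), (100, 40)]
result = merge_to_monotone(bins)
rates = [b / n for n, b in result]
```

Pooling only guarantees monotonicity. Bin-count and bin-size limits like those passed to setBinningConstraints are a separate concern that this sketch does not address.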

Installation

python3 -m pip install MOBPY

Usage

Example:

import pandas as pd
from MOBPY.MOB import MOB


if __name__ == '__main__' :
    # load the test dataset
    df = pd.read_csv('/data/german_data_credit_cat.csv')

    # The original values in the column are 1 and 2; subtract 1 so that 1 represents the positive class and 0 the other.
    df['default'] = df['default'] - 1

    # run the MOB algorithm to discretize the variable 'Durationinmonth'.
    MOB_ALGO = MOB(data = df, var = 'Durationinmonth', response = 'default', exclude_value = None)
    # Setting the binning constraints is required before running MOB.
    MOB_ALGO.setBinningConstraints( max_bins = 6, min_bins = 3, 
                                    max_samples = 0.4, min_samples = 0.05, 
                                    min_bads = 0.05, 
                                    init_pvalue = 0.4, 
                                    maximize_bins=True)
    # execute the MOB algorithm.
    SizeBinning = MOB_ALGO.runMOB(mergeMethod='Size') # Run under the bins size base.

    StatsBinning = MOB_ALGO.runMOB(mergeMethod='Stats') # Run under the statistical base. 
    

The runMOB method returns a pandas.DataFrame containing the binning result for the variable, along with the WoE summary information for each bin.
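For readers unfamiliar with WoE (weight of evidence), the sketch below shows how a per-bin WoE value can be derived from good and bad counts. It is a standalone illustration using only the standard library. The helper name `woe_table`, the dictionary keys, and the sign convention (share of goods over share of bads) are assumptions for this example and may not match the columns or convention of the DataFrame returned by runMOB.

```python
import math
from typing import Dict, List, Tuple


def woe_table(bins: List[Tuple[int, int]]) -> List[Dict[str, float]]:
    """Compute per-bin WoE from (goods, bads) counts.

    Assumed convention: WoE = ln(share of all goods / share of all bads).
    Every bin must contain at least one good and one bad, otherwise the
    logarithm is undefined.
    """
    total_good = sum(goods for goods, _ in bins)
    total_bad = sum(bads for _, bads in bins)
    rows = []
    for goods, bads in bins:
        dist_good = goods / total_good
        dist_bad = bads / total_bad
        rows.append({
            "goods": goods,
            "bads": bads,
            "bad_rate": bads / (goods + bads),
            "woe": math.log(dist_good / dist_bad),
        })
    return rows


# Toy counts, not taken from the German credit data.
summary = woe_table([(90, 10), (70, 30), (40, 60)])
for row in summary:
    print(row)
```

Under this convention, WoE decreases as the bad rate rises, so a monotone sequence of bad rates yields a monotone sequence of WoE values.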

After receiving the binning result DataFrame, we can visualize the binning summary with MOBPY.plot.MOB_PLOT.plotBinsSummary.

from MOBPY.plot.MOB_PLOT import MOB_PLOT

# plot the bin summary data.
print('Bins Size Base')
MOB_PLOT.plotBinsSummary(monoOptBinTable = SizeBinning, var_name = 'Durationinmonth')

print('Statistical Base')
MOB_PLOT.plotBinsSummary(monoOptBinTable = StatsBinning, var_name = 'Durationinmonth')

Highlighted Features

User Preferences:

The MOB algorithm offers two user preference settings (mergeMethod argument):

  1. Size: This setting allows you to optimize the sample size of each bin within specified maximum and minimum limits while ensuring that the minimum number of bins constraint is maintained.

  2. Stats: With this setting, the algorithm applies a stricter approach based on hypothesis testing results.

Typically, the 'Stats' (statistical-based) and 'Size' (bin size-based) methods yield identical results. In certain scenarios, however, they differ: the 'Size' method tends to prioritize keeping the population of each bin within the maximum and minimum limits, whereas the 'Stats' method adheres to a more rigorous logic based on the results of hypothesis testing.
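As an illustration of what a hypothesis-test-driven merge decision can look like, the sketch below compares the bad rates of two adjacent bins with a two-sided two-proportion z-test. This is one plausible form of such a test, written with the standard library only. Whether MOBPY uses this exact test, and how it applies init_pvalue, is not stated here, so the function name `two_proportion_p_value` and the threshold variable are assumptions.

```python
import math


def two_proportion_p_value(n1: int, bads1: int, n2: int, bads2: int) -> float:
    """Two-sided p-value for H0: both bins share the same bad rate."""
    p1, p2 = bads1 / n1, bads2 / n2
    pooled = (bads1 + bads2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both bins are all-good or all-bad: no evidence of a difference.
        return 1.0
    z = (p1 - p2) / se
    # Two-sided tail of the standard normal via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical threshold for this sketch: merge when the p-value exceeds it.
threshold = 0.4

p_similar = two_proportion_p_value(200, 40, 200, 44)    # bad rates 0.20 vs 0.22
p_different = two_proportion_p_value(200, 20, 200, 60)  # bad rates 0.10 vs 0.30

merge_similar = p_similar > threshold      # rates are statistically indistinguishable
merge_different = p_different > threshold  # rates differ clearly, keep bins apart
```

A rule of this kind merges bins only when the data cannot tell them apart, regardless of how many samples each bin holds. That is the contrast with a size-driven rule.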

For example,

# run the MOB algorithm to discretize the variable 'Creditamount'.
MOB_ALGO = MOB(data = df, var = 'Creditamount', response = 'default', exclude_value = None) 
# Set Binning Constraints (Must-Do!)
MOB_ALGO.setBinningConstraints( max_bins = 6, min_bins = 3, 
                                max_samples = 0.4, min_samples = 0.05, 
                                min_bads = 0.05, 
                                init_pvalue = 0.4, 
                                maximize_bins=True)
# mergeMethod='Size' merges bins based on bin size;
# mergeMethod='Stats' merges bins based on hypothesis-test results.
SizeBinning = MOB_ALGO.runMOB(mergeMethod='Size')
StatsBinning = MOB_ALGO.runMOB(mergeMethod='Stats')

# plot the bin summary data (var_name matches the binned variable, 'Creditamount').
print('Bins Size Base')
MOB_PLOT.plotBinsSummary(monoOptBinTable = SizeBinning, var_name = 'Creditamount')
print('Statistical Base')
MOB_PLOT.plotBinsSummary(monoOptBinTable = StatsBinning, var_name = 'Creditamount')

Figure: bin summary plots for SizeBinning, runMOB(mergeMethod='Size') (bin size base, left), and StatsBinning, runMOB(mergeMethod='Stats') (statistical base, right).

The left image is the result generated by mergeMethod = 'Size' (bin size-based), and the right image is the result generated by mergeMethod = 'Stats' (statistical-based). The 'Size' method merges bins that fail to meet the minimum sample population requirement, while keeping the number of bins from falling below the specified minimum. By merging bins that fall short of the population threshold, it maintains a more balanced distribution of data across the bins.
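The size-driven behaviour described above can be sketched as a greedy merge over bin counts. This is a simplified illustration under stated assumptions, not MOBPY's algorithm: the function name `merge_small_bins` and its parameters are hypothetical, the choice to merge into the smaller neighbour is arbitrary, and bad rates and monotonicity are ignored entirely.

```python
from typing import List


def merge_small_bins(counts: List[int], min_share: float, min_bins: int) -> List[int]:
    """Merge bins whose share of samples is below min_share, smallest first,
    without reducing the number of bins below min_bins."""
    counts = list(counts)
    total = sum(counts)
    while len(counts) > min_bins:
        smallest = min(range(len(counts)), key=counts.__getitem__)
        if counts[smallest] / total >= min_share:
            break  # every bin already meets the minimum share
        # Merge into the smaller adjacent neighbour to keep sizes balanced.
        neighbours = [i for i in (smallest - 1, smallest + 1) if 0 <= i < len(counts)]
        target = min(neighbours, key=counts.__getitem__)
        lo, hi = sorted((smallest, target))
        counts[lo:hi + 1] = [counts[lo] + counts[hi]]
    return counts


# Two bins fall below a 5% share of 1000 samples and are absorbed by neighbours.
merged = merge_small_bins([30, 400, 20, 350, 200], min_share=0.05, min_bins=3)

# With only three bins to start, the minimum-bins limit wins and nothing is merged.
untouched = merge_small_bins([30, 500, 470], min_share=0.05, min_bins=3)
```

The second call shows the trade-off: once the minimum number of bins is reached, an undersized bin is left in place.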

Full Documentation

Full API Reference

Environment

OS : macOS Ventura

IDE: Visual Studio Code 1.79.2 (Universal)

Language : Python 3.9.7 
    - pandas 1.3.4
    - numpy 1.20.3
    - scipy 1.7.1
    - matplotlib 3.7.1

Authors

  1. Chen, Ta-Hung (Denny)
  2. Tsai, Yu-Cheng (Darren)
