Monotonic-Optimal-Binning
MOB is a statistical approach to transform continuous variables into optimal and monotonic categorical variables. In this project, we extend the approach so that the user can choose whether to merge bins on a statistical basis or on a bin-size basis, and so obtain the result that best matches their expectations.
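As a rough illustration of what "monotonic" means here, the sketch below merges adjacent bins until the bad rate never decreases from one bin to the next. It is a generic pool-adjacent-violators style merge written for this README, not MOBPY's own code; the function name `merge_to_monotonic` and the `(n_samples, n_bads)` tuples are hypothetical.

```python
from typing import List, Sequence, Tuple


def merge_to_monotonic(bins: Sequence[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge adjacent bins until the bad rate is non-decreasing.

    `bins` holds (n_samples, n_bads) pairs ordered by the binned variable.
    This is an illustrative stand-in, not the MOBPY implementation.
    """
    merged: List[List[int]] = []
    for n_samples, n_bads in bins:
        merged.append([n_samples, n_bads])
        # While the previous bin has a higher bad rate than the last one,
        # pool the two bins together and re-check against the one before.
        while (
            len(merged) > 1
            and merged[-2][1] / merged[-2][0] > merged[-1][1] / merged[-1][0]
        ):
            last_samples, last_bads = merged.pop()
            merged[-1][0] += last_samples
            merged[-1][1] += last_bads
    return [(n, b) for n, b in merged]


# Hypothetical bins: bad rates 0.10, 0.30, 0.20, 0.40 (not monotonic).
raw_bins = [(100, 10), (100, 30), (100, 20), (100, 40)]
print(merge_to_monotonic(raw_bins))  # [(100, 10), (200, 50), (100, 40)]
```

The second and third bins are pooled because their bad rates (0.30, then 0.20) break the increasing trend; the pooled bin has a bad rate of 0.25.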
Installation
python3 -m pip install MOBPY
Usage
Example:

    import pandas as pd
    from MOBPY.MOB import MOB

    if __name__ == '__main__':
        # Import the testing dataset.
        df = pd.read_csv('/data/german_data_credit_cat.csv')

        # The original values in the column are [1, 2]; shift them to [0, 1]
        # so that 1 represents the positive term and 0 the other one.
        df['default'] = df['default'] - 1

        # Run the MOB algorithm to discretize the variable 'Durationinmonth'.
        MOB_ALGO = MOB(data=df, var='Durationinmonth', response='default', exclude_value=None)

        # A must-do step is to set the binning constraints.
        MOB_ALGO.setBinningConstraints(max_bins=6, min_bins=3,
                                       max_samples=0.4, min_samples=0.05,
                                       min_bads=0.05,
                                       init_pvalue=0.4,
                                       maximize_bins=True)

        # Execute the MOB algorithm.
        SizeBinning = MOB_ALGO.runMOB(mergeMethod='Size')    # Run under the bins size base.
        StatsBinning = MOB_ALGO.runMOB(mergeMethod='Stats')  # Run under the statistical base.
The `runMOB` method returns a `pandas.DataFrame` that shows the binning result of the variable together with the weight of evidence (WoE) summary for each bin.
Once we have the binning result dataframe, we can visualize the binning summary with `MOBPY.plot.MOB_PLOT.plotBinsSummary`.
    from MOBPY.plot.MOB_PLOT import MOB_PLOT

    # Plot the bin summary data.
    print('Bins Size Base')
    MOB_PLOT.plotBinsSummary(monoOptBinTable=SizeBinning, var_name='Durationinmonth')
    print('Statistical Base')
    MOB_PLOT.plotBinsSummary(monoOptBinTable=StatsBinning, var_name='Durationinmonth')
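To make the per-bin WoE summary more concrete, here is a minimal sketch of a weight-of-evidence calculation. It assumes the common convention WoE = ln(share of goods in the bin / share of bads in the bin); the sign convention and the column layout that MOBPY actually uses are not stated in this README, and the helper `woe_per_bin` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def woe_per_bin(bins: Sequence[Tuple[int, int]]) -> List[float]:
    """Return one WoE value per bin.

    `bins` holds (n_samples, n_bads) pairs. Assumes every bin contains at
    least one good and one bad; otherwise the logarithm is undefined.
    """
    total_bads = sum(n_bads for _, n_bads in bins)
    total_goods = sum(n_samples - n_bads for n_samples, n_bads in bins)
    values = []
    for n_samples, n_bads in bins:
        dist_good = (n_samples - n_bads) / total_goods  # share of all goods
        dist_bad = n_bads / total_bads                  # share of all bads
        values.append(math.log(dist_good / dist_bad))
    return values


# Hypothetical bins with bad rates 0.10, 0.25 and 0.40.
example_bins = [(100, 10), (200, 50), (100, 40)]
print([round(w, 4) for w in woe_per_bin(example_bins)])  # [1.0986, 0.0, -0.6931]
```

Under this convention, a monotonically increasing bad rate shows up as a monotonically decreasing WoE.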
Normally, the results of `Stats` (statistical base) and `Size` (bins size base) are identical. When the data turns out to be quite extreme during binning, however, the `Size` method prefers to keep the population of each bin between the minimum and maximum limits, while the `Stats` method keeps following the stricter logic driven by the hypothesis test results.
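The idea of merging on a statistical basis can be sketched with a simple two-sided test on the bad rates of two adjacent bins. The README does not say which test MOBPY runs, so the pooled two-proportion z-test below is only an illustrative stand-in. The rule "merge when the p-value exceeds the threshold" is likewise an assumption about how a setting such as `init_pvalue` might be applied, and `two_proportion_pvalue` is a hypothetical helper.

```python
import math


def two_proportion_pvalue(n1: int, bads1: int, n2: int, bads2: int) -> float:
    """Two-sided p-value for H0: both bins share the same bad rate.

    Uses the pooled two-proportion z-test with a normal approximation.
    """
    rate1, rate2 = bads1 / n1, bads2 / n2
    pooled = (bads1 + bads2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if std_err == 0:
        return 1.0  # identical, degenerate rates: no evidence of a difference
    z = (rate1 - rate2) / std_err
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


threshold = 0.4  # assumed to play the role of init_pvalue

# Bad rates 0.20 vs 0.22: statistically similar, so a merge candidate.
print(two_proportion_pvalue(100, 20, 100, 22) > threshold)  # True

# Bad rates 0.10 vs 0.40: clearly different, so the bins stay separate.
print(two_proportion_pvalue(100, 10, 100, 40) > threshold)  # False
```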
For example, with the variable 'Creditamount':

    # Run the MOB algorithm to discretize the variable 'Creditamount'.
    MOB_ALGO = MOB(data=df, var='Creditamount', response='default', exclude_value=None)

    # Set the binning constraints (must-do!).
    MOB_ALGO.setBinningConstraints(max_bins=6, min_bins=3,
                                   max_samples=0.4, min_samples=0.05,
                                   min_bads=0.05,
                                   init_pvalue=0.4,
                                   maximize_bins=True)

    # mergeMethod='Size' runs the MOB algorithm under the bins size base;
    # mergeMethod='Stats' runs it under the statistical base.
    SizeBinning = MOB_ALGO.runMOB(mergeMethod='Size')
    StatsBinning = MOB_ALGO.runMOB(mergeMethod='Stats')

    # Plot the bin summary data.
    print('Bins Size Base')
    MOB_PLOT.plotBinsSummary(monoOptBinTable=SizeBinning, var_name='Creditamount')
    print('Statistical Base')
    MOB_PLOT.plotBinsSummary(monoOptBinTable=StatsBinning, var_name='Creditamount')
SizeBinning | StatsBinning |
---|---|
mergeMethod = 'Size' (bins size base) | mergeMethod = 'Stats' (statistical base) |
The left image is the result generated by `mergeMethod = 'Size'` (bins size base), and the right image is the result generated by `mergeMethod = 'Stats'` (statistical base). We can see that the `Size` method merges the bins that do not meet the minimum sample population, while keeping the number of bins from falling below the minimum bins limit.
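The size-based behaviour described above can be sketched as a small loop: keep merging the smallest undersized bin into its smaller neighbour, and stop once every bin meets the minimum share or the minimum number of bins is reached. This is a simplified illustration under those stated assumptions, not MOBPY's actual merge logic; `merge_small_bins` and its parameters are hypothetical names that mirror `min_samples` and `min_bins`.

```python
from typing import List, Sequence


def merge_small_bins(counts: Sequence[int], min_share: float, min_bins: int) -> List[int]:
    """Merge undersized bins without going below `min_bins` bins.

    `counts` holds the number of samples per bin, ordered by the variable.
    """
    counts = list(counts)
    total = sum(counts)
    while len(counts) > min_bins:
        smallest = min(range(len(counts)), key=counts.__getitem__)
        if counts[smallest] / total >= min_share:
            break  # every bin already meets the minimum share
        # Pick the adjacent bin to absorb the undersized one.
        if smallest == 0:
            neighbor = 1
        elif smallest == len(counts) - 1:
            neighbor = smallest - 1
        elif counts[smallest - 1] <= counts[smallest + 1]:
            neighbor = smallest - 1
        else:
            neighbor = smallest + 1
        counts[neighbor] += counts[smallest]
        del counts[smallest]
    return counts


# The 20-sample bin (2% of 1000) is below the 5% minimum and gets merged.
print(merge_small_bins([380, 40, 20, 360, 200], min_share=0.05, min_bins=3))
# [380, 60, 360, 200]

# Already at the minimum number of bins: nothing is merged.
print(merge_small_bins([500, 10, 490], min_share=0.05, min_bins=3))
# [500, 10, 490]
```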
Environment
OS : macOS Ventura
IDE: Visual Studio Code 1.79.2 (Universal)
Language : Python 3.9.7
- pandas 1.3.4
- numpy 1.20.3
- scipy 1.7.1
- matplotlib 3.7.1
Citation
Reference
- Testing Dataset : German Credit Risk from Kaggle
- GitHub Project : Monotone Optimal Binning (SAS 9.4 version)
Authors
- Chen, Ta-Hung (Denny)
  - LinkedIn Profile : https://www.linkedin.com/in/dennychen-tahung/
  - E-Mail : denny20700@gmail.com
- Tsai, Yu-Cheng (Darren)
  - LinkedIn Profile : https://www.linkedin.com/in/darren-yucheng-tsai/
  - E-Mail :