Python Implementation of Monotonic Optimal Binning

Introduction

Modeled on the mob R package (https://CRAN.R-project.org/package=mob), py_mob is a collection of Python functions that generate monotonic binning and perform the WoE (Weight of Evidence) transformation used in consumer credit scorecard development.

Because the WoE is a piecewise constant transformation in the context of logistic regression, it has also been employed in other use cases, such as consumer credit loss estimation, prepayment, and even fraud detection models. In addition to monotonic binning and the WoE transformation, the Information Value and KS statistic of each independent variable are calculated to evaluate the variable's predictiveness.

Unlike other Python packages for the same purpose, py_mob is very lightweight: the underlying computation is driven by built-in Python lists or numpy arrays. Functions return lists of dictionaries, which can be easily converted to other data structures, such as pandas.DataFrame or astropy.table.

Currently, six different monotonic binning algorithms are implemented, namely qtl_bin(), bad_bin(), iso_bin(), rng_bin(), kmn_bin(), and gbm_bin(). For details, please see https://github.com/statcompute/py_mob.
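
To illustrate what "monotonic binning" means in practice, here is a minimal sketch of one way a quantile-based search could work: try progressively coarser quantile cut points until the bad rate moves in a single direction across bins. This is an assumption-laden illustration, not the package's actual qtl_bin() implementation; the function names (bin_rates, is_monotonic, quantile_monotonic_cuts), the max_bins parameter, and the toy data are all hypothetical.

```python
from bisect import bisect_left
from statistics import quantiles


def bin_rates(x, y, cuts):
    """Bad rate per bin, where bin i holds values with cuts[i-1] < x <= cuts[i].

    Returns None if any bin is empty.
    """
    n_bins = len(cuts) + 1
    freq = [0] * n_bins
    bads = [0] * n_bins
    for xi, yi in zip(x, y):
        i = bisect_left(cuts, xi)
        freq[i] += 1
        bads[i] += yi
    if min(freq) == 0:
        return None
    return [b / f for b, f in zip(bads, freq)]


def is_monotonic(rates):
    """True if the rates are strictly increasing or strictly decreasing."""
    pairs = list(zip(rates, rates[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def quantile_monotonic_cuts(x, y, max_bins=10):
    """Hypothetical search: start fine, coarsen until bad rates are monotonic."""
    for n in range(max_bins, 1, -1):
        cuts = sorted(set(quantiles(x, n=n)))
        rates = bin_rates(x, y, cuts)
        if rates is not None and is_monotonic(rates):
            return cuts, rates
    return [], []  # no monotonic quantile binning found


# Toy data (made up): the outcome is mostly 1 for large X.
x = list(range(1, 41))
y = [1 if (v > 30 or v % 10 == 0) else 0 for v in x]
cuts, rates = quantile_monotonic_cuts(x, y)
print(cuts)
print(rates)
```

The sketch ignores missing values and does not rank candidate binnings by Information Value, both of which a production implementation would likely need to handle.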

Installation

pip3 install py_mob

Core Functions

py_mob
  |-- qtl_bin()  : An iterative discretization based on quantiles of X.  
  |-- bad_bin()  : A revised iterative discretization for records with Y = 1.
  |-- iso_bin()  : A discretization algorithm driven by the isotonic regression between X and Y.
  |-- rng_bin()  : A revised iterative discretization based on the equal-width range of X.  
  |-- kmn_bin()  : A discretization algorithm based on the k-means clustering of X.
  |-- gbm_bin()  : A discretization algorithm based on the gradient boosting machine.
  |-- summ_bin() : Generates the statistical summary for the binning outcome. 
  |-- view_bin() : Displays the binning outcome in a tabular form. 
  `-- cal_woe()  : Applies the WoE transformation to a numeric vector based on the binning outcome.
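
Conceptually, applying the WoE transformation amounts to a lookup: each value of X is mapped to the WoE of the bin it falls in. The following is a minimal sketch of that idea, not the actual cal_woe() signature or implementation. The helper name apply_woe is hypothetical, the cut point and WoE values are taken from the Example output below, and missing values are not handled.

```python
from bisect import bisect_left


def apply_woe(x, cuts, woes):
    """Map each value to the WoE of its bin.

    Bin i covers cuts[i-1] < x <= cuts[i], matching rules such as
    'X <= 30.0' and 'X > 30.0'. Hypothetical helper for illustration.
    """
    if len(woes) != len(cuts) + 1:
        raise ValueError("need exactly one WoE value per bin")
    # bisect_left returns i for a value equal to cuts[i], so the
    # boundary value lands in the lower bin, as the '<=' rule requires.
    return [woes[bisect_left(cuts, v)] for v in x]


cuts = [30.0]               # from the 'cut' output in the Example
woes = [-0.3198, 0.2763]    # from the 'woe' column in the Example
transformed = apply_woe([12.5, 30.0, 30.1, 95.0], cuts, woes)
print(transformed)  # [-0.3198, -0.3198, 0.2763, 0.2763]
```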

Example

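import sas7bdat, py_mob

# Load the sample data set into a data frame.
df = sas7bdat.SAS7BDAT("accepts.sas7bdat").to_data_frame()

# X: the numeric predictor to bin; Y: the binary outcome.
utl = df.rev_util.to_numpy()
bad = df.bad.to_numpy()

# Bin X against Y using quantile-based cut points.
utl_bin = py_mob.qtl_bin(utl, bad)

# The result holds the cut points ("cut") and the per-bin statistics ("tbl").
for key in utl_bin:
  print(key + ":")
  for lst in utl_bin[key]:
    print(lst)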
#cut:
#30.0
#tbl:
#{'bin': 1, 'freq': 2962, 'miss': 0, 'bads': 467.0, 'rate': 0.1577, 'woe': -0.3198, 'iv': 0.047, 
# 'rule': '$X$ <= 30.0'}
#{'bin': 2, 'freq': 2875, 'miss': 0, 'bads': 729.0, 'rate': 0.2536, 'woe': 0.2763, 'iv': 0.0406, 
# 'rule': '$X$ > 30.0'}

py_mob.view_bin(utl_bin)
#|   bin |   freq |   miss |   bads |   rate |     woe |     iv | rule        |
#|-------|--------|--------|--------|--------|---------|--------|-------------|
#|     1 |   2962 |      0 |    467 | 0.1577 | -0.3198 | 0.047  | $X$ <= 30.0 |
#|     2 |   2875 |      0 |    729 | 0.2536 |  0.2763 | 0.0406 | $X$ > 30.0  |

py_mob.summ_bin(utl_bin)
#{'bad rate': 0.2049, 'iv': 0.0876, 'ks': 14.71}
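
The statistics in the output above can be checked by hand from the per-bin counts. The sketch below assumes the common definitions WoE = ln(share of bads / share of goods) per bin, IV = sum of (share of bads - share of goods) * WoE, and KS = maximum absolute gap between the cumulative bad and good shares, in percent. These definitions are an assumption inferred from the numbers shown, and the function name woe_summary is hypothetical; it is not part of py_mob.

```python
from math import log

# Per-bin counts copied from the 'tbl' output above.
bins = [
    {"bin": 1, "freq": 2962, "bads": 467},
    {"bin": 2, "freq": 2875, "bads": 729},
]


def woe_summary(bins):
    """Return per-bin (woe, iv) pairs plus the total IV and the KS statistic.

    Assumes every bin contains at least one good and one bad record;
    otherwise the logarithm is undefined.
    """
    total_bads = sum(b["bads"] for b in bins)
    total_goods = sum(b["freq"] - b["bads"] for b in bins)
    rows = []
    cum_bad = cum_good = ks = 0.0
    for b in bins:
        bad_share = b["bads"] / total_bads
        good_share = (b["freq"] - b["bads"]) / total_goods
        woe = log(bad_share / good_share)
        iv = (bad_share - good_share) * woe
        cum_bad += bad_share
        cum_good += good_share
        ks = max(ks, abs(cum_bad - cum_good))
        rows.append((woe, iv))
    total_iv = sum(iv for _, iv in rows)
    return rows, total_iv, ks * 100


rows, total_iv, ks = woe_summary(bins)
for woe, iv in rows:
    print(round(woe, 4), round(iv, 4))
print(round(ks, 2))
```

Under these assumptions the per-bin WoE and IV and the KS value agree with the output above; the total IV may differ in the last digit depending on whether per-bin values are rounded before summing.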
