Monotonic Variable Binning by WOE

Monotonic-WOE-Binning-Algorithm

This algorithm is based on the excellent paper by Mironchyk and Tchistiakov (2017) named "Monotone optimal binning algorithm for credit risk modeling".

How to use

  1. pip install monotonic_binning: pip install -i https://test.pypi.org/simple/simple/ monotonic-binning
  2. Import monotonic_woe_binning: from monotonic_binning import monotonic_woe_binning as bin
  3. Use fit and transform to bin variables for train and test datasets respectively

Demo Run Details

The demo_run.py file available under tests/ uses German credit card data from Penn State's online course and gives an overview of how to use the package.

Summary of Monotonic WOE

The weight-of-evidence (WOE) method of evaluating the strength of predictors is an underrated one in the field of analytics. While it is standard fare in credit risk modelling, it is under-utilized in other settings, even though its formulation is generic enough for use in other domains. The WOE method primarily aims to bin variables into buckets that deliver the most information to a potential classification model. WOE binning methods often measure the effectiveness of such bins using Information Value (IV). For a more detailed introduction to WOE and IV, this article is a useful read.
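As a rough illustration of how WOE and IV are computed per bin, here is a minimal sketch in plain Python. It assumes the common convention WOE = ln(share of non-defaults / share of defaults), under which a higher WOE means a lower default rate. The function name `woe_iv` and the counts are hypothetical and are not part of this package.

```python
import math

def woe_iv(bins):
    """Compute WOE per bin and the total IV.

    bins: list of (n_good, n_bad) counts, where good = 0 (no default)
    and bad = 1 (default). Every count must be non-zero, otherwise the
    logarithm is undefined.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    woes, iv = [], 0.0
    for good, bad in bins:
        dist_good = good / total_good  # share of all goods in this bin
        dist_bad = bad / total_bad     # share of all bads in this bin
        woe = math.log(dist_good / dist_bad)
        woes.append(woe)
        iv += (dist_good - dist_bad) * woe
    return woes, iv

# Hypothetical counts for three bins of one variable
bins = [(100, 50), (200, 40), (300, 10)]
woes, iv = woe_iv(bins)
```

Each IV term is non-negative, because `dist_good - dist_bad` and the WOE always share the same sign, so bins that separate the two classes well push the IV up.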

In the world of credit risk modelling, regulatory oversight often requires that the variables that go into models are split into bins that

  • have weight of evidence (WOE) values that maintain a monotonic relationship with the 1/0 target variable (for example, loan default or no default),
  • are reasonably sized and large enough to be representative of population segments, and
  • maximize the IV of the given variable in the process of this binning.
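The first two requirements can be checked mechanically once a candidate binning exists. The sketch below is only an illustration: the helper names and the 5% minimum share are assumptions chosen for the example, not thresholds taken from this package or from any regulation.

```python
def is_monotonic(values):
    """True if values are strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)

def bins_large_enough(counts, min_share=0.05):
    """True if every bin holds at least min_share of all observations."""
    total = sum(counts)
    return all(count / total >= min_share for count in counts)

# Hypothetical WOE values and bin sizes for a three-bin variable
woe_by_bin = [-1.10, -0.18, 1.61]
bin_sizes = [150, 240, 310]

ok = is_monotonic(woe_by_bin) and bins_large_enough(bin_sizes)
```

The third requirement, maximizing IV, is a search over candidate cutoffs rather than a yes/no check, which is why it is not covered here.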

To illustrate the constraints of such a problem, consider a simple dataset containing age and a default indicator (1 if defaulted, 0 if not). In one possible scenario, the variable is binned into three groups in such a manner that their WOE values increase monotonically as the ages of customers increase.

The WOE is derived in such a manner that as the WOE value increases, the default rate decreases. So we can infer that younger customers are more likely to default than older customers.

Arriving at the perfect bin cutoffs to meet all three requirements discussed earlier is a non-trivial exercise. Most statistical software packages provide this type of optimal discretization of interval variables; R's smbinning package and SAS's PROC TRANSREG are two such examples. To my knowledge, Python's solutions to this problem are fairly sparse.

This package is an attempt to complement already exhaustive packages like scorecardpy with the capability to bin variables with monotonic WOE.
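To give a feel for what monotonic binning involves, here is a simplified sketch that merges adjacent fine-grained bins until the default rate is strictly monotonic, in the spirit of pool-adjacent-violators. It is not this package's implementation and not the full algorithm from the paper: it omits the bin-size and IV considerations described above, and the function names and counts are hypothetical.

```python
def bad_rate(bin_):
    """Default rate of a bin given as [upper_edge, n_good, n_bad]."""
    _, good, bad = bin_
    return bad / (good + bad)

def merge_to_monotonic(bins):
    """Merge adjacent bins until the default rate strictly decreases.

    bins: list of (upper_edge, n_good, n_bad), ordered by the variable.
    Returns the merged bins in the same format.
    """
    merged = []
    for edge, good, bad in bins:
        merged.append([edge, good, bad])
        # Pool backwards while the newest bin breaks the decreasing order
        while len(merged) > 1 and bad_rate(merged[-1]) >= bad_rate(merged[-2]):
            last_edge, last_good, last_bad = merged.pop()
            merged[-1][0] = last_edge
            merged[-1][1] += last_good
            merged[-1][2] += last_bad
    return [tuple(b) for b in merged]

# Hypothetical fine bins: (upper age, non-defaults, defaults)
fine_bins = [(25, 40, 20), (30, 45, 25), (40, 90, 20), (50, 80, 22), (65, 120, 10)]
result = merge_to_monotonic(fine_bins)
# result == [(30, 85, 45), (50, 170, 42), (65, 120, 10)]
```

Here the five starting bins collapse to three whose default rate falls with age. A practical binning would also have to enforce minimum bin sizes and weigh the resulting IV.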
