Monotonic Optimal Binning for Loss Models

Introduction

Modeled on the py_mob package (https://pypi.org/project/py-mob) for binary outcomes, loss_mob is a collection of Python functions that generate monotonic binning and perform the variable transformation for loss or severity outcomes such that the Spearman correlation between the transformed $X$, i.e. $F(X_i)$, and $E(Y_i | X_i)$ is equal to 1. For loss models with the $Ln()$ link function, the transformation is derived in the training sample as $F(x)_i = Ln \frac{\sum_i Y / \sum_i Exposure}{\sum Y / \sum Exposure}$, where $Exposure$ is the number of cases, $i$ refers to the $i$-th bin grouped by $x$ values, $\sum_i$ sums over the records in bin $i$, and $\sum$ sums over all records.
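
As a rough illustration of the formula, the sketch below recomputes the transformed values from the per-bin record counts and sums of $Y$ reported for vehage in the Example section, taking the exposure of each bin to be its record count. The helper name `log_relativity` is hypothetical and is not part of loss_mob.

```python
import math

# Per-bin (record count, sum of Y) copied from the vehage binning output
# in the Example section; exposure is assumed to be the record count.
bins = [
    (356354, 114686591.4672),
    (194371, 69559830.5303),
    (127288, 75609359.3214),
]

def log_relativity(bins):
    """Hypothetical helper: ln(bin average / overall average) for each bin."""
    total_exposure = sum(n for n, _ in bins)
    total_y = sum(y for _, y in bins)
    overall_avg = total_y / total_exposure
    return [math.log((y / n) / overall_avg) for n, y in bins]

newx = log_relativity(bins)
print([round(v, 4) for v in newx])
# [-0.1747, -0.0685, 0.4382]
```

The printed values agree with the newx column of the vehage binning table to four decimal places, which suggests the package computes the log of each bin average relative to the overall average.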

The py_mob package has also been presented at the 2023 CAS (Casualty Actuarial Society) Annual Meeting.

Should you have any questions or suggestions about the package, please feel free to drop me a line.

Core Functions

loss_mob
  |-- qtl_bin()  : Iterative discretization based on quantiles of X.  
  |-- los_bin()  : Revised iterative discretization for records with Y > 0.
  |-- iso_bin()  : Discretization driven by the isotonic regression. 
  |-- val_bin()  : Revised iterative discretization based on unique values of X.  
  |-- rng_bin()  : Revised iterative discretization based on the equal-width range of X.  
  |-- gbm_bin()  : Discretization based on the gradient boosting machine (GBM).  
  |-- cus_bin()  : Customized discretization based on pre-determined cut points.  
  |-- view_bin() : Displays the binning outcome in a tabular form. 
  |-- cal_newx() : Applies the variable transformation to a numeric vector based on the binning outcome.
  |-- chk_newx() : Verifies the transformation generated from the cal_newx() function.
  |-- mi_score() : Calculates the Mutual Information (MI) score between X and Y.
  |-- screen()   : Calculates Spearman and Distance Correlations between X and Y.
  |-- bin_gini() : Calculates the gini-coefficient between X and Y based on the binning outcome.
  |-- smape()    : Calculates the sMAPE value between Y and Yhat.
  `-- get_mtpl() : Extracts French Motor Third-Party Liability Claims dataset from OpenML.
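
To give a feel for what an iterative quantile-based discretization might look like, here is a minimal standard-library sketch. It is not the source of qtl_bin(); it assumes one plausible scheme, in which the number of quantile bins is reduced until the per-bin averages of Y are strictly monotone in X. The names `monotone` and `quantile_monotone_bins` and the `max_bins` parameter are hypothetical.

```python
from bisect import bisect_left
from statistics import quantiles

def monotone(values):
    """True if the sequence is strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)

def quantile_monotone_bins(x, y, max_bins=10):
    """Hypothetical helper: try fewer and fewer quantile bins until the
    per-bin averages of y are monotone in x. Returns the upper cut points,
    or an empty list if no such binning is found."""
    hi = max(x)
    for n in range(max_bins, 1, -1):
        # Interior quantile cut points, de-duplicated and sorted.
        cuts = sorted(set(quantiles(x, n=n, method="inclusive")))
        # A cut equal to max(x) would leave the last bin empty.
        cuts = [c for c in cuts if c < hi]
        if not cuts:
            continue
        sums = [0.0] * (len(cuts) + 1)
        cnts = [0] * (len(cuts) + 1)
        for xi, yi in zip(x, y):
            k = bisect_left(cuts, xi)  # bin k holds values <= cuts[k]
            sums[k] += yi
            cnts[k] += 1
        # Accept only if every bin is populated and the averages are monotone.
        if all(cnts) and monotone([s / c for s, c in zip(sums, cnts)]):
            return cuts
    return []

x = list(range(1, 101))
linear_cuts = quantile_monotone_bins(x, [2 * v for v in x])
u_shape_cuts = quantile_monotone_bins(x, [(v - 50) ** 2 for v in x])
print(len(linear_cuts), u_shape_cuts)
```

With a strictly increasing relationship the sketch keeps all ten bins, while a U-shaped relationship forces it down to a single cut, since no finer quantile binning has monotone averages.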

Example

import loss_mob as mob

# LOAD THE DATASET
data = mob.get_mtpl()

data.keys()
# dict_keys(['idpol', 'claimnb', 'exposure', 'area', 'vehpower', 'vehage', 'drivage', 
# 'bonusmalus', 'vehbrand', 'vehgas', 'density', 'region', 'claimamount', 'purepremium'])

var = ['vehpower', 'vehage', 'drivage', 'bonusmalus', 'density']

# SCREEN EACH VARIABLE OF INTEREST
rst = [{"variable": _, **mob.screen(data[_], data["purepremium"])} for _ in var]

# RANK VARIABLES BY DISTANCE CORRELATION
for _ in sorted(rst, key = lambda x: -abs(x["distance correlation"])):
  print(_)

# {'variable': 'bonusmalus', 'total records': 678013, 'nonmissing records': 678013, 'missing percent': 0.0, 'unique value count': 115, 'coefficient of variation': 0.26165082, 'spearman correlation': 0.05716908, 'distance correlation': 0.0434537}
# {'variable': 'drivage', 'total records': 678013, 'nonmissing records': 678013, 'missing percent': 0.0, 'unique value count': 83, 'coefficient of variation': 0.31071883, 'spearman correlation': -0.004906, 'distance correlation': 0.01428907}
# {'variable': 'density', 'total records': 678013, 'nonmissing records': 678013, 'missing percent': 0.0, 'unique value count': 1607, 'coefficient of variation': 2.20854394, 'spearman correlation': 0.02022122, 'distance correlation': 0.01106909}
# {'variable': 'vehage', 'total records': 678013, 'nonmissing records': 678013, 'missing percent': 0.0, 'unique value count': 78, 'coefficient of variation': 0.80437458, 'spearman correlation': 0.01952645, 'distance correlation': 0.01080137}
# {'variable': 'vehpower', 'total records': 678013, 'nonmissing records': 678013, 'missing percent': 0.0, 'unique value count': 12, 'coefficient of variation': 0.31774149, 'spearman correlation': 0.00230745, 'distance correlation': 0.00356986}

# GENERATE BINNING BASED ON GBM FOR EACH VARIABLE
bout = dict((v, mob.gbm_bin(data[v], data["purepremium"])) for v in var)
mob.view_bin(bout["vehage"])

# |  bin  |   freq |   miss |           ysum |     yavg |        newx |         rule              |
# |-------|--------|--------|----------------|----------|-------------|---------------------------|
# |   1   | 356354 |      0 | 114686591.4672 | 321.8333 | -0.17468183 | $X$ <= 6                  |
# |   2   | 194371 |      0 |  69559830.5303 | 357.8714 | -0.06854178 | $X$ > 6 and $X$ <= 12     |
# |   3   | 127288 |      0 |  75609359.3214 | 594.0023 |  0.43816751 | $X$ > 12                  |

# VARIABLE TRANSFORMATION
dout = mob.cal_newx(data['vehage'], bout["vehage"])
mob.head(dout)

# {'x': 1, 'bin': 1, 'newx': -0.17468183}
# {'x': 5, 'bin': 1, 'newx': -0.17468183}
# {'x': 0, 'bin': 1, 'newx': -0.17468183}

# VALIDATE THE TRANSFORMATION
mob.chk_newx(dout)

# |  bin  |        newx |   freq |    dist    |         xrng              |
# |-------|-------------|--------|------------|---------------------------|
# |   1   | -0.17468183 | 356354 |   52.5586% |                  0 <==> 6 |
# |   2   | -0.06854178 | 194371 |   28.6677% |                 7 <==> 12 |
# |   3   |  0.43816751 | 127288 |   18.7737% |               13 <==> 100 |
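
The rule column of the vehage binning can be read as a lookup on sorted upper cut points. The sketch below shows one way to apply such rules to new values with the standard-library bisect module; it is an illustration only, not the implementation of cal_newx(), and the helper name `apply_bins` is hypothetical. Missing values are not handled here.

```python
from bisect import bisect_left

# Upper cut points and transformed values copied from the vehage binning above.
cuts = [6, 12]
newx = [-0.17468183, -0.06854178, 0.43816751]

def apply_bins(values, cuts, newx):
    """Hypothetical helper: map each value to its bin number and newx."""
    out = []
    for v in values:
        k = bisect_left(cuts, v)  # v <= cuts[k] falls in bin k + 1
        out.append({"x": v, "bin": k + 1, "newx": newx[k]})
    return out

rows = apply_bins([1, 5, 0, 6, 7, 12, 13], cuts, newx)
for row in rows:
    print(row)
```

The first three inputs reproduce the rows shown after the cal_newx() call, and the boundary values 6, 7, 12 and 13 land in bins 1, 2, 2 and 3, consistent with the ranges reported by chk_newx().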

Authors

WenSui Liu is a seasoned data scientist with 15 years of experience in the financial services industry.

Joyce Liu is a college student majoring in Mathematics with a strong passion for data science.
