scorecard, logistic regression
Project description
LAPRAS
Lapras is designed to make model development easy and convenient. It covers the following steps in one-key operations: exploratory data analysis, feature selection, feature binning, data visualization, scorecard modeling (a logistic regression model with excellent interpretability), and performance measurement.
Let's get started.
Usage
1. Exploratory Data Analysis: lapras.detect(), lapras.eda(), lapras.quality(), lapras.IV(), lapras.VIF(), lapras.PSI()
2. Feature Selection: lapras.select(), lapras.stepwise()
3. Binning: lapras.Combiner(), lapras.WOETransformer(), lapras.bin_stats(), lapras.bin_plot()
4. Modeling: lapras.ScoreCard()
5. Performance Measure: lapras.perform(), lapras.LIFT(), lapras.score_plot(), lapras.KS_bucket(), lapras.PPSI(), lapras.KS(), lapras.AUC()
6. One-Key Auto Modeling: lapras.auto_model() runs all the steps above automatically (a minimal call is sketched below).
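For orientation, here is the one-key path in full, using the same demo file and column names as the walk-through in the Documents section below:

```python
import pandas as pd
import lapras

# Demo data from this document: binary label column 'bad', unused 'id' column
df = pd.read_csv('data/demo.csv', encoding="utf-8")

# One call runs EDA, binning, WOE transformation, filtering, and scorecard fitting
card = lapras.auto_model(df=df, target='bad', to_drop=['id'])
```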
Install
via pip
pip install lapras --upgrade -i https://pypi.org/simple
via source code
python setup.py install
install_requires = [ 'numpy >= 1.18.4', 'pandas >= 0.25.1', 'scipy >= 1.3.2', 'scikit-learn >= 0.22.2', 'seaborn >= 0.10.1', 'statsmodels >= 0.13.1', 'tensorflow >= 2.2.0', 'hyperopt >= 0.2.7', 'pickle >= 4.0', 'plotly >= 5.9.0', ]
Documents
import lapras
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib as mpl
import matplotlib.pyplot as plt
pd.options.display.max_colwidth = 100
import math
%matplotlib inline
# Read in data file
df = pd.read_csv('data/demo.csv',encoding="utf-8")
to_drop = ['id'] # exclude features that will not be used, e.g. id
target = 'bad' # Y label name
train_df, test_df, _, _ = train_test_split(df, df[[target]], test_size=0.3, random_state=42) # split into training and testing sets, strongly recommended
# EDA(Exploratory Data Analysis)
# Parameter details:
# dataframe=None
lapras.detect(train_df).sort_values("missing")
| | type | size | missing | unique | mean_or_top1 | std_or_top2 | min_or_top3 | 1%_or_top4 | 10%_or_top5 | 50%_or_bottom5 | 75%_or_bottom4 | 90%_or_bottom3 | 99%_or_bottom2 | max_or_bottom1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| id | int64 | 5502 | 0.0000 | 5502 | 3947.266630 | 2252.395671 | 2.0 | 87.03 | 820.1 | 3931.5 | 5889.25 | 7077.8 | 7782.99 | 7861.0 |
| bad | int64 | 5502 | 0.0000 | 2 | 0.073246 | 0.260564 | 0.0 | 0.00 | 0.0 | 0.0 | 0.00 | 0.0 | 1.00 | 1.0 |
| score | int64 | 5502 | 0.0000 | 265 | 295.280625 | 66.243181 | 0.0 | 0.00 | 223.0 | 303.0 | 336.00 | 366.0 | 416.00 | 461.0 |
| age | float64 | 5502 | 0.0002 | 34 | 27.659880 | 4.770299 | 19.0 | 21.00 | 23.0 | 27.0 | 30.00 | 34.0 | 43.00 | 53.0 |
| wealth | float64 | 5502 | 0.0244 | 18 | 4.529806 | 1.823149 | 1.0 | 1.00 | 3.0 | 4.0 | 5.00 | 7.0 | 10.00 | 22.0 |
| education | float64 | 5502 | 0.1427 | 5 | 3.319483 | 1.005660 | 1.0 | 1.00 | 2.0 | 4.0 | 4.00 | 4.0 | 5.00 | 5.0 |
| period | float64 | 5502 | 0.1714 | 5 | 7.246326 | 1.982060 | 4.0 | 4.00 | 6.0 | 6.0 | 10.00 | 10.0 | 10.00 | 14.0 |
| max_unpay_day | float64 | 5502 | 0.9253 | 11 | 185.476886 | 22.339647 | 28.0 | 86.00 | 171.0 | 188.0 | 201.00 | 208.0 | 208.00 | 208.0 |
# Visual exploratory data analysis; the commented lines below show the optional parameters
# feature_list = ['age', 'education', 'score']
# exclude_list = ['id']
# bins_map = {'age':[-1,20,30,99]}
# data_type_map = {'education':'discrete'}
# labels_map = {'education':{'1.0':'111','2.0':'222','3.0':'333','4.0':'444','5.0':'555'}}
# lapras.eda(df,feature_list=feature_list , exclude_list = exclude_list, bins_map=bins_map,
# labels_map=labels_map, data_type_map=data_type_map, max_bins=6)
lapras.eda(df)
# Calculate the IV of features (by default via decision tree binning)
# Parameter details:
# dataframe=None original data
# target = 'target' Y label name
lapras.quality(train_df.drop(to_drop,axis=1),target = target)
| | iv | unique |
|---|---|---|
| score | 0.758342 | 265.0 |
| age | 0.504588 | 35.0 |
| wealth | 0.275775 | 19.0 |
| education | 0.230553 | 6.0 |
| max_unpay_day | 0.170061 | 12.0 |
| period | 0.073716 | 6.0 |
# Calculate the PSI between training and testing sets for each feature
# Parameter details:
# actual=None feature values from the base set (here, training)
# predict=None feature values from the comparison set (here, testing)
# bins=10 count of bins
# return_frame=False return the binning dataframe if set to True
cols = list(lapras.quality(train_df,target = target).reset_index()['index'])
for col in cols:
    if col not in [target]:
        print("%s: %.4f" % (col, lapras.PSI(train_df[col], test_df[col])))
score: 0.1500
age: 0.0147
wealth: 0.0070
education: 0.0010
max_unpay_day: 0.0042
id: 0.0000
period: 0.0030
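For intuition, PSI sums (a% − p%) · ln(a% / p%) over bins of the two distributions. Below is a hand-rolled sketch; the helper psi_manual is not part of lapras, uses quantile bin edges from the actual sample, and assumes a non-constant numeric feature, so results may differ slightly from lapras.PSI:

```python
import numpy as np
import pandas as pd

def psi_manual(actual: pd.Series, predict: pd.Series, bins: int = 10) -> float:
    """PSI = sum over bins of (a% - p%) * ln(a% / p%)."""
    # Quantile bin edges from the actual sample; assumes a non-constant numeric feature
    edges = np.unique(np.quantile(actual.dropna(), np.linspace(0, 1, bins + 1)))
    edges[0], edges[-1] = -np.inf, np.inf          # make the outer bins open-ended
    a = pd.cut(actual, edges).value_counts(normalize=True, sort=False)
    p = pd.cut(predict, edges).value_counts(normalize=True, sort=False)
    a, p = a.clip(lower=1e-6), p.clip(lower=1e-6)  # avoid log(0) in empty bins
    return float(((a.values - p.values) * np.log(a.values / p.values)).sum())

# psi_manual(train_df['score'], test_df['score'])  # comparable to lapras.PSI above
```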
# Calculate VIF
# Parameter details:
# dataframe=None
lapras.VIF(train_df.drop(['id','bad'],axis=1))
wealth 1.124927
max_unpay_day 2.205619
score 18.266471
age 17.724547
period 1.193605
education 1.090158
dtype: float64
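For reference, the VIF of a feature is 1 / (1 − R²) from regressing it on the remaining features. A cross-check with statsmodels (vif_manual is an illustrative helper, not part of lapras; the NaN handling inside lapras.VIF may differ):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_manual(frame: pd.DataFrame) -> pd.Series:
    """VIF_i = 1 / (1 - R_i^2), regressing feature i on all the others."""
    X = sm.add_constant(frame.dropna())  # drop NaN rows, add an intercept column
    return pd.Series(
        [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])],
        index=frame.columns,
    )

# vif_manual(train_df.drop(['id', 'bad'], axis=1))
```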
# Calculate the IV of a single feature
# Parameter details:
# feature=None feature data
# target=None Y label data
lapras.IV(train_df['age'],train_df[target])
0.5045879202656338
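Conceptually, IV = Σᵢ (badᵢ/bad_total − goodᵢ/good_total) · WOEᵢ over the bins. A stand-alone sketch over any discrete binning (iv_manual is illustrative, not part of lapras; lapras.IV bins with a decision tree by default, so its values will differ from a quantile binning):

```python
import numpy as np
import pandas as pd

def iv_manual(bins: pd.Series, target: pd.Series) -> float:
    """IV = sum over bins of (bad_i/bad_total - good_i/good_total) * WOE_i."""
    stats = pd.DataFrame({'bin': bins, 'y': target}).groupby('bin')['y'] \
              .agg(bad='sum', total='count')
    bad = stats['bad'] / stats['bad'].sum()
    good = (stats['total'] - stats['bad']) / (stats['total'] - stats['bad']).sum()
    woe = np.log((bad + 1e-10) / (good + 1e-10))   # small epsilon guards empty bins
    return float(((bad - good) * woe).sum())

# With quantile bins (lapras defaults to decision-tree bins, so values differ):
# iv_manual(pd.qcut(train_df['age'], 10, duplicates='drop'), train_df['bad'])
```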
# Feature filtering
# Parameter details:
# frame=None original data
# target=None Y label name
# empty=0.9 empty-value filtering: a feature is removed if its missing ratio is greater than the threshold
# iv=0.02 IV filtering: a feature is removed if its IV is less than the threshold
# corr=0.7 correlation filtering: a feature is removed if its correlation with another feature is greater than the threshold
# vif=False multicollinearity filtering: a feature is removed if its VIF is greater than the threshold; False by default because the calculation is expensive
# return_drop=False return the removed features if set to True
# exclude=None features listed here are kept regardless of the filters
train_selected, dropped = lapras.select(train_df.drop(to_drop,axis=1),target = target, empty = 0.95, \
iv = 0.05, corr = 0.9, vif = False, return_drop=True, exclude=[])
print(dropped)
print(train_selected.shape)
train_selected
{'empty': array([], dtype=float64), 'iv': array([], dtype=object), 'corr': array([], dtype=object)}
(5502, 7)
| | bad | wealth | max_unpay_day | score | age | period | education |
|---|---|---|---|---|---|---|---|
| 4168 | 0 | 4.0 | NaN | 288 | 23.0 | 6.0 | 4.0 |
| 605 | 0 | 4.0 | NaN | 216 | 32.0 | 6.0 | 4.0 |
| 3018 | 0 | 5.0 | NaN | 250 | 23.0 | 6.0 | 2.0 |
| 4586 | 0 | 7.0 | 171.0 | 413 | 31.0 | NaN | 2.0 |
| 1468 | 0 | 5.0 | NaN | 204 | 29.0 | 6.0 | 2.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 5226 | 0 | 4.0 | 171.0 | 346 | 23.0 | NaN | 3.0 |
| 5390 | 0 | 5.0 | NaN | 207 | 32.0 | NaN | 3.0 |
| 860 | 0 | 6.0 | NaN | 356 | 42.0 | 4.0 | 3.0 |
| 7603 | 0 | 3.0 | NaN | 323 | 34.0 | NaN | 3.0 |
| 7270 | 0 | 4.0 | NaN | 378 | 24.0 | 10.0 | 4.0 |
5502 rows × 7 columns
# Feature binning; the following methods are supported: monotonic binning, decision tree binning, k-means binning, equal frequency binning, equal step size binning
# Parameter details:
# X=None original data
# y=None Y label name
# method='dt' binning method: 'dt': decision tree binning (default), 'mono': monotonic binning, 'kmeans': k-means binning, 'quantile': equal frequency binning, 'step': equal step size binning
# min_samples=1 minimum number of samples in each bin: a count when greater than 1, a ratio of the total count when between 0 and 1
# n_bins=10 maximum number of bins
# c.load(dict) adjust the binning by loading a customized dict
# c.export() export the current binning as a dict
c = lapras.Combiner()
c.fit(train_selected, y=target, method='mono', min_samples=0.05, n_bins=8)  # empty_separate = False
# c.load({'age': [22.5, 23.5, 24.5, 25.5, 28.5, 36.5],
#         'education': [3.5],
#         'max_unpay_day': [59.5],
#         'period': [5.0, 9.0],
#         'score': [205.5, 236.5, 265.5, 275.5, 294.5, 329.5, 381.5],
#         'wealth': [2.5, 3.5, 6.5]})
c.export()
{'age': [23.0, 24.0, 25.0, 26.0, 28.0, 29.0, 37.0],
'education': [3.0, 4.0],
'max_unpay_day': [171.0],
'period': [6.0, 10.0],
'score': [237.0, 272.0, 288.0, 296.0, 330.0, 354.0, 384.0],
'wealth': [3.0, 4.0, 5.0, 7.0]}
# Transform the original data into binned data
# Parameter details:
# X=None original data
# labels=False binning labels are shown when set to True
c.transform(train_selected, labels=True).iloc[0:10,:]
| | bad | wealth | max_unpay_day | score | age | period | education |
|---|---|---|---|---|---|---|---|
| 4168 | 0 | 02.[4.0,5.0) | 00.[-inf,171.0) | 03.[288.0,296.0) | 01.[23.0,24.0) | 01.[6.0,10.0) | 02.[4.0,inf) |
| 605 | 0 | 02.[4.0,5.0) | 00.[-inf,171.0) | 00.[-inf,237.0) | 06.[29.0,37.0) | 01.[6.0,10.0) | 02.[4.0,inf) |
| 3018 | 0 | 03.[5.0,7.0) | 00.[-inf,171.0) | 01.[237.0,272.0) | 01.[23.0,24.0) | 01.[6.0,10.0) | 00.[-inf,3.0) |
| 4586 | 0 | 04.[7.0,inf) | 01.[171.0,inf) | 07.[384.0,inf) | 06.[29.0,37.0) | 00.[-inf,6.0) | 00.[-inf,3.0) |
| 1468 | 0 | 03.[5.0,7.0) | 00.[-inf,171.0) | 00.[-inf,237.0) | 06.[29.0,37.0) | 01.[6.0,10.0) | 00.[-inf,3.0) |
| 6251 | 0 | 03.[5.0,7.0) | 00.[-inf,171.0) | 01.[237.0,272.0) | 01.[23.0,24.0) | 02.[10.0,inf) | 00.[-inf,3.0) |
| 3686 | 0 | 00.[-inf,3.0) | 00.[-inf,171.0) | 00.[-inf,237.0) | 01.[23.0,24.0) | 01.[6.0,10.0) | 00.[-inf,3.0) |
| 3615 | 0 | 02.[4.0,5.0) | 00.[-inf,171.0) | 03.[288.0,296.0) | 06.[29.0,37.0) | 02.[10.0,inf) | 02.[4.0,inf) |
| 5338 | 0 | 00.[-inf,3.0) | 00.[-inf,171.0) | 04.[296.0,330.0) | 03.[25.0,26.0) | 02.[10.0,inf) | 00.[-inf,3.0) |
| 3985 | 0 | 03.[5.0,7.0) | 00.[-inf,171.0) | 01.[237.0,272.0) | 01.[23.0,24.0) | 01.[6.0,10.0) | 02.[4.0,inf) |
# Output bin_stats and bin_plot
# Parameter details:
# frame=None data transformed by Combiner, keeping binning labels
# col=None feature to be output
# target='target' Y label name
# Note: binning details may differ between training and testing sets due to population shift.
cols = list(lapras.quality(train_selected,target = target).reset_index()['index'])
for col in cols:
    if col != target:
        print(lapras.bin_stats(c.transform(train_selected[[col, target]], labels=True), col=col, target=target))
        lapras.bin_plot(c.transform(train_selected[[col, target]], labels=True), col=col, target=target)
score bad_count total_count bad_rate ratio woe \
0 00.[-inf,237.0) 136 805 0.168944 0.146310 0.944734
1 01.[237.0,272.0) 101 832 0.121394 0.151218 0.558570
2 02.[272.0,288.0) 46 533 0.086304 0.096874 0.178240
3 03.[288.0,296.0) 20 295 0.067797 0.053617 -0.083176
4 04.[296.0,330.0) 73 1385 0.052708 0.251727 -0.350985
5 05.[330.0,354.0) 18 812 0.022167 0.147583 -1.248849
6 06.[354.0,384.0) 8 561 0.014260 0.101963 -1.698053
7 07.[384.0,inf) 1 279 0.003584 0.050709 -3.089758
iv total_iv
0 0.194867 0.735116
1 0.059912 0.735116
2 0.003322 0.735116
3 0.000358 0.735116
4 0.026732 0.735116
5 0.138687 0.735116
6 0.150450 0.735116
7 0.160788 0.735116
age bad_count total_count bad_rate ratio woe \
0 00.[-inf,23.0) 90 497 0.181087 0.090331 1.028860
1 01.[23.0,24.0) 77 521 0.147793 0.094693 0.785844
2 02.[24.0,25.0) 57 602 0.094684 0.109415 0.280129
3 03.[25.0,26.0) 38 539 0.070501 0.097964 -0.041157
4 04.[26.0,28.0) 58 997 0.058175 0.181207 -0.246509
5 05.[28.0,29.0) 20 379 0.052770 0.068884 -0.349727
6 06.[29.0,37.0) 57 1657 0.034400 0.301163 -0.796844
7 07.[37.0,inf) 6 310 0.019355 0.056343 -1.387405
iv total_iv
0 0.147647 0.45579
1 0.081721 0.45579
2 0.009680 0.45579
3 0.000163 0.45579
4 0.009918 0.45579
5 0.007267 0.45579
6 0.137334 0.45579
7 0.062060 0.45579
wealth bad_count total_count bad_rate ratio woe \
0 00.[-inf,3.0) 106 593 0.178752 0.107779 1.013038
1 01.[3.0,4.0) 84 1067 0.078725 0.193929 0.078071
2 02.[4.0,5.0) 88 1475 0.059661 0.268084 -0.219698
3 03.[5.0,7.0) 99 1733 0.057126 0.314976 -0.265803
4 04.[7.0,inf) 26 634 0.041009 0.115231 -0.614215
iv total_iv
0 0.169702 0.236205
1 0.001222 0.236205
2 0.011787 0.236205
3 0.019881 0.236205
4 0.033612 0.236205
education bad_count total_count bad_rate ratio woe \
0 00.[-inf,3.0) 225 2123 0.105982 0.385860 0.405408
1 01.[3.0,4.0) 61 648 0.094136 0.117775 0.273712
2 02.[4.0,inf) 117 2731 0.042841 0.496365 -0.568600
iv total_iv
0 0.075439 0.211775
1 0.009920 0.211775
2 0.126415 0.211775
max_unpay_day bad_count total_count bad_rate ratio woe \
0 00.[-inf,171.0) 330 5098 0.064731 0.926572 -0.132726
1 01.[171.0,inf) 73 404 0.180693 0.073428 1.026204
iv total_iv
0 0.015426 0.134699
1 0.119272 0.134699
period bad_count total_count bad_rate ratio woe \
0 00.[-inf,6.0) 52 1158 0.044905 0.210469 -0.519398
1 01.[6.0,10.0) 218 2871 0.075932 0.521810 0.038912
2 02.[10.0,inf) 133 1473 0.090292 0.267721 0.227787
iv total_iv
0 0.045641 0.061758
1 0.000803 0.061758
2 0.015314 0.061758
# WOE value transformation
# transfer.fit():
# X=None data transformed by Combiner
# y=None Y label
# exclude=None features excluded from transformation
# transfer.transform():
# X=None
# transfer.export():
# Note: only the training set needs to be fit
transfer = lapras.WOETransformer()
transfer.fit(c.transform(train_selected), train_selected[target], exclude=[target])
train_woe = transfer.transform(c.transform(train_selected))
transfer.export()
{'age': {0: 1.0288596439961428,
1: 0.7858440185299318,
2: 0.2801286322797789,
3: -0.041156782250006324,
4: -0.24650930955337075,
5: -0.34972695582581514,
6: -0.7968444812848496,
7: -1.387405073069694},
'education': {0: 0.4054075821430197,
1: 0.27371220345368763,
2: -0.5685998002779383},
'max_unpay_day': {0: -0.13272639517618706, 1: 1.026204224879801},
'period': {0: -0.51939830439238,
1: 0.0389118677598222,
2: 0.22778739438526965},
'score': {0: 0.9447339847162963,
1: 0.5585702161999536,
2: 0.17824043251497793,
3: -0.08317566500410743,
4: -0.3509853692471706,
5: -1.2488485442424984,
6: -1.6980533007340262,
7: -3.089757954582164},
'wealth': {0: 1.01303813013795,
1: 0.0780708378046198,
2: -0.21969844672815222,
3: -0.2658032661768855,
4: -0.6142151848362123}}
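These values follow the standard WOE definition WOEᵢ = ln((badᵢ/bad_total) / (goodᵢ/good_total)), and they can be verified against the bin_stats tables above. For instance, for the first score bin:

```python
import math

# Counts from the bin_stats output above: the first 'score' bin has 136 bad of 805,
# and the training set overall has 403 bad of 5502 samples
bad_i, total_i, bad_all, total_all = 136, 805, 403, 5502
good_i, good_all = total_i - bad_i, total_all - bad_all
print(math.log((bad_i / bad_all) / (good_i / good_all)))  # ≈ 0.944734, as exported
```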
# Feature filtering can be performed once more after the WOE transformation. This is optional.
train_woe, dropped = lapras.select(train_woe,target = target, empty = 0.9, \
iv = 0.02, corr = 0.9, vif = False, return_drop=True, exclude=[])
print(dropped)
print(train_woe.shape)
train_woe.head(10)
{'empty': array([], dtype=float64), 'iv': array([], dtype=object), 'corr': array([], dtype=object)}
(5502, 7)
| | bad | wealth | max_unpay_day | score | age | period | education |
|---|---|---|---|---|---|---|---|
| 4168 | 0 | -0.219698 | -0.132726 | -0.083176 | 0.785844 | 0.038912 | -0.568600 |
| 605 | 0 | -0.219698 | -0.132726 | 0.944734 | -0.796844 | 0.038912 | -0.568600 |
| 3018 | 0 | -0.265803 | -0.132726 | 0.558570 | 0.785844 | 0.038912 | 0.405408 |
| 4586 | 0 | -0.614215 | 1.026204 | -3.089758 | -0.796844 | -0.519398 | 0.405408 |
| 1468 | 0 | -0.265803 | -0.132726 | 0.944734 | -0.796844 | 0.038912 | 0.405408 |
| 6251 | 0 | -0.265803 | -0.132726 | 0.558570 | 0.785844 | 0.227787 | 0.405408 |
| 3686 | 0 | 1.013038 | -0.132726 | 0.944734 | 0.785844 | 0.038912 | 0.405408 |
| 3615 | 0 | -0.219698 | -0.132726 | -0.083176 | -0.796844 | 0.227787 | -0.568600 |
| 5338 | 0 | 1.013038 | -0.132726 | -0.350985 | -0.041157 | 0.227787 | 0.405408 |
| 3985 | 0 | -0.265803 | -0.132726 | 0.558570 | 0.785844 | 0.038912 | -0.568600 |
# Stepwise regression, to select the best features; this is optional
# Parameter details:
# frame=None original data
# target='target' Y label name
# estimator='ols' regression model, supporting 'ols', 'lr', 'lasso', 'ridge'
# direction='both' stepwise direction, supporting 'forward', 'backward', 'both'
# criterion='aic' metric, supporting 'aic', 'bic', 'ks', 'auc'
# max_iter=None maximum number of iterations
# return_drop=False return the removed columns if set to True
# exclude=None features to exclude
final_data = lapras.stepwise(train_woe,target = target, estimator='ols', direction = 'both', criterion = 'aic', exclude = [])
final_data
| | bad | wealth | max_unpay_day | score | age |
|---|---|---|---|---|---|
| 4168 | 0 | -0.219698 | -0.132726 | -0.083176 | 0.785844 |
| 605 | 0 | -0.219698 | -0.132726 | 0.944734 | -0.796844 |
| 3018 | 0 | -0.265803 | -0.132726 | 0.558570 | 0.785844 |
| 4586 | 0 | -0.614215 | 1.026204 | -3.089758 | -0.796844 |
| 1468 | 0 | -0.265803 | -0.132726 | 0.944734 | -0.796844 |
| ... | ... | ... | ... | ... | ... |
| 5226 | 0 | -0.219698 | 1.026204 | -1.248849 | 0.785844 |
| 5390 | 0 | -0.265803 | -0.132726 | 0.944734 | -0.796844 |
| 860 | 0 | -0.265803 | -0.132726 | -1.698053 | -1.387405 |
| 7603 | 0 | 0.078071 | -0.132726 | -0.350985 | -0.796844 |
| 7270 | 0 | -0.219698 | -0.132726 | -1.698053 | 0.280129 |
5502 rows × 5 columns
# Scorecard modeling
# Parameter details:
# base_odds=1/60, base_score=600 when the odds equal 1/60, the corresponding score is 600
# pdo=40, rate=2 each time the odds are divided by rate (2), the score increases by pdo (40 points); these are the default parameters
# combiner=None fitted Combiner object
# transfer=None fitted WOETransformer object
# model_type='lr' enumerate: 'lr': sklearn LogisticRegression, 'ols': statsmodels OLS
# ScoreCard.fit():
# X=None WOE values
# y=None Y label
card = lapras.ScoreCard(
combiner = c,
transfer = transfer
)
col = list(final_data.drop([target],axis=1).columns)
card.fit(final_data[col], final_data[target])
ScoreCard(base_odds=0.016666666666666666, base_score=600, card=None,
combiner=<lapras.transform.Combiner object at 0x000001EC0FB72438>,
pdo=40, rate=2,
transfer=<lapras.transform.WOETransformer object at 0x000001EC0FDAEF98>)
# ScoreCard class method explanation
# ScoreCard.predict() predict the score of each sample:
# X=None
# ScoreCard.predict_prob() predict the probability of each sample:
# X=None
# ScoreCard.export() output the details of the scorecard as a dict
# ScoreCard.get_params() get the parameters of the scorecard as a dict, usually used in deployment
# card.intercept_ intercept of the logistic regression
# card.coef_ coefficients of the logistic regression
final_result = final_data[[target]].copy()
score = card.predict(final_data[col])
prob = card.predict_prob(final_data[col])
final_result['score'] = score
final_result['prob'] = prob
print("card.intercept_:%s" % (card.intercept_))
print("card.coef_:%s" % (card.coef_))
card.get_params()['combiner']
card.get_params()['transfer']
card.export()
card.intercept_:-2.5207582925622476
card.coef_:[0.32080944 0.3452988 0.68294643 0.66842902]
{'age': {'[-inf,23.0)': -39.69,
'[23.0,24.0)': -30.31,
'[24.0,25.0)': -10.81,
'[25.0,26.0)': 1.59,
'[26.0,28.0)': 9.51,
'[28.0,29.0)': 13.49,
'[29.0,37.0)': 30.74,
'[37.0,inf)': 53.52},
'intercept': {'[-inf,inf)': 509.19},
'max_unpay_day': {'[-inf,171.0)': 2.64, '[171.0,inf)': -20.45},
'score': {'[-inf,237.0)': -37.23,
'[237.0,272.0)': -22.01,
'[272.0,288.0)': -7.02,
'[288.0,296.0)': 3.28,
'[296.0,330.0)': 13.83,
'[330.0,354.0)': 49.22,
'[354.0,384.0)': 66.92,
'[384.0,inf)': 121.77},
'wealth': {'[-inf,3.0)': -18.75,
'[3.0,4.0)': -1.45,
'[4.0,5.0)': 4.07,
'[5.0,7.0)': 4.92,
'[7.0,inf)': 11.37}}
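The exported points can be reproduced from the regression parameters: with factor = pdo / ln(rate), each bin contributes −factor · coef · WOE points, and the intercept row equals base_score − factor · (intercept_ − ln(base_odds)). This reading is inferred from the numbers above rather than from documented internals, but it checks out:

```python
import math

factor = 40 / math.log(2)                   # pdo / ln(rate) ≈ 57.708
intercept_ = -2.5207582925622476            # card.intercept_ above
coef_age = 0.66842902                       # coefficient of 'age' in card.coef_
woe_age_bin0 = 1.0288596439961428           # WOE of age bin '[-inf,23.0)'

print(round(600 - factor * (intercept_ - math.log(1 / 60)), 2))  # 509.19 (intercept row)
print(round(-factor * coef_age * woe_age_bin0, 2))               # -39.69 (age, first bin)
```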
# model performance metrics, including KS, AUC, ROC curve, KS curve, PR curve
# Parameter details
# feature=None predicted value
# target=None actual label
lapras.perform(prob,final_result[target])
KS: 0.4160
AUC: 0.7602
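These two numbers can be cross-checked with scikit-learn: AUC is roc_auc_score, and KS is the maximum gap between the cumulative TPR and FPR along the ROC curve (ks_auc is an illustrative helper, not a lapras function):

```python
from sklearn.metrics import roc_auc_score, roc_curve

def ks_auc(prob, y_true):
    """KS = max(TPR - FPR) over all thresholds; AUC via sklearn."""
    fpr, tpr, _ = roc_curve(y_true, prob)
    return (tpr - fpr).max(), roc_auc_score(y_true, prob)

ks, auc = ks_auc(prob, final_result['bad'])
print("KS: %.4f  AUC: %.4f" % (ks, auc))  # should match lapras.perform() closely
```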
# Score distribution plot
# Parameter details:
# frame=None original dataframe
# score='score' score column name
# target='target' Y label name
# score_bond=None score boundaries: a step of 30 by default, customizable with a list, e.g. [100,200,300]
lapras.score_plot(final_result,score='score', target=target)
bad: [42, 78, 70, 104, 61, 28, 18, 1, 1, 0]
good: [129, 249, 494, 795, 1075, 972, 825, 282, 164, 114]
all: [171, 327, 564, 899, 1136, 1000, 843, 283, 165, 114]
all_rate: ['3.11%', '5.94%', '10.25%', '16.34%', '20.65%', '18.18%', '15.32%', '5.14%', '3.00%', '2.07%']
bad_rate: ['24.56%', '23.85%', '12.41%', '11.57%', '5.37%', '2.80%', '2.14%', '0.35%', '0.61%', '0.00%']
# Show LIFT
# feature=None predicted value
# target=None actual label
# recall_list=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1] default
lapras.LIFT(prob,final_data[target])
| | recall | precision | improve |
|---|---|---|---|
| 0 | 0.1 | 0.240000 | 3.202779 |
| 1 | 0.2 | 0.261290 | 3.486897 |
| 2 | 0.3 | 0.240964 | 3.215642 |
| 3 | 0.4 | 0.189535 | 2.529327 |
| 4 | 0.5 | 0.179170 | 2.391013 |
| 5 | 0.6 | 0.174352 | 2.326707 |
| 6 | 0.7 | 0.161622 | 2.156831 |
| 7 | 0.8 | 0.126972 | 1.694425 |
| 8 | 0.9 | 0.113936 | 1.520466 |
| 9 | 1.0 | 0.074935 | 1.000000 |
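The improve column is simply precision divided by the overall bad rate, which is the precision at recall 1.0 (e.g. 0.240000 / 0.074935 ≈ 3.2028 for the first row). A sketch of the computation (lift_table is illustrative; lapras.LIFT's exact tie handling may differ):

```python
import numpy as np
import pandas as pd

def lift_table(prob, y_true, recalls=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Precision at fixed recall levels, relative to the overall bad rate."""
    order = np.argsort(-np.asarray(prob))            # sort samples by descending score
    y = np.asarray(y_true)[order]
    cum_recall = np.cumsum(y) / y.sum()
    base = y.mean()                                  # precision at recall 1.0
    rows = []
    for r in recalls:
        k = int(np.searchsorted(cum_recall, r)) + 1  # samples needed to reach recall r
        precision = y[:k].mean()
        rows.append((r, precision, precision / base))
    return pd.DataFrame(rows, columns=['recall', 'precision', 'improve'])

# lift_table(prob, final_data['bad'])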
Automatic Modeling
# auto_model parameters: df, target, to_drop are required, the others are optional
# bins_show=False show the binning graphs when set to True
# iv_rank=False rank features by IV when set to True
# perform_show=False show performance on the training set when set to True
# coef_negative=True coefficients may be negative when set to True
# return: a ScoreCard object
auto_card = lapras.auto_model(df=train_df,target=target,to_drop=to_drop,bins_show=False,iv_rank=False,perform_show=False,
coef_negative = False, empty = 0.95, iv = 0.02, corr = 0.9, vif = False, method = 'mono',
n_bins=8, min_samples=0.05, pdo=40, rate=2, base_odds=1 / 60, base_score=600)
——data filtering——
original feature:6 filtered features:6
——feature binning——
——WOE value transformation——
——feature filtering once more——
original feature:6 filtered features:6
——scorecard modeling——
intercept: -2.520670026708529
coef: [0.66928671 0.59743968 0.31723278 0.22972838 0.28750881 0.26435224]
——model performance metrics——
KS: 0.4208
AUC: 0.7626
recall precision improve
0 0.1 0.238095 3.188586
1 0.2 0.254777 3.411990
2 0.3 0.239521 3.207679
3 0.4 0.193742 2.594611
4 0.5 0.182805 2.448141
5 0.6 0.171510 2.296866
6 0.7 0.160501 2.149437
7 0.8 0.130259 1.744435
8 0.9 0.110603 1.481206
9 1.0 0.074671 1.000000
Automatic modeling finished, time costing: 0 second