TeaML - Automated Modeling in Financial Domain
We are proud to announce an automatic modeling robot designed for the field of financial risk control!
Overview
TeaML is a simple, design-friendly framework for automatic modeling. It builds a model automatically from beginning to end and then outputs a report describing that model.
- Human-friendly. TeaML's code is straightforward, well documented, and tested, which makes it easy to understand and modify.
- Built for financial risk control. TeaML has built-in support for techniques common in financial risk control, such as the WOE (weight of evidence) transformation, which makes it well suited to this scenario.
- Flexible. TeaML provides a variety of variable selection methods, each of which can be customized. You can also assemble them in a different order.
- Final report. TeaML produces a final model report so that you can inspect the details of your model.
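For readers new to WOE, the following minimal sketch shows how weight of evidence and information value (IV) are commonly computed from per-bin counts. It is an illustration only and not TeaML's implementation: the function name `woe_iv`, the sample counts, and the sign convention (WOE = ln(good share / bad share)) are assumptions made for this example.

```python
import math

def woe_iv(bins):
    """Compute the WOE of each bin and the total IV of one variable.

    bins: list of (good_count, bad_count) tuples, one per bin.
    Convention assumed here: WOE = ln(good_share / bad_share).
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    woes = []
    iv = 0.0
    for good, bad in bins:
        if good == 0 or bad == 0:
            # Real implementations usually merge or smooth such bins first.
            raise ValueError("every bin needs at least one good and one bad sample")
        good_share = good / total_good
        bad_share = bad / total_bad
        woe = math.log(good_share / bad_share)
        woes.append(woe)
        # Each bin contributes a non-negative amount to the IV.
        iv += (good_share - bad_share) * woe
    return woes, iv

# Three hypothetical bins of one variable: (good_count, bad_count)
woes, iv = woe_iv([(90, 10), (80, 20), (60, 40)])
print([round(w, 3) for w in woes], round(iv, 3))
```

In this example the bad rate rises from bin to bin, so the WOE values fall monotonically. That is the kind of pattern a monotonic WOE binning tries to enforce.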
Our Goal
- Automation. In the near future, we will add more algorithms, including but not limited to variable generation (VariableCluster is already available as an experimental feature).
- Common use. Algorithm engineers and model analysts alike can use TeaML to work more efficiently, as long as they have some knowledge of the algorithms involved.
- More to come. We hope to add many more features. This version has no optimization algorithm or parallel strategy yet; we will try to add them in later iterations, hopefully before long.
Performance
Task | Strategy | Dataset | Score | Detail |
---|---|---|---|---|
Predicting the Delay Rate of Financial Risk | TeaML | Financial Risk Data | 0.6894 (AUC) | WOE(Monotonic) + STEPWISE |
Predicting the Delay Rate of Financial Risk | LightGBM | Financial Risk Data | 0.6773 (AUC) | LightGBM |
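The scores in the table are AUC values. As a reminder of what that number measures, here is a minimal sketch of AUC as the probability that a randomly chosen positive (overdue) sample is scored higher than a randomly chosen negative one, with ties counted as half. The function name `auc` and the sample data are illustrative assumptions; this is not how TeaML computes its scores.

```python
def auc(labels, scores):
    """Pairwise (Mann-Whitney) AUC for binary labels 0/1.

    Quadratic in the number of samples, so it is only suitable
    for small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = overdue) and model scores
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.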
Quick start
Requirements and Installation
The project is based on Python 3.7. Python 3.6 may also work, but it has not been fully tested.
If lightgbm is not installed, you need to install it yourself.
pip install TeaML
Example Usage
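Let's run a simple example.
# pandas and LogisticRegression are used below, so import them explicitly
import pandas as pd
from sklearn.linear_model import LogisticRegression

import TeaML
from TeaML.utils.tea_encoder import *
from TeaML.utils.tea_filter import *
from TeaML.utils.tea_utils import *
from TeaML.utils.auto_bin_woe import *

# load the example data
data = pd.read_csv("TeaML/examples.csv")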
# encoder
ct = TeaBadRateEncoder(num=1)
me = TeaMeanEncoder(categorical_features=['city'])
t = TeaOneHotEncoder()
encoder = [me]
# woe & feature selection
woe = TeaML.WOE(bins=10, bad_rate_merge=True, bad_rate_sim_threshold=0.05, psi_threshold=0.1, iv_threshold=None)
iv = FilterIV(200, 100)
vif = FilterVif(50)
mod = FilterModel('lr', 70)
nova = FilterANOVA(40, 30)
coline = FilterCoLine({'penalty': 'l2', 'C': 0.01, 'fit_intercept': True})
fshap = FilterSHAP(70)
outlier = OutlierTransform()
filtercor = FilterCorr(20)
stepwise = FilterStepWise(method='p_value')
method = [woe, stepwise]
# main
tea = TeaML.Tea(['core_lend_request_id', 'lend_customer_id', 'customer_sex',
                 'data_center_id', 'trace_back_time', 'mobile', 'user_id', 'id_no', 'task_id', 'id',
                 'id_district_name', 'id_province_name', 'id_city_name', 'pass_time'],
                'is_overdue_M0',
                datetime_feature='pass_time',
                split_method='oot',
                file_path='report.xlsx')
tea.wash(data, null_drop_rate=0.8, most_common_drop_rate=0.9)
tea.cook(encoder)
tea.select(method)
tea.drink(LogisticRegression(penalty='l2', C=1, class_weight='balanced'))
tea.sleep(woe.bins)
'''
Preliminary screening...
100%|██████████| 19/19 [00:00<00:00, 29.19it/s]
100%|██████████| 19/19 [00:00<00:00, 50.03it/s]
100%|██████████| 19/19 [00:00<00:00, 55.00it/s]
100%|██████████| 19/19 [00:00<00:00, 104.02it/s]
0%| | 0/19 [00:00<?, ?it/s]
cal bin ks, train...
cal bin ks, oot...
100%|██████████| 19/19 [00:00<00:00, 21.33it/s]
100%|██████████| 100/100 [00:00<00:00, 116.78it/s]
0%| | 0/19 [00:00<?, ?it/s]
Train AUC: 0.6107958854166341
Test AUC: 0.6083763215945612
OOT AUC: 0.6050562520208106
Train KS: 0.1719605325145203
Test KS: 0.17401800497420833
OOT KS: 0.1616794283922675
---------------------------------------------------
cal bin ks, train...
100%|██████████| 19/19 [00:00<00:00, 97.39it/s]
 36%|███▌      | 4/19 [00:00<00:00, 37.62it/s]
cal bin ks, oot...
100%|██████████| 19/19 [00:00<00:00, 25.66it/s]
Add P with p-value 1.14745e-22
Add F with p-value 3.38993e-15
Add I with p-value 5.18381e-10
Add J with p-value 2.8625e-09
Add M with p-value 6.66696e-07
Add Q with p-value 3.18125e-09
Add B with p-value 1.14541e-06
Add D with p-value 1.21802e-05
Add K with p-value 2.70815e-05
Add C with p-value 0.000118247
Add A with p-value 0.000214666
Add L with p-value 0.000169921
Add H with p-value 0.00139263
Add N with p-value 0.000488745
feature_name feature_coef
5 Q 2.332818
10 A 2.203708
12 H 1.391547
7 D 1.385142
2 I 1.192397
13 N 1.181320
0 P 0.926443
8 K 0.914186
1 F 0.898581
3 J 0.868826
6 B 0.864311
4 M 0.851936
11 L 0.842446
9 C 0.704460
Finish 🍵
'''
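The log reports a KS value next to each AUC. As a rough guide to reading it, the sketch below computes the KS statistic in the way it is usually defined in credit scoring: the largest gap between the cumulative share of bad samples and the cumulative share of good samples as the score threshold moves. The function name `ks_statistic` and the sample data are assumptions for illustration, not TeaML's own code.

```python
def ks_statistic(labels, scores):
    """Maximum gap between cumulative bad and good shares (labels are 0/1)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("KS needs at least one positive and one negative sample")
    # Walk through the samples from the highest score to the lowest.
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Samples with the same score fall on the same side of any threshold.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
        i = j
    return best

# Hypothetical labels (1 = overdue) and model scores
print(ks_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.5
```

A KS of 0 means the score does not separate the two groups at all, and 1 means it separates them perfectly.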
What's the encoder in tea.cook()?
This module automatically processes discrete (categorical) variables.
TeaML offers three ways to deal with categorical variables:
ct = TeaBadRateEncoder(num=1)
me = TeaMeanEncoder(categorical_features=['city'])
t = TeaOneHotEncoder()
encoder = [me]
TeaBadRateEncoder: replaces each categorical variable with the bad rate of each bin.
TeaMeanEncoder: mean encoding.
TeaOneHotEncoder: one-hot encoding.
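To make two of these ideas concrete, here is a minimal plain-Python sketch of bad-rate encoding and one-hot encoding. The helper names `bad_rate_encode` and `one_hot_encode` and the sample data are hypothetical, and the TeaML encoders may differ in detail (for example in smoothing, binning, or handling of unseen levels).

```python
from collections import defaultdict

def bad_rate_encode(categories, targets):
    """Replace each category with the share of bad (target == 1) samples in it."""
    totals = defaultdict(int)
    bads = defaultdict(int)
    for cat, y in zip(categories, targets):
        totals[cat] += 1
        bads[cat] += y
    rate = {cat: bads[cat] / totals[cat] for cat in totals}
    return [rate[cat] for cat in categories], rate

def one_hot_encode(categories):
    """Turn each category into a 0/1 indicator row, one column per level."""
    levels = sorted(set(categories))
    rows = [[1 if cat == level else 0 for level in levels] for cat in categories]
    return rows, levels

# Hypothetical 'city' column and overdue flag
cities = ["A", "B", "A", "C", "B", "A"]
overdue = [1, 0, 0, 1, 0, 1]

encoded, rate = bad_rate_encode(cities, overdue)
rows, levels = one_hot_encode(cities)
print(rate)     # bad rate per city
print(levels)   # column order of the one-hot rows
print(rows[0])  # indicator row for the first sample
```

Bad-rate encoding keeps a single numeric column per variable, while one-hot encoding adds one column per level, which matters for high-cardinality variables such as city.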
What's the method in tea.select()?
This module automatically selects variables.
The steps you put in the method list are applied in order.
For example, the following applies a monotonic WOE transformation to all variables, followed by stepwise regression based on p-values.
woe = TeaML.WOE(bins=10, bad_rate_merge=True, bad_rate_sim_threshold=0.05, psi_threshold=0.1, iv_threshold=None)
iv = FilterIV(200, 100)
vif = FilterVif(50)
mod = FilterModel('lr', 70)
nova = FilterANOVA(40, 30)
coline = FilterCoLine({'penalty': 'l2', 'C': 0.01, 'fit_intercept': True})
fshap = FilterSHAP(70)
outlier = OutlierTransform()
filtercor = FilterCorr(20)
stepwise = FilterStepWise(method='p_value')
method = [woe, stepwise]
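Because the steps run in list order, the same steps in a different order can keep different variables. The sketch below illustrates this with two made-up filters over precomputed statistics. The names `run_steps`, `max_missing`, and `top_k_by_iv` and the numbers are all hypothetical and do not correspond to TeaML classes.

```python
def max_missing(threshold):
    """Step that drops variables whose missing rate exceeds the threshold."""
    def step(stats):
        return {name: s for name, s in stats.items() if s["missing"] <= threshold}
    return step

def top_k_by_iv(k):
    """Step that keeps the k variables with the highest IV."""
    def step(stats):
        best = sorted(stats, key=lambda name: stats[name]["iv"], reverse=True)[:k]
        return {name: stats[name] for name in best}
    return step

def run_steps(stats, steps):
    """Apply each selection step to the output of the previous one."""
    for step in steps:
        stats = step(stats)
    return sorted(stats)

# Hypothetical per-variable statistics
stats = {
    "A": {"iv": 0.30, "missing": 0.70},
    "B": {"iv": 0.20, "missing": 0.10},
    "C": {"iv": 0.10, "missing": 0.05},
    "D": {"iv": 0.01, "missing": 0.00},
}

print(run_steps(stats, [max_missing(0.5), top_k_by_iv(2)]))  # ['B', 'C']
print(run_steps(stats, [top_k_by_iv(2), max_missing(0.5)]))  # ['B']
```

In the second ordering, variable A takes one of the two top-IV slots and is then removed for its missing rate, so only B survives.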
Support
We welcome suggestions from everyone; they are support and encouragement for our project.