Automatic Decision Tree Rule Mining Toolkit
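In the author's day-to-day work in financial risk control, it is often necessary to mine useful rules and combined strategies from the many features in a dataset (also called factors, variables, independent variables, or explanatory variables), so as to reject as many bad customers as possible while maintaining the approval rate. Faced with thousands of features, finding effective rules and combined strategies in a dataset has long been routine work for risk-control practitioners. pdtr aims to help readers quickly extract effective rules and combined strategies from high-dimensional data.
Repository: https://github.com/itlubber/pdtr
Blog post: https://itlubber.art/archives/auto-strategy-mining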
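Background
Risk in financial scenarios falls roughly into three categories: systemic risk, fraud risk (no willingness to repay), and credit risk (no ability to repay). Risk-control work involves a great deal of data mining, and extracting effective rules, strategies, and models from high-dimensional datasets to guard against fraud risk and credit risk is a basic skill for every practitioner. The author compiled this repository from a range of openly available material online, combined with real requirements encountered at work. It aims to provide a convenient, efficient, and visually pleasing report on decision-tree combined-strategy mining, along with a set of strategies that can actually be applied to risk control.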
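Project structure
pdtr
.
|   README.md                         # Documentation
|   setup.py                          # Packaging and release script
|   LICENSE                           # Open-source license
|   requirements.txt                  # Project dependencies
+---examples                          # Examples
|   |   combine_rules_cache           # Cache file
|   |   combine_rules_cache.svg       # Cache file
|   |   pdtr_samplts.ipynb            # Example notebook
|   \---model_report                  # Output folder for model reports
|       |   决策树组合策略挖掘.xlsx      # Strategy mining report
|       +---auto_mining_rules         # Folder for combined-strategy visualizations
|       |       combiner_rules_0.png  # Decision tree visualization
|       |       ......
|       \---bin_plots                 # Folder for simple-strategy visualizations
|               bin_vars_A.png        # Variable binning visualization
|               ......
\---pdtr                              # PDTR source package
        template.xlsx                 # Excel template file
        excel_writer.py               # Shared Excel-writing helpers
        matplot_chinese.ttf           # Chinese font for matplotlib
        transforme.py                 # Strategy mining methods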
Environment setup
Create a virtual environment (optional)
- Create a virtual environment with conda
>> conda create -n score python==3.8.13
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 4.10.3
latest version: 23.3.1
Please update conda by running
$ conda update -n base -c defaults conda
## Package Plan ##
environment location: /Users/lubberit/anaconda3/envs/score
added / updated specs:
- python==3.8.13
The following packages will be downloaded:
package | build
---------------------------|-----------------
ca-certificates-2023.01.10 | hecd8cb5_0 121 KB
ncurses-6.4 | hcec6c5f_0 1018 KB
openssl-1.1.1t | hca72f7f_0 3.3 MB
pip-23.0.1 | py38hecd8cb5_0 2.5 MB
python-3.8.13 | hdfd78df_1 10.8 MB
setuptools-66.0.0 | py38hecd8cb5_0 1.2 MB
sqlite-3.41.2 | h6c40b1e_0 1.2 MB
wheel-0.38.4 | py38hecd8cb5_0 65 KB
xz-5.4.2 | h6c40b1e_0 372 KB
------------------------------------------------------------
Total: 20.5 MB
The following NEW packages will be INSTALLED:
ca-certificates pkgs/main/osx-64::ca-certificates-2023.01.10-hecd8cb5_0
libcxx pkgs/main/osx-64::libcxx-14.0.6-h9765a3e_0
libffi pkgs/main/osx-64::libffi-3.3-hb1e8313_2
ncurses pkgs/main/osx-64::ncurses-6.4-hcec6c5f_0
openssl pkgs/main/osx-64::openssl-1.1.1t-hca72f7f_0
pip pkgs/main/osx-64::pip-23.0.1-py38hecd8cb5_0
python pkgs/main/osx-64::python-3.8.13-hdfd78df_1
readline pkgs/main/osx-64::readline-8.2-hca72f7f_0
setuptools pkgs/main/osx-64::setuptools-66.0.0-py38hecd8cb5_0
sqlite pkgs/main/osx-64::sqlite-3.41.2-h6c40b1e_0
tk pkgs/main/osx-64::tk-8.6.12-h5d9f67b_0
wheel pkgs/main/osx-64::wheel-0.38.4-py38hecd8cb5_0
xz pkgs/main/osx-64::xz-5.4.2-h6c40b1e_0
zlib pkgs/main/osx-64::zlib-1.2.13-h4dc903c_0
Proceed ([y]/n)? y
Downloading and Extracting Packages
sqlite-3.41.2 | 1.2 MB | ################################################################################################### | 100%
wheel-0.38.4 | 65 KB | ################################################################################################### | 100%
openssl-1.1.1t | 3.3 MB | ################################################################################################### | 100%
python-3.8.13 | 10.8 MB | ################################################################################################### | 100%
setuptools-66.0.0 | 1.2 MB | ################################################################################################### | 100%
ncurses-6.4 | 1018 KB | ################################################################################################### | 100%
xz-5.4.2 | 372 KB | ################################################################################################### | 100%
ca-certificates-2023 | 121 KB | ################################################################################################### | 100%
pip-23.0.1 | 2.5 MB | ################################################################################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate score
#
# To deactivate an active environment, use
#
# $ conda deactivate
- Create a virtual environment with pyenv
# Install the environment
>> pyenv install -v 3.8.13
# Activate the environment
>> pyenv local 3.8.13
# Uninstall the environment
>> pyenv uninstall 3.8.13
Install project dependencies
>> pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
......
Installing collected packages: webencodings, six, pytz, colour, zipp, tomli, tinycss2, threadpoolctl, python-dateutil, pyparsing, pycparser, pluggy, pillow, packaging, numpy, kiwisolver, joblib, iniconfig, graphviz, fonttools, exceptiongroup, et-xmlfile, defusedxml, cycler, scipy, pytest, patsy, pandas, openpyxl, importlib-resources, cssselect2, contourpy, cffi, statsmodels, scikit-learn, matplotlib, cairocffi, dtreeviz, category-encoders, CairoSVG
Successfully installed CairoSVG-2.7.0 cairocffi-1.5.1 category-encoders-2.6.0 cffi-1.15.1 colour-0.1.5 contourpy-1.0.7 cssselect2-0.7.0 cycler-0.11.0 defusedxml-0.7.1 dtreeviz-2.2.1 et-xmlfile-1.1.0 exceptiongroup-1.1.1 fonttools-4.39.4 graphviz-0.20.1 importlib-resources-5.12.0 iniconfig-2.0.0 joblib-1.2.0 kiwisolver-1.4.4 matplotlib-3.7.1 numpy-1.22.2 openpyxl-3.0.7 packaging-23.1 pandas-1.5.3 patsy-0.5.3 pillow-9.5.0 pluggy-1.0.0 pycparser-2.21 pyparsing-3.0.9 pytest-7.3.1 python-dateutil-2.8.2 pytz-2023.3 scikit-learn-1.2.2 scipy-1.10.1 six-1.11.0 statsmodels-0.14.0 threadpoolctl-3.1.0 tinycss2-1.2.1 tomli-2.0.1 webencodings-0.5.1 zipp-3.15.0
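After installing, it can help to confirm that the key dependencies are actually present in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the choice of packages to check are assumptions for illustration, not part of pdtr.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# A few packages taken from the install log; extend the list as needed.
to_check = ["numpy", "pandas", "scikit-learn", "matplotlib", "openpyxl"]
for name, version in installed_versions(to_check).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED points to an incomplete `pip install -r requirements.txt` run or to the wrong environment being active.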
PDTR
Installation
pip install pdtr
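Version history
0.1.0
Contains only the decision-tree strategy mining tools.
0.1.1
In addition to the decision-tree mining tools from version 0.1.0, adds single-variable strategy mining methods based on toad and optbinning.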
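To give a feel for what single-variable strategy mining produces, here is a minimal sketch that bins one numeric feature and reports the bad rate and lift of each bin. It deliberately uses plain `pandas.qcut` (equal-frequency bins) rather than toad or optbinning, so it does not reflect how pdtr calls those libraries; the function name `bin_bad_rate` and the synthetic data are assumptions.

```python
import numpy as np
import pandas as pd


def bin_bad_rate(df, feature, target, n_bins=5):
    """Equal-frequency binning of one numeric feature with per-bin statistics."""
    bins = pd.qcut(df[feature], q=n_bins, duplicates="drop")
    overall_bad_rate = df[target].mean()
    table = (
        df.groupby(bins, observed=True)[target]
        .agg(count="count", bad_rate="mean")
        .reset_index()
    )
    table["share"] = table["count"] / len(df)             # fraction of customers in the bin
    table["lift"] = table["bad_rate"] / overall_bad_rate  # > 1 means riskier than average
    return table


# Synthetic example (an assumption): the bad rate rises with x.
rng = np.random.default_rng(0)
x = rng.random(5000)
sample = pd.DataFrame({"x": x, "target": (rng.random(5000) < 0.1 + 0.4 * x).astype(int)})
table = bin_bad_rate(sample, "x", "target")
print(table)
```

A bin with a high lift and a small share is a candidate single-variable rule: rejecting it removes a disproportionate number of bad customers at a small cost in approval rate.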
Running the example
- Import dependencies
import os
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
try:
    from pdtr import ParseDecisionTreeRules
except ModuleNotFoundError:
    # Fall back to the source tree when pdtr is not installed
    import sys
    sys.path.append("../")
    from pdtr import ParseDecisionTreeRules
np.random.seed(1)
- Load the dataset
feature_map = {}
n_samples = 10000
ab = np.array(list('ABCDEFG'))
data = pd.DataFrame({
    'A': np.random.randint(10, size = n_samples),
    'B': ab[np.random.choice(7, n_samples)],
    'C': ab[np.random.choice(2, n_samples)],
    'D': np.random.random(size = n_samples),
    'target': np.random.randint(2, size = n_samples)
})
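Because the demo labels are drawn at random, no feature should carry real signal, and every group should have a lift close to 1. The sketch below checks this for one categorical column. It rebuilds a similar dataset with `np.random.default_rng` so that it is self-contained, which is an assumption and does not reproduce the exact values generated by the seeded code in this step.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_samples = 10000
letters = np.array(list("ABCDEFG"))
data = pd.DataFrame({
    "B": letters[rng.integers(7, size=n_samples)],
    "target": rng.integers(2, size=n_samples),
})

# Per-category sample count, bad rate, and lift over the overall bad rate.
summary = data.groupby("B")["target"].agg(count="count", bad_rate="mean")
summary["lift"] = summary["bad_rate"] / data["target"].mean()
print(summary.round(3))
```

On real data the same table is a quick sanity check before mining: a category whose lift is far from 1 is worth a closer look.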
- Split the dataset
# Stratify on the label so train and test keep the same bad rate (shuffle only accepts a bool)
train, test = train_test_split(data, test_size=0.3, stratify=data["target"])
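A stratified split keeps the share of bad customers the same in both parts, which matters when rules are mined on one part and validated on the other. The following is a minimal sketch using scikit-learn's `train_test_split` with its `stratify` parameter; the imbalanced synthetic data and the `random_state` value are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
data = pd.DataFrame({
    "D": rng.random(1000),
    "target": (rng.random(1000) < 0.2).astype(int),  # roughly 20% bad customers
})

train, test = train_test_split(
    data, test_size=0.3, stratify=data["target"], random_state=1
)

# The three bad rates should agree almost exactly.
print(data["target"].mean(), train["target"].mean(), test["target"].mean())
```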
- Mine rules automatically with decision trees
pdtr_instance = ParseDecisionTreeRules(target="target", max_iter=8, output="model_report/决策树组合策略挖掘.xlsx")
pdtr_instance.fit(train, lift=0., max_depth=2, max_samples=1., verbose=False, max_features="auto")
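How pdtr turns fitted trees into rules internally is not described here. As a minimal sketch of the general idea, assuming plain scikit-learn, each root-to-leaf path of a shallow `DecisionTreeClassifier` can be read as a conjunction of threshold conditions and each leaf scored by its bad rate and lift. The helper `tree_rules` and the synthetic data are hypothetical and are not pdtr's API.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def tree_rules(clf, feature_names):
    """Return {leaf_id: rule string} by walking every root-to-leaf path."""
    tree = clf.tree_
    rules = {}

    def walk(node, conditions):
        left, right = tree.children_left[node], tree.children_right[node]
        if left == right:  # both are -1 at a leaf
            rules[int(node)] = " & ".join(conditions) or "True"
            return
        name = feature_names[tree.feature[node]]
        threshold = tree.threshold[node]
        walk(left, conditions + [f"({name} <= {threshold:.4f})"])
        walk(right, conditions + [f"({name} > {threshold:.4f})"])

    walk(0, [])
    return rules


# Synthetic data (an assumption): bad customers concentrate where A >= 7 and D > 0.5.
rng = np.random.default_rng(1)
X = pd.DataFrame({"A": rng.integers(10, size=5000), "D": rng.random(5000)})
p_bad = np.where((X["A"] >= 7) & (X["D"] > 0.5), 0.6, 0.1)
y = (rng.random(5000) < p_bad).astype(int)

clf = DecisionTreeClassifier(max_depth=2, min_samples_leaf=50, random_state=1).fit(X, y)
rules = tree_rules(clf, list(X.columns))

# Score every leaf: sample count, bad rate, and lift over the overall bad rate.
report = (
    pd.DataFrame({"leaf": clf.apply(X), "target": y})
    .groupby("leaf")["target"]
    .agg(count="count", bad_rate="mean")
)
report["lift"] = report["bad_rate"] / y.mean()
report["rule"] = [rules[int(leaf)] for leaf in report.index]
print(report.sort_values("lift", ascending=False))
```

Keeping the tree shallow, as `max_depth=2` does in the call above, keeps each rule to at most two conditions, which is what makes it usable as a hand-maintained strategy.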
- Validate the rules
all_rules = pdtr_instance.insert_all_rules(test=test)
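Validation means checking that a rule found on the training set still separates bad customers on held-out data. The sketch below scores a single rule, written as a pandas query string, by hit rate, bad rate, lift, and recall of bad customers. The function `evaluate_rule`, the rule text, and the data are assumptions for illustration and do not show what `insert_all_rules` computes.

```python
import numpy as np
import pandas as pd


def evaluate_rule(df, rule, target="target"):
    """Score one rule (a pandas query string) on a labelled dataset."""
    hit = df.query(rule)
    overall_bad_rate = df[target].mean()
    hit_bad_rate = hit[target].mean() if len(hit) else float("nan")
    return {
        "rule": rule,
        "hit_count": len(hit),
        "hit_rate": len(hit) / len(df),                        # share of customers rejected
        "bad_rate": hit_bad_rate,                              # bad rate among rejected customers
        "lift": hit_bad_rate / overall_bad_rate,               # > 1 means the rule targets bad customers
        "bad_recall": hit[target].sum() / df[target].sum(),    # share of all bad customers caught
    }


# Synthetic data (an assumption): bad customers concentrate where A >= 7 and D > 0.5.
rng = np.random.default_rng(1)
frame = pd.DataFrame({"A": rng.integers(10, size=10000), "D": rng.random(10000)})
p_bad = np.where((frame["A"] >= 7) & (frame["D"] > 0.5), 0.6, 0.1)
frame["target"] = (rng.random(10000) < p_bad).astype(int)
train, test = frame.iloc[:7000], frame.iloc[7000:]

rule = "(A >= 7) & (D > 0.5)"
comparison = pd.DataFrame(
    [evaluate_rule(train, rule), evaluate_rule(test, rule)], index=["train", "test"]
)
print(comparison)
```

A rule whose lift drops sharply from train to test has likely overfitted and is a poor candidate for production use.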
- Export the strategy mining report
pdtr_instance.save()
- Mining report