自动决策树规则挖掘工具包
Project description
自动决策树规则挖掘工具包
背景简介
金融场景风险大致可以概括为三种:系统性风险、欺诈风险(无还款意愿)、信用风险(无还款能力),而作为一名风控搬砖工,日常工作中有大量的数据挖掘工作,如何从高维数据集中挖掘出行之有效的规则、策略及模型来防范欺诈风险和信用风险每个搬砖工的基操。本仓库由笔者基于网上开源的一系列相关知识,结合实际工作中遇到的实际需求,整理得到。旨在为诸位仁兄提供一个便捷、高效、赏心悦目的决策树组合策略挖掘报告,及一系列能够实际运用到风险控制上的策略。
环境准备
创建虚拟环境(可选)
- 通过
conda
创建虚拟环境
>> conda create -n score python==3.8.13
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 4.10.3
latest version: 23.3.1
Please update conda by running
$ conda update -n base -c defaults conda
## Package Plan ##
environment location: /Users/lubberit/anaconda3/envs/score
added / updated specs:
- python==3.8.13
The following packages will be downloaded:
package | build
---------------------------|-----------------
ca-certificates-2023.01.10 | hecd8cb5_0 121 KB
ncurses-6.4 | hcec6c5f_0 1018 KB
openssl-1.1.1t | hca72f7f_0 3.3 MB
pip-23.0.1 | py38hecd8cb5_0 2.5 MB
python-3.8.13 | hdfd78df_1 10.8 MB
setuptools-66.0.0 | py38hecd8cb5_0 1.2 MB
sqlite-3.41.2 | h6c40b1e_0 1.2 MB
wheel-0.38.4 | py38hecd8cb5_0 65 KB
xz-5.4.2 | h6c40b1e_0 372 KB
------------------------------------------------------------
Total: 20.5 MB
The following NEW packages will be INSTALLED:
ca-certificates pkgs/main/osx-64::ca-certificates-2023.01.10-hecd8cb5_0
libcxx pkgs/main/osx-64::libcxx-14.0.6-h9765a3e_0
libffi pkgs/main/osx-64::libffi-3.3-hb1e8313_2
ncurses pkgs/main/osx-64::ncurses-6.4-hcec6c5f_0
openssl pkgs/main/osx-64::openssl-1.1.1t-hca72f7f_0
pip pkgs/main/osx-64::pip-23.0.1-py38hecd8cb5_0
python pkgs/main/osx-64::python-3.8.13-hdfd78df_1
readline pkgs/main/osx-64::readline-8.2-hca72f7f_0
setuptools pkgs/main/osx-64::setuptools-66.0.0-py38hecd8cb5_0
sqlite pkgs/main/osx-64::sqlite-3.41.2-h6c40b1e_0
tk pkgs/main/osx-64::tk-8.6.12-h5d9f67b_0
wheel pkgs/main/osx-64::wheel-0.38.4-py38hecd8cb5_0
xz pkgs/main/osx-64::xz-5.4.2-h6c40b1e_0
zlib pkgs/main/osx-64::zlib-1.2.13-h4dc903c_0
Proceed ([y]/n)? y
Downloading and Extracting Packages
sqlite-3.41.2 | 1.2 MB | ################################################################################################### | 100%
wheel-0.38.4 | 65 KB | ################################################################################################### | 100%
openssl-1.1.1t | 3.3 MB | ################################################################################################### | 100%
python-3.8.13 | 10.8 MB | ################################################################################################### | 100%
setuptools-66.0.0 | 1.2 MB | ################################################################################################### | 100%
ncurses-6.4 | 1018 KB | ################################################################################################### | 100%
xz-5.4.2 | 372 KB | ################################################################################################### | 100%
ca-certificates-2023 | 121 KB | ################################################################################################### | 100%
pip-23.0.1 | 2.5 MB | ################################################################################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate score
#
# To deactivate an active environment, use
#
# $ conda deactivate
- 通过
pyenv
创建虚拟环境
# 安装环境
>> pyenv install -v 3.8.13
# 启动环境
>> pyenv local 3.8.13
# 卸载环境
>> pyenv uninstall 3.8.13
安装项目依赖
>> pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
......
Installing collected packages: webencodings, six, pytz, colour, zipp, tomli, tinycss2, threadpoolctl, python-dateutil, pyparsing, pycparser, pluggy, pillow, packaging, numpy, kiwisolver, joblib, iniconfig, graphviz, fonttools, exceptiongroup, et-xmlfile, defusedxml, cycler, scipy, pytest, patsy, pandas, openpyxl, importlib-resources, cssselect2, contourpy, cffi, statsmodels, scikit-learn, matplotlib, cairocffi, dtreeviz, category-encoders, CairoSVG
Successfully installed CairoSVG-2.7.0 cairocffi-1.5.1 category-encoders-2.6.0 cffi-1.15.1 colour-0.1.5 contourpy-1.0.7 cssselect2-0.7.0 cycler-0.11.0 defusedxml-0.7.1 dtreeviz-2.2.1 et-xmlfile-1.1.0 exceptiongroup-1.1.1 fonttools-4.39.4 graphviz-0.20.1 importlib-resources-5.12.0 iniconfig-2.0.0 joblib-1.2.0 kiwisolver-1.4.4 matplotlib-3.7.1 numpy-1.22.2 openpyxl-3.0.7 packaging-23.1 pandas-1.5.3 patsy-0.5.3 pillow-9.5.0 pluggy-1.0.0 pycparser-2.21 pyparsing-3.0.9 pytest-7.3.1 python-dateutil-2.8.2 pytz-2023.3 scikit-learn-1.2.2 scipy-1.10.1 six-1.11.0 statsmodels-0.14.0 threadpoolctl-3.1.0 tinycss2-1.2.1 tomli-2.0.1 webencodings-0.5.1 zipp-3.15.0
PDTR
安装
pip install pdtr
运行样例
- 导入相关依赖
import os
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
try:
from pdtr import ParseDecisionTreeRules
except ModuleNotFoundError:
import sys
sys.path.append("../")
from pdtr import ParseDecisionTreeRules
np.random.seed(1)
- 数据集加载
feature_map = {}
n_samples = 10000
ab = np.array(list('ABCDEFG'))
data = pd.DataFrame({
'A': np.random.randint(10, size = n_samples),
'B': ab[np.random.choice(7, n_samples)],
'C': ab[np.random.choice(2, n_samples)],
'D': np.random.random(size = n_samples),
'target': np.random.randint(2, size = n_samples)
})
- 数据集拆分
train, test = train_test_split(data, test_size=0.3, shuffle=data["target"])
- 决策树自动规则挖掘
pdtr_instance = ParseDecisionTreeRules(target="target", max_iter=8, output="model_report/决策树组合策略挖掘.xlsx")
pdtr_instance.fit(train, lift=0., max_depth=2, max_samples=1., verbose=False, max_features="auto")
- 规则验证
all_rules = pdtr_instance.insert_all_rules(test=test)
- 导出策略挖掘报告
pdtr_instance.save()
- 挖掘报告
examples/决策树组合策略挖掘.xlsx
参考
https://github.com/itlubber/LogisticRegressionPipeline https://github.com/itlubber/itlubber-excel-writer
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
pdtr-0.1.0.tar.gz
(16.3 kB
view details)
Built Distribution
pdtr-0.1.0-py3-none-any.whl
(14.1 kB
view details)
File details
Details for the file pdtr-0.1.0.tar.gz
.
File metadata
- Download URL: pdtr-0.1.0.tar.gz
- Upload date:
- Size: 16.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.8
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | a76d4c4c8759f0bb39638ef67aa8e5463a57dd05ab97a038f5b851fcd42cbeb8 |
|
MD5 | d4c7ae51183dd5704a1a0d3cf57af051 |
|
BLAKE2b-256 | e4ca1d2335ef11fb3d0f01007515ce0ac7b3fbeb7b8e6946e60fde839edd8b7b |
File details
Details for the file pdtr-0.1.0-py3-none-any.whl
.
File metadata
- Download URL: pdtr-0.1.0-py3-none-any.whl
- Upload date:
- Size: 14.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.8
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 12f6e554d78317737022e248a080c8b9a5c6a8ec26e86d43220e1cb12c0b31ad |
|
MD5 | c7a7e6174b3eb8490f2628b906292cc1 |
|
BLAKE2b-256 | 37fd4403b764e8d3aa3ac8434a4a45289f3b176f136f01cd0fdde2a23c532428 |