Credit risk modeling factory: WOE binning, scorecards, LightGBM, Excel reporting.

SuperModelingFactory

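A risk modeling factory: a complete Python toolchain for credit scorecard development and model management.

📖 Online documentation covers installation, quick start, the API reference, and the user guide.

Installation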

pip install supermodelingfactory

macOS users also need to install the OpenMP runtime (a lightgbm dependency):

brew install libomp

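Supported environments: Python 3.10 / 3.11 / 3.12 / 3.13 on macOS arm64, Linux x86_64, and Windows x86_64.

See INSTALL.md for details.

License

This project is licensed under the Business Source License 1.1 with a Change Date of 2030-06-24. Before that date:

  • ✅ Allowed: personal learning, academic research, internal evaluation, prototyping, teaching
  • ❌ Not allowed: any production, commercial, or revenue-generating use

After 2030-06-24 the license converts automatically to Apache 2.0. For commercial licensing, contact the author.

The project is now packaged and distributed as source: both the wheel and the sdist contain Python source code, and the core modules are no longer hidden behind Cython.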

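Project overview

SuperModelingFactory brings together the three capabilities needed across the full credit risk modeling workflow:

Subproject | Role | Core capabilities
Modeling_Tool | Modeling engine | Binning, WOE encoding, feature analysis, model training and evaluation, sample management
ExcelMaster | Reporting engine | Programmatic Excel workbook generation with charts, conditional formatting, and cursor-based streaming writes
Report | Report templates | Model performance reports, batch export of WOE plots, multi-model comparison reports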
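
For readers new to WOE encoding, the arithmetic behind it can be sketched with the standard library alone. This is a minimal illustration and not the implementation inside Modeling_Tool: the function name `woe_iv`, the sign convention (log of good share over bad share), and the 0.5 smoothing for empty cells are all assumptions made for the example.

```python
import math

def woe_iv(bins):
    """Return (woe_per_bin, information_value) for a list of (good, bad) counts."""
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    woes, iv = [], 0.0
    for good, bad in bins:
        # Assumed smoothing: treat an empty cell as 0.5 so the log stays finite.
        good_share = (good or 0.5) / total_good
        bad_share = (bad or 0.5) / total_bad
        woe = math.log(good_share / bad_share)
        woes.append(woe)
        # Each bin contributes (share difference) * WOE to the information value.
        iv += (good_share - bad_share) * woe
    return woes, iv

# Illustrative counts for three bins of one feature: (good, bad).
bins = [(100, 50), (200, 30), (300, 20)]
woes, iv = woe_iv(bins)
for (good, bad), woe in zip(bins, woes):
    print(f"good={good:4d} bad={bad:3d} woe={woe:+.4f}")
print(f"IV={iv:.4f}")
```

Under this convention a negative WOE marks a bin with a higher-than-average bad rate; some teams use the opposite sign, so check which one a given scorecard assumes.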

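Project structure

SuperModelingFactory/
├── Modeling_Tool/          # Core modeling toolkit
│   ├── Core/               #   Infrastructure: binning, ODPS, utility functions, encryption
│   ├── WOE/                #   WOE encoding: binning, transformation, mapping, visualization
│   ├── Feature/            #   Feature analysis: distribution shift, PSI, correlation filtering
│   ├── Model/              #   Model training: LR, LightGBM, XGBoost, variable selection
│   ├── Eval/               #   Model evaluation: gains table, ROC/KS, performance summary
│   └── Sample/             #   Sample management: splitting, stratification, reject inference, distribution adaptation
├── ExcelMaster/            # Excel reporting engine
│   ├── ExcelFormatTool.py  #   Format definitions (50+ preset cell formats)
│   ├── ExcelMaster.py      #   Core engine (cursor-based streaming writes, charts, conditional formatting)
│   ├── Template.py         #   Analysis report templates (PVA, Bivar, GridSearch, etc.)
│   └── Utility.py          #   Utilities (colors, paths, PSI report processing, etc.)
└── Report/                 # Model evaluation report templates
    └── Report_Tool.py      #   Performance reports, WOE plotting, multi-model comparison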
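
The Eval package above lists KS among its metrics. As a rough illustration of what that statistic measures, here is a standard-library sketch; it is not the code in Modeling_Tool/Eval, the function name `ks_statistic` is hypothetical, and it assumes labels use 1 for bad and that higher scores mean higher risk.

```python
def ks_statistic(scores, labels):
    """Largest gap between cumulative bad share and cumulative good share."""
    total_bad = sum(labels)
    total_good = len(labels) - total_bad
    pairs = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    cum_bad = cum_good = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        # Consume every row sharing this score before measuring the gap,
        # so tied scores cannot create artificial separation.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            cum_bad += pairs[j][1]
            cum_good += 1 - pairs[j][1]
            j += 1
        best = max(best, abs(cum_bad / total_bad - cum_good / total_good))
        i = j
    return best

# Illustrative scores and labels (1 = bad).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
ks = ks_statistic(scores, labels)
print(f"KS={ks:.2f}")
```

A model that separates the two classes perfectly reaches 1.0, while one that assigns every row the same score yields 0.0.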

Installation from source

Dependencies

# Core dependencies
pip install pandas numpy scipy scikit-learn

# Modeling engine
pip install lightgbm xgboost joblib

# Excel reporting
pip install xlsxwriter openpyxl Pillow matplotlib seaborn

# Optional
pip install pyodps           # Alibaba Cloud MaxCompute connection
pip install imbalanced-learn # SMOTE sampling
pip install tqdm             # Progress bar

Usage

git clone <repo-url>
cd SuperModelingFactory
export PYTHONPATH="${PYTHONPATH}:$(pwd)"

Quick start

Typical risk modeling workflow

from Modeling_Tool import (
    # Binning
    Binning, super_binning,
    # WOE encoding
    WOE_Master,
    # Feature analysis
    VarExtractionInsights, CorrelationFilter, PSICalculator,
    # Model training
    GradientBoostingModel, LRMaster,
    # Model evaluation
    GainsTableCalculator, PerformanceEvaluator,
    # Sample management
    SampleSplitter, RejectInferrer
)

# Assumed to exist already: `data` is a pandas DataFrame with a binary
# `is_bad` target column, and `feature_cols` is a list of feature column names.

# 1. Split the sample
splitter = SampleSplitter(test_size=0.3, random_state=42, stratify=True)
train_df, test_df = splitter.split_df(data, target='is_bad')

# 2. WOE binning and encoding (fit on train only, then apply to both sets)
woe_master = WOE_Master(train_data=train_df, varlist=feature_cols, dep='is_bad')
woe_master.fit(nbins=10, equal_freq=True)
train_woe = woe_master.transform(train_df)
test_woe = woe_master.transform(test_df)

# 3. Feature screening: stability (PSI), then correlation filtering
psi_calc = PSICalculator(buckets=10)
psi_result = psi_calc.calculate(expected_df=train_df, current_data=test_df, varlist=feature_cols)

corr_filter = CorrelationFilter(data=train_woe, dep='is_bad')
keep_vars = corr_filter.remove_highly_correlated(feature_cols)

# 4. Model training (LightGBM backend, with the test set passed for validation)
model = GradientBoostingModel('lgb', params={'n_estimators': 100, 'learning_rate': 0.1})
model.fit(train_woe[keep_vars], train_woe['is_bad'], test_woe[keep_vars], test_woe['is_bad'])

# 5. Model evaluation
evaluator = PerformanceEvaluator(tgt_name='is_bad', model=model.model, feature_cols=keep_vars)
evaluator.add_dataset('train', train_woe).add_dataset('test', test_woe)
perf_result = evaluator.evaluate()
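
Step 3 above relies on PSI to compare the train and test distributions. The sketch below shows one common way to compute it using only the standard library. It is an illustration under stated assumptions, not PSICalculator's actual logic: the function name `psi`, the equal-frequency bucketing on the expected sample, and the `eps` floor for empty buckets are all choices made for this example.

```python
import bisect
import math

def psi(expected, actual, buckets=10, eps=1e-6):
    """Population Stability Index of `actual` measured against `expected`."""
    ordered = sorted(expected)
    # Interior cut points at the expected sample's quantiles (equal-frequency buckets).
    cuts = [ordered[len(ordered) * k // buckets] for k in range(1, buckets)]

    def shares(values):
        counts = [0] * buckets
        for value in values:
            counts[bisect.bisect_right(cuts, value)] += 1
        # Floor each share at eps so an empty bucket does not produce log(0).
        return [max(count / len(values), eps) for count in counts]

    expected_shares = shares(expected)
    actual_shares = shares(actual)
    return sum(
        (a - e) * math.log(a / e)
        for a, e in zip(actual_shares, expected_shares)
    )

# Illustrative data: a uniform score and the same score shifted upward.
train_scores = [i / 100 for i in range(100)]
shifted_scores = [min(score + 0.2, 0.99) for score in train_scores]

psi_same = psi(train_scores, train_scores)
psi_shifted = psi(train_scores, shifted_scores)
print(f"identical: {psi_same:.4f}  shifted: {psi_shifted:.4f}")
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, though the thresholds a given team applies may differ.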

Generating a report with ExcelMaster

from ExcelMaster.ExcelMaster import ExcelMaster

em = ExcelMaster('model_report.xlsx')
ws = em.add_worksheet('Performance')

# Stream a DataFrame into the sheet at the current cursor position
em.write_dataframe(ws, perf_result, title='Model performance summary', titleformat='BLUE_H2')
em.insert_image(ws, 'roc_curve.png', figScale=(600, 400))

em.close_workbook()
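
The idea of a cursor-based streaming write can be pictured with a toy in-memory sheet: each write lands at the cursor and then moves the cursor below what was written, so successive blocks stack without manual row arithmetic. The `SheetCursor` class below is entirely hypothetical and says nothing about how ExcelMaster is implemented; the metric values are placeholders.

```python
class SheetCursor:
    """Toy in-memory sheet that tracks the next free row (not ExcelMaster's API)."""

    def __init__(self):
        self.cells = {}     # (row, col) -> value
        self.next_row = 0   # the cursor: first row that has not been written yet

    def write_table(self, title, header, rows, gap=1):
        """Write a titled table at the cursor, then move the cursor below it."""
        start = self.next_row
        self.cells[(start, 0)] = title
        for r, record in enumerate([header, *rows], start=start + 1):
            for c, value in enumerate(record):
                self.cells[(r, c)] = value
        # Title row + header row + data rows + blank spacer rows.
        self.next_row = start + 2 + len(rows) + gap
        return start

sheet = SheetCursor()
first = sheet.write_table("Train", ["metric", "value"], [["KS", 0.42], ["AUC", 0.78]])
second = sheet.write_table("Test", ["metric", "value"], [["KS", 0.39], ["AUC", 0.76]])
print(first, second, sheet.next_row)
```

The caller never computes a row index; the second table simply starts where the first one ended plus the spacer row.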

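Architecture

Dependency direction

                    ┌─────────┐
                    │  Core   │  (infrastructure, no cross-package dependencies)
                    └────┬────┘
           ┌─────────┬───┼───────┬─────────┐
           ▼         ▼   ▼       ▼         ▼
         WOE      Model  Eval  Feature   Sample
           │         │              │        │
           └─────────┴──────────────┴────────┘
      (all depend one-way on Core; modules import each other lazily)

  • Core is the foundation for every subpackage and depends on no other subpackage
  • The other subpackages use lazy imports (import statements inside function bodies) to avoid circular dependencies
  • The top-level Modeling_Tool/__init__.py exposes a curated, unified API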
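
The lazy-import pattern described above can be demonstrated with a throwaway package built in a temporary directory. Everything here is hypothetical (the package `lazy_demo_pkg` and its modules `woe` and `evalmod` are invented for the demonstration): `evalmod` imports from `woe` at module level, while `woe` defers its import of `evalmod` until the function is called, so neither module is half-initialized when the other needs it.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a two-module package on disk (hypothetical names, for illustration only).
root = Path(tempfile.mkdtemp())
pkg = root / "lazy_demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "woe.py").write_text(textwrap.dedent('''
    def encode(x):
        # Deferred import: resolved at call time, after both modules are loaded.
        from lazy_demo_pkg.evalmod import describe
        return describe(x * 2)
'''))
(pkg / "evalmod.py").write_text(textwrap.dedent('''
    # Module-level import back into woe; safe because woe has no
    # module-level import of evalmod.
    from lazy_demo_pkg.woe import encode

    def describe(value):
        return f"value={value}"
'''))

sys.path.insert(0, str(root))
importlib.invalidate_caches()

woe = importlib.import_module("lazy_demo_pkg.woe")
result = woe.encode(21)
print(result)
```

If `woe.py` instead imported `describe` at module level, loading `woe` would start loading `evalmod`, which would then ask for `encode` before `woe` had defined it and fail with an ImportError.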

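Naming conventions

  • Every public API is exported through __init__.py, so callers only need from Modeling_Tool import ...
  • Class names use PascalCase; function names use snake_case
  • Functions and methods whose names start with _ are internal and not part of the public API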

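Continuous integration

This repository's GitHub Actions workflow (.github/workflows/tests.yml) runs pytest automatically on pushes to main and on pull requests, with the following matrix:

  • Python: 3.11, 3.12
  • Dependency matrix:
    • legacy: numpy<2 + scipy<1.13 + lightgbm<4
    • modern: numpy>=2 + scipy>=1.13 + lightgbm>=4
  • Shared constraint: pandas>=2.0,<2.3 (to be relaxed once issue #2 is fixed)

The test cases live in a separate private repository, SuperModelingFactory_pytest; the workflow clones it across repositories using secrets.PYTEST_REPO_TOKEN.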
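
To make the size of that matrix concrete, the combinations can be enumerated in a few lines of Python. This is only a sketch of the listed constraints: it assumes the workflow runs the full cross product of Python versions and dependency profiles with no include or exclude entries, which the README does not state.

```python
from itertools import product

python_versions = ["3.11", "3.12"]
dependency_profiles = {
    "legacy": ["numpy<2", "scipy<1.13", "lightgbm<4"],
    "modern": ["numpy>=2", "scipy>=1.13", "lightgbm>=4"],
}
shared_constraints = ["pandas>=2.0,<2.3"]

# One job per (Python version, dependency profile) pair.
jobs = [
    {"python": version, "profile": name, "requirements": pins + shared_constraints}
    for version, (name, pins) in product(python_versions, dependency_profiles.items())
]

for job in jobs:
    print(job["python"], job["profile"], " ".join(job["requirements"]))
```

Each requirements list could be passed to pip as-is to reproduce one CI cell locally.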

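Configuring the PAT (one-time setup)

  1. Go to GitHub Settings · Tokens (classic) and generate a new token with the repo scope checked (only read access to the private repository is needed)
  2. In this repository, go to Settings → Secrets and variables → Actions → New repository secret
  3. Name: PYTEST_REPO_TOKEN, Value: paste the token

If you use a fine-grained PAT instead, grant it the Contents: Read permission on the SuperModelingFactory_pytest repository.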

Version

  • Version: 0.2.0
  • Author: Jingkai Sun

License

Internal project, for team use only.

