
A customizable automated modeling toolkit for risk control

Project description

Automated Modeling

It is recommended to finish data cleaning and preprocessing before running automated modeling.

Two model architectures are currently supported: scorecard and XGB. To add another architecture, simply inherit from the AutomatedModeling class and implement the corresponding interface.

Scorecard

Scorecard automated modeling workflow:

  1. Split the data into train, validation, and oot sets
  2. Pre-screen the raw features by information value (IV) and correlation to determine the candidate model variables
  3. Apply monotonic binning to the candidate variables and WOE-transform them according to the binning results
  4. Remove the less stable variables from the candidates
  5. Remove the low-IV variables from the stable ones
  6. Run stepwise regression on the remaining variables to determine the final model variables
  7. Fit the scorecard on the final model variables
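As an illustration of steps 2-3, the IV and WOE of a single binned feature can be computed as follows. This is a minimal self-contained sketch, not the toolkit's implementation; the bin counts are made up, and the WOE sign convention (ln of %good over %bad) is one common choice:

```python
import math
from typing import List, Tuple

def woe_iv(bins: List[Tuple[int, int]]) -> Tuple[List[float], float]:
    """Compute per-bin WOE and total IV from (good_count, bad_count) pairs."""
    total_good = sum(g for g, b in bins)
    total_bad = sum(b for g, b in bins)
    woes, iv = [], 0.0
    for good, bad in bins:
        good_pct = good / total_good
        bad_pct = bad / total_bad
        woe = math.log(good_pct / bad_pct)  # WOE = ln(%good / %bad)
        iv += (good_pct - bad_pct) * woe    # IV accumulates over bins
        woes.append(woe)
    return woes, iv

# three bins of (good, bad) counts for one hypothetical feature
woes, iv = woe_iv([(200, 10), (150, 30), (50, 60)])
```

A feature whose IV falls below the screening threshold (0.02 by default in this toolkit) would be dropped at step 2.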

Example call for scorecard automated modeling:

"""
注意,`target`和`time`都是`data`中字段,其中`target`列中的值必须是0-1变量,`time`列中的值必须是 datetime.date 对象
run 方法中的 split_time 参数也必须传入一个 datetime.date 对象
"""
model: AutomatedScoreCard = AutomatedScoreCard(data=data, target=target, time="time", not_features=not_features) # 传入数据以实例化一个 AutomatedScoreCard 对象
model.run(split_time=critical_date, split_validation_set=False) # 调用对象的 run 方法
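The datetime.date requirement above can be met with a plain-stdlib conversion, for example (the record layout and date format here are illustrative assumptions):

```python
from datetime import date, datetime

# raw records with string dates, as they might come out of a CSV export
rows = [{"time": "2023-01-15", "target": 0}, {"time": "2023-06-30", "target": 1}]

# convert each string into a datetime.date, as the toolkit requires
for row in rows:
    row["time"] = datetime.strptime(row["time"], "%Y-%m-%d").date()

critical_date = date(2023, 6, 1)  # candidate split_time between train and oot
```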

How do I exercise my own judgment?

Skip the direct call to the run method, execute the code inside run by hand, and reshape it to your needs:

def run(self, split_time: date, split_validation_set: bool = False, validation_pct: float = 0.2,
            empty: float = 0.9, origin_iv: float = 0.02, corr: float = 0.7,
            max_n_bins: int = 5,
            psi_threshold: float = 0.1, psi_original: bool = True,
            woe_iv: float = 0.02, eliminate_low_iv_oot: bool = True,
            estimator: str = "ols", direction: str = "both", criterion: str = "aic",
            model_score: str = "model_score",
            n_bins: int = 50
            ) -> None:
        """Main entry point for automated modeling.

        Args:
            split_time (date): Cutoff date separating train from oot.
            split_validation_set (bool, optional): Whether to randomly split a validation set out of train. Defaults to False.
            validation_pct (float, optional): Fraction of the training set used for validation. Defaults to 0.2.
            empty (float, optional): Missing-value ratio threshold for the initial screening. Defaults to 0.9.
            origin_iv (float, optional): IV threshold for the initial screening. Defaults to 0.02.
            corr (float, optional): Correlation coefficient threshold for the initial screening. Defaults to 0.7.
            max_n_bins (int, optional): Maximum number of bins for monotonic binning of candidate variables. Defaults to 5.
            psi_threshold (float, optional): PSI threshold for removing unstable variables. Defaults to 0.1.
            psi_original (bool, optional): Whether to compute PSI on the raw data (True) or the discretized data (False). Defaults to True.
            woe_iv (float, optional): IV threshold for the post-discretization IV screening. Defaults to 0.02.
            eliminate_low_iv_oot (bool, optional): Whether to also drop variables whose post-discretization IV is low on the oot set. Defaults to True.
            estimator (str, optional): Model used for fitting, one of ["ols" | "lr" | "lasso" | "ridge"]. Defaults to "ols".
            direction (str, optional): Stepwise regression direction, one of ["forward" | "backward" | "both"]. Defaults to "both".
            criterion (str, optional): Selection criterion, one of ["aic" | "bic" | "ks" | "auc"]. Defaults to "aic".
            model_score (str, optional): Column name for the model score. Defaults to "model_score".
            n_bins (int, optional): Number of bins used when computing the model score PSI. Defaults to 50.
        """
        self.split_train_oot(split_time=split_time, split_validation_set=split_validation_set, validation_pct=validation_pct)  # split train, validation and oot sets
        print(f"Train set: {self.train.shape[0]} samples, bad_rate {self.train[self.target].mean()}; validation set: {self.validation.shape[0]} samples, bad_rate {self.validation[self.target].mean()}; oot set: {self.oot.shape[0]} samples, bad_rate {self.oot[self.target].mean()}")
        self.initial_screening(empty=empty, iv=origin_iv, corr=corr)  # initial IV and correlation screening on the raw features
        print(f"After IV and correlation screening on the raw data, {len(self.latent_features)} variables remain: {self.latent_features}")
        self.monotonic_trend_binning(max_n_bins=max_n_bins)  # monotonic binning of the candidate variables, followed by WOE transformation
        stable_features: List[str] = self.eliminate_unstable_features(selected_features=self.latent_features, psi_threshold=psi_threshold, psi_original=psi_original)  # drop the less stable candidate variables
        print(f"After removing unstable variables, {len(stable_features)} candidates remain: {stable_features}")
        # from the stable variables, further drop those with low IV
        selected_features: List[str] = self.eliminate_low_iv_features(selected_features=stable_features, iv=woe_iv, train_validation_oot=1)
        if eliminate_low_iv_oot:  # optionally also drop variables with low post-discretization IV on the oot set
            selected_features = self.eliminate_low_iv_features(selected_features=selected_features, iv=woe_iv, train_validation_oot=0)
        print(f"After dropping low-IV variables from the stable ones, {len(selected_features)} remain: {selected_features}")
        used_features: List[str] = self.stepwise_after_woe_transformer(selected_features=selected_features, estimator=estimator, direction=direction, criterion=criterion)  # stepwise regression to determine the final model variables
        print(f"Stepwise regression selected {len(used_features)} model variables: {used_features}")
        self.fit(used_features=used_features, model_score=model_score)  # fit the scorecard
        evaluation: Dict[str, float] = self.evaluate(n_bins=n_bins)  # model evaluation metrics
        print(f"Train KS: {evaluation['train_ks']}, AUC: {evaluation['train_auc']}; validation KS: {evaluation['validation_ks']}, AUC: {evaluation['validation_auc']}; oot KS: {evaluation['oot_ks']}, AUC: {evaluation['oot_auc']}; model score PSI: {evaluation['model_psi']}")
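The stepwise selection under an AIC criterion that run performs can be sketched on synthetic data. This is a self-contained illustration of forward stepwise OLS, not the toolkit's stepwise_after_woe_transformer; all names and data are made up:

```python
import numpy as np

def aic_ols(X: np.ndarray, y: np.ndarray) -> float:
    """AIC of an OLS fit with intercept: n*ln(RSS/n) + 2k."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = float(np.sum((y - Xd @ beta) ** 2))
    return len(y) * np.log(rss / len(y)) + 2 * Xd.shape[1]

def forward_stepwise(X: np.ndarray, y: np.ndarray) -> list:
    """Greedily add the column that lowers AIC the most; stop when none does."""
    remaining, chosen = list(range(X.shape[1])), []
    best = np.inf
    while remaining:
        score, j = min((aic_ols(X[:, chosen + [j]], y), j) for j in remaining)
        if score >= best:
            break
        best = score
        chosen.append(j)
        remaining.remove(j)
    return chosen

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.1, size=200)  # only columns 0 and 2 carry signal
selected = forward_stepwise(X, y)
```

The "backward" and "both" directions work analogously, removing (or removing and re-adding) variables while the criterion improves.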

Use the adjust_binning_rules method of the AutomatedScoreCard object to bin manually. Note that manual binning is only allowed on the candidate model variables (the latent_features attribute of the AutomatedScoreCard object). If you want to intervene directly in the final model variables, just modify the used_features argument passed to the fit method.

XGB

XGB automated modeling workflow:

  1. Split the data into train, validation, and oot sets
  2. Remove the less stable variables
  3. Remove the low-IV variables
  4. Feed the remaining variables to XGB for fitting
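The stability screen in step 2 (also used by the scorecard flow) compares a feature's binned distribution between train and oot via PSI. A minimal sketch with made-up bin counts:

```python
import math
from typing import List

def psi(expected_counts: List[int], actual_counts: List[int]) -> float:
    """PSI = sum over bins of (p_expected - p_actual) * ln(p_expected / p_actual)."""
    te, ta = sum(expected_counts), sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        pe, pa = e / te, a / ta
        value += (pe - pa) * math.log(pe / pa)
    return value

stable = psi([100, 200, 300], [110, 190, 300])   # similar distributions -> small PSI
shifted = psi([100, 200, 300], [300, 200, 100])  # reversed distribution -> large PSI
```

A feature whose PSI exceeds the threshold (0.1 by default in this toolkit, matching the common rule of thumb) would be dropped as unstable.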

Example call for XGB automated modeling:

"""
注意,`target`和`time`都是`data`中字段,其中`target`列中的值必须是0-1变量,`time`列中的值必须是 datetime.date 对象
run 方法中的 split_time 参数也必须传入一个 datetime.date 对象
"""
model: AutomatedXGBoost = AutomatedXGBoost(data=data, target=target, time="time", not_features=not_features) # 传入数据以实例化一个 AutomatedScoreCard 对象
model.run(split_time=critical_date, split_validation_set=False) # 调用对象的 run 方法

You can inspect the default xgb training parameters through the static attribute params of AutomatedXGBoost.

PS: AutomatedXGBoost uses the native xgboost API, not the scikit-learn interface.


