High-performance Risk Modeling Toolkit powered by Polars
🚀 MARS: High-Performance Risk Modeling Framework
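MARS (Modeling Analysis Risk Score) is a Python toolkit for credit risk modeling. It builds its data processing on Polars and follows Scikit-learn's API conventions, aiming to make data profiling, feature engineering, and model evaluation more efficient on the large, wide tables common in credit risk work.
Core goal: use Polars' vectorized execution to speed up data processing while staying compatible with Scikit-learn pipelines (Pipeline).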
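✨ Key Features
1. 📊 High-Performance Data Profiling
Provides data quality diagnostics and visual reports, with performance several times faster than a traditional Pandas-based approach.
- Full metric overview: computes basic DQ metrics such as Missing, Zero, Unique, and Top1 in a single pass.
- Unicode Sparklines: renders mini distribution charts (e.g. ▂▅▇█) directly in the terminal or a notebook for a quick read on the data distribution.
- Multi-dimensional trend analysis: groups by time (Month/Vintage) or customer segment and automatically computes preliminary stability metrics (Var, CV).
- Automated Excel reports: exports polished Excel workbooks with heatmaps, data bars, and conditional formatting.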
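2. 🧮 High-Performance Binning
A binning engine tuned for credit scorecard development.
- MarsNativeBinner: fast binning implemented with Polars expressions.
  - Supports three modes: Quantile (equal-frequency), Uniform (equal-width), and CART (decision tree).
  - Parallel acceleration: decision-tree binning uses joblib for multi-core parallelism with low memory overhead.
- MarsOptimalBinner: hybrid optimal binning.
  - Hybrid Engine: combines Polars' fast pre-binning (O(N)) with optbinning's mathematical-programming (MIP/CP) solve, which runs on the pre-binned summary and is therefore independent of the row count (O(1) in N).
  - Supports monotonicity constraints (Monotonic Trend) and separate handling of special and missing values.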
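3. 🛠️ Engineering Design
- Auto Polars: a decorator accepts Pandas DataFrame input transparently, converts it to Polars for internal computation, and converts results back when needed.
- Pipeline Ready: all components inherit from MarsBaseEstimator and MarsTransformer and are compatible with Sklearn Pipeline.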
📦 Installation
# Recommended: install with pip
pip install mars-risk
# Or install from source
git clone https://github.com/leeesq/mars-risk.git
cd mars-risk
pip install -e .
Dependencies: polars, pandas, numpy, scikit-learn, scipy, xlsxwriter, colorlog, optbinning
⚡️ Quick Start
Scenario 1: Generate a data profile report
import polars as pl
from mars.analysis.profiler import MarsDataProfiler
# 1. Load the data
df = pl.read_csv("your_data.csv")
# 2. Initialize the profiler (custom missing values such as -999 are supported)
profiler = MarsDataProfiler(df, custom_missing_values=[-999, "unknown"])
# 3. Generate the profile report
report = profiler.generate_profile(
    profile_by="month",  # Optional: analyze trends grouped by month
    config_overrides={"enable_sparkline": True}  # Enable mini distribution charts
)
# 4. Display and export
report.show_overview()  # View the overview in Jupyter (with heatmap)
report.show_trend("mean")  # View the trend of the mean
report.write_excel("data_profile_report.xlsx")  # Export to Excel
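Scenario 2: Fast feature binning
from mars.feature.binner import MarsNativeBinner, MarsOptimalBinner
# X_train / y_train are assumed to be your training features and binary target.
# --- Option A: fast native binning (suited to large-scale preprocessing) ---
binner = MarsNativeBinner(
    features=["age", "income"],
    method="quantile",  # Equal-frequency binning
    n_bins=10,
    special_values=[-1]  # Special values get their own bin
)
binner.fit(X_train, y_train)
X_train_binned = binner.transform(X_train)
# --- Option B: optimal binning (suited to fine-grained scorecard modeling) ---
opt_binner = MarsOptimalBinner(
    features=["credit_score"],
    n_bins=5,
    solver="cp",  # Solve with constraint programming
    monotonic_trend="ascending"  # Enforce a monotonically increasing trend
)
opt_binner.fit(X_train, y_train)
print(opt_binner.bin_cuts_)  # Inspect the optimal cut points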
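Once cut points are available, it can be useful to check what a monotonic trend constraint means in practice. The sketch below uses only the standard library and assumes the cut points for a feature are a sorted list of floats; the actual structure of `bin_cuts_` is not documented here, and the helper names are hypothetical.

```python
from bisect import bisect_left

def event_rates(values, targets, cuts):
    """Event rate per bin, where bin i covers (cuts[i-1], cuts[i]]."""
    totals = [0] * (len(cuts) + 1)
    events = [0] * (len(cuts) + 1)
    for v, t in zip(values, targets):
        i = bisect_left(cuts, v)  # number of cut points strictly below v
        totals[i] += 1
        events[i] += t
    # Empty bins have no defined rate.
    return [e / n if n else None for e, n in zip(events, totals)]

def is_ascending(rates):
    """True when the event rate never decreases from one non-empty bin to the next."""
    seen = [r for r in rates if r is not None]
    return all(a <= b for a, b in zip(seen, seen[1:]))
```

A check like this is a quick sanity test after fitting: if `is_ascending` fails on held-out data, the monotonic pattern learned on the training set may not be stable.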
📂 Project Structure
mars/
├── analysis/               # Data analysis and profiling
│   ├── profiler.py         # MarsDataProfiler core logic
│   ├── report.py           # MarsProfileReport report container
│   └── config.py           # Analysis configuration classes
├── feature/                # Feature engineering
│   ├── binning.py          # NativeBinner & OptimalBinner
│   ├── encoding.py         # TODO
│   ├── selector.py         # TODO
│   └── imputer.py          # TODO
├── risk/                   # TODO
├── metrics/                # Metric calculation
│   └── calculation.py      # TODO
├── modeling/               # Automated modeling pipeline (the ultimate dream) TODO
│   ├── base.py             # TODO
│   └── tuner.py            # TODO
├── scoring/                # Score scaling TODO
├── core/                   # Core base classes
│   ├── base.py             # Sklearn compatibility
│   └── exceptions.py       # Custom exceptions
└── utils/                  # Utilities
    ├── logger.py           # Global logging configuration
    └── decorators.py       # Decorators
📄 License
MIT License