Monotone optimal binning (MOB) via PAVA with constraints, plus plotting utilities.
Monotonic-Optimal-Binning
MOBPY - Monotonic Optimal Binning for Python
A fast, deterministic Python library for creating monotonic optimal bins with respect to a target variable. MOBPY implements two distinct binning pipelines:
- Numeric x — stack-based PAVA + constrained adjacent merging (Welch's t-test)
- Categorical x — chi-square merging with multiple comparison correction (Holm by default)
🎯 Key Features
- ⚡ Fast & Deterministic: O(n log n) sort + O(n) PAVA for numeric; O(k²) chi-square merging for categorical
- 🔀 Two Binning Paths: Numeric PAVA pipeline and categorical chi-square pipeline — unified API
- 📊 Monotonic Guarantee: Strict monotonicity between bins and target (numeric path)
- 🔧 Flexible Constraints: Min/max samples, min positives, min negatives, min/max bins — enforced on both paths
- 📈 WoE & IV Calculation: Automatic Weight of Evidence and Information Value for binary targets (all bins including Missing and Excluded)
- 🎨 Rich Visualizations: PAVA process plots, WoE bars, event rate charts, and plot_categorical_merge for the categorical path
- ♾️ Safe Edges: First bin at -∞, last at +∞ for numeric; full category-set coverage for categorical
📦 Installation
pip install MOBPY
For development installation:
git clone https://github.com/ChenTaHung/Monotonic-Optimal-Binning.git
cd Monotonic-Optimal-Binning
pip install -e .
🚀 Quick Start
Numeric Binning
import pandas as pd
from MOBPY import MonotonicBinner, BinningConstraints
from MOBPY.plot import plot_bin_statistics
import matplotlib.pyplot as plt
df = pd.read_csv('data/german_data_credit_cat.csv')
df['default'] = df['default'] - 1 # convert 1/2 → 0/1
constraints = BinningConstraints(
min_bins=4,
max_bins=6,
min_samples=0.05, # at least 5% of total samples per bin
min_positives=0.01, # at least 1% of positives per bin
min_negatives=0.01, # at least 1% of negatives per bin (ensures stable WoE)
)
binner = MonotonicBinner(df=df, x='Durationinmonth', y='default',
constraints=constraints)
binner.fit()
summary = binner.summary_()
print(summary[['bucket', 'count', 'mean', 'woe', 'iv']])
Output:
       bucket  count   mean        woe        iv
0   (-inf, 9)     94  0.106   1.241870  0.106307
1     [9, 16)    337  0.234   0.335632  0.035238
2    [16, 45)    499  0.343  -0.193553  0.019342
3  [45, +inf)     70  0.571  -1.127082  0.102180
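To make the woe and iv columns easier to read, here is a minimal sketch of the textbook Weight of Evidence and Information Value definitions. It is not MOBPY's internal code: the helper name woe_iv and the toy counts are assumptions for illustration, and the sign convention (log of non-event share over event share) is chosen so that low-event-rate bins get positive WoE, as in the table above. Libraries commonly add smoothing to avoid log(0), so values printed by summary_() may differ slightly from this formula.

```python
import math

def woe_iv(events, non_events):
    """Textbook WoE/IV per bin from event and non-event counts.

    Hypothetical helper for illustration only. Assumes every bin has at
    least one event and one non-event, otherwise the log is undefined.
    """
    total_events = sum(events)
    total_non_events = sum(non_events)
    rows = []
    for e, ne in zip(events, non_events):
        share_e = e / total_events            # share of all events in this bin
        share_ne = ne / total_non_events      # share of all non-events in this bin
        woe = math.log(share_ne / share_e)
        iv = (share_ne - share_e) * woe       # each bin's IV contribution is >= 0
        rows.append((woe, iv))
    return rows

# Toy counts, not taken from any dataset
rows = woe_iv(events=[10, 40, 50], non_events=[90, 60, 50])
for i, (woe, iv) in enumerate(rows):
    print(f"bin {i}: woe={woe:+.4f} iv={iv:.4f}")
print(f"total IV: {sum(iv for _, iv in rows):.4f}")
```

The total IV is the sum of the per-bin contributions, which is why the categorical example sums the iv column.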
Categorical Binning
import pandas as pd
from MOBPY import MonotonicBinner, BinningConstraints
from MOBPY.plot import plot_woe_bars, plot_categorical_merge
import matplotlib.pyplot as plt
df = pd.read_csv('data/transactions.csv')
binner = MonotonicBinner(
df=df,
x='merchant_category',
y='is_fraud',
x_type='categorical', # activate chi-square merging
categorical_alpha=0.05,
categorical_correction='holm',
constraints=BinningConstraints(max_bins=8, min_bins=2, min_samples=30),
max_label_cats=3, # truncate long bin labels: {A, B, C, ...+N}
)
binner.fit()
diag = binner.get_diagnostics()
print(f"{diag['n_initial_categories']} categories → {diag['n_final_bins']} bins")
print(f"Total IV: {binner.summary_()['iv'].sum():.4f}")
# Visualize
fig, axes = plt.subplots(1, 2, figsize=(18, 5))
plot_woe_bars(binner.summary_(), ax=axes[0], tick_labels='auto', show_iv=True)
plot_categorical_merge(binner, ax=axes[1], show_counts=False)
plt.tight_layout()
plt.show()
# Category → bin mapping
ba = binner.bin_assignment()
for bin_idx in sorted(ba.unique()):
print(f"Bin {bin_idx} ({binner.bins_().loc[bin_idx, 'mean']:.1%}):",
sorted(ba[ba == bin_idx].index))
📊 Visualization
Numeric binning — comprehensive analysis
from MOBPY.plot import plot_bin_statistics
fig = plot_bin_statistics(binner)
plt.show()
plot_bin_statistics creates a multi-panel view: WoE bars · event rate · sample distribution · bin boundaries on data.
Numeric binning — PAVA process
from MOBPY.plot import plot_pava_comparison
fig = plot_pava_comparison(binner)
plt.show()
Categorical binning — merge visualization
from MOBPY import MonotonicBinner, BinningConstraints
from MOBPY.plot import plot_categorical_merge
import matplotlib.pyplot as plt
binner = MonotonicBinner(
# Please refer to examples/E-Commerce Fraud - Categorical Binning.ipynb
)
binner.fit()
fig, ax = plt.subplots(figsize=(20, 6))
plot_categorical_merge(
binner,
ax=ax,
show_counts=False, # 60 bars — skip per-bar counts to avoid clutter
)
plt.tight_layout()
plt.show()
plot_categorical_merge shows each original category as a bar, coloured by its final bin. Groups are separated by gaps; a dashed line spans each bin at its pooled event rate; the dotted line marks the overall mean.
🔬 Understanding the Algorithm
Numeric path (x_type='numeric', default)
Stage 1 — PAVA: Creates initial monotonic blocks by pooling adjacent violators.
Stage 2 — Constrained merging: Merges adjacent blocks (3 phases):
- Statistical merging (Welch's t-test, respects max_bins)
- min_samples enforcement (stops at the min_bins floor)
- min_positives / min_negatives enforcement (binary targets only)
print(f"PAVA blocks: {len(binner.pava_blocks_())}")
print(f"Final bins: {len(binner.bins_())}")
# PAVA blocks: 10
# Final bins: 4
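The Stage 1 pooling step can be illustrated with a short stack-based sketch. This is a generic pool-adjacent-violators routine for a non-decreasing fit, written for illustration rather than copied from MOBPY; the function name pava_blocks, the choice to leave ties unmerged, and the toy input are all assumptions.

```python
def pava_blocks(y_sorted_by_x):
    """Pool adjacent violators for a non-decreasing fit.

    Input: target values already sorted by x.
    Output: list of (block_mean, block_count) with non-decreasing means.
    """
    stack = []  # each entry is [sum_of_y, count]
    for y in y_sorted_by_x:
        stack.append([float(y), 1])
        # Merge backwards while the previous block's mean exceeds the last
        # block's mean (compared by cross-multiplication, no division).
        while (len(stack) > 1
               and stack[-2][0] * stack[-1][1] > stack[-1][0] * stack[-2][1]):
            s, n = stack.pop()
            stack[-1][0] += s
            stack[-1][1] += n
    return [(s / n, n) for s, n in stack]

# Toy binary target ordered by x
blocks = pava_blocks([0, 1, 0, 0, 1, 1, 0, 1])
for mean, count in blocks:
    print(f"mean={mean:.3f} count={count}")
```

Each point is pushed once and popped at most once, which is what makes the pass linear after sorting.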
Categorical path (x_type='categorical')
Stage 1 — Chi-square merging: Pairs of category blocks are merged based on adjusted p-values (3 phases):
- Statistical merging — chi-square + Holm correction, pair-result cache keeps total cost O(k²)
- min_samples enforcement
- min_positives / min_negatives enforcement
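The statistical phase can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not MOBPY's implementation: the helper names chi2_pvalue_2x2 and holm_adjust are hypothetical, the test is a plain Pearson chi-square on a 2x2 table without continuity correction, and picking the pair with the largest adjusted p-value as the merge candidate is an assumed rule.

```python
import math

def chi2_pvalue_2x2(e1, n1, e2, n2):
    """Pearson chi-square p-value (1 df) comparing two event rates,
    given e events out of n samples in each group."""
    table = [[e1, n1 - e1], [e2, n2 - e2]]
    row_totals = [n1, n2]
    col_totals = [e1 + e2, (n1 - e1) + (n2 - e2)]
    total = n1 + n2
    stat = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / total
            if expected > 0:
                stat += (table[r][c] - expected) ** 2 / expected
    # With 1 degree of freedom, the survival function is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))

def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

# Toy categories: name -> (events, samples)
cats = {"A": (5, 100), "B": (6, 100), "C": (30, 100)}
names = sorted(cats)
pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
raw = [chi2_pvalue_2x2(*cats[a], *cats[b]) for a, b in pairs]
adj = holm_adjust(raw)
best = max(range(len(pairs)), key=lambda i: adj[i])
if adj[best] > 0.05:  # alpha, mirroring categorical_alpha=0.05
    print("merge candidate:", pairs[best])
```

Caching each pair's raw p-value, as the bullet above describes, avoids recomputing tests for pairs that a merge did not touch.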
🎛️ Advanced Configuration
Constraints with class-count enforcement
# Fractional (adaptive to data size)
constraints = BinningConstraints(
max_bins=8,
min_samples=0.05, # 5% of total samples
max_samples=0.30, # 30% of total samples
min_positives=0.02, # 2% of positive samples
min_negatives=0.02, # 2% of negative samples — prevents log(0) in WoE
)
# Absolute (fixed)
constraints = BinningConstraints(
max_bins=5,
min_samples=100,
min_positives=20,
min_negatives=50,
)
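As a rough illustration of how fractional and absolute settings relate, the sketch below converts a constraint value into an absolute count. The helper resolve_threshold and its rounding-up rule are assumptions made for this example; MOBPY's own resolution logic may differ.

```python
import math

def resolve_threshold(value, base_count):
    """Turn a constraint into an absolute count.

    Hypothetical rule: a float strictly between 0 and 1 is a fraction of
    base_count (rounded up); anything else is already an absolute count.
    """
    if isinstance(value, float) and 0.0 < value < 1.0:
        # Small tolerance so float noise such as 50.000000001 does not round to 51
        return math.ceil(value * base_count - 1e-9)
    return int(value)

n_total, n_pos = 1000, 300
n_neg = n_total - n_pos

print(resolve_threshold(0.05, n_total))  # min_samples as a share of all rows
print(resolve_threshold(0.02, n_pos))    # min_positives as a share of positives
print(resolve_threshold(0.02, n_neg))    # min_negatives as a share of negatives
print(resolve_threshold(100, n_total))   # absolute value passes through
```

Fractional settings scale with the dataset, while absolute settings keep the same floor regardless of size.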
Handling special values
age_binner = MonotonicBinner(
df=df,
x='Age',
y='default',
constraints=constraints,
exclude_values=[-999, -1, 0], # reported as separate rows in summary_()
).fit()
Unseen categories (categorical path)
binner = MonotonicBinner(
df=train_df, x='category', y='target',
x_type='categorical',
unseen_categories='error', # raises ValueError for unseen values (default)
# unseen_categories='unknown', # returns "Unknown" / NaN WoE instead
)
binner.fit()
# Transform test data; with unseen_categories='error', unseen values raise ValueError
test_df['bin'] = binner.transform(test_df['category'], assign='interval')
test_df['woe'] = binner.transform(test_df['category'], assign='woe')
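The difference between the two unseen_categories policies can be shown with a small stand-alone sketch. The helper lookup_woe and the WoE values are made up for illustration and do not reproduce MOBPY's transform().

```python
import math

def lookup_woe(values, woe_by_category, unseen="error"):
    """Map categories to WoE scores; hypothetical helper for illustration."""
    out = []
    for v in values:
        if v in woe_by_category:
            out.append(woe_by_category[v])
        elif unseen == "error":
            raise ValueError(f"unseen category: {v!r}")
        else:  # unseen == "unknown": fall back to NaN
            out.append(math.nan)
    return out

train_woe = {"grocery": 0.42, "travel": -0.31}  # made-up values

scores = lookup_woe(["grocery", "crypto"], train_woe, unseen="unknown")
print(scores)

try:
    lookup_woe(["crypto"], train_woe)  # default policy rejects unseen values
except ValueError as exc:
    print("rejected:", exc)
```

Failing loudly is the safer default for scoring pipelines; the NaN fallback suits exploratory work where unseen values are expected.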
Transform new data
new_data = pd.DataFrame({'age': [25, 45, 65]})
# Bin label
print(binner.transform(new_data['age'], assign='interval'))
# 0 (-inf, 26)
# 1 [35, 75)
# 2 [35, 75)
# WoE score
print(binner.transform(new_data['age'], assign='woe'))
# 0 -0.526748
# 1 0.306015
# 2 0.306015
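The interval labels follow a left-closed, right-open convention with open-ended first and last bins. A minimal sketch of that lookup using the standard bisect module is shown below; assign_interval is a hypothetical helper, and the cut points 9, 16, 45 are taken from the Quick Start output rather than from this Age example.

```python
import bisect
import math

def assign_interval(x, cuts):
    """Return (bin_index, left, right) for x.

    cuts are sorted interior cut points; bins are
    (-inf, c0), [c0, c1), ..., [c_last, +inf).
    """
    i = bisect.bisect_right(cuts, x)  # x equal to a cut goes to the bin on its right
    left = -math.inf if i == 0 else cuts[i - 1]
    right = math.inf if i == len(cuts) else cuts[i]
    return i, left, right

cuts = [9, 16, 45]
for x in [8, 9, 44.9, 45, 1000]:
    idx, left, right = assign_interval(x, cuts)
    print(f"x={x}: bin {idx} covering {left} to {right}")
```

Because the outer bins extend to infinity, values outside the training range still land in a bin.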
📈 Use Cases
MOBPY is ideal for:
- Credit Risk Modeling: Create monotonic risk score bins for regulatory compliance
- Insurance Pricing: Develop age/risk factor bands with clear premium progression
- Customer Segmentation: Build ordered customer value tiers or merge categorical merchant types
- Feature Engineering: Generate interpretable binned features for scorecards
- Regulatory Reporting: Ensure transparent, monotonic relationships in models
📚 Documentation
- API Reference — Project structure and workflow
- MonotonicBinner — Full class API (numeric + categorical)
- BinningConstraints — Constraint configuration
- Categorical Merge Module — Chi-square algorithm details
- Plot Module — All visualization functions
- plot_categorical_merge — Categorical merge visualization
- Examples & Tutorials — Jupyter notebooks with real-world examples
🧪 Testing
# Run all tests
.venv/bin/python -m pytest tests/ -q
📖 Reference
- Mironchyk, Pavel, and Viktor Tchistiakov. Monotone optimal binning algorithm for credit risk modeling. (2017)
- Smalbil, P. J. The choices of weights in the iterative convex minorant algorithm. (2015)
- Testing Dataset 1: German Credit Risk from Kaggle
- Testing Dataset 2: US Health Insurance Dataset from Kaggle
- GitHub Project: Monotone Optimal Binning (SAS 9.4 version)
👥 Authors
- Ta-Hung (Denny) Chen
  - LinkedIn: https://www.linkedin.com/in/dennychen-tahung/
  - E-mail: denny20700@gmail.com
- Yu-Cheng (Darren) Tsai
- Peter Chen
  - LinkedIn: https://www.linkedin.com/in/peterchentsungwei/
  - E-mail: peterwei20700@gmail.com
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.