Monotone optimal binning (MOB) via PAVA with constraints, plus plotting utilities.
Monotonic-Optimal-Binning
MOBPY - Monotonic Optimal Binning for Python
A fast, deterministic Python library for creating monotonic optimal bins with respect to a target variable. MOBPY implements a stack-based Pool-Adjacent-Violators Algorithm (PAVA) followed by constrained adjacent merging, ensuring strict monotonicity and statistical robustness.
🎯 Key Features
- ⚡ Fast & Deterministic: Stack-based PAVA with O(n) complexity, followed by O(k) adjacent merges
- 📊 Monotonic Guarantee: Ensures strict monotonicity (increasing/decreasing) between bins and target
- 🔧 Flexible Constraints: Min/max samples, min positives, min/max bins with automatic resolution
- 📈 WoE & IV Calculation: Automatic Weight of Evidence and Information Value for binary targets
- 🎨 Rich Visualizations: Comprehensive plotting functions for PAVA process and binning results
- ♾️ Safe Edges: First bin starts at -∞, last bin ends at +∞ for complete coverage
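The monotonic guarantee is easy to verify on any fitted result. The snippet below is a minimal sketch, not part of MOBPY: `is_strictly_monotonic` is a hypothetical helper, and the event rates are the per-bin means from the Quick Start output in this README.

```python
import numpy as np

def is_strictly_monotonic(values, direction="increasing"):
    """Return True if consecutive values move strictly in one direction."""
    diffs = np.diff(np.asarray(values, dtype=float))
    if direction == "increasing":
        return bool(np.all(diffs > 0))
    if direction == "decreasing":
        return bool(np.all(diffs < 0))
    raise ValueError("direction must be 'increasing' or 'decreasing'")

# Per-bin event rates, ordered from the lowest bin to the highest.
bin_means = [0.106383, 0.234421, 0.342685, 0.571429]
print(is_strictly_monotonic(bin_means))                # True
print(is_strictly_monotonic(bin_means, "decreasing"))  # False
```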
📦 Installation
pip install MOBPY
For development installation:
git clone https://github.com/ChenTaHung/Monotonic-Optimal-Binning.git
cd Monotonic-Optimal-Binning
pip install -e .
🚀 Quick Start
import pandas as pd
import numpy as np
from MOBPY import MonotonicBinner, BinningConstraints
from MOBPY.plot import plot_bin_statistics, plot_pava_comparison
import matplotlib.pyplot as plt
# Adjust the path to wherever your copy of the German credit dataset lives
df = pd.read_csv('data/german_data_credit_cat.csv')
# Convert default to 0/1 (original is 1/2)
df['default'] = df['default'] - 1
# Configure constraints
constraints = BinningConstraints(
min_bins=4, # Minimum number of bins
max_bins=6, # Maximum number of bins
min_samples=0.05, # Each bin needs at least 5% of total samples
min_positives=0.01 # Each bin needs at least 1% of total positive samples
)
# Create and fit the binner
binner = MonotonicBinner(
df=df,
x='Durationinmonth',
y='default',
constraints=constraints
)
binner.fit()
# Get binning results
bins = binner.bins_() # Bin boundaries
summary = binner.summary_() # Detailed statistics with WoE/IV
print(summary) # In a Jupyter notebook, display(summary) renders a nicer table
Output:
   bucket       count  count_pct    sum      mean       std  min  max        woe        iv
0  (-inf, 9)       94        9.4   10.0  0.106383  0.309980  0.0  1.0   1.241870  0.106307
1  [9, 16)        337       33.7   79.0  0.234421  0.424267  0.0  1.0   0.335632  0.035238
2  [16, 45)       499       49.9  171.0  0.342685  0.475084  0.0  1.0  -0.193553  0.019342
3  [45, +inf)      70        7.0   40.0  0.571429  0.498445  0.0  1.0  -1.127082  0.102180
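To make the `woe` and `iv` columns less of a black box, here is a hedged sketch that recomputes them from the `count` and `sum` columns. It uses the convention WoE = ln(share of non-events / share of events). The 0.5 smoothing constant is an assumption, chosen because it reproduces the figures in the table above; it is not taken from the library's documentation.

```python
import numpy as np

# Per-bin sample counts and event counts (the `count` and `sum` columns).
count = np.array([94, 337, 499, 70], dtype=float)
bads = np.array([10, 79, 171, 40], dtype=float)
goods = count - bads

# Assumption: add 0.5 to every cell so that empty cells never produce log(0).
smooth = 0.5
k = len(count)
good_dist = (goods + smooth) / (goods.sum() + smooth * k)
bad_dist = (bads + smooth) / (bads.sum() + smooth * k)

woe = np.log(good_dist / bad_dist)   # Weight of Evidence per bin
iv = (good_dist - bad_dist) * woe    # Information Value contribution per bin

print(np.round(woe, 4))
print(np.round(iv, 4))
print("Total IV:", round(float(iv.sum()), 4))
```

Summing the per-bin `iv` values gives the total Information Value of the feature.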
📊 Visualization
MOBPY provides comprehensive visualization of binning results:
# Generate comprehensive binning analysis plot
fig = plot_bin_statistics(binner)
plt.show()
The plot_bin_statistics function creates a multi-panel visualization showing:
- Top Left: Weight of Evidence (WoE) bars for each bin
- Top Right: Event rate trend with sample distribution
- Bottom Left: Sample distribution histogram
- Bottom Right: Target distribution boxplots per bin
🔬 Understanding the Algorithm
MOBPY uses a two-stage approach:
Stage 1: PAVA (Pool-Adjacent-Violators Algorithm)
Creates initial monotonic blocks by pooling adjacent violators:
from MOBPY.plot import plot_pava_comparison
# Visualize PAVA process
fig = plot_pava_comparison(binner)
plt.show()
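For intuition, a stack-based PAVA pass can be sketched in a few lines. This is an illustrative version under stated assumptions, not MOBPY's internal code: `pava_increasing` is a hypothetical name, the inputs are per-value event sums and counts already sorted by x, and ties are pooled so that the resulting block means are strictly increasing.

```python
def pava_increasing(sums, counts):
    """Pool adjacent violators until block means are strictly increasing.

    Each stack entry is [event_sum, count]. Every input point is pushed
    once and popped at most once, so the pass is linear in the input size.
    """
    stack = []
    for s, c in zip(sums, counts):
        stack.append([s, c])
        # Pool while the block below has a mean >= the block on top.
        while (len(stack) > 1
               and stack[-2][0] / stack[-2][1] >= stack[-1][0] / stack[-1][1]):
            top_sum, top_count = stack.pop()
            stack[-1][0] += top_sum
            stack[-1][1] += top_count
    return [(s, c) for s, c in stack]

# Six sorted x-values with 20 samples each; raw means are
# 0.10, 0.30, 0.20, 0.40, 0.35, 0.60 (two violations).
blocks = pava_increasing([2, 6, 4, 8, 7, 12], [20] * 6)
for s, c in blocks:
    print(c, round(s / c, 3))
```

In this toy input the second and third points are pooled, as are the fourth and fifth, leaving four blocks.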
Stage 2: Constrained Merging
Merges adjacent blocks to satisfy constraints while preserving monotonicity:
# Check initial PAVA blocks vs final bins
print(f"PAVA blocks: {len(binner.pava_blocks_())}")
print(f"Final bins: {len(binner.bins_())}")
> PAVA blocks: 10
> Final bins: 4
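The merging stage can be pictured with a simple greedy rule. The sketch below is an assumption-laden illustration rather than the library's actual criterion: `merge_adjacent` is a hypothetical helper, it only handles a minimum count per bin and a maximum number of bins, and it picks merge partners by closeness of means.

```python
def merge_adjacent(blocks, min_count, max_bins):
    """Greedily merge adjacent (event_sum, count) blocks.

    The pooled mean of two neighbours lies between their means, so a
    monotone sequence of block means stays monotone after every merge.
    """
    blocks = [list(b) for b in blocks]

    def mean(i):
        return blocks[i][0] / blocks[i][1]

    def merge(i):  # fold block i + 1 into block i
        blocks[i][0] += blocks[i + 1][0]
        blocks[i][1] += blocks[i + 1][1]
        del blocks[i + 1]

    while len(blocks) > 1:
        small = [i for i, b in enumerate(blocks) if b[1] < min_count]
        if small:
            # Fix the smallest undersized block first.
            i = min(small, key=lambda j: blocks[j][1])
            neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(blocks)]
            j = min(neighbours, key=lambda n: abs(mean(n) - mean(i)))
            merge(min(i, j))
        elif len(blocks) > max_bins:
            # Too many bins: merge the pair with the smallest mean gap.
            gaps = [abs(mean(i + 1) - mean(i)) for i in range(len(blocks) - 1)]
            merge(gaps.index(min(gaps)))
        else:
            break
    return [tuple(b) for b in blocks]

pava_blocks = [(2, 20), (10, 40), (15, 40), (12, 20)]
final = merge_adjacent(pava_blocks, min_count=30, max_bins=3)
print(final)
```

Because only neighbours are ever merged, the bins remain contiguous intervals of x.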
🎛️ Advanced Configuration
Custom Constraints
# Fractional constraints (adaptive to data size)
constraints = BinningConstraints(
max_bins=8,
min_samples=0.05, # 5% of total samples
max_samples=0.30, # 30% of total samples
min_positives=0.01 # 1% of positive samples
)
# Absolute constraints (fixed values)
constraints = BinningConstraints(
max_bins=5,
min_samples=100, # At least 100 samples per bin
max_samples=500 # At most 500 samples per bin
)
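The difference between fractional and absolute constraints comes down to how a threshold is turned into a sample count. As a rough sketch, assuming floats strictly between 0 and 1 are read as fractions and everything else as counts (`resolve_threshold` is a hypothetical helper, and MOBPY's own resolution rules may differ):

```python
import math

def resolve_threshold(value, total):
    """Turn a fractional or absolute constraint into an absolute count.

    Assumption: floats strictly between 0 and 1 are fractions of `total`;
    anything else is already an absolute count.
    """
    if isinstance(value, float) and 0 < value < 1:
        # The small epsilon guards against floating-point noise before ceil.
        return math.ceil(value * total - 1e-9)
    return int(value)

n_samples, n_positives = 1000, 300  # sizes of the German credit example
print(resolve_threshold(0.05, n_samples))    # 50
print(resolve_threshold(0.01, n_positives))  # 3
print(resolve_threshold(100, n_samples))     # 100
```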
Handling Special Values
# Exclude special codes from binning
age_binner = MonotonicBinner(
df=df,
x='Age',
y='default',
constraints=constraints,
exclude_values=[-999, -1, 0] # Treat as separate bins
).fit()
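Conceptually, excluding special values means setting those rows aside before binning and summarising each code on its own. The following pandas sketch illustrates the idea on made-up data; it is not MOBPY's implementation, and the frame `df_demo` is purely illustrative.

```python
import pandas as pd

# Toy data: -999 and 0 are special codes, the rest are real ages.
df_demo = pd.DataFrame({
    "Age": [-999, 25, 40, 0, 61, -999, 33],
    "default": [1, 0, 0, 1, 1, 0, 0],
})
special_codes = [-999, -1, 0]

is_special = df_demo["Age"].isin(special_codes)

# Rows that would go through the monotonic binning.
clean = df_demo[~is_special]

# One summary row per special code that actually occurs in the data.
special_summary = (
    df_demo[is_special]
    .groupby("Age")["default"]
    .agg(["count", "sum", "mean"])
)

print(len(clean))
print(special_summary)
```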
Transform New Data
new_data = pd.DataFrame({'age': [25, 45, 65]})
# Get bin assignments
bins = age_binner.transform(new_data['age'], assign='interval')
print(bins)
# Output:
# 0 (-inf, 26)
# 1 [35, 75)
# 2 [35, 75)
# Name: age, dtype: object
# Get WoE values for scoring
print(age_binner.transform(new_data['age'], assign='woe'))
# Output:
# 0 -0.526748
# 1 0.306015
# 2 0.306015
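Under the hood, assigning a value to a left-closed, right-open bin is a sorted lookup. Here is a small NumPy sketch using the cut points from the Quick Start summary; `assign_bins` is a hypothetical helper and not the library's `transform`.

```python
import numpy as np

# Interior cut points from the Quick Start summary; -inf/+inf are implied.
cuts = np.array([9, 16, 45])
labels = ["(-inf, 9)", "[9, 16)", "[16, 45)", "[45, +inf)"]

def assign_bins(values):
    # side="right" sends a value equal to a cut point into the bin that
    # starts at that cut, which matches the left-closed intervals.
    idx = np.searchsorted(cuts, np.asarray(values), side="right")
    return [labels[i] for i in idx]

print(assign_bins([-5, 9, 15.9, 45, 1000]))
```

Because the outer edges are infinite, values far outside the training range still land in the first or last bin.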
📈 Use Cases
MOBPY is ideal for:
- Credit Risk Modeling: Create monotonic risk score bins for regulatory compliance
- Insurance Pricing: Develop age/risk factor bands with clear premium progression
- Customer Segmentation: Build ordered customer value tiers
- Feature Engineering: Generate interpretable binned features for ML models
- Regulatory Reporting: Ensure transparent, monotonic relationships in models
📚 Documentation
- API Reference - Complete API documentation
- Algorithm Details - Mathematical foundations
- Examples & Tutorials - Jupyter notebooks with real-world examples
🧪 Testing
# Run unit tests
pytest -vv -W ignore::UserWarning
📖 Reference
- Mironchyk, Pavel, and Viktor Tchistiakov. Monotone optimal binning algorithm for credit risk modeling. (2017)
- Smalbil, P. J. The choices of weights in the iterative convex minorant algorithm. (2015)
- Testing Dataset 1: German Credit Risk from Kaggle
- Testing Dataset 2: US Health Insurance Dataset from Kaggle
- GitHub Project: Monotone Optimal Binning (SAS 9.4 version)
👥 Authors
- Ta-Hung (Denny) Chen
  - LinkedIn: https://www.linkedin.com/in/dennychen-tahung/
  - E-mail: denny20700@gmail.com
- Yu-Cheng (Darren) Tsai
  - LinkedIn: https://www.linkedin.com/in/darren-yucheng-tsai/
- Peter Chen
  - LinkedIn: https://www.linkedin.com/in/peterchentsungwei/
  - E-mail: peterwei20700@gmail.com
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.