Monotone optimal binning (MOB) via PAVA with constraints, plus plotting utilities.

Monotonic-Optimal-Binning

MOBPY - Monotonic Optimal Binning for Python

Requires Python 3.9+ · License: MIT

A fast, deterministic Python library for creating monotonic optimal bins with respect to a target variable. MOBPY implements a stack-based Pool-Adjacent-Violators Algorithm (PAVA) followed by constrained adjacent merging, ensuring strict monotonicity and statistical robustness.

🎯 Key Features

  • ⚡ Fast & Deterministic: Stack-based PAVA that runs in O(n) on sorted input, followed by O(k) adjacent merges
  • 📊 Monotonic Guarantee: Ensures strict monotonicity (increasing/decreasing) between bins and target
  • 🔧 Flexible Constraints: Min/max samples, min positives, min/max bins with automatic resolution
  • 📈 WoE & IV Calculation: Automatic Weight of Evidence and Information Value for binary targets
  • 🎨 Rich Visualizations: Comprehensive plotting functions for PAVA process and binning results
  • ♾️ Safe Edges: First bin starts at -∞, last bin ends at +∞ for complete coverage

📦 Installation

pip install MOBPY

For development installation:

git clone https://github.com/ChenTaHung/Monotonic-Optimal-Binning.git
cd Monotonic-Optimal-Binning
pip install -e .

🚀 Quick Start

import pandas as pd
import numpy as np
from MOBPY import MonotonicBinner, BinningConstraints
from MOBPY.plot import plot_bin_statistics, plot_pava_comparison
import matplotlib.pyplot as plt

# Path is relative to the repository root; adjust it to wherever your copy of the dataset lives
df = pd.read_csv('data/german_data_credit_cat.csv')
# Convert default to 0/1 (original is 1/2)
df['default'] = df['default'] - 1

# Configure constraints
constraints = BinningConstraints(
    min_bins=4,           # Minimum number of bins
    max_bins=6,           # Maximum number of bins
    min_samples=0.05,     # Each bin needs at least 5% of total samples
    min_positives=0.01    # Each bin needs at least 1% of total positive samples
)

# Create and fit the binner
binner = MonotonicBinner(
    df=df,
    x='Durationinmonth',
    y='default',
    constraints=constraints
)
binner.fit()

# Get binning results
bins = binner.bins_()        # Bin boundaries
summary = binner.summary_()  # Detailed statistics with WoE/IV
print(summary)  # in a Jupyter notebook, display(summary) renders a formatted table

Output:

   bucket      count  count_pct  sum    mean      std       min  max  woe        iv
0  (-inf, 9)   94     9.4        10.0   0.106383  0.309980  0.0  1.0  1.241870   0.106307
1  [9, 16)     337    33.7       79.0   0.234421  0.424267  0.0  1.0  0.335632   0.035238
2  [16, 45)    499    49.9       171.0  0.342685  0.475084  0.0  1.0  -0.193553  0.019342
3  [45, +inf)  70     7.0        40.0   0.571429  0.498445  0.0  1.0  -1.127082  0.102180
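
To see where the woe and iv columns come from, here is a minimal sketch that recomputes them from the count and sum columns of the output above. The helper name woe_iv and the 0.5 additive smoothing are assumptions: the smoothing was inferred because it reproduces the published numbers, and it has not been checked against MOBPY's source.

```python
import math


def woe_iv(counts, positives, smoothing=0.5):
    """Per-bin (WoE, IV) pairs for a binary target.

    `smoothing` is an assumed additive constant applied to every bin's
    positive and negative counts; 0.5 matches the table above.
    """
    k = len(counts)
    negatives = [n - p for n, p in zip(counts, positives)]
    pos_total = sum(positives) + smoothing * k
    neg_total = sum(negatives) + smoothing * k

    rows = []
    for p, g in zip(positives, negatives):
        pos_share = (p + smoothing) / pos_total   # share of events in this bin
        neg_share = (g + smoothing) / neg_total   # share of non-events in this bin
        woe = math.log(neg_share / pos_share)
        rows.append((woe, (neg_share - pos_share) * woe))
    return rows


# count and sum columns from the summary table
rows = woe_iv(counts=[94, 337, 499, 70], positives=[10, 79, 171, 40])
for woe, iv in rows:
    print(f"woe={woe:.6f}  iv={iv:.6f}")
print(f"total IV = {sum(iv for _, iv in rows):.6f}")
```

With this sign convention, bins with a lower event rate than the population get a positive WoE, which is why WoE falls as the default rate rises across the four bins.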

📊 Visualization

MOBPY provides comprehensive visualization of binning results:

# Generate comprehensive binning analysis plot
fig = plot_bin_statistics(binner)
plt.show()

Binning Analysis

The plot_bin_statistics function creates a multi-panel visualization showing:

  • Top Left: Weight of Evidence (WoE) bars for each bin
  • Top Right: Event rate trend with sample distribution
  • Bottom Left: Sample distribution histogram
  • Bottom Right: Target distribution boxplots per bin

🔬 Understanding the Algorithm

MOBPY uses a two-stage approach:

Stage 1: PAVA (Pool-Adjacent-Violators Algorithm)

Creates initial monotonic blocks by pooling adjacent violators:

from MOBPY.plot import plot_pava_comparison

# Visualize PAVA process
fig = plot_pava_comparison(binner)
plt.show()

[Figure: PAVA Comparison]
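
The pooling step can be sketched in a few lines of plain Python. This is an illustrative stack-based PAVA, not MOBPY's actual implementation; the names Block and pava_blocks are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    left: float    # smallest x value covered by the block
    n: int         # number of samples in the block
    total: float   # sum of y over the block

    @property
    def mean(self) -> float:
        return self.total / self.n


def _violates(prev, cur, increasing):
    # Equal means count as a violation, so surviving blocks are strictly monotone.
    return prev.mean >= cur.mean if increasing else prev.mean <= cur.mean


def pava_blocks(xs, ys, increasing=True):
    """Pool adjacent violators over (x, y) pairs and return the monotone blocks."""
    # Group by unique x first so tied x values can never be split across blocks.
    grouped = {}
    for x, y in sorted(zip(xs, ys)):
        n, total = grouped.get(x, (0, 0.0))
        grouped[x] = (n + 1, total + y)

    stack = []
    for x, (n, total) in grouped.items():  # dict keeps the sorted insertion order
        stack.append(Block(x, n, total))
        # Merge backwards while the top two blocks break monotonicity.
        while len(stack) >= 2 and _violates(stack[-2], stack[-1], increasing):
            last = stack.pop()
            prev = stack[-1]
            stack[-1] = Block(prev.left, prev.n + last.n, prev.total + last.total)
    return stack


blocks = pava_blocks([1, 2, 3, 4, 5, 6], [0, 1, 0, 1, 1, 0])
print([(b.left, b.n, round(b.mean, 3)) for b in blocks])
```

Each group is pushed once and popped at most once, so the loop is linear in the number of unique x values; the initial sort is the only super-linear step.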

Stage 2: Constrained Merging

Merges adjacent blocks to satisfy constraints while preserving monotonicity:

# Check initial PAVA blocks vs final bins
print(f"PAVA blocks: {len(binner.pava_blocks_())}")
print(f"Final bins: {len(binner.bins_())}")

> PAVA blocks: 10
> Final bins: 4
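
As an illustration of why adjacent merging keeps the trend intact, here is a simplified greedy sketch. The function merge_adjacent and its merge rule (fuse with the neighbour whose mean is closest) are assumptions made for this example; MOBPY's own merge criterion may differ, and min_bins and min_positives are not handled here.

```python
def merge_adjacent(blocks, min_samples, max_bins):
    """Greedy adjacent merging over (n, total) blocks ordered by x.

    The input means must already be monotone (for example, PAVA output).
    """
    blocks = [tuple(b) for b in blocks]

    def mean(b):
        return b[1] / b[0]

    def merge_at(i):
        # Fuse blocks i and i+1 into a single block.
        a, b = blocks[i], blocks[i + 1]
        blocks[i:i + 2] = [(a[0] + b[0], a[1] + b[1])]

    def closest_pair(i):
        # Left index of the pair (i-1, i) or (i, i+1) with the smaller mean gap.
        if i == 0:
            return 0
        if i == len(blocks) - 1:
            return i - 1
        left_gap = abs(mean(blocks[i]) - mean(blocks[i - 1]))
        right_gap = abs(mean(blocks[i + 1]) - mean(blocks[i]))
        return i - 1 if left_gap <= right_gap else i

    # 1) Absorb undersized blocks, smallest first.
    while len(blocks) > 1:
        i = min(range(len(blocks)), key=lambda j: blocks[j][0])
        if blocks[i][0] >= min_samples:
            break
        merge_at(closest_pair(i))

    # 2) Enforce max_bins by fusing the pair with the most similar means.
    while len(blocks) > max_bins:
        i = min(range(len(blocks) - 1),
                key=lambda j: abs(mean(blocks[j + 1]) - mean(blocks[j])))
        merge_at(i)
    return blocks


print(merge_adjacent([(10, 1), (40, 8), (8, 2), (42, 21)], min_samples=20, max_bins=2))
```

The mean of a merged pair always lies between the two original means, so fusing neighbours can never introduce a new violation with the blocks on either side.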

🎛️ Advanced Configuration

Custom Constraints

# Fractional constraints (adaptive to data size)
constraints = BinningConstraints(
    max_bins=8,
    min_samples=0.05,     # 5% of total samples
    max_samples=0.30,     # 30% of total samples
    min_positives=0.01    # 1% of positive samples
)

# Absolute constraints (fixed values)
constraints = BinningConstraints(
    max_bins=5,
    min_samples=100,      # At least 100 samples per bin
    max_samples=500       # At most 500 samples per bin
)
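
One way to picture the difference between the two styles: a fractional value is resolved against the data size at fit time, while an absolute value is used as given. The helper below, resolve_count, is hypothetical, and rounding up is an assumption; MOBPY may resolve fractions differently.

```python
import math


def resolve_count(value, total):
    """Turn a fractional (0 < value < 1) or absolute constraint into a sample count."""
    if value is None:
        return None
    if 0 < value < 1:
        # round() guards against float noise before rounding up
        return math.ceil(round(value * total, 9))
    return int(value)


print(resolve_count(0.05, 1000))  # fractional min_samples on 1,000 rows
print(resolve_count(0.01, 300))   # fractional min_positives on 300 positives
print(resolve_count(100, 1000))   # absolute min_samples is passed through
```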

Handling Special Values

# Exclude special codes from binning
age_binner = MonotonicBinner(
    df=df,
    x='Age',
    y='default',
    constraints=constraints,
    exclude_values=[-999, -1, 0]  # Treat as separate bins
).fit()

Transform New Data

new_data = pd.DataFrame({'age': [25, 45, 65]})

# Get bin assignments
bins = age_binner.transform(new_data['age'], assign='interval')
print(bins)
# Output:
# 0    (-inf, 26)
# 1      [35, 75)
# 2      [35, 75)
# Name: age, dtype: object

# Get WoE values for scoring
print(age_binner.transform(new_data['age'], assign='woe'))
# Output:
# 0   -0.526748
# 1    0.306015
# 2    0.306015
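
Because the first bin starts at -∞ and the last ends at +∞, any numeric value maps to exactly one bin. A minimal sketch of that lookup, assuming left-closed, right-open intervals as in the labels above; assign_bins is a hypothetical helper rather than part of MOBPY, and it does not handle missing or excluded values. The cut points are taken from the Quick Start output.

```python
import math
from bisect import bisect_right


def _label(left, right):
    if left == -math.inf:
        return f"(-inf, {right:g})"
    if right == math.inf:
        return f"[{left:g}, +inf)"
    return f"[{left:g}, {right:g})"


def assign_bins(values, cuts):
    """Map each value to its interval label; `cuts` are sorted interior edges."""
    edges = [-math.inf, *cuts, math.inf]
    labels = []
    for v in values:
        # bisect_right puts a value equal to a cut into the bin that starts at that cut
        i = bisect_right(cuts, v)
        labels.append(_label(edges[i], edges[i + 1]))
    return labels


print(assign_bins([8, 9, 15.9, 45, 100], cuts=[9, 16, 45]))
```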

📈 Use Cases

MOBPY is ideal for:

  • Credit Risk Modeling: Create monotonic risk score bins for regulatory compliance
  • Insurance Pricing: Develop age/risk factor bands with clear premium progression
  • Customer Segmentation: Build ordered customer value tiers
  • Feature Engineering: Generate interpretable binned features for ML models
  • Regulatory Reporting: Ensure transparent, monotonic relationships in models

📚 Documentation

🧪 Testing

# Run unit tests
pytest -vv -W ignore::UserWarning

📖 Reference

👥 Authors

  1. Ta-Hung (Denny) Chen

  2. Yu-Cheng (Darren) Tsai

  3. Peter Chen

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
