Monotone optimal binning (MOB) via PAVA with constraints, plus plotting utilities.

Monotonic-Optimal-Binning

MOBPY - Monotonic Optimal Binning for Python

A fast, deterministic Python library for creating monotonic optimal bins with respect to a target variable. MOBPY implements two distinct binning pipelines:

Numeric x — stack-based PAVA + constrained adjacent merging (Welch's t-test)
Categorical x — chi-square merging with multiple comparison correction (Holm by default)

🎯 Key Features

⚡ Fast & Deterministic: O(n log n) sort plus an O(n) PAVA pass for numeric; O(k²) chi-square merging for categorical (k = number of categories)
🔀 Two Binning Paths: Numeric PAVA pipeline and categorical chi-square pipeline — unified API
📊 Monotonic Guarantee: The target mean is strictly monotonic across consecutive bins (numeric path)
🔧 Flexible Constraints: Min/max samples, min positives, min negatives, min/max bins — enforced on both paths
📈 WoE & IV Calculation: Automatic Weight of Evidence and Information Value for binary targets (all bins including Missing and Excluded)
🎨 Rich Visualizations: PAVA process plots, WoE bars, event rate charts, and plot_categorical_merge for the categorical path
♾️ Safe Edges: First bin at -∞, last at +∞ for numeric; full category-set coverage for categorical

📦 Installation

pip install MOBPY

For development installation:

git clone https://github.com/ChenTaHung/Monotonic-Optimal-Binning.git
cd Monotonic-Optimal-Binning
pip install -e .

🚀 Quick Start

Numeric Binning

import pandas as pd
from MOBPY import MonotonicBinner, BinningConstraints
from MOBPY.plot import plot_bin_statistics
import matplotlib.pyplot as plt

df = pd.read_csv('data/german_data_credit_cat.csv')
df['default'] = df['default'] - 1  # convert 1/2 → 0/1

constraints = BinningConstraints(
    min_bins=4,
    max_bins=6,
    min_samples=0.05,     # at least 5% of total samples per bin
    min_positives=0.01,   # at least 1% of positives per bin
    min_negatives=0.01,   # at least 1% of negatives per bin (ensures stable WoE)
)

binner = MonotonicBinner(df=df, x='Durationinmonth', y='default',
                         constraints=constraints)
binner.fit()

summary = binner.summary_()
print(summary[['bucket', 'count', 'mean', 'woe', 'iv']])

Output:

    bucket      count  mean      woe         iv
0  (-inf, 9)      94  0.106  1.241870  0.106307
1  [9, 16)       337  0.234  0.335632  0.035238
2  [16, 45)      499  0.343 -0.193553  0.019342
3  [45, +inf)     70  0.571 -1.127082  0.102180
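
The woe and iv columns can be approximated by hand. The sketch below assumes the convention WoE = ln(share of non-events / share of events), which is inferred from the signs in the output above (a low event rate gives a positive WoE). MOBPY may also apply smoothing, so its figures can differ slightly. The function name woe_iv and the toy counts are hypothetical and are not part of the MOBPY API.

```python
import math

def woe_iv(events, non_events):
    """Return per-bin (woe, iv) lists from per-bin event / non-event counts.

    Assumed convention (not confirmed by the MOBPY docs):
        woe_i = ln(non_event_share_i / event_share_i)
        iv_i  = (non_event_share_i - event_share_i) * woe_i
    Every bin must contain at least one event and one non-event.
    """
    total_e = sum(events)
    total_ne = sum(non_events)
    woe, iv = [], []
    for e, ne in zip(events, non_events):
        share_e = e / total_e      # this bin's share of all events
        share_ne = ne / total_ne   # this bin's share of all non-events
        w = math.log(share_ne / share_e)
        woe.append(w)
        iv.append((share_ne - share_e) * w)
    return woe, iv

# Toy counts for three bins with a rising event rate (hypothetical data).
woe, iv = woe_iv(events=[10, 30, 60], non_events=[90, 70, 40])
print([round(w, 3) for w in woe])
print("Total IV:", round(sum(iv), 3))
```

A bin with zero events or zero non-events would make the logarithm undefined, which is why the min_positives and min_negatives constraints matter for stable WoE.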

Categorical Binning

import pandas as pd
from MOBPY import MonotonicBinner, BinningConstraints
from MOBPY.plot import plot_woe_bars, plot_categorical_merge
import matplotlib.pyplot as plt

df = pd.read_csv('data/transactions.csv')

binner = MonotonicBinner(
    df=df,
    x='merchant_category',
    y='is_fraud',
    x_type='categorical',          # activate chi-square merging
    categorical_alpha=0.05,
    categorical_correction='holm',
    constraints=BinningConstraints(max_bins=8, min_bins=2, min_samples=30),
    max_label_cats=3,              # truncate long bin labels: {A, B, C, ...+N}
)
binner.fit()

diag = binner.get_diagnostics()
print(f"{diag['n_initial_categories']} categories → {diag['n_final_bins']} bins")
print(f"Total IV: {binner.summary_()['iv'].sum():.4f}")

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(18, 5))
plot_woe_bars(binner.summary_(), ax=axes[0], tick_labels='auto', show_iv=True)
plot_categorical_merge(binner, ax=axes[1], show_counts=False)
plt.tight_layout()
plt.show()

# Category → bin mapping
ba = binner.bin_assignment()
for bin_idx in sorted(ba.unique()):
    print(f"Bin {bin_idx} ({binner.bins_().loc[bin_idx, 'mean']:.1%}):",
          sorted(ba[ba == bin_idx].index))

📊 Visualization

Numeric binning — comprehensive analysis

import matplotlib.pyplot as plt
from MOBPY.plot import plot_bin_statistics

# binner is the fitted numeric MonotonicBinner from Quick Start
fig = plot_bin_statistics(binner)
plt.show()

Figure: Binning Analysis

plot_bin_statistics creates a multi-panel view: WoE bars · event rate · sample distribution · bin boundaries on data.

Numeric binning — PAVA process

import matplotlib.pyplot as plt
from MOBPY.plot import plot_pava_comparison

# binner is the fitted numeric MonotonicBinner from Quick Start
fig = plot_pava_comparison(binner)
plt.show()

Figure: PAVA Comparison

Categorical binning — merge visualization

from MOBPY import MonotonicBinner, BinningConstraints
from MOBPY.plot import plot_categorical_merge
import matplotlib.pyplot as plt

binner = MonotonicBinner(
    # Please refer to examples/E-Commerce Fraud - Categorical Binning.ipynb
)
binner.fit()

fig, ax = plt.subplots(figsize=(20, 6))
plot_categorical_merge(
    binner,
    ax=ax,
    show_counts=False,   # 60 bars — skip per-bar counts to avoid clutter
)
plt.tight_layout()
plt.show()

Figure: Category Merge Result

plot_categorical_merge shows each original category as a bar, coloured by its final bin. Groups are separated by gaps; a dashed line spans each bin at its pooled event rate; the dotted line marks the overall mean.

🔬 Understanding the Algorithm

Numeric path (x_type='numeric', default)

Stage 1 — PAVA: Creates initial monotonic blocks by pooling adjacent violators.
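
To make the pooling step concrete, here is a minimal stack-based PAVA sketch in plain Python. It is an illustration under stated assumptions, not MOBPY's implementation: it only handles the increasing direction, pools ties so that block means come out strictly increasing, and the name pava_blocks and the dict layout are hypothetical.

```python
def pava_blocks(xs, ys):
    """Pool adjacent violators so block means strictly increase with x."""
    # Aggregate rows that share the same x so a value is never split.
    agg = {}
    for x, y in zip(xs, ys):
        n, s = agg.get(x, (0, 0.0))
        agg[x] = (n + 1, s + y)

    def mean(block):
        return block["sum"] / block["n"]

    stack = []
    for x in sorted(agg):
        n, s = agg[x]
        stack.append({"left": x, "right": x, "n": n, "sum": s})
        # While the previous block's mean is not below the newest one,
        # the pair violates monotonicity (or ties), so pool the two blocks.
        while len(stack) >= 2 and mean(stack[-2]) >= mean(stack[-1]):
            top = stack.pop()
            prev = stack[-1]
            prev["right"] = top["right"]
            prev["n"] += top["n"]
            prev["sum"] += top["sum"]
    return stack

blocks = pava_blocks(xs=[1, 2, 3, 4, 5, 6], ys=[0, 1, 0, 1, 1, 1])
for b in blocks:
    print(b["left"], b["right"], b["n"], b["sum"] / b["n"])
```

Each value is pushed once and popped at most once, so the pooling pass is linear after the sort. A decreasing relationship can be handled by negating y before the call.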

Stage 2 — Constrained merging: Merges adjacent blocks (3 phases):

1. Statistical merging (Welch's t-test, respects max_bins)
2. min_samples enforcement (stops at the min_bins floor)
3. min_positives / min_negatives enforcement (binary targets only)

print(f"PAVA blocks: {len(binner.pava_blocks_())}")
print(f"Final bins:  {len(binner.bins_())}")
# PAVA blocks: 10
# Final bins:  4
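
The statistical merging phase can be sketched as a greedy loop that repeatedly merges the adjacent pair of blocks with the weakest evidence of a difference. The sketch below is an assumption-laden illustration: it ranks pairs by the absolute Welch t statistic rather than by a p-value, it requires at least two rows per block, and the names welch_t, merge_until, and the block fields are hypothetical rather than MOBPY's own.

```python
import math

def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom.

    Each block is a dict with n (rows), total (sum of y), total_sq (sum of y^2).
    """
    def mean_var(blk):
        m = blk["total"] / blk["n"]
        v = (blk["total_sq"] - blk["n"] * m * m) / (blk["n"] - 1)
        return m, max(v, 0.0)  # guard against tiny negative rounding error

    m1, v1 = mean_var(a)
    m2, v2 = mean_var(b)
    s1, s2 = v1 / a["n"], v2 / b["n"]
    if s1 + s2 == 0.0:
        # Both blocks are constant: identical means are indistinguishable.
        return (0.0 if m1 == m2 else math.inf), float("nan")
    t = (m1 - m2) / math.sqrt(s1 + s2)
    df = (s1 + s2) ** 2 / (s1 ** 2 / (a["n"] - 1) + s2 ** 2 / (b["n"] - 1))
    return t, df

def merge_until(blocks, max_bins):
    """Merge the most similar adjacent pair until at most max_bins remain."""
    blocks = [dict(b) for b in blocks]  # work on copies
    while len(blocks) > max_bins:
        i = min(range(len(blocks) - 1),
                key=lambda k: abs(welch_t(blocks[k], blocks[k + 1])[0]))
        right = blocks.pop(i + 1)
        for key in ("n", "total", "total_sq"):
            blocks[i][key] += right[key]
    return blocks

# Binary target, so the sum of y^2 equals the sum of y (hypothetical counts).
demo = [
    {"n": 100, "total": 10, "total_sq": 10},
    {"n": 100, "total": 11, "total_sq": 11},
    {"n": 100, "total": 40, "total_sq": 40},
]
merged = merge_until(demo, max_bins=2)
print([(b["n"], b["total"]) for b in merged])
```

Merging only adjacent blocks preserves the ordering produced by PAVA, so the result stays monotonic.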

Categorical path (x_type='categorical')

Stage 1 — Chi-square merging: Pairs of category blocks are merged based on adjusted p-values (3 phases):

1. Statistical merging: chi-square test with Holm correction; a pair-result cache keeps total cost at O(k²)
2. min_samples enforcement
3. min_positives / min_negatives enforcement
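
As a rough illustration of the statistical phase, the sketch below computes a 2x2 Pearson chi-square p-value for a pair of blocks and applies the Holm step-down adjustment to a set of pairwise p-values. It assumes no continuity correction and compares neighbours after sorting by event rate; MOBPY's actual pairing strategy and test details may differ, and the function names are hypothetical.

```python
import math

def chi2_pvalue_2x2(e1, n1, e2, n2):
    """Pearson chi-square p-value (1 df, no continuity correction) comparing
    the event rates of two blocks with e events out of n rows each."""
    a, b = e1, n1 - e1
    c, d = e2, n2 - e2
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # a margin is empty, so the rates cannot be distinguished
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))

def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values non-decreasing.
        running = max(running, min(1.0, (m - rank) * pvalues[idx]))
        adjusted[idx] = running
    return adjusted

# Hypothetical categories as (name, events, rows), sorted by event rate.
cats = sorted([("A", 10, 100), ("B", 12, 100), ("C", 40, 100)],
              key=lambda c: c[1] / c[2])
raw = [chi2_pvalue_2x2(cats[i][1], cats[i][2], cats[i + 1][1], cats[i + 1][2])
       for i in range(len(cats) - 1)]
adj = holm_adjust(raw)
alpha = 0.05
for (left, right), p in zip(zip(cats, cats[1:]), adj):
    print(left[0], right[0], "merge" if p > alpha else "keep separate")
```

A pair whose adjusted p-value exceeds alpha shows no significant difference in event rate and is a candidate for merging.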

🎛️ Advanced Configuration

Constraints with class-count enforcement

# Fractional (adaptive to data size)
constraints = BinningConstraints(
    max_bins=8,
    min_samples=0.05,     # 5% of total samples
    max_samples=0.30,     # 30% of total samples
    min_positives=0.02,   # 2% of positive samples
    min_negatives=0.02,   # 2% of negative samples — prevents log(0) in WoE
)

# Absolute (fixed)
constraints = BinningConstraints(
    max_bins=5,
    min_samples=100,
    min_positives=20,
    min_negatives=50,
)
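
The difference between fractional and absolute thresholds can be pictured with a small helper that converts either form into a row count. This is a sketch under an assumed rule (values strictly between 0 and 1 are fractions, anything else is an absolute count, and fractions round up); MOBPY's exact resolution and rounding are not documented here, and resolve_threshold is a hypothetical name.

```python
import math

def resolve_threshold(value, total):
    """Turn a constraint into an absolute row count.

    Assumed rule: 0 < value < 1 is a fraction of total; otherwise the
    value is already an absolute count.
    """
    if 0 < value < 1:
        # Subtract a tiny epsilon so float noise such as 50.000000000000004
        # does not round up to the next integer.
        return math.ceil(value * total - 1e-9)
    return int(value)

n_rows = 1000
print(resolve_threshold(0.05, n_rows))  # fractional min_samples
print(resolve_threshold(100, n_rows))   # absolute min_samples
```

Fractional thresholds scale with the dataset, which keeps one configuration usable across samples of different sizes.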

Handling special values

age_binner = MonotonicBinner(
    df=df,
    x='Age',
    y='default',
    constraints=constraints,
    exclude_values=[-999, -1, 0],   # reported as separate rows in summary_()
).fit()

Unseen categories (categorical path)

binner = MonotonicBinner(
    df=train_df, x='category', y='target',
    x_type='categorical',
    unseen_categories='error',     # raises ValueError for unseen values (default)
    # unseen_categories='unknown', # returns "Unknown" / NaN WoE instead
)
binner.fit()

# Transform test data. With 'error', an unseen category raises ValueError;
# with 'unknown', it maps to "Unknown" / NaN WoE.
test_df['bin'] = binner.transform(test_df['category'], assign='interval')
test_df['woe'] = binner.transform(test_df['category'], assign='woe')
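
The two policies can be pictured with a plain dictionary lookup. The sketch below is a conceptual illustration only: map_categories and the WoE values are hypothetical, and it does not reproduce MOBPY's internals.

```python
import math

def map_categories(values, category_to_woe, unseen="error"):
    """Map categories to WoE, applying an unseen-category policy."""
    result = []
    for v in values:
        if v in category_to_woe:
            result.append(category_to_woe[v])
        elif unseen == "unknown":
            result.append(float("nan"))  # unseen value gets a NaN WoE
        else:
            raise ValueError(f"unseen category: {v!r}")
    return result

# Hypothetical WoE values learned on training data.
learned = {"grocery": 0.40, "travel": -0.25}

print(map_categories(["grocery", "crypto"], learned, unseen="unknown"))

try:
    map_categories(["grocery", "crypto"], learned, unseen="error")
except ValueError as exc:
    print("rejected:", exc)
```

Failing loudly by default surfaces drift in the category set early, while the 'unknown' policy suits scoring pipelines that must not stop on new values.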

Transform new data

new_data = pd.DataFrame({'age': [25, 45, 65]})

# age_binner is the fitted numeric binner from "Handling special values"

# Bin label
print(age_binner.transform(new_data['age'], assign='interval'))
# 0    (-inf, 26)
# 1      [35, 75)
# 2      [35, 75)

# WoE score
print(age_binner.transform(new_data['age'], assign='woe'))
# 0   -0.526748
# 1    0.306015
# 2    0.306015
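
Conceptually, assigning a numeric value to a bin is a binary search over the interior cut points, with the outer edges treated as -inf and +inf so every finite value lands in some bin. The sketch below reuses the Durationinmonth cut points and WoE values from the Quick Start output; the helper names are hypothetical, and missing values are not handled.

```python
from bisect import bisect_right

# Interior cut points and per-bin WoE taken from the Quick Start output.
CUTS = [9, 16, 45]
WOE = [1.241870, 0.335632, -0.193553, -1.127082]

def bin_index(x):
    """Index of the left-closed, right-open bin that contains x."""
    return bisect_right(CUTS, x)

def bin_label(x):
    """Human-readable interval label in the same style as summary_()."""
    i = bin_index(x)
    lo = "-inf" if i == 0 else str(CUTS[i - 1])
    hi = "+inf" if i == len(CUTS) else str(CUTS[i])
    left = "(" if i == 0 else "["
    return f"{left}{lo}, {hi})"

for value in (8, 9, 44, 45, 1000):
    print(value, bin_label(value), WOE[bin_index(value)])
```

Because the first and last bins are open-ended, values outside the training range still receive a bin and a WoE score.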

📈 Use Cases

MOBPY is ideal for:

Credit Risk Modeling: Create monotonic risk score bins for regulatory compliance
Insurance Pricing: Develop age/risk factor bands with clear premium progression
Customer Segmentation: Build ordered customer value tiers or merge categorical merchant types
Feature Engineering: Generate interpretable binned features for scorecards
Regulatory Reporting: Ensure transparent, monotonic relationships in models

📚 Documentation

API Reference — Project structure and workflow
MonotonicBinner — Full class API (numeric + categorical)
BinningConstraints — Constraint configuration
Categorical Merge Module — Chi-square algorithm details
Plot Module — All visualization functions
plot_categorical_merge — Categorical merge visualization
Examples & Tutorials — Jupyter notebooks with real-world examples

🧪 Testing

# Run all tests
.venv/bin/python -m pytest tests/ -q

📖 Reference

Mironchyk, Pavel, and Viktor Tchistiakov. Monotone optimal binning algorithm for credit risk modeling. (2017)
Smalbil, P. J. The choices of weights in the iterative convex minorant algorithm. (2015)
Testing Dataset 1: German Credit Risk from Kaggle
Testing Dataset 2: US Health Insurance Dataset from Kaggle
GitHub Project: Monotone Optimal Binning (SAS 9.4 version)

👥 Authors

Ta-Hung (Denny) Chen
- LinkedIn: https://www.linkedin.com/in/dennychen-tahung/
- E-mail: denny20700@gmail.com
Yu-Cheng (Darren) Tsai
- LinkedIn: https://www.linkedin.com/in/darren-yucheng-tsai/
Peter Chen
- LinkedIn: https://www.linkedin.com/in/peterchentsungwei/
- E-mail: peterwei20700@gmail.com

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Release history

2.3.0 (this version): May 31, 2026
2.2.0: Feb 19, 2026
2.1.0: Nov 6, 2025
2.0.0: Aug 28, 2025
1.1.1: Aug 3, 2023
1.1.0: Aug 1, 2023
1.0.1: Jul 9, 2023
1.0.0: Jul 4, 2023
