# Calibre: Advanced Probability Calibration

Advanced probability calibration techniques for machine learning models.
Calibre provides advanced probability calibration techniques that go beyond traditional isotonic regression. It offers multiple methods to balance monotonicity and granularity preservation, giving you fine control over your model's probability estimates.
While techniques like isotonic regression have been standard for this task, they come with significant limitations:
- **Loss of granularity:** Traditional isotonic regression often collapses many distinct probability values into a small number of unique values, which can be problematic for decision-making.
- **Rigid monotonicity:** Perfect monotonicity might not always be necessary or beneficial; small violations might be acceptable if they better preserve the information content of the original predictions.
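The granularity loss is easy to demonstrate with scikit-learn's standard isotonic regression (a sketch independent of Calibre):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(42)
n = 1000
y_pred = rng.uniform(0, 1, n)      # 1000 distinct raw probability scores
y_true = rng.binomial(1, y_pred)   # labels drawn from those probabilities

y_cal = IsotonicRegression(out_of_bounds="clip").fit_transform(y_pred, y_true)

# Isotonic regression collapses the scores into far fewer unique values
print(f"{len(np.unique(y_pred))} unique inputs -> "
      f"{len(np.unique(y_cal))} unique calibrated values")
```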
Calibre addresses these limitations by implementing a suite of advanced calibration techniques that provide more nuanced control over model probability calibration. Its methods are designed to preserve granularity while still favoring a generally monotonic trend.
| Method | Description | Key Strength | Use When |
|---|---|---|---|
| IsotonicCalibrator | Standard isotonic regression with diagnostic support | Fast, monotonic guarantee | You need strict monotonicity |
| NearlyIsotonicCalibrator | Allows controlled monotonicity violations | Preserves granularity | You want to balance monotonicity vs information |
| SplineCalibrator | Smooth calibration using I-splines | Smooth, differentiable | You need smooth probability curves |
| RelaxedPAVACalibrator | Ignores small violations below threshold | Fast, practical | You want to ignore noise-level violations |
| RegularizedIsotonicCalibrator | L2 regularized isotonic regression | Reduces overfitting | You have limited calibration data |
| SmoothedIsotonicCalibrator | Post-smoothing of isotonic output | Reduces staircase effect | You want smooth output from isotonic |
| CDIIsotonicCalibrator | Cost & data-informed isotonic (research) | Decision-aware | You have specific decision thresholds |
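For intuition about the relaxed approach, here is a hypothetical sketch of the idea (not Calibre's actual implementation): a pool-adjacent-violators pass that only merges neighboring blocks when the monotonicity violation exceeds a threshold, leaving noise-level violations, and the granularity they carry, in place.

```python
def relaxed_pava(y, threshold=0.0):
    """Pool adjacent violators, ignoring decreases of at most `threshold`."""
    values, counts = [], []  # running block means and block sizes
    for v in y:
        values.append(float(v))
        counts.append(1)
        # Merge only when the violation is larger than the threshold
        while len(values) > 1 and values[-2] > values[-1] + threshold:
            merged = values[-2] * counts[-2] + values[-1] * counts[-1]
            counts[-2] += counts[-1]
            values[-2] = merged / counts[-2]
            values.pop()
            counts.pop()
    out = []
    for v, c in zip(values, counts):
        out.extend([v] * c)
    return out

print(relaxed_pava([0.1, 0.3, 0.2, 0.4]))                  # [0.1, 0.25, 0.25, 0.4]
print(relaxed_pava([0.1, 0.3, 0.2, 0.4], threshold=0.15))  # [0.1, 0.3, 0.2, 0.4]
```

With `threshold=0.0` this is ordinary PAVA and the small dip at 0.2 gets pooled away; with a threshold above the size of the dip, all four distinct values survive.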
## 🚀 Quick Start

```bash
pip install calibre
```
```python
from calibre import IsotonicCalibrator
import numpy as np

# Your model's probability predictions and true labels
y_pred = np.array([0.2, 0.3, 0.4, 0.6, 0.7, 0.8])
y_true = np.array([0, 0, 0, 1, 1, 1])

# Calibrate probabilities
cal = IsotonicCalibrator()
cal.fit(y_pred, y_true)
y_calibrated = cal.transform(y_pred)
```
## Usage Examples

### Compare Different Methods
```python
from calibre import (
    IsotonicCalibrator,
    NearlyIsotonicCalibrator,
    SplineCalibrator,
)
import numpy as np

# Generate example data
np.random.seed(42)
n = 1000
X = np.random.uniform(0, 1, n)
y = np.random.binomial(1, X, n)

# Try different calibrators
calibrators = {
    'Isotonic': IsotonicCalibrator(),
    'Nearly Isotonic': NearlyIsotonicCalibrator(lam=1.0),
    'Spline': SplineCalibrator(n_splines=10),
}

for name, cal in calibrators.items():
    cal.fit(X, y)
    y_cal = cal.transform(X)
    n_unique = len(np.unique(y_cal))
    print(f"{name}: {n_unique} unique values")
```
### Diagnostic-Enabled Calibration
```python
from calibre import IsotonicCalibrator

# Enable automatic plateau diagnostics (X, y as in the previous example)
cal = IsotonicCalibrator(enable_diagnostics=True)
cal.fit(X, y)

# Check if calibration has problematic plateaus
if cal.has_diagnostics():
    print(cal.diagnostic_summary())
    # Output: "2 plateaus detected: 1 supported, 1 limited-data"
```
### Fine-Tuning Monotonicity
```python
from calibre import NearlyIsotonicCalibrator

# Strict monotonicity (large lambda)
cal_strict = NearlyIsotonicCalibrator(lam=10.0)

# Relaxed monotonicity (small lambda)
cal_relaxed = NearlyIsotonicCalibrator(lam=0.1)

# The relaxed version preserves more granularity,
# while the strict version enforces stronger monotonicity.
```
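For context, "nearly isotonic" fitting in the literature (Tibshirani, Hoefling & Tibshirani, 2011) penalizes rather than forbids downward steps. Assuming Calibre's `lam` plays the role of λ in that formulation, the fitted values β solve:

```math
\min_{\beta \in \mathbb{R}^n} \; \frac{1}{2}\sum_{i=1}^{n}\left(y_i - \beta_i\right)^2 \;+\; \lambda \sum_{i=1}^{n-1}\left(\beta_i - \beta_{i+1}\right)_{+}
```

where $(x)_{+} = \max(x, 0)$ charges only for downward steps. With λ = 0 the raw values are returned unchanged; as λ → ∞ the solution approaches the fully monotone isotonic fit.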
## 🎯 Choosing the Right Method
```mermaid
graph TD
    A[Start] --> B{Need strict<br/>monotonicity?}
    B -->|Yes| C{Have enough<br/>data?}
    B -->|No| D{Want smooth<br/>function?}
    C -->|Yes| E[IsotonicCalibrator]
    C -->|No| F[RegularizedIsotonicCalibrator]
    D -->|Yes| G[SplineCalibrator]
    D -->|No| H[NearlyIsotonicCalibrator]
```
## 📈 Evaluation Metrics
```python
from calibre.metrics import (
    expected_calibration_error,
    brier_score,
    calibration_diversity_index,
)

# Measure calibration quality (y_true, y_calibrated as in the Quick Start)
ece = expected_calibration_error(y_true, y_calibrated)
bs = brier_score(y_true, y_calibrated)
diversity = calibration_diversity_index(y_calibrated)
print(f"ECE: {ece:.4f}, Brier: {bs:.4f}, Diversity: {diversity:.4f}")
```
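For reference, expected calibration error can be sketched in plain NumPy. This is a simple equal-width-binning variant; Calibre's implementation may differ in binning details:

```python
import numpy as np

def ece(y_true, y_prob, n_bins=10):
    """Expected calibration error with equal-width probability bins."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (y_prob >= lo) & (y_prob < hi)
        if hi == 1.0:
            mask |= (y_prob == 1.0)  # last bin is right-closed
        if mask.any():
            # |accuracy - confidence| in the bin, weighted by bin population
            gap = abs(y_true[mask].mean() - y_prob[mask].mean())
            total += mask.mean() * gap
    return total

print(ece([0, 0, 1, 1], [0.0, 0.0, 1.0, 1.0]))  # 0.0 for a perfectly calibrated toy example
```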
## 🔬 Advanced Features

### Plateau Diagnostics
Calibre can automatically detect and classify plateaus in calibration curves:
```python
from calibre import run_plateau_diagnostics

# Analyze plateaus in your calibration
results = run_plateau_diagnostics(X_train, y_train, y_calibrated)
for plateau in results:
    print(f"Plateau at [{plateau['start']:.2f}, {plateau['end']:.2f}]: "
          f"{plateau['classification']} (confidence: {plateau['confidence']:.2f})")
```
### Metrics Overview

- **Calibration Errors:** `mean_calibration_error`, `expected_calibration_error`, `maximum_calibration_error`
- **Overall Performance:** `brier_score`, `calibration_curve`
- **Granularity Preservation:** `calibration_diversity_index`, `unique_value_counts`
- **Diagnostic Metrics:** `plateau_quality_score`, `tie_preservation_score`
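The granularity metrics reward calibrators that keep distinct outputs distinct. As an illustration only (not necessarily Calibre's exact formula), a diversity index could be defined as the fraction of calibrated values that remain unique:

```python
import numpy as np

def diversity_index(y_cal):
    """Fraction of calibrated values that remain distinct (1.0 = no collapsing)."""
    y_cal = np.asarray(y_cal)
    return len(np.unique(y_cal)) / len(y_cal)

print(diversity_index([0.1, 0.2, 0.3, 0.4]))  # 1.0: all values distinct
print(diversity_index([0.5, 0.5, 0.5, 0.9]))  # 0.5: heavy collapsing
```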
## 🛠️ Installation

### Requirements

- Python 3.11+
- NumPy, SciPy, scikit-learn
- CVXPY (for `NearlyIsotonicCalibrator`)
### Development Installation

```bash
git clone https://github.com/finite-sample/calibre.git
cd calibre
pip install -e ".[dev]"
```
## 📄 License
MIT License - see LICENSE file for details.
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
## 📖 Citation

If you use Calibre in your research, please cite:

```bibtex
@software{calibre2024,
  title  = {Calibre: Advanced Probability Calibration for Machine Learning},
  author = {Sood, Gaurav},
  year   = {2024},
  url    = {https://github.com/finite-sample/calibre}
}
```
## 🔗 Adjacent Repositories
- gojiplus/robust_pava — Increase uniqueness in isotonic regression by ignoring small violations
- gojiplus/pyppur — pyppur: Python Projection Pursuit Unsupervised (Dimension) Reduction To Min. Reconstruction Loss or DIstance DIstortion
- gojiplus/rmcp — R MCP Server
- gojiplus/bloomjoin — bloomjoin: An R package implementing Bloom filter-based joins for improved performance with large datasets.
- gojiplus/incline — Estimate Trend at a Point in a Noisy Time Series