Extract calibrated explanations from machine learning models.
Calibrated Explanations (Documentation)
Quick Reference
Purpose: Uncertainty-aware feature-importance explanations for scikit-learn compatible models.
Install:
pip install calibrated-explanations
Primary Use Cases: binary-classification, multiclass-classification, regression, probabilistic regression
Key Class (public API): WrapCalibratedExplainer
Required calibration: true (calibration set is mandatory).
All examples in this repo use WrapCalibratedExplainer.
Typical Workflow (3 lines):
```python
from calibrated_explanations import WrapCalibratedExplainer

explainer = WrapCalibratedExplainer(model)  # wrap your sklearn-like model
explainer.fit(x_proper, y_proper); explainer.calibrate(x_cal, y_cal)
explanation = explainer.explain_factual(x_test)  # returns calibrated rules + uncertainty
```
Core Methods:
- fit(x_proper, y_proper) – train/prepare internal state (model fitting or wrapper).
- calibrate(x_cal, y_cal, feature_names=None) – required: align uncertainty estimates.
- explain_factual(X) – factual rules + feature importance with [low, high] bounds.
- explore_alternatives(X) – counterfactual / alternative rules.
- predict_proba(X[, uq_interval=True]) – calibrated probability (with uncertainty interval).
- predict(X[, uq_interval=True]) – point prediction (with uncertainty interval).
Outputs: calibrated prediction intervals, per-feature importance with uncertainty bounds, factual/alternative rule tables.
Task map (critical: regression meanings differ)
Classification (binary/multiclass): calibrated using Venn-Abers predictors.
- Calibrated probability: predict_proba(x[, ...])
- Calibrated probability with uncertainty bounds: predict_proba(x, uq_interval=True[, ...])
- Calibrated prediction: predict(x[, ...])
- Explanations: explain_factual(x[, ...]) and explore_alternatives(x[, ...])
Conformal interval regression (CPS) ← CE "regression": regression in this library is conformal interval regression via Conformal Predictive Systems (CPS). A sketch follows this list.
- CPS calibrated point regression: predict(x[, ...])
- Point regression + calibrated uncertainty intervals (conformal interval regression): predict(x, uq_interval=True, low_high_percentiles=(a, b)[, ...]). One-sided intervals can be obtained by setting a=-np.inf or b=np.inf.
- CPS-controlled intervals are also available from explanations: explain_factual(x, low_high_percentiles=(a, b)[, ...]) and explore_alternatives(x, low_high_percentiles=(a, b)[, ...])
- Default: low_high_percentiles=(5, 95) for 90% intervals.
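To make the interval-regression calls concrete, here is a minimal sketch. It assumes a standard scikit-learn regressor, the bundled diabetes toy dataset, and that predict with uq_interval=True mirrors the (prediction, (low, high)) return shape shown for predict_proba in the quickstart below; treat those details as assumptions rather than a verified reference.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

from calibrated_explanations import WrapCalibratedExplainer

x, y = load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)
x_proper, x_cal, y_proper, y_cal = train_test_split(x_train, y_train, test_size=0.25, random_state=0)

explainer = WrapCalibratedExplainer(RandomForestRegressor(random_state=0))
explainer.fit(x_proper, y_proper)
explainer.calibrate(x_cal, y_cal)

# Two-sided 90% interval: the documented default percentiles (5, 95)
pred, (low, high) = explainer.predict(x_test[:5], uq_interval=True, low_high_percentiles=(5, 95))

# One-sided interval: send one percentile to infinity, as documented above
pred_upper, (low_inf, high_95) = explainer.predict(
    x_test[:5], uq_interval=True, low_high_percentiles=(-np.inf, 95)
)
```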
Probabilistic regression (thresholded probability queries for y): requires assigning a threshold (see the sketch after this list).
- Threshold probability for a real-valued target: predict_proba(x, threshold=t[, ...]) gives P(y <= t)
- Within-spec probability for a real-valued target: predict_proba(x, threshold=(low, high)[, ...]) gives P(low < y <= high)
- Add uncertainty bounds with uq_interval=True
- Exceedance explanations: explain_factual(x, threshold=t[, ...]) and explore_alternatives(x, threshold=t[, ...])
- Within-spec explanations: explain_factual(x, threshold=(low, high)[, ...]) and explore_alternatives(x, threshold=(low, high)[, ...])
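The threshold queries compose the same way; a hedged sketch follows. The thresholds 150 and (100, 200) are arbitrary illustrations on the diabetes target scale, and the return shapes are assumed to match the quickstart's predict_proba.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

from calibrated_explanations import WrapCalibratedExplainer

x, y = load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)
x_proper, x_cal, y_proper, y_cal = train_test_split(x_train, y_train, test_size=0.25, random_state=0)

explainer = WrapCalibratedExplainer(RandomForestRegressor(random_state=0))
explainer.fit(x_proper, y_proper)
explainer.calibrate(x_cal, y_cal)

# P(y <= 150) per instance, with calibrated uncertainty bounds
proba_below, (low, high) = explainer.predict_proba(x_test[:5], threshold=150, uq_interval=True)

# P(100 < y <= 200): probability of landing within a specification window
proba_within = explainer.predict_proba(x_test[:5], threshold=(100, 200))

# Exceedance explanation: which features drive P(y <= 150)?
factual = explainer.explain_factual(x_test[:1], threshold=150)
print(factual[0])
```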
All tasks also support (core capability):
- predict(x[, ...]) and predict(x, uq_interval=True[, ...])
- explain_factual(x[, ...]) and explore_alternatives(x[, ...])
Common optional parameters ([, ...]):
- bins=... for conditional calibration; a Mondrian calibrator can also be set (see crepes.extras.MondrianCategorizer). A sketch follows this list.
- low_high_percentiles=(a, b) for CPS conformal interval regression intervals.
- threshold=t or threshold=(low, high) for probabilistic regression.
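How bins might be wired through, as a hedged sketch: it assumes bins is accepted by calibrate as well as by the prediction calls covered by the [, ...] placeholder, and the median split on a single feature is purely illustrative – crepes.extras.MondrianCategorizer can produce such categories instead.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

from calibrated_explanations import WrapCalibratedExplainer

x, y = load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)
x_proper, x_cal, y_proper, y_cal = train_test_split(x_train, y_train, test_size=0.25, random_state=0)

explainer = WrapCalibratedExplainer(RandomForestRegressor(random_state=0))
explainer.fit(x_proper, y_proper)

# Illustrative Mondrian categories: above/below the median of feature 0
split = np.median(x_proper[:, 0])
cal_bins = (x_cal[:, 0] > split).astype(int)
test_bins = (x_test[:, 0] > split).astype(int)

explainer.calibrate(x_cal, y_cal, bins=cal_bins)  # assumption: calibrate accepts bins
pred, (low, high) = explainer.predict(x_test[:5], uq_interval=True, bins=test_bins)
```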
Local dev: run pip install -e . before running examples/tests locally.
When not to use: raw deep nets without an sklearn wrapper; real-time streaming without a calibration set; extremely high-dimensional (>10k) feature vectors.
Calibrated Explanations turns any scikit-learn-compatible estimator into a calibrated explainer that returns:
- Factual rules – the calibrated reasons behind your model's prediction.
- Alternative rules – what needs to change to flip or reinforce that decision, complete with uncertainty bounds.
- Prediction intervals – uncertainty-aware probabilities or regression ranges that quantify both aleatoric and epistemic risk.
Every quickstart, notebook, and benchmark follows the same recipe: fit your estimator, calibrate on held-out data, explain, then interpret the returned rule table before acting.
Guarantees & Assumptions
- Calibration set required: A held-out calibration set (typically 20-25% of training data) is mandatory for all workflows.
- Interval invariant: all intervals satisfy low <= predict <= high; violations trigger errors (a spot-check sketch follows below).
- Uncertainty decomposition: intervals capture both aleatoric (data) and epistemic (model) uncertainty.
- Calibration validity: Guarantees hold when calibration and test distributions match (exchangeability assumption).
See ADR-021 for formal semantics.
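The interval invariant is easy to spot-check. A minimal sketch under the same assumptions as the regression example above (scikit-learn regressor, diabetes data, assumed (prediction, (low, high)) return shape):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

from calibrated_explanations import WrapCalibratedExplainer

x, y = load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)
x_proper, x_cal, y_proper, y_cal = train_test_split(x_train, y_train, test_size=0.25, random_state=0)

explainer = WrapCalibratedExplainer(RandomForestRegressor(random_state=0))
explainer.fit(x_proper, y_proper)
explainer.calibrate(x_cal, y_cal)

pred, (low, high) = explainer.predict(x_test, uq_interval=True)
# Documented invariant: every interval brackets its point prediction
assert np.all(low <= pred) and np.all(pred <= high)
```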
Your first calibrated explanation (≈5 minutes)
1. Install the essentials
python -m pip install calibrated-explanations
Optional extras:
| Extra | Purpose | Key Packages |
|---|---|---|
| [viz] | Plotting and visualizations | matplotlib |
| [notebooks] | Jupyter notebook support | ipython, jupyter, nbconvert |
| [eval] | Reproducing benchmarks | lime, shap, xgboost, scipy |
| [external-plugins] | High-performance plugins | numpy>=1.24, pandas>=2.0, scikit-learn>=1.3 |

Install with:
pip install "calibrated-explanations[viz,notebooks]"
2. Run the quickstart – this mirrors the smoke-tested docs example.
```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

from calibrated_explanations import WrapCalibratedExplainer

dataset = load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.2, stratify=dataset.target, random_state=0,
)
x_proper, x_cal, y_proper, y_cal = train_test_split(
    x_train, y_train, test_size=0.25, stratify=y_train, random_state=0,
)

explainer = WrapCalibratedExplainer(RandomForestClassifier(random_state=0))
explainer.fit(x_proper, y_proper)
explainer.calibrate(x_cal, y_cal, feature_names=dataset.feature_names)

factual = explainer.explain_factual(x_test[:1])
alternatives = explainer.explore_alternatives(x_test[:1])
probabilities, probability_interval = explainer.predict_proba(x_test[:1], uq_interval=True)
low, high = probability_interval
print(f"Calibrated probability: {probabilities[0, 1]:.3f}")
print(factual[0])
```
3. Check the output – the first factual explanation prints a calibrated rule table. A real run looks like:
```text
Prediction [ Low , High]
     0.077 [0.000, 0.083]
Value : Feature                      Weight [ Low  ,  High ]
 0.07 : mean concave points > 0.05   -0.418 [-0.576, -0.256]
 0.15 : worst concave points > 0.12  -0.308 [-0.548,  0.077]
 0.34 : worst concavity > 0.22       -0.090 [-0.123,  0.077]
```
- The header row shows the calibrated prediction and its low/high uncertainty interval.
- Each subsequent line is a factual rule: the observed value, the matching feature, and its signed contribution with uncertainty bounds.
4. Interpret what you see – follow the Interpret Calibrated Explanations guide to learn how calibrated intervals, rule weights, and the triangular plot work together. The triangular alternatives tutorial then shows how to narrate trade-offs across alternative rules.
Mental model: fit → calibrate → explain → interpret
- Fit your preferred estimator.
- Calibrate with held-out data to align predicted and observed outcomes.
- Explain with explain_factual for calibrated rules and explore_alternatives for semi-, super-, and counterfactuals.
- Interpret using the how-to guides so decisions account for both aleatoric and epistemic uncertainty.
This workflow is identical across binary and multiclass classification as well as probabilistic and interval regression tasks; the difference lies in how you configure the underlying estimator and read the returned intervals.
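For example, the same four steps on a multiclass problem – a minimal sketch, assuming the bundled wine toy dataset; only the estimator and data change:

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from calibrated_explanations import WrapCalibratedExplainer

dataset = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, stratify=dataset.target, random_state=0
)
x_proper, x_cal, y_proper, y_cal = train_test_split(
    x_train, y_train, test_size=0.25, stratify=y_train, random_state=0
)

explainer = WrapCalibratedExplainer(RandomForestClassifier(random_state=0))
explainer.fit(x_proper, y_proper)                                        # fit
explainer.calibrate(x_cal, y_cal, feature_names=dataset.feature_names)   # calibrate
factual = explainer.explain_factual(x_test[:1])                          # explain
print(factual[0])                                                        # interpret the rule table
```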
Choose your path
New practitioners (first run)
- Stay on this README quickstart, then open the classification quickstart for a notebook-friendly walk-through with the breast cancer dataset.
- Compare factual vs. alternative explanations using the triangular plot tutorial.
Practitioners (day-to-day usage)
- Follow the practitioner hub for production checklists, integration how-tos, and interpretation playbooks.
- Explore the probabilistic regression quickstart when you need calibrated thresholds.
- Opt into plugins only when needed via pip install "calibrated-explanations[external-plugins]"; they remain optional extensions. Note: fast explanations are experimental and provided via opt-in plugins; they are allowed in the schema for interoperability but are not promoted for primary practitioner workflows. Prefer factual/alternative workflows for production use.
Agents (CE-first by default)
- Read AGENTS.md for the minimal entrypoint.
- Follow the CE-first guide in docs/get-started/ce_first_agent_guide.md.
- Use the helper module in src/calibrated_explanations/ce_agent_utils.py.
Researchers
- Reproduce published studies through the researcher hub, which links directly to benchmark manifests, dataset splits, and evaluation notebooks.
- Fetch replication artefacts from the evaluation README and align with the release plan checkpoints.
- Cite the work using the ready-made entries in docs/citing.md.
Contributors
- Start with the contributor hub for development environment setup, plugin guardrails, and quality gates before submitting pull requests.
Maintainers
- Track release readiness through the root-level ROADMAP.md, docs/foundations/governance/release_checklist.md, and the implementation plan in docs/improvement/RELEASE_PLAN_v1.md.
- Confirm Standards and ADR alignment via docs/improvement/standards/ and docs/improvement/adrs/, and keep docs navigation synced with the IA crosswalk.
Documentation map
- API reference – start with the API index, then browse CLI, plugin, serialization, and visualization references.
- Architecture overview – the architecture notes connect runtime components, telemetry, and plugin boundaries.
- Contributor guidance – see the contributor hub for setup, quality gates, and process notes.
- Release notes & changelog – check release notes and the project CHANGELOG.
- Plugin CLI – inspect registered plugins and trust state with ce.plugins list all (see the CLI reference).
- Project governance – review GOVERNANCE.md, SECURITY.md, and the Code of Conduct.
- Support – see SUPPORT.md for the fastest way to get help.
Licensing & Contributions
Contributions to this project are licensed under the same terms as the project itself (BSD 3-Clause). By contributing, you agree to the Developer Certificate of Origin (DCO) and that your contributions will be available under the project's license. See .github/CONTRIBUTING.md for details on how to sign off your commits.
Feature highlights
- Calibrated prediction confidence for binary and multiclass classification.
- Uncertainty-aware feature importance with aleatoric and epistemic bounds.
- Probabilistic and interval regression that mirrors the classification API.
- Alternative explanations with triangular plots for visualising trade-offs.
- Conjunctional and conditional rules for interaction and fairness analysis (sketched after this list).
- Experimental plugin lane for fast explanations (opt-in only, not promoted for production – see practitioner notes above).
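A hedged sketch of conjunctional rules, as referenced in the highlight above: it assumes the add_conjunctions() helper on returned explanation collections that the project documentation describes; treat the call and its defaults as assumptions rather than a verified API.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from calibrated_explanations import WrapCalibratedExplainer

dataset = load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, stratify=dataset.target, random_state=0
)
x_proper, x_cal, y_proper, y_cal = train_test_split(
    x_train, y_train, test_size=0.25, stratify=y_train, random_state=0
)

explainer = WrapCalibratedExplainer(RandomForestClassifier(random_state=0))
explainer.fit(x_proper, y_proper)
explainer.calibrate(x_cal, y_cal, feature_names=dataset.feature_names)

factual = explainer.explain_factual(x_test[:1])
factual.add_conjunctions()  # assumed helper: augments rules with feature conjunctions
print(factual[0])           # rule table now includes conjunctive conditions
```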
Installation options
```bash
python -m pip install calibrated-explanations          # PyPI
conda install -c conda-forge calibrated-explanations   # conda-forge, currently only v0.9.0
python -m pip install "calibrated-explanations[dev]"   # local development tooling
python -m pip install "calibrated-explanations[viz]"   # plotting extras
```
Python ≥3.8 is supported. Optional extras remain additive so the core package stays lightweight.
Research and reproducibility
- Set up the evaluation environment:

```bash
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e .[dev,eval]
```

The optional [eval] extras pull in xgboost, venn-abers, and plotting dependencies used across the published studies.
- Load the benchmark assets – datasets live in the data/ directory (CSV files and zipped archives) and are referenced directly by the evaluation scripts.
- Re-run the flagship experiments – each paper has a matching notebook or script under evaluation/: Classification_Experiment_sota.py and the accompanying notebooks cover the 25-dataset binary classification suite; multiclass/ and regression/ host the multiclass and interval regression pipelines, respectively; ensure/ and fastCE/ contain the ensured-explanations and accelerated plugin studies. Result archives (*.pkl, *.zip) sit beside each run for quick comparison.
- Keep results traceable – preserve the random seeds baked into the scripts (typically 42 or 0) and record any deviations alongside the active ADRs noted in docs/improvement/adrs/.
- Cite the sources – the theory & literature overview lists DOIs, arXiv IDs, and funding acknowledgements to include in your work.
Contributing and maintenance workflow
- Create a virtual environment:

```bash
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e .[dev] -c constraints.txt
python -m pip install -r docs/requirements-doc.txt -c constraints.txt
```

- Run the quality gates locally:

```bash
pytest
ruff check .
mypy src tests
```

- Build the documentation (optional but encouraged):

```bash
make -C docs html
```
- Open a pull request referencing the active milestone and relevant ADRs. The PR guide lists the checklist used during reviews.
- Review community health docs – contributions are expected to follow the Code of Conduct, the contribution licensing guidance in CONTRIBUTING, and the support/security policies in SUPPORT.md and SECURITY.md.
License and citation
- Licensed under the BSD 3-Clause License.
- Cite Calibrated Explanations using the entries in CITATION.cff or docs/citing.md.
Acknowledgements & support
Funded by the Swedish Knowledge Foundation through the Knowledge Intensive Product Realization SPARK environment at Jönköping University. For questions or support, open an issue on GitHub or review the guidance in SUPPORT.md.