
Extract calibrated explanations from machine learning models.


Calibrated Explanations (Documentation)


Quick Reference

Purpose: Uncertainty-aware feature-importance explanations for scikit-learn compatible models.

Install:

pip install calibrated-explanations

Primary Use Cases: binary classification, multiclass classification, regression, probabilistic regression

Key Class (public API): WrapCalibratedExplainer

Required calibration: yes (a held-out calibration set is mandatory).

All examples in this repo use WrapCalibratedExplainer.

Typical Workflow (3 lines):

from calibrated_explanations import WrapCalibratedExplainer
explainer = WrapCalibratedExplainer(model)           # wrap your sklearn-like model
explainer.fit(x_proper, y_proper); explainer.calibrate(x_cal, y_cal)
explanation = explainer.explain_factual(x_test)      # returns calibrated rules + uncertainty

Core Methods:

  • fit(x_proper, y_proper) — train/prepare internal state (model fitting or wrapper).
  • calibrate(x_cal, y_cal, feature_names=None) — required: align uncertainty estimates.
  • explain_factual(X) — factual rules + feature importance with [low, high] bounds.
  • explore_alternatives(X) — counterfactual / alternative rules.
  • predict_proba(X[, uq_interval=True]) — calibrated probability (with uncertainty interval).
  • predict(X[, uq_interval=True]) — point prediction (with uncertainty interval).

Outputs: calibrated prediction intervals, per-feature importance with uncertainty bounds, factual/alternative rule tables.

Task map (critical: regression meanings differ)

Classification (binary/multiclass): Classification in this library is calibrated using Venn-Abers predictors.

  • Calibrated probability: predict_proba(x[, ...])
  • Calibrated probability with uncertainty bounds using Venn-Abers: predict_proba(x, uq_interval=True[, ...])
  • Calibrated prediction: predict(x[, ...])
  • Explanations: explain_factual(x[, ...]) and explore_alternatives(x[, ...])

Conformal interval regression (what CE calls "regression"): Regression in this library is conformal interval regression via Conformal Predictive Systems (CPS):

  • CPS calibrated point regression: predict(x[, ...])
  • Point regression + calibrated uncertainty intervals = (conformal) interval regression: predict(x, uq_interval=True, low_high_percentiles=(a, b)[, ...]). Note that one-sided intervals can be obtained by setting a=-np.inf or b=np.inf.
  • You can also request CPS-controlled intervals from explanations: explain_factual(x, low_high_percentiles=(a, b)[, ...]) and explore_alternatives(x, low_high_percentiles=(a, b)[, ...])
  • Default: low_high_percentiles = (5, 95) for 90% intervals.
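The percentile pair maps directly onto the nominal interval coverage. A minimal sketch of that mapping, including the one-sided case (the `interval_coverage` helper is hypothetical, for illustration only, and not part of the library):

```python
import numpy as np

def interval_coverage(low_high_percentiles):
    """Nominal coverage implied by a (low, high) percentile pair.

    Hypothetical helper: mirrors how low_high_percentiles is read,
    with one-sided intervals signalled by -np.inf / np.inf.
    """
    a, b = low_high_percentiles
    lo = 0.0 if np.isneginf(a) else a / 100.0
    hi = 1.0 if np.isposinf(b) else b / 100.0
    return hi - lo

interval_coverage((5, 95))        # default pair: a nominal 90% two-sided interval
interval_coverage((-np.inf, 95))  # one-sided: 95% upper-bounded interval
```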

Probabilistic regression (thresholded probability queries for y): Probabilistic regression requires assigning a threshold:

  • Threshold probability for real-valued target: predict_proba(x, threshold=t[, ...]) gives P(y <= t)
  • Within-spec probability for real-valued target: predict_proba(x, threshold=(low, high)[, ...]) gives P(low < y <= high)
  • Add uncertainty bounds with uq_interval=True
  • Exceedance explanations: explain_factual(x, threshold=t[, ...]) and explore_alternatives(x, threshold=t[, ...])
  • Within-spec explanations: explain_factual(x, threshold=(low, high)[, ...]) and explore_alternatives(x, threshold=(low, high)[, ...])
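The semantics of `threshold=` can be sketched with an empirical predictive distribution: a CPS assigns y a distribution, and the thresholded probability is the mass at or below t, or inside (low, high] for within-spec queries. This toy `cdf_probability` function is illustrative only, not the library's CPS implementation:

```python
import numpy as np

def cdf_probability(cpd_samples, threshold):
    """Toy sketch of threshold= semantics over an empirical distribution."""
    s = np.asarray(cpd_samples, dtype=float)
    if isinstance(threshold, tuple):
        low, high = threshold
        return float(np.mean((s > low) & (s <= high)))  # P(low < y <= high)
    return float(np.mean(s <= threshold))               # P(y <= t)

samples = [1.0, 2.0, 3.0, 4.0]
cdf_probability(samples, 2.5)         # P(y <= 2.5)
cdf_probability(samples, (1.0, 3.0))  # P(1 < y <= 3)
```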

All tasks also support (core capability):

  • predict(x[, ...]) and predict(x, uq_interval=True[, ...])
  • explain_factual(x[, ...]) and explore_alternatives(x[, ...])

Common optional parameters ([, ...]):

  • bins=... for conditional calibration. Can also set a Mondrian Calibrator (see crepes.extras.MondrianCategorizer)
  • low_high_percentiles=(a, b) for CPS conformal interval regression intervals
  • threshold=t or threshold=(low, high) for probabilistic regression
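One way to build a `bins=` argument is to group instances by quantile bin of a single feature, so calibration is performed per group. The helper below is a hand-rolled sketch of that idea; `crepes.extras.MondrianCategorizer` provides a more general, reusable version:

```python
import numpy as np

def quantile_bins(feature_values, n_bins=5):
    """Sketch: assign each instance a quantile-bin label for one feature.

    Illustrative only; use crepes.extras.MondrianCategorizer for a
    proper Mondrian calibrator.
    """
    x = np.asarray(feature_values, dtype=float)
    # interior quantile edges, e.g. n_bins=5 -> edges at 20/40/60/80%
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(x, edges)

# Bins computed on the calibration set would then be passed along, e.g.:
# explainer.calibrate(x_cal, y_cal, bins=quantile_bins(x_cal[:, 0]))
```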

Local dev: run pip install -e . before running examples/tests locally.

When not to use: raw deep nets without an sklearn wrapper; real-time streaming without a calibration set; extremely high-dimensional (>10k) feature vectors.

Calibrated Explanations turns any scikit-learn-compatible estimator into a calibrated explainer that returns:

  • Factual rules – the calibrated reasons behind your model's prediction.
  • Alternative rules – what needs to change to flip or reinforce that decision, complete with uncertainty bounds.
  • Prediction intervals – uncertainty-aware probabilities or regression ranges that quantify both aleatoric and epistemic risk.

Every quickstart, notebook, and benchmark follows the same recipe: fit your estimator, calibrate on held-out data, then interpret the returned rule table before acting.

Guarantees & Assumptions

  • Calibration set required: A held-out calibration set (typically 20-25% of training data) is mandatory for all workflows.
  • Interval invariant: All intervals satisfy low <= predict <= high; violations trigger errors.
  • Uncertainty decomposition: Intervals capture both aleatoric (data) and epistemic (model) uncertainty.
  • Calibration validity: Guarantees hold when calibration and test distributions match (exchangeability assumption).

See ADR-021 for formal semantics.
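The interval invariant is easy to verify on the user side as well. A minimal sanity-check sketch (the library itself raises on violations, so this helper is purely illustrative):

```python
import numpy as np

def interval_invariant_holds(pred, low, high):
    """Check low <= predict <= high elementwise for array-like inputs."""
    pred, low, high = (np.asarray(a, dtype=float) for a in (pred, low, high))
    return bool(np.all(low <= pred) and np.all(pred <= high))

interval_invariant_holds([0.077], [0.0], [0.083])  # well-formed interval
interval_invariant_holds([1.2], [0.0], [1.0])      # violated invariant
```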


Your first calibrated explanation (≈5 minutes)

  1. Install the essentials

    python -m pip install calibrated-explanations
    

    Optional extras:

    Extra               Purpose                      Key Packages
    [viz]               Plotting and visualizations  matplotlib
    [notebooks]         Jupyter notebook support     ipython, jupyter, nbconvert
    [eval]              Reproducing benchmarks       lime, shap, xgboost, scipy
    [external-plugins]  High-performance plugins     numpy>=1.24, pandas>=2.0, scikit-learn>=1.3

    Install with: pip install "calibrated-explanations[viz,notebooks]"

  2. Run the quickstart – this mirrors the smoke-tested docs example.

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from calibrated_explanations import WrapCalibratedExplainer
    
    dataset = load_breast_cancer()
    x_train, x_test, y_train, y_test = train_test_split(
        dataset.data,
        dataset.target,
        test_size=0.2,
        stratify=dataset.target,
        random_state=0,
    )
    x_proper, x_cal, y_proper, y_cal = train_test_split(
        x_train,
        y_train,
        test_size=0.25,
        stratify=y_train,
        random_state=0,
    )
    
    explainer = WrapCalibratedExplainer(RandomForestClassifier(random_state=0))
    explainer.fit(x_proper, y_proper)
    explainer.calibrate(x_cal, y_cal, feature_names=dataset.feature_names)
    
    factual = explainer.explain_factual(x_test[:1])
    alternatives = explainer.explore_alternatives(x_test[:1])
    probabilities, probability_interval = explainer.predict_proba(x_test[:1], uq_interval=True)
    low, high = probability_interval
    print(f"Calibrated probability: {probabilities[0, 1]:.3f}")
    print(factual[0])
    
  3. Check the output – the first factual explanation prints a calibrated rule table. A real run looks like:

    Prediction [ Low ,  High]
    0.077 [0.000, 0.083]
    Value : Feature                                  Weight [ Low  ,  High ]
    0.07  : mean concave points > 0.05               -0.418 [-0.576, -0.256]
    0.15  : worst concave points > 0.12              -0.308 [-0.548,  0.077]
    0.34  : worst concavity > 0.22                   -0.090 [-0.123,  0.077]
    
    • The header row shows the calibrated prediction and its low/high uncertainty interval.
    • Each subsequent line is a factual rule: the observed value, the matching feature, and its signed contribution with uncertainty bounds.
  4. Interpret what you see – follow the Interpret Calibrated Explanations guide to learn how calibrated intervals, rule weights, and the triangular plot work together. The triangular alternatives tutorial then shows how to narrate trade-offs across alternative rules.


Mental model: fit → calibrate → explain → interpret

  1. Fit your preferred estimator.
  2. Calibrate with held-out data to align predicted and observed outcomes.
  3. Explain with explain_factual for calibrated rules and explore_alternatives for semi-, super-, and counterfactuals.
  4. Interpret using the how-to guides so decisions account for both aleatoric and epistemic uncertainty.

This workflow is identical across binary classification, multiclass classification, probabilistic regression, and interval regression; the difference lies in how you configure the underlying estimator and read the returned intervals.


Choose your path

New practitioners (first run)

Practitioners (day-to-day usage)

  • Follow the practitioner hub for production checklists, integration how-tos, and interpretation playbooks.
  • Explore the probabilistic regression quickstart when you need calibrated thresholds.
  • Opt into plugins only when needed via pip install "calibrated-explanations[external-plugins]"; they remain optional extensions. Fast explanations are experimental, opt-in plugins: they are allowed in the schema for interoperability, but prefer the factual/alternative workflows for production use.

Agents (CE-first by default)

  • Read AGENTS.md for the minimal entrypoint.
  • Follow the CE-first guide in docs/get-started/ce_first_agent_guide.md.
  • Use the helper module in src/calibrated_explanations/ce_agent_utils.py.

Researchers

  • Reproduce published studies through the researcher hub, which links directly to benchmark manifests, dataset splits, and evaluation notebooks.
  • Fetch replication artefacts from the evaluation README and align with the release plan checkpoints.
  • Cite the work using the ready-made entries in docs/citing.md.

Contributors

  • Start with the contributor hub for development environment setup, plugin guardrails, and quality gates.
  • Review the contributor hub before submitting pull requests.

Maintainers


Documentation map

  • API reference – start with the API index, then browse CLI, plugin, serialization, and visualization references.
  • Architecture overview – the architecture notes connect runtime components, telemetry, and plugin boundaries.
  • Contributor guidance – see the contributor hub for setup, quality gates, and process notes.
  • Release notes & changelog – check release notes and the project CHANGELOG.
  • Plugin CLI – inspect registered plugins and trust state with ce.plugins list all (see the CLI reference).
  • Project governance – review GOVERNANCE.md, SECURITY.md, and the Code of Conduct.
  • Support – see SUPPORT.md for the fastest way to get help.

Licensing & Contributions

Contributions to this project are licensed under the same terms as the project itself (BSD 3-Clause). By contributing, you agree to the Developer Certificate of Origin (DCO) and that your contributions will be available under the project's license. See .github/CONTRIBUTING.md for details on how to sign off your commits.


Feature highlights

  • Calibrated prediction confidence for binary and multiclass classification.
  • Uncertainty-aware feature importance with aleatoric and epistemic bounds.
  • Probabilistic and interval regression that mirrors the classification API.
  • Alternative explanations with triangular plots for visualising trade-offs.
  • Conjunctional and conditional rules for interaction and fairness analysis.
  • Experimental plugin lane for fast explanations (opt-in only, not promoted for production—see practitioner notes above).

Installation options

python -m pip install calibrated-explanations           # PyPI
conda install -c conda-forge calibrated-explanations    # conda-forge, currently only v0.9.0
python -m pip install "calibrated-explanations[dev]"    # local development tooling
python -m pip install "calibrated-explanations[viz]"    # plotting extras

Python ≥3.8 is supported. Optional extras remain additive so the core package stays lightweight.


Research and reproducibility

  1. Set up the evaluation environment
    python -m venv .venv
    source .venv/bin/activate
    python -m pip install --upgrade pip
    python -m pip install -e .[dev,eval]
    
    The optional [eval] extras pull in xgboost, venn-abers, and plotting dependencies used across the published studies.
  2. Load the benchmark assets – datasets live in the data/ directory (CSV files and zipped archives) and are referenced directly by the evaluation scripts.
  3. Re-run the flagship experiments – each paper has a matching notebook or script under evaluation/:
    • Classification_Experiment_sota.py and the accompanying notebooks cover the 25-dataset binary classification suite.
    • multiclass/ and regression/ host the multiclass and interval regression pipelines, respectively.
    • ensure/ and fastCE/ contain the ensured-explanations and accelerated plugin studies. Result archives (*.pkl, .zip) sit beside each run for quick comparison.
  4. Keep results traceable – preserve the random seeds baked into the scripts (typically 42 or 0) and record any deviations alongside the active ADRs noted in docs/improvement/adrs/.
  5. Cite the sources – the theory & literature overview lists DOIs, arXiv IDs, and funding acknowledgements to include in your work.

Contributing and maintenance workflow

  1. Create a virtual environment
    python -m venv .venv
    source .venv/bin/activate
    python -m pip install --upgrade pip
    python -m pip install -e .[dev] -c constraints.txt
    python -m pip install -r docs/requirements-doc.txt -c constraints.txt
    
  2. Run the quality gates locally
    pytest
    ruff check .
    mypy src tests
    
  3. Build the documentation (optional but encouraged)
    make -C docs html
    
  4. Open a pull request referencing the active milestone and relevant ADRs. The PR guide lists the checklist used during reviews.
  5. Review community health docs – contributions are expected to follow the Code of Conduct, the contribution licensing guidance in CONTRIBUTING, and the support/security policies in SUPPORT.md and SECURITY.md.

License and citation


Acknowledgements & support

Funded by the Swedish Knowledge Foundation through the Knowledge Intensive Product Realization SPARK environment at Jönköping University. For questions or support, open an issue on GitHub or review the guidance in SUPPORT.md.
