A scikit-learn-compatible module for estimating prediction intervals.

https://github.com/simai-ml/MAPIE/raw/master/doc/images/mapie_logo_nobg_cut.png

MAPIE - Model Agnostic Prediction Interval Estimator

MAPIE allows you to easily estimate prediction intervals on single-output data using your favourite scikit-learn-compatible regressor.

Prediction intervals output by MAPIE encompass both aleatoric and epistemic uncertainty and are backed by strong theoretical guarantees [1].

🔗 Requirements

Python 3.7+

MAPIE stands on the shoulders of giants.

Its only internal dependency is scikit-learn.

🛠 Installation

Install via pip:

pip install mapie

To install directly from the GitHub repository:

pip install git+https://github.com/simai-ml/MAPIE

⚡️ Quickstart

Let us start with a basic regression problem. Here, we generate one-dimensional noisy data that we fit with a linear model.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

regressor = LinearRegression()
X, y = make_regression(n_samples=500, n_features=1, noise=20, random_state=59)

Since MAPIE is compliant with the standard scikit-learn API, we follow the usual sequential fit and predict process, as with any scikit-learn regressor. We set two values for alpha, 0.05 and 0.32, to estimate prediction intervals at approximately two and one standard deviations from the mean, respectively.

from mapie.estimators import MapieRegressor
alpha = [0.05, 0.32]
mapie = MapieRegressor(regressor)
mapie.fit(X, y)
y_pred, y_pis = mapie.predict(X, alpha=alpha)
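As a quick sanity check on the alpha values 0.05 and 0.32 used above, the following sketch computes, for a Gaussian distribution, the alpha that corresponds to an interval of k standard deviations around the mean. It relies only on the Python standard library; the helper name gaussian_alpha is an illustrative assumption and is not part of MAPIE.

```python
import math


def gaussian_alpha(k: float) -> float:
    """Return alpha = 1 - P(|Z| <= k) for a standard normal variable Z."""
    # For a standard normal, P(|Z| <= k) = erf(k / sqrt(2)).
    coverage = math.erf(k / math.sqrt(2.0))
    return 1.0 - coverage


for k in (1, 2):
    print(f"{k} standard deviation(s) -> alpha = {gaussian_alpha(k):.3f}")
```

One and two standard deviations give alpha values of roughly 0.32 and 0.05, which is why these two values are used in the quickstart. MAPIE itself does not assume Gaussian noise; the correspondence is only a convenient way to choose alpha.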

MAPIE returns the point predictions y_pred together with an np.ndarray y_pis holding the lower and upper bounds of the prediction intervals, indexed as y_pis[:, 0, i] and y_pis[:, 1, i] for the i-th alpha value. The estimated prediction intervals can then be plotted as follows.

from matplotlib import pyplot as plt
from mapie.metrics import coverage_score

# Scatter the data and draw the point predictions.
plt.xlabel("x")
plt.ylabel("y")
plt.scatter(X, y, alpha=0.3)
plt.plot(X, y_pred, color="C1")
# Sort by x so that the interval bounds are drawn as continuous lines.
order = np.argsort(X[:, 0])
# Dashed lines: lower and upper bounds for alpha[1] = 0.32.
plt.plot(X[order], y_pis[order][:, 0, 1], color="C1", ls="--")
plt.plot(X[order], y_pis[order][:, 1, 1], color="C1", ls="--")
# Shaded band: prediction interval for alpha[0] = 0.05.
plt.fill_between(
    X[order].ravel(),
    y_pis[order][:, 0, 0].ravel(),
    y_pis[order][:, 1, 0].ravel(),
    alpha=0.2
)
# Effective coverage for each alpha: fraction of y inside its interval.
coverage_scores = [
    coverage_score(y, y_pis[:, 0, i], y_pis[:, 1, i])
    for i, _ in enumerate(alpha)
]
plt.title(
    f"Target and effective coverages for "
    f"alpha={alpha[0]:.2f}: ({1-alpha[0]:.3f}, {coverage_scores[0]:.3f})\n"
    f"Target and effective coverages for "
    f"alpha={alpha[1]:.2f}: ({1-alpha[1]:.3f}, {coverage_scores[1]:.3f})"
)
plt.show()

The title of the plot compares the target coverages with the effective coverages. The target coverage, or confidence level, is the fraction of true labels that we aim to capture within the prediction intervals for a given dataset. It equals 1 - alpha, where alpha is the parameter passed to the predict method of MapieRegressor; here alpha is 0.05 and 0.32, giving target coverages of 0.95 and 0.68. The effective coverage is the actual fraction of true labels lying in the prediction intervals.
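To make the definition of effective coverage concrete, here is a plain NumPy sketch of it. The function name effective_coverage and the synthetic data are assumptions made for illustration; this is not presented as the implementation of mapie.metrics.coverage_score.

```python
import numpy as np


def effective_coverage(y_true, y_low, y_up):
    """Fraction of true values lying inside the interval [y_low, y_up]."""
    y_true, y_low, y_up = map(np.asarray, (y_true, y_low, y_up))
    inside = (y_low <= y_true) & (y_true <= y_up)
    return float(np.mean(inside))


# Synthetic standard normal targets with hypothetical constant intervals
# of half-width 2 and 1, i.e. about two and one standard deviations.
rng = np.random.default_rng(0)
y_true = rng.normal(size=1000)
for half_width, alpha in ((2.0, 0.05), (1.0, 0.32)):
    cov = effective_coverage(y_true, -half_width, half_width)
    print(f"target coverage {1 - alpha:.2f}, effective coverage {cov:.3f}")
```

On a finite sample the effective coverage fluctuates around the target, so a small gap between the two numbers in the plot title is expected.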

https://github.com/simai-ml/MAPIE/raw/master/doc/images/quickstart_1.png

📘 Documentation

How does MAPIE work? It is based on cross-validation and relies on:

  • Residuals on the whole training set obtained by cross-validation,

  • Perturbed models generated during the cross-validation.

MAPIE then combines all these elements in a way that provides prediction intervals on new data with strong theoretical guarantees [1].

https://github.com/simai-ml/MAPIE/raw/master/doc/images/mapie_internals.png
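The two ingredients listed above, out-of-fold residuals and the perturbed models fitted during cross-validation, can be combined as in the CV+ construction described in [1]. The sketch below is a minimal illustration of that idea written with scikit-learn and NumPy only. The function name cv_plus_intervals, its parameters, and the train/test split are assumptions for the example; it is not MAPIE's actual code and omits the options MAPIE provides.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold


def cv_plus_intervals(estimator, X, y, X_new, alpha=0.1, n_splits=5):
    """CV+ style prediction intervals (illustrative sketch only)."""
    n = len(y)
    residuals = np.empty(n)  # out-of-fold absolute residuals
    # preds_new[j, i]: prediction for X_new[j] by the model that did
    # not see training point i.
    preds_new = np.empty((len(X_new), n))
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, val_idx in folds.split(X):
        # Perturbed model: fitted without the current fold.
        model = clone(estimator).fit(X[train_idx], y[train_idx])
        residuals[val_idx] = np.abs(y[val_idx] - model.predict(X[val_idx]))
        preds_new[:, val_idx] = model.predict(X_new)[:, np.newaxis]
    # Ranks (1-indexed) of the order statistics used for each bound.
    k_low = int(np.floor(alpha * (n + 1)))
    k_up = int(np.ceil((1 - alpha) * (n + 1)))
    lower_sorted = np.sort(preds_new - residuals, axis=1)
    upper_sorted = np.sort(preds_new + residuals, axis=1)
    # Fall back to infinite bounds when the requested rank is out of range.
    lower = (lower_sorted[:, k_low - 1] if k_low >= 1
             else np.full(len(X_new), -np.inf))
    upper = (upper_sorted[:, k_up - 1] if k_up <= n
             else np.full(len(X_new), np.inf))
    return lower, upper


X, y = make_regression(n_samples=400, n_features=1, noise=20, random_state=59)
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]
lower, upper = cv_plus_intervals(
    LinearRegression(), X_train, y_train, X_test, alpha=0.1
)
coverage = np.mean((lower <= y_test) & (y_test <= upper))
print(f"effective coverage on held-out data: {coverage:.3f}")
```

Each training point contributes one candidate lower bound and one candidate upper bound, built from the model that did not see it, and the interval is read off as order statistics of these candidates. See [1] for the exact statement of the coverage guarantees.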

The full documentation can be found on this link.

📝 Contributing

You are welcome to propose and contribute new ideas. We encourage you to open an issue so that we can align on the work to be done. It is generally a good idea to have a quick discussion before opening a pull request that is potentially out-of-scope. For more information on the contribution process, please go here.

🤝 Affiliations

MAPIE has been developed through a collaboration between Quantmetry, Michelin, and ENS Paris-Saclay, with financial support from Région Ile de France.

🔍 References

MAPIE methods are based on the work by Foygel-Barber et al. (2021).

[1] Rina Foygel Barber, Emmanuel J. Candès, Aaditya Ramdas, and Ryan J. Tibshirani. “Predictive inference with the jackknife+.” Ann. Statist., 49(1):486–507, February 2021.

📝 License

MAPIE is free and open-source software licensed under the 3-clause BSD license.
