Skip to main content

A scikit-learn meta-estimator for computing tight conformal predictions

Project description

Open in Dev Containers Open in GitHub Codespaces

👖 Conformal Tights

A scikit-learn meta-estimator that adds conformal prediction of coherent quantiles and intervals to any scikit-learn regressor. Features:

  1. 🍬 Meta-estimator: add prediction of quantiles and intervals to any scikit-learn regressor
  2. 🌡️ Conformally calibrated: accurate quantiles, and intervals with reliable coverage
  3. 🚦 Coherent quantiles: quantiles increase monotonically instead of crossing each other
  4. 👖 Tight quantiles: selects the lowest dispersion that provides the desired coverage
  5. 🎁 Data efficient: requires only a small number of calibration examples to fit
  6. 🐼 Pandas support: optionally predict on DataFrames and receive DataFrame output

Using

Installing

First, install this package with:

pip install conformal-tights

Predicting quantiles

Conformal Tights exposes a meta-estimator called ConformalCoherentQuantileRegressor that you can use to wrap any scikit-learn regressor, after which you can use predict_quantiles to predict conformally calibrated quantiles. Example usage:

from conformal_tights import ConformalCoherentQuantileRegressor
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Fetch dataset and split in train and test
X, y = fetch_openml("ames_housing", version=1, return_X_y=True, as_frame=True, parser="auto")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42)

# Create a regressor, wrap it, and fit on the train set
my_regressor = XGBRegressor(objective="reg:absoluteerror")
conformal_predictor = ConformalCoherentQuantileRegressor(estimator=my_regressor)
conformal_predictor.fit(X_train, y_train)

# Predict with the wrapped regressor
ŷ_test = conformal_predictor.predict(X_test)

# Predict quantiles with the conformal wrapper
ŷ_test_quantiles = conformal_predictor.predict_quantiles(X_test, quantiles=(0.025, 0.05, 0.1, 0.9, 0.95, 0.975))

When the input data is a pandas DataFrame, the output is also a pandas DataFrame. For example, printing the head of ŷ_test_quantiles yields:

house_id 0.025 0.05 0.1 0.9 0.95 0.975
1357 114784 120894 131618 175761 188052 205449
2367 67417 80074 86754 117854 127583 142322
2822 119423 132048 138725 178526 197246 214206
2126 94031 99850 110891 150249 164703 182528
1544 68996 81516 88232 121774 132425 147110

Let's visualize the predicted quantiles on the test set:

Expand to see the code that generated the graph above
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

%config InlineBackend.figure_format = "retina"
plt.rcParams["font.size"] = 8
idx = (-ŷ_test.sample(50, random_state=42)).sort_values().index
y_ticks = list(range(1, len(idx) + 1))
plt.figure(figsize=(4, 5))
for j in range(3):
    end = ŷ_test_quantiles.shape[1] - 1 - j
    coverage = round(100 * (ŷ_test_quantiles.columns[end] - ŷ_test_quantiles.columns[j]))
    plt.barh(
        y_ticks,
        ŷ_test_quantiles.loc[idx].iloc[:, end] - ŷ_test_quantiles.loc[idx].iloc[:, j],
        left=ŷ_test_quantiles.loc[idx].iloc[:, j],
        label=f"{coverage}% Prediction interval",
        color=["#b3d9ff", "#86bfff", "#4da6ff"][j],
    )
plt.plot(y_test.loc[idx], y_ticks, "s", markersize=3, markerfacecolor="none", markeredgecolor="#e74c3c", label="Actual value")
plt.plot(ŷ_test.loc[idx], y_ticks, "s", color="blue", markersize=0.6, label="Predicted value")
plt.xlabel("House price")
plt.ylabel("Test house index")
plt.yticks(y_ticks, y_ticks)
plt.tick_params(axis="y", labelsize=6)
plt.grid(axis="x", color="lightsteelblue", linestyle=":", linewidth=0.5)
plt.gca().xaxis.set_major_formatter(ticker.StrMethodFormatter("${x:,.0f}"))
plt.gca().spines["top"].set_visible(False)
plt.gca().spines["right"].set_visible(False)
plt.legend()
plt.tight_layout()
plt.show()

Predicting intervals

In addition to quantile prediction, you can use predict_interval to predict conformally calibrated prediction intervals. Compared to quantiles, these focus on reliable coverage over quantile accuracy. Example usage:

# Predict an interval for each example with the conformal wrapper
ŷ_test_interval = conformal_predictor.predict_interval(X_test, coverage=0.95)

# Measure the coverage of the prediction intervals on the test set
coverage = ((ŷ_test_interval.iloc[:, 0] <= y_test) & (y_test <= ŷ_test_interval.iloc[:, 1])).mean()
print(coverage)  # 96.6%

When the input data is a pandas DataFrame, the output is also a pandas DataFrame. For example, printing the head of ŷ_test_interval yields:

house_id 0.025 0.975
1357 107203 206290
2367 66665 146005
2822 115592 220315
2126 85288 183038
1544 67890 150646

Contributing

Prerequisites
1. Set up Git to use SSH
  1. Generate an SSH key and add the SSH key to your GitHub account.
  2. Configure SSH to automatically load your SSH keys:
    cat << EOF >> ~/.ssh/config
    Host *
      AddKeysToAgent yes
      IgnoreUnknown UseKeychain
      UseKeychain yes
    EOF
    
2. Install Docker
  1. Install Docker Desktop.
3. Install VS Code or PyCharm
  1. Install VS Code and VS Code's Dev Containers extension. Alternatively, install PyCharm.
  2. Optional: install a Nerd Font such as FiraCode Nerd Font and configure VS Code or configure PyCharm to use it.
Development environments

The following development environments are supported:

  1. ⭐️ GitHub Codespaces: click on Code and select Create codespace to start a Dev Container with GitHub Codespaces.
  2. ⭐️ Dev Container (with container volume): click on Open in Dev Containers to clone this repository in a container volume and create a Dev Container with VS Code.
  3. Dev Container: clone this repository, open it with VS Code, and run Ctrl/⌘ + + PDev Containers: Reopen in Container.
  4. PyCharm: clone this repository, open it with PyCharm, and configure Docker Compose as a remote interpreter with the dev service.
  5. Terminal: clone this repository, open it with your terminal, and run docker compose up --detach dev to start a Dev Container in the background, and then run docker compose exec dev zsh to open a shell prompt in the Dev Container.
Developing
  • This project follows the Conventional Commits standard to automate Semantic Versioning and Keep A Changelog with Commitizen.
  • Run poe from within the development environment to print a list of Poe the Poet tasks available to run on this project.
  • Run poetry add {package} from within the development environment to install a run time dependency and add it to pyproject.toml and poetry.lock. Add --group test or --group dev to install a CI or development dependency, respectively.
  • Run poetry update from within the development environment to upgrade all dependencies to the latest versions allowed by pyproject.toml.
  • Run cz bump to bump the package's version, update the CHANGELOG.md, and create a git tag.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

conformal_tights-0.2.2.tar.gz (17.5 kB view hashes)

Uploaded Source

Built Distribution

conformal_tights-0.2.2-py3-none-any.whl (14.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page