Optimal binning and conformal prediction for regression

Optimal Binning for Regression via LOO-CRPS

Research project extending the Venn-ABERS calibrated prediction framework to regression (continuous response, real-valued covariate).

Overview

Given training data $(x_i, y_i)$, we sort by $x$, partition the observations into $K$ contiguous bins, and use the within-bin empirical CDF as the predictive distribution. The three main contributions are:

  1. Optimal binning: bin boundaries are chosen to minimise the total leave-one-out CRPS via dynamic programming in $O(n^2 K)$ time.
  2. Cross-validated K selection: the number of bins $K$ is selected by test CRPS on a held-out split. Within-sample LOO-CRPS is gameable, because the bin boundaries are themselves chosen to minimise it, and must not be used for model selection.
  3. Conformal prediction: the bin ECDF is wrapped in a full conformal predictor (CRPS as nonconformity score) to obtain finite-sample coverage guarantees.
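The dynamic program in item 1 can be sketched as follows. This is a minimal illustration, not the package's implementation: the function names `pairwise_abs_sums` and `optimal_bins` and the `min_size` parameter are assumptions made for this example. It uses the LOO-CRPS bin cost $\frac{m}{(m-1)^2}\sum_{\ell<r}|y_\ell-y_r|$ and assumes `y` is already ordered by `x`.

```python
import numpy as np

def pairwise_abs_sums(y):
    """W[i, j] = sum of |y[l] - y[r]| over i <= l < r <= j, built in O(n^2)."""
    n = len(y)
    W = np.zeros((n, n))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            # Inclusion-exclusion on the two overlapping sub-segments.
            inner = W[i + 1, j - 1] if length > 2 else 0.0
            W[i, j] = W[i + 1, j] + W[i, j - 1] - inner + abs(y[i] - y[j])
    return W

def optimal_bins(y, K, min_size=2):
    """Split y (ordered by x) into K contiguous bins minimising total LOO-CRPS.

    Returns (total_cost, bounds), where bin k covers y[bounds[k]:bounds[k + 1]].
    min_size >= 2 is required because the cost is undefined for a single point.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    W = pairwise_abs_sums(y)

    def cost(i, j):  # bin covering y[i..j], inclusive
        m = j - i + 1
        return m / (m - 1) ** 2 * W[i, j]

    # best[k, j]: minimal cost of splitting the first j points into k bins.
    best = np.full((K + 1, n + 1), np.inf)
    arg = np.zeros((K + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, K + 1):
        for j in range(k * min_size, n + 1):
            for i in range((k - 1) * min_size, j - min_size + 1):
                c = best[k - 1, i] + cost(i, j - 1)
                if c < best[k, j]:
                    best[k, j] = c
                    arg[k, j] = i

    # Backtrack from the full sample to recover the bin boundaries.
    bounds = [n]
    j = n
    for k in range(K, 0, -1):
        j = int(arg[k, j])
        bounds.append(j)
    return float(best[K, n]), bounds[::-1]
```

The three nested loops give the $O(n^2 K)$ running time quoted above; the pairwise-sum table adds a one-off $O(n^2)$ cost in time and memory.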

The Venn prediction band — the family of augmented ECDFs as the hypothetical label varies — is also formalised as the direct regression analog of the Venn-ABERS interval.
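Both the conformal predictor and the Venn band rest on augmenting a bin with a hypothetical label. The sketch below illustrates the idea for a single bin that is treated as fixed, which is a simplification. The function names are assumptions for this example, and the nonconformity score used here (leave-one-out CRPS inside the augmented bin, with CRPS in its energy form) is one plausible reading of "CRPS as nonconformity score", not necessarily the package's exact definition.

```python
import numpy as np

def crps_ecdf(sample, y):
    """CRPS of the empirical distribution of `sample` at outcome y (energy form)."""
    sample = np.asarray(sample, dtype=float)
    spread = np.abs(sample[:, None] - sample[None, :]).mean()
    return np.abs(sample - y).mean() - 0.5 * spread

def venn_band(y_bin, t_grid):
    """Envelope of the augmented ECDFs over all hypothetical labels."""
    y_bin = np.sort(np.asarray(y_bin, dtype=float))
    m = len(y_bin)
    counts = np.searchsorted(y_bin, t_grid, side="right")  # number of y_i <= t
    lower = counts / (m + 1)        # hypothetical label lies above t
    upper = (counts + 1) / (m + 1)  # hypothetical label lies at or below t
    return lower, upper

def conformal_pvalue(y_bin, y_star):
    """Full-conformal p-value of the candidate label y_star within one bin."""
    aug = np.append(np.asarray(y_bin, dtype=float), y_star)
    # Score each point by the CRPS of the ECDF of all the other points.
    scores = np.array(
        [crps_ecdf(np.delete(aug, i), aug[i]) for i in range(len(aug))]
    )
    # Fraction of points at least as nonconforming as the candidate.
    return float(np.mean(scores >= scores[-1]))
```

Under this reading, a prediction set at level $\varepsilon$ is obtained by scanning candidate labels and keeping those whose p-value exceeds $\varepsilon$.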

Package usage

import numpy as np
from crpsconfreg import BinningPredictor

# x_train and y_train must be sorted by x_train
pred = BinningPredictor().fit(x_train, y_train, K_max=20)  # CV selects K*

# Conformal prediction interval with coverage >= 1 - epsilon
lo, hi = pred.predict_interval(x_new, epsilon=0.10)

# Conformal p-value for an observed (x*, y*) pair
pvals = pred.conformal_pvalue(x_new, y_new)

# Within-bin empirical CDF
cdf = pred.predict_ecdf(x_new, t_grid)

print(f"K* = {pred.K_}, bin sizes = {pred.bin_sizes()}")

Pass K=k to fit() to fix the number of bins and bypass cross-validation.
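For intuition about what the cross-validated selection involves when K is not fixed, the following sketch scores each candidate K by mean CRPS on a held-out split and keeps the minimiser. It is an illustration under stated assumptions, not the package's code: `heldout_crps` and `select_K` are hypothetical names, and equal-frequency bins stand in for the DP-optimal partition to keep the example short.

```python
import numpy as np

def heldout_crps(x_tr, y_tr, x_te, y_te, K):
    """Mean CRPS on held-out points, using within-bin ECDFs from the training set."""
    order = np.argsort(x_tr)
    x_tr, y_tr = x_tr[order], y_tr[order]
    # Stand-in partition: K equal-frequency bins (the package uses the DP instead).
    groups = np.array_split(np.arange(len(x_tr)), K)
    edges = np.array([x_tr[g[0]] for g in groups[1:]])  # left edges of bins 2..K
    bin_of = np.searchsorted(edges, x_te, side="right")
    total = 0.0
    for b, g in enumerate(groups):
        sample = y_tr[g]
        targets = y_te[bin_of == b]
        # Energy form of the CRPS for an empirical distribution.
        spread = np.abs(sample[:, None] - sample[None, :]).mean()
        dist = np.abs(sample[None, :] - targets[:, None]).mean(axis=1)
        total += np.sum(dist - 0.5 * spread)
    return total / len(y_te)

def select_K(x_tr, y_tr, x_te, y_te, K_max):
    """Return the K in 1..K_max with the lowest held-out CRPS, plus all scores."""
    scores = {K: heldout_crps(x_tr, y_tr, x_te, y_te, K) for K in range(1, K_max + 1)}
    return min(scores, key=scores.get), scores
```

Scoring on data that played no part in choosing the boundaries is what keeps the comparison across K honest.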

Installation

uv sync
# or
pip install -e .

Repository structure

| Path | Description |
| --- | --- |
| src/crpsconfreg/ | Python package |
| src/crpsconfreg/binning.py | LOO-CRPS cost, DP, bin boundaries |
| src/crpsconfreg/selection.py | CRPS scoring, CV K selection |
| src/crpsconfreg/conformal.py | Conformal p-values (vectorised), prediction intervals |
| src/crpsconfreg/predictor.py | BinningPredictor high-level class |
| tests/ | 61 pytest tests |
| dp_formulation.tex / .pdf | Full write-up: derivations, CV K selection, Venn band, conformal prediction, numerical illustration |
| conformal_binning.ipynb | Clean notebook: CV K selection, Venn band, p-value curves, fan plot, coverage check |
| optimal_binning.ipynb | Original exploratory notebook (includes Bayesian regularisation exploration) |
| save_figures.py | Reproduces all figures in figures/ as PDFs |
| showcase.ipynb | Interactive notebook: edit one config cell to plug in any DGP |
| demo/app.py | Panel + Pyodide static web demo (Python in the browser) |
| demo-js/ | TypeScript + Vite + Plotly.js web demo (instant load, no WASM) |
| docs/ | Generated static site served by GitHub Pages |

Key results

  • LOO-CRPS cost of a bin $S$ with $m \ge 2$ observations: $\mathrm{cost}(S) = \frac{m}{(m-1)^2} \sum_{\ell < r} |y_\ell - y_r|$
  • CV selects $K^*=3$ on the heteroscedastic example ($n=200$, $Y|X=x \sim \mathcal{N}(x,(1+x)^2)$), with bin sizes 51/60/89
  • Empirical coverage at $\varepsilon=0.10$: 91.0% (target $\ge$ 90%) on a 2000-point test set
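The closed-form bin cost in the first bullet can be checked numerically. The sketch below assumes that the leave-one-out predictive distribution for a point is the ECDF of the other $m-1$ labels in its bin, and that CRPS is evaluated in its energy form $\mathbb{E}|X-y| - \tfrac12\mathbb{E}|X-X'|$. The function names are illustrative only.

```python
import numpy as np

def crps_ecdf(sample, y):
    """CRPS of the empirical distribution of `sample` at outcome y (energy form)."""
    sample = np.asarray(sample, dtype=float)
    spread = np.abs(sample[:, None] - sample[None, :]).mean()
    return np.abs(sample - y).mean() - 0.5 * spread

def loo_crps_bruteforce(y_bin):
    """Sum over the bin of each point's CRPS under the ECDF of the other points."""
    y_bin = np.asarray(y_bin, dtype=float)
    return sum(crps_ecdf(np.delete(y_bin, i), y_bin[i]) for i in range(len(y_bin)))

def loo_crps_closed_form(y_bin):
    """m / (m - 1)^2 times the sum of |y_l - y_r| over unordered pairs."""
    y_bin = np.asarray(y_bin, dtype=float)
    m = len(y_bin)
    pair_sum = np.abs(y_bin[:, None] - y_bin[None, :]).sum() / 2.0
    return m / (m - 1) ** 2 * pair_sum
```

The brute-force version costs $O(m^3)$ per bin, whereas the closed form needs only the pairwise absolute differences, which is what makes the dynamic program tractable.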

Interactive showcase

showcase.ipynb lets you plug in any data-generating process and immediately see the full pipeline — K selection, partition, predictive CDFs, conformal p-value curves, prediction bands, and empirical coverage. Edit only the configuration cell at the top:

def ygiven_x(x, rng):          # sample Y | X = x
    return rng.normal(loc=x, scale=0.5 + x)

def true_quantile(x, p):       # oracle p-quantile (or set to None)
    from scipy.stats import norm
    return norm.ppf(p, loc=x, scale=0.5 + x)

n_train, x_lo, x_hi = 300, 0.0, 3.0
K_max, epsilon, seed = 20, 0.10, 42

Three alternatives are included as comments: skewed (gamma), bimodal mixture, and sinusoidal mean.
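As an illustration of the expected shape of such a configuration, here is a hypothetical sinusoidal-mean variant that follows the same two-function signature. The constants and the use of `statistics.NormalDist` are choices made for this sketch; the commented alternatives shipped in the notebook may differ.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical DGP: Y | X = x ~ N(sin(2x), 0.3^2). Not taken from the notebook.
NOISE_SD = 0.3

def ygiven_x(x, rng):          # sample Y | X = x
    return rng.normal(loc=np.sin(2.0 * x), scale=NOISE_SD)

def true_quantile(x, p):       # oracle p-quantile of Y | X = x
    return np.sin(2.0 * x) + NOISE_SD * NormalDist().inv_cdf(p)
```

Setting `true_quantile` to None, as the configuration cell allows, is the fallback when no oracle quantile is available in closed form.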

Live demo

A static web demo runs entirely in the browser with no server and no Python runtime. It is built with TypeScript, Vite, and Plotly.js (demo-js/), giving instant page loads. Four DGP presets are included; you can also write a custom data-generating process (DGP) in the editor. All plots from the notebook are reproduced.

Enable GitHub Pages (Settings → Pages → Source: docs/ on the js-demo branch) to host it at https://ptocca.github.io/RegressionVenn/.

To rebuild docs/ after editing the demo source:

cd demo-js
npm install      # first time only
npm run build    # outputs to ../docs/

The earlier Panel + Pyodide demo (demo/app.py, branch master) is still available and can be rebuilt with bash demo/build.sh.

Running the tests

uv run pytest
