Optimal binning and conformal prediction for regression

Reason this release was yanked:

Release contained unneeded files

Project description

Optimal Binning for Regression via LOO-CRPS

Research project extending the Venn-ABERS calibrated prediction framework to regression (continuous response, real-valued covariate).

Overview

Given training data $(x_i, y_i)$, $i = 1, \dots, n$, we sort by $x$, partition the observations into $K$ contiguous bins, and use the within-bin empirical CDF (ECDF) as the predictive distribution. The three main contributions are:

  1. Optimal binning: bin boundaries are chosen to minimise the total leave-one-out CRPS via dynamic programming in $O(n^2 K)$ time.
  2. Cross-validated K selection: the number of bins $K$ is selected by the CRPS on a held-out split. The within-sample LOO-CRPS must not be used for model selection: the bin boundaries are optimised against that same score, so it is gameable.
  3. Conformal prediction: the bin ECDF is wrapped in a full conformal predictor (CRPS as nonconformity score) to obtain finite-sample coverage guarantees.
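The dynamic programme in item 1 can be sketched as follows. This is a minimal illustration, not the code in src/crpsconfreg/binning.py: the function names are hypothetical, y is assumed to be already ordered by x, and every bin is required to hold at least two points because the closed-form bin cost listed under Key results is only defined for $m \ge 2$. Pairwise sums are read off a 2-D prefix sum, so the cost table takes $O(n^2)$ work and the recursion $O(n^2 K)$.

```python
import numpy as np

def loo_crps_cost_matrix(y):
    """cost[i, j] = LOO-CRPS cost of the bin y[i:j]; inf if it has fewer than 2 points."""
    n = len(y)
    D = np.abs(y[:, None] - y[None, :])          # all pairwise |y_l - y_r|
    S = np.zeros((n + 1, n + 1))
    S[1:, 1:] = D.cumsum(axis=0).cumsum(axis=1)  # 2-D prefix sums of D
    cost = np.full((n + 1, n + 1), np.inf)
    for i in range(n):
        for j in range(i + 2, n + 1):
            m = j - i
            # The square block of D over [i, j) counts every pair twice
            pair_sum = 0.5 * (S[j, j] - S[i, j] - S[j, i] + S[i, i])
            cost[i, j] = m / (m - 1) ** 2 * pair_sum
    return cost

def optimal_bins(y, K):
    """Return (total cost, boundaries) of the best split of y into K contiguous bins."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    if n < 2 * K:
        raise ValueError("need at least 2 observations per bin")
    cost = loo_crps_cost_matrix(y)
    best = np.full((K + 1, n + 1), np.inf)       # best[k, j]: first j points in k bins
    back = np.zeros((K + 1, n + 1), dtype=int)   # start index of the last bin
    best[0, 0] = 0.0
    for k in range(1, K + 1):
        for j in range(2 * k, n + 1):
            candidates = best[k - 1, :j] + cost[:j, j]
            i = int(np.argmin(candidates))
            best[k, j], back[k, j] = candidates[i], i
    bounds, j = [n], n
    for k in range(K, 0, -1):                    # walk the back-pointers
        j = int(back[k, j])
        bounds.append(j)
    return float(best[K, n]), bounds[::-1]

y = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
total, bounds = optimal_bins(y, K=2)
print(bounds)  # [0, 3, 6]: bins y[0:3] and y[3:6]
```

Bin k covers y[bounds[k]:bounds[k + 1]], so the boundaries are index positions in the sorted sample rather than x values.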

The Venn prediction band — the family of augmented ECDFs as the hypothetical label varies — is also formalised as the direct regression analog of the Venn-ABERS interval.
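The conformal step in item 3 can be illustrated on a single bin. The sketch below is an assumption about how such a predictor could be built, not the definition used in src/crpsconfreg/conformal.py: the candidate label is appended to the bin, each point is scored by the CRPS of the ECDF of the remaining points, and the p-value is the fraction of scores at least as large as the candidate's. All function names are hypothetical, and the interval is approximated on a user-supplied grid.

```python
import numpy as np

def loo_crps_scores(z):
    """Score of each z[i]: CRPS of the ECDF of the other points, evaluated at z[i]."""
    z = np.asarray(z, dtype=float)
    m = len(z) - 1                       # size of each leave-one-out sample
    D = np.abs(z[:, None] - z[None, :])
    row = D.sum(axis=1)                  # sum_j |z_j - z_i|
    total = D.sum()                      # sum over all ordered pairs
    # E|X - z_i| minus half of E|X - X'|, with X, X' drawn from the other m points
    return row / m - (total - 2.0 * row) / (2.0 * m**2)

def conformal_pvalue(y_bin, y_star):
    """p-value of the hypothetical label y_star given the labels already in the bin."""
    scores = loo_crps_scores(np.append(y_bin, y_star))
    # Small tolerance so that exact ties count as "at least as nonconforming"
    return float(np.mean(scores >= scores[-1] - 1e-12))

def conformal_interval(y_bin, epsilon, grid):
    """Hull of the grid points whose p-value exceeds epsilon."""
    keep = [t for t in grid if conformal_pvalue(y_bin, t) > epsilon]
    return (min(keep), max(keep)) if keep else (np.nan, np.nan)

rng = np.random.default_rng(0)
y_bin = rng.normal(size=50)
grid = np.linspace(-10.0, 10.0, 401)
lo, hi = conformal_interval(y_bin, epsilon=0.10, grid=grid)
print(f"90% interval for this bin: [{lo:.2f}, {hi:.2f}]")
```

Because every point in the augmented bin is scored by the same rule, the scores are exchangeable under the usual conformal assumption, which is what yields the finite-sample guarantee.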

Package usage

import numpy as np
from crpsconfreg import BinningPredictor

# x_train and y_train must be ordered by increasing x_train
pred = BinningPredictor().fit(x_train, y_train, K_max=20)  # CV selects K*

# Conformal prediction interval with coverage >= 1 - epsilon
lo, hi = pred.predict_interval(x_new, epsilon=0.10)

# Conformal p-value for an observed (x*, y*) pair
pvals = pred.conformal_pvalue(x_new, y_new)

# Within-bin empirical CDF
cdf = pred.predict_ecdf(x_new, t_grid)

print(f"K* = {pred.K_}, bin sizes = {pred.bin_sizes()}")

Pass K=k to fit() to fix the number of bins and bypass cross-validation.
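For intuition about what the cross-validated selection compares, here is a rough sketch of scoring candidate values of K by held-out CRPS. It is not the package's selection.py: the helper names are hypothetical, a single even/odd split stands in for whatever split the package uses, and equal-count bins stand in for the optimal partition so that the snippet stays short.

```python
import numpy as np

def crps_ecdf(sample, y):
    """CRPS of the empirical CDF of `sample` at the observation y."""
    sample = np.asarray(sample, dtype=float)
    spread = np.mean(np.abs(sample[:, None] - sample[None, :]))
    return float(np.mean(np.abs(sample - y)) - 0.5 * spread)

def heldout_crps(x_fit, y_fit, x_val, y_val, K):
    """Mean held-out CRPS with K equal-count bins (stand-in for the optimal bins)."""
    order = np.argsort(x_fit)
    x_fit, y_fit = x_fit[order], y_fit[order]
    chunks = np.array_split(np.arange(len(x_fit)), K)
    # Right edge of every bin but the last; validation points are routed by x
    edges = np.array([x_fit[c[-1]] for c in chunks[:-1]])
    which = np.searchsorted(edges, x_val, side="left")
    return float(np.mean([crps_ecdf(y_fit[chunks[b]], yv)
                          for b, yv in zip(which, y_val)]))

# Illustrative data: Y | X = x ~ N(x, (1 + x)^2) with x uniform on [0, 3]
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 3.0, size=400)
y = rng.normal(loc=x, scale=1.0 + x)
fit = np.arange(400) % 2 == 0
val = ~fit

scores = {K: heldout_crps(x[fit], y[fit], x[val], y[val], K) for K in range(1, 11)}
K_star = min(scores, key=scores.get)
print(f"selected K = {K_star}")
```

The score is computed only on points that played no part in building the bins, which is the property the within-sample LOO-CRPS lacks.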

Installation

uv sync
# or
pip install -e .

Repository structure

Path                          Description
src/crpsconfreg/              Python package
src/crpsconfreg/binning.py    LOO-CRPS cost, DP, bin boundaries
src/crpsconfreg/selection.py  CRPS scoring, CV K selection
src/crpsconfreg/conformal.py  Conformal p-values (vectorised), prediction intervals
src/crpsconfreg/predictor.py  BinningPredictor high-level class
tests/                        61 pytest tests
dp_formulation.tex / .pdf     Full write-up: derivations, CV K selection, Venn band, conformal prediction, numerical illustration
conformal_binning.ipynb       Clean notebook: CV K selection, Venn band, p-value curves, fan plot, coverage check
optimal_binning.ipynb         Original exploratory notebook (includes Bayesian regularisation exploration)
save_figures.py               Reproduces all figures in figures/ as PDFs
showcase.ipynb                Interactive notebook: edit one config cell to plug in any DGP
demo/app.py                   Panel + Pyodide static web demo (Python in the browser)
demo-js/                      TypeScript + Vite + Plotly.js web demo (instant load, no WASM)
docs/                         Generated static site served by GitHub Pages

Key results

  • LOO-CRPS cost of a bin $S$ with $m \ge 2$ observations: $\mathrm{cost}(S) = \frac{m}{(m-1)^2} \sum_{\ell < r} |y_\ell - y_r|$
  • CV selects $K^*=3$ on the heteroscedastic example ($n=200$, $Y|X=x \sim \mathcal{N}(x,(1+x)^2)$), with bin sizes 51/60/89
  • Empirical coverage at $\varepsilon=0.10$: 91.0% (target $\ge$ 90%) on a 2000-point test set
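The closed-form bin cost in the first bullet can be checked numerically. The sketch below (helper names are illustrative) computes the leave-one-out CRPS of a bin by brute force, using the identity $\mathrm{CRPS}(F, y) = \mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$ for the ECDF of the remaining points, and compares it with $\frac{m}{(m-1)^2} \sum_{\ell < r} |y_\ell - y_r|$.

```python
import numpy as np

def crps_ecdf(sample, y):
    """CRPS of the empirical CDF of `sample` at the observation y."""
    sample = np.asarray(sample, dtype=float)
    spread = np.mean(np.abs(sample[:, None] - sample[None, :]))
    return float(np.mean(np.abs(sample - y)) - 0.5 * spread)

def loo_crps_brute(y):
    """Sum over i of the CRPS at y[i] of the ECDF built from all the other points."""
    y = np.asarray(y, dtype=float)
    return sum(crps_ecdf(np.delete(y, i), y[i]) for i in range(len(y)))

def loo_crps_closed(y):
    """Closed form: m / (m - 1)^2 times the sum of |y_l - y_r| over pairs l < r."""
    y = np.asarray(y, dtype=float)
    m = len(y)
    pair_sum = np.abs(y[:, None] - y[None, :]).sum() / 2.0
    return float(m / (m - 1) ** 2 * pair_sum)

rng = np.random.default_rng(0)
y = rng.normal(size=7)
print(loo_crps_brute(y), loo_crps_closed(y))  # the two values agree up to rounding
```

The brute-force version costs $O(m^3)$ per bin, which is why the closed form matters inside the dynamic programme.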

Interactive showcase

showcase.ipynb lets you plug in any data-generating process and immediately see the full pipeline — K selection, partition, predictive CDFs, conformal p-value curves, prediction bands, and empirical coverage. Edit only the configuration cell at the top:

def ygiven_x(x, rng):          # sample Y | X = x
    return rng.normal(loc=x, scale=0.5 + x)

def true_quantile(x, p):       # oracle p-quantile (or set to None)
    from scipy.stats import norm
    return norm.ppf(p, loc=x, scale=0.5 + x)

n_train, x_lo, x_hi = 300, 0.0, 3.0
K_max, epsilon, seed = 20, 0.10, 42

Three alternatives are included as comments: skewed (gamma), bimodal mixture, and sinusoidal mean.
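Before plugging in a custom DGP it can help to confirm that the sampler and the oracle quantile describe the same distribution. The following is a small sanity check along those lines. It assumes rng is a numpy.random.Generator; the helper quantile_gap is hypothetical and not part of the notebook; and the oracle here uses the standard library's statistics.NormalDist in place of scipy so that the snippet needs nothing beyond NumPy.

```python
import numpy as np
from statistics import NormalDist

def ygiven_x(x, rng):          # sample Y | X = x (same DGP as the config cell)
    return rng.normal(loc=x, scale=0.5 + x)

def true_quantile(x, p):       # oracle p-quantile, scalar x only in this sketch
    return NormalDist(mu=x, sigma=0.5 + x).inv_cdf(p)

def quantile_gap(x0, p, n_draws=200_000, seed=42):
    """Empirical p-quantile of Y | X = x0 minus the oracle value."""
    rng = np.random.default_rng(seed)
    draws = ygiven_x(np.full(n_draws, x0), rng)
    return float(np.quantile(draws, p) - true_quantile(x0, p))

gap = quantile_gap(x0=1.0, p=0.9)
print(f"empirical minus oracle 0.9-quantile at x = 1: {gap:+.4f}")
```

A gap that does not shrink as n_draws grows suggests the two functions disagree, for example through a mismatched scale parameter.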

Live demo

A static web demo runs entirely in the browser with no server and no Python runtime. It is built with TypeScript, Vite, and Plotly.js (demo-js/), giving instant page loads. Four data-generating process (DGP) presets are included; you can also write a custom DGP in the editor. All plots from the notebook are reproduced.

Enable GitHub Pages (Settings → Pages → Source: the docs/ folder on the js-demo branch) to host it at https://ptocca.github.io/RegressionVenn/.

To rebuild docs/ after editing the demo source:

cd demo-js
npm install      # first time only
npm run build    # outputs to ../docs/

The earlier Panel + Pyodide demo (demo/app.py, on the master branch) is still available and can be rebuilt with bash demo/build.sh.

Running the tests

uv run pytest
