Conformal coverage guarantees for any reward function — wrap a Python callable with provable (1-α) coverage in 5 lines.

vlabs-calibrate

Conformal coverage guarantees for any reward function. Five lines of Python.

vlabs-calibrate wraps any Python reward callable with a split-conformal prediction interval providing marginal (1 − α) coverage under exchangeability. Drop-in replacement for your reward function — get calibrated intervals plus a verified-coverage flag instead of a bare scalar.

The math is the split-conformal procedure of Lei et al. (2018). The pitch: RL training runs routinely ship reward signals with no calibrated uncertainty attached, and vlabs-calibrate is infrastructure aimed at closing that gap.
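
For reference, the standard split-conformal construction with a σ-scaled residual score can be written as below. The notation is ours rather than the package's: r_i is the reward function's output on calibration trace i, y_i is the reference reward, σ_i is the per-trace uncertainty, and s_(k) denotes the k-th smallest score. The package internals may differ in details such as tie handling or the value of ε.

```latex
\begin{aligned}
s_i &= \frac{|r_i - y_i|}{\max(\sigma_i, \varepsilon)}, \qquad i = 1, \dots, n, \\
\hat{q} &= s_{(k)}, \qquad k = \lceil (n+1)(1-\alpha) \rceil, \\
C(x) &= \bigl[\, r(x) - \hat{q}\,\max(\sigma(x), \varepsilon),\;\; r(x) + \hat{q}\,\max(\sigma(x), \varepsilon) \,\bigr], \\
\Pr\bigl[\, y_{n+1} \in C(x_{n+1}) \,\bigr] &\ge 1 - \alpha .
\end{aligned}
```

The last line holds marginally, over the joint draw of calibration and test points, whenever those points are exchangeable. If k exceeds n the calibration set is too small for the requested α, and the quantile is conventionally taken to be infinite.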

0.1.0a1 — alpha. Public surface is stable for the documented use cases (continuous and binary reward functions). API may evolve in 0.2.0 once we add per-feature Mondrian conformal and exchangeability diagnostics.

Install

pip install vlabs-calibrate

Requires Python >= 3.10. The only core dependency is numpy.

For development inside this monorepo:

pip install -e packages/vlabs-calibrate

Quickstart

import numpy as np
import vlabs_calibrate as vc

# Your reward function — could be anything; signature is open.
def my_reward(*, prompt: str, completion: str, ground_truth: str) -> float:
    return float(completion.strip() == ground_truth.strip())

# Synthesise a calibration set: noisy reference labels + per-trace sigma.
rng = np.random.default_rng(0)
traces = []
for _ in range(200):
    completion = "4" if rng.random() < 0.8 else "5"
    sigma = 0.2
    reward = my_reward(prompt="2+2?", completion=completion, ground_truth="4")
    reference = float(np.clip(reward + sigma * rng.standard_normal(), 0.0, 1.0))
    traces.append({
        "prompt": "2+2?",
        "completion": completion,
        "ground_truth": "4",
        "reference_reward": reference,
        "uncertainty": sigma,
    })

# Calibrate — one line.
calibrated = vc.calibrate(my_reward, traces, alpha=0.1)

# Use anywhere — drop-in replacement for `my_reward`.
result = calibrated(prompt="2+2?", completion="4", ground_truth="4", sigma=0.2)
print(result.reward, result.interval, result.target_coverage)
# → 1.0  (lo, hi)  0.9
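
To see what the (1 − α) target means in practice, the sketch below re-implements split-conformal calibration directly in numpy on synthetic data and measures empirical coverage on a held-out set. It does not call vlabs-calibrate: the helper names `conformal_quantile` and `empirical_coverage`, the uniform reward distribution, and the Gaussian noise model are all assumptions made for illustration.

```python
import math

import numpy as np


def conformal_quantile(scores: np.ndarray, alpha: float) -> float:
    """Return the ceil((n+1)(1-alpha))-th smallest calibration score."""
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to certify this alpha.
        return float("inf")
    return float(np.sort(scores)[k - 1])


def empirical_coverage(reward, reference, sigma, q) -> float:
    """Fraction of reference values falling inside reward +/- q * sigma."""
    lo = reward - q * sigma
    hi = reward + q * sigma
    return float(np.mean((reference >= lo) & (reference <= hi)))


rng = np.random.default_rng(0)
n_cal, n_test, alpha, sigma = 200, 2000, 0.1, 0.2


def sample(n: int):
    """Synthetic (reward, reference) pairs: reference = reward + Gaussian noise."""
    reward = rng.uniform(0.0, 1.0, size=n)
    reference = reward + sigma * rng.standard_normal(n)
    return reward, reference


r_cal, y_cal = sample(n_cal)
r_test, y_test = sample(n_test)

# Scaled-residual scores on the calibration split only.
scores = np.abs(r_cal - y_cal) / sigma
q = conformal_quantile(scores, alpha)

# Coverage is measured on data the quantile has never seen.
coverage = empirical_coverage(r_test, y_test, sigma, q)
print(f"q_hat={q:.3f}  empirical coverage={coverage:.3f}  target={1 - alpha:.1f}")
```

With 200 calibration points the measured coverage fluctuates by a few percentage points around the target from one calibration draw to the next; the guarantee is marginal, not conditional on a particular calibration set.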

Public surface

name                                      kind                  purpose
calibrate(fn, traces, *, alpha=0.1, ...)  function              builds a calibrated wrapper
CalibratedRewardFn                        dataclass / callable  __call__ returns CalibrationResult; has .evaluate()
CalibrationResult                         frozen dataclass      .reward, .interval, .sigma, .quantile, .alpha, .covered
CoverageReport                            frozen dataclass      aggregate diagnostics from evaluate()
Trace                                     TypedDict             shape spec for calibration entries
vc.core                                   submodule             low-level conformal primitives
vc.nonconformity                          submodule             built-in non-conformity scores + registry
vc.__version__                            str                   package version

Built-in non-conformity scores

name                       formula                              when to use
scaled_residual (default)  |reward − reference| / max(σ, eps)   continuous reward + per-sample σ
abs_residual               |reward − reference|                 continuous reward, no σ
binary                     0.0 if reward == reference else 1.0  0/1 reward; see caveat below
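
The three formulas above can be written out in numpy as follows. This is a sketch of the formulas only, not the code in vc.nonconformity: the function signatures and the `EPS` value are assumptions, since the package's actual eps default is not stated here.

```python
import numpy as np

EPS = 1e-8  # assumed floor for sigma; the package's real value may differ


def scaled_residual(reward, reference, sigma, eps=EPS):
    """|reward - reference| / max(sigma, eps), elementwise."""
    return np.abs(reward - reference) / np.maximum(sigma, eps)


def abs_residual(reward, reference):
    """|reward - reference|, elementwise."""
    return np.abs(reward - reference)


def binary(reward, reference):
    """0.0 where reward equals reference, 1.0 otherwise."""
    return np.where(reward == reference, 0.0, 1.0)


reward = np.array([1.0, 0.0, 1.0])
reference = np.array([0.9, 0.0, 0.0])
sigma = np.array([0.2, 0.2, 0.0])  # the last sigma is zero, so eps takes over

print(scaled_residual(reward, reference, sigma))
print(abs_residual(reward, reference))
print(binary(reward, reference))
```

Note how a zero σ makes the scaled score very large for any non-zero residual; that is the reason for the eps floor in the default score.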

Binary reward caveat. For 0/1 rewards the standard split-conformal guarantee is degenerate: the non-conformity scores take only the values 0 and 1, so the (1 − α) quantile is itself either 0 or 1. The resulting interval is then either the single predicted value or the trivial [0, 1], and neither is informative. For binary tasks consider Mondrian / class-conditional conformal (Vovk & Gammerman), planned for 0.2.0.
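
The degeneracy is easy to reproduce with plain numpy. The sketch below builds calibration sets of 200 binary scores with a varying number of disagreements and computes the finite-sample conformal quantile for each. The helper `conformal_quantile` is a hypothetical stand-in written for this example, not a function exported by the package.

```python
import math

import numpy as np


def conformal_quantile(scores, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest score, or inf if n is too small."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return float("inf") if k > n else float(np.sort(scores)[k - 1])


alpha, n = 0.1, 200
quantiles = {}
for n_errors in (5, 19, 20, 21, 40):
    # Binary scores: 1.0 where reward and reference disagree, 0.0 otherwise.
    scores = np.array([1.0] * n_errors + [0.0] * (n - n_errors))
    quantiles[n_errors] = conformal_quantile(scores, alpha)
    print(f"{n_errors:3d}/{n} disagreements -> q_hat = {quantiles[n_errors]}")
```

The quantile is 0 while fewer than roughly α · n calibration traces disagree and flips to 1 as soon as that threshold is crossed, with nothing in between.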

Tests

pip install -e "packages/vlabs-calibrate[dev]"
pytest packages/vlabs-calibrate/tests/

The new package's tests are not yet wired into the repo-root pytest run; that is intentional for 0.1.0a1 (Phase 15.B will add the path to root pyproject.toml and CI in a separate, scoped change).

License

Apache-2.0 — see LICENSE.
