Small Sample Beta Correction - PAC guarantees with small datasets
SSBC: Small-Sample Beta Correction
Small-Sample Beta Correction provides PAC (Probably Approximately Correct) guarantees for conformal prediction with small calibration sets.
- PyPI package: https://pypi.org/project/ssbc/
- Free software: MIT License
- Documentation: https://ssbc.readthedocs.io.
Overview
SSBC addresses the challenge of constructing valid prediction sets when calibration data is limited. Standard conformal prediction guarantees coverage only on average over calibration sets, so with a small calibration set the coverage actually realized in deployment can fall well below the target. SSBC provides finite-sample PAC guarantees and rigorous operational bounds for deployment.
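To see why small calibration sets need care, recall a standard result for split conformal prediction with continuous scores: conditional on the calibration set, the realized coverage follows a Beta(k, n+1-k) distribution with k = ⌈(n+1)(1-α)⌉. The sketch below uses only the standard library and is not part of the ssbc API (the function name `prob_coverage_at_least` is illustrative). It evaluates how often an uncorrected threshold actually reaches a given coverage level.

```python
import math


def prob_coverage_at_least(n: int, alpha: float, level: float) -> float:
    """P(realized coverage >= level) for uncorrected split conformal prediction.

    Assumes exchangeable, continuous (tie-free) scores, so that coverage
    conditional on the calibration set is Beta(k, n + 1 - k) distributed.
    """
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the calibration score used as threshold
    if k > n:
        return 1.0  # threshold is +infinity: every label is always included
    # For integer parameters the Beta CDF is a binomial tail:
    # P(Beta(k, n + 1 - k) >= x) = P(Binomial(n, x) <= k - 1).
    return sum(
        math.comb(n, i) * level**i * (1 - level) ** (n - i) for i in range(k)
    )


for n in (50, 1000):
    p_target = prob_coverage_at_least(n, alpha=0.10, level=0.90)
    p_slack = prob_coverage_at_least(n, alpha=0.10, level=0.85)
    print(f"n={n}: P(cov >= 0.90) = {p_target:.3f}, P(cov >= 0.85) = {p_slack:.3f}")
```

With n=50 and α=0.10, the 90% target is met for only a little over half of all possible calibration sets. A PAC guarantee instead asks that the target be met with probability at least 1-δ.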
What Makes SSBC Unique?
Unlike asymptotic methods, SSBC provides:
- Finite-Sample PAC Coverage (via SSBC algorithm)
  - Rigorous guarantees that hold for ANY sample size
  - Automatically adapts to class imbalance via Mondrian conformal prediction
  - Example: "≥90% coverage with 95% probability" even with n=50
- Rigorous Operational Bounds (via LOO-CV + Clopper-Pearson)
  - PAC-controlled bounds on automation rates, error rates, escalation rates
  - Confidence intervals account for estimation uncertainty
  - Example: "Singleton rate [0.85, 0.97] with 90% PAC guarantee"
- Uncertainty Quantification
  - Bootstrap analysis for recalibration uncertainty
  - Cross-conformal validation for finite-sample diagnostics
  - Empirical validation for verifying theoretical guarantees
- Contract-Ready Guarantees
  - Transform theory into deployable systems
  - Resource planning (human oversight needs)
  - SLA compliance (performance bounds)
Core Statistical Properties
- 🎯 Distribution-Free: No assumptions about data distribution
- 🎯 Model-Agnostic: Works with ANY probabilistic classifier
- 🎯 Frequentist: Valid frequentist guarantees, no prior needed
- 🎯 Non-Bayesian: No Bayesian assumptions or hyperpriors
- 🎯 Finite-Sample: Exact guarantees for small n, not asymptotic
- 🎯 Exchangeability Only: Minimal assumption (test/calibration exchangeable)
📖 For the detailed theory and a deployment guide, see docs/theory.md
Installation
pip install ssbc
Or from source:
git clone https://github.com/phzwart/ssbc.git
cd ssbc
pip install -e .
Quick Start
Unified Workflow (Recommended)
The complete workflow is available through a single function:
from ssbc import BinaryClassifierSimulator, generate_rigorous_pac_report
# Generate or load calibration data
sim = BinaryClassifierSimulator(
    p_class1=0.2,
    beta_params_class0=(1, 7),
    beta_params_class1=(5, 2),
    seed=42,
)
labels, probs = sim.generate(n_samples=100)
# Generate comprehensive PAC report with operational bounds
report = generate_rigorous_pac_report(
    labels=labels,
    probs=probs,
    alpha_target=0.10,         # Target 90% coverage
    delta=0.10,                # 90% PAC confidence
    test_size=1000,            # Expected deployment size
    use_union_bound=True,      # Simultaneous guarantees
    # Optional uncertainty analyses
    run_bootstrap=True,        # Recalibration uncertainty
    n_bootstrap=1000,
    simulator=sim,
    run_cross_conformal=True,  # Finite-sample diagnostics
    n_folds=10,
)
# Access results
pac_bounds = report['pac_bounds_marginal']
print(f"Singleton rate: {pac_bounds['singleton_rate_bounds']}")
print(f"Expected: {pac_bounds['expected_singleton_rate']:.3f}")
Output includes:
- ✅ PAC coverage guarantees (SSBC-corrected thresholds)
- ✅ Rigorous operational bounds (singleton, doublet, abstention, error rates)
- ✅ Per-class and marginal statistics
- ✅ Optional: Bootstrap uncertainty intervals
- ✅ Optional: Cross-conformal validation diagnostics
Core SSBC Algorithm
For fine-grained control, use the core algorithm directly:
from ssbc import ssbc_correct
result = ssbc_correct(
    alpha_target=0.10,  # Target 10% miscoverage
    n=50,               # Calibration set size
    delta=0.10,         # PAC parameter (90% confidence)
    mode="beta",        # Infinite test window
)
print(f"Corrected α: {result.alpha_corrected:.4f}")
print(f"u*: {result.u_star}")
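One way to picture what a PAC correction does, assuming realized coverage is Beta distributed as in standard split conformal theory: tighten the miscoverage level to u/(n+1) and pick the largest u for which coverage of at least 1 - α_target holds with probability at least 1 - δ. The sketch below is illustrative only. It is not the implementation behind `ssbc_correct`, its output need not match the package's, and the names `coverage_tail`, `pac_corrected_alpha`, and `u` are hypothetical.

```python
import math


def coverage_tail(n: int, k: int, level: float) -> float:
    """P(Beta(k, n + 1 - k) >= level), computed as P(Binomial(n, level) <= k - 1)."""
    return sum(
        math.comb(n, i) * level**i * (1 - level) ** (n - i) for i in range(k)
    )


def pac_corrected_alpha(alpha_target: float, n: int, delta: float):
    """Largest u in 1..n with P(coverage >= 1 - alpha_target) >= 1 - delta.

    Thresholding at the (n + 1 - u)-th smallest calibration score gives
    coverage ~ Beta(n + 1 - u, u) under exchangeability with no ties.
    Returns (u, u / (n + 1)), or None if even u = 1 is not enough.
    """
    best = None
    for u in range(1, n + 1):
        k = n + 1 - u
        if coverage_tail(n, k, 1 - alpha_target) >= 1 - delta:
            best = u  # larger u means a less conservative threshold
        else:
            break  # the tail probability only decreases as u grows
    return None if best is None else (best, best / (n + 1))


print(pac_corrected_alpha(alpha_target=0.10, n=50, delta=0.10))
```

Under these assumptions, a very small calibration set may admit no valid u at all, in which case the function returns None rather than a threshold it cannot justify.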
Validation and Diagnostics
Empirically validate your PAC bounds:
from ssbc import (
    generate_rigorous_pac_report,
    print_validation_results,
    validate_pac_bounds,
)
# Reuses labels, probs, and sim from the Unified Workflow example
# Generate report
report = generate_rigorous_pac_report(labels, probs, delta=0.10)
# Validate empirically
validation = validate_pac_bounds(
    report=report,
    simulator=sim,
    test_size=1000,
    n_trials=10000,
)
# Print results
print_validation_results(validation)
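The idea behind such an empirical check can be shown without the package: repeatedly draw a calibration set, fix a threshold at a chosen rank, and record how often the realized coverage reaches the target. The standalone simulation below is a simplified sketch, not what `validate_pac_bounds` does internally. It assumes nonconformity scores are Uniform(0, 1), so that the coverage of a threshold t is t itself. The function name is illustrative, rank 46 is ⌈(n+1)(1-α)⌉ for n=50 and α=0.10, and rank 49 is a stricter rank picked by hand for comparison.

```python
import random


def fraction_meeting_target(
    n: int, rank: int, target: float, n_trials: int, seed: int = 0
) -> float:
    """Fraction of simulated calibration sets whose realized coverage is >= target."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        scores = sorted(rng.random() for _ in range(n))
        threshold = scores[rank - 1]  # rank-th smallest calibration score
        # With Uniform(0, 1) scores, P(new score <= threshold) = threshold.
        if threshold >= target:
            hits += 1
    return hits / n_trials


for rank in (46, 49):
    frac = fraction_meeting_target(n=50, rank=rank, target=0.90, n_trials=5000)
    print(f"rank {rank}: coverage >= 0.90 in {frac:.1%} of trials")
```

A claimed "coverage ≥ 90% with probability ≥ 1-δ" passes this kind of check when the observed fraction is at least 1-δ, up to Monte Carlo noise.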
Cross-conformal validation for calibration diagnostics:
from ssbc import cross_conformal_validation
results = cross_conformal_validation(
    labels=labels,
    probs=probs,
    n_folds=10,
    alpha_target=0.10,
    delta=0.10,
)
print(f"Singleton rate: {results['marginal']['singleton']['mean']:.3f}")
print(f"Std dev: {results['marginal']['singleton']['std']:.3f}")
Key Features
- ✅ Small-Sample Correction: PAC-valid conformal prediction for small calibration sets
- ✅ Mondrian Conformal Prediction: Per-class calibration for handling class imbalance
- ✅ PAC Operational Bounds: Rigorous bounds on deployment rates (LOO-CV + Clopper-Pearson)
- ✅ Bootstrap Uncertainty: Recalibration variability analysis
- ✅ Cross-Conformal Validation: Finite-sample diagnostics via K-fold
- ✅ Empirical Validation: Verify theoretical guarantees in practice
- ✅ Comprehensive Statistics: Detailed reporting with exact confidence intervals
- ✅ Hyperparameter Tuning: Interactive parallel coordinates visualization
- ✅ Simulation Tools: Built-in data generators for testing
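The per-class (Mondrian) calibration listed above can be sketched in a few lines: calibration scores are grouped by true label and each class gets its own conformal threshold, so a rare class is calibrated only against its own examples. This is a minimal illustration that assumes a binary problem, the score 1 - P(true label), and the standard uncorrected rank ⌈(n+1)(1-α)⌉. The function name `mondrian_thresholds` is hypothetical and is not the ssbc API.

```python
import math
from collections import defaultdict


def mondrian_thresholds(labels, p1s, alpha):
    """Per-class conformal thresholds for a binary classifier.

    labels: true labels (0 or 1); p1s: predicted P(class 1) for each sample.
    """
    scores = defaultdict(list)
    for y, p1 in zip(labels, p1s):
        p_true = p1 if y == 1 else 1.0 - p1
        scores[y].append(1.0 - p_true)  # nonconformity of the true label
    thresholds = {}
    for y, class_scores in scores.items():
        class_scores.sort()
        n = len(class_scores)
        k = math.ceil((n + 1) * (1 - alpha))  # rank within this class only
        # Too few samples for this alpha: fall back to an infinite threshold.
        thresholds[y] = class_scores[k - 1] if k <= n else math.inf
    return thresholds


labels = [0, 0, 0, 0, 1]
p1s = [0.1, 0.2, 0.3, 0.4, 0.8]
print(mondrian_thresholds(labels, p1s, alpha=0.5))
```

A class with very few calibration points quickly hits the infinite-threshold fallback, which is where a small-sample correction matters most.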
Examples
The examples/ directory contains comprehensive demonstrations:
Essential Examples
# Core algorithm
python examples/ssbc_core_example.py
# Mondrian conformal prediction
python examples/mondrian_conformal_example.py
# Complete workflow with all uncertainty analyses
python examples/complete_workflow_example.py
# SLA/deployment contracts
python examples/sla_example.py
# Alpha scanning across thresholds
python examples/alpha_scan_example.py
# Empirical validation
python examples/pac_validation_example.py
Understanding the Output
Per-Class Statistics (Conditioned on True Label)
For each class, the report shows:
- Abstentions: Empty prediction sets (no confident prediction)
- Singletons: Single-label predictions (automated decisions)
- Doublets: Both labels included (escalated to human review)
- Singleton Error Rate: P(error | singleton prediction)
Marginal Statistics (Deployment View)
Overall performance metrics (deployment perspective):
- Coverage: Fraction of predictions containing the true label
- Automation Rate: Fraction of confident predictions (singletons)
- Escalation Rate: Fraction requiring human review (doublets + abstentions)
- Error Rate: Fraction of automated decisions (singletons) that are wrong
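The definitions above can be made concrete with a small tally. The sketch below assumes a binary classifier that outputs P(class 1) and two per-class thresholds `q0` and `q1` on the score 1 - P(label). The threshold values, the helper names, and the dictionary keys are all illustrative and do not reflect the structure of the ssbc report.

```python
def prediction_set(p1: float, q0: float, q1: float) -> set:
    """Labels whose nonconformity score 1 - P(label) is within that label's threshold."""
    included = set()
    if p1 <= q0:  # score for label 0 is 1 - P(0) = p1
        included.add(0)
    if 1.0 - p1 <= q1:  # score for label 1 is 1 - P(1)
        included.add(1)
    return included


def summarize(labels, p1s, q0, q1):
    """Marginal deployment metrics over a labeled sample."""
    n = len(labels)
    counts = {"abstention": 0, "singleton": 0, "doublet": 0}
    covered = 0
    singleton_errors = 0
    for y, p1 in zip(labels, p1s):
        pset = prediction_set(p1, q0, q1)
        kind = ("abstention", "singleton", "doublet")[len(pset)]
        counts[kind] += 1
        covered += y in pset
        if len(pset) == 1 and y not in pset:
            singleton_errors += 1
    n_single = counts["singleton"]
    return {
        "coverage": covered / n,
        "automation_rate": n_single / n,
        "escalation_rate": (counts["doublet"] + counts["abstention"]) / n,
        "singleton_error_rate": singleton_errors / n_single if n_single else float("nan"),
    }


print(summarize(labels=[0, 0, 1, 1], p1s=[0.1, 0.55, 0.9, 0.45], q0=0.6, q1=0.5))
```

Note that the error rate is normalized by the number of singletons, not by the total sample size, because only singletons are acted on automatically.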
PAC Operational Bounds
Rigorous bounds on all operational metrics:
- Computed via Leave-One-Out Cross-Validation (LOO-CV)
- Clopper-Pearson confidence intervals account for estimation uncertainty
- Union bound ensures all metrics hold simultaneously
- Valid for any future test set from the same distribution
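For readers unfamiliar with the ingredients, here is a minimal standard-library sketch of a Clopper-Pearson interval combined with a union bound. It assumes that k out of n held-out predictions showed the event of interest (for example, k singletons among n leave-one-out predictions). How ssbc counts events and splits δ internally is not shown here, and the function names are illustrative.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(Binomial(n, p) <= k)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k: int, n: int, delta: float) -> tuple:
    """Exact two-sided (1 - delta) interval for a binomial proportion, via bisection."""

    def bisect(too_small) -> float:
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if too_small(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: the p at which P(X >= k) = delta / 2 (0 when k = 0).
    lower = 0.0 if k == 0 else bisect(lambda p: 1 - binom_cdf(k - 1, n, p) < delta / 2)
    # Upper limit: the p at which P(X <= k) = delta / 2 (1 when k = n).
    upper = 1.0 if k == n else bisect(lambda p: binom_cdf(k, n, p) > delta / 2)
    return lower, upper


# Union bound: for m intervals to hold simultaneously at level 1 - delta,
# build each one at level 1 - delta / m.
delta, m = 0.10, 4
print(clopper_pearson(k=45, n=50, delta=delta / m))
```

Splitting δ across m metrics widens each interval, which is the price of a guarantee that covers all metrics at once.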
Citation
If you use SSBC in your research, please cite:
@software{ssbc2024,
  author = {Zwart, Petrus H},
  title = {SSBC: Small-Sample Beta Correction},
  year = {2024},
  url = {https://github.com/phzwart/ssbc}
}
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
License
MIT License - see LICENSE file for details.
Credits
This package was created with Cookiecutter and the audreyfeldroy/cookiecutter-pypackage project template.