# osr-metrics

Open-Set Recognition (OSR) and OOD-detection metrics for machine-learning research.

A small, framework-agnostic Python library that bundles the metrics needed for credible OSR / OOD-detection publications, with consistent score-direction conventions and first-principles-verified formulas.
## What's inside

| Group | Metrics |
|---|---|
| OOD detection | `auroc`, `fpr_at_tpr`, `fpr_at_95tpr`, `aupr_in`, `aupr_out` |
| Open-Set Recognition | `compute_aoscr` (canonical Dhamija/Vaze), `oscr_curve`, `compute_nf_rejection_at_tpr` |
| Multi-label classification | `macro_auprc`, `macro_auprc_id_labels`, `macro_f1_with_thresholds`, `per_label_auprc`, `f1_per_label` |
| Four-class OSR partitioning | `build_fourclass_masks`, `compute_fourclass_metrics`, `partition_ood_by_purity` |
| Calibration | `expected_calibration_error`, `brier_score` |
| Statistical comparison | `delong_test` (O(n log n) rank-based), `bootstrap_ci` (with optional stratification) |
All functions take plain numpy arrays and return scalars or simple
dictionaries — no PyTorch, TensorFlow, or framework lock-in.
## Score-direction convention

For every OOD/novelty metric in this library, **higher score = more OOD**. ID-positive metrics (`aupr_in`) handle the sign flip internally, so you don't have to.
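This convention is easy to verify from first principles: under higher = more OOD, AUROC is the probability that a randomly chosen OOD sample outscores a randomly chosen ID sample. A minimal brute-force sketch in plain numpy (illustrative only, not the library's implementation; `auroc_brute_force` is a hypothetical name):

```python
import numpy as np

def auroc_brute_force(scores, labels):
    """AUROC = P(random OOD score > random ID score), ties count as half.
    Convention: higher score = more OOD, label 1 = OOD, label 0 = ID.
    O(n_id * n_ood) pairwise comparison, for sanity-checking only."""
    ood = scores[labels == 1]
    id_ = scores[labels == 0]
    greater = (ood[:, None] > id_[None, :]).sum()
    ties = (ood[:, None] == id_[None, :]).sum()
    return (greater + 0.5 * ties) / (len(ood) * len(id_))

scores = np.array([0.1, 0.4, 0.35, 0.8])
labels = np.array([0, 0, 1, 1])
print(auroc_brute_force(scores, labels))  # 0.75
```

If your scores run the other way (higher = more ID), an AUROC well below 0.5 from this reference is the usual symptom.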
## Install

```bash
pip install osr-metrics
```

Requires Python 3.10+, numpy, scikit-learn, and scipy.
### Development install

```bash
git clone https://github.com/hxtruong6/osr-metrics.git
cd osr-metrics
pip install -e ".[dev]"
```
## Quick start

```python
import numpy as np
from osr_metrics import auroc, fpr_at_95tpr, compute_aoscr, expected_calibration_error

# OOD detection
scores = np.random.randn(1000)          # higher = more OOD
labels = np.random.randint(0, 2, 1000)  # 1 = OOD, 0 = ID
print("AUROC:", auroc(scores, labels))
print("FPR@95TPR:", fpr_at_95tpr(scores, labels))

# Open-Set Classification Rate (joint classify + reject)
cls_pred = np.random.randint(0, 5, 1000)
cls_true = np.random.randint(0, 5, 1000)
print("AOSCR:", compute_aoscr(scores, labels, cls_pred, cls_true))

# Calibration
probs = np.random.uniform(0, 1, (1000, 14))
multi_labels = (np.random.uniform(0, 1, (1000, 14)) < probs).astype(int)
print("ECE:", expected_calibration_error(probs, multi_labels))
```
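For intuition, FPR@95TPR follows directly from its definition: pick the loosest threshold that still flags 95% of OOD samples, then count the ID samples swept up with them. A first-principles sketch (the library may instead interpolate along the ROC curve; `fpr_at_tpr_ref` is a hypothetical helper name, not the library API):

```python
import numpy as np

def fpr_at_tpr_ref(scores, labels, tpr=0.95):
    """Reference FPR@TPR: choose the highest threshold that still flags
    at least `tpr` of the OOD (label 1) samples, then measure the
    fraction of ID (label 0) samples at or above it.
    Convention: higher score = more OOD."""
    ood = np.sort(scores[labels == 1])[::-1]  # OOD scores, descending
    k = int(np.ceil(tpr * len(ood)))          # need >= k OOD flagged
    thresh = ood[k - 1]                       # k-th largest OOD score
    return np.mean(scores[labels == 0] >= thresh)
```

A low FPR@95TPR means the detector can catch nearly all OOD inputs without rejecting much ID traffic, which is why it complements the threshold-free AUROC.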
## Statistical comparison

```python
from osr_metrics import delong_test, bootstrap_ci, auroc

# Pairwise AUROC comparison (DeLong 1988)
z, p = delong_test(scores_method_a, scores_method_b, labels)
print(f"DeLong z={z:.3f}, p={p:.4f}")

# Bootstrap CI (use stratify=True for imbalanced data)
lo, mean, hi = bootstrap_ci(scores, labels, auroc, n_bootstrap=1000, stratify=True)
print(f"AUROC = {mean:.4f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```
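The idea behind the bootstrap CI can be sketched in a few lines of numpy: resample (score, label) pairs with replacement and take percentiles of the recomputed metric. This is a simplified, non-stratified sketch under my own naming (`bootstrap_ci_ref`); the library's `bootstrap_ci` may differ in details such as stratified resampling:

```python
import numpy as np

def bootstrap_ci_ref(scores, labels, metric_fn, n_bootstrap=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for any metric_fn(scores, labels).
    Resamples (score, label) pairs jointly, with replacement."""
    rng = np.random.default_rng(seed)
    n = len(scores)
    stats = []
    for _ in range(n_bootstrap):
        idx = rng.integers(0, n, n)               # one bootstrap resample
        stats.append(metric_fn(scores[idx], labels[idx]))
    stats = np.asarray(stats)
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, stats.mean(), hi

# usage with any metric taking (scores, labels):
s = np.random.default_rng(1).normal(size=200)
y = np.zeros(200)
print(bootstrap_ci_ref(s, y, lambda sc, lb: sc.mean(), n_bootstrap=200))
```

Stratified resampling (drawing ID and OOD samples separately) keeps each resample's class balance fixed, which matters when one class is rare.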
## Four-class OSR partitioning

For multi-label problems with held-out labels (chest X-ray OSR style):

```python
from osr_metrics import build_fourclass_masks, compute_fourclass_metrics

label_names = ["A", "B", "C", "D"]
held_out = ["C", "D"]
metrics = compute_fourclass_metrics(scores, label_vecs, label_names, held_out)
# Returns: auroc_full, fpr95_full, auroc_pure, auroc_mixed,
#          auroc_mixed_vs_id_disease, auroc_nf_vs_pure,
#          auroc_disease_only, counts...
```
Partitions images into four mutually exclusive classes:

- `id_disease` — only known labels
- `no_finding` — all-zero label vector
- `pure_ood` — only held-out labels
- `mixed_ood` — both known and held-out labels
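The partition itself is plain boolean logic over the binary label matrix. A sketch of the idea (`fourclass_masks` is a hypothetical helper, not necessarily `build_fourclass_masks`' exact signature):

```python
import numpy as np

def fourclass_masks(label_vecs, label_names, held_out):
    """Split samples into the four mutually exclusive OSR classes.
    label_vecs: (n_samples, n_labels) binary matrix."""
    ho = np.array([name in held_out for name in label_names])
    has_known = (label_vecs[:, ~ho] == 1).any(axis=1)
    has_heldout = (label_vecs[:, ho] == 1).any(axis=1)
    return {
        "id_disease": has_known & ~has_heldout,   # only known labels
        "no_finding": ~has_known & ~has_heldout,  # all-zero label vector
        "pure_ood":   ~has_known & has_heldout,   # only held-out labels
        "mixed_ood":  has_known & has_heldout,    # both
    }

names = ["A", "B", "C", "D"]
vecs = np.array([[1, 0, 0, 0],   # id_disease
                 [0, 0, 0, 0],   # no_finding
                 [0, 0, 1, 0],   # pure_ood
                 [1, 0, 0, 1]])  # mixed_ood
masks = fourclass_masks(vecs, names, {"C", "D"})
```

Because the two boolean flags have exactly four combinations, every sample lands in exactly one class, which is what makes the AUROC pairings below well-defined.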
Five AUROC pairings answer different questions:

| Key | Negatives | Positives | What it asks |
|---|---|---|---|
| `auroc_pure` | ID-disease + NF | Pure OOD | Upper-bound separability |
| `auroc_mixed` | ID-disease + NF | Mixed OOD | Mixed-OOD detection difficulty |
| `auroc_mixed_vs_id_disease` | ID-disease only | Mixed OOD | Near-OOD sensitivity (NF removed) |
| `auroc_nf_vs_pure` | NF only | Pure OOD | Diagnostic floor: healthy-vs-anything |
| `auroc_full` | ID-disease + NF | Pure + Mixed OOD | Full-population measurement |
## Why another metrics library?

Most OOD/OSR libraries (PyTorch-OOD, OpenOOD) couple metrics with detection methods, datasets, and a heavy framework. osr-metrics is just the metrics — useful when you want to compute AOSCR or DeLong on cached scores from any pipeline, regardless of how those scores were produced.
## Documentation

- `docs/USAGE.md` — "which metric should I use?" decision tree.
- `docs/EXAMPLES.md` — end-to-end runnable examples, including the full publication metric panel, DeLong comparison, and seed aggregation.
- `CHANGELOG.md` — version history.
- `CITATION.cff` — citation metadata.
## Testing

```bash
pytest tests/ -v
```

Each metric is verified against a first-principles brute-force reference; the test suite covers numerical equivalence, edge cases (empty class, single-value scores), and known properties (DeLong z = 0 on identical inputs, ECE = 0.9 on overconfident-wrong predictions, etc.).
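For intuition on one of those known properties: under the standard binned ECE, a model that predicts 0.9 everywhere but is never right puts every sample in one bin with confidence 0.9 and accuracy 0.0, so ECE = 0.9. A scalar-probability sketch of the definition (the library's `expected_calibration_error` operates on multi-label matrices; `ece_ref` is a hypothetical reference, not its API):

```python
import numpy as np

def ece_ref(probs, labels, n_bins=10):
    """Expected Calibration Error over equal-width confidence bins:
    the bin-size-weighted mean of |accuracy(bin) - mean confidence(bin)|."""
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            gap = abs(labels[in_bin].mean() - probs[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# overconfident and always wrong: predicted 0.9, true label always 0
probs = np.full(100, 0.9)
labels = np.zeros(100)
print(ece_ref(probs, labels))  # ≈ 0.9
```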
## License

MIT.
## Project details
### File details: osr_metrics-0.1.2.tar.gz

- Download URL: osr_metrics-0.1.2.tar.gz
- Size: 27.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `4c0d731b35e3e82b6837f474aaa942af3189c094e9ae7d906e9729b01688ca43` |
| MD5 | `cb642e87addbf079a23d5801a9537fd7` |
| BLAKE2b-256 | `d314e880b7993c053bb6c1e0d6180e6e506f6020b07e23e58f7d9058b4494b22` |
Provenance

The following attestation bundles were made for osr_metrics-0.1.2.tar.gz:

- Publisher: release.yml on hxtruong6/osr-metrics
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: osr_metrics-0.1.2.tar.gz
- Subject digest: 4c0d731b35e3e82b6837f474aaa942af3189c094e9ae7d906e9729b01688ca43
- Sigstore transparency entry: 1401982086
- Permalink: hxtruong6/osr-metrics@de4aadb300abb70b99decd47e9a428421281c8bb
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/hxtruong6
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@de4aadb300abb70b99decd47e9a428421281c8bb
- Trigger Event: push
### File details: osr_metrics-0.1.2-py3-none-any.whl

- Download URL: osr_metrics-0.1.2-py3-none-any.whl
- Size: 18.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `d260a21ed07a8000f1c8aee14b7d44b50892c51e9927d581c33bb6d416936e7c` |
| MD5 | `9413b932f9ed8fc59083f8250eabd461` |
| BLAKE2b-256 | `fb667de0bd4e0c5ba348eb2450837e2df3647a620f0a060a15e12301031b2af0` |
Provenance

The following attestation bundles were made for osr_metrics-0.1.2-py3-none-any.whl:

- Publisher: release.yml on hxtruong6/osr-metrics
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: osr_metrics-0.1.2-py3-none-any.whl
- Subject digest: d260a21ed07a8000f1c8aee14b7d44b50892c51e9927d581c33bb6d416936e7c
- Sigstore transparency entry: 1401982207
- Permalink: hxtruong6/osr-metrics@de4aadb300abb70b99decd47e9a428421281c8bb
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/hxtruong6
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@de4aadb300abb70b99decd47e9a428421281c8bb
- Trigger Event: push