osr-metrics

Open-Set Recognition (OSR) and OOD-detection metrics for machine-learning research.

A small, framework-agnostic Python library that bundles the metrics needed for credible OSR / OOD-detection publications, with consistent score-direction conventions and first-principles-verified formulas.

What's inside

| Group | Metrics |
|---|---|
| OOD detection | auroc, fpr_at_tpr, fpr_at_95tpr, aupr_in, aupr_out |
| Open-Set Recognition | compute_aoscr (canonical Dhamija/Vaze), oscr_curve, compute_nf_rejection_at_tpr |
| Multi-label classification | macro_auprc, macro_auprc_id_labels, macro_f1_with_thresholds, per_label_auprc, f1_per_label |
| Four-class OSR partitioning | build_fourclass_masks, compute_fourclass_metrics, partition_ood_by_purity |
| Calibration | expected_calibration_error, brier_score |
| Statistical comparison | delong_test (O(n log n) rank-based), bootstrap_ci (with optional stratification) |

All functions take plain numpy arrays and return scalars or simple dictionaries — no PyTorch, TensorFlow, or framework lock-in.

Scope

This library targets the semantic-shift setting (OSR / near-OOD / far-OOD): novel class labels appear at test time. Covariate shift (domain generalization), regression OOD, and continual / open-world learning are out of scope.

Capability matrix — which function for which setting?

Read across to find your setting; functions marked ✅ apply directly. ⚠ = applies with a small adapter (see footnote). ❌ = not applicable.

| Function | Multi-class (single-label) | Multi-label | Pure OOD detection | OSR (classify+reject) | Calibration | Statistical test |
|---|---|---|---|---|---|---|
| auroc | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| fpr_at_tpr / fpr_at_95tpr | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| aupr_in / aupr_out | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| compute_aoscr / oscr_curve | ✅ | ⚠ ¹ | ❌ | ✅ | ❌ | ❌ |
| compute_nf_rejection_at_tpr | ❌ | ✅ ² | ❌ | ✅ | ❌ | ❌ |
| partition_ood_by_purity | ❌ | ✅ ² | ❌ | ✅ | ❌ | ❌ |
| build_fourclass_masks / compute_fourclass_metrics | ❌ | ✅ ² | ❌ | ✅ | ❌ | ❌ |
| macro_auprc / macro_auprc_id_labels | ❌ ³ | ✅ | ❌ | ❌ | ❌ | ❌ |
| per_label_auprc / f1_per_label | ❌ ³ | ✅ | ❌ | ❌ | ❌ | ❌ |
| macro_f1_with_thresholds | ❌ ³ | ✅ | ❌ | ❌ | ❌ | ❌ |
| expected_calibration_error | ⚠ ⁴ | ✅ | ❌ | ❌ | ✅ | ❌ |
| brier_score | ⚠ ⁴ | ✅ | ❌ | ❌ | ✅ | ❌ |
| delong_test | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| bootstrap_ci | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |

¹ Multi-label OSCR/AOSCR: pass an exact-match indicator (1 if all labels predicted correctly, else 0) as class_predictions with true_classes=ones(N). See compute_aoscr docstring.
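
For example, a minimal sketch of that adapter; the array names and shapes (Y_true, Y_pred, ood_labels) are illustrative, not part of the library API:

import numpy as np
from osr_metrics import compute_aoscr

N, L = 1000, 14
Y_true = np.random.randint(0, 2, (N, L))   # multi-label ground truth
Y_pred = np.random.randint(0, 2, (N, L))   # thresholded multi-label predictions
scores = np.random.randn(N)                # higher = more OOD
ood_labels = np.random.randint(0, 2, N)    # 1 = OOD, 0 = ID

# exact-match indicator: 1 iff every label is predicted correctly
exact_match = (Y_pred == Y_true).all(axis=1).astype(int)
print("AOSCR:", compute_aoscr(scores, ood_labels, exact_match, np.ones(N, dtype=int)))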

² Clinical / multi-label OSR helpers — depend on a per-sample "No Finding" (all-zero label vector) indicator that has no analogue in multi-class single-label settings.

³ Multi-class single-label closed-set classification — use sklearn.metrics.accuracy_score and sklearn.metrics.f1_score(..., average='macro') directly. A native multi-class wrapper is on the roadmap.

⁴ The multi-class softmax form of calibration (Guo et al., 2017) is not implemented yet; the current functions flatten across (sample, label) pairs. For multi-class softmax calibration, use sklearn.calibration.calibration_curve or torchmetrics.CalibrationError until the multi-class overload lands.
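
In the meantime, the Guo-style multi-class ECE is also only a few lines of numpy. A hedged sketch with 15 equal-width confidence bins (all names here are illustrative):

import numpy as np

probs = np.random.dirichlet(np.ones(5), size=1000)   # softmax outputs, shape (N, C)
y_true = np.random.randint(0, 5, 1000)

conf = probs.max(axis=1)                   # confidence of the predicted class
correct = probs.argmax(axis=1) == y_true   # correctness indicator per sample
bin_idx = np.clip((conf * 15).astype(int), 0, 14)  # 15 equal-width bins on [0, 1]
ece = 0.0
for b in range(15):
    in_bin = bin_idx == b
    if in_bin.any():
        # weight each bin's |accuracy - confidence| gap by its sample fraction
        ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
print("multi-class ECE:", ece)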

Score-direction convention

For every OOD/novelty metric in this library, higher score = more OOD. ID-positive metrics (aupr_in) handle the sign flip internally so you don't have to.
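
Concretely, if your detector emits an ID-confidence score (e.g. max softmax probability, where higher means more in-distribution), negate it before calling any metric here. A minimal sketch (names illustrative):

import numpy as np

probs = np.random.dirichlet(np.ones(5), size=1000)  # hypothetical softmax outputs
msp = probs.max(axis=1)   # max softmax probability: higher = more ID
scores = -msp             # flip the sign so higher = more OOD, per the convention above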

Install

pip install osr-metrics

Requires Python 3.10+, numpy, scikit-learn, scipy.

Development install

git clone https://github.com/hxtruong6/osr-metrics.git
cd osr-metrics
pip install -e ".[dev]"

Quick start

import numpy as np
from osr_metrics import auroc, fpr_at_95tpr, compute_aoscr, expected_calibration_error

# OOD detection
scores = np.random.randn(1000)          # higher = more OOD
labels = np.random.randint(0, 2, 1000)  # 1 = OOD, 0 = ID
print("AUROC:", auroc(scores, labels))
print("FPR@95TPR:", fpr_at_95tpr(scores, labels))

# Open-Set Classification Rate (joint classify+reject)
cls_pred = np.random.randint(0, 5, 1000)
cls_true = np.random.randint(0, 5, 1000)
print("AOSCR:", compute_aoscr(scores, labels, cls_pred, cls_true))

# Calibration
probs = np.random.uniform(0, 1, (1000, 14))
multi_labels = (np.random.uniform(0, 1, (1000, 14)) < probs).astype(int)
print("ECE:", expected_calibration_error(probs, multi_labels))

Statistical comparison

from osr_metrics import delong_test, bootstrap_ci, auroc

# Pairwise AUROC comparison (DeLong 1988); scores_method_a / scores_method_b
# are per-sample OOD scores from two methods on the same labeled test set
z, p = delong_test(scores_method_a, scores_method_b, labels)
print(f"DeLong z={z:.3f}, p={p:.4f}")

# Bootstrap CI (use stratify=True for imbalanced data)
lo, mean, hi = bootstrap_ci(scores, labels, auroc, n_bootstrap=1000, stratify=True)
print(f"AUROC = {mean:.4f}  95% CI = [{lo:.4f}, {hi:.4f}]")

Four-class OSR partitioning

For multi-label problems with held-out labels (chest X-ray OSR style):

from osr_metrics import build_fourclass_masks, compute_fourclass_metrics

label_names = ["A", "B", "C", "D"]
held_out = ["C", "D"]
# scores: higher = more OOD; label_vecs: (N, len(label_names)) binary label matrix
metrics = compute_fourclass_metrics(scores, label_vecs, label_names, held_out)
# Returns: auroc_full, fpr95_full, auroc_pure, auroc_mixed,
#          auroc_mixed_vs_id_disease, auroc_nf_vs_pure,
#          auroc_disease_only, counts...

Partitions images into four mutually exclusive classes:

  • id_disease — only known labels
  • no_finding — all-zero label vector
  • pure_ood — only held-out labels
  • mixed_ood — both known + held-out labels
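
A hedged numpy illustration of that partition logic; this mirrors what the four masks mean, not the library's internals, and label_vecs is an (N, L) binary label matrix:

import numpy as np

label_names = ["A", "B", "C", "D"]
held_out = ["C", "D"]
label_vecs = np.random.randint(0, 2, (1000, len(label_names)))  # illustrative labels

held = np.array([name in held_out for name in label_names])
has_known = label_vecs[:, ~held].any(axis=1)   # any known (ID) label present
has_held = label_vecs[:, held].any(axis=1)     # any held-out (OOD) label present

no_finding = ~has_known & ~has_held   # all-zero label vector
id_disease = has_known & ~has_held    # only known labels
pure_ood = ~has_known & has_held      # only held-out labels
mixed_ood = has_known & has_held      # both known + held-out labels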

Five AUROC pairings answer different questions:

| Key | Negatives | Positives | What it asks |
|---|---|---|---|
| auroc_pure | ID-disease + NF | Pure OOD | Upper-bound separability |
| auroc_mixed | ID-disease + NF | Mixed OOD | Mixed-OOD detection difficulty |
| auroc_mixed_vs_id_disease | ID-disease only | Mixed OOD | Near-OOD sensitivity (NF removed) |
| auroc_nf_vs_pure | NF only | Pure OOD | Diagnostic floor: healthy-vs-anything |
| auroc_full | ID-disease + NF | Pure + Mixed OOD | Full-population measurement |

Why another metrics library?

Most OOD/OSR libraries (PyTorch-OOD, OpenOOD) couple metrics with detection methods, datasets, and a heavy framework. osr-metrics is just the metrics — useful when you want to compute AOSCR or DeLong on cached scores from any pipeline, regardless of how those scores were produced.
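
In practice that cached-scores workflow is just numpy I/O. A sketch under assumed file names (the .npy paths are hypothetical):

import numpy as np
from osr_metrics import auroc, delong_test

scores_a = np.load("cached/method_a_scores.npy")  # per-sample OOD scores, method A
scores_b = np.load("cached/method_b_scores.npy")  # per-sample OOD scores, method B
labels = np.load("cached/ood_labels.npy")         # 1 = OOD, 0 = ID
print("AUROC A:", auroc(scores_a, labels))
print("DeLong:", delong_test(scores_a, scores_b, labels))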

Documentation

  • docs/USAGE.md — "which metric should I use?" decision tree.
  • docs/EXAMPLES.md — end-to-end runnable examples including the full publication metric panel, DeLong comparison, and seed aggregation.
  • CHANGELOG.md — version history.
  • CITATION.cff — citation metadata.

Testing

pytest tests/ -v

Each metric is verified against a first-principles brute-force reference; the test suite covers numerical equivalence, edge cases (empty class, single-value scores), and known properties (DeLong z=0 on identical inputs, ECE=0.9 on overconfident-wrong, etc.).
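
For intuition, a brute-force AUROC reference follows the pairwise-ranking definition: the probability that a random OOD sample scores above a random ID sample, with ties counted half. A sketch of that kind of check (not the test suite's actual code):

import numpy as np
from osr_metrics import auroc

def auroc_bruteforce(scores, labels):
    pos = scores[labels == 1]                     # OOD scores
    neg = scores[labels == 0]                     # ID scores
    wins = (pos[:, None] > neg[None, :]).mean()   # fraction of correctly ranked pairs
    ties = (pos[:, None] == neg[None, :]).mean()  # tied pairs count half
    return wins + 0.5 * ties

scores = np.random.randn(1000)
labels = np.random.randint(0, 2, 1000)
assert np.isclose(auroc(scores, labels), auroc_bruteforce(scores, labels))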

License

MIT.
