Quantify uncertainty around classification performance metrics

These details have been verified by PyPI

Maintainers

hrmerrill

Project description

Classifier Uncertainty

pypi coverage docstring coverage Unlicense

About

This package implements methods from Tötsch and Hoffmann (2021) to quantify the uncertainty around classification performance metrics. Classifiers are often tested on relatively small data sets, so the resulting performance metrics are themselves uncertain. Even when classifiers are tested on large data sets, performance is often reported as a percentage with three decimals, and competing classifiers are ranked as if that precision were real. Reducing metric uncertainty below 0.001% would require tens of billions of data points.
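The order of magnitude of that last figure can be checked with a back-of-the-envelope calculation. The sketch below is not part of the package: it assumes "uncertainty" means the full width of a 95% normal-approximation interval around a proportion, takes the worst case p = 0.5, and uses a hypothetical helper name.

```python
import math


def samples_for_interval_width(width, p=0.5, z=1.96):
    """Approximate n such that p +/- z*sqrt(p*(1-p)/n) has the given full width.

    Assumption: normal approximation to a binomial proportion.
    """
    half_width = width / 2.0
    return math.ceil(z * z * p * (1.0 - p) / (half_width * half_width))


# 0.001% expressed as a fraction is 1e-5
n_required = samples_for_interval_width(1e-5)
print(f"{n_required:.2e} data points")
```

Under these assumptions the answer is a few times 10^10, which matches "tens of billions". A proportion far from 0.5 needs fewer points, so this is an upper bound for the approximation used.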

The original authors' Python implementation is available at niklastoe/classifier_metric_uncertainty. This package was built independently and extends that work with:

Score-based input — accepts raw (y_true, y_score) pairs and sweeps thresholds; the original takes confusion matrix counts only
ROC and PR curves with uncertainty bands — including AUC posterior distributions
Economic value analysis — Value Score (Wilks 2001) and mean expense posteriors
Custom metrics — evaluate any f(tp, fn, tn, fp) over the posterior CM samples
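To make the idea of evaluating a metric over posterior confusion-matrix samples concrete, here is a minimal standard-library sketch. It does not use the package and may differ from its internals: it assumes a Dirichlet posterior over the four cell probabilities with a uniform prior, and every function name in it is hypothetical.

```python
import random


def sample_confusion_matrices(tp, fn, tn, fp, n_samples=5000, seed=0):
    """Draw (tp, fn, tn, fp) cell probabilities from a Dirichlet posterior.

    Assumption: uniform Dirichlet(1, 1, 1, 1) prior, so the posterior
    concentration parameters are the observed counts plus one.
    """
    rng = random.Random(seed)
    alphas = [tp + 1, fn + 1, tn + 1, fp + 1]
    samples = []
    for _ in range(n_samples):
        # Normalized independent Gamma draws are Dirichlet distributed.
        gammas = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
        total = sum(gammas)
        samples.append(tuple(g / total for g in gammas))
    return samples


def f1(tp, fn, tn, fp):
    """Example custom metric; it works on counts or proportions alike."""
    return 2 * tp / (2 * tp + fp + fn)


def credible_interval(values, mass=0.95):
    """Equal-tailed interval read off the sorted samples."""
    ordered = sorted(values)
    lo_idx = int((1 - mass) / 2 * (len(ordered) - 1))
    hi_idx = int((1 + mass) / 2 * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]


cm_samples = sample_confusion_matrices(tp=26, fn=0, tn=6, fp=2)
f1_samples = [f1(*cm) for cm in cm_samples]
low, high = credible_interval(f1_samples)
print(f"F1 95% interval: [{low:.3f}, {high:.3f}]")
```

Any function with the f(tp, fn, tn, fp) signature can be swapped in for `f1`; the interval then reflects how much the small test set constrains that metric.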

Installation

pip install classifier-uncertainty

Quick start

from classifier_uncertainty import BinaryClassifier

# From ground-truth labels and classifier scores
bc = BinaryClassifier(y_true, y_score)

# Or from published confusion matrix counts (e.g. from a paper)
bc = BinaryClassifier.from_cm(tp=26, fn=0, tn=6, fp=2)

# Fix the binarization threshold
t = bc.at_threshold(0.5)

What questions can this answer?

How well is a classifier likely to perform on a new, similar dataset?

t.tpr().point_estimate, t.tpr().credible_interval()

How will performance change if prevalence changes?

t.precision().point_estimate  # at observed prevalence
t.at_prevalence(0.05).precision().point_estimate  # projected to production
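The reason precision moves with prevalence can be seen from Bayes' rule: TPR and FPR are properties of the classifier, while precision also depends on the class balance. The sketch below is a plain point-estimate calculation under that assumption, not the package's `at_prevalence` implementation, and the helper name is hypothetical.

```python
def precision_at_prevalence(tpr, fpr, prevalence):
    """Precision implied by fixed TPR and FPR at a given prevalence."""
    true_positive_rate_mass = tpr * prevalence
    false_positive_rate_mass = fpr * (1.0 - prevalence)
    return true_positive_rate_mass / (true_positive_rate_mass + false_positive_rate_mass)


# Confusion matrix counts from the quick start example
tp, fn, tn, fp = 26, 0, 6, 2
tpr = tp / (tp + fn)
fpr = fp / (fp + tn)
observed_prevalence = (tp + fn) / (tp + fn + tn + fp)

print(round(precision_at_prevalence(tpr, fpr, observed_prevalence), 3))  # 0.929
print(round(precision_at_prevalence(tpr, fpr, 0.05), 3))  # 0.174
```

With these counts, the same classifier drops from about 93% precision at the observed prevalence (76%) to about 17% at a prevalence of 5%.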

How likely is classifier A better than classifier B on a given metric?

(bc_a.at_threshold().tpr().samples > bc_b.at_threshold().tpr().samples).mean()
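The comparison above is a Monte Carlo estimate: the fraction of paired posterior draws in which A beats B. A standard-library sketch of the same idea follows. It assumes independent Beta posteriors on each classifier's TPR with a uniform prior, and the counts for A and B are made-up illustrative inputs, not results from any real classifier.

```python
import random


def tpr_posterior_samples(tp, fn, n_samples=10000, seed=0):
    """Sample TPR from Beta(tp + 1, fn + 1) (uniform prior assumed)."""
    rng = random.Random(seed)
    return [rng.betavariate(tp + 1, fn + 1) for _ in range(n_samples)]


# Hypothetical counts: A finds 45 of 50 positives, B finds 40 of 50
samples_a = tpr_posterior_samples(tp=45, fn=5, seed=1)
samples_b = tpr_posterior_samples(tp=40, fn=10, seed=2)

# Fraction of paired draws in which A's TPR exceeds B's
prob_a_better = sum(a > b for a, b in zip(samples_a, samples_b)) / len(samples_a)
print(f"P(TPR_A > TPR_B) is about {prob_a_better:.2f}")
```

Even a ten-point gap in observed recall leaves a noticeable chance that B is actually the better classifier when each test set has only 50 positives.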

How likely is this model more cost-effective than business-as-usual?

(t_model.mean_expense(C, L).samples < t_bau.mean_expense(C, L).samples).mean()
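For readers unfamiliar with the cost-loss setting, the sketch below shows one common formulation. It assumes that acting on a predicted positive costs C, that a missed positive incurs a loss L, and that the Value Score compares the model against the cheaper of "always act" and "never act". The package's own `mean_expense` and Value Score definitions may differ; the function names and the values C = 1, L = 5 are illustrative assumptions.

```python
def mean_expense(tp, fn, tn, fp, cost, loss):
    """Average expense per case: pay `cost` on every alarm, `loss` on every miss."""
    n = tp + fn + tn + fp
    return (cost * (tp + fp) + loss * fn) / n


def value_score(tp, fn, tn, fp, cost, loss):
    """(baseline - model) / (baseline - perfect), in the style of Wilks (2001)."""
    n = tp + fn + tn + fp
    prevalence = (tp + fn) / n
    expense_model = mean_expense(tp, fn, tn, fp, cost, loss)
    expense_baseline = min(cost, loss * prevalence)  # always act vs. never act
    expense_perfect = cost * prevalence  # act only on true positives
    return (expense_baseline - expense_model) / (expense_baseline - expense_perfect)


# Point estimates for the quick start confusion matrix, with assumed C=1, L=5
print(round(mean_expense(26, 0, 6, 2, cost=1, loss=5), 3))
print(round(value_score(26, 0, 6, 2, cost=1, loss=5), 3))
```

Applying the same functions to each posterior confusion-matrix sample, instead of to the observed counts, gives the kind of expense distribution that the one-line comparison above operates on.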

Does this classifier meet my minimum recall requirement?

(t.tpr().samples > 0.8).mean()

Do precision and recall meet requirements simultaneously?

((t.tpr().samples > 0.8) & (t.precision().samples > 0.8)).mean()

Is this classifier better than random guessing?

(t.bookmaker_informedness().samples > 0).mean()

Should I trust this published result?

BinaryClassifier.from_cm(tp=26, fn=0, tn=6, fp=2).at_threshold().tpr().credible_interval()
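For this particular example the interval can be worked out by hand, which is a useful sanity check. The sketch assumes a uniform Beta(1, 1) prior on TPR, which may not be the package's default. With zero false negatives the posterior is Beta(tp + 1, 1), whose CDF is x^(tp + 1), so the quantiles have a closed form. The helper name is hypothetical.

```python
def tpr_interval_no_misses(tp, mass=0.95):
    """Equal-tailed credible interval for TPR when fn == 0.

    Assumption: uniform prior, so the posterior is Beta(tp + 1, 1)
    with CDF x ** (tp + 1); inverting it gives q ** (1 / (tp + 1)).
    """
    tail = (1.0 - mass) / 2.0
    exponent = 1.0 / (tp + 1)
    return tail ** exponent, (1.0 - tail) ** exponent


low, high = tpr_interval_no_misses(tp=26)
print(f"95% interval for TPR: {low:.3f} to {high:.3f}")
```

Under that prior, an observed recall of 26/26 is compatible with a true recall as low as roughly 0.87, which is the kind of caveat a headline "100% sensitivity" tends to hide.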

For Developers

Setup

uv sync  # install package + dev dependencies into .venv

Development workflow

All changes should be made on a branch and merged via pull request — do not commit directly to main.

git checkout -b feat/my-feature   # or fix/, docs/, refactor/, etc.

# ... make changes ...

make format      # auto-fix formatting and lint violations
make check       # lint, type-check, and verify docstring coverage
make test        # run tests with coverage (90% minimum)
make docs-serve  # preview docs locally at http://127.0.0.1:8000

git push -u origin feat/my-feature
# open a pull request on GitHub

CI runs make check and make test automatically on every push and pull request. A PR cannot be merged if CI fails.

What triggers what

| Action | CI checks | Docs deployed | Package published |
|---|---|---|---|
| Push to any branch / open PR | ✓ | | |
| Merge to `main` | ✓ | ✓ | |
| Push a `v*` tag | | | ✓ |

Docs-only change (e.g. fix a typo in docs/ or a docstring): open a PR and merge to main — docs redeploy automatically, no tag needed.

Code-only change (e.g. bug fix): merge to main, then tag when ready to publish (see below). Docs will also redeploy on merge, reflecting any updated docstrings.

Publishing a new package version

Bump the version in pyproject.toml:

make patch   # 0.1.0 → 0.1.1  (bug fixes)
make minor   # 0.1.0 → 0.2.0  (new features)
make major   # 0.1.0 → 1.0.0  (breaking changes)
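The three targets follow the usual semantic-versioning rule: bumping one component resets every component to its right. What the make targets actually run is not shown here, so the following is only an illustrative sketch of that rule with a hypothetical function name.

```python
def bump_version(version, part):
    """Return the next version string for a 'major', 'minor', or 'patch' bump."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump_version("0.1.0", "patch"))  # 0.1.1
print(bump_version("0.1.0", "minor"))  # 0.2.0
print(bump_version("0.1.0", "major"))  # 1.0.0
```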

Commit, tag, and push:

git add pyproject.toml
git commit -m "chore: bump version to v0.x.x"
git tag v0.x.x
git push && git push --tags

Pushing the tag triggers the publish workflow, which runs the test suite and publishes the package to PyPI. Afterwards, check that the new release appears on PyPI.

Release history

0.2.0 (this version): Jun 24, 2026
0.1.0: Jun 24, 2026

Download files

Source Distribution

classifier_uncertainty-0.2.0.tar.gz (99.7 kB)

Uploaded Jun 24, 2026 Source

Built Distribution

classifier_uncertainty-0.2.0-py3-none-any.whl (14.1 kB)

Uploaded Jun 24, 2026 Python 3

File details

Details for the file classifier_uncertainty-0.2.0.tar.gz.

File metadata

Download URL: classifier_uncertainty-0.2.0.tar.gz
Upload date: Jun 24, 2026
Size: 99.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.24 (uv publish, run in CI on Ubuntu 24.04)

File hashes

Hashes for classifier_uncertainty-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`d4160ed4698973db8ca9fb790a3e14403a3a3d7cf55b5c83ff93d947e700fcfd`
MD5	`64bd66cc85615e7150b71258165dc9d5`
BLAKE2b-256	`014ad58776ecd81aadbc6629303058d02cb162d2447f353c8a6f75a81c02c0e6`

File details

Details for the file classifier_uncertainty-0.2.0-py3-none-any.whl.

File metadata

Download URL: classifier_uncertainty-0.2.0-py3-none-any.whl
Upload date: Jun 24, 2026
Size: 14.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.24 (uv publish, run in CI on Ubuntu 24.04)

File hashes

Hashes for classifier_uncertainty-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d0a7478e3e6c0ace40e6ed39601bc557ca45a9ce1c9af37a5a2e8747a9745a58`
MD5	`4ed835136155c4fa6514c976fea26f45`
BLAKE2b-256	`2ab7eb4e1759d534e7e7329df6c892b1f9c0d219c94e171e0e391c14bcc7f272`
