Quantify uncertainty around classification performance metrics

Classifier Uncertainty

License: Unlicense

About

This package implements the methods of Tötsch and Hoffmann (2021) for quantifying the uncertainty around classification performance metrics. Classifiers are often tested on relatively small data sets, so the resulting performance metrics are uncertain. Even when classifiers are tested on large data sets, performance is often reported as a percentage with three decimals, and competing classifiers are ranked as if that precision were real. Reducing metric uncertainty below 0.001% would require tens of billions of data points.
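The sample-size claim can be sanity-checked with a back-of-the-envelope calculation. The sketch below is not the paper's method: it assumes a normal approximation to a binomial proportion, a 95% interval, and the worst case p = 0.5. The function name is hypothetical.

```python
import math

def samples_for_interval_width(width, p=0.5, z=1.96):
    """Data points needed so that a normal-approximation interval for a
    proportion p has the given total width (width and p are fractions)."""
    half_width = width / 2
    return math.ceil(p * (1 - p) * (z / half_width) ** 2)

# 0.001% expressed as a fraction is 1e-5.
n = samples_for_interval_width(1e-5)
print(f"{n:.2e}")  # 3.84e+10, i.e. tens of billions
```

Metrics close to 0 or 1 need fewer points than this worst case, but the order of magnitude stays far beyond typical test sets.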

The original authors' Python implementation is available at niklastoe/classifier_metric_uncertainty. This package was built independently and extends that work with:

  • Score-based input — accepts raw (y_true, y_score) pairs and sweeps thresholds; the original takes confusion matrix counts only
  • ROC and PR curves with uncertainty bands — including AUC posterior distributions
  • Economic value analysis — Value Score (Wilks 2001) and mean expense posteriors
  • Custom metrics — evaluate any f(tp, fn, tn, fp) over the posterior CM samples
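To illustrate what evaluating a custom f(tp, fn, tn, fp) over posterior confusion matrix samples can look like, here is a minimal standard-library sketch. It does not use the package's API. It assumes a uniform Dirichlet(1, 1, 1, 1) prior over the four cells, which may differ from the prior the package uses, and the helper names are hypothetical.

```python
import random

def sample_cm_posterior(tp, fn, tn, fp, n_samples=10_000, seed=0):
    """Draw confusion-matrix proportions from a Dirichlet posterior.

    Assumes a uniform Dirichlet(1, 1, 1, 1) prior (illustrative choice).
    """
    rng = random.Random(seed)
    alphas = (tp + 1, fn + 1, tn + 1, fp + 1)
    samples = []
    for _ in range(n_samples):
        # A Dirichlet draw is a vector of Gamma draws divided by their sum.
        gammas = [rng.gammavariate(a, 1.0) for a in alphas]
        total = sum(gammas)
        samples.append(tuple(g / total for g in gammas))
    return samples

def f1(tp, fn, tn, fp):
    """Example custom metric with the f(tp, fn, tn, fp) signature."""
    return 2 * tp / (2 * tp + fp + fn)

cm_samples = sample_cm_posterior(tp=26, fn=0, tn=6, fp=2)
f1_samples = sorted(f1(*cm) for cm in cm_samples)

# Equal-tailed 95% interval read off the sorted samples.
lower = f1_samples[int(0.025 * len(f1_samples))]
upper = f1_samples[int(0.975 * len(f1_samples)) - 1]
print(f"F1 95% interval: [{lower:.3f}, {upper:.3f}]")
```

Because F1 is a ratio, it gives the same value for proportions as for counts, so the sampled proportions can be passed to the metric directly.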

Installation

pip install classifier-uncertainty

Quick start

from classifier_uncertainty import BinaryClassifier

# From ground-truth labels and classifier scores
# (y_true and y_score are your own data; they are not defined here)
bc = BinaryClassifier(y_true, y_score)

# Or from published confusion matrix counts (e.g. from a paper)
bc = BinaryClassifier.from_cm(tp=26, fn=0, tn=6, fp=2)

# Fix the binarization threshold
t = bc.at_threshold(0.5)
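For readers unfamiliar with thresholding, the following sketch shows how scores turn into confusion matrix counts at a fixed threshold, and how those counts shift as the threshold is swept. It is independent of the package. The data are made up, the function name is hypothetical, and treating a score equal to the threshold as positive is an assumption.

```python
def confusion_counts(y_true, y_score, threshold=0.5):
    """Count tp, fn, tn, fp when scores >= threshold are predicted positive."""
    counts = {"tp": 0, "fn": 0, "tn": 0, "fp": 0}
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if label and predicted_positive:
            counts["tp"] += 1
        elif label:
            counts["fn"] += 1
        elif predicted_positive:
            counts["fp"] += 1
        else:
            counts["tn"] += 1
    return counts

# Hypothetical labels and scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

print(confusion_counts(y_true, y_score, 0.5))
# {'tp': 3, 'fn': 1, 'tn': 3, 'fp': 1}

# Sweeping the threshold trades false negatives against false positives.
for threshold in (0.25, 0.5, 0.75):
    print(threshold, confusion_counts(y_true, y_score, threshold))
```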

What questions can this answer?

How well is a classifier likely to perform on a new, similar dataset?

t.tpr().point_estimate, t.tpr().credible_interval()

How will performance change if prevalence changes?

t.precision().point_estimate  # at observed prevalence
t.at_prevalence(0.05).precision().point_estimate  # projected to production
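Precision depends on prevalence even when the classifier itself is unchanged. The sketch below shows the relationship for point estimates only, assuming that TPR and FPR stay the same when prevalence shifts. It illustrates the arithmetic and is not a description of how the package computes its projection; the function name is hypothetical.

```python
def precision_at_prevalence(tpr, fpr, prevalence):
    """Precision implied by fixed TPR and FPR at a given prevalence."""
    true_pos = tpr * prevalence          # P(predicted positive, actually positive)
    false_pos = fpr * (1 - prevalence)   # P(predicted positive, actually negative)
    return true_pos / (true_pos + false_pos)

# Counts from the Quick start example.
tp, fn, tn, fp = 26, 0, 6, 2
tpr = tp / (tp + fn)
fpr = fp / (fp + tn)
observed_prevalence = (tp + fn) / (tp + fn + tn + fp)

print(round(precision_at_prevalence(tpr, fpr, observed_prevalence), 3))  # 0.929
print(round(precision_at_prevalence(tpr, fpr, 0.05), 3))                 # 0.174
```

With these counts, a precision of about 93% on the test set drops to about 17% when only 5% of cases are positive.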

How likely is classifier A better than classifier B on a given metric?

(bc_a.at_threshold().tpr().samples > bc_b.at_threshold().tpr().samples).mean()
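The comparison above is an elementwise test over posterior samples. A standard-library sketch of the same idea follows, assuming a uniform Beta(1, 1) prior on TPR and two classifiers evaluated on independent test sets. The counts and the helper name are hypothetical.

```python
import random

def tpr_posterior_samples(tp, fn, n_samples=20_000, seed=0):
    """Sample TPR from a Beta(tp + 1, fn + 1) posterior (uniform prior assumed)."""
    rng = random.Random(seed)
    return [rng.betavariate(tp + 1, fn + 1) for _ in range(n_samples)]

# Hypothetical results: A finds 45 of 50 positives, B finds 40 of 50.
samples_a = tpr_posterior_samples(tp=45, fn=5, seed=1)
samples_b = tpr_posterior_samples(tp=40, fn=10, seed=2)

# Fraction of paired draws in which A beats B.
prob_a_better = sum(a > b for a, b in zip(samples_a, samples_b)) / len(samples_a)
print(f"P(TPR_A > TPR_B) = {prob_a_better:.2f}")
```

A gap of ten percentage points on 50 positives each leaves a noticeable chance that B is in fact the better classifier.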

How likely is this model more cost-effective than business-as-usual?

(t_model.mean_expense(C, L).samples < t_bau.mean_expense(C, L).samples).mean()
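For context, here is a sketch of the cost-loss model that underlies mean expense and the Value Score, written for point estimates. It assumes the standard formulation in which acting on a positive prediction costs C and a missed positive incurs loss L; the package's exact definitions may differ, and the function names are hypothetical.

```python
def mean_expense(tp, fn, tn, fp, cost, loss):
    """Average expense per case: pay `cost` for every predicted positive,
    incur `loss` for every missed positive."""
    n = tp + fn + tn + fp
    return (cost * (tp + fp) + loss * fn) / n

def value_score(tp, fn, tn, fp, cost, loss):
    """Saving relative to the best no-model policy, scaled so that a
    perfect classifier scores 1 (undefined when baseline equals perfect)."""
    n = tp + fn + tn + fp
    prevalence = (tp + fn) / n
    e_model = mean_expense(tp, fn, tn, fp, cost, loss)
    e_baseline = min(cost, prevalence * loss)  # always act vs. never act
    e_perfect = prevalence * cost              # act only on true positives
    return (e_baseline - e_model) / (e_baseline - e_perfect)

# Quick start counts with hypothetical cost C = 1 and loss L = 5.
print(round(mean_expense(26, 0, 6, 2, cost=1, loss=5), 4))  # 0.8235
print(round(value_score(26, 0, 6, 2, cost=1, loss=5), 4))   # 0.75
```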

Does this classifier meet my minimum recall requirement?

(t.tpr().samples > 0.8).mean()

Do precision and recall meet requirements simultaneously?

((t.tpr().samples > 0.8) & (t.precision().samples > 0.8)).mean()
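The joint check works because both conditions are evaluated on the same posterior draw: recall and precision share the true-positive count, so they are not independent. The sketch below makes that concrete with a standard-library sampler, assuming a uniform Dirichlet prior over the four cells. It is not the package's implementation and the helper name is hypothetical.

```python
import random

rng = random.Random(0)
# Posterior parameters for (tp, fn, tn, fp) = (26, 0, 6, 2), uniform prior assumed.
alphas = (26 + 1, 0 + 1, 6 + 1, 2 + 1)

def draw_cm():
    """One Dirichlet draw of cell proportions via normalized Gamma draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

draws = [draw_cm() for _ in range(20_000)]
recall_ok = [tp / (tp + fn) > 0.8 for tp, fn, tn, fp in draws]
precision_ok = [tp / (tp + fp) > 0.8 for tp, fn, tn, fp in draws]

p_recall = sum(recall_ok) / len(draws)
p_precision = sum(precision_ok) / len(draws)
# Both requirements must hold within the same draw.
p_joint = sum(r and p for r, p in zip(recall_ok, precision_ok)) / len(draws)

print(f"recall: {p_recall:.2f}, precision: {p_precision:.2f}, both: {p_joint:.2f}")
```

The joint probability can never exceed either marginal, and it generally differs from their product.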

Is this classifier better than random guessing?

(t.bookmaker_informedness().samples > 0).mean()

Should I trust this published result?

BinaryClassifier.from_cm(tp=26, fn=0, tn=6, fp=2).at_threshold().tpr().credible_interval()
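To see why a perfect observed recall deserves caution, the interval can be worked out by hand in the special case fn = 0. The sketch assumes a uniform Beta(1, 1) prior, so the posterior is Beta(tp + 1, 1) with CDF x ** (tp + 1). The package's prior and interval type may differ, and the function name is hypothetical.

```python
def tpr_interval_no_misses(tp, mass=0.95):
    """Equal-tailed credible interval for TPR when fn == 0.

    With a uniform prior the posterior is Beta(tp + 1, 1), whose CDF is
    x ** (tp + 1), so quantiles have the closed form q ** (1 / (tp + 1)).
    """
    tail = (1 - mass) / 2
    exponent = 1 / (tp + 1)
    return tail ** exponent, (1 - tail) ** exponent

lower, upper = tpr_interval_no_misses(tp=26)
print(f"[{lower:.4f}, {upper:.4f}]")  # [0.8723, 0.9991]
```

Under these assumptions, 26 hits out of 26 is still compatible with a true recall of about 87%.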

For Developers

Setup

uv sync  # install package + dev dependencies into .venv

Development workflow

All changes should be made on a branch and merged via pull request — do not commit directly to main.

git checkout -b feat/my-feature   # or fix/, docs/, refactor/, etc.

# ... make changes ...

make format      # auto-fix formatting and lint violations
make check       # lint, type-check, and verify docstring coverage
make test        # run tests with coverage (90% minimum)
make docs-serve  # preview docs locally at http://127.0.0.1:8000

git push -u origin feat/my-feature
# open a pull request on GitHub

CI runs make check and make test automatically on every push and pull request. A PR cannot be merged if CI fails.

What triggers what

Action                         CI checks   Docs deployed   Package published
Push to any branch / open PR   yes
Merge to main                  yes         yes
Push a v* tag                                              yes

Docs-only change (e.g. fix a typo in docs/ or a docstring): open a PR and merge to main — docs redeploy automatically, no tag needed.

Code-only change (e.g. bug fix): merge to main, then tag when ready to publish (see below). Docs will also redeploy on merge, reflecting any updated docstrings.

Publishing a new package version

  1. Bump the version in pyproject.toml:
    make patch   # 0.1.0 → 0.1.1  (bug fixes)
    make minor   # 0.1.0 → 0.2.0  (new features)
    make major   # 0.1.0 → 1.0.0  (breaking changes)
    
  2. Commit, tag, and push:
    git add pyproject.toml
    git commit -m "chore: bump version to v0.x.x"
    git tag v0.x.x
    git push && git push --tags
    

Pushing the tag triggers the publish workflow, which runs the test suite and publishes the package to PyPI. Afterwards, check that the release has appeared on PyPI.

