A strict experimental harness for reproducible, statistically valid model evaluation.

statbelt

statbelt is a strict experimental harness for reproducible, statistically aware model evaluation in Python.

Status: Alpha (APIs may evolve).
Supported Python: 3.11+.

Installation

Install from PyPI:

pip install statbelt

For local development:

uv sync --all-groups

Quick Start

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from statbelt import ExperimentalHarness

X, y = make_classification(n_samples=120, random_state=21)

report = (
    ExperimentalHarness()
    .data(X, y)
    .task("binary_classification")
    .compare(
        ("logreg", LogisticRegression(max_iter=500)),
        ("rf", RandomForestClassifier(n_estimators=25, random_state=21)),
    )
    .metrics("accuracy", "roc_auc", "log_loss")
    .design(cv=5, random_state=42)
    .inference(alpha=0.05, bootstrap_resamples=2000)
    .fasten("statbelt.lock.json")
    .evaluate()
)

print(report.summary())
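
The `.fasten("statbelt.lock.json")` step above writes a lock artifact holding the configuration and the split indices. The actual file schema is not documented here, so the following is only a minimal sketch of the idea, assuming a hypothetical `write_lock` helper and a made-up `{"config": ..., "splits": ...}` layout built with scikit-learn's `StratifiedKFold`. It is not statbelt's implementation.

```python
import json
import tempfile
from pathlib import Path

import numpy as np
from sklearn.model_selection import StratifiedKFold


def write_lock(path, config, X, y):
    """Hypothetical helper: persist the config and the exact fold indices as JSON."""
    cv = StratifiedKFold(
        n_splits=config["cv"], shuffle=True, random_state=config["random_state"]
    )
    # Store indices as plain lists so the file is readable without NumPy.
    splits = [
        {"train": train.tolist(), "test": test.tolist()}
        for train, test in cv.split(X, y)
    ]
    payload = {"config": config, "splits": splits}  # assumed layout, not the real schema
    Path(path).write_text(json.dumps(payload, indent=2, sort_keys=True))
    return payload


# Small synthetic dataset: 40 rows, perfectly balanced binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = np.array([0, 1] * 20)
config = {"task": "binary_classification", "cv": 5, "random_state": 42}

with tempfile.TemporaryDirectory() as tmp:
    lock_path = Path(tmp) / "example.lock.json"
    written = write_lock(lock_path, config, X, y)
    reloaded = json.loads(lock_path.read_text())

print(len(reloaded["splits"]), "folds recorded")
```

Recording the indices themselves, rather than only the seed, means a later run can check that it is evaluating on exactly the same rows even if the splitting library changes its shuffling behavior.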

Core Features

  • ExperimentalHarness builder-style API for binary classification comparisons.
  • Deterministic stratified k-fold evaluation with shared folds across models.
  • Bootstrap confidence intervals over fold-level metrics.
  • Lock artifact output (statbelt.lock.json) with config and split indices.
  • Strict staged workflow: configure -> fasten() -> evaluate().
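
To make the shared-fold and bootstrap features above concrete, here is a rough sketch using only scikit-learn and NumPy. The helpers `fold_scores` and `bootstrap_ci` are hypothetical names, and the percentile bootstrap over the mean of fold scores is an assumption about the method; statbelt's internals may differ.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier


def fold_scores(models, X, y, n_splits=5, random_state=42):
    """Hypothetical helper: score every model on the same stratified folds."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    splits = list(cv.split(X, y))  # materialize once so all models share folds
    scores = {name: [] for name, _ in models}
    for train_idx, test_idx in splits:
        for name, model in models:
            fitted = clone(model).fit(X[train_idx], y[train_idx])
            preds = fitted.predict(X[test_idx])
            scores[name].append(accuracy_score(y[test_idx], preds))
    return {name: np.array(vals) for name, vals in scores.items()}, splits


def bootstrap_ci(values, alpha=0.05, n_resamples=2000, random_state=42):
    """Hypothetical helper: percentile bootstrap CI for the mean fold score."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(random_state)
    # Each row is one resample (with replacement) of the fold-level scores.
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    means = values[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


X, y = make_classification(n_samples=120, random_state=21)
models = [
    ("logreg", LogisticRegression(max_iter=500)),
    ("tree", DecisionTreeClassifier(random_state=21)),
]
scores, splits = fold_scores(models, X, y)
for name, vals in scores.items():
    lo, hi = bootstrap_ci(vals)
    print(f"{name}: mean={vals.mean():.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

With only five folds the resampled means take few distinct values, so intervals built this way are coarse. That is a property of resampling fold-level scores, not of this particular sketch.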

Supported Task and Metrics

Supported task:

  • binary_classification

Supported metrics:

  • accuracy
  • precision
  • recall
  • f1
  • roc_auc
  • log_loss

Validation is fail-fast. For example:

  • log_loss requires predict_proba.
  • roc_auc requires predict_proba or decision_function.
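
The two rules above can be sketched as a pre-flight check that runs before any model is fitted. `REQUIRED_METHODS` and `check_metric_support` are hypothetical names used for illustration; how statbelt performs this validation, and at which stage, is not specified here.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Assumed mapping, derived only from the two rules listed above.
# A metric is supported if the estimator exposes at least one listed method.
REQUIRED_METHODS = {
    "log_loss": ("predict_proba",),
    "roc_auc": ("predict_proba", "decision_function"),
}


def check_metric_support(name, estimator, metrics):
    """Raise ValueError early if the estimator cannot produce a metric's inputs."""
    for metric in metrics:
        needed = REQUIRED_METHODS.get(metric, ())
        if needed and not any(hasattr(estimator, method) for method in needed):
            raise ValueError(
                f"Model {name!r} cannot be scored with {metric!r}: "
                f"it needs one of {', '.join(needed)}."
            )


# LogisticRegression exposes predict_proba, so both metrics pass.
check_metric_support("logreg", LogisticRegression(), ["log_loss", "roc_auc"])

# LinearSVC has decision_function but no predict_proba.
check_metric_support("svm", LinearSVC(), ["accuracy", "roc_auc"])
try:
    check_metric_support("svm", LinearSVC(), ["log_loss"])
except ValueError as exc:
    print(exc)
```

Checking capabilities up front avoids discovering an unsupported metric only after several folds have already been trained.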

Development

uv sync --all-groups
uv run ruff check .
uv run pytest

For release operations (tagging, TestPyPI gate, PyPI publish), see RELEASING.md.

Current Limits

  • Binary classification only.
  • Confidence intervals only (no pairwise hypothesis tests yet).
  • Python API is the only supported interface in v0.

License

This project is licensed under the GNU Affero General Public License, version 3 or later (AGPL-3.0-or-later). See LICENSE.

Download files

Source Distribution

statbelt-0.1.1.tar.gz (33.2 kB view details)

Uploaded Source

Built Distribution

statbelt-0.1.1-py3-none-any.whl (22.8 kB view details)

Uploaded Python 3

File details

Details for the file statbelt-0.1.1.tar.gz.

File metadata

  • Download URL: statbelt-0.1.1.tar.gz
  • Upload date:
  • Size: 33.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for statbelt-0.1.1.tar.gz
Algorithm Hash digest
SHA256 c6bfec45529744b5a60299cd710875840e71a12018c4357e0551fd1621d9e181
MD5 f8619234091c57266e23d1b564e0d5ae
BLAKE2b-256 ba05eb48a77f1c061ce2beb1584563ee442d7535c98f4861e4210d7658b548d7

File details

Details for the file statbelt-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: statbelt-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 22.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for statbelt-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 44045a800fbedaf4a72c16d742e89acb7f11ec5c459aa74db4b07ea93dad677e
MD5 5ef6a4c7cc77af71831d628963653c04
BLAKE2b-256 ca5d898ed1679baa280893a57c390d24d36ec5d05e1d8ed34b2fa339f5ee27da
