
A strict experimental harness for reproducible, statistically valid model evaluation.


statbelt

statbelt is a Python package for reproducible, statistically aware model evaluation.

Current release status: Alpha.

Supported Python versions: 3.11+.

v0 Features

  • ExperimentalHarness builder-style API for binary classification.
  • Deterministic stratified k-fold evaluation with shared folds across models.
  • Bootstrap confidence intervals over fold-level metrics.
  • Lock artifact output (statbelt.lock.json) containing config and split indices.
  • Strict staged workflow: configure -> fasten() -> evaluate().
  • CLI smoke entry point (statbelt) for environment checks.
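The shared-fold and bootstrap ideas above can be illustrated with a small standard-library sketch. This is not statbelt's implementation: the helpers stratified_folds and bootstrap_ci, the synthetic labels, and the two toy "models" (plain functions from a sample index to a predicted label) are all hypothetical stand-ins chosen so the example runs without extra dependencies.

```python
import random
import statistics
from collections import defaultdict


def stratified_folds(labels, k, seed):
    """Return k sorted lists of test indices with classes spread evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    dealt = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)  # seeded, so the assignment is reproducible
        # Deal indices round-robin so each fold gets a near-equal class share.
        for index in members:
            folds[dealt % k].append(index)
            dealt += 1
    return [sorted(fold) for fold in folds]


def bootstrap_ci(scores, alpha=0.05, n_resamples=2000, seed=0):
    """Percentile bootstrap interval for the mean of fold-level scores."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(scores, k=len(scores)))
        for _ in range(n_resamples)
    )
    low = means[int((alpha / 2) * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


# Synthetic binary labels: every third sample is positive.
labels = [1 if i % 3 == 0 else 0 for i in range(120)]

# Build the folds once and reuse them for every model under comparison.
folds = stratified_folds(labels, k=5, seed=42)

# Toy stand-ins for fitted models (hypothetical, for illustration only).
models = {
    "always_zero": lambda i: 0,
    "every_other": lambda i: i % 2,
}

# Fold-level accuracy for each model on the same test folds.
fold_scores = {
    name: [
        sum(predict(i) == labels[i] for i in fold) / len(fold)
        for fold in folds
    ]
    for name, predict in models.items()
}

intervals = {name: bootstrap_ci(scores) for name, scores in fold_scores.items()}
for name, (low, high) in intervals.items():
    mean = statistics.fmean(fold_scores[name])
    print(f"{name}: mean={mean:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

Because every model is scored on identical folds, differences in fold-level scores reflect the models rather than the splits. The interval here resamples the k fold scores with replacement, so with small k it is coarse and should be read as a rough indication of spread.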

Supported v0 Task

  • binary_classification

Supported v0 Metrics

  • accuracy
  • precision
  • recall
  • f1
  • roc_auc
  • log_loss

Validation is fail-fast. For example, log_loss requires predict_proba, and roc_auc requires predict_proba or decision_function.
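A fail-fast check of this kind might look like the sketch below. It is an assumption about the general approach, not statbelt's actual code: REQUIRED_METHODS, validate_metrics, and MarginOnlyModel are hypothetical names introduced for illustration.

```python
# Hypothetical mapping from metric name to the estimator methods that can
# supply the scores it needs; any one of the listed methods is sufficient.
REQUIRED_METHODS = {
    "log_loss": ("predict_proba",),
    "roc_auc": ("predict_proba", "decision_function"),
}


def validate_metrics(model_name, estimator, metrics):
    """Raise ValueError before any fitting if a metric cannot be computed."""
    for metric in metrics:
        accepted = REQUIRED_METHODS.get(metric, ("predict",))
        if not any(callable(getattr(estimator, m, None)) for m in accepted):
            raise ValueError(
                f"metric {metric!r} needs one of {accepted} "
                f"but model {model_name!r} provides none of them"
            )


class MarginOnlyModel:
    """Toy estimator exposing decision_function but not predict_proba."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0 for _ in X]

    def decision_function(self, X):
        return [0.0 for _ in X]


# roc_auc is acceptable because decision_function is available.
validate_metrics("margin", MarginOnlyModel(), ["accuracy", "roc_auc"])

# log_loss is rejected up front because predict_proba is missing.
try:
    validate_metrics("margin", MarginOnlyModel(), ["log_loss"])
except ValueError as exc:
    print(exc)
```

Checking capabilities before any fold is fitted means a misconfigured comparison fails immediately instead of partway through a long evaluation.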

Installation

Install from PyPI:

pip install statbelt

For local development:

uv sync --all-groups

Quick Start

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from statbelt import ExperimentalHarness

X, y = make_classification(n_samples=120, random_state=21)

report = (
    ExperimentalHarness()
    .data(X, y)
    .task("binary_classification")
    .compare(
        ("logreg", LogisticRegression(max_iter=500)),
        ("rf", RandomForestClassifier(n_estimators=25, random_state=21)),
    )
    .metrics("accuracy", "roc_auc", "log_loss")
    .design(cv=5, random_state=42)
    .inference(alpha=0.05, bootstrap_resamples=2000)
    .fasten("statbelt.lock.json")
    .evaluate()
)

print(report.summary())
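After the Quick Start has run, the lock file written by fasten() can be inspected with the standard library. The lock file's schema is not documented here, so this sketch only assumes the file is valid JSON and lists whatever top-level keys it finds; describe_lock is a hypothetical helper, not part of statbelt.

```python
import json
from pathlib import Path


def describe_lock(path):
    """Return {top-level key: Python type name of its value} for a JSON file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        # The root is not a JSON object; report its type instead.
        return {"<root>": type(data).__name__}
    return {key: type(value).__name__ for key, value in data.items()}


lock_path = Path("statbelt.lock.json")
if lock_path.exists():
    for key, kind in describe_lock(lock_path).items():
        print(f"{key}: {kind}")
else:
    print(f"{lock_path} not found; run the Quick Start first")
```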

CLI Smoke Check

statbelt

Expected output:

Hello from statbelt!

Development

uv sync --all-groups
uv run ruff check .
uv run pytest

Running uv run pytest also prints a terminal coverage report, including missing lines, via pytest-cov.

Current Limits

  • Binary classification only.
  • Confidence intervals only (no pairwise hypothesis tests yet).
  • Python API is the primary surface in v0; CLI remains minimal.

License

This project is licensed under the GNU Affero General Public License, version 3 or later (AGPL-3.0-or-later). See LICENSE.

