Calibrax: Unified benchmarking framework for the JAX scientific ML ecosystem
Validated against: scikit-learn and SciPy references for representative regression, classification, distance, and divergence metrics.
Early Development — API is unstable and subject to breaking changes. Pin to specific commits if stability is required.
Calibrax (Calibrate + JAX) is a unified benchmarking and metrics framework for the JAX scientific ML ecosystem. It extracts and consolidates shared benchmarking, profiling, statistical analysis, and evaluation functionality from Datarax, Artifex, and Opifex.
Features
Metrics (111 registered Tier 0 metrics, 17 domains, 4-tier architecture)
Calibrax provides a 4-tier metric system covering the full spectrum of ML evaluation. The current registry contains 111 Tier 0 pure-function metrics; Tier 1-3 APIs, optional plugins, and metric-learning losses are part of the package architecture but are not all registered metric entries today.
| Tier | Name | Pattern | Examples |
|---|---|---|---|
| 0 | Pure Functions | `fn(predictions, targets) -> scalar` | MSE, cosine distance, BLEU |
| 1 | Frozen Backbone | `update() -> compute() -> reset()` | FID, BERTScore, Inception Score |
| 2 | Learned | `nnx.Module` with trainable weights | LPIPS |
| 3 | Metric Learning | Differentiable embedding loss | Contrastive, Triplet, ArcFace |
Functional domains: regression, classification, calibration, segmentation, distance, divergence, information, ranking, statistical, clustering, fairness, image, text, audio, geometric, graph, manifold
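A Tier 0 metric is nothing more than a pure function of arrays, which is what lets the registry and JAX transformations compose cleanly. As a minimal illustration of the contract (a hand-rolled sketch, not Calibrax source):

```python
import jax
import jax.numpy as jnp

def my_mse(predictions: jax.Array, targets: jax.Array) -> jax.Array:
    """Tier 0 contract: fn(predictions, targets) -> scalar, stateless and JIT-safe."""
    return jnp.mean((predictions - targets) ** 2)

# Purity means the usual JAX transformations apply directly:
fast_mse = jax.jit(my_mse)
grad_wrt_predictions = jax.grad(my_mse)
```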
Key capabilities:
- MetricRegistry with axiom-based discovery for registered Tier 0 metrics (`list_true_metrics()`, `list_by_invariance("rotation")`)
- Geometric distance hierarchy — Euclidean, Riemannian (SPD, Grassmann, Stiefel), pseudo-Riemannian (ultrahyperbolic), Finsler (Randers)
- Graph metrics — spectral distance, resistance distance, Floyd-Warshall shortest paths
- Reference checks — representative Tier 0 metrics are tested against scikit-learn and SciPy references with `1e-6` tolerance; see Peer Comparison
- Composition — `MetricCollection`, `WeightedMetric`, `MetricSuite`, `ThresholdMetric`
- Wrappers — `BootstrapMetric` (confidence intervals), `ClasswiseWrapper`, `MetricTracker`, `MinMaxTracker`
- Metric learning losses — contrastive, triplet margin, NTXent, ArcFace, CosFace, ProxyNCA, ProxyAnchor, with hard/semi-hard negative mining (see the sketch after this list)
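For reference, the Tier 3 losses follow standard metric-learning formulations. A minimal pure-JAX sketch of the triplet margin loss (illustrative only, not Calibrax's implementation):

```python
import jax.numpy as jnp

def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss over embedding batches of shape (batch, dim):
    pull anchor-positive pairs together and push anchor-negative pairs
    apart by at least `margin`."""
    d_pos = jnp.linalg.norm(anchor - positive, axis=-1)
    d_neg = jnp.linalg.norm(anchor - negative, axis=-1)
    return jnp.mean(jnp.maximum(d_pos - d_neg + margin, 0.0))
```

Because the loss is differentiable end to end, it can sit directly in a training step via `jax.grad`, which is what distinguishes Tier 3 from the evaluation-only tiers.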
Benchmarking & Profiling
- Timing — Warm-up aware timing with JIT compilation separation (illustrated after this list)
- Resource monitoring — CPU, memory, GPU memory/clock/power tracking
- Energy & carbon — Energy measurement with carbon footprint estimation
- FLOPS & roofline — XLA-level FLOP counting, roofline performance analysis
- Compilation — XLA compilation profiling and tracing
- Complexity — Algorithmic complexity analysis
- Hardware — Automatic hardware detection and capability reporting
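Why warm-up awareness matters: the first call to a `jax.jit`-compiled function includes tracing and XLA compilation, and JAX dispatch is asynchronous, so naive wall-clock timing is doubly misleading. A framework-free illustration of the idea (not Calibrax's profiler):

```python
import time

import jax
import jax.numpy as jnp

@jax.jit
def step(x):
    return jnp.tanh(x @ x.T).sum()

x = jnp.ones((512, 512))

# First call: tracing + XLA compilation + execution.
t0 = time.perf_counter()
step(x).block_until_ready()  # block_until_ready() forces async dispatch to finish
first_call = time.perf_counter() - t0

# Steady state: the compiled executable only.
t0 = time.perf_counter()
for _ in range(10):
    step(x).block_until_ready()
steady_state = (time.perf_counter() - t0) / 10

print(f"first call (incl. compile): {first_call:.4f}s, steady state: {steady_state:.6f}s")
```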
Analysis & Infrastructure
- Statistical analysis — Bootstrap confidence intervals (sketched after this list), hypothesis testing, effect sizes, outlier detection
- Regression detection — Direction-aware detection with configurable severity levels
- Comparison & ranking — Cross-configuration comparison, Pareto front analysis, aggregate scoring
- Validation — Convergence analysis and accuracy assessment
- Storage — JSON-per-run file backend with baseline management
- Exporters — W&B and MLflow integration, publication-ready LaTeX/HTML/CSV tables and matplotlib plots
- CI integration — Regression gate with git bisect automation
- Monitoring — Production alerting with configurable thresholds
- CLI — `calibrax ingest|export|check|baseline|trend|summary|profile`
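The bootstrap confidence intervals mentioned above rest on a simple resampling idea, shown here as a minimal NumPy sketch (the package's statistical analyzer is the real entry point; this is not its implementation):

```python
import numpy as np

def bootstrap_ci(samples, stat=np.mean, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample with replacement, then take the
    empirical (alpha/2, 1 - alpha/2) quantiles of the statistic."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples)
    idx = rng.integers(0, len(samples), size=(n_resamples, len(samples)))
    stats = stat(samples[idx], axis=1)
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

latencies_ms = [12.1, 11.8, 12.5, 13.0, 11.9, 12.2, 12.7]  # example data
lo, hi = bootstrap_ci(latencies_ms)
print(f"95% CI for mean latency: [{lo:.2f}, {hi:.2f}] ms")
```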
Quick Start
```python
import jax.numpy as jnp

from calibrax.metrics import MetricRegistry, calculate_all
from calibrax.metrics.functional.regression import mse, mae, r_squared

predictions = jnp.array([1.1, 2.3, 2.8, 4.2, 4.7])
targets = jnp.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Individual metrics
print(f"MSE: {mse(predictions, targets):.4f}")
print(f"R²: {r_squared(predictions, targets):.4f}")

# Batch computation of all registered metrics
results = calculate_all(predictions, targets, metrics=["mse", "mae", "rmse", "r_squared"])

# Registry discovery
registry = MetricRegistry()
true_metrics = registry.list_true_metrics()
rotation_inv = registry.list_by_invariance("rotation")
```
Installation
```bash
# Basic installation
uv pip install calibrax

# With statistical analysis (scipy)
uv pip install "calibrax[stats]"

# With GPU monitoring
uv pip install "calibrax[gpu]"

# With image quality plugins (FID, Inception Score)
uv pip install "calibrax[image]"

# With text quality plugins (BERTScore)
uv pip install "calibrax[text]"

# With publication export (matplotlib)
uv pip install "calibrax[publication]"
```
Development Setup
The recommended way to set up a development environment is with the included setup.sh script. It auto-detects your platform (Linux CUDA, macOS Intel, Apple Silicon), creates a virtual environment, installs all dependencies, and generates an activation script.
```bash
git clone https://github.com/avitai/calibrax.git
cd calibrax

# Standard setup with automatic GPU detection
./setup.sh

# Activate the environment
source ./activate.sh
```
setup.sh Options
| Flag | Description |
|---|---|
| `--cpu-only` | Force CPU-only setup, skip GPU/Metal detection |
| `--metal` | Enable Metal acceleration on Apple Silicon Macs |
| `--deep-clean` | Clear JAX cache, pip cache, pytest cache, and other artifacts |
| `--force` | Force reinstallation even if environment exists |
| `--verbose`, `-v` | Show detailed output during setup |
```bash
# Examples
./setup.sh --cpu-only         # CPU-only development
./setup.sh --metal            # Apple Silicon with Metal
./setup.sh --force --verbose  # Force reinstall with full output
./setup.sh --deep-clean       # Clean everything and start fresh
```
Manual Setup
If you prefer to set up manually:
```bash
git clone https://github.com/avitai/calibrax.git
cd calibrax
uv venv
uv pip install -e ".[dev,test,stats]"
uv run pre-commit install
```
Architecture
```text
src/calibrax/
├── core/            Data models, protocols, adapters, result container, registry
├── profiling/       Timing, resources, GPU, energy, FLOPS, roofline, compilation,
│                    complexity, hardware, tracing, carbon
├── statistics/      Statistical analyzer, significance testing
├── analysis/        Regression, comparison, ranking, scaling, Pareto, changepoint
├── validation/      Convergence, accuracy, validation framework
├── monitoring/      Alerts, production monitoring
├── storage/         JSON store, baselines
├── exporters/       W&B, MLflow, publication-ready output
├── metrics/
│   ├── functional/     111 Tier 0 pure functions across 17 domains
│   ├── stateful/       Tier 1-2 base classes (FrozenBackboneMetric, LearnedMetric)
│   ├── learning/       Tier 3 metric learning losses and miners
│   ├── plugins/        Optional-dependency metrics (FID, BERTScore, LPIPS)
│   ├── composition.py  MetricCollection, WeightedMetric, MetricSuite, ThresholdMetric
│   ├── wrappers.py     BootstrapMetric, ClasswiseWrapper, MetricTracker, MinMaxTracker
│   └── _registry.py    MetricRegistry singleton with axiom-based discovery
├── ci/              CI regression gate, bisection engine
└── cli/             Command-line interface
```
Examples
Runnable examples are in examples/metrics/, available as both Python scripts and Jupyter notebooks:
| Example | Level | Topics |
|---|---|---|
| 01_quickstart.py | Beginner | Individual metrics, calculate_all, registry queries |
| 02_regression_deep_dive.py | Beginner | Same-shape regression metrics, outlier sensitivity |
| 03_classification.py | Intermediate | Classification, calibration, segmentation |
| 04_distances.py | Intermediate | Euclidean, hyperbolic, divergences, information theory |
| 05_composition.py | Intermediate | Collections, weighted metrics, quality gates, tracking |
| 06_image_quality.py | Intermediate | PSNR, SSIM, MS-SSIM, BLEU, ROUGE |
| 07_metric_learning.py | Advanced | Contrastive, triplet, NTXent, ArcFace, mining |
| 08_manifold_graph.py | Advanced | SPD, Grassmann, spectral distance, Floyd-Warshall |
Development
```bash
# Activate the local environment first
source activate.sh

# Run tests
uv run pytest tests/ -v --cov=calibrax --cov-report=term-missing

# Lint & format
uv run ruff check src/ tests/ --fix
uv run ruff format src/ tests/

# Type check
uv run pyright src/

# All quality checks
uv run pre-commit run --all-files

# Build documentation
uv run mkdocs build --strict --clean

# Convert examples to Jupyter notebooks
uv run python scripts/jupytext_converter.py batch-py-to-nb examples/metrics/
```
License
MIT