
Calibrax: Unified benchmarking framework for the JAX scientific ML ecosystem


Calibrax



Early Development — API is unstable and subject to breaking changes. Pin to specific commits if stability is required.


Calibrax (Calibrate + JAX) is a unified benchmarking and metrics framework for the JAX scientific ML ecosystem. It extracts and consolidates shared benchmarking, profiling, statistical analysis, and evaluation functionality from Datarax, Artifex, and Opifex.

Features

Metrics (110+ registered, 17 domains, 4 tiers)

Calibrax provides a 4-tier metric system covering the full spectrum of ML evaluation:

Tier  Name             Pattern                             Examples
0     Pure Functions   fn(predictions, targets) -> scalar  MSE, cosine distance, BLEU
1     Frozen Backbone  update() -> compute() -> reset()    FID, BERTScore, Inception Score
2     Learned          nnx.Module with trainable weights   LPIPS
3     Metric Learning  Differentiable embedding loss       Contrastive, Triplet, ArcFace
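
The Tier 1 update/compute/reset pattern can be sketched as a minimal stateful metric. This is an illustrative stand-in, not calibrax's actual FrozenBackboneMetric base class, whose signatures may differ:

```python
class RunningMSE:
    """Minimal sketch of the Tier 1 update() -> compute() -> reset() pattern."""

    def __init__(self):
        self.reset()

    def update(self, predictions, targets):
        # Accumulate sufficient statistics batch by batch.
        self._sq_err += sum((p - t) ** 2 for p, t in zip(predictions, targets))
        self._count += len(predictions)

    def compute(self):
        # Reduce the accumulated state to a scalar.
        return self._sq_err / self._count

    def reset(self):
        # Clear state between evaluation runs.
        self._sq_err = 0.0
        self._count = 0


metric = RunningMSE()
metric.update([1.0, 2.0], [1.0, 3.0])  # squared errors: 0.0, 1.0
metric.update([4.0], [2.0])            # squared error: 4.0
print(metric.compute())                # mean over all 3 samples: 5/3
```

The point of the split is that update() can stream over batches too large to hold in memory, while compute() is called once at the end.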

Functional domains: regression, classification, calibration, segmentation, distance, divergence, information, ranking, statistical, clustering, fairness, image, text, audio, geometric, graph, manifold

Key capabilities:

  • MetricRegistry with axiom-based discovery (list_true_metrics(), list_by_invariance("rotation"))
  • Geometric distance hierarchy — Euclidean, Riemannian (SPD, Grassmann, Stiefel), pseudo-Riemannian (ultrahyperbolic), Finsler (Randers)
  • Graph metrics — spectral distance, resistance distance, Floyd-Warshall shortest paths
  • CompositionMetricCollection, WeightedMetric, MetricSuite, ThresholdMetric
  • Wrappers — BootstrapMetric (confidence intervals), ClasswiseWrapper, MetricTracker, MinMaxTracker
  • Metric learning losses — contrastive, triplet margin, NTXent, ArcFace, CosFace, ProxyNCA, ProxyAnchor, with hard/semi-hard negative mining
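
To give a rough sense of what a Tier 3 loss computes, here is a plain-Python triplet margin loss over embeddings. This is a sketch of the standard formulation; calibrax's actual loss and miner APIs may differ:

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: push the negative at least `margin` farther
    from the anchor than the positive. Zero once the constraint holds."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Positive close, negative far: constraint satisfied, loss is zero.
print(triplet_margin_loss([0.0, 0.0], [0.1, 0.0], [3.0, 0.0]))  # 0.0
# Negative closer than the positive: loss is positive.
print(triplet_margin_loss([0.0, 0.0], [2.0, 0.0], [0.5, 0.0]))  # 2.5
```

Hard and semi-hard mining, as listed above, amount to choosing which (anchor, positive, negative) triples to feed into a loss of this shape.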

Benchmarking & Profiling

  • Timing — Warm-up aware timing with JIT compilation separation
  • Resource monitoring — CPU, memory, GPU memory/clock/power tracking
  • Energy & carbon — Energy measurement with carbon footprint estimation
  • FLOPS & roofline — XLA-level FLOP counting, roofline performance analysis
  • Compilation — XLA compilation profiling and tracing
  • Complexity — Algorithmic complexity analysis
  • Hardware — Automatic hardware detection and capability reporting
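
The idea behind warm-up-aware timing is to measure the first call (which, under jax.jit, includes compilation) separately from steady-state execution. A minimal stand-alone sketch with a stand-in workload, not calibrax's Timing API:

```python
import time


def time_with_warmup(fn, *args, warmup=1, repeats=5):
    """Time fn separately for warm-up calls (capturing one-off costs such as
    JIT compilation) and steady-state calls. With JAX, each call would also
    need jax.block_until_ready() so async dispatch doesn't skew the numbers."""
    warmup_times = []
    for _ in range(warmup):
        t0 = time.perf_counter()
        fn(*args)
        warmup_times.append(time.perf_counter() - t0)

    steady_times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        steady_times.append(time.perf_counter() - t0)

    return {
        "warmup_s": warmup_times,
        "steady_mean_s": sum(steady_times) / len(steady_times),
    }


stats = time_with_warmup(lambda n: sum(range(n)), 100_000)
print(f"steady-state mean: {stats['steady_mean_s']:.6f} s")
```

Reporting the two phases separately is what lets a benchmark distinguish a compilation regression from a runtime regression.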

Analysis & Infrastructure

  • Statistical analysis — Bootstrap confidence intervals, hypothesis testing, effect sizes, outlier detection
  • Regression detection — Direction-aware detection with configurable severity levels
  • Comparison & ranking — Cross-configuration comparison, Pareto front analysis, aggregate scoring
  • Validation — Convergence analysis and accuracy assessment
  • Storage — JSON-per-run file backend with baseline management
  • Exporters — W&B and MLflow integration, publication-ready LaTeX/HTML/CSV tables and matplotlib plots
  • CI integration — Regression gate with git bisect automation
  • Monitoring — Production alerting with configurable thresholds
  • CLI — calibrax ingest|export|check|baseline|trend|summary|profile
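
Bootstrap confidence intervals of the kind listed above can be sketched in a few lines. This is a plain-Python percentile bootstrap for illustration; calibrax's statistical analyzer exposes this functionality through its own API:

```python
import random


def bootstrap_ci(samples, stat=lambda xs: sum(xs) / len(xs),
                 n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample with replacement, recompute the
    statistic, and take the (alpha/2, 1 - alpha/2) quantiles."""
    rng = random.Random(seed)
    stats = sorted(
        stat([rng.choice(samples) for _ in samples])
        for _ in range(n_resamples)
    )
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


runtimes_ms = [12.1, 11.8, 12.4, 13.0, 11.9, 12.2, 12.6, 12.0]
lo, hi = bootstrap_ci(runtimes_ms)
print(f"95% CI for mean runtime: [{lo:.2f}, {hi:.2f}] ms")
```

A regression gate can then flag a change only when the new run's interval fails to overlap the baseline's, rather than on any raw difference.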

Quick Start

import jax.numpy as jnp
from calibrax.metrics import MetricRegistry, calculate_all
from calibrax.metrics.functional.regression import mse, mae, r_squared

predictions = jnp.array([1.1, 2.3, 2.8, 4.2, 4.7])
targets = jnp.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Individual metrics
print(f"MSE: {mse(predictions, targets):.4f}")
print(f"R²:  {r_squared(predictions, targets):.4f}")

# Batch computation of all registered metrics
results = calculate_all(predictions, targets, metrics=["mse", "mae", "rmse", "r_squared"])

# Registry discovery
registry = MetricRegistry()
true_metrics = registry.list_true_metrics()
rotation_inv = registry.list_by_invariance("rotation")

Installation

# Basic installation
uv pip install calibrax

# With statistical analysis (scipy)
uv pip install "calibrax[stats]"

# With GPU monitoring
uv pip install "calibrax[gpu]"

# With image quality plugins (FID, Inception Score)
uv pip install "calibrax[image]"

# With text quality plugins (BERTScore)
uv pip install "calibrax[text]"

# With publication export (matplotlib)
uv pip install "calibrax[publication]"

Development Setup

The recommended way to set up a development environment is with the included setup.sh script. It auto-detects your platform (Linux CUDA, macOS Intel, Apple Silicon), creates a virtual environment, installs all dependencies, and generates an activation script.

git clone https://github.com/avitai/calibrax.git
cd calibrax

# Standard setup with automatic GPU detection
./setup.sh

# Activate the environment
source ./activate.sh

setup.sh Options

Flag           Description
--cpu-only     Force CPU-only setup, skip GPU/Metal detection
--metal        Enable Metal acceleration on Apple Silicon Macs
--deep-clean   Clear JAX cache, pip cache, pytest cache, and other artifacts
--force        Force reinstallation even if environment exists
--verbose, -v  Show detailed output during setup

# Examples
./setup.sh --cpu-only         # CPU-only development
./setup.sh --metal            # Apple Silicon with Metal
./setup.sh --force --verbose  # Force reinstall with full output
./setup.sh --deep-clean       # Clean everything and start fresh

Manual Setup

If you prefer to set up manually:

git clone https://github.com/avitai/calibrax.git
cd calibrax
uv venv
uv pip install -e ".[dev,test,stats]"
uv run pre-commit install

Architecture

src/calibrax/
├── core/          Data models, protocols, adapters, result container, registry
├── profiling/     Timing, resources, GPU, energy, FLOPS, roofline, compilation,
│                  complexity, hardware, tracing, carbon
├── statistics/    Statistical analyzer, significance testing
├── analysis/      Regression, comparison, ranking, scaling, Pareto, changepoint
├── validation/    Convergence, accuracy, validation framework
├── monitoring/    Alerts, production monitoring
├── storage/       JSON store, baselines
├── exporters/     W&B, MLflow, publication-ready output
├── metrics/
│   ├── functional/   110+ Tier 0 pure functions across 17 domains
│   ├── stateful/     Tier 1–2 base classes (FrozenBackboneMetric, LearnedMetric)
│   ├── learning/     Tier 3 metric learning losses and miners
│   ├── plugins/      Optional-dependency metrics (FID, BERTScore, LPIPS)
│   ├── composition.py   MetricCollection, WeightedMetric, MetricSuite, ThresholdMetric
│   ├── wrappers.py      BootstrapMetric, ClasswiseWrapper, MetricTracker, MinMaxTracker
│   └── _registry.py     MetricRegistry singleton with axiom-based discovery
├── ci/            CI regression gate, bisection engine
└── cli/           Command-line interface

Examples

Runnable examples are in examples/metrics/, available as both Python scripts and Jupyter notebooks:

Example                     Level         Topics
01_quickstart.py            Beginner      Individual metrics, calculate_all, registry queries
02_regression_deep_dive.py  Beginner      All 12 regression metrics, outlier sensitivity
03_classification.py        Intermediate  Classification, calibration, segmentation
04_distances.py             Intermediate  Euclidean, hyperbolic, divergences, information theory
05_composition.py           Intermediate  Collections, weighted metrics, quality gates, tracking
06_image_quality.py         Intermediate  PSNR, SSIM, MS-SSIM, BLEU, ROUGE
07_metric_learning.py       Advanced      Contrastive, triplet, NTXent, ArcFace, mining
08_manifold_graph.py        Advanced      SPD, Grassmann, spectral distance, Floyd-Warshall

Development

# Run tests
uv run pytest tests/ -v --cov=calibrax --cov-report=term-missing

# Lint & format
uv run ruff check src/ tests/ --fix
uv run ruff format src/ tests/

# Type check
uv run pyright src/

# All quality checks
uv run pre-commit run --all-files

# Build documentation
uv run mkdocs build

# Convert examples to Jupyter notebooks
uv run python scripts/jupytext_converter.py batch-py-to-nb examples/metrics/

License

MIT
