Scenario-first ML evaluation engine — stress-test your models to find where metrics lie
Spectra
Spectra runs your model through realistic failure scenarios (label noise, score noise, class imbalance, threshold gaming) and shows you exactly where your metrics break down. Instead of a single accuracy number, you get a transparent stress-test report.
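The following is a minimal stdlib-only sketch of the label-noise idea, not Spectra's implementation. It flips each label with some probability, recomputes a rank-based AUC, and repeats over Monte Carlo trials to see how far the metric moves. The names `auc` and `label_noise_trials` and the toy data are hypothetical.

```python
import random
from statistics import mean, stdev


def auc(labels, scores):
    """Rank-based AUC: chance a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def label_noise_trials(labels, scores, flip_rate, n_trials=200, seed=0):
    """Hypothetical helper: flip each label with probability flip_rate, recompute AUC per trial."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        noisy = [1 - y if rng.random() < flip_rate else y for y in labels]
        if len(set(noisy)) < 2:
            continue  # a trial that leaves only one class has no defined AUC
        results.append(auc(noisy, scores))
    return results


labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
trials = label_noise_trials(labels, scores, flip_rate=0.2)
print(f"clean AUC = {auc(labels, scores):.2f}")  # clean AUC = 1.00
print(f"noisy AUC = {mean(trials):.3f} +/- {stdev(trials):.3f} over {len(trials)} trials")
```

The spread across trials, not just the mean, is what shows how fragile a single reported AUC can be.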
Install
```bash
pip install spectra-ml
```
With web UI support:
```bash
pip install "spectra-ml[web]"
```
Quick Start
Python SDK
```python
import metrics_lie as spectra

result = spectra.evaluate(
    name="my-model-audit",
    dataset="data.csv",   # evaluation data
    model="model.pkl",    # pickled model to audit
    metric="auc",
    trust_pickle=True,
)
spectra.display(result)
```
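The `trust_pickle=True` argument (and the `--trust-pickle` CLI flag) reads as an explicit opt-in, which makes sense because Python's pickle format can run code while loading; the Python documentation warns to only unpickle data you trust. The toy class below is hypothetical and harmless, but it shows the mechanism: `__reduce__` names a callable, and `pickle.loads` calls it.

```python
import pickle


class NotReallyAModel:
    """Hypothetical stand-in for a malicious 'model.pkl'. Harmless here, but the mechanism is real."""

    def __reduce__(self):
        # pickle rebuilds objects by calling whatever callable __reduce__ returns.
        # Here that callable is eval, so loading the bytes evaluates an expression.
        return (eval, ("sum(range(10))", {}))


payload = pickle.dumps(NotReallyAModel())
loaded = pickle.loads(payload)  # runs eval(...) during loading
print(type(loaded).__name__, loaded)  # int 45
```

What comes back is not a model at all but the result of a function call chosen by whoever wrote the file.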
CLI
```bash
# Run from spec file
spectra run experiment.json

# Quick evaluation
spectra evaluate model.pkl --dataset data.csv --metric auc --trust-pickle

# Launch web UI
spectra serve
```
Web UI (Quick Test)
```bash
pip install "spectra-ml[web]"
spectra serve
```
Upload your model and a dataset CSV. Spectra auto-detects the columns, task type, and best metric, and one click runs a full stress test.
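As a rough illustration of what task-type detection involves, here is a hypothetical heuristic, not Spectra's detection logic, that infers a task type from the raw strings of a target column. The function name, the `max_classes` cutoff, and the `label` column in the usage are assumptions. Ranking is left out because a single column does not reveal it.

```python
import csv
import io


def guess_task_type(target_values, max_classes=20):
    """Hypothetical heuristic: infer binary / multiclass / regression from CSV strings."""
    values = [v.strip() for v in target_values if v.strip() != ""]
    if not values:
        raise ValueError("target column is empty")
    try:
        numbers = [float(v) for v in values]
    except ValueError:
        # Non-numeric labels: treat them as class names.
        return "binary" if len(set(values)) == 2 else "multiclass"
    distinct = set(numbers)
    if distinct <= {0.0, 1.0}:
        return "binary"
    if all(n.is_integer() for n in numbers) and len(distinct) <= max_classes:
        return "multiclass"
    return "regression"  # continuous or high-cardinality numeric target


sample = io.StringIO("feature,label\n0.3,1\n1.2,0\n0.8,1\n")
column = [row["label"] for row in csv.DictReader(sample)]
print(guess_task_type(column))  # binary
```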
What It Does
- Stress-tests metrics across scenarios: label noise, score noise, class imbalance, threshold gaming
- Detects metric disagreement — when accuracy says "great" but calibration says "broken"
- Runs diagnostics: calibration analysis, subgroup gaps, sensitivity ranking, threshold sweeps
- Produces decision scorecards with weighted components and transparent reasoning
- Compares models with regression detection and structured comparison reports
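The metric-disagreement case in the list above can be reproduced by hand. In this stdlib-only sketch (function names are hypothetical, and ECE uses one common binary variant with 10 equal-width bins), a model that scores every positive 0.51 and every negative 0.49 gets perfect accuracy while its probabilities are badly miscalibrated.

```python
def accuracy(labels, probs, threshold=0.5):
    """Fraction of samples where (prob >= threshold) matches the true label."""
    hits = sum((p >= threshold) == (y == 1) for y, p in zip(labels, probs))
    return hits / len(labels)


def expected_calibration_error(labels, probs, n_bins=10):
    """Binary ECE: size-weighted gap between mean predicted probability and positive rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # keeps p == 1.0 in the last bin
        bins[idx].append((y, p))
    ece = 0.0
    for members in bins:
        if members:
            mean_prob = sum(p for _, p in members) / len(members)
            pos_rate = sum(y for y, _ in members) / len(members)
            ece += len(members) / len(labels) * abs(pos_rate - mean_prob)
    return ece


labels = [1] * 5 + [0] * 5
probs = [0.51] * 5 + [0.49] * 5
print(accuracy(labels, probs), round(expected_calibration_error(labels, probs), 2))  # 1.0 0.49
```

Accuracy only checks which side of 0.5 each score falls on, so it cannot see that a 0.51 is being issued for events that always happen.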
Supported
| Category | Options |
|---|---|
| Task Types | Binary classification, multiclass, regression, ranking |
| Metrics | 27 metrics: AUC, F1, precision, recall, Brier, ECE, MAE, RMSE, R2, NDCG, and more |
| Model Formats | sklearn pickle, ONNX, PyTorch, TensorFlow, XGBoost, LightGBM, CatBoost, MLflow |
| Scenarios | Label noise, score noise, class imbalance, threshold gaming |
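Class imbalance and threshold gaming interact. The sketch below (hypothetical function name, synthetic data with 95 negatives and 5 positives) sweeps the decision threshold and then keeps whichever one maximizes accuracy: the winner predicts no positives at all, scoring 0.95 accuracy with zero recall.

```python
def threshold_sweep(labels, scores, thresholds):
    """Accuracy and recall at each threshold (predict positive when score >= threshold)."""
    positives = sum(labels)
    rows = []
    for t in thresholds:
        preds = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
        tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
        acc = (tp + tn) / len(labels)
        recall = tp / positives if positives else 0.0
        rows.append((t, acc, recall))
    return rows


labels = [0] * 95 + [1] * 5
scores = [i / 100 for i in range(95)] + [0.5, 0.6, 0.7, 0.8, 0.9]
rows = threshold_sweep(labels, scores, [t / 10 for t in range(11)])
gamed = max(rows, key=lambda row: row[1])  # "gaming": keep the threshold that flatters accuracy
print(gamed)  # (1.0, 0.95, 0.0)
```

Reporting recall (or F1) next to accuracy for every threshold exposes the trade-off that the single best-looking number hides.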
Architecture
```text
spectra run / evaluate / serve
        |
Core Engine (metrics_lie)
 |- Dataset Loading (CSV)
 |- Model Adapter (pickle, ONNX, PyTorch, ...)
 |- Scenario Runner (Monte Carlo trials)
 |- Metrics (27 metrics across 4 task types)
 |- Diagnostics (calibration, gaming, subgroups)
 |- Analysis (dashboard, disagreement, sensitivity)
 |- Decision Framework (scorecard, components)
 '- Artifacts (plots, reports)
```
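For the Decision Framework stage, here is a minimal sketch of a weighted scorecard that keeps its reasoning visible. The component names, weights, pass mark, and per-component floor are all hypothetical choices, not Spectra's defaults; the floor shows one way to stop a strong weighted average from hiding a single failing component.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    score: float   # assumed normalized to [0, 1], higher is better
    weight: float  # relative importance; normalized below


def scorecard(components, pass_mark=0.7, floor=0.5):
    """Hypothetical scorecard: weighted average plus a per-component floor."""
    total_weight = sum(c.weight for c in components)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    overall = sum(c.score * c.weight for c in components) / total_weight
    reasons = [
        f"{c.name}: {c.score:.2f} x {c.weight / total_weight:.2f} = "
        f"{c.score * c.weight / total_weight:.3f}"
        for c in components
    ]
    failing = [c.name for c in components if c.score < floor]
    if failing:
        reasons.append(f"below floor {floor}: {', '.join(failing)}")
    verdict = "pass" if overall >= pass_mark and not failing else "review"
    return overall, verdict, reasons


overall, verdict, reasons = scorecard([
    Component("performance", 0.9, 0.5),
    Component("calibration", 0.4, 0.3),
    Component("robustness", 0.8, 0.2),
])
print(f"{overall:.2f} {verdict}")  # 0.73 review
for line in reasons:
    print(" ", line)
```

Here the weighted average alone would clear the 0.7 pass mark; the floor flags calibration instead of letting it disappear into the total.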
Development
```bash
git clone https://github.com/StrangeStorm243-bit/when-metrics-lie.git
cd when-metrics-lie
python -m venv .venv && source .venv/bin/activate  # or .venv\Scripts\activate on Windows
pip install -e ".[dev,web]"
pytest
```
Documentation
Full docs: https://strangestorm243-bit.github.io/when-metrics-lie/
License
Apache 2.0 — see LICENSE for details.
Download files
File details
Details for the file spectra_ml-1.0.0.tar.gz.
File metadata
- Download URL: spectra_ml-1.0.0.tar.gz
- Upload date:
- Size: 148.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 44aca381960a9ea33d3d12ea90c4d2da11263270f5df1bfd691dd8df88b652ff |
| MD5 | 07189fee65fe917cab97656c0236c7b7 |
| BLAKE2b-256 | 2750f8648300425fc9191d0a218e8a3bfd734b7e2380fa0b46e11fa1b0d43a97 |
File details
Details for the file spectra_ml-1.0.0-py3-none-any.whl.
File metadata
- Download URL: spectra_ml-1.0.0-py3-none-any.whl
- Upload date:
- Size: 118.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9f67c6e7960eabf3593759286426252e902429e9fd8465a8198e2f2cf2baf131 |
| MD5 | dc85fc06e7c7bd58135dd722e9fc485a |
| BLAKE2b-256 | a9009ed133c07fb664029bb13bc0db2e0c5c9c7b2950972d0b51edc791819408 |