MLX Inference Quality Diagnostic Toolkit
mlx-triage
Your MLX model is producing garbage. Is it the weights? A known MLX bug? Your quantization settings?
mlx-triage answers that in 30 seconds — without loading the model into memory.
pip install mlx-triage
mlx-triage check ./my-model
What It Checks
Tested against 13 models across 5 families (Llama, Qwen, Phi, LiquidAI, Nanbeige) and 4 quantization levels (bf16 through 4-bit), ranging from 0.6B to 30B parameters, with zero false negatives in that validation set. Full validation results ->
Tier 0 — Sanity Checks (no MLX needed, < 30 seconds)
| Check | What it catches |
|---|---|
| Dtype Compatibility | BF16->FP16 precision loss, training/storage dtype mismatches |
| Tokenizer & EOS Config | Missing EOS tokens, chat template issues, Llama 3 dual-stop-token edge cases |
| Weight File Integrity | NaN/Inf values, all-zero layers, corrupt safetensors headers |
| MLX Version & Known Bugs | Outdated MLX with documented bugs affecting your model architecture |
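To illustrate how a check can inspect weights without loading them, here is a minimal sketch that reads only the header of a safetensors file. It relies on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header). The function names `read_safetensors_header` and `dtype_summary` are hypothetical and are not taken from mlx-triage's source.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            raise ValueError("file too short to hold a safetensors header")
        # The first 8 bytes are an unsigned little-endian header length.
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > Path(path).stat().st_size - 8:
            raise ValueError("declared header length exceeds file size")
        raw = f.read(header_len)
    return json.loads(raw)


def dtype_summary(header):
    """Count tensors per dtype, skipping the optional __metadata__ entry."""
    counts = {}
    for name, entry in header.items():
        if name == "__metadata__":
            continue
        counts[entry["dtype"]] = counts.get(entry["dtype"], 0) + 1
    return counts
```

A header that fails to parse, or a summary that mixes `F16` and `BF16` unexpectedly, is the kind of signal a Tier 0 style check could surface before any tensor data is touched.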
Tier 1 — Statistical Smoke Tests (MLX required)
| Check | What it catches |
|---|---|
| Determinism | Non-reproducible outputs at temp=0 (infrastructure issue, not model) |
| Reference Divergence | MLX output diverging from PyTorch/Transformers reference |
| Quantization Quality | Excessive perplexity indicating broken quantization |
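For readers unfamiliar with the metric behind the quantization check, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below shows that definition only. The `perplexity` helper is hypothetical, and the thresholds mlx-triage treats as "excessive" are not stated here, so none are assumed.

```python
import math


def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities of the observed tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# A model that assigns probability 0.25 to every observed token is as
# uncertain as a uniform choice among 4 options, so its perplexity is 4.
uniform_four = perplexity([math.log(0.25)] * 8)
```

Comparing this value for a quantized model against the same corpus scored by a higher-precision copy is one way to spot a quantization that has gone wrong.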
Install
Requires Python 3.11+ and macOS on Apple Silicon (M1-M4).
# From PyPI
pip install mlx-triage
# With MLX for Tier 1 checks
pip install "mlx-triage[mlx]"
# With reference comparison (Tier 1, Test 1.2)
pip install "mlx-triage[reference]"
# Development
git clone https://github.com/swaylenhayes/mlx-triage.git
cd mlx-triage
uv sync --extra dev
Usage
# Tier 0 only (default — no MLX needed)
mlx-triage check /path/to/model
# Tier 0 + Tier 1
mlx-triage check /path/to/model --tier 1
# JSON output
mlx-triage check /path/to/model --format json
# Save report to file
mlx-triage check /path/to/model --tier 1 --output report.json
Tier 0 runs in under 30 seconds on any model. Tier 1 requires MLX and takes 5-15 minutes depending on model size.
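If you want to call the CLI from a script, a thin wrapper along these lines may be enough. It uses only the flags shown above. Two things are assumptions rather than documented behavior: that `--format json` writes nothing but JSON to stdout, and that the exit code should not be treated as an error signal. The helper names are hypothetical.

```python
import json
import subprocess


def build_command(model_path, tier=0, fmt="json"):
    """Assemble the mlx-triage argv using only flags shown in the usage section."""
    cmd = ["mlx-triage", "check", str(model_path)]
    if tier:
        # Tier 0 is the default, so the flag is only added for higher tiers.
        cmd += ["--tier", str(tier)]
    cmd += ["--format", fmt]
    return cmd


def run_check(model_path, tier=0):
    """Run the CLI and parse stdout as JSON (assumes stdout is pure JSON)."""
    proc = subprocess.run(
        build_command(model_path, tier), capture_output=True, text=True
    )
    return json.loads(proc.stdout)
```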
How It Works
mlx-triage uses a tiered diagnostic protocol — each tier increases in depth and cost:
- Tier 0 reads model files directly (safetensors headers, config JSON, tokenizer config) without loading the model into memory. This catches the most common issues instantly.
- Tier 1 loads the model via MLX and runs statistical tests: determinism checks (10 runs at temp=0), perplexity measurement against a fixed eval corpus, and optional comparison against a PyTorch reference backend.
- Tiers 2-3 (planned) will add isolation tests (batch invariance, memory pressure, context length stress) and deep diagnostics (layer-wise activation comparison, cross-runtime analysis).
If Tier 0 finds critical issues, Tier 1 is skipped — fix the fundamentals first.
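The tiering and the determinism test described above can be sketched as follows. This is an illustration of the idea, not the tool's code: `generate` stands in for whatever greedy (temp=0) generation call you use, and the result dictionaries with a `critical` flag are a hypothetical shape chosen for the example.

```python
from collections import Counter


def check_determinism(generate, prompt, runs=10):
    """Call generate(prompt) `runs` times and report how many distinct outputs appear."""
    outputs = [generate(prompt) for _ in range(runs)]
    distinct = len(Counter(outputs))
    return {
        "name": "determinism",
        "critical": distinct > 1,  # hypothetical severity rule for this sketch
        "distinct_outputs": distinct,
    }


def run_tiers(tier0_checks, tier1_checks):
    """Run Tier 0 checks, then Tier 1 only if no Tier 0 result is critical."""
    results = [check() for check in tier0_checks]
    if any(r["critical"] for r in results):
        return results, "tier 1 skipped: fix tier 0 issues first"
    results += [check() for check in tier1_checks]
    return results, "tier 1 completed"
```

Passing checks in as zero-argument callables keeps the expensive Tier 1 work (model loading, repeated generation) from starting at all when a cheap file-level check has already failed.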
Known Bugs Database
mlx-triage ships with a curated database of documented MLX bugs (known_bugs.yaml), cross-referenced against your installed MLX version and model architecture. Running MLX < 0.22.0 with float16 weights? It flags the known qmv kernel overflow. Got a 4-bit Llama model looping on long prompts? There's a documented bug for that. Safetensors file looks valid but weights are numerically garbage? That's a known silent bfloat16 corruption path.
Contributing a bug report to the database is the easiest way to help — see CONTRIBUTING.md.
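As a rough picture of what cross-referencing involves, the sketch below matches an installed MLX version, architecture, and weight dtype against a list of bug entries. The entry fields (`id`, `fixed_in`, `architectures`, `dtypes`) and the entry itself are hypothetical and are not the real schema of known_bugs.yaml. The one example entry mirrors the "MLX < 0.22.0 with float16 weights" case mentioned above.

```python
def parse_version(text):
    """Turn '0.22.0' into (0, 22, 0); stops at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)  # pad so '0.22' compares equal to '0.22.0'
    return tuple(parts)


# Hypothetical entry shape; None means "applies to any value".
KNOWN_BUGS = [
    {
        "id": "example-fp16-overflow",
        "fixed_in": "0.22.0",
        "architectures": None,
        "dtypes": {"float16"},
    },
]


def matching_bugs(bugs, mlx_version, architecture, dtype):
    """Return ids of bugs that apply to this version, architecture, and dtype."""
    installed = parse_version(mlx_version)
    hits = []
    for bug in bugs:
        if installed >= parse_version(bug["fixed_in"]):
            continue  # already fixed in the installed version
        if bug["architectures"] is not None and architecture not in bug["architectures"]:
            continue
        if bug["dtypes"] is not None and dtype not in bug["dtypes"]:
            continue
        hits.append(bug["id"])
    return hits
```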
Research Basis
The diagnostic protocol is grounded in systematic analysis of MLX infrastructure defects across multiple model architectures and quantization levels. See METHODOLOGY.md for the evidence basis, including infrastructure defect taxonomy, first-party experiments, and cross-model synthesis.
Contributing
Contributions welcome — especially to the known bugs database. See CONTRIBUTING.md.
License
If mlx-triage saved you a debugging session, star it — it helps other MLX developers find the tool.