Fingerprint any transformer's compression potential in 30 minutes.

fraQtl

fraQtl Diagnostic

Fingerprint any transformer's compression potential — fast.

For public installs, use fraqtl-diagnostic >= 0.2.0 for the inference-readiness scanner.

Measures per-layer:

γ (stretched-exponential decay shape of the Hessian spectrum)
knee (spectrum cutoff index)
k95 (directions needed for 95% of eigenvalue energy)
depth-law (how decay shape evolves across layers)
compression potential + suggested bit budgets (Shannon-based)

Works on any HuggingFace-compatible transformer. ~3 min for a 0.5B model on A100, ~5 min for 1B, ~10 min for 7B.

Install

pip install fraqtl-diagnostic           # >=0.2.0 on PyPI
pip install -e /path/to/diagnostic-public  # editable install from source

Use

fraqtl analyze meta-llama/Llama-3.2-1B-Instruct

Inference readiness scan

diagnose-inference checks whether a model's config appears ready for a target serving context. It uses HuggingFace config.json plus textbook memory math: no GPU, no model load, no compression run.

fraqtl diagnose-inference Qwen/Qwen2.5-7B-Instruct --context 65536

Example summary:

model:      Qwen/Qwen2.5-7B-Instruct
arch:       qwen2 (GQA, layers=28, kv_heads=4)
context:    32768 native / 65536 requested
KV memory:  3.758 GB @ 64K (fp16)
flags:      CONTEXT_EXCEEDS_NATIVE ROPE_SCALING_REQUIRED YARN_REQUIRED YARN_MISSING

The command writes:

*_diagnose-inference_*.json — machine-readable readiness receipt
*_diagnose-inference_*.html — browser report with flags, KV memory table, backend checklist, and benchmark checklist
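
The KV memory figure in the example summary can be reproduced by hand. The sketch below is an illustration of the standard KV-cache formula, not the package's own code. The function name `kv_cache_bytes` is hypothetical, and `head_dim=128` is an assumption taken from the public Qwen2.5-7B config (it is not shown in the summary above). It also assumes batch size 1 and decimal gigabytes.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Bytes needed to cache keys and values for one sequence.

    The leading factor of 2 accounts for storing both K and V in every layer.
    bytes_per_elem=2 corresponds to fp16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_elem


# Values from the example summary: layers=28, kv_heads=4, context=65536.
# head_dim=128 is an assumption (hidden size / attention heads for this model).
qwen_kv = kv_cache_bytes(num_layers=28, num_kv_heads=4, head_dim=128, context_len=65536)
print(f"KV memory: {qwen_kv / 1e9:.3f} GB @ 64K (fp16)")
```

Because GQA stores only `kv_heads` key/value heads per layer, the result scales with `kv_heads` rather than with the number of query heads.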

from fraqtl_diagnostic import analyze

report = analyze("meta-llama/Llama-3.2-1B-Instruct")
print(report.summary())
report.to_html("llama-1b_fingerprint.html")
report.to_png("llama-1b_fingerprint.png")

Try it on your GPU in one command (Modal)

If you don't want to fight Python-env dependencies locally, the fastest way to try the tool on a real model is via Modal (free tier gives you an A100):

# one-time: `pip install modal && modal setup`
# assumes a Modal secret named `huggingface` with an HF token

cd diagnostic-public/
modal run tests/modal_try.py --model-id Qwen/Qwen2.5-0.5B
modal run tests/modal_try.py --model-id TinyLlama/TinyLlama-1.1B-Chat-v1.0
modal run tests/modal_try.py --model-id mistralai/Mistral-7B-v0.1 --n-seqs 32 --seq-len 512

# pull the report back:
modal volume get fraqtl-hf-cache fraqtl-results/diagnostic-smoke ./reports/

What you get

Three outputs, same data, different framings:

*.json — machine-readable per-layer fingerprint (feed into other tools)
*.html — human-readable report with tables + embedded figure
*.png — 4-panel figure: spectrum overlay, γ depth-law, k95/layer, summary

How to read the output

γ (stretched-exponential shape parameter)

The Hessian input-covariance spectrum λ_i of each linear projection is fit against λ_i ≈ exp(−β · i^γ + c). γ is the shape of the decay:

γ range	interpretation
γ ≈ 0.3	Stretched: fast head decay, long tail → compressible
γ ≈ 0.5	Typical for attention o_proj on Llama/Qwen/Mistral
γ ≈ 0.8	Typical for MLP down_proj on Llama/Qwen/Mistral
γ ≈ 1.0	Pure exponential decay — harder to compress aggressively
γ > 1.0	Super-exponential (flat head, sharp crash) — limited

Lower γ = more compression headroom.
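
To make the fit concrete, here is a minimal sketch of one way to estimate γ from a spectrum. It is not necessarily the fitting procedure the package uses. It assumes indices start at 1 and does a grid search over γ, solving for β and c by linear least squares on log λ_i at each grid point. The function name `fit_stretched_exponential` and the grid bounds are hypothetical.

```python
import numpy as np


def fit_stretched_exponential(eigvals, gammas=None):
    """Fit log(lambda_i) ~ -beta * i**gamma + c by grid search over gamma."""
    if gammas is None:
        gammas = np.linspace(0.1, 1.5, 141)  # assumed search range, step 0.01
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    lam = lam[lam > 0]  # the log is only defined for positive eigenvalues
    i = np.arange(1, lam.size + 1, dtype=float)  # 1-based index (an assumption)
    y = np.log(lam)

    best = None
    for g in gammas:
        # For a fixed gamma the model is linear in (beta, c).
        design = np.column_stack([-(i ** g), np.ones_like(i)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((design @ coef - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(g), float(coef[0]), float(coef[1]))
    _, gamma, beta, c = best
    return gamma, beta, c


# Synthetic spectrum with a known shape, used only to check the fit.
idx = np.arange(1, 513)
synthetic = np.exp(-0.4 * idx ** 0.5 + 2.0)
gamma, beta, c = fit_stretched_exponential(synthetic)
print(f"gamma={gamma:.2f} beta={beta:.2f} c={c:.2f}")
```

On real spectra the fit is sensitive to how the noisy tail is truncated, which is presumably where the knee index comes in.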

k95 / dim

Fraction of eigendirections needed to capture 95% of eigenvalue energy. A value of 0.1 means "95% of the Hessian mass lives in the top 10% of directions", which is prime territory for rank-preserving compression. Typical values on production transformers:

k95/dim range	implication
< 10%	very compressible, low-rank friendly
10–30%	common; most dense transformers fall here
30–50%	harder to compress without structured loss
> 50%	spectrum is near-uniform, limited headroom
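
As an illustration, k95 can be computed from a spectrum with a cumulative sum. This sketch assumes "energy" means the plain sum of eigenvalues (not their squares); the function name `k95_fraction` is hypothetical and not part of the package API.

```python
import numpy as np


def k95_fraction(eigvals, energy=0.95):
    """Return (k, k / dim): the smallest k whose top-k eigenvalues reach `energy`."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    lam = np.clip(lam, 0.0, None)  # guard against tiny negative values from eigensolvers
    cumulative = np.cumsum(lam) / lam.sum()
    # First position where the cumulative share reaches the threshold.
    k = int(np.searchsorted(cumulative, energy)) + 1
    k = min(k, lam.size)
    return k, k / lam.size


# Toy spectrum: cumulative shares are 0.80, 0.90, 0.96, 0.98, 1.00.
k, frac = k95_fraction([80.0, 10.0, 6.0, 2.0, 2.0])
print(k, frac)
```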

Depth-law

Linear fit of γ across layer depth. A negative slope is the common case (shallow layers exponential, deep layers more stretched). The magnitude of slope × R² tells you whether the shape is a stable property of the architecture or just per-layer noise.
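
A depth-law fit of this kind can be sketched with an ordinary least-squares line. The example assumes depth is the 0-based layer index and uses made-up γ values; `depth_law` is a hypothetical name, not the package's function.

```python
import numpy as np


def depth_law(gammas):
    """Fit gamma = slope * layer_index + intercept and report R^2."""
    g = np.asarray(gammas, dtype=float)
    depth = np.arange(g.size, dtype=float)  # 0-based layer index (an assumption)
    slope, intercept = np.polyfit(depth, g, 1)
    predicted = slope * depth + intercept
    ss_res = float(np.sum((g - predicted) ** 2))
    ss_tot = float(np.sum((g - g.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return float(slope), float(intercept), r2


# Made-up per-layer gammas that fall linearly from 1.0 with depth.
layer_gammas = 1.0 - 0.02 * np.arange(16)
slope, intercept, r2 = depth_law(layer_gammas)
print(f"slope={slope:.3f} R2={r2:.3f} slope*R2={slope * r2:.3f}")
```

A steep slope with low R² would point to noisy per-layer fits rather than a consistent trend.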

Suggested bit budget

Shannon-derived bits-per-weight budgets at three conservatism tiers, each bounded by the information-theoretic ceiling. This is a ceiling, not a prediction: real PPL loss from compression depends on the implementation. The diagnostic tells you how much room the math leaves; the actual compression run tells you how close to the ceiling you got.

Status

v0.2 (current): diagnostic metrics + suggested bit budgets, plus diagnose-inference for config-level serving-readiness checks.

v1.0 (coming with Paper 3, ~4 weeks): adds Shannon-efficiency grading — "your model is at X% of the theoretical ceiling vs competitors at Y%."

Same pip install fraqtl-diagnostic — grading is a layer on top of the existing diagnostic, not a separate tool.

How it works (one-paragraph summary)

For each target projection W : ℝ^d_in → ℝ^d_out, we capture the input covariance H = E[x^T x] on wikitext-2 calibration data, then eigendecompose it. The spectrum λ_i encodes how much of the layer's Jacobian mass lives along each eigendirection of the input distribution. A tight universal shape (fixed γ across layers) implies compressible redundancy; a fat-tailed spectrum (high k95/dim) implies less. Shannon rate-distortion gives the information-theoretic floor D*(R) = geomean(λ) · 2^(−2R) at any bit budget R, which the diagnostic reports.
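
The pipeline in this paragraph can be sketched in a few lines of NumPy. This is a toy under stated assumptions, not the package's implementation: activations are stored with one sample per row (so x^T x is a d_in × d_in matrix), random data stands in for wikitext-2 activations, and the function names are hypothetical.

```python
import numpy as np


def input_covariance_spectrum(x):
    """Eigenvalues (descending) of H = E[x^T x], with one sample per row of x."""
    x = np.asarray(x, dtype=np.float64)
    h = x.T @ x / x.shape[0]           # (d_in, d_in) uncentered covariance
    lam = np.linalg.eigvalsh(h)[::-1]  # eigvalsh returns ascending order
    return np.clip(lam, 0.0, None)


def distortion_floor(eigvals, bits):
    """D*(R) = geomean(lambda) * 2**(-2R), the floor quoted in the text."""
    lam = np.asarray(eigvals, dtype=float)
    lam = lam[lam > 0]                 # the geometric mean needs positive values
    geomean = np.exp(np.mean(np.log(lam)))
    return float(geomean * 2.0 ** (-2.0 * bits))


# Stand-in activations: 4096 samples of an 8-dimensional input.
rng = np.random.default_rng(0)
activations = rng.standard_normal((4096, 8))
spectrum = input_covariance_spectrum(activations)
for bits in (2, 3, 4):
    print(bits, distortion_floor(spectrum, bits))
```

Each extra bit lowers the floor by a factor of four, which follows from the 2^(−2R) term.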

Full derivation + universality data across 8 architectures is in the forthcoming Paper 3.

Want to actually compress your model?

The diagnostic tells you the ceiling. The compression engine is the closed part of the product:

fraqtl.ai/compress

License

Apache 2.0.

Release history

0.4.0

May 24, 2026

0.3.0

May 13, 2026

0.2.3

May 13, 2026

0.2.2 yanked

May 12, 2026

Reason this release was yanked:

bugs

0.2.1

May 12, 2026

This version

0.2.0 yanked

May 12, 2026

Reason this release was yanked:

Replaced by 0.2.1. Drops unused HTML output and relaxes Python pin to allow Python 3.14.

0.1.4

May 11, 2026

0.1.3

Apr 23, 2026

0.1.2

Apr 23, 2026

0.1.1

Apr 22, 2026

Download files

Source Distribution

fraqtl_diagnostic-0.2.0.tar.gz (47.2 kB)

Uploaded May 12, 2026 Source

Built Distribution

fraqtl_diagnostic-0.2.0-py3-none-any.whl (45.3 kB)

Uploaded May 12, 2026 Python 3

File details

Details for the file fraqtl_diagnostic-0.2.0.tar.gz.

File metadata

Download URL: fraqtl_diagnostic-0.2.0.tar.gz
Upload date: May 12, 2026
Size: 47.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for fraqtl_diagnostic-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`a78208647aabac61b281a2e7ddefd184c3134a06e709c51c049b2b3fc5da3d32`
MD5	`823691bb6600ba553a29a797aebc976d`
BLAKE2b-256	`69763190901d3fa90fbcbb646101957f906bd6b2589d20578c1acaeceec4ea06`

File details

Details for the file fraqtl_diagnostic-0.2.0-py3-none-any.whl.

File metadata

Download URL: fraqtl_diagnostic-0.2.0-py3-none-any.whl
Upload date: May 12, 2026
Size: 45.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for fraqtl_diagnostic-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`aa048fdf5c3469b749abf101abba2abe1f1a0c48c2226a9f31aab1fec7323cf4`
MD5	`30bef1454adb91c36fc24c6d28522583`
BLAKE2b-256	`fdf3621c7f590f106518e5cc7597d8113956d4b7499cc120acf20080eaa3cb34`

fraqtl-diagnostic 0.2.0
