Quantization quality analyzer — pure-Python GGUF/safetensors parsing, layerwise analysis, quality prediction. Zero deps.
Project description
quantbenchx
Quantization quality analyzer for LLMs. Pure-Python GGUF and safetensors parsing, layerwise sensitivity analysis, quality prediction, and mixed-quantization recommendations — zero dependencies.
Point quantbenchx at any .gguf or .safetensors file and get an instant quality report: dtype distribution, estimated perplexity impact, layer sensitivity scores, and mixed-precision recommendations.
Why quantbenchx?
| Problem | quantbenchx Solution |
|---|---|
| "Is Q4_K_M good enough for my use case?" | Estimated perplexity delta + risk level |
| No way to inspect GGUF internals without llama.cpp | Pure-Python parser — just pip install |
| Which layers are most sensitive to quantization? | Layerwise sensitivity scoring with position awareness |
| Choosing between Q4_K_M, Q5_K_S, Q6_K, etc. | Side-by-side format comparison with rankings |
| Mixed-precision quant is complex to configure | Automated recommendations targeting your bpw budget |
Installation
pip install quantbenchx # zero dependencies
pip install quantbenchx[cli] # + click, rich for terminal UI
pip install quantbenchx[all] # everything
Quick Start
1. Profile a quantized model
from quantbenchx import profile_gguf, estimate_quality
profile = profile_gguf("Meta-Llama-3-8B-Q4_K_M.gguf")
print(f"Model: {profile.name}")
print(f"Size: {profile.size_gb:.2f} GB")
print(f"Avg bits/weight: {profile.quant.avg_bits_per_weight:.2f}")
print(f"Compression: {profile.compression_ratio:.1f}x vs FP32")
quality = estimate_quality(profile)
print(f"Risk level: {quality.risk_level}")
print(f"Est. perplexity delta: +{quality.estimated_perplexity_delta:.4f}")
2. Layerwise analysis
from quantbenchx import profile_gguf, analyze_layers, layer_sensitivity
profile = profile_gguf("model.gguf")
# Get sensitivity scores
sensitivity = layer_sensitivity(profile)
for layer_name, score in sorted(sensitivity.items(), key=lambda x: -x[1])[:5]:
print(f" {layer_name}: {score:.3f}")
# Full layerwise breakdown
for row in analyze_layers(profile):
print(f"{row['name']:30s} {row['avg_bits_per_weight']:5.2f} bpw sens={row['sensitivity']:.3f}")
3. Compare quantization formats
from quantbenchx import profile_gguf, compare_profiles, compare_formats
q4 = profile_gguf("model-Q4_K_M.gguf")
q5 = profile_gguf("model-Q5_K_M.gguf")
q8 = profile_gguf("model-Q8_0.gguf")
# Pairwise comparison
diff = compare_profiles(q4, q8)
print(f"Size delta: {diff['size_delta_bytes'] / 1e9:.2f} GB")
print(f"BPW delta: {diff['bpw_delta']:.2f}")
# Multi-format ranking
ranking = compare_formats([q4, q5, q8])
for row in ranking["ranking"]:
print(f" #{row['rank']} {row['name']} — {row['avg_bpw']:.2f} bpw, {row['size_gb']:.2f} GB")
4. Mixed-quantization recommendations
from quantbenchx import profile_gguf, recommend_mixed_quant
profile = profile_gguf("model.gguf")
rec = recommend_mixed_quant(profile, target_bpw=4.5)
print(f"Target: {rec['target_bpw']} bpw → Estimated: {rec['estimated_avg_bpw']} bpw")
print(f"High precision: {rec['n_high_precision_layers']} layers ({rec['high_quant']})")
print(f"Low precision: {rec['n_low_precision_layers']} layers ({rec['low_quant']})")
5. Predict quality for any bpw
from quantbenchx import perplexity_delta
for bpw in [8.0, 6.0, 5.0, 4.5, 4.0, 3.5, 3.0, 2.0]:
delta = perplexity_delta(bpw)
print(f" {bpw:.1f} bpw → +{delta:.4f} perplexity")
CLI
# Profile a GGUF or safetensors file
quantbenchx profile model.gguf
# Markdown output
quantbenchx profile model.gguf --markdown
# Save JSON report
quantbenchx profile model.gguf -o report.json
# Compare two files
quantbenchx compare model-Q4.gguf model-Q8.gguf
# Layerwise analysis
quantbenchx layers model.gguf
# Mixed-quant recommendation
quantbenchx recommend model.gguf --target-bpw 4.5
Supported Formats
| Format | Parser | Status |
|---|---|---|
| GGUF (v2, v3) | Pure Python — reads header only | Full support |
| safetensors | Pure Python — reads JSON header only | Full support |
Supported Dtypes
Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_1, Q4_K_S, Q4_K_M, Q5_0, Q5_1, Q5_K_S, Q5_K_M, Q6_K, Q8_0, IQ1_S, IQ2_XXS, IQ3_XXS, IQ4_XS, F16, BF16, F32
Architecture
quantbenchx/
├── _types.py # DType, TensorInfo, LayerInfo, ModelProfile, QualityEstimate
├── profile.py # Pure-Python GGUF & safetensors parsers
├── layerwise.py # Layer sensitivity analysis, mixed-quant recommendations
├── compare.py # Cross-format and pairwise comparisons
├── predict.py # Quality estimation from bits-per-weight curves
├── report.py # JSON/text/rich/markdown formatting
└── cli.py # Click CLI interface
See Also
Part of the stef41 LLM toolkit — open-source tools for every stage of the LLM lifecycle:
| Project | What it does |
|---|---|
| tokonomics | Token counting & cost management for LLM APIs |
| datacrux | Training data quality — dedup, PII, contamination |
| castwright | Synthetic instruction data generation |
| datamix | Dataset mixing & curriculum optimization |
| toksight | Tokenizer analysis & comparison |
| trainpulse | Training health monitoring |
| ckpt | Checkpoint inspection, diffing & merging |
| infermark | Inference benchmarking |
| modeldiff | Behavioral regression testing |
| vibesafe | AI-generated code safety scanner |
| injectionguard | Prompt injection detection |
License
Apache 2.0
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file quantbenchx-0.3.0.tar.gz.
File metadata
- Download URL: quantbenchx-0.3.0.tar.gz
- Upload date:
- Size: 47.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
29c341a472b09a7ce6547d469f548203131ca834e85b0492e2e20b0558cce8a2
|
|
| MD5 |
35371896623097801d877570c06c7d79
|
|
| BLAKE2b-256 |
c9450dda5caac3eb724e165c07dfd4ee54e8e1cbb14ca4545a32f38cbb8b486d
|
File details
Details for the file quantbenchx-0.3.0-py3-none-any.whl.
File metadata
- Download URL: quantbenchx-0.3.0-py3-none-any.whl
- Upload date:
- Size: 31.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
cd53882d0ba88150cef4dc0f65400cf3b0e9ac071d159229a05062c283ddf875
|
|
| MD5 |
351cfca6e08b68499b28ec78f21b1cf4
|
|
| BLAKE2b-256 |
40d264130c6ac75aaaa33ffd71cc8d1b60bcef814a2f6fc451a997ed7abca6d9
|