Geometric LLM compression: factor+int4 with verifiable quality certificates

HyperRetro

Geometric LLM compression with verifiable quality certificates.

Install

pip install hyperretro
hyperretro setup          # interactive guided install

Or pick your extras:

pip install "hyperretro[hf]"               # + HuggingFace (compress, export, distill)
pip install "hyperretro[gguf]"             # + GGUF reader (load llama.cpp / Ollama files)
pip install "hyperretro[vllm]"             # + vLLM adapter
pip install "hyperretro[hf,gguf,vllm,bench]"  # full stack

Quick Start

# Compress a model
hyperretro compress Qwen/Qwen2.5-1.5B --ffn-rank 1024 --int4 -o compressed/

# Export to GGUF (llama.cpp / Ollama)
hyperretro export compressed/ --format gguf --quantize Q4_K_M

# Compress directly from a GGUF file
hyperretro compress ./model-q4_k_m.gguf --ffn-rank 256 -o compressed/

# Get a quality certificate
hyperretro certify --model Qwen/Qwen2.5-1.5B --rank 1024 --out cert.json

# Kernel benchmark
hyperretro bench-kernels

# List available model backends
hyperretro list-backends

CLI Commands (13)

Command	Description
`setup`	Interactive install wizard — picks the right extras
`compress`	Compress a model (HF, GGUF, OM) — SVD + int4
`export`	Export to GGUF / safetensors / HF format
`info`	Inspect a checkpoint: tensors, quant, layers
`certify`	Quality certificate: trust tier + jury-proof PPL bounds
`bench-kernels`	Fused dual-Q8_0 GEMV vs baselines
`benchmark`	Full compression benchmark suite
`distill`	GRC light distillation — recover PPL after compression
`gauge`	AxiomGauge — diagonal gauge optimization (free quality)
`card`	Generate HuggingFace model card
`red-team`	Adversarial attack evaluation (GCG/AutoPrompt/PAIR)
`list-backends`	Show available model backends
`--help`	Full help for any subcommand

Python API (32 exports)

Core compression

import hyperretro

# Load any model (HF, GGUF, OpenMythos, vLLM)
model = hyperretro.load_model("Qwen/Qwen2.5-1.5B")
model = hyperretro.load_model("model-q4_k_m.gguf")          # GGUF auto-detected
model = hyperretro.load_model("mythos_1b", backend="openmythos")

# Compress
compressed = hyperretro.compress(model, ffn_rank=1024, int4=True)

# Export
hyperretro.export_model(compressed, "model.gguf", format="gguf")
hyperretro.export_model(compressed, "compressed/", format="safetensors")
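The `ffn_rank` argument suggests that each FFN weight matrix is replaced by two low-rank factors obtained from an SVD. The sketch below is not HyperRetro's implementation; it is a plain NumPy illustration of truncated-SVD factorization and of the resulting parameter ratio. The helper names `factor_low_rank` and `param_ratio`, and the matrix sizes, are hypothetical.

```python
import numpy as np

def factor_low_rank(W, rank):
    """Split W (d_out x d_in) into A (d_out x rank) @ B (rank x d_in) via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.sqrt(s[:rank])          # share each singular value between both factors
    A = U[:, :rank] * root            # scales the columns of U
    B = root[:, None] * Vt[:rank]     # scales the rows of V^T
    return A, B

def param_ratio(d_out, d_in, rank):
    """Parameters in the two factors relative to the dense matrix."""
    return rank * (d_out + d_in) / (d_out * d_in)

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 256))    # illustrative size, not a real model layer
A, B = factor_low_rank(W, rank=16)
print(A.shape, B.shape)               # (64, 16) (16, 256)
print(param_ratio(64, 256, 16))       # 16 * 320 / 16384 = 0.3125
```

Factoring only saves parameters when `rank < d_out * d_in / (d_out + d_in)`, which is why the rank has to be chosen relative to the layer shape.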

Certificates & benchmarks

cert = hyperretro.certify_compression(state_dict, config, stats, model_id="my-model")
print(cert.summary())
# -> "GOLD: max forward error <= 0.34, 94% spectral efficiency"

bench = hyperretro.run_kernel_bench(rows=4096, in_dim=4096, iters=50)
bench = hyperretro.run_compression_bench("Qwen/Qwen2.5-1.5B", out_dir="/tmp/test")

Geometric tools (requires hypercore)

from hyperretro import (
    AxiomGauge,            # GL(d) diagonal gauge optimizer
    ThermalRankController, # temperature-driven rank scheduler
    OnlineOjaBasis,        # rejection-driven adaptive PCA
    NativeLinear,          # train on compressed Gr(k,d) manifold
    RiemannianAdamW,       # manifold-respecting optimizer
    KExpansionScheduler,   # exponential k-warmup
    TreeDrafter,           # Medusa/EAGLE tree speculative decode
    GCGAttack,             # adversarial prompt attack
)

Kernels

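import numpy as np
import hyperretro

# Example inputs. Shapes and dtype are illustrative assumptions; check the
# package documentation for the layouts the kernels actually expect.
rng = np.random.default_rng(0)
W = rng.standard_normal((4096, 4096)).astype(np.float32)
Wa, Wb = W, rng.standard_normal((4096, 4096)).astype(np.float32)
x = rng.standard_normal(4096).astype(np.float32)

scales, codes = hyperretro.q8_0_quantize(W)
W_back = hyperretro.q8_0_dequantize(scales, codes)
y_a, y_b = hyperretro.gemv_dual_q8_0(x, Wa, Wb)   # fused dual GEMV
bk = hyperretro.kernels_backend()                   # -> 'gpu' / 'torch' / 'numpy'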
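For readers unfamiliar with Q8_0, here is a minimal NumPy reference of the scheme as used by ggml/llama.cpp: values are grouped into blocks of 32, each block stores one scale (absmax / 127) plus 32 signed 8-bit codes. This is a sketch under assumptions, not HyperRetro's kernel. The `_ref` function names are hypothetical, the scale is kept in float32 for simplicity (ggml stores it as fp16), and the dual GEMV here simply reuses `x` for two matrices without any real fusion.

```python
import numpy as np

BLOCK = 32  # Q8_0 block size in ggml/llama.cpp

def q8_0_quantize_ref(W):
    """Quantize a 2-D float array row-wise in blocks of 32 values."""
    rows, cols = W.shape
    assert cols % BLOCK == 0, "column count must be a multiple of 32"
    blocks = W.reshape(rows, cols // BLOCK, BLOCK).astype(np.float32)
    scales = np.abs(blocks).max(axis=-1, keepdims=True) / 127.0
    safe = np.where(scales == 0.0, 1.0, scales)       # avoid dividing by zero
    codes = np.rint(blocks / safe).astype(np.int8)    # values lie in [-127, 127]
    return scales, codes

def q8_0_dequantize_ref(scales, codes):
    """Rebuild a float32 matrix from per-block scales and int8 codes."""
    blocks = scales * codes.astype(np.float32)
    return blocks.reshape(blocks.shape[0], -1)

def gemv_dual_ref(x, qa, qb):
    """Compute Wa @ x and Wb @ x from two quantized matrices sharing one input."""
    return q8_0_dequantize_ref(*qa) @ x, q8_0_dequantize_ref(*qb) @ x

rng = np.random.default_rng(0)
Wa = rng.standard_normal((8, 64)).astype(np.float32)
Wb = rng.standard_normal((8, 64)).astype(np.float32)
x = rng.standard_normal(64).astype(np.float32)

qa, qb = q8_0_quantize_ref(Wa), q8_0_quantize_ref(Wb)
y_a, y_b = gemv_dual_ref(x, qa, qb)
max_err = np.abs(Wa - q8_0_dequantize_ref(*qa)).max()
print(y_a.shape, y_b.shape, max_err)
```

The round-trip error per element is at most half of that block's scale, so blocks with one large outlier lose the most precision.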

Model Backends (4)

Backend	Load from	Extra
HuggingFace	Repo ID, local dir, safetensors	built-in
GGUF	`.gguf` files (llama.cpp / Ollama)	`[gguf]`
OpenMythos	OpenMythos models	`[om]`
vLLM	vLLM LLM instances	`[vllm]`

Auto-detection: .gguf extension -> GGUF, mythos_ prefix -> OpenMythos, / in name -> HF.
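The auto-detection rule can be pictured as a short chain of string checks. The function below is a hypothetical sketch, not the package's code: `detect_backend`, the returned labels other than `"openmythos"`, the case-insensitive extension match, and the `"unknown"` fallback are all assumptions. The check order matters, because a path such as `./model-q4_k_m.gguf` contains a `/` but should still resolve to GGUF.

```python
def detect_backend(name: str) -> str:
    """Guess a model backend from a model name or path (illustrative only)."""
    if name.lower().endswith(".gguf"):
        return "gguf"            # checked first: GGUF paths may also contain "/"
    if name.startswith("mythos_"):
        return "openmythos"
    if "/" in name:
        return "huggingface"     # e.g. "org/repo" style repo IDs
    return "unknown"             # assumption: the real fallback is not documented here

for candidate in ["Qwen/Qwen2.5-1.5B", "./model-q4_k_m.gguf", "mythos_1b"]:
    print(candidate, "->", detect_backend(candidate))
```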

Kernel Backends (6-tier)

#	Backend	Description	Requirements
1	`cuda_cext`	Raw CUDA kernel (fastest)	NVCC + host compiler
2	`cext`	JIT C++ extension	C++ compiler
3	`cpu_opt`	Pre-compiled AVX2	x86_64 CPU
4	`gpu`	Pure-PyTorch CUDA (9.5x numpy)	PyTorch + CUDA
5	`torch`	Pure-PyTorch CPU	PyTorch
6	`numpy`	Always available	nothing

Set HYPERRETRO_FORCE_FALLBACK=1 to force numpy path.
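A tiered backend list like this is typically resolved by walking the tiers in priority order and taking the first one that is usable. The sketch below shows that pattern together with the `HYPERRETRO_FORCE_FALLBACK` override. It is an assumption about the general shape of the logic, not the package's implementation; `pick_kernel_backend` and the `available` argument are hypothetical.

```python
import os

# Tier names in priority order, taken from the table above.
TIERS = ["cuda_cext", "cext", "cpu_opt", "gpu", "torch", "numpy"]

def pick_kernel_backend(available, env=None):
    """Return the highest-priority tier present in `available`."""
    env = os.environ if env is None else env
    if env.get("HYPERRETRO_FORCE_FALLBACK") == "1":
        return "numpy"                      # override: skip every faster tier
    for tier in TIERS:
        if tier in available:
            return tier
    return "numpy"                          # numpy is listed as always available

print(pick_kernel_backend({"gpu", "torch", "numpy"}, env={}))
print(pick_kernel_backend({"gpu", "torch", "numpy"},
                          env={"HYPERRETRO_FORCE_FALLBACK": "1"}))
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.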

Certificate System

HyperRetro is the only compression tool that produces mathematically verifiable quality certificates:

Trust tier: PLATINUM / GOLD / SILVER / BRONZE
BP-NS bound: per-layer forward-error bound (Eckart-Young)
Spectral efficiency: information retained per parameter
Frobenius certificates: relative error in weight space (Q/K/V)
Jury-proof PPL bounds: strict worst-case + concentration bound
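The Eckart-Young theorem behind the per-layer bound can be checked numerically. For the best rank-k approximation W_k of W, the spectral-norm error equals the (k+1)-th singular value, so the forward error on any input x is at most that singular value times the norm of x. The NumPy sketch below illustrates only this textbook result and the weight-space Frobenius relative error; how HyperRetro assembles its BP-NS bound or trust tiers is not shown here, and the matrix sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((48, 96))
k = 8

U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_k = (U[:, :k] * s[:k]) @ Vt[:k]              # best rank-k approximation

# Spectral-norm error equals the first discarded singular value s[k].
spectral_err = np.linalg.norm(W - W_k, 2)

# Frobenius relative error depends only on the discarded spectrum.
frob_rel = np.linalg.norm(W - W_k) / np.linalg.norm(W)
frob_rel_from_spectrum = np.sqrt(np.sum(s[k:] ** 2) / np.sum(s ** 2))

# Forward error on one input is bounded by s[k] * ||x||.
x = rng.standard_normal(96)
forward_err = np.linalg.norm((W - W_k) @ x)
bound = s[k] * np.linalg.norm(x)

print(np.isclose(spectral_err, s[k]), forward_err <= bound)
```

Because both quantities come straight from the singular values, a certificate of this kind can be recomputed by anyone holding the original weights.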

Benchmarks

Config	PPL	Disk	Shrink
fp16 baseline (Qwen2.5-1.5B)	2.33	2955 MB	1.00x
Aware-factored fp16	4.39	2581 MB	1.15x
int4 FFN-only + AWQ	6.04	1242 MB	2.38x

GPU: RTX 4070, dual-Q8 GEMV 4096x4096 = 21.5ms (9.5x vs CPU).

Contributing

git clone https://github.com/NagusameCS/HyperTensor.git
cd HyperTensor
pip install -e ".[dev]"
pytest tests/ -v

License

MIT — see LICENSE.

Release history

Version	Released
0.3.6	May 17, 2026 (this version)
0.3.5	May 16, 2026
0.3.4	May 16, 2026
0.3.3	May 15, 2026
0.3.2	May 15, 2026
0.3.1	May 15, 2026
0.3.0	May 15, 2026

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

hyperretro-0.3.5-py3-none-any.whl (5.9 kB)

Uploaded May 16, 2026 Python 3

File details

Details for the file hyperretro-0.3.5-py3-none-any.whl.

File metadata

Download URL: hyperretro-0.3.5-py3-none-any.whl
Upload date: May 16, 2026
Size: 5.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for hyperretro-0.3.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ec004d5e05cf950b1382849c41849d4a4d9a30ebb8095f8cc007e9986adf4593`
MD5	`da717314bdd5946929d28987b803e14c`
BLAKE2b-256	`2389b0df57c268c4369b679c311f376dc9e6070c9d06a1cdce9c9c9f440ce664`
