Geometric LLM compression: factor+int4 with verifiable quality certificates

HyperRetro

HyperTensor, retrofitted into the PyTorch / HuggingFace / vLLM ecosystem.

HyperTensor proper is a standalone runtime. HyperRetro is the integrated sibling project: it takes the same geometric primitives (UGT shared basis, GRC / sink-aware projection, geodesic speculative draft, fused dual-Q8 GEMV) and exposes them as drop-in pieces of the standard inference stack.

hyperretro/
├── kernels/        # PyTorch C++ extension (gemv_dual_q8_0, ...)
├── hf/             # offline HuggingFace compression -> .safetensors
├── vllm/           # speculative-decoding draft adapter
└── bench/          # 3-way benchmark harness (baseline | retro | HyperTensor)

Three retrofits

1. Fused kernels as a PyTorch extension

The CUDA kernel kernel_gemv_dual_q8_0 from runtime/nn/cuda_kernels.cu is wrapped as a JIT-built torch.utils.cpp_extension module, so it can be called from regular PyTorch:

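import torch
import hyperretro

x = torch.randn(4096)
# Wa, Wb may be float matrices or pre-quantized (scale, codes) tuples.
# The random float weights below are illustrative placeholders.
Wa = torch.randn(4096, 4096)
Wb = torch.randn(4096, 4096)
out_a, out_b = hyperretro.gemv_dual_q8_0(x, Wa, Wb)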
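
To make the arithmetic concrete, here is a minimal NumPy sketch of what a dual Q8_0 matrix-vector product computes. It assumes the ggml-style Q8_0 layout (blocks of 32 values, one scale per block, int8 codes); whether HyperRetro uses exactly this layout is an assumption, and the helper names quantize_q8_0 and gemv_dual_q8_0_ref are hypothetical, not part of the package.

```python
import numpy as np

BLOCK = 32  # block size of ggml's Q8_0 format; assumed to apply here as well


def quantize_q8_0(W):
    """Quantize a (rows, in_dim) float matrix into per-block scales and int8 codes."""
    rows, cols = W.shape
    assert cols % BLOCK == 0, "in_dim must be a multiple of the block size"
    blocks = W.reshape(rows, cols // BLOCK, BLOCK)
    scale = np.abs(blocks).max(axis=-1) / 127.0      # (rows, nblocks)
    safe = np.where(scale == 0.0, 1.0, scale)        # avoid dividing by zero
    codes = np.rint(blocks / safe[..., None]).astype(np.int8)
    return scale.astype(np.float32), codes


def gemv_dual_q8_0_ref(x, Wa, Wb):
    """Compute both products from a single blocked view of x (reference only)."""
    xb = np.asarray(x, dtype=np.float32).reshape(-1, BLOCK)
    outs = []
    for scale, codes in (Wa, Wb):
        # Per-block dot product on the int8 codes, then rescale and sum over blocks.
        partial = np.einsum("rbk,bk->rb", codes.astype(np.float32), xb)
        outs.append((partial * scale).sum(axis=1))
    return outs[0], outs[1]


rng = np.random.default_rng(0)
x = rng.standard_normal(256)
Wa_f = rng.standard_normal((8, 256))
Wb_f = rng.standard_normal((8, 256))
out_a, out_b = gemv_dual_q8_0_ref(x, quantize_q8_0(Wa_f), quantize_q8_0(Wb_f))
```

The point of fusing the two products is that x is read once and reused for both weight matrices; the sketch mirrors that by building the blocked view of x a single time.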

Backend resolution: cext (JIT-compiled C++ extension) → torch (pure-PyTorch reference) → numpy (always works). The first backend that loads is used. Set HYPERRETRO_FORCE_FALLBACK=1 to force the fallback.
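
A fallback chain like this can be sketched in a few lines. This is an illustration only, not the package's actual loader: the function resolve_backend is hypothetical, and it assumes that forcing the fallback skips just the compiled extension and continues with torch, then numpy.

```python
import os

PREFERENCE = ["cext", "torch", "numpy"]  # order taken from the README


def resolve_backend(available, env=None):
    """Return the first usable backend name, honoring HYPERRETRO_FORCE_FALLBACK."""
    env = os.environ if env is None else env
    order = list(PREFERENCE)
    if env.get("HYPERRETRO_FORCE_FALLBACK") == "1":
        # Assumption: the flag only disables the compiled extension.
        order.remove("cext")
    for name in order:
        if name in available:
            return name
    raise RuntimeError("no usable backend found")


print(resolve_backend({"cext", "torch", "numpy"}, env={}))
print(resolve_backend({"cext", "torch", "numpy"}, env={"HYPERRETRO_FORCE_FALLBACK": "1"}))
```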

2. Offline HuggingFace compression

A single CLI takes a vanilla HF model, runs the GRC projection / sink-aware GRC pipeline (Paper E), and writes the result back out as standard .safetensors shards that load with stock AutoModelForCausalLM.from_pretrained:

pip install -e "hyperretro[hf]"
hyperretro-compress \
    --model Qwen/Qwen2.5-0.5B-Instruct \
    --out ./qwen-grc-1024/ \
    --rank 1024 \
    --sink 4

The output directory is 100% HuggingFace-native: no HyperTensor runtime is needed at inference time. A hyperretro_report.json file is written alongside it, recording the per-layer Frobenius relative error.
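
For readers unfamiliar with the metric, the Frobenius relative error of an approximation W_hat to a weight matrix W is ||W - W_hat||_F / ||W||_F. The sketch below computes it for a plain truncated SVD, which stands in for the GRC projection here; the actual pipeline from Paper E is not reproduced, and the function names are hypothetical.

```python
import numpy as np


def truncated_svd(W, rank):
    """Best rank-`rank` approximation of W in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]


def frobenius_rel_err(W, W_hat):
    """Relative reconstruction error ||W - W_hat||_F / ||W||_F."""
    return float(np.linalg.norm(W - W_hat) / np.linalg.norm(W))


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))  # toy stand-in for one layer's weight matrix
errs = {r: frobenius_rel_err(W, truncated_svd(W, r)) for r in (8, 24, 48)}
for r, e in errs.items():
    print(f"rank {r:3d}: rel-err {e:.4f}")
```

The error shrinks as the rank grows and reaches floating-point noise at full rank, which is the behavior to expect from a per-layer report when comparing different --rank settings.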

3. Geodesic speculative draft for vLLM

hyperretro.vllm.GeodesicDraft replaces the random or smaller-model draft proposer in vLLM-style speculative decoding with the geodesic-step draft from Paper C. The adapter itself is framework-agnostic, exposing propose(h_curr, h_prev) -> (token_ids, confidences), and includes a register_with_vllm() hook for live deployments.
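
The shape of that interface can be illustrated with a toy proposer. The sketch below is not the geodesic step from Paper C: it substitutes a straight-line extrapolation of the hidden state, and the class name, the unembedding matrix, and the top_k parameter are all assumptions made for illustration.

```python
import numpy as np


class LinearExtrapolationDraft:
    """Toy draft proposer with a propose(h_curr, h_prev) interface."""

    def __init__(self, unembed, top_k=4):
        self.unembed = np.asarray(unembed, dtype=np.float64)  # (vocab, d_model)
        self.top_k = top_k

    def propose(self, h_curr, h_prev):
        # Stand-in for the geodesic step: continue along the last hidden-state delta.
        h_next = h_curr + (h_curr - h_prev)
        logits = self.unembed @ h_next
        logits = logits - logits.max()        # numerically stable softmax
        probs = np.exp(logits)
        probs /= probs.sum()
        token_ids = np.argsort(-probs)[: self.top_k]
        return token_ids, probs[token_ids]


rng = np.random.default_rng(0)
draft = LinearExtrapolationDraft(rng.standard_normal((2048, 512)), top_k=4)
token_ids, confidences = draft.propose(rng.standard_normal(512), rng.standard_normal(512))
```

Because the proposer only needs two consecutive hidden states, it runs without a second draft model; the verifying model still decides which proposed tokens are accepted.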

Benchmarks

hyperretro-bench kernel  --rows 4096 --in-dim 4096
hyperretro-bench spec    --d-model 512 --k 64 --vocab 2048 --steps 64
hyperretro-bench compress --model Qwen/Qwen2.5-0.5B --out /tmp/qwen-retro \
                         --rank 256 --eval-text "The quick brown fox..."

Each subcommand emits a JSON report comparing the standard baseline, HyperRetro, and (where applicable) standalone HyperTensor.

License

MIT for code, CC-BY-4.0 for the accompanying documentation and papers, the same as the parent HyperTensor project.

Release history

0.3.6: May 17, 2026
0.3.5: May 16, 2026
0.3.4: May 16, 2026 (this version)
0.3.3: May 15, 2026
0.3.2: May 15, 2026
0.3.1: May 15, 2026
0.3.0: May 15, 2026

Download files

Source distributions: none available for this release.

Built distribution: hyperretro-0.3.4-py3-none-any.whl (5.0 kB), uploaded May 16, 2026, Python 3.

File details

Details for the file hyperretro-0.3.4-py3-none-any.whl.

File metadata

Download URL: hyperretro-0.3.4-py3-none-any.whl
Upload date: May 16, 2026
Size: 5.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for hyperretro-0.3.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`da9e7ed04512c642443dbf44ec1ab07def67d9a4ac5f571346016519f9d157f0`
MD5	`bba58f2e61714a922ed37300d7de128a`
BLAKE2b-256	`44c4bedd7d6d55b190f49d5697fb81328a996eb363eab877edf7b42f14470708`
