Lossless 5-bit transformer compression — 8 architectures (1.7B–70B, dense + MoE) at sub-1% PPL degradation. Customer-distributable via `uc pack v0.3`.

These details have not been verified by PyPI

Project links

Project description

UltraCompress

Extreme LLM compression — three independent mechanisms that compose multiplicatively: (A) Per-layer streaming compression — Qwen2.5-72B → 8.98 GB peak VRAM on a single RTX 5090, PPL ratio 1.0162×. (B) Sub-3-bpw row-overlay weight compression (Claims 17–20) — beats bitsandbytes nf4 at 30% fewer bits on a 6-model cohort. (C) Fractal Residual Recursion (FRR) (Claims 1–16) — shared-block architectural compression at 311–734×.

⭐ Latest — Streaming compression: full Qwen scaling curve, 72B on a single GPU (2026-05-04)

Per-layer streaming compression validated end-to-end across 8B → 72B with peak VRAM bounded by ~one transformer layer regardless of total model depth. Production-grade quality (PPL ratio ≤ 1.05) at every scale; Qwen2.5-72B compressed to 8.98 GB peak VRAM on a single RTX 5090 with 1.6% PPL drift.

Model	Layers	Baseline PPL	Compressed PPL	PPL ratio	Peak VRAM	Status
Qwen3-8B	36	16.79	17.26	1.0278×	2.26 GB	PROD
Qwen3-14B	40	15.44	15.61	1.0111×	3.37 GB	PROD (best)
Qwen3-32B	64	13.77	14.27	1.0367×	4.85 GB	PROD
Qwen2.5-72B	80	8.92	9.07	1.0162×	8.98 GB	PROD (headline)

Recipe: GSQ scalar 5 bpw + per-block (B=64) absmax + V18-C rank-32 low-rank correction overlay + 200-step KL distillation per layer. Process: load layer fp16 weights via safetensors lazy load → cache teacher hidden output → quantize → fit V18-C against cache → save → free → next layer. Compression time scales linearly: ~1 min/layer overhead.

Bigger models compress at least as well as smaller ones — empirically consistent with arxiv 2505.02214 within the Qwen family. The 100T-on-1-GPU target is now a math problem (multiplicative composition with Track B substrate sharing + Track C inference streaming), not a prayer.

Reproduce on Qwen3-8B (~9 min on a 5090):

python scripts/overlay/streaming_compression_runner.py \
    --model qwen3-8b --bpw 5 --block_size 64 --rank 32 \
    --train_steps 200 --n_calib 100 --n_eval 50

Result JSONs under scripts/overlay/artifacts/streaming_compression_{8b,14b,32b,72b}_smoke.json. Patent supplement covering streaming-compression mechanism filed 2026-05.

Latest — Claim 20: Row-overlay vs external quantizers (n=500, 6-model cohort)

Head-to-head LAMBADA benchmark against two independent external quantization families (bitsandbytes + HQQ). 48 measurements = 6 models × 8 methods × n=500 samples.

method	bpw	cohort T1-retention	median ppl-ratio
bnb_int8	8.000	99.75%	1.005
bnb_nf4	4.000	98.31%	1.054
hqq_4bit_g64	4.500	97.72%	1.078
our_mixed_2p79	2.798	95.63%	1.131
our_fp8_2p79	2.795	95.57%	1.128
hqq_3bit_g64	3.500	72.46%	1.608
hqq_2bit_g16	4.000	34.82%	17.14
hqq_2bit_g64	2.500	3.46%	5284.48

Production tier ladder (Qwen3-8B, validated 2026-05-02):

Operating point	T1 retention	PPL ratio	Compression	Verdict
6 bpw (GSQ k-means)	96.72%	1.0024	2.67x	Zero-degradation tier
5 bpw (GSQ + low-rank correction)	94.39%	1.003	3.2x	Production-grade (default)
5 bpw + additional compression on correction overhead	94.40%	1.0029	3.2x weights + 1.30x correction	Composable stack proof
4 bpw	90.14%	1.014	4.0x	Light degradation
3 bpw	80.97%	1.084	5.3x	Aggressive

The 6 bpw tier is effectively lossless (PPL ratio 1.0024). Three independent compression mechanisms compose multiplicatively without quality loss -- the correction overhead itself compresses an additional 1.30x at storage level with no T1 impact.

Scaling validation (Qwen3-14B, 4 bpw + correction compression + teacher distillation): 88.41% T1 at PPL ratio 0.9752. Full production stack holds at scale.

Headline results at 7–8B scale:

model	ours @ 2.80 bpw	vs bnb_nf4 @ 4.00 bpw	vs hqq_4bit @ 4.50 bpw	bits saved
Qwen3-8B	97.57% T1-ret	−0.67 pp	−1.17 pp	30–38%
Mistral-7B	98.03% T1-ret	−1.32 pp	−0.73 pp	30–38%

Qualitative differentiator. HQQ produces catastrophic failures (ppl-ratio > 10×) on 6/6 models at 2-bit g64 and 4/6 at 2-bit g16. Our row-overlay produces zero catastrophic failures across all 48 measurements.

Full results: RESULTS.md § Claim 20 · PATENT_CLAIMS.md § Claim 20 · raw data: results/h2h_n500_full.json · analysis: docs/claim20_summary.txt.

Reproduce: python scripts/overlay/benchmark_head_to_head.py --methods our_fp8_2p79,our_mixed_2p79,bnb_nf4,bnb_int8,hqq_4bit_g64,hqq_3bit_g64,hqq_2bit_g64,hqq_2bit_g16 --n 500.

Track B — FRR architectural compression (held-out, 1000 samples, seed 42)

Independent re-evaluation on a held-out region of FineWeb-Edu that was least-touched during training. Protocol: 1000 samples, 128-token context, seed 42, bootstrap 95% CIs. Reproduce in ~15 minutes on a single 32GB GPU: python scripts/frr/hires_eval.py --tags hq5_h256 hq5_h128 --n 1000.

Variant	Trainable	Compression	all-T1	all-T10	last-T10	Quality	PPL ratio
HQ5 h256	1,509,916	311×	55.40%	69.64%	64.24%	75.94%	1.216
HQ5 h128	640,284	734×	53.78%	68.00%	62.36%	73.86%	1.254

Interpretation. The h256 student has 0.088% of the teacher's trainable parameters and reproduces its top-10 next-token set 69.64% of the time on unseen text. The h128 student has 0.037% of the teacher's parameters and still reproduces 68.00%. For reference, the typical distillation baseline (DistilBERT / TinyBERT family) achieves 2–7× compression at similar quality; FRR-HQ5 is ~50× beyond that frontier.

Full results: results/hires_results_hq5.json. Pitch for business use: docs/PITCH.md.

Rigor & reproducibility

The numbers above are in-distribution held-out (training samples from the full 500M-token range, eval samples from the tail 50M with a different seed). To defend against a stricter reviewer we also ship:

⭐ Fully-disjoint eval on WikiText-103 test split — DONE. python scripts/overlay/wikitext_eval.py --tags hq5_h256 hq5_h128 --n 1000. WikiText-103 test was never touched during training and is a standard public benchmark. Result: on WT103 HQ5-h256 scores T1 = 55.53% (vs 55.40% in-domain) and T10 = 66.82% (vs 69.64% in-domain). Top-1 agreement is within 0.13 percentage points of the in-domain number — strong evidence the student learned the teacher's distribution rather than just the FineWeb-Edu surface statistics. Raw data: results/wikitext_results.json.
Matched-parameter standard-KD baseline — python scripts/frr/run_baseline_distill.py --h 256 --n_layers 2 --steps 80000 --tag baseline_h256_L2. Trains a vanilla transformer student at the same ~1.5M trainable params using classical Hinton-2015 distillation. Head-to-head delta proves the nested-fractal + entropy-weighted loss is load-bearing.
Pinned dependencies — see requirements.txt for exact versions (torch 2.11.0+cu128, transformers 4.57.2, datasets 4.8.4, numpy 2.2.6).
Full reproduce guide — see REPRODUCE.md for step-by-step.

15-minute interactive demo

python demo.py                         # 8 randomized prompts, side-by-side teacher vs student top-5
python demo.py --prompt "your text"    # single-prompt mode
python demo.py --tag hq5_h128          # 734× model instead of 311×

Training Results (per-run training-eval ceilings, 80K steps each)

Variant	Trainable	Compression	Best T1	Best all-T10	Peak T1	Peak all-T10	Quality
HQ5 h256	1.51 M	311×	55.1%	70.0%	57.0%	70.0%	70.0%
HQ5 h128	0.64 M	734×	54.0%	68.4%	54.4%	68.4%	68.4%
HQ4 h256	1.51 M	311×	54.3%	69.2%	55.7%	69.6%	68.9%
HQ4 h128	0.64 M	734×	53.4%	68.0%	55.7%	68.6%	66.9%
HQ3 h256	1.51 M	311×	54.1%	68.2%	54.7%	68.2%	68.1%
HQ3 h128	0.64 M	734×	54.2%	68.0%	54.2%	68.0%	67.7%

HQ5 h256 is the current flagship. First checkpoint to cross 70% quality on Qwen3-1.7B distillation. Details: docs/HQ5_RESULTS.md, docs/HQ4_RESULTS.md, docs/HQ3_RESULTS.md. Currently training: HQ6 (dual GPU, ENT_POW=2.0) and HQ7 long-horizon (160K steps).

ASVD head fine-tuning (trained separately — stackable with FRR body)

Rank (r)	Head compression	T1	T10	PPL ratio
r=1024	2.0×	91.66%	92.57%	1.345
r=512	3.9×	87.73%	88.93%	2.570
r=256	7.9×	83.22%	82.83%	3.885

r=1024 exceeds the user's 70% T1 / 90% T10 goal on head-only evaluation — see docs/STATUS.md.

⭐ End-to-end stack: FRR body + ASVD head combined (1000 samples, seed 42)

Full end-to-end compression — the actual deployment artifact. FRR body (HQ5 h256) with its output projection replaced by a rank-reduced ASVD head, then fine-tuned.

Config	Params	Compression	all-T1	all-T10	last-T10	PPL ratio	Quality
Teacher (Qwen3-1.7B body)	1092.1 M	1.0×	100%	100%	100%	1.000	100%
HQ5 h256 + full head	312.67 M	3.5×	55.40%	69.64%	64.24%	1.216	75.94%
HQ5 h256 + ASVD r=1024	159.19 M	6.9×	54.91%	69.51%	64.03%	1.410	70.22%
HQ5 h256 + ASVD r=512	80.35 M	13.6×	54.46%	68.98%	64.05%	2.400	55.33%
HQ5 h256 + ASVD r=256	40.93 M	26.7×	53.88%	68.32%	63.40%	3.172	49.92%

Interpretation. The FRR+ASVD end-to-end stack at 26.7× total compression still reproduces 68.32% of the teacher's top-10 next-token set — within 1.3 percentage points of the uncompressed-head baseline. This is the number to compare against GPTQ/AWQ/pruning in public benchmarks. Raw data: results/combined_stack_results_hq5.json.

Pareto frontier — pick your operating point

Compression vs Fidelity

Customer picks where on the curve to land. Existing compression methods (GPTQ, AWQ, SparseGPT, DistilBERT) all cluster at 1.5–7.5× compression with >95% fidelity. FRR+ASVD extends the frontier by an order of magnitude into the 3–27× regime, with a graceful quality–compression trade-off rather than a cliff:

Quality-first deployment (3–7× compression): hq5_h256+full_head or hq5_h256+asvd_r1024_ft — 70% quality with 7× fewer parameters. Appropriate for latency-critical production inference.
Balanced deployment (8–14× compression): hq5_h128 or hq5_h256+asvd_r512_ft — 68–69% T10 with under 80M parameters. Appropriate for edge GPU boxes, 8GB Apple Silicon.
Aggressive deployment (27× compression): hq5_h256+asvd_r256_ft — the 40.9M-parameter model; targets phones, Raspberry Pi class hardware. Quality drops to 50% — appropriate for offline / retrieval-augmented / constrained-vocabulary use cases only.

Raw Pareto data: docs/pareto_frontier.json. Reproduce the chart: python scripts/frr/make_pareto_chart.py.

Cross-model generality (scaling the method)

The method is architecture-agnostic. This release includes:

scaling/teacher_loader.py — auto-detecting Qwen3-family loader. Point it at any cached Qwen3 state dict; it infers hidden size, layer count, head counts, and intermediate dim from the tensors.
scripts/frr/run_frr_generic.py — generic trainer with --teacher_cache flag. Drop-in replacement for the hardcoded 1.7B trainer.
scripts/frr/scale_eval.py — model-agnostic eval with bootstrap CIs.
tests/test_sanity.py — 6-test regression guard (teacher auto-detect on both 0.6B and 1.7B caches, forward determinism, flagship checkpoint reproducibility, random-init floor, ckpt roundtrip).

Verified on both Qwen3-0.6B (hidden=1024) and Qwen3-1.7B (hidden=2048) state dicts. See docs/SCALING_PLAN.md for the cross-scale experimental matrix and docs/KNOWN_ISSUES.md for honest disclosures.

How It Works

Most compression asks: "How do I make these weights smaller?" FRR asks: "Do I even need different weights per layer?"

Adjacent transformer layers show near-zero weight cosine similarity (~0.001) but CKA > 0.9 (functional similarity). FRR learns the shared functional form once and uses lightweight per-scale modulation to induce layer-specific behavior.

Traditional Transformer           FRR Compressed Model
========================          ==========================

Input                             Input
  │                                 │
  ▼                                 ▼
[Layer 0 weights: 54 MB]          [Shared Block: 0.64–1.51 M params]
  │                                 │ + γ₀, β₀ (per-scale)
  ▼                                 ▼
[Layer 1 weights: 54 MB]          [Same Shared Block]
  │                                 │ + γ₁, β₁
  ▼                                 ▼
  ...  (28 layers)                  ...  (4 scales × 7 iterations)
  │                                 │
  ▼                                 ▼
Output                            Output

Total body: 1,410 MB              Total body: 2.56–6.04 MB

Shared-weight (looped) transformers are Turing-complete (Giannou et al., 2023).

Training Objective — the HQ4/HQ5 Ceiling Break

HQ3 plateaued at T1 ≈ 54% because its confidence-weighted CE + margin loss concentrated gradient on tokens the student had already saturated. HQ4 inverts that signal; HQ5 sharpens it further:

hard_weight  = (1 + H(teacher_logits)) ^ entropy_power
total_loss   = hard_weight · fkl
             + 0.3 · rkl
             + latent_w(step) · latent_mse        # 1.0 → 0.1 across steps 20K→50K
             + 0.5 · ce_ramp(step) · ce           # 0.5 → 1.0 across 16K→48K
             + 0.3 · ce_ramp · hard_weight · margin_loss

Two mechanisms working together:

Inverted weighting forces gradient into high-entropy positions — exactly where T10 gains live.
Latent decay releases the mean-seeking attractor so the ce+margin signal can shape the output distribution rather than just the intermediate latents.

Experiment Timeline

Stage	Compression	T1	all-T10	Status	Notes
Baseline	52×	47%	62–65%	Done	Pure-KL distillation
TinyFRR	311–2200×	43–46%	60–64%	Done	Compression sweep h=16…1024
HQ2	311–734×	~50%	67%	Done	Adds hidden-state latent alignment
HQ3	311–734×	54.2%	68.2%	Done	5-loss w/ confidence-weighted CE+margin
HQ4	311–734×	54.3%	69.2%	Done	Inverted entropy weighting + latent decay
HQ5	311–734×	55.1%	70.0%	Done, public	Stronger entropy_power (1.5) + per-width latent floor
HQ6	311–734×	TBD	TBD	Training	ENT_POW=2.0 (h256) + h384 capacity test

Full training logs: logs/ (hq{3,4,5,6}_h{128,256,384}.log).

Quick Start

git clone https://github.com/mounnar/ultracompress.git
cd ultracompress
pip install -r requirements.txt

# 1. Cache the teacher (one-time, ~7 GB for Qwen3-1.7B)
python tools/download_models.py

# 2. Pre-tokenize training data (one-time, ~2 GB for 500M tokens)
python prepare_500M_tokens.py

# 3. Train TinyFRR body with the HQ4 ceiling-break objective
python scripts/frr/run_hq4_ceiling_break.py --h 256 --steps 80000 --tag my_run

# 4. (Optional) Dual-GPU detached launch
python scripts/frr/launch_hq4_detached.py     # spawns h=128 on GPU 0, h=256 on GPU 1

# 5. Fine-tune an ASVD-factored lm_head
python finetune_asvd_head.py --r 1024 --steps 20000 --tag asvd_r1024_ft

Resume support

All run_hq*.py scripts save {ckpt_dir}/latest.pt every 2000 steps. Relaunching the same command auto-resumes.

Detached training on Windows

scripts/frr/launch_hq4_detached.py / scripts/frr/launch_hq5_detached.py use subprocess.Popen with DETACHED_PROCESS | CREATE_BREAKAWAY_FROM_JOB | CREATE_NEW_PROCESS_GROUP so training survives terminal closure, VS Code restart, and parent-shell kills.

Repository Layout

ultracompress/
├── README.md                     This file
├── RESULTS.md                    Per-claim measurement record (Claims 1-20)
├── PATENT_CLAIMS.md              Full patent claims file (20 claims)
├── REPRODUCE.md                  Step-by-step reproduction guide
├── CONTRIBUTING.md               Contribution guide
├── LICENSE                       Apache 2.0
├── requirements.txt              Pinned deps (torch 2.11+cu128, transformers 4.57)
├── pyproject.toml                Package metadata
├── demo.py                       Interactive teacher-vs-student demo
├── serve.py                      Minimal inference server
├── ultracompress.py              CLI entry point
│
├── ultracompress/                Core library (FractalModel, pipeline, coding)
├── scaling/                      Cross-model teacher loaders (Qwen3 family)
├── lib/                          Shared utilities
├── tools/                        Model download, quantization utilities
├── tests/                        Regression tests
│
├── scripts/overlay/              ★ Track A — row-overlay (Claims 17-20)
│   ├── benchmark_head_to_head.py   Unified bnb + HQQ + ours harness
│   ├── _analyze_claim20.py         Claim-20 merge + summary generator
│   ├── lambada_overlay*.py         Overlay drivers (sparse / fp8 / mixed)
│   ├── fit_v17_hifi.py             v17 weight-row fit driver
│   ├── pack_all_v17.py, pack_v17.py, verify_all_v17.py
│   └── ...
├── scripts/frr/                  Track B — FRR architectural compression
│   ├── run_hq4_ceiling_break.py    Flagship HQ4 trainer
│   ├── launch_hq{4,5,6,7}_*.py     Windows detached dual-GPU launchers
│   ├── hires_eval.py               Held-out eval driver
│   └── ...
│
├── results/                      All measurement JSONs (indexed by claim)
├── logs/                         Run logs (indexed by claim)
├── archive/                      Obsolete compress_v8..v18 iteration scripts
└── docs/                         Paper, patent drafts, pitch, claim figures

Key Findings

Functional similarity enables weight sharing. Adjacent layers have CKA > 0.9 despite zero weight cosine similarity.
FRR is Pareto-optimal across 311–2200× compression. Quality degrades gracefully (−0.8 to −2.6 pp last-T10 at 734× vs. baseline).
Hard-token focus beats easy-token focus. HQ3's confidence-weighted loss plateaued at T1=54.2%; HQ4's inverted weighting broke through to 55.7% peak / 69.6% all-T10.
Latent alignment is an on-ramp, not a destination. Keeping latent_w = 1.0 throughout training caps quality; decaying it after step 20K lets the output-space signal dominate and breaks the ceiling.
ASVD head + FRR body compose cleanly. 92.57% T10 head + 68% T10 body predicts a joint end-to-end quality ceiling that has not yet been measured on a unified stack — this is the next milestone.
Reproducibility. All 80K-step runs reproduce to within ±1.5 pp on identical seeds (validated across HQ3 → HQ4 → HQ5).

Competitive Position

Method	Year	Arch. compression	Approach
GPTQ / AWQ	2023	4–8×	Post-training quantization
SparseGPT	2023	2–4×	Unstructured pruning
Relaxed Recursive (Google)	2025	~2×	Shared block + LoRA
Ouroboros V2	2026	~2×	Controller hypernetwork
UltraCompress FRR (HQ4)	2026	311–734×	Fractal recursive block + entropy-aware distillation

Stacked with Q2 + entropy coding, the total compression reaches ~7,500× on quantized weights.

Projection: 100T-parameter model on a single GPU

Stack	100T-param size	Compression ratio
FRR 311× + Q2 + entropy	≈ 12 GB	≈ 8,300×
FRR 734× + Q2 + entropy	≈ 5 GB	≈ 20,000×

These are architectural projections; the 734× FRR body has been trained end-to-end; Q2 + entropy coding have been validated at pipeline scope on Qwen3-0.6B (959× total, 35% T1 / 53% T10).

Citation

@misc{ultracompress2026,
  title  = {Fractal Residual Recursion: Extreme Transformer Compression
            via Shared Recursive Blocks},
  author = {Mounir},
  year   = {2026},
  url    = {https://github.com/mounnar/ultracompress}
}

Status & Contact

Active development — see HQ5 and docs/STATUS.md for the latest training run.
Full result write-ups in docs/HQ3_RESULTS.md, docs/HQ4_RESULTS.md.
Paper draft: docs/PAPER_DRAFT.md. Patent draft: docs/PATENT_DRAFT.md.

License

Apache 2.0 — see LICENSE.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.6.3

May 12, 2026

0.6.2

May 11, 2026

0.6.1

May 11, 2026

0.6.0

May 11, 2026

0.5.6

May 11, 2026

0.5.5

May 9, 2026

0.5.4

May 9, 2026

0.5.3

May 9, 2026

0.5.2

May 8, 2026

0.5.1

May 8, 2026

This version

0.5.0 yanked

May 8, 2026

Reason this release was yanked:

Broken import in 0.5.0 (track_a_adaptive missing). Use 0.5.1 or later.

0.4.0

May 4, 2026

0.1.3

Apr 29, 2026

0.1.2

Apr 28, 2026

0.1.0 yanked

Apr 26, 2026

Reason this release was yanked:

Superseded by 0.1.2 with corrected package metadata. Please install 0.1.2 or later.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ultracompress-0.5.0.tar.gz (316.8 kB view details)

Uploaded May 8, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

ultracompress-0.5.0-py3-none-any.whl (358.1 kB view details)

Uploaded May 8, 2026 Python 3

File details

Details for the file ultracompress-0.5.0.tar.gz.

File metadata

Download URL: ultracompress-0.5.0.tar.gz
Upload date: May 8, 2026
Size: 316.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for ultracompress-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`a2c2aae62d1697c16757984470e663a081284833246c363be1f8ec1d0da06c85`
MD5	`92b202fff8dff49e6f72f7d0fe5e88b2`
BLAKE2b-256	`2ed1939c1d2335db00918edb8c5f4681551267a51b3e9cb4c3944bf9af1bd72b`

See more details on using hashes here.

File details

Details for the file ultracompress-0.5.0-py3-none-any.whl.

File metadata

Download URL: ultracompress-0.5.0-py3-none-any.whl
Upload date: May 8, 2026
Size: 358.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for ultracompress-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`eb0d559cc9d0ee9346ae75e0ec66f3b9a12bec7cd2c5e6b0a3ec78d756f0ed74`
MD5	`1ac2f12d7b4fef776c2410d983636a9d`
BLAKE2b-256	`2dde673d1ae6ea1e3d531516f6df37bd567dc8e18d2dc40c0da2f327a5f2c2cb`

See more details on using hashes here.

ultracompress 0.5.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

UltraCompress

⭐ Latest — Streaming compression: full Qwen scaling curve, 72B on a single GPU (2026-05-04)

Latest — Claim 20: Row-overlay vs external quantizers (n=500, 6-model cohort)

Track B — FRR architectural compression (held-out, 1000 samples, seed 42)

Rigor & reproducibility

15-minute interactive demo

Training Results (per-run training-eval ceilings, 80K steps each)

ASVD head fine-tuning (trained separately — stackable with FRR body)

⭐ End-to-end stack: FRR body + ASVD head combined (1000 samples, seed 42)

Pareto frontier — pick your operating point

Cross-model generality (scaling the method)

How It Works

Training Objective — the HQ4/HQ5 Ceiling Break

Experiment Timeline

Quick Start

Resume support

Detached training on Windows

Repository Layout

Key Findings

Competitive Position

Projection: 100T-parameter model on a single GPU

Citation

Status & Contact

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes