Lossless 5-bit transformer compression — 20 PPL-verified architectures (19 transformer + 1 state-space model with comparator-note caveat) across 22 packs shipped (0.6B–405B; dense + MoE + SSM). Public CLI does pack structure + download-integrity checks; the codec is patent-pending and not distributed. BUSL-1.1 + Additional Use Grant.
Project description
UltraCompress
Lossless 5-bit transformer compression. Published model artifacts are bit-identical to their bf16 reference.
v0.6.18: the public package is intentionally minimal — a small, dependency-free CLI that lets you (a) generate text against a Sipsa- hosted compressed model in 30 seconds (
uc try), (b) browse the full catalog with tiers and PPL ratios (uc catalog), and (c) verify pack structure and download integrity (uc verify) on any pack you download from HuggingFace. It contains no compression or reconstruction code: that methodology is patent-pending and is not distributed. Bit-identical reconstruction verification of a pack is performed by Sipsa Labs under engagement.
Hermes-3-Llama-3.1-405B compressed at 5 bpw lossless: 1.0066x PPL ratio vs streaming bf16 teacher (5.0692 / 5.0358, n=50, seq_len=1024, FineWeb-edu held-out tail, seed=42). A 405B-class transformer compressed end-to-end on a single 32 GB consumer GPU.
UltraCompress takes a transformer at fp16/bf16 and produces a 5-bit pack that reconstructs bit-identically to the reference bf16 checkpoint — not "1% PPL drift on WikiText," but a deterministic reconstruction. That is the honest definition of lossless we care about: an auditor can re-derive every weight from the pack, and Sipsa Labs verifies that bit-identity under engagement. The codec is patent-pending.
It exists because the bf16-equivalent quality bar matters in places where "good enough on MMLU" isn't enough — defense, FDA-regulated healthcare, SR 11-7 model validation, internal red-team eval at frontier labs. And as a side-effect of the streaming compression path, it lets us put a 405B-parameter model through a single 32 GB consumer GPU without renting an H100 cluster.
We're a small lab shipping this in public while the patents are pending. Most days the lab notebook gets longer than the marketing site does.
Regulated AI deployment? Phase 0 POC is $5K / 5 business days / customer-picked model — full details in Who this is for below. Direct:
founder@sipsalabs.com. Verticals: healthcare · defense · legal · quant.
Quick start (30 seconds, no GPU, no signup)
pip install ultracompress
uc try sipsa-qwen3-0.6b
That prints a recorded reference response from our 5-bit-compressed Qwen3-0.6B pack plus the compression numbers, and points you at the next step. With a free key from sipsalabs.com/get-access (60-second signup), the same command goes live against api.sipsalabs.com and streams real output from whichever compressed model you pick.
uc catalog
Lists all 20 PPL-verified architectures (19 transformer + 1 state-space model with comparator-note caveat) with their published PPL ratios and tier (free / request / POC).
The public CLI (what pip install gives you)
uc try [model] generate text against a Sipsa-hosted compressed model
uc catalog list the full compressed-model catalog + tiers
uc verify <pack_dir> pack structure + download-integrity self-check
uc info what this package is + links/contact
uc version print version
uc try calls api.sipsalabs.com/v1/chat/completions when you pass --key sk-sps-... or set $SIPSA_API_KEY; without a key, it prints a recorded reference response so you see what compressed output looks like without signup.
uc verify confirms a downloaded pack is well-formed (manifest present and parseable, declared layer count matches the files on disk, no zero-byte layers) and prints a stable SHA-256 pack fingerprint so you can confirm you hold a byte-identical download, or compare against a fingerprint we publish out of band. It does not reconstruct weights and contains no codec knowledge by design.
hf download SipsaLabs/qwen3-1.7b-base-uc-v3-bpw5 --local-dir ./pack
uc verify ./pack
bpw: 5
layer files: 28
SHA-256 (spot-check; use --full for all):
manifest.json:f3a1…
layer_000.uc:7c2b…
layer_014.uc:9d4f…
layer_027.uc:1ab8…
pack fingerprint (sha256 of sorted file digests):
4e9c… (64 hex)
→ STRUCTURE OK — pack is well-formed; fingerprint above is the
download-integrity reference. This is NOT a reconstruction proof;
bit-identical reconstruction verification is provided by Sipsa
Labs under engagement (founder@sipsalabs.com).
Full bit-identical reconstruction verification (and PPL re-evaluation against the bf16 baseline) is an auditor-grade deliverable Sipsa Labs runs with you under engagement — it is deliberately not shipped in the public package.
What's verified (with JSON receipts)
20 architectures independently PPL-verified end-to-end (0.6B → 405B, dense + MoE + SSM) against each model's own bf16 baseline on the FineWeb-edu held-out tail at seq_len=1024, seed=42. 19 are transformer architectures; the 20th is Mamba-2.8B (state-space model) at 1.00593× canonical PPL, with an explicit comparator-note caveat in the registry: our canonical transformer pipeline (RoPE / attention masks / KV-cache semantics) is architecture-incompatible with SSMs, so the Mamba record uses an SSM-compatible comparator that matches what's in the HF pack. Additional architectures (DeepSeek-32B, Llama-3.1-70B, others) are compressed and SHA-256-verified but their PPL numbers are pending canonical re-eval before formal registry entry. Every published number traces to a published result JSON. A small set of packs is publicly downloadable; the full catalog is available to customers under engagement.
| Model | Params | Class | PPL ratio | HF artifact | Status |
|---|---|---|---|---|---|
| Hermes-3-Llama-3.1-405B | 405B | 405B-class lossless on a single 32 GB consumer GPU | 1.0066 | SipsaLabs/hermes-3-llama-3.1-405b-uc-v3-bpw5 |
live |
| Mistral-7B-v0.3 | 7.2B | sub-0.6% drift | 1.00548 | SipsaLabs/mistral-7b-v0.3-uc-v3-bpw5 |
live |
| Qwen3-1.7B-Base | 1.7B | sub-0.5% drift | 1.00401 | SipsaLabs/qwen3-1.7b-base-uc-v3-bpw5 |
live |
| Qwen3-14B | 14.0B | sub-0.5% drift | 1.00403 | SipsaLabs/qwen3-14b-uc-v3-bpw5 |
live |
| Qwen3-8B | 8.0B | sub-0.5% drift | 1.00440 | SipsaLabs/qwen3-8b-uc-v3-bpw5 |
live |
| Mixtral-8x7B-v0.1 (MoE) | 47B (13B active) | sub-0.5% drift | 1.00368 | SipsaLabs/mixtral-8x7b-v0.1-uc-v3-bpw5 |
live |
| Phi-3-mini-4k-instruct | 3.8B | sub-0.3% drift (seq_len=128, not apples-to-apples) | 1.00262 | SipsaLabs/phi-3-mini-4k-instruct-uc-v3-bpw5 |
live |
Hermes-3-405B is the headline. The 1.0066x ratio is 5.0692 / 5.0358 — both halves measured under the same per-layer streaming reconstruction comparator (n=50, seq_len=1024, FineWeb-edu held-out tail, seed=42). The bf16 teacher took 7.7 hours on cuda:1; the 5-bpw pack took 14.3 hours. The Mistral-7B 1.00548× row is the tightest dense 7B-class lossless 5-bit ratio we currently publish.
- SSM result: Mamba-2.8B compressed with SHA-256 bit-identical reconstruction verified — first public lossless 5-bit canonical-PPL result on a state-space model that we know of, at 1.00593× canonical ratio. Counted as the 20th verified architecture with an explicit comparator-note caveat in the registry: our canonical transformer pipeline (RoPE / attention masks / KV-cache semantics that don't apply to SSMs) is architecture-incompatible, so the Mamba record uses an SSM-compatible comparator that matches what's in the HF pack. The comparator is documented in the registry.
- HuggingFace: a small public verification set under
huggingface.co/SipsaLabs; full catalog under engagement. - PyPI: pypi.org/project/ultracompress.
What doesn't work yet
Things people sometimes assume work because the rest of it does. They don't, and we'd rather you know:
- Long-context evaluation past seq_len=1024. Every PPL number above is at seq_len=1024 on the FineWeb-edu held-out tail. We have not yet run controlled evals at 4K/8K/32K context.
- State-space models past the current SSM pack. Mamba-2.8B ships + SHA-256-verified + canonical PPL claimed at 1.00593× (with comparator-note caveat documented in registry; our canonical transformer pipeline is architecture-incompatible with SSMs). We tried two tighter paths on top — both made it worse.
- TinyLlama-1.1B-Chat PPL eval. The pack itself is well-formed and the HF artifact uploaded, but the PPL eval forward pass throws a CUDA device-side assert that we haven't traced yet. Shown as deferred, not a fabricated number.
- Qwen3-32B and Llama-3.1-70B PPL ratios. Both have stale or suspect baseline PPL numbers we won't republish. Apples-to-apples re-evals are queued.
- Below 1.0040× on Qwen3-1.7B-Base. This is our tightest dense floor; we tried 5 different paths to break it. Three were within noise; two were catastrophic regressions. 1.0040× stands as the empirical floor at the current configuration.
Why this isn't AWQ / GPTQ / EXL3
Every other 4–5 bit compression library targets a quality threshold ("sub-1% PPL on WikiText"). UltraCompress targets a reconstruction contract: the published artifact reconstructs bit-identically to the reference bf16 checkpoint. Codec internals are patent-pending and deliberately not described here.
This matters when "the model picks a slightly-wrong variable name" is a regulatory finding rather than a cosmetic complaint. Defense / aerospace deploy-bit-exactness is a compliance requirement. FDA-regulated healthcare AI requires model equivalence between dev and deploy. SR 11-7 (Federal Reserve model validation) requires reproducible audit recovery.
For pure-throughput inference on a fixed prompt distribution that matches your AWQ calibration set, with no downstream fine-tuning, AWQ at 4 bpw on vLLM is genuinely fine and we'll say so on a sales call.
As of mid-2026 we are not aware of another published library targeting a bit-identical reconstruction contract (as opposed to a PPL-threshold) for 5-bit transformer compression on the public HuggingFace Hub. If you find one, tell us — we'd rather benchmark against it than claim a gap that isn't there.
Honest negative results
Most projects hide their failures. We catalogue them at the same level of detail as the wins.
- An initialization shortcut we tried — made PPL 0.07 pp WORSE on Mamba and was discarded. Method specifics withheld (patent-pending).
- A multi-pass variant we hypothesized would help — produced a catastrophic 13.7× regression vs. the single-pass baseline. CLOSED.
- Importing an AWQ-style pre-scaling step — produced a catastrophic +13% regression and was ruled out. CLOSED.
- Pushing the training schedule past the current configuration — gained nothing (within noise). The floor stands.
- "Base models compress tighter than instruct" hypothesis — refuted 2/3 of architectures. Dropped.
Detailed methodology for any specific failure is available to design partners under NDA.
Who this is for
- If you serve LLMs in production and your VRAM bill is the constraint, this might help. It scales to a 405B-class model on a single 32 GB consumer GPU (the how is patent-pending). Email
founder@sipsalabs.comwith your stack and a target latency/quality bar; we'll tell you honestly whether UC fits. - If you're in a regulated domain (defense, FDA-regulated healthcare, SR 11-7 model validation, frontier lab red-team), the bit-identical reconstruction contract is the reason to talk to us. Phase 0 POC ($5K, 5 business days, customer-picked model) gets you a pack plus a Sipsa-run bit-identity + PPL audit you can review. Email
founder@sipsalabs.com.
If your workload is "MMLU has to stay above X" and you're not pushing the model into long-tail or downstream-fine-tuning territory, AWQ at 4 bpw is probably a better answer than this. We'll say so.
We're a small company looking for design partners
Sipsa Labs is a small lab shipping in public. Our compression methods are patent-pending; details are in PATENT_NOTICE.md. The CLI source is BUSL-1.1 with an Additional Use Grant — free for companies under $1M ARR, research, and individuals, auto-converting to Apache 2.0 four years post-release. If you're building a derivative product whose core value depends on the underlying invention, email founder@sipsalabs.com.
- Paid Phase 0 POC —
founder@sipsalabs.com, $5K / 5 business days / customer-picked model. Deliverable: a pack plus a Sipsa-run bit-identity + PPL audit on your eval set. - GitHub Sponsors — github.com/sponsors/sipsalabs.
- Press / commentary —
press@sipsalabs.com.
License
- Released under BUSL-1.1 with an Additional Use Grant (free for companies under $1M ARR, research, and individuals; auto-converts to Apache 2.0 four years post-release). See
LICENSE. - The license grant does not extend to the patent-pending compression methodology that produces the artifacts. See
PATENT_NOTICE.md. - Pre-compressed model artifacts on HuggingFace carry the upstream teacher model's license plus this project's patent terms.
Citation
@software{sipsa_ultracompress_2026,
author = {{Sipsa Labs, Inc.}},
title = {UltraCompress: Lossless 5-bit Transformer Compression},
year = {2026},
url = {https://github.com/sipsalabs/ultracompress}
}
Contact
- Commercial / Phase 0 POC:
founder@sipsalabs.com - Security:
security@sipsalabs.com - Press:
press@sipsalabs.com - HuggingFace:
huggingface.co/SipsaLabs - PyPI:
pypi.org/project/ultracompress
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file ultracompress-0.6.18.tar.gz.
File metadata
- Download URL: ultracompress-0.6.18.tar.gz
- Upload date:
- Size: 29.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6e66aa61eb8c572b17880e60db6a18343039fbfc404cb9f4df2266e7144ee111
|
|
| MD5 |
e5b8bf2fe3350415f6dcf9a977b70d3e
|
|
| BLAKE2b-256 |
1948dd7c794b1f6e0e3f40acf117d26c847e0d9c50b4a6ed469a67f816897c67
|
File details
Details for the file ultracompress-0.6.18-py3-none-any.whl.
File metadata
- Download URL: ultracompress-0.6.18-py3-none-any.whl
- Upload date:
- Size: 26.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7d159037ce0b2dfda8e34993eabff68f23606b436579505e357f8bf72eb64bf2
|
|
| MD5 |
37c085d04577d7f30daca29d41e7109a
|
|
| BLAKE2b-256 |
29d8fb7aed7a6dca4f5fd3f63f45730827e1db9f59cadfbbd4ce17d2c29769a5
|