Lossless 5-bit transformer compression — 20 PPL-verified architectures (19 transformer + 1 state-space model with comparator-note caveat) across 22 packs shipped (0.6B–405B; dense + MoE + SSM). Public CLI does pack structure + download-integrity checks; the codec is patent-pending and not distributed. BUSL-1.1 + Additional Use Grant.

These details have not been verified by PyPI

Project links

Project description

UltraCompress

Lossless 5-bit transformer compression. Published model artifacts are bit-identical to their bf16 reference.

v0.6.18: the public package is intentionally minimal — a small, dependency-free CLI that lets you (a) generate text against a Sipsa- hosted compressed model in 30 seconds (uc try), (b) browse the full catalog with tiers and PPL ratios (uc catalog), and (c) verify pack structure and download integrity (uc verify) on any pack you download from HuggingFace. It contains no compression or reconstruction code: that methodology is patent-pending and is not distributed. Bit-identical reconstruction verification of a pack is performed by Sipsa Labs under engagement.

Hermes-3-Llama-3.1-405B compressed at 5 bpw lossless: 1.0066x PPL ratio vs streaming bf16 teacher (5.0692 / 5.0358, n=50, seq_len=1024, FineWeb-edu held-out tail, seed=42). A 405B-class transformer compressed end-to-end on a single 32 GB consumer GPU.

UltraCompress takes a transformer at fp16/bf16 and produces a 5-bit pack that reconstructs bit-identically to the reference bf16 checkpoint — not "1% PPL drift on WikiText," but a deterministic reconstruction. That is the honest definition of lossless we care about: an auditor can re-derive every weight from the pack, and Sipsa Labs verifies that bit-identity under engagement. The codec is patent-pending.

It exists because the bf16-equivalent quality bar matters in places where "good enough on MMLU" isn't enough — defense, FDA-regulated healthcare, SR 11-7 model validation, internal red-team eval at frontier labs. And as a side-effect of the streaming compression path, it lets us put a 405B-parameter model through a single 32 GB consumer GPU without renting an H100 cluster.

We're a small lab shipping this in public while the patents are pending. Most days the lab notebook gets longer than the marketing site does.

Regulated AI deployment? Phase 0 POC is $5K / 5 business days / customer-picked model — full details in Who this is for below. Direct: founder@sipsalabs.com. Verticals: healthcare · defense · legal · quant.

Quick start (30 seconds, no GPU, no signup)

pip install ultracompress
uc try sipsa-qwen3-0.6b

That prints a recorded reference response from our 5-bit-compressed Qwen3-0.6B pack plus the compression numbers, and points you at the next step. With a free key from sipsalabs.com/get-access (60-second signup), the same command goes live against api.sipsalabs.com and streams real output from whichever compressed model you pick.

uc catalog

Lists all 20 PPL-verified architectures (19 transformer + 1 state-space model with comparator-note caveat) with their published PPL ratios and tier (free / request / POC).

The public CLI (what `pip install` gives you)

uc try [model]         generate text against a Sipsa-hosted compressed model
uc catalog             list the full compressed-model catalog + tiers
uc verify <pack_dir>   pack structure + download-integrity self-check
uc info                what this package is + links/contact
uc version             print version

uc try calls api.sipsalabs.com/v1/chat/completions when you pass --key sk-sps-... or set $SIPSA_API_KEY; without a key, it prints a recorded reference response so you see what compressed output looks like without signup.

uc verify confirms a downloaded pack is well-formed (manifest present and parseable, declared layer count matches the files on disk, no zero-byte layers) and prints a stable SHA-256 pack fingerprint so you can confirm you hold a byte-identical download, or compare against a fingerprint we publish out of band. It does not reconstruct weights and contains no codec knowledge by design.

hf download SipsaLabs/qwen3-1.7b-base-uc-v3-bpw5 --local-dir ./pack
uc verify ./pack

bpw:             5
layer files:     28
SHA-256 (spot-check; use --full for all):
  manifest.json:f3a1…
  layer_000.uc:7c2b…
  layer_014.uc:9d4f…
  layer_027.uc:1ab8…
pack fingerprint (sha256 of sorted file digests):
  4e9c… (64 hex)

→ STRUCTURE OK — pack is well-formed; fingerprint above is the
  download-integrity reference. This is NOT a reconstruction proof;
  bit-identical reconstruction verification is provided by Sipsa
  Labs under engagement (founder@sipsalabs.com).

Full bit-identical reconstruction verification (and PPL re-evaluation against the bf16 baseline) is an auditor-grade deliverable Sipsa Labs runs with you under engagement — it is deliberately not shipped in the public package.

What's verified (with JSON receipts)

20 architectures independently PPL-verified end-to-end (0.6B → 405B, dense + MoE + SSM) against each model's own bf16 baseline on the FineWeb-edu held-out tail at seq_len=1024, seed=42. 19 are transformer architectures; the 20th is Mamba-2.8B (state-space model) at 1.00593× canonical PPL, with an explicit comparator-note caveat in the registry: our canonical transformer pipeline (RoPE / attention masks / KV-cache semantics) is architecture-incompatible with SSMs, so the Mamba record uses an SSM-compatible comparator that matches what's in the HF pack. Additional architectures (DeepSeek-32B, Llama-3.1-70B, others) are compressed and SHA-256-verified but their PPL numbers are pending canonical re-eval before formal registry entry. Every published number traces to a published result JSON. A small set of packs is publicly downloadable; the full catalog is available to customers under engagement.

Model	Params	Class	PPL ratio	HF artifact	Status
Hermes-3-Llama-3.1-405B	405B	405B-class lossless on a single 32 GB consumer GPU	1.0066	`SipsaLabs/hermes-3-llama-3.1-405b-uc-v3-bpw5`	live
Mistral-7B-v0.3	7.2B	sub-0.6% drift	1.00548	`SipsaLabs/mistral-7b-v0.3-uc-v3-bpw5`	live
Qwen3-1.7B-Base	1.7B	sub-0.5% drift	1.00401	`SipsaLabs/qwen3-1.7b-base-uc-v3-bpw5`	live
Qwen3-14B	14.0B	sub-0.5% drift	1.00403	`SipsaLabs/qwen3-14b-uc-v3-bpw5`	live
Qwen3-8B	8.0B	sub-0.5% drift	1.00440	`SipsaLabs/qwen3-8b-uc-v3-bpw5`	live
Mixtral-8x7B-v0.1 (MoE)	47B (13B active)	sub-0.5% drift	1.00368	`SipsaLabs/mixtral-8x7b-v0.1-uc-v3-bpw5`	live
Phi-3-mini-4k-instruct	3.8B	sub-0.3% drift (seq_len=128, not apples-to-apples)	1.00262	`SipsaLabs/phi-3-mini-4k-instruct-uc-v3-bpw5`	live

Hermes-3-405B is the headline. The 1.0066x ratio is 5.0692 / 5.0358 — both halves measured under the same per-layer streaming reconstruction comparator (n=50, seq_len=1024, FineWeb-edu held-out tail, seed=42). The bf16 teacher took 7.7 hours on cuda:1; the 5-bpw pack took 14.3 hours. The Mistral-7B 1.00548× row is the tightest dense 7B-class lossless 5-bit ratio we currently publish.

SSM result: Mamba-2.8B compressed with SHA-256 bit-identical reconstruction verified — first public lossless 5-bit canonical-PPL result on a state-space model that we know of, at 1.00593× canonical ratio. Counted as the 20th verified architecture with an explicit comparator-note caveat in the registry: our canonical transformer pipeline (RoPE / attention masks / KV-cache semantics that don't apply to SSMs) is architecture-incompatible, so the Mamba record uses an SSM-compatible comparator that matches what's in the HF pack. The comparator is documented in the registry.
HuggingFace: a small public verification set under huggingface.co/SipsaLabs; full catalog under engagement.
PyPI: pypi.org/project/ultracompress.

What doesn't work yet

Things people sometimes assume work because the rest of it does. They don't, and we'd rather you know:

Long-context evaluation past seq_len=1024. Every PPL number above is at seq_len=1024 on the FineWeb-edu held-out tail. We have not yet run controlled evals at 4K/8K/32K context.
State-space models past the current SSM pack. Mamba-2.8B ships + SHA-256-verified + canonical PPL claimed at 1.00593× (with comparator-note caveat documented in registry; our canonical transformer pipeline is architecture-incompatible with SSMs). We tried two tighter paths on top — both made it worse.
TinyLlama-1.1B-Chat PPL eval. The pack itself is well-formed and the HF artifact uploaded, but the PPL eval forward pass throws a CUDA device-side assert that we haven't traced yet. Shown as deferred, not a fabricated number.
Qwen3-32B and Llama-3.1-70B PPL ratios. Both have stale or suspect baseline PPL numbers we won't republish. Apples-to-apples re-evals are queued.
Below 1.0040× on Qwen3-1.7B-Base. This is our tightest dense floor; we tried 5 different paths to break it. Three were within noise; two were catastrophic regressions. 1.0040× stands as the empirical floor at the current configuration.

Why this isn't AWQ / GPTQ / EXL3

Every other 4–5 bit compression library targets a quality threshold ("sub-1% PPL on WikiText"). UltraCompress targets a reconstruction contract: the published artifact reconstructs bit-identically to the reference bf16 checkpoint. Codec internals are patent-pending and deliberately not described here.

This matters when "the model picks a slightly-wrong variable name" is a regulatory finding rather than a cosmetic complaint. Defense / aerospace deploy-bit-exactness is a compliance requirement. FDA-regulated healthcare AI requires model equivalence between dev and deploy. SR 11-7 (Federal Reserve model validation) requires reproducible audit recovery.

For pure-throughput inference on a fixed prompt distribution that matches your AWQ calibration set, with no downstream fine-tuning, AWQ at 4 bpw on vLLM is genuinely fine and we'll say so on a sales call.

As of mid-2026 we are not aware of another published library targeting a bit-identical reconstruction contract (as opposed to a PPL-threshold) for 5-bit transformer compression on the public HuggingFace Hub. If you find one, tell us — we'd rather benchmark against it than claim a gap that isn't there.

Honest negative results

Most projects hide their failures. We catalogue them at the same level of detail as the wins.

An initialization shortcut we tried — made PPL 0.07 pp WORSE on Mamba and was discarded. Method specifics withheld (patent-pending).
A multi-pass variant we hypothesized would help — produced a catastrophic 13.7× regression vs. the single-pass baseline. CLOSED.
Importing an AWQ-style pre-scaling step — produced a catastrophic +13% regression and was ruled out. CLOSED.
Pushing the training schedule past the current configuration — gained nothing (within noise). The floor stands.
"Base models compress tighter than instruct" hypothesis — refuted 2/3 of architectures. Dropped.

Detailed methodology for any specific failure is available to design partners under NDA.

Who this is for

If you serve LLMs in production and your VRAM bill is the constraint, this might help. It scales to a 405B-class model on a single 32 GB consumer GPU (the how is patent-pending). Email founder@sipsalabs.com with your stack and a target latency/quality bar; we'll tell you honestly whether UC fits.
If you're in a regulated domain (defense, FDA-regulated healthcare, SR 11-7 model validation, frontier lab red-team), the bit-identical reconstruction contract is the reason to talk to us. Phase 0 POC ($5K, 5 business days, customer-picked model) gets you a pack plus a Sipsa-run bit-identity + PPL audit you can review. Email founder@sipsalabs.com.

If your workload is "MMLU has to stay above X" and you're not pushing the model into long-tail or downstream-fine-tuning territory, AWQ at 4 bpw is probably a better answer than this. We'll say so.

We're a small company looking for design partners

Sipsa Labs is a small lab shipping in public. Our compression methods are patent-pending; details are in PATENT_NOTICE.md. The CLI source is BUSL-1.1 with an Additional Use Grant — free for companies under $1M ARR, research, and individuals, auto-converting to Apache 2.0 four years post-release. If you're building a derivative product whose core value depends on the underlying invention, email founder@sipsalabs.com.

Paid Phase 0 POC — founder@sipsalabs.com, $5K / 5 business days / customer-picked model. Deliverable: a pack plus a Sipsa-run bit-identity + PPL audit on your eval set.
GitHub Sponsors — github.com/sponsors/sipsalabs.
Press / commentary — press@sipsalabs.com.

License

Released under BUSL-1.1 with an Additional Use Grant (free for companies under $1M ARR, research, and individuals; auto-converts to Apache 2.0 four years post-release). See LICENSE.
The license grant does not extend to the patent-pending compression methodology that produces the artifacts. See PATENT_NOTICE.md.
Pre-compressed model artifacts on HuggingFace carry the upstream teacher model's license plus this project's patent terms.

Citation

@software{sipsa_ultracompress_2026,
  author = {{Sipsa Labs, Inc.}},
  title  = {UltraCompress: Lossless 5-bit Transformer Compression},
  year   = {2026},
  url    = {https://github.com/sipsalabs/ultracompress}
}

Contact

Commercial / Phase 0 POC: founder@sipsalabs.com
Security: security@sipsalabs.com
Press: press@sipsalabs.com
HuggingFace: huggingface.co/SipsaLabs
PyPI: pypi.org/project/ultracompress

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.6.27

Jun 5, 2026

0.6.26

Jun 5, 2026

0.6.25

May 30, 2026

0.6.24

May 28, 2026

0.6.22

May 27, 2026

0.6.21

May 27, 2026

0.6.20

May 27, 2026

0.6.19

May 26, 2026

This version

0.6.18

May 26, 2026

0.6.17

May 26, 2026

0.6.16

May 24, 2026

0.6.15

May 19, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ultracompress-0.6.18.tar.gz (29.4 kB view details)

Uploaded May 26, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

ultracompress-0.6.18-py3-none-any.whl (26.7 kB view details)

Uploaded May 26, 2026 Python 3

File details

Details for the file ultracompress-0.6.18.tar.gz.

File metadata

Download URL: ultracompress-0.6.18.tar.gz
Upload date: May 26, 2026
Size: 29.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for ultracompress-0.6.18.tar.gz
Algorithm	Hash digest
SHA256	`6e66aa61eb8c572b17880e60db6a18343039fbfc404cb9f4df2266e7144ee111`
MD5	`e5b8bf2fe3350415f6dcf9a977b70d3e`
BLAKE2b-256	`1948dd7c794b1f6e0e3f40acf117d26c847e0d9c50b4a6ed469a67f816897c67`

See more details on using hashes here.

File details

Details for the file ultracompress-0.6.18-py3-none-any.whl.

File metadata

Download URL: ultracompress-0.6.18-py3-none-any.whl
Upload date: May 26, 2026
Size: 26.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for ultracompress-0.6.18-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7d159037ce0b2dfda8e34993eabff68f23606b436579505e357f8bf72eb64bf2`
MD5	`37c085d04577d7f30daca29d41e7109a`
BLAKE2b-256	`29d8fb7aed7a6dca4f5fd3f63f45730827e1db9f59cadfbbd4ce17d2c29769a5`

See more details on using hashes here.

ultracompress 0.6.18

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

UltraCompress

Quick start (30 seconds, no GPU, no signup)

The public CLI (what `pip install` gives you)

What's verified (with JSON receipts)

What doesn't work yet

Why this isn't AWQ / GPTQ / EXL3

Honest negative results

Who this is for

We're a small company looking for design partners

License

Citation

Contact

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

ultracompress 0.6.18

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

UltraCompress

Quick start (30 seconds, no GPU, no signup)

The public CLI (what pip install gives you)

What's verified (with JSON receipts)

What doesn't work yet

Why this isn't AWQ / GPTQ / EXL3

Honest negative results

Who this is for

We're a small company looking for design partners

License

Citation

Contact

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

The public CLI (what `pip install` gives you)