
UltraCompress

Extreme compression for large language models. Patent pending — USPTO 64/049,511 + 64/049,517


Run language models on less hardware than they were supposed to need.

UltraCompress is the patent-pending compression infrastructure for transformer language models. The Track A method targets sub-3 bits per weight — ~30% smaller than bitsandbytes NF4 with zero catastrophic failures on a 6-model head-to-head cohort in our internal benchmark. The CLI is shipped on PyPI today; pre-compressed reference models roll out on Hugging Face Hub through April–May 2026.

Who this is for

  • Engineers hitting the 4-bits-per-weight cliff — the point below which public methods (bitsandbytes, GPTQ, AWQ, HQQ) fall apart
  • Product teams targeting on-device deployment — phones, cars, robots, embedded systems
  • Inference platforms whose margins are GPU-memory-bound at scale
  • Hardware partners (chip vendors, OEMs) evaluating compression infrastructure for licensing

v0.1 alpha: pre-compressed reference models are uploading to Hugging Face Hub throughout April–May 2026. Run uc list for the live catalog. Examples below show expected post-launch usage.

Install

pip install ultracompress

Quickstart

# Today: scripted demo (no Hub artifacts required)
uc demo

# Today: query the live HF Hub catalog (returns "No pre-compressed models
# published yet" until the first rolling-release artifact lands)
uc list

# Post-artifact example usage (works once an artifact is on the Hub):
uc pull sipsalabs/<model-id>
uc info ./models/<model-id>
uc bench ./models/<model-id> --tasks hellaswag --limit 500

What's available today (v0.1 — alpha)

The CLI itself is shipped on PyPI. The Hugging Face Hub catalog is rolling out through April–May 2026; until the first reference compressed model lands, uc list against the live Hub returns "No pre-compressed models published yet."

  • uc demo — scripted CLI demo for screen recording (works without any Hub artifacts).
  • uc list — query the live sipsalabs collection on the Hugging Face Hub. Returns the actual current catalog; expect "no models published yet" until the first rolling-release artifact lands.
  • uc pull <model-id> — download a pre-compressed model when one is available on the Hub.
  • uc info <path> — inspect the compression metadata of an already-downloaded artifact.
  • uc bench <path> --tasks <list> — run downstream benchmarks via lm-eval-harness on a downloaded artifact.

What's coming (v0.2 — Q3 2026)

  • uc compress <hf-model-id> --bpw 2.8 — self-compression (gated on patent prosecution timeline).
  • uc serve <path> — inference server with OpenAI-compatible API.
  • uc export --format gguf — export to llama.cpp GGUF format.
  • uc export --format coreml — export to Apple CoreML for on-device inference.

Why UltraCompress

The 4-bit-per-weight cliff

Every public LLM compression method (bitsandbytes, GPTQ, AWQ, HQQ) is stable at and above 4 bits per weight. Below 4 bpw, model quality falls off a cliff — most methods produce models whose downstream-task accuracy collapses to near-random. We flag a compressed model as a catastrophic failure when its retention drops below a T_cat threshold; on a 6-model cohort, public sub-3-bpw methods fail catastrophically on the majority of the cohort.
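The T_cat threshold itself is not defined above. As a minimal sketch of how the "N/6 catastrophic failures" column is counted, assume T_cat is a cutoff on T1 retention — the 0.50 value below is a placeholder, not the real threshold:

```python
# Sketch of the catastrophic-failure count described above.
# Assumption (not from the source): T_cat is a cutoff on T1 retention,
# and 0.50 is a placeholder value, not the threshold UltraCompress uses.
T_CAT = 0.50

def catastrophic_failures(retentions):
    """Count cohort models whose T1 retention falls below T_cat."""
    return sum(1 for r in retentions if r < T_CAT)

# Example: a 6-model cohort where one model collapses to near-random.
cohort = [0.97, 0.95, 0.96, 0.31, 0.94, 0.98]
print(f"{catastrophic_failures(cohort)}/{len(cohort)} catastrophic failures")
```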

UltraCompress doesn't.

Track A — post-training row-overlay quantization (USPTO 64/049,511) — shipping now

On a 6-model × 8-method × 500-sample head-to-head benchmark:

Method                  Bits per weight   Cohort median T1 retention   Catastrophic failures
bitsandbytes int8       8.000             99.75%                       0/6
bitsandbytes nf4        4.000             98.31%                       0/6
HQQ 4-bit g64           4.500             97.72%                       0/6
UltraCompress 2.8 bpw   2.798             95.63%                       0/6
HQQ 3-bit g64           3.500             72.46%                       1/6
HQQ 2-bit g64           2.500              3.46%                       6/6
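The "~30% smaller than NF4" claim is, to first order, just the ratio of bit widths. A rough arithmetic sketch (the 7B parameter count is illustrative, not from the table):

```python
# Back-of-the-envelope weight-memory footprint at the bit widths in the
# table above. The 7B parameter count is a hypothetical example.
PARAMS = 7e9

def weight_gib(bpw, n_params=PARAMS):
    """Approximate weight memory in GiB at a given bits-per-weight."""
    return n_params * bpw / 8 / 2**30

for name, bpw in [("int8", 8.0), ("nf4", 4.0), ("UltraCompress", 2.798)]:
    print(f"{name:>13}: {weight_gib(bpw):5.2f} GiB")

# 2.798 bpw vs 4.000 bpw NF4 is roughly a 30% reduction in weight memory.
print(f"reduction vs nf4: {1 - 2.798 / 4.0:.1%}")
```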

Top-k retention curves (top-1, top-10, top-32, top-64, top-128, top-256) will ship in the per-model card on each artifact's Hugging Face Hub repository as the reference compressed models roll out through April–May 2026. T1 alone is the wrong metric for autocomplete, candidate generation, or RAG re-ranking — most customer use cases care about top-k structure.
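One plausible way to compute such a top-k retention curve — assuming (this definition is ours, not UltraCompress's) that it means the average overlap between the reference and compressed models' top-k next-token sets at each evaluation position:

```python
# Sketch of a top-k retention metric: the fraction of the reference model's
# top-k tokens that the compressed model also ranks in its top-k, averaged
# over positions. This is one plausible reading; the exact metric used by
# UltraCompress is not specified in the README.
import random

def topk_retention(ref_logits, cmp_logits, k):
    total = 0.0
    for ref_row, cmp_row in zip(ref_logits, cmp_logits):
        ref_top = set(sorted(range(len(ref_row)), key=ref_row.__getitem__)[-k:])
        cmp_top = set(sorted(range(len(cmp_row)), key=cmp_row.__getitem__)[-k:])
        total += len(ref_top & cmp_top) / k
    return total / len(ref_logits)

# Toy data: 4 positions over a 1000-token vocabulary, with the "compressed"
# logits modeled as the reference logits plus small noise.
rng = random.Random(0)
ref = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
noisy = [[x + 0.1 * rng.gauss(0, 1) for x in row] for row in ref]
for k in (1, 10, 64):
    print(f"top-{k} retention: {topk_retention(ref, noisy, k):.2f}")
```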

Track B — Fractal Residual Recursion (USPTO 64/049,517) — v0.2 (Q3 2026)

Architectural compression beyond published academic ratios for transformer language models. Combined with Track A on the v0.2 stack, it yields the strongest end-to-end compression ratio we have measured for transformer architectures in our cohort. Gated on patent prosecution timing.

Track B evidence is separate from Track A shipping artifacts; see docs/evidence/matrix.md for Track B detail. Do not combine retention numbers across tracks as a single quality curve.

Patent status

The UltraCompress compression methods are the subject of pending U.S. patent applications. Pre-compressed models are distributed under a separate licensing arrangement described in LICENSE. The CLI code in this repository is Apache-2.0.

Reporting issues, security, and commercial inquiries

  • Bugs and feature requests: open an issue.
  • Security vulnerabilities: see SECURITY.md — report privately to security@sipsalabs.com.
  • Commercial / design-partner / pilot inquiries: founder@sipsalabs.com.
  • Patent / licensing: legal@sipsalabs.com.

Contributing: see CONTRIBUTING.md. Changes that touch packaging, CI, docs, and the public CLI surface are very welcome. Pull requests adding the proprietary compression methods will be closed.

Citation

@misc{sipsalabs2026ultracompress,
  title        = {UltraCompress: Extreme Compression for Large Language Models},
  author       = {{Sipsa Labs, Inc.}},
  year         = {2026},
  note         = {U.S.\ patent applications 64/049,511 and 64/049,517, patent pending},
  howpublished = {\url{https://sipsalabs.com}}
}

About

UltraCompress is built by Sipsa Labs — a research lab spanning Systems · Intelligence · Precision.

Patent pending — USPTO 64/049,511 + 64/049,517.
