Lossless AI model compression: make BF16 models about 34% smaller with bit-identical weights, as a drop-in replacement for HuggingFace from_pretrained

BigSmall — Lossless AI Model Compression

Make AI models ~34% smaller (the typical saving for BF16 checkpoints). Bit-identical weights. Drop-in replacement for from_pretrained.

pip install bigsmall

A 14 GB Mistral-7B becomes 9.3 GB. A fine-tuned model becomes a 5 GB patch on top of its 14 GB base. Every weight in the decompressed model is bit-for-bit identical to the original, md5-verified on every tensor.

~34% smaller: any BF16 LLM
~65% smaller as a delta patch: fine-tunes vs their base
25+ ready-to-use models: on HuggingFace

What BigSmall does

Three use cases. Pick the one that fits.

1. Make any model smaller

bigsmall compress mistral-7b/ -o mistral-7b.bs
bigsmall decompress mistral-7b.bs -o mistral-7b-restored/

Before: 14.2 GB of safetensors. After: 9.3 GB .bs file. Saved: 4.9 GB (34%).

Every weight is bit-for-bit identical. Every calculation the model does is identical to the original. Works on any safetensors model — LLMs, diffusion, audio, vision, anything.

2. Store fine-tunes as tiny patches

bigsmall compress qwen-instruct/ --delta-from qwen-base/ -o instruct.bs
bigsmall apply qwen-base/ instruct.bs -o qwen-instruct-restored/

Before: 14.2 GB Qwen2.5-7B-Instruct. After: ~5 GB patch. Saved: ~9 GB (65%).

If your users already have the public base model, they only need to download what changed. This is the biggest win in BigSmall. Use it for any fine-tune: instruction tuning, DPO, RLHF, domain adaptation, LoRA-merged checkpoints.
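To make the patch idea concrete, here is a minimal sketch of a lossless delta over BF16 bytes. It is not BigSmall's codec, which this page does not describe: XOR followed by zlib is a stand-in technique, and `to_bf16_bytes`, `make_delta`, and `apply_delta` are hypothetical helper names. The toy "fine-tune" nudges every weight by a small relative amount, which is an assumption about how fine-tuning changes weights.

```python
import random
import struct
import zlib


def to_bf16_bytes(values):
    """Truncate float32 values to BF16 (keep the upper 16 bits), little-endian."""
    out = bytearray()
    for v in values:
        out += struct.pack("<f", v)[2:]
    return bytes(out)


def make_delta(base: bytes, tuned: bytes) -> bytes:
    """XOR the two byte strings and deflate the (mostly zero) result."""
    if len(base) != len(tuned):
        raise ValueError("base and fine-tune must have the same size")
    return zlib.compress(bytes(a ^ b for a, b in zip(base, tuned)), 9)


def apply_delta(base: bytes, patch: bytes) -> bytes:
    """Invert make_delta. XOR is its own inverse, so the result is bit-exact."""
    xored = zlib.decompress(patch)
    return bytes(a ^ b for a, b in zip(base, xored))


# Toy data (assumption): Gaussian weights, each nudged by ~0.2% relative noise.
rng = random.Random(0)
base_weights = [rng.gauss(0.0, 0.02) for _ in range(20_000)]
tuned_weights = [w * (1.0 + rng.gauss(0.0, 0.002)) for w in base_weights]

base = to_bf16_bytes(base_weights)
tuned = to_bf16_bytes(tuned_weights)

patch = make_delta(base, tuned)
standalone = zlib.compress(tuned, 9)  # compressing the fine-tune on its own
restored = apply_delta(base, patch)

print("bit-identical:", restored == tuned)
print("patch bytes:", len(patch), "standalone bytes:", len(standalone))
```

The patch is smaller than the standalone compressed fine-tune because most bits are unchanged relative to the base, so the XOR stream is dominated by zeros.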

3. Download smaller, use instantly

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "wpferrell/phi-3.5-mini-instruct-bigsmall"
)

Works exactly like a normal HuggingFace model: BigSmall decompresses transparently on load. 25+ pre-compressed models are ready to use.

Prefer the CLI?

bigsmall decompress wpferrell/phi-3.5-mini-instruct-bigsmall -o phi-3.5-mini/

Compression numbers (every published model)

Every row is a real measurement of a published model.

Model	Original	BigSmall	Saved
Qwen2.5-32B-Instruct	61.0 GB	40.3 GB	34%
Gemma-3-27B-it	51.1 GB	33.4 GB	35%
Qwen2.5-14B-Instruct	29.5 GB	19.5 GB	34%
Gemma-3-12B-it	22.7 GB	14.8 GB	35%
Gemma-2-9B-it	17.2 GB	11.3 GB	34%
Llama-3.1-8B-Instruct	15.0 GB	9.7 GB	35%
Llama-3-8B-Instruct	15.0 GB	9.8 GB	34%
Qwen3-8B	15.3 GB	10.1 GB	34%
Mistral-7B-Instruct v0.3	14.2 GB	8.9 GB	37%
Mistral-7B-Instruct v0.2	14.2 GB	8.9 GB	37%
Qwen2.5-7B-Instruct	14.2 GB	9.4 GB	34%
Phi-3.5-mini-instruct	7.1 GB	4.7 GB	34%
Gemma-3-4B-it	8.0 GB	5.2 GB	35%
Qwen3-4B-Instruct	7.5 GB	5.0 GB	34%
Llama-3.2-3B-Instruct	6.4 GB	3.9 GB	39%
Gemma-2-2B-it	4.9 GB	3.2 GB	34%
Qwen2.5-3B-Instruct	5.7 GB	3.8 GB	34%
Qwen2.5-1.5B-Instruct	2.9 GB	1.9 GB	34%
Llama-3.2-1B-Instruct	2.3 GB	1.5 GB	34%
Gemma-3-1B-it	1.9 GB	1.2 GB	35%
Qwen2.5-0.5B-Instruct	920 MB	610 MB	34%
GPT-2 (117M)	548 MB	414 MB	24%
Gemma-3-270M-it	500 MB	330 MB	34%
Gemma-3-270M	500 MB	330 MB	34%
Gemma-2-2B	9.7 GB	8.1 GB	17%
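The Saved column can be recomputed from the two size columns. The short sketch below does that for three rows of the table; `percent_saved` is a hypothetical helper, not part of the bigsmall package.

```python
def percent_saved(original_gb: float, compressed_gb: float) -> float:
    """Return the size reduction as a percentage of the original size."""
    if original_gb <= 0:
        raise ValueError("original size must be positive")
    return 100.0 * (original_gb - compressed_gb) / original_gb


# (original GB, BigSmall GB) copied from the table above.
rows = {
    "Qwen2.5-32B-Instruct": (61.0, 40.3),
    "Mistral-7B-Instruct v0.3": (14.2, 8.9),
    "Gemma-2-2B": (9.7, 8.1),
}

for name, (original, compressed) in rows.items():
    print(f"{name}: {percent_saved(original, compressed):.1f}% saved")
```

Because the published sizes are rounded to one decimal place, a recomputed figure can differ from the Saved column by about one percentage point.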

Browse all 25+ models on HuggingFace →

What "lossless" actually means

Every weight in the model is mathematically identical to the original — same bit pattern, same floating-point value, same gradient, same output.

Not quantization. Quantization rounds weights to fewer bits and the model's behaviour changes.
Not pruning. Pruning deletes weights.
Not approximation. No tricks, no calibration data, no quality drop.

BigSmall finds redundancy in the bit patterns of neural weights and stores them more compactly. It is the same idea as ZIP for text, but tuned for BF16 floating-point distributions. An md5 checksum is verified for every tensor at decompression; if a single bit differs, verification fails.
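As an illustration of where that redundancy lives, the sketch below splits BF16 data into its two byte planes and deflates each one, then checks the round trip with md5. This is a generic byte-plane technique using zlib, assumed here for demonstration only; it is not BigSmall's codec, and `bf16_bytes`, `compress_planes`, and `decompress_planes` are hypothetical names.

```python
import hashlib
import random
import struct
import zlib


def bf16_bytes(values):
    """Truncate float32 values to BF16 (upper 16 bits), little-endian."""
    out = bytearray()
    for v in values:
        out += struct.pack("<f", v)[2:]
    return bytes(out)


def compress_planes(raw: bytes) -> tuple[bytes, bytes]:
    """Deflate the low-byte and high-byte planes separately.

    In little-endian BF16 the high byte holds the sign and the top seven
    exponent bits, which take few distinct values for typical weights.
    The low byte (one exponent bit plus the mantissa) is close to random.
    """
    return zlib.compress(raw[0::2], 9), zlib.compress(raw[1::2], 9)


def decompress_planes(low_z: bytes, high_z: bytes) -> bytes:
    """Inflate both planes and interleave them back into BF16 order."""
    low, high = zlib.decompress(low_z), zlib.decompress(high_z)
    out = bytearray(len(low) + len(high))
    out[0::2] = low
    out[1::2] = high
    return bytes(out)


# Toy tensor (assumption): 20,000 Gaussian weights with std 0.02.
rng = random.Random(0)
raw = bf16_bytes(rng.gauss(0.0, 0.02) for _ in range(20_000))

low_z, high_z = compress_planes(raw)
restored = decompress_planes(low_z, high_z)

same = hashlib.md5(restored).hexdigest() == hashlib.md5(raw).hexdigest()
print("md5 match:", same)
print("raw bytes:", len(raw), "compressed bytes:", len(low_z) + len(high_z))
```

Nearly all of the saving comes from the high-byte plane, which is why a coder tuned for floating-point layouts can beat a general-purpose compressor applied to the interleaved bytes.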

CLI reference

bigsmall compress SRC [-o OUT] [--delta-from BASE] [--auto-delta] [--resume] [--ecc]
bigsmall decompress SRC [-o OUT] [--base BASE]
bigsmall info SRC.bs                       size, ratio, codecs used
bigsmall scan SRC                          analyse before compressing
bigsmall verify SRC.bs [--fast|--sample N] integrity check
bigsmall diff A.bs B.bs [--patch P.bs]     compare or write a delta
bigsmall apply BASE PATCH.bs -o OUT        reconstruct from base + patch
bigsmall repair SRC.bs [-o OUT]            recover via Reed-Solomon ECC sidecar
bigsmall benchmark SRC                     encode/decode throughput
bigsmall migrate SRC.bs                    re-encode with current codecs
bigsmall status                            list your BigSmall HF repos
bigsmall pipeline run SRC DST              resumable download → compress → upload

Every command has --help. See docs/cli-reference.md for full examples.

Python API

import bigsmall

# Round-trip a model
bigsmall.compress("model/", "model.bs")
bigsmall.decompress("model.bs", "model_back/")

# Fine-tune as a delta patch
bigsmall.compress("finetune/", "patch.bs", delta_from="base/")
bigsmall.apply("base/", "patch.bs", "finetune_back/")

# Inspect before compressing
bigsmall.detect_bf16_native("model/")
bigsmall.scan_model("model/")

# Low-VRAM streaming inference (~12× less VRAM than from_pretrained)
from bigsmall import BigSmallStreamingModel
model = BigSmallStreamingModel.from_pretrained(
    "wpferrell/phi-3.5-mini-instruct-bigsmall",
    device="cuda",
    lru_max_vram_gb=2.0,
)
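The behaviour of bigsmall.detect_bf16_native is not documented on this page. One plausible check, sketched here under that assumption, is that a float32 tensor is "BF16-native" when the low 16 bits of every value are zero, meaning it was widened from BF16 without gaining precision. `widen_bf16_to_f32` and `looks_bf16_native` are hypothetical names.

```python
import struct


def widen_bf16_to_f32(bf16_bits: int) -> bytes:
    """Store a 16-bit BF16 pattern as a little-endian float32 (low half zero)."""
    return struct.pack("<I", bf16_bits << 16)


def looks_bf16_native(f32_bytes: bytes) -> bool:
    """True if every little-endian float32 has its low 16 bits set to zero."""
    if len(f32_bytes) % 4 != 0:
        raise ValueError("expected a whole number of float32 values")
    return all(
        f32_bytes[i] == 0 and f32_bytes[i + 1] == 0
        for i in range(0, len(f32_bytes), 4)
    )


# 0x3F80 is 1.0 and 0xBF00 is -0.5 in BF16; 0x3DCC is close to 0.1.
widened = b"".join(widen_bf16_to_f32(bits) for bits in (0x3F80, 0xBF00, 0x3DCC))
true_f32 = struct.pack("<3f", 1.0, -0.5, 0.1)  # 0.1 uses the low mantissa bits

print(looks_bf16_native(widened))   # True
print(looks_bf16_native(true_f32))  # False
```

A tensor that passes this check can be stored as 16-bit values and widened again on load without changing a single bit of the float32 result.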

What's new in v3.13

Delta compression — fine-tunes are now ~34% of full model size as a patch on the base.
Auto-detect the base — --auto-delta finds the base by fingerprint so you don't have to.
BF16-native F32 detection — F32 models that are secretly BF16 (Whisper, several HF checkpoints) now compress 40%+ better automatically.
Resume — interrupted compression picks up exactly where it left off (--resume).
Fast verify — bigsmall verify --fast checks integrity in seconds; --sample 0.001 catches in-blob corruption without a full decode.
mmap decode — large .bs files (>256 MB) memory-map instead of fully reading into RAM.
Reed-Solomon ECC — --ecc writes a parity sidecar; bigsmall repair fixes bit-rot.
New CLI commands — info, scan, diff, apply, repair.
Streaming LRU — BigSmallStreamingModel(lru_max_vram_gb=2.0) keeps hot layers in VRAM.
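The streaming LRU item can be pictured as a cache with a byte budget: recently used layers stay resident, and the least recently used ones are evicted when the budget is exceeded. The sketch below is a generic illustration of that policy, not BigSmall's implementation; `ByteBudgetLRU` and its `loader` callback are hypothetical, and plain `bytes` objects stand in for GPU tensors.

```python
from collections import OrderedDict
from typing import Callable


class ByteBudgetLRU:
    """Keep recently used blobs in memory, evicting the oldest over budget."""

    def __init__(self, max_bytes: int, loader: Callable[[str], bytes]):
        self.max_bytes = max_bytes
        self.loader = loader  # called on a cache miss, e.g. to decode a layer
        self.used = 0
        self.loads = 0
        self.items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, name: str) -> bytes:
        if name in self.items:
            self.items.move_to_end(name)  # mark as most recently used
            return self.items[name]
        data = self.loader(name)
        self.loads += 1
        self.items[name] = data
        self.used += len(data)
        # Evict least recently used entries, but always keep the newest one.
        while self.used > self.max_bytes and len(self.items) > 1:
            _, evicted = self.items.popitem(last=False)
            self.used -= len(evicted)
        return data


# Each fake layer is 4 bytes and the budget is 8 bytes, so two layers fit.
cache = ByteBudgetLRU(max_bytes=8, loader=lambda name: bytes(4))
for layer in ["layer0", "layer1", "layer0", "layer2"]:
    cache.get(layer)

print(list(cache.items), cache.loads)  # ['layer0', 'layer2'] 3
```

In the run above, layer1 is evicted rather than layer0 because layer0 was touched more recently, and the second request for layer0 is served without calling the loader.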

Full changelog →

Research

The lossless compression ceiling for BF16 neural weights has been measured: compressed size bottoms out at ~62% of raw BF16 for any standalone model, and at ~34% for fine-tunes stored with delta compression. We ran 300+ experiments across every known mathematical approach (entropy coding, cross-tensor prediction, learned translators, persistent homology, optimal transport, quantum-inspired methods, and more) and proved that there is no further compression available within the strict bit-identity contract.

Full findings, all experiments, all dead-ends: DOI 10.5281/zenodo.20279248. Plain-English summary: docs/research.md.
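A compression ceiling is at heart an entropy statement. As a rough illustration only (this is not the methodology of the cited study, which this page does not detail), the sketch below computes the empirical order-0 entropy of a byte stream. That figure bounds what a memoryless, byte-wise coder can achieve; coders that model context can go lower. `order0_entropy_bits` and `min_size_fraction` are hypothetical helper names.

```python
import math
from collections import Counter


def order0_entropy_bits(data: bytes) -> float:
    """Empirical Shannon entropy of the byte histogram, in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def min_size_fraction(data: bytes) -> float:
    """Smallest size a memoryless byte coder could reach, as a fraction of raw."""
    return order0_entropy_bits(data) / 8.0


print(order0_entropy_bits(b"aaaa"))            # 0.0
print(order0_entropy_bits(bytes(range(256))))  # 8.0
print(min_size_fraction(b"aabb"))              # 0.125
```

Applied separately to the high and low bytes of BF16 weights, a measure like this shows which part of each value carries compressible structure.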

Install

pip install bigsmall                  # core
pip install "bigsmall[hf]"            # + HuggingFace integration
pip install "bigsmall[ecc]"           # + Reed-Solomon error recovery
pip install "bigsmall[all]"           # everything

Requires Python 3.9+. Works on Linux, macOS, and Windows, and runs on CPU, NVIDIA, AMD, and Apple Silicon.

License

Code: Elastic License 2.0. Free for personal, research, and commercial use. SaaS providers should see LICENSING.md.

Model weights distributed in .bs format keep the license of the original model.

Release history

Version	Released
4.0.0	Jun 12, 2026
3.15.0	Jun 11, 2026
3.14.4	May 21, 2026
3.14.3	May 21, 2026
3.14.2	May 21, 2026
3.14.1 (this version)	May 21, 2026
3.14.0	May 20, 2026
3.13.1	May 20, 2026
3.13.0	May 20, 2026
3.12.0	May 19, 2026
3.11.0	May 19, 2026
3.10.0	May 19, 2026
3.9.0	May 19, 2026
3.8.0	May 19, 2026
3.7.0	May 18, 2026
3.6.0	May 18, 2026
3.5.0	May 18, 2026
3.4.0	May 18, 2026
3.3.0	May 18, 2026
3.2.0	May 18, 2026
3.1.0	May 18, 2026
3.0.0	May 18, 2026
2.0.1	May 15, 2026
2.0.0	May 15, 2026
1.0.1	May 14, 2026
1.0.0	May 14, 2026

Download files

Source Distribution

bigsmall-3.14.1.tar.gz (221.4 kB)

Uploaded May 21, 2026 Source

Built Distribution

bigsmall-3.14.1-py3-none-any.whl (192.3 kB)

Uploaded May 21, 2026 Python 3

File details

Details for the file bigsmall-3.14.1.tar.gz.

File metadata

Download URL: bigsmall-3.14.1.tar.gz
Upload date: May 21, 2026
Size: 221.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for bigsmall-3.14.1.tar.gz
Algorithm	Hash digest
SHA256	`4368a46ef9960a2025fab264a00266171426139bc76ccee36f2e15bf6e2f8b38`
MD5	`3ea6da7ec12ff9d043081d751452962e`
BLAKE2b-256	`083af7f616e1b1bb33c918c754f9966d7776b6c26c62100da9634592e984e1e2`
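
A downloaded file can be checked against the published digest with the standard library. This is a minimal sketch; `sha256_of_file` and `matches_published` are hypothetical helper names, and the file is read in chunks so large archives do not need to fit in memory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution above, the call would be `matches_published("bigsmall-3.14.1.tar.gz", "4368a46ef9960a2025fab264a00266171426139bc76ccee36f2e15bf6e2f8b38")`, assuming the file sits in the current directory.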

File details

Details for the file bigsmall-3.14.1-py3-none-any.whl.

File metadata

Download URL: bigsmall-3.14.1-py3-none-any.whl
Upload date: May 21, 2026
Size: 192.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for bigsmall-3.14.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f23e9c8694a26820b807a9ad3c2450a5a48cccd59122aa5afcdfeb8a4aaf2d32`
MD5	`3c3b105b388fcf79138ac4b26351b00a`
BLAKE2b-256	`e5fddf8952cac019fd6c10904173acfed7f0e15e742b1429b1eb601ef9c7be87`
