Lossless neural network weight compression - run any model, no compromises

BigSmall — Make AI Models Smaller, Instantly

Lossless compression for neural network weights. Same model, smaller files. Bit-identical weights, md5-verified.

pip install bigsmall

A 14 GB Mistral 7B becomes 9 GB. A fine-tuned model becomes a small "patch" on top of its base — often less than 35% of the full size. Drop-in compatible with HuggingFace from_pretrained.

What it does

Three things, in plain English:

1. Compress any model

bigsmall compress model.safetensors -o model.bs
bigsmall decompress model.bs -o reconstructed.safetensors

Before: 15 GB safetensors. After: 10 GB .bs file. Quality: every weight bit-for-bit identical to the original.

2. Compress a fine-tuned model as a "patch"

If you have the base model already, store only what changed:

bigsmall compress fine_tuned.safetensors --delta-from base.safetensors -o patch.bs
bigsmall apply base.safetensors patch.bs -o reconstructed.safetensors

Before: 15 GB fine-tuned model. After: ~5 GB patch (depends on how much was fine-tuned). Quality: every weight bit-for-bit identical to the original.

This is the biggest user win. If you're publishing a fine-tune of a public base, your users can store the base once and download patches.
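
Why a patch can be so much smaller than the full model can be illustrated with a toy example. The sketch below is an assumption-laden stand-in, not BigSmall's actual delta format: it uses synthetic Gaussian weights, a hypothetical fine-tune that nudges every weight slightly, a bytewise XOR as the delta, and zlib as the codec. Because the sign, exponent, and leading mantissa bits rarely change, the XOR stream is mostly zeros and compresses far better than the fine-tuned weights alone, while reconstruction stays exact.

```python
import random
import struct
import zlib

def f32_bits(values):
    """Pack floats as little-endian float32 and return the raw bytes."""
    return struct.pack(f"<{len(values)}f", *values)

def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

rng = random.Random(0)
base = [rng.gauss(0.0, 0.02) for _ in range(20_000)]
# Hypothetical fine-tune: every weight nudged by a small amount.
tuned = [w + rng.gauss(0.0, 1e-5) for w in base]

base_raw = f32_bits(base)
tuned_raw = f32_bits(tuned)

patch = zlib.compress(xor_bytes(base_raw, tuned_raw), 9)  # delta against base
direct = zlib.compress(tuned_raw, 9)                      # no base available

# Applying the patch: XOR the decompressed delta back onto the base.
restored = xor_bytes(base_raw, zlib.decompress(patch))

print(f"direct: {len(direct)} bytes, patch: {len(patch)} bytes")
print("bit-identical:", restored == tuned_raw)
```

The exact sizes depend on how far the fine-tune moved the weights, which matches the "depends on how much was fine-tuned" caveat above.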

3. Use a pre-compressed model from HuggingFace

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "wpferrell/mistral-7b-instruct-bigsmall"
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

Works exactly like any other HuggingFace model. BigSmall transparently decompresses in the background.

How much smaller?

Model	Original	BigSmall	Saved
GPT-2 (117M, FP32)	548 MB	414 MB	24%
Llama 3.2-1B-Instruct	2.5 GB	1.5 GB	40%
Llama 3.2-3B-Instruct	6.4 GB	3.9 GB	39%
Mistral 7B Instruct v0.3	14.2 GB	9.3 GB	34%
Qwen 2.5-7B-Instruct	15.2 GB	10.0 GB	34%
Llama 3-8B-Instruct	16.1 GB	9.8 GB	39%
Qwen 2.5-14B-Instruct	29.5 GB	19.5 GB	34%
Gemma 2-9B-it	18.5 GB	11.3 GB	39%
Gemma 3-1B-it	2.0 GB	1.2 GB	40%
Stable Diffusion 1.5 UNet (FP16)	1.72 GB	1.48 GB	14%
Fine-tune patch (Instruct vs base)	14 GB	~5 GB	~65%

Browse all pre-compressed models →

Why lossless matters

Exact same model. Bit-identical weights. Every floating-point value is mathematically identical to the original.
Not quantization. Quantization (INT8, INT4) changes weight values — model behaviour changes too, even if just slightly.
Not pruning. Pruning removes parts of the model.
Not approximation. No tricks, no calibration data, no quality loss.

BigSmall compresses neural network weights the same way ZIP compresses text files: it finds redundancy in the bit pattern and stores it more compactly. The output decodes back to the exact same bits. md5 verified on every tensor.
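
The difference between a lossless and a lossy round trip can be checked with nothing but the standard library. In this sketch zlib stands in for the codec (it is not BigSmall's) and a float16 cast stands in for a lossy step (it is not a real quantization scheme); the weights are synthetic. The lossless path reproduces the MD5 digest of the original bytes, the lossy path does not.

```python
import hashlib
import random
import struct
import zlib

rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(10_000)]
n = len(weights)
raw = struct.pack(f"<{n}f", *weights)      # float32 bit patterns

# Lossless: compress, decompress, compare digests.
packed = zlib.compress(raw, 9)             # stand-in codec, not BigSmall's
unpacked = zlib.decompress(packed)

# Lossy: round every value to float16 and widen it back to float32.
halved = struct.unpack(f"<{n}e", struct.pack(f"<{n}e", *weights))
lossy = struct.pack(f"<{n}f", *halved)

digest = hashlib.md5(raw).hexdigest()
print("lossless md5 match:", hashlib.md5(unpacked).hexdigest() == digest)
print("lossy md5 match:   ", hashlib.md5(lossy).hexdigest() == digest)
```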

Install

pip install bigsmall                  # core
pip install "bigsmall[hf]"            # + HuggingFace Hub integration
pip install "bigsmall[ecc]"           # + Reed-Solomon error recovery
pip install "bigsmall[all]"           # everything

Requirements: Python 3.9+, NumPy, safetensors. PyTorch is required for HuggingFace round-trips and for using compressed models in inference.

Works on Linux, macOS, and Windows. CPU + NVIDIA + AMD + Apple Silicon.

What's new in v3.13.0

Delta compression (the big one). Compress a fine-tune as a patch on its base model: bigsmall compress fine_tuned/ --delta-from base/ -o patch.bs. Often <35% of the full model size, fully lossless.
Auto-detect the base model. bigsmall compress --auto-delta scans known-base fingerprints and suggests the right base. Header embeds a fingerprint of the base used, so decompression warns on mismatch.
Resumable compression. bigsmall compress --resume picks up exactly where it left off if the run was interrupted. Tensor-level checkpointing.
mmap-backed decode. Large .bs files (>256 MB) are now mmap'd instead of fully read into RAM. Lower peak memory, faster start.
GPU INT8 KV cache. LossyKVCacheGPU — opt-in lossy compression for runtime KV cache. ~50% VRAM saving for streaming inference, max error ~0.04 in BF16.
Streaming LRU layer cache. BigSmallStreamingModel(lru_max_vram_gb=2.0) keeps the most-recently-used decoded layers in VRAM.
Reed-Solomon ECC. bigsmall compress --ecc writes a parity sidecar that can recover from ~16 corrupted bytes per 223-byte block. bigsmall repair uses it.
Fast probabilistic verify. bigsmall verify --sample 0.001 decodes 0.1% of weights and verifies their md5 — catches in-blob corruption without the cost of a full verify.
Three new CLI commands. bigsmall scan (analyse before compressing), bigsmall apply (delta + base → original), bigsmall repair (ECC recovery).
V8 codec opt-in. Layer-type-aware codec for attention / embedding tensors. Negligible average gain (~0.07%), available via --use-v8-codec for users who want the option.
bigsmall.detect_bf16_native — detects F32 models that are really BF16 upcast and compresses them as BF16 (44% of raw F32 instead of 83%).
bigsmall.download_delta(repo_id, base_dir, output_dir) — pull a delta repo from HuggingFace and reconstruct the fine-tune.
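
The BF16-upcast case in the list above is easy to picture at the bit level. A BF16 value widened to F32 keeps its 16 bits in the upper half of the word and has zeros in the lower half, so one plausible check (an assumption, not necessarily how bigsmall.detect_bf16_native works) is to test whether the low 16 bits of every F32 value are zero. The helper names below are hypothetical.

```python
import struct

def looks_like_bf16_upcast(raw_f32: bytes) -> bool:
    """Return True if every little-endian float32 has zero low 16 bits."""
    if len(raw_f32) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    # Bytes 0 and 1 of each little-endian float32 hold the low 16 mantissa bits.
    return not any(raw_f32[0::4]) and not any(raw_f32[1::4])

def upcast_bf16_to_f32(bf16_words):
    """Widen 16-bit BF16 patterns to float32 bytes by appending 16 zero bits."""
    return b"".join(struct.pack("<I", w << 16) for w in bf16_words)

def narrow_to_bf16(raw_f32: bytes) -> bytes:
    """Keep only the upper 16 bits; lossless only if the check above passes."""
    return b"".join(raw_f32[i + 2:i + 4] for i in range(0, len(raw_f32), 4))

upcast = upcast_bf16_to_f32([0x3C23, 0xBD4C, 0x0000])
native = struct.pack("<3f", 0.01, -0.05, 0.0)

print(looks_like_bf16_upcast(upcast))   # True
print(looks_like_bf16_upcast(native))   # False
print(len(narrow_to_bf16(upcast)), "bytes instead of", len(upcast))
```

When the check passes, half of every word carries no information, which is why such a model can be stored as BF16 without losing a single bit.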

See CHANGELOG.md for full details.

CLI reference

bigsmall compress SRC [-o OUTPUT] [--delta-from BASE] [--auto-delta]
                       [--resume] [--ecc] [--storage|--balanced|--inference]
bigsmall decompress SRC [-o OUTPUT] [--base BASE]
bigsmall info SRC                       # size, ratio, codecs used
bigsmall scan SRC                       # analyse before compressing
bigsmall stat SRC [--tensor X]          # per-tensor table
bigsmall verify SRC [--fast|--sample N] # integrity check
bigsmall diff A.bs B.bs [--patch P.bs]  # compare or write a delta
bigsmall apply BASE PATCH.bs -o OUT     # reconstruct from base + patch
bigsmall repair SRC.bs [-o OUT]         # recover using .ecc sidecar
bigsmall benchmark SRC                  # encode/decode speed
bigsmall migrate SRC                    # re-encode with current codecs
bigsmall status                         # list your BigSmall HF repos
bigsmall pipeline run SRC DST           # resumable download → compress → upload

Each command has --help for details. See docs/cli-reference.md for examples.
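
The trade-off behind the --sample option of bigsmall verify can be reasoned about with a small model. The sketch below assumes the data is split into chunks with a stored MD5 per chunk; the chunking, function names, and sampling strategy are illustrative assumptions, not BigSmall's implementation. It also computes the chance that a sample misses every corrupted chunk.

```python
import hashlib
import math
import random

def sample_verify(chunks, digests, fraction, seed=None):
    """Re-hash a random subset of chunks; return indices whose MD5 differs."""
    count = max(1, round(len(chunks) * fraction))
    picked = random.Random(seed).sample(range(len(chunks)), count)
    return sorted(i for i in picked
                  if hashlib.md5(chunks[i]).hexdigest() != digests[i])

def miss_probability(total, corrupted, sampled):
    """Chance that a sample drawn without replacement misses every bad chunk."""
    return math.comb(total - corrupted, sampled) / math.comb(total, sampled)

chunks = [bytes([i % 256]) * 64 for i in range(1000)]
digests = [hashlib.md5(c).hexdigest() for c in chunks]
chunks[10] = b"\x00" * 64                 # simulate one corrupted chunk

print(sample_verify(chunks, digests, fraction=1.0))  # full verify: [10]
print(miss_probability(1000, 1, 1))                  # 0.999
print(miss_probability(1000, 500, 20))               # widespread damage
```

Under these assumptions a small sample reliably catches widespread damage but will usually miss a single isolated bad chunk, so a full verify remains the stronger check.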

Common workflows

Compress and upload a model to HuggingFace

python -c "
import bigsmall
bigsmall.compress_for_hub('mistralai/Mistral-7B-Instruct-v0.3', './mistral_bs/')
bigsmall.upload_to_hub('./mistral_bs/', repo_id='wpferrell/mistral-7b-bigsmall')
"

Use a compressed model on a low-VRAM GPU

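from transformers import AutoTokenizer
from bigsmall import BigSmallStreamingModel

model = BigSmallStreamingModel.from_pretrained(
    "wpferrell/mistral-7b-instruct-bigsmall",
    device="cuda",
    lru_max_vram_gb=2.0,     # cache 2 GB of decoded layers
)
# Tokenize a prompt with the original model's tokenizer
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
input_ids = tokenizer("Hello, world", return_tensors="pt").input_ids.to("cuda")
out = model.generate(input_ids, max_new_tokens=100)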

Uses ~12× less VRAM than standard loading by streaming layers on demand.
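
The idea behind lru_max_vram_gb, keeping only the most recently used decoded layers within a fixed budget, can be sketched with an OrderedDict. The LayerLRU class and the fake decoder below are hypothetical and work on plain bytes in host memory; they only illustrate the eviction policy, not how BigSmallStreamingModel manages VRAM.

```python
from collections import OrderedDict

class LayerLRU:
    """Keep most-recently-used decoded layers within a byte budget (hypothetical)."""

    def __init__(self, max_bytes, decode):
        self.max_bytes = max_bytes
        self.decode = decode          # callable: layer name -> decoded bytes
        self.used = 0
        self.layers = OrderedDict()

    def get(self, name):
        if name in self.layers:
            self.layers.move_to_end(name)      # mark as most recently used
            return self.layers[name]
        data = self.decode(name)               # cache miss: decode on demand
        self.layers[name] = data
        self.used += len(data)
        # Evict least recently used layers until the budget is respected.
        while self.used > self.max_bytes and len(self.layers) > 1:
            _, evicted = self.layers.popitem(last=False)
            self.used -= len(evicted)
        return data

decoded = []

def fake_decode(name):
    """Stand-in decoder that records each call and returns 100 bytes."""
    decoded.append(name)
    return bytes(100)

cache = LayerLRU(max_bytes=250, decode=fake_decode)
for name in ["l0", "l1", "l0", "l2", "l1"]:
    cache.get(name)

print(decoded)              # ['l0', 'l1', 'l2', 'l1']
print(list(cache.layers))   # ['l2', 'l1']
```

A smaller budget lowers peak memory at the cost of decoding the same layer more often, which is the trade-off the cache size setting exposes.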

Distribute a fine-tune as a small patch

# As the publisher:
bigsmall compress fine_tuned.safetensors --delta-from base.safetensors -o patch.bs
# upload patch.bs to your HF repo

# As a user:
python -c "
import bigsmall
bigsmall.download_delta(
    'wpferrell/my-finetune-bigsmall-delta',
    base_dir='~/.cache/huggingface/.../Mistral-7B-Instruct-v0.3',
    output_dir='./reconstructed',
)
"

Research

BigSmall grew out of a multi-month research effort to find the per-tensor lossless limit for BF16 transformer weights. We measured every direction we considered meaningful (column-major rescan, 2D context coding, head-cluster dedup, QKV split, delta encoding, BF16-native F32 detection) and report what works and what doesn't.

Bottom-line findings:

The per-tensor lossless floor for BF16 transformer weights is ~65-66% of raw size. This is an empirical result from the V4-V8 experiments (300+ tested combinations, see research/).
The biggest meaningful gain available today is delta compression for fine-tuned models, which reaches ~34% of raw BF16.
All other intra-tensor angles we tested were falsified empirically.
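
Why BF16 weights compress at all, and why only up to a point, can be explored with a rough entropy estimate. The sketch below is not BigSmall's codec and does not reproduce the measurements above: it uses synthetic Gaussian weights, truncates float32 to BF16, and measures the order-0 Shannon entropy of the two bytes separately. Under these assumptions the byte holding the sign and most of the exponent carries far fewer than 8 bits of information, while the byte holding the mantissa stays close to 8 bits and is essentially incompressible.

```python
import math
import random
import struct
from collections import Counter

def entropy_bits(symbols):
    """Empirical Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(50_000)]
raw = struct.pack(f"<{len(weights)}f", *weights)

# BF16 by truncation: keep the top two bytes of each little-endian float32.
high = raw[3::4]   # sign bit + upper 7 exponent bits
low = raw[2::4]    # lowest exponent bit + 7 mantissa bits

h_high, h_low = entropy_bits(high), entropy_bits(low)
print(f"high byte: {h_high:.2f} bits, low byte: {h_low:.2f} bits")
print(f"order-0 estimate: {(h_high + h_low) / 16:.0%} of raw BF16")
```

Real checkpoints have different weight distributions per tensor, so the estimate printed here is only indicative.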

Cite the BigSmall paper: Zenodo DOI 10.5281/zenodo.20279248

See docs/research.md for a plain-English summary of what was learned.

License

Code: Elastic License 2.0. Free for personal, research, and commercial use under typical software-product terms. See LICENSING.md for commercial licensing.

Model weights distributed via BigSmall format keep the license of the original model.

Release history

Version	Released
4.0.0	Jun 12, 2026
3.15.0	Jun 11, 2026
3.14.4	May 21, 2026
3.14.3	May 21, 2026
3.14.2	May 21, 2026
3.14.1	May 21, 2026
3.14.0	May 20, 2026
3.13.1	May 20, 2026
3.13.0 (this version)	May 20, 2026
3.12.0	May 19, 2026
3.11.0	May 19, 2026
3.10.0	May 19, 2026
3.9.0	May 19, 2026
3.8.0	May 19, 2026
3.7.0	May 18, 2026
3.6.0	May 18, 2026
3.5.0	May 18, 2026
3.4.0	May 18, 2026
3.3.0	May 18, 2026
3.2.0	May 18, 2026
3.1.0	May 18, 2026
3.0.0	May 18, 2026
2.0.1	May 15, 2026
2.0.0	May 15, 2026
1.0.1	May 14, 2026
1.0.0	May 14, 2026

Download files

Source Distribution

bigsmall-3.13.0.tar.gz (190.7 kB)

Uploaded May 20, 2026 Source

Built Distribution

bigsmall-3.13.0-py3-none-any.whl (166.6 kB)

Uploaded May 20, 2026 Python 3

File details

Details for the file bigsmall-3.13.0.tar.gz.

File metadata

Download URL: bigsmall-3.13.0.tar.gz
Upload date: May 20, 2026
Size: 190.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for bigsmall-3.13.0.tar.gz
Algorithm	Hash digest
SHA256	`8077027a05942b166c4e45dff58f901fa90743a9668a6149bf45c28c66de695c`
MD5	`49d318efe689390c3192ffb517f4f99b`
BLAKE2b-256	`09a2ef4326968c37156ada0a8d74742c210e7d2c011b0d9d97a902d49dbc5c07`

File details

Details for the file bigsmall-3.13.0-py3-none-any.whl.

File metadata

Download URL: bigsmall-3.13.0-py3-none-any.whl
Upload date: May 20, 2026
Size: 166.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for bigsmall-3.13.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b29700aff0c4dba1fe0b5bb39498d60e8bfb31d385313beee753c35527786a1c`
MD5	`01c086c936dd4e1a44cbb96ba11ea273`
BLAKE2b-256	`484b4456a5f017ba95c3f1af3bc3f303884bbfdb1f2ee6dc38f59f1218df1c24`
