Lossless compression codec tuned for neural-network model weights

z4ai

A lossless compression codec for AI model checkpoints.

Keep model checkpoints small for storage and transfer - bit-for-bit reversible, with per-tensor random access. Most useful on collections of related checkpoints (training runs, fine-tune families, model registries), and in environments the Hugging Face Hub's Xet backend doesn't cover - self-hosted registries, internal MLOps, plain object storage.

Documentation - quickstart, full usage, how it works, and the API reference.

```shell
pip install z4ai
```

```python
import z4ai

blob = z4ai.compress(weights_bytes)   # smaller, self-describing
data = z4ai.decompress(blob)          # byte-identical original
assert data == weights_bytes
```

Its strongest case is a sequence of related checkpoints - consecutive ones are ~95-99% identical, so each is stored as a tiny delta from the one before:

```python
# store checkpoint N as the bit-exact delta from checkpoint N-1
delta    = z4ai.compress_delta(step_2000, reference=step_1000)
restored = z4ai.decompress_delta(delta, reference=step_1000)   # exact == step_2000

# only the changed bytes cost anything - often 10-100x smaller than a full compress
```
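
The idea behind delta storage can be illustrated without z4ai itself. The sketch below is not z4ai's implementation: it XORs a checkpoint against its reference and compresses the result with the standard library's `zlib`, which stands in for the real codec. The names `xor_delta_compress` and `xor_delta_decompress` are hypothetical, and the two "checkpoints" are synthetic random bytes in which about 5% of positions differ.

```python
import random
import zlib


def xor_delta_compress(current: bytes, reference: bytes) -> bytes:
    """Compress `current` as its XOR difference from `reference` (toy sketch)."""
    if len(current) != len(reference):
        raise ValueError("this sketch assumes equal-length checkpoints")
    # Unchanged bytes XOR to zero, so the delta is mostly zeros.
    delta = bytes(c ^ r for c, r in zip(current, reference))
    return zlib.compress(delta, 6)


def xor_delta_decompress(blob: bytes, reference: bytes) -> bytes:
    """Invert xor_delta_compress; requires the same reference bytes."""
    delta = zlib.decompress(blob)
    return bytes(d ^ r for d, r in zip(delta, reference))


# Synthetic stand-ins for two consecutive checkpoints (random.randbytes: Python 3.9+).
rng = random.Random(0)
step_1000 = rng.randbytes(1 << 18)
changed = bytearray(step_1000)
for i in rng.sample(range(len(changed)), len(changed) // 20):
    changed[i] ^= rng.randrange(1, 256)  # XOR with a nonzero value always changes the byte
step_2000 = bytes(changed)

delta_blob = xor_delta_compress(step_2000, step_1000)
full_blob = zlib.compress(step_2000, 6)

assert xor_delta_decompress(delta_blob, step_1000) == step_2000  # bit-exact
print(f"full: {len(full_blob)} bytes, delta: {len(delta_blob)} bytes")
```

Randomly scattered byte changes are a pessimistic case for this toy; the point is only that the reference must be available at decode time and that the result is exact.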

Or from the command line, on files:

```shell
z4ai compress   weights.bin -o weights.z4ai
z4ai decompress weights.z4ai -o weights.bin
z4ai info       weights.z4ai            # ratio + per-plane breakdown
```

Requires Python >= 3.9; pure Python (NumPy + zstandard), with optional native acceleration.

TL;DR

z4ai is a codec built for the byte structure of float tensors. Compared with ZipNN (the closest weights-specific codec):

- Ties on dense weights, with a slight edge on real models (distilgpt2 +1.6-8.2%, pythia-70m +11-29%) from an order-1 context exponent coder.
- Wins big on repeated structure - tied embeddings, duplicated layers, multi-shard concatenations - which z4ai dedups across the whole tensor (58.1x vs 1.51x on the structured bf16 benchmark).
- 2.3-3.0x on reduced-precision fp32 files (fp16/bf16-origin values), detected automatically.
- 2.4-10.8x on quantized weights shipped in a wide container - INT4/INT8/FP8 (GPTQ / AWQ / compressed-tensors) dequantised into bf16/fp16/fp32 - via an automatic lossless palette transform. This is the common deployed format, and z4ai beats ZipNN on every case here (e.g. INT8-in-fp32 4.72x vs 1.94x; INT4-in-fp32 10.8x vs 4.6x).
- +12.5-18.9% on sparse / pruned weights (50% zeros), and 10-180x on checkpoint sequences via the lossless compress_delta mode, for which ZipNN has no equivalent.
- Slower to compress than ZipNN's compiled-C core; decompress is competitive.

Honest ceiling. On a dense checkpoint a trained float's mantissa is near-random and its exponent carries only ~2.6 bits, capping any lossless codec at ~1.5x (bf16) / ~1.2x (fp32). ZipNN already hits that wall, so z4ai can't meaningfully out-ratio it there - it wins by a hair via order-1 rANS on the exponent. The large wins come from redundancy the entropy bound assumes away: reduced precision, sparsity, structure, and cross-checkpoint deltas.
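
The ceiling figures follow from simple bit counting. The sketch below reproduces them under one added assumption that the text above does not state explicitly: the sign bit is treated as incompressible (1 bit), alongside the near-random mantissa. The function name `lossless_ceiling` is hypothetical.

```python
def lossless_ceiling(total_bits: int, mantissa_bits: int,
                     exponent_entropy_bits: float = 2.6) -> float:
    """Best achievable ratio if sign and mantissa bits cannot be compressed."""
    sign_bits = 1  # assumption: the sign of a trained weight is roughly a coin flip
    min_bits = sign_bits + mantissa_bits + exponent_entropy_bits
    return total_bits / min_bits


bf16 = lossless_ceiling(total_bits=16, mantissa_bits=7)    # 16 / 10.6
fp32 = lossless_ceiling(total_bits=32, mantissa_bits=23)   # 32 / 26.6
print(f"bf16 ~{bf16:.2f}x, fp32 ~{fp32:.2f}x")  # prints "bf16 ~1.51x, fp32 ~1.20x"
```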

All numbers below are measured on this repo and reproducible with one command.

Benchmarks vs ZipNN

Machine: 16 cores | Python 3.14 | zstandard 0.25.0 | zipnn (latest) | 32 MB per dtype | best-of-3 timing. Every codec is verified byte-exact (lossless).

Compression ratio (higher is better)

| Scenario | dtype | z4ai | ZipNN | zstd-3 | z4ai vs ZipNN |
|---|---|---|---|---|---|
| Dense / i.i.d. weights | bf16 | 1.413 | 1.417 | 1.227 | -0.3% \| tie |
| Dense / i.i.d. weights | fp32 | 1.171 | 1.172 | 1.061 | -0.1% \| tie |
| Structured (repeated/duplicated) | bf16 | 58.1 | 1.51 | 16.97 | +3750% |
| Structured (repeated/duplicated) | fp32 | 47.3 | 1.20 | 14.24 | +3831% |
| Sparse (50% zeros) | bf16 | 2.47 | 2.20 | 1.88 | +12.5% |
| Sparse (50% zeros) | fp32 | 2.21 | 1.86 | 1.79 | +18.9% |
| Quantized INT8 (dequantised to…) | bf16 | 2.39 | 2.07 | 1.79 | +15.6% |
| Quantized INT8 (dequantised to…) | fp32 | 4.72 | 1.94 | 3.07 | +143% |
| Quantized INT4 (dequantised to…) | bf16 | 5.41 | 3.87 | 3.91 | +39.9% |
| Quantized INT4 (dequantised to…) | fp32 | 10.77 | 4.59 | 5.13 | +135% |

Quantized rows: per-tensor INT4/INT8 weights dequantised back into a wide float container (the format most quantized models ship in), same 32 MB/dtype config as above. z4ai auto-selects the lossless palette transform; several ZipNN entries here are not byte-exact on the tested build. Reproduce with `python benchmarks/bench_palette.py` (which reports a stronger zstd-19 baseline, so z4ai's margin there is conservative).
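
Why a palette helps on these rows: INT8 weights dequantised into fp32 can take at most 256 distinct values, so each 4-byte float can be replaced by a 1-byte index into a table of the distinct bit patterns. The following is a minimal sketch of that idea, not z4ai's actual transform; `palette_encode` and `palette_decode` are hypothetical names, the input length is assumed to be a multiple of the element width, and the per-tensor scale is made up.

```python
import random
import struct


def palette_encode(raw: bytes, width: int = 4):
    """Return (palette, indices) for fixed-width words, or None if it won't fit."""
    words = [raw[i:i + width] for i in range(0, len(raw), width)]
    palette = sorted(set(words))
    if len(palette) > 256:
        return None  # too many distinct values for one-byte indices
    lookup = {word: i for i, word in enumerate(palette)}
    return b"".join(palette), bytes(lookup[word] for word in words)


def palette_decode(palette: bytes, indices: bytes, width: int = 4) -> bytes:
    """Rebuild the original bytes by looking each index up in the palette."""
    table = [palette[i:i + width] for i in range(0, len(palette), width)]
    return b"".join(table[i] for i in indices)


# Synthetic "INT8 dequantised to fp32" tensor: value = scale * q, q in [-128, 127].
rng = random.Random(0)
scale = 0.0123  # hypothetical per-tensor scale
q = [rng.randint(-128, 127) for _ in range(1 << 16)]
raw = struct.pack(f"<{len(q)}f", *(scale * v for v in q))

encoded = palette_encode(raw)
assert encoded is not None
palette, indices = encoded
assert palette_decode(palette, indices) == raw  # lossless on the exact bit patterns
print(f"raw: {len(raw)} bytes, palette + indices: {len(palette) + len(indices)} bytes")
```

The indices can then be entropy-coded like any other byte stream; because the palette stores exact bit patterns, no floating-point rounding is involved.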

Real & production workloads

| Workload | z4ai | ZipNN | Note |
|---|---|---|---|
| Real checkpoint - bert-tiny, 17.7 MB fp32 (downloaded) | 1.188 | 1.202 | -1.2% - a single dense checkpoint ~ i.i.d., so a small loss. The win is on redundancy, not dense noise. |
| Production .safetensors - 201 MB BF16 with a tied embedding | 1.525 | 1.510 | +1.0% vs per-tensor ZipNN - z4ai dedups the tied `embed_tokens`/`lm_head` that ZipNN's 256 KiB chunking can't. |
| Realistic full checkpoint - 107 MB BF16 (tied embeddings, shared blocks, optimizer state, 50% pruned layer) | 2.93 | 1.67 | +75.7% - z4ai's whole-buffer LZ dedups the structure real checkpoints carry; ZipNN's chunked Huffman cannot see across chunks. |
| Checkpoint delta - bert-tiny BF16, 5% of weights changed | 51.1 | ~1.7 | 30x smaller than from-scratch. `compress_delta` stores only what changed (1% → 184x; 20% → 18x). ZipNN has no delta mode. |
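
The tied-embedding rows come down to storing identical data once. The sketch below shows the simplest form of that, exact-duplicate detection by content hash. It is a deliberately reduced stand-in: z4ai is described as using whole-buffer LZ matching, which can also catch partial repeats, whereas this toy only catches tensors that are byte-identical. `pack_dedup` and `unpack` are hypothetical names, and `zlib` replaces the real codec.

```python
import hashlib
import random
import zlib


def pack_dedup(tensors: dict):
    """Compress each distinct payload once; map tensor names to payload digests."""
    blobs = {}   # digest -> compressed payload, stored once
    index = {}   # tensor name -> digest
    for name, data in tensors.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest not in blobs:
            blobs[digest] = zlib.compress(data, 6)
        index[name] = digest
    return index, blobs


def unpack(index: dict, blobs: dict) -> dict:
    """Rebuild every tensor, sharing one payload among tied names."""
    return {name: zlib.decompress(blobs[digest]) for name, digest in index.items()}


rng = random.Random(0)
embedding = rng.randbytes(1 << 18)       # synthetic tied weight (Python 3.9+)
tensors = {
    "embed_tokens": embedding,
    "layer0": rng.randbytes(1 << 16),
    "lm_head": embedding,                # tied to embed_tokens
}

index, blobs = pack_dedup(tensors)
assert unpack(index, blobs) == tensors

deduped = sum(len(b) for b in blobs.values())
per_tensor = sum(len(zlib.compress(t, 6)) for t in tensors.values())
print(f"per-tensor: {per_tensor} bytes, deduplicated: {deduped} bytes")
```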

Reproduce

```shell
python benchmarks/benchmark.py --mb 32 --dtypes bf16 fp32 --scenario iid
python benchmarks/benchmark.py --mb 32 --dtypes bf16 fp32 --scenario structured
python benchmarks/benchmark.py --mb 32 --dtypes bf16 fp32 --scenario sparse
python benchmarks/bench_real_checkpoint.py        # downloads a real .bin checkpoint
python benchmarks/bench_safetensors.py --layers 8 --d 1024
python benchmarks/checkpoint_bench.py --mb 96     # realistic structured checkpoint
```

Throughput

| Codec (i.i.d. bf16) | compress (MB/s) | decompress (MB/s) |
|---|---|---|
| z4ai | 1420 | 16700 |
| ZipNN | 8125 | 20020 |

z4ai compresses ~6x slower and decompresses ~1.2x slower than ZipNN's compiled-C core - the deliberate trade for a write-once, read-many artifact. A fused multithreaded native codec (z4ai.chunked) and effort="fast"/"max" tiers trade decode latency against file size.

How it works

A float is [ sign | exponent | mantissa ]: in trained weights the exponent bits repeat heavily while the mantissa looks like noise, and the two are interleaved byte-by-byte - so a general-purpose zip can't separate them (plain zstd on raw fp32 barely reaches ~1.06x). z4ai pulls the bytes apart, matches redundancy across the whole tensor, then entropy-codes each part near its floor:

```mermaid
flowchart TD
    IN["Float tensor bytes<br/>(bf16 / fp16 / fp32 / fp64)"]
    IN -->|split by dtype| SPLIT["Plane / bit-field split"]
    SPLIT --> EXP["Exponent / sign plane<br/>(low entropy)"]
    SPLIT --> MAN["Mantissa plane<br/>(noise-like)"]
    EXP --> ENT["Entropy coding<br/>(rANS / zstd)"]
    MAN --> STORE["Store verbatim, or zstd"]
    ENT --> LDM["Whole-tensor long-distance matching<br/>dedups tied / repeated weights"]
    STORE --> LDM
    LDM --> BEST["Best-of selection: keep the smallest<br/>never worse than plain zstd"]
    BEST --> OUT["Self-describing container<br/>(.z4ai / .zstn)"]
```
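
Two of the stages in the diagram, the plane split and the best-of selection, are easy to sketch with the standard library. This is an illustration under stated assumptions, not z4ai's code: it splits little-endian fp32 data into byte planes (coarser than a true bit-field split), uses `zlib` where the diagram names rANS / zstd, and invents a one-byte tag (0 = plain, 1 = plane-split) so the decoder knows which path was taken. All function names are hypothetical.

```python
import random
import struct
import zlib


def split_planes(raw: bytes, width: int) -> list:
    """Plane k holds byte k of every `width`-byte element."""
    return [raw[k::width] for k in range(width)]


def join_planes(planes: list) -> bytes:
    """Re-interleave the planes into the original element order."""
    width = len(planes)
    out = bytearray(sum(len(p) for p in planes))
    for k, plane in enumerate(planes):
        out[k::width] = plane
    return bytes(out)


def encode(raw: bytes, width: int = 4) -> bytes:
    """Try plain and plane-split coding; keep whichever is smaller."""
    plain = b"\x00" + zlib.compress(raw, 6)
    parts = [zlib.compress(p, 6) for p in split_planes(raw, width)]
    header = struct.pack(f"<B{width}I", width, *(len(p) for p in parts))
    split = b"\x01" + header + b"".join(parts)
    return min(plain, split, key=len)


def decode(blob: bytes) -> bytes:
    """Exact inverse of encode, driven only by the tag and header in the blob."""
    if blob[0] == 0:
        return zlib.decompress(blob[1:])
    width = blob[1]
    sizes = struct.unpack_from(f"<{width}I", blob, 2)
    offset = 2 + 4 * width
    planes = []
    for size in sizes:
        planes.append(zlib.decompress(blob[offset:offset + size]))
        offset += size
    return join_planes(planes)


# Synthetic dense weights: Gaussian values packed as little-endian fp32.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(1 << 16)]
raw = struct.pack(f"<{len(weights)}f", *weights)

blob = encode(raw)
assert decode(blob) == raw
print(f"tag={blob[0]}, raw={len(raw)} bytes, encoded={len(blob)} bytes")
```

In little-endian fp32 the last byte of each element carries the sign and the high exponent bits, so that plane is the one expected to compress; the other planes are mostly mantissa and stay close to their original size.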

Decoding is the exact inverse, driven entirely by the self-describing header - no side-channel metadata, and the output is never larger than the input. Pruned weights take a zero-aware path; the safetensors/ZSTN container adds a per-tensor index for random-access reads and stores tied weights once.
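
A per-tensor index is what makes random access possible: the reader looks up one entry and decodes only that payload. The layout below is invented for illustration (a 4-byte length, a JSON index, then the payloads) and is not the actual .z4ai / .zstn format, which this page does not specify; `write_container` and `read_tensor` are hypothetical names, and `zlib` stands in for the codec.

```python
import json
import struct
import zlib


def write_container(tensors: dict) -> bytes:
    """Layout: 4-byte index length, JSON index, then the compressed payloads."""
    index, payloads, offset = {}, [], 0
    for name, data in tensors.items():
        payload = zlib.compress(data, 6)
        index[name] = {"offset": offset, "size": len(payload)}
        payloads.append(payload)
        offset += len(payload)
    header = json.dumps(index).encode("utf-8")
    return struct.pack("<I", len(header)) + header + b"".join(payloads)


def read_tensor(container: bytes, name: str) -> bytes:
    """Decode a single tensor without decompressing any of the others."""
    (header_len,) = struct.unpack_from("<I", container, 0)
    index = json.loads(container[4:4 + header_len])
    entry = index[name]
    start = 4 + header_len + entry["offset"]
    return zlib.decompress(container[start:start + entry["size"]])


tensors = {
    "embed_tokens": bytes(range(256)) * 64,
    "layer0.weight": b"\x00\x3c" * 4096,
}
container = write_container(tensors)
assert read_tensor(container, "layer0.weight") == tensors["layer0.weight"]
```

Because the index travels inside the file, nothing outside the container is needed to locate a tensor, which matches the "no side-channel metadata" property described above.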

References

z4ai's building blocks are well-studied; the contribution is applying them to the byte structure of model weights and matching across the whole tensor - and across checkpoints. On dense weights it cannot meaningfully out-ratio ZipNN (the entropy ceiling binds every lossless codec equally); the wins come from structure, reduced precision, sparsity, and cross-checkpoint deltas.

Float field decorrelation - Lindstrom & Isenburg, fpzip, IEEE TVCG 2006; applied to NN weights by ZipNN (the codec benchmarked against here).
Long-distance LZ matching - Ziv & Lempel, LZ77 (1977), via Zstandard (RFC 8878).
Entropy coding (rANS) - Duda, Asymmetric Numeral Systems (2013).
Cross-checkpoint / cross-model delta - ZipLLM (NSDI 2026) - the basis for compress_delta / model_delta.

The full survey (DFloat11, NeuZip, DietGPU, ECF8, ALP, Pcodec, the BF16 entropy ceiling) is in the docs.

Release history

0.2.0 (this version) - Jun 9, 2026
0.1.0 - Jun 9, 2026

Source Distribution

z4ai-0.2.0.tar.gz (164.6 kB)

Uploaded Jun 9, 2026 Source

File details

Details for the file z4ai-0.2.0.tar.gz.

File metadata

Download URL: z4ai-0.2.0.tar.gz
Upload date: Jun 9, 2026
Size: 164.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for z4ai-0.2.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `340d467ab58a518f3f3696050fa0ecd4271aec3a6d5673e21cf64bd7fa84ceab` |
| MD5 | `5316beb2a1f2068dd738af5b1eb13bff` |
| BLAKE2b-256 | `585812f84d85dd8428e0ba0d4fe8d3088fb261aa1bc4122d87b78f905fb9928a` |

