Lossless compression codec tuned for neural-network model weights

z4ai

A lossless storage-and-distribution layer for AI model checkpoints.

Keep model checkpoints small for storage and transfer - bit-for-bit reversible, with per-tensor random access. Most useful on collections of related checkpoints (training runs, fine-tune families, model registries), and in environments the Hugging Face Hub's Xet backend doesn't cover - self-hosted registries, internal MLOps, plain object storage.

Documentation - quickstart, full usage, how it works, and the API reference.

import z4ai

blob = z4ai.compress(weights_bytes, dtype="bf16")   # smaller, self-describing
data = z4ai.decompress(blob)                          # byte-identical original
assert data == weights_bytes

Its strongest case is a sequence of related checkpoints - consecutive ones are ~95-99% identical, so each is stored as a tiny delta from the one before:

# store checkpoint N as the bit-exact delta from checkpoint N-1
delta    = z4ai.compress_delta(step_2000, reference=step_1000, dtype="bf16")
restored = z4ai.decompress_delta(delta, reference=step_1000)   # exact == step_2000

# only the changed bytes cost anything - often 10-100x smaller than a full compress
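
Why a delta is so cheap can be illustrated with a toy XOR delta built only on the standard library. This is a sketch, not z4ai's actual delta format: `toy_compress_delta` and `toy_decompress_delta` are hypothetical names, zlib stands in for the real entropy coder, and the two "checkpoints" are synthetic byte buffers that differ in about 2% of their bytes.

```python
import random
import zlib


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    n = len(a)
    x = int.from_bytes(a, "little") ^ int.from_bytes(b, "little")
    return x.to_bytes(n, "little")


def toy_compress_delta(current: bytes, reference: bytes) -> bytes:
    # Unchanged bytes XOR to zero, and long zero runs compress very well.
    return zlib.compress(xor_bytes(current, reference), 9)


def toy_decompress_delta(delta: bytes, reference: bytes) -> bytes:
    # XOR is its own inverse, so applying the reference again restores the data.
    return xor_bytes(zlib.decompress(delta), reference)


# Synthetic stand-ins for two consecutive checkpoints: 2% of bytes differ.
rng = random.Random(0)
step_1000 = rng.randbytes(200_000)
changed = bytearray(step_1000)
for i in rng.sample(range(len(changed)), k=len(changed) // 50):
    changed[i] ^= 0xFF
step_2000 = bytes(changed)

delta = toy_compress_delta(step_2000, step_1000)
full = zlib.compress(step_2000, 9)

assert toy_decompress_delta(delta, step_1000) == step_2000  # bit-exact
print(f"full: {len(full)} bytes, delta: {len(delta)} bytes")
```

The reference must be available, byte for byte, at restore time; a delta chain is only as durable as its base checkpoint.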

Install

pip install z4ai

Requires Python >= 3.9. Pure Python (NumPy + zstandard); the native entropy core is optional and z4ai degrades gracefully without it. Full installation guide (optional extras, native acceleration) in the docs.

TL;DR

z4ai is a codec built for the byte structure of float tensors. Compared with ZipNN (the closest weights-specific codec):

Ties on dense weights, with a slight edge on real models (distilgpt2 +1.6-8.2%, pythia-70m +11-29%) from an order-1 context exponent coder.
Wins big on repeated structure - tied embeddings, duplicated layers, multi-shard concatenations - which z4ai dedups across the whole tensor.
Reaches 2.3-3.0x on fp32 files that hold reduced-precision values (fp16/bf16-origin), detected automatically.
Reaches 2.4-10.8x on quantized weights shipped in a wide container - INT4/INT8/FP8 (GPTQ / AWQ / compressed-tensors) dequantised into bf16/fp16/fp32 - via an automatic lossless palette transform. This is the common deployed format, and z4ai beats ZipNN on every case measured here (e.g. INT8-in-fp32 4.72x vs 1.94x; INT4-in-fp32 10.8x vs 4.6x).
Big wins on sparse / pruned weights, and 10-180x on checkpoint sequences via the lossless compress_delta mode, for which ZipNN has no equivalent.
Slower to compress than ZipNN's compiled-C core; decompress is competitive.
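
What "the byte structure of float tensors" buys can be seen with a generic byte-plane split. The sketch below is not z4ai's pipeline: it uses zlib as a stand-in coder, synthetic Gaussian weights, and bf16 values produced by truncating fp32 (an assumption made for simplicity). It separates the high byte (sign plus most of the exponent) from the low byte (mostly mantissa) and compresses each plane on its own.

```python
import random
import struct
import zlib


def to_bf16_bytes(values):
    """Truncate each value's fp32 encoding to its top 16 bits (little-endian bf16)."""
    out = bytearray()
    for v in values:
        out += struct.pack("<f", v)[2:4]
    return bytes(out)


def split_planes(buf: bytes):
    """Separate low bytes (mostly mantissa) from high bytes (sign + exponent)."""
    return buf[0::2], buf[1::2]


def merge_planes(low: bytes, high: bytes) -> bytes:
    """Inverse of split_planes: re-interleave the two byte planes."""
    out = bytearray(len(low) + len(high))
    out[0::2] = low
    out[1::2] = high
    return bytes(out)


rng = random.Random(0)
weights = to_bf16_bytes(rng.gauss(0.0, 0.02) for _ in range(50_000))

low, high = split_planes(weights)
assert merge_planes(low, high) == weights  # the split is lossless

interleaved_size = len(zlib.compress(weights, 9))
planar_size = len(zlib.compress(low, 9)) + len(zlib.compress(high, 9))
print(f"interleaved: {interleaved_size} bytes, planar: {planar_size} bytes")
```

The high-byte plane takes only a handful of distinct values and compresses well; the low-byte plane is close to random, which is where the ceiling discussed next comes from.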

Honest ceiling. On a dense checkpoint a trained float's mantissa is near-random and its exponent carries only ~2.6 bits of entropy, capping any lossless codec at ~1.5x (bf16) / ~1.2x (fp32). ZipNN already hits that wall, so z4ai can't meaningfully out-ratio it there - it wins by a hair via order-1 rANS on the exponent. The large wins come from redundancy the entropy bound assumes away: reduced precision, sparsity, structure, and cross-checkpoint deltas.
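
The ceiling figures follow from simple bit counting. The sketch below reproduces them under two stated assumptions: the sign bit costs a full bit, and every mantissa bit is incompressible. `lossless_ceiling` is a hypothetical helper written for this illustration, not part of z4ai.

```python
def lossless_ceiling(total_bits: int, mantissa_bits: int,
                     exponent_entropy: float = 2.6) -> float:
    """Best-case ratio when only the exponent field is compressible."""
    # 1 sign bit (assumed incompressible) + exponent entropy + raw mantissa bits
    coded_bits = 1 + exponent_entropy + mantissa_bits
    return total_bits / coded_bits


bf16_ceiling = lossless_ceiling(16, 7)    # 16 / 10.6
fp32_ceiling = lossless_ceiling(32, 23)   # 32 / 26.6
print(f"bf16 ~{bf16_ceiling:.2f}x, fp32 ~{fp32_ceiling:.2f}x")
# prints: bf16 ~1.51x, fp32 ~1.20x
```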

All numbers below are measured on this repo and reproducible with one command.

Benchmarks vs ZipNN

Machine: 16 cores | Python 3.14 | zstandard 0.25.0 | zipnn (latest) | 32 MB per dtype | best-of-3 timing. Every codec is verified byte-exact (lossless).

Compression ratio (higher is better)

Scenario	dtype	z4ai	ZipNN	zstd‑3	z4ai vs ZipNN
Dense / i.i.d. weights	bf16	1.413	1.417	1.227	-0.3% (tie)
Dense / i.i.d. weights	fp32	1.171	1.172	1.061	-0.1% (tie)
Structured (repeated/duplicated)	bf16	58.1	1.51	16.97	+3750%
Structured (repeated/duplicated)	fp32	47.3	1.20	14.24	+3831%
Sparse (50% zeros)	bf16	2.47	2.20	1.88	+12.5%
Sparse (50% zeros)	fp32	2.21	1.86	1.79	+18.9%
Quantized INT8 (dequantised to…)	bf16	2.39	2.07	1.79	+15.6%
Quantized INT8 (dequantised to…)	fp32	4.72	1.94	3.07	+143%
Quantized INT4 (dequantised to…)	bf16	5.41	3.87	3.91	+39.9%
Quantized INT4 (dequantised to…)	fp32	10.77	4.59	5.13	+135%

Quantized rows: per-tensor INT4/INT8 weights dequantised back into a wide float container (the format most quantized models ship in), same 32 MB/dtype config as above. z4ai auto-selects the lossless palette transform; several ZipNN entries here are not byte-exact on the tested build. Reproduce with `python benchmarks/bench_palette.py` (which reports a stronger zstd-19 baseline, so z4ai's margin there is conservative).
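
The idea behind a palette transform can be sketched in a few lines. This is an illustration under assumptions, not z4ai's on-disk format: `palette_encode` and `palette_decode` are hypothetical names, zlib stands in for the real coder, and the input is synthetic INT8 data dequantised to fp32 with a single per-tensor scale. Because such a tensor holds at most 256 distinct fp32 bit patterns, each 4-byte element can be replaced by a 1-byte index into a small table.

```python
import random
import struct
import zlib


def palette_encode(buf: bytes, width: int = 4):
    """Return (palette, indices), or None if there are more than 256 distinct elements."""
    if len(buf) % width:
        return None
    palette = {}
    indices = bytearray()
    for i in range(0, len(buf), width):
        item = buf[i:i + width]
        idx = palette.get(item)
        if idx is None:
            if len(palette) == 256:
                return None  # too many distinct values: palette does not apply
            idx = palette[item] = len(palette)
        indices.append(idx)
    # dict keys keep insertion order, so entry k of the table has index k
    return b"".join(palette), bytes(indices)


def palette_decode(palette: bytes, indices: bytes, width: int = 4) -> bytes:
    table = [palette[i:i + width] for i in range(0, len(palette), width)]
    return b"".join(table[i] for i in indices)


# Synthetic INT8 weights dequantised into fp32 with one per-tensor scale.
rng = random.Random(0)
scale = 0.0123
q = [rng.randint(-128, 127) for _ in range(50_000)]
wide = struct.pack(f"<{len(q)}f", *(scale * v for v in q))

palette, indices = palette_encode(wide)
assert palette_decode(palette, indices) == wide  # lossless

direct = len(zlib.compress(wide, 9))
via_palette = len(palette) + len(zlib.compress(indices, 9))
print(f"direct: {direct} bytes, via palette: {via_palette} bytes")
```

Returning `None` when the element count exceeds the table size keeps the transform optional, so a caller can fall back to another path for tensors that are not actually quantized.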

Real & production workloads

Workload	z4ai	ZipNN	Note
Real checkpoint - bert-tiny, 17.7 MB fp32 (downloaded)	1.188	1.202	-1.2% - a single dense checkpoint ~ i.i.d., so a small loss. The win is on redundancy, not dense noise.
Production .safetensors - 201 MB BF16 with a tied embedding	1.525	1.510	+1.0% vs per-tensor ZipNN - z4ai dedups the tied `embed_tokens`/`lm_head` that ZipNN's 256 KiB chunking can't.
Realistic full checkpoint - 107 MB BF16 (tied embeddings, shared blocks, optimizer state, 50% pruned layer)	2.93	1.67	+75.7% - z4ai's whole-buffer LZ dedups the structure real checkpoints carry; ZipNN's chunked Huffman cannot see across chunks.
Checkpoint delta - bert-tiny BF16, 5% of weights changed	51.1	~1.7	30x smaller than from-scratch. `compress_delta` stores only what changed (1% → 184x; 20% → 18x). ZipNN has no delta mode.
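
Why a window- or chunk-limited codec misses a tied tensor can be shown with a toy comparison. The text above describes whole-buffer LZ matching; the sketch below swaps in a simpler technique, fixed-size chunk dedup by hash, which only catches chunk-aligned duplicates but makes the same point. All names are hypothetical, and the "checkpoint" is random bytes with one 512 KiB block repeated far apart.

```python
import hashlib
import random
import zlib

CHUNK = 64 * 1024


def dedup_chunks(buf: bytes, chunk: int = CHUNK):
    """Split into fixed-size chunks; keep each distinct chunk once plus an index list."""
    store, order, seen = [], [], {}
    for i in range(0, len(buf), chunk):
        piece = buf[i:i + chunk]
        key = hashlib.sha256(piece).digest()
        if key not in seen:
            seen[key] = len(store)
            store.append(piece)
        order.append(seen[key])
    return store, order


def rebuild(store, order) -> bytes:
    return b"".join(store[i] for i in order)


# A toy "tied embedding": the same 512 KiB tensor appears twice, far apart.
rng = random.Random(0)
embed = rng.randbytes(512 * 1024)
other = rng.randbytes(512 * 1024)
checkpoint = embed + other + embed

store, order = dedup_chunks(checkpoint)
assert rebuild(store, order) == checkpoint  # lossless

windowed = len(zlib.compress(checkpoint, 9))  # DEFLATE only looks back 32 KiB
deduped = sum(len(p) for p in store)
print(f"raw: {len(checkpoint)}, windowed: {windowed}, deduped: {deduped}")
```

The second copy sits 1 MiB after the first, far outside DEFLATE's 32 KiB window, so zlib stores it again in full while the dedup pass stores it once.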

Reproduce

python benchmarks/benchmark.py --mb 32 --dtypes bf16 fp32 --scenario iid
python benchmarks/benchmark.py --mb 32 --dtypes bf16 fp32 --scenario structured
python benchmarks/benchmark.py --mb 32 --dtypes bf16 fp32 --scenario sparse
python benchmarks/bench_real_checkpoint.py        # downloads a real .bin checkpoint
python benchmarks/bench_safetensors.py --layers 8 --d 1024
python benchmarks/checkpoint_bench.py --mb 96     # realistic structured checkpoint

Throughput

Codec (i.i.d. bf16)	compress (MB/s)	decompress (MB/s)
z4ai	1420	16700
ZipNN	8125	20020

z4ai compresses ~6x slower and decompresses ~1.2x slower than ZipNN's compiled-C core - the deliberate trade for a write-once, read-many artifact. A fused multithreaded native codec (z4ai.chunked) and effort="fast"/"max" tiers trade decode latency against file size.
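
A best-of-3 throughput measurement of the kind quoted above can be sketched as follows. This is not the repo's benchmark script: zlib stands in for the codec under test, `best_of` and `throughput_mb_s` are hypothetical helpers, and MB is taken to mean 10^6 bytes, which is an assumption about the table's units.

```python
import time
import zlib


def best_of(fn, arg, repeats: int = 3) -> float:
    """Return the fastest wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best


def throughput_mb_s(fn, arg, nbytes: int, repeats: int = 3) -> float:
    """Throughput in MB/s, always measured against the uncompressed size."""
    return nbytes / best_of(fn, arg, repeats) / 1e6


payload = bytes(range(256)) * 4096  # 1 MiB of sample data
blob = zlib.compress(payload)

compress_rate = throughput_mb_s(zlib.compress, payload, len(payload))
decompress_rate = throughput_mb_s(zlib.decompress, blob, len(payload))
print(f"compress   {compress_rate:.0f} MB/s")
print(f"decompress {decompress_rate:.0f} MB/s")
```

Taking the minimum rather than the mean filters out scheduler noise, and dividing by the uncompressed size in both directions keeps compress and decompress rates comparable.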

Documentation

Full docs - quickstart, usage, how it works, CLI, and the API reference - live at z4ai.github.io/z4ai.

Page	What's there
Quickstart	Compress a buffer, an ndarray, or a `.safetensors` file in a few lines.
Usage	Effort tiers, sparse/quantized weights, checkpoint & model deltas, per-tensor random access, the high-throughput native path.
How it works	Field decorrelation, whole-tensor matching, the best-of selector, and where the codec pays off.
CLI	`z4ai compress / decompress / info`, pipe-friendly.
Background & references	Prior art (ZipNN, DFloat11, NeuZip, ZipLLM, fpzip, rANS/FSE ...) and the honest entropy-ceiling framing.
API reference	Every public function, generated from the source.

Release history

0.2.0 - Jun 9, 2026
0.1.0 - Jun 9, 2026 (this version)

Download files

Source Distribution

z4ai-0.1.0.tar.gz (159.1 kB)

Uploaded Jun 9, 2026 Source

File details

Details for the file z4ai-0.1.0.tar.gz.

File metadata

Download URL: z4ai-0.1.0.tar.gz
Upload date: Jun 9, 2026
Size: 159.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for z4ai-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`003e49d236a54b56a7c71f23758eac9a2808f68743e2da29ff3f5657830020d2`
MD5	`2e9a923591e52334cad57e5866c4fde8`
BLAKE2b-256	`ce757133e7621d3d1c7d41640ecc56e65e37a239d981a22007238e8f0ab576de`
