
Lossless compression codec tuned for neural-network model weights


z4ai


A lossless storage-and-distribution layer for AI model checkpoints.

Keep model checkpoints small for storage and transfer - bit-for-bit reversible, with per-tensor random access. Most useful on collections of related checkpoints (training runs, fine-tune families, model registries), and in environments the Hugging Face Hub's Xet backend doesn't cover - self-hosted registries, internal MLOps, plain object storage.


Documentation - quickstart, full usage, how it works, and the API reference.

import z4ai

blob = z4ai.compress(weights_bytes, dtype="bf16")   # smaller, self-describing
data = z4ai.decompress(blob)                          # byte-identical original
assert data == weights_bytes

Its strongest case is a sequence of related checkpoints - consecutive ones are ~95-99% identical, so each is stored as a tiny delta from the one before:

# store checkpoint N as the bit-exact delta from checkpoint N-1
delta    = z4ai.compress_delta(step_2000, reference=step_1000, dtype="bf16")
restored = z4ai.decompress_delta(delta, reference=step_1000)   # exact == step_2000

# only the changed bytes cost anything - often 10-100x smaller than a full compress
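A toy stand-in shows why a delta between mostly-identical checkpoints is so small. This sketch is not z4ai's compress_delta. It assumes a simple bytewise XOR against the reference followed by zlib, and the helper names (xor_delta, delta_pack, delta_unpack) are hypothetical. Bytes that did not change XOR to zero, and a general-purpose compressor squeezes those out.

```python
import zlib

import numpy as np


def xor_delta(new: bytes, ref: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers; zero wherever they agree."""
    if len(new) != len(ref):
        raise ValueError("this sketch assumes equal-length checkpoints")
    a = np.frombuffer(new, dtype=np.uint8)
    b = np.frombuffer(ref, dtype=np.uint8)
    return np.bitwise_xor(a, b).tobytes()


def delta_pack(new: bytes, ref: bytes) -> bytes:
    # Hypothetical stand-in for a delta codec: XOR, then a generic compressor.
    return zlib.compress(xor_delta(new, ref), 6)


def delta_unpack(blob: bytes, ref: bytes) -> bytes:
    # XOR is its own inverse, so XOR-ing the reference back in restores `new`.
    return xor_delta(zlib.decompress(blob), ref)


# Toy checkpoints: 1M random 16-bit words, 5% of them changed between steps.
rng = np.random.default_rng(0)
step_1000 = rng.integers(0, 2**16, size=1_000_000, dtype=np.uint16)
step_2000 = step_1000.copy()
changed = rng.choice(step_2000.size, size=step_2000.size // 20, replace=False)
step_2000[changed] ^= rng.integers(1, 2**16, size=changed.size, dtype=np.uint16)

ref, new = step_1000.tobytes(), step_2000.tobytes()
delta = delta_pack(new, ref)
full = zlib.compress(new, 6)
assert delta_unpack(delta, ref) == new  # bit-exact round trip
print(f"full: {len(full):,} bytes  delta: {len(delta):,} bytes")
```

The 5% change rate mirrors the bert-tiny delta benchmark further down. The data here is random, so the absolute sizes say nothing about real weights. Only the gap between the two sizes is the point.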

Install

pip install z4ai

Requires Python >= 3.9. Pure Python (NumPy + zstandard); the native entropy core is optional, and z4ai falls back to the pure-Python path without it. The full installation guide (optional extras, native acceleration) is in the docs.

TL;DR

z4ai is a codec built for the byte structure of float tensors. Compared with ZipNN (the closest weights-specific codec):

  • Ties on synthetic dense (i.i.d.) weights. On real models, results range from a small loss (bert-tiny, -1.2%) to a gain (distilgpt2 +1.6-8.2%, pythia-70m +11-29%), credited to an order-1 context model for the exponent.
  • Wins big on repeated structure - tied embeddings, duplicated layers, multi-shard concatenations - which z4ai dedups across the whole tensor.
  • 2.3-3.0x, automatically, on fp32 files whose values originated in reduced precision (fp16/bf16 values stored as fp32).
  • 2.4-10.8x on quantized weights shipped in a wide container (INT4/INT8/FP8 from GPTQ / AWQ / compressed-tensors, dequantized into bf16/fp16/fp32), via an automatic lossless palette transform. This is a common format for deployed models, and z4ai beats ZipNN on every case here (e.g. INT8-in-fp32 4.72x vs 1.94x; INT4-in-fp32 10.8x vs 4.6x).
  • Gains on sparse / pruned weights (+12.5% bf16 and +18.9% fp32 over ZipNN at 50% zeros), and 10-180x on checkpoint sequences via the lossless compress_delta mode, which has no ZipNN equivalent.
  • Slower to compress than ZipNN's compiled-C core (~6x); decompression is competitive (~1.2x slower).
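The reduced-precision case in the list above has a simple mechanical cause. An fp32 value that started life as bf16 has its low 16 mantissa bits all zero, and an fp16-origin value has its low 13 mantissa bits zero. The sketch below is a hypothetical illustration, not z4ai's detector. It counts the always-zero low bits and packs bf16-origin fp32 data into 16 bits losslessly. The helper names are made up, and the bf16 data is simulated by truncation.

```python
import numpy as np

MANTISSA_MASK = 0x007FFFFF  # the 23 explicit mantissa bits of an IEEE-754 float32


def zero_low_mantissa_bits(x: np.ndarray) -> int:
    """Number of low mantissa bits that are zero in every element of a float32 array."""
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    seen = int(np.bitwise_or.reduce(bits, axis=None)) & MANTISSA_MASK  # OR of all elements
    if seen == 0:
        return 23
    return (seen & -seen).bit_length() - 1  # position of the lowest set bit


def pack_bf16_origin(x: np.ndarray) -> np.ndarray:
    """Keep only the top 16 bits of each float32 (lossless when the low 16 are zero)."""
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    if np.any(bits & np.uint32(0xFFFF)):
        raise ValueError("low 16 bits are not all zero; packing would be lossy")
    return (bits >> np.uint32(16)).astype(np.uint16)


def unpack_bf16_origin(hi: np.ndarray) -> np.ndarray:
    # Shift the stored high half back into place; the low half is zero by construction.
    return (hi.astype(np.uint32) << np.uint32(16)).view(np.float32)


rng = np.random.default_rng(0)
w = rng.standard_normal(10_000).astype(np.float32)                   # full-precision fp32
w16 = w.astype(np.float16).astype(np.float32)                        # fp16-origin values
wbf = (w.view(np.uint32) & np.uint32(0xFFFF0000)).view(np.float32)   # bf16-origin (truncated)
print(zero_low_mantissa_bits(w), zero_low_mantissa_bits(w16), zero_low_mantissa_bits(wbf))

packed = pack_bf16_origin(wbf)
assert np.array_equal(unpack_bf16_origin(packed).view(np.uint32), wbf.view(np.uint32))
```

Dropping 16 provably-zero bits halves the bf16-origin data before any entropy coding. Comparing bit patterns (views) rather than float values keeps the check exact for -0.0 and NaN.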

Honest ceiling. On a dense checkpoint, a trained float's mantissa is near-random and its exponent carries only ~2.6 bits of entropy, which caps any lossless codec at ~1.5x (bf16) / ~1.2x (fp32). ZipNN already hits that wall, so z4ai can't meaningfully out-ratio it there; at best it wins by a hair on real checkpoints via order-1 rANS on the exponent. The large wins come from redundancy that the entropy bound assumes away: reduced precision, sparsity, structure, and cross-checkpoint deltas.
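The ceiling arithmetic can be checked directly. If sign and mantissa bits are incompressible and only the 8 exponent bits shrink to about H bits, the best possible ratio is total_bits / (total_bits - 8 + H). The sketch below computes that bound. It also measures order-0 entropy and order-1 entropy (using the previous exponent as context) of the exponent field. It runs on synthetic Gaussian fp32 data, which is an assumption standing in for a trained checkpoint. The function names are hypothetical.

```python
import numpy as np


def ceiling(total_bits: int, exponent_entropy: float, exponent_bits: int = 8) -> float:
    """Best lossless ratio when only the exponent field is compressible."""
    return total_bits / (total_bits - exponent_bits + exponent_entropy)


def entropy_bits(counts: np.ndarray) -> float:
    """Shannon entropy (bits) of the empirical distribution given by counts."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def exponent_field(x: np.ndarray) -> np.ndarray:
    """The 8 biased exponent bits of each float32, as int64 symbols 0..255."""
    bits = np.ascontiguousarray(x, dtype=np.float32).ravel().view(np.uint32)
    return ((bits >> np.uint32(23)) & np.uint32(0xFF)).astype(np.int64)


def order0_entropy(e: np.ndarray) -> float:
    return entropy_bits(np.bincount(e, minlength=256))


def order1_entropy(e: np.ndarray) -> float:
    """H(E_i | E_{i-1}) = H(E_{i-1}, E_i) - H(E_{i-1}): previous exponent as context."""
    joint = np.bincount(e[:-1] * 256 + e[1:], minlength=256 * 256)
    return entropy_bits(joint) - entropy_bits(np.bincount(e[:-1], minlength=256))


print(round(ceiling(16, 2.6), 2))  # bf16 -> 1.51
print(round(ceiling(32, 2.6), 2))  # fp32 -> 1.2

w = np.random.default_rng(0).standard_normal(1_000_000).astype(np.float32)
e = exponent_field(w)
print(f"order-0: {order0_entropy(e):.2f} bits, order-1: {order1_entropy(e):.2f} bits")
```

For Gaussian data the order-0 figure comes out close to the ~2.6 bits quoted above. Because the samples are i.i.d., the order-1 figure matches it, consistent with the tie in the i.i.d. rows below. An order-1 context can only pay off when neighbouring exponents are correlated.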

All numbers below were measured with the benchmarks in this repo, and each is reproducible with one command.

Benchmarks vs ZipNN

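Machine: 16 cores | Python 3.14 | zstandard 0.25.0 | zipnn (latest) | 32 MB per dtype | best-of-3 timing. Every codec's output is round-tripped and checked for byte-exactness (lossless); the note under the quantized rows lists where ZipNN's output was not byte-exact.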

Compression ratio (higher is better)

| Scenario | dtype | z4ai | ZipNN | zstd-3 | z4ai vs ZipNN |
|---|---|---|---|---|---|
| Dense / i.i.d. weights | bf16 | 1.413 | 1.417 | 1.227 | -0.3% (tie) |
| Dense / i.i.d. weights | fp32 | 1.171 | 1.172 | 1.061 | -0.1% (tie) |
| Structured (repeated/duplicated) | bf16 | 58.1 | 1.51 | 16.97 | +3750% |
| Structured (repeated/duplicated) | fp32 | 47.3 | 1.20 | 14.24 | +3831% |
| Sparse (50% zeros) | bf16 | 2.47 | 2.20 | 1.88 | +12.5% |
| Sparse (50% zeros) | fp32 | 2.21 | 1.86 | 1.79 | +18.9% |
| Quantized INT8 (dequantized to…) | bf16 | 2.39 | 2.07 | 1.79 | +15.6% |
| Quantized INT8 (dequantized to…) | fp32 | 4.72 | 1.94 | 3.07 | +143% |
| Quantized INT4 (dequantized to…) | bf16 | 5.41 | 3.87 | 3.91 | +39.9% |
| Quantized INT4 (dequantized to…) | fp32 | 10.77 | 4.59 | 5.13 | +135% |

Quantized rows: per-tensor INT4/INT8 weights dequantized back into a wide float container (the format most quantized models ship in), using the same 32 MB/dtype configuration as above. z4ai auto-selects the lossless palette transform. Several ZipNN entries here are not byte-exact on the tested build. Reproduce with python benchmarks/bench_palette.py; that script reports a stronger zstd-19 baseline instead of zstd-3, so the z4ai-vs-zstd margin it reports is conservative.
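A palette transform exploits the fact that per-tensor INT8 weights dequantized into fp32 can take at most 256 distinct values. The sketch below is a hypothetical illustration of the idea, not z4ai's implementation. It maps each float32 bit pattern to a uint8 index plus a small lookup table. The transform is lossless because it works on raw bits, so -0.0 and NaN payloads survive. The quantization scale is an arbitrary assumption.

```python
import numpy as np


def palette_encode(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a float32 array into (palette of uint32 bit patterns, uint8 indices)."""
    if x.dtype != np.float32:
        raise TypeError("this sketch assumes float32 input")
    bits = np.ascontiguousarray(x).ravel().view(np.uint32)
    palette, idx = np.unique(bits, return_inverse=True)
    if palette.size > 256:
        raise ValueError("more than 256 distinct values; a uint8 palette cannot hold them")
    return palette, idx.astype(np.uint8).reshape(x.shape)


def palette_decode(palette: np.ndarray, idx: np.ndarray) -> np.ndarray:
    # Look up each index's bit pattern and reinterpret it as float32.
    return palette[idx].view(np.float32)


# Simulated per-tensor INT8 quantization, dequantized into fp32 as q * scale.
rng = np.random.default_rng(0)
q = rng.integers(-128, 128, size=(512, 512), dtype=np.int8)
w = q.astype(np.float32) * np.float32(0.0123)

palette, idx = palette_encode(w)
restored = palette_decode(palette, idx)
assert np.array_equal(restored.view(np.uint32), w.view(np.uint32))  # bit-exact
print(f"{w.nbytes:,} bytes -> {idx.nbytes + palette.nbytes:,} bytes before entropy coding")
```

The uniform random codes used here leave nothing further to gain. Real INT8 weights tend to cluster near zero, so an entropy coder applied to the indices afterwards can shrink them further.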

Real & production workloads

| Workload | z4ai | ZipNN | Note |
|---|---|---|---|
| Real checkpoint: bert-tiny, 17.7 MB fp32 (downloaded) | 1.188 | 1.202 | -1.2%. A single dense checkpoint is close to i.i.d., so a small loss. The win is on redundancy, not dense noise. |
| Production .safetensors: 201 MB BF16 with a tied embedding | 1.525 | 1.510 | +1.0% vs per-tensor ZipNN. z4ai dedups the tied embed_tokens/lm_head, which ZipNN's 256 KiB chunking can't detect. |
| Realistic full checkpoint: 107 MB BF16 (tied embeddings, shared blocks, optimizer state, 50% pruned layer) | 2.93 | 1.67 | +75.7%. z4ai's whole-buffer LZ dedups the structure real checkpoints carry; ZipNN's chunked Huffman cannot see across chunks. |
| Checkpoint delta: bert-tiny BF16, 5% of weights changed | 51.1 | ~1.7 | 30x smaller than compressing from scratch. compress_delta stores only what changed (1% changed → 184x; 20% → 18x). ZipNN has no delta mode. |
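The tied-embedding rows above depend on noticing that the same bytes occur twice, far apart. The sketch below uses fixed-block SHA-256 dedup on a toy checkpoint whose embedding and LM head are identical. This is a simpler stand-in for z4ai's whole-buffer LZ matching, because it only catches block-aligned repeats. The block size and helper names are hypothetical.

```python
import hashlib

import numpy as np

BLOCK = 64 * 1024  # hypothetical dedup granularity


def dedup(buf: bytes, block: int = BLOCK) -> tuple[dict, list]:
    """Store each distinct block once, plus a recipe of digests that rebuilds buf."""
    store: dict[bytes, bytes] = {}
    recipe: list[bytes] = []
    for off in range(0, len(buf), block):
        chunk = buf[off:off + block]
        key = hashlib.sha256(chunk).digest()
        store.setdefault(key, chunk)  # a repeated block costs only its digest
        recipe.append(key)
    return store, recipe


def rebuild(store: dict, recipe: list) -> bytes:
    return b"".join(store[k] for k in recipe)


# Toy checkpoint: embedding, other layers, then a tied lm_head (a copy of the embedding).
rng = np.random.default_rng(0)
embed = rng.integers(0, 256, size=2**20, dtype=np.uint8).tobytes()
layers = rng.integers(0, 256, size=2**20, dtype=np.uint8).tobytes()
ckpt = embed + layers + embed

store, recipe = dedup(ckpt)
stored = sum(len(c) for c in store.values())
assert rebuild(store, recipe) == ckpt
print(f"{len(ckpt):,} bytes -> {stored:,} unique bytes")
```

A compressor that handles each 256 KiB chunk independently never sees the second copy as a repeat, so it pays for the same bytes twice. This fixed-block version would also miss a repeat starting at an offset that is not a multiple of the block size. LZ-style matching finds repeats at any offset.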
Reproduce
python benchmarks/benchmark.py --mb 32 --dtypes bf16 fp32 --scenario iid
python benchmarks/benchmark.py --mb 32 --dtypes bf16 fp32 --scenario structured
python benchmarks/benchmark.py --mb 32 --dtypes bf16 fp32 --scenario sparse
python benchmarks/bench_real_checkpoint.py        # downloads a real .bin checkpoint
python benchmarks/bench_safetensors.py --layers 8 --d 1024
python benchmarks/checkpoint_bench.py --mb 96     # realistic structured checkpoint

Throughput

| Codec | Compress (MB/s) | Decompress (MB/s) |
|---|---|---|
| z4ai (i.i.d. bf16) | 1420 | 16700 |
| ZipNN | 8125 | 20020 |

z4ai compresses ~6x slower and decompresses ~1.2x slower than ZipNN's compiled-C core. That is a deliberate trade for a write-once, read-many artifact, where the compression cost is paid once. A fused multithreaded native codec (z4ai.chunked) and the effort="fast"/"max" tiers trade decode latency against file size.
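The ratio and throughput figures come from best-of-3 timing with a byte-exact round-trip check. A minimal harness in that spirit is sketched below. It is an assumption about the method, not the repo's benchmarks/benchmark.py. It uses stdlib zlib as a stand-in codec so it runs without z4ai or ZipNN installed. MB is taken to mean 10^6 bytes, which is also an assumption.

```python
import time
import zlib
from typing import Callable


def best_of(repeats: int, fn: Callable[[bytes], bytes], arg: bytes) -> tuple[float, bytes]:
    """Run fn(arg) `repeats` times; return the fastest wall time and the last output."""
    best, out = float("inf"), b""
    for _ in range(repeats):
        t0 = time.perf_counter()
        out = fn(arg)
        best = min(best, time.perf_counter() - t0)
    return best, out


def measure(compress, decompress, data: bytes, repeats: int = 3) -> dict:
    """Ratio and MB/s for one codec, refusing results that are not lossless."""
    t_c, blob = best_of(repeats, compress, data)
    t_d, restored = best_of(repeats, decompress, blob)
    if restored != data:
        raise AssertionError("round trip is not byte-exact")
    mb = len(data) / 1e6
    return {
        "ratio": len(data) / len(blob),
        "compress_MBps": mb / max(t_c, 1e-9),
        "decompress_MBps": mb / max(t_d, 1e-9),
    }


data = bytes(range(256)) * 4096  # 1 MiB of highly repetitive bytes
print(measure(zlib.compress, zlib.decompress, data))
```

Applied to the table above, the same arithmetic gives 8125 / 1420 ≈ 5.7 for compression and 20020 / 16700 ≈ 1.2 for decompression, which are the "~6x" and "~1.2x" quoted in the paragraph.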

Documentation

Full docs - quickstart, usage, how it works, CLI, and the API reference - live at z4ai.github.io/z4ai.

| Page | What's there |
|---|---|
| Quickstart | Compress a buffer, an ndarray, or a .safetensors file in a few lines. |
| Usage | Effort tiers, sparse/quantized weights, checkpoint & model deltas, per-tensor random access, the high-throughput native path. |
| How it works | Field decorrelation, whole-tensor matching, the best-of selector, and where the codec pays off. |
| CLI | z4ai compress / decompress / info, pipe-friendly. |
| Background & references | Prior art (ZipNN, DFloat11, NeuZip, ZipLLM, fpzip, rANS/FSE ...) and the honest entropy-ceiling framing. |
| API reference | Every public function, generated from the source. |



Download files


Source Distribution

z4ai-0.1.0.tar.gz (159.1 kB)


File details

Details for the file z4ai-0.1.0.tar.gz.

File metadata

  • Download URL: z4ai-0.1.0.tar.gz
  • Upload date:
  • Size: 159.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for z4ai-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 003e49d236a54b56a7c71f23758eac9a2808f68743e2da29ff3f5657830020d2 |
| MD5 | 2e9a923591e52334cad57e5866c4fde8 |
| BLAKE2b-256 | ce757133e7621d3d1c7d41640ecc56e65e37a239d981a22007238e8f0ab576de |
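Before you install the sdist from a mirror or cache, you can compare its SHA256 against the digest listed above. The sketch below does this with Python's standard hashlib. The helper names are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO

# SHA256 digest of z4ai-0.1.0.tar.gz, copied from the hash table above.
EXPECTED_SHA256 = "003e49d236a54b56a7c71f23758eac9a2808f68743e2da29ff3f5657830020d2"


def sha256_of_stream(f: BinaryIO, bufsize: int = 1 << 20) -> str:
    """Hash a binary stream in fixed-size chunks so large files never sit in memory."""
    h = hashlib.sha256()
    for chunk in iter(lambda: f.read(bufsize), b""):
        h.update(chunk)
    return h.hexdigest()


def verify_sdist(path: str, expected: str = EXPECTED_SHA256) -> bool:
    with open(path, "rb") as f:
        return sha256_of_stream(f) == expected.lower()


# Sanity check against the standard SHA-256 test vector for b"abc".
assert sha256_of_stream(io.BytesIO(b"abc")) == (
    "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
)
```

pip can enforce the same check in hash-checking mode. Put a requirements line such as `z4ai==0.1.0 --hash=sha256:003e49d236a54b56a7c71f23758eac9a2808f68743e2da29ff3f5657830020d2` in requirements.txt and install with `pip install --require-hashes -r requirements.txt`.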

