DMX — Delta Multiplexed Model Format. Near-lossless neural network weight compression.

DMX — Delta Multiplexed Model Format

A new compression format for neural network weights.

Original:  9.1 GB  (SVD-XT, FP32 — 80% includes FP32→FP16 conversion)
DMX:       1.8 GB

Original:  7.2 GB  (Wan 2.2 14B shard, FP32 — 79.5% includes FP32→FP16)
DMX:       1.5 GB  (142/142 tensors verified)

Original:  16 GB   (Llama 3 8B, FP16 — 55% pure FP16 compression)
DMX:       ~7.2 GB (+0.16% perplexity on wikitext-2)

Try it now

pip install dmx-compress
dmx compress your_model.safetensors compressed.dmx

Download pre-compressed models now

Model	Original	DMX	Savings	Verified
Wan 1.3B	2.7 GB	1.1 GB	60%	825/825 tensors
Wan 2.2 shard	7.2 GB	1.5 GB	79.5%	142/142 tensors
SVD-XT	9.1 GB	1.8 GB	80%	Roundtrip verified

Why this matters for frontier training

From first principles: In high-precision training, a full checkpoint is effectively a near-complete copy of the model state — weights (BF16/FP32) plus optimizer states (often another 2-4x the size). Each save is massive. Teams are forced to make checkpoints sparse (every few thousand steps) to keep storage and I/O under control. This is not a bug in the tools — it is a direct consequence of the numbers involved.

The operational reality: Frontier training runs are now routinely $50M-$200M+ each. While accelerators dominate the budget, checkpoint storage, bandwidth, and recovery time are real, recurring costs. Infra teams track these "quiet" expenses closely.

Where DMX changes the equation: DMX stores one high-precision baseline anchor plus many small, exact integer deltas with zero error accumulation (see Test 8 below). 200 weights-only checkpoints of a Llama 70B-class model are projected to drop from ~28 TB raw to ~3.6 TB while remaining safe for resumption, branching, and analysis.

Aspect	Current Status Quo	With DMX Delta Chains	Benefit
Checkpoint frequency	Sparse (forced by cost)	Dense and safe	Better science and debugging
Storage for 200 ckpts (70B, weights only)	~28 TB	~3.6 TB (projected)	~8x reduction
Resumption fidelity	Full copy required	Exact integer chain	Zero accumulation error (measured)
Fine-tune distribution	Full copy per variant	Small delta per variant	80% savings (measured on TinyLlama 1.1B)

This doesn't reduce the dominant cost (compute), but it meaningfully lowers a real operational friction point that every large lab deals with. It gives researchers and engineers far more usable history than was previously practical.

What is DMX?

DMX is a near-lossless post-training compression format for neural network weights, optimized for storage and distribution. In the benchmarks below it reduces model file sizes by 55-80% at a measured perplexity cost of +0.03% to +0.22%, depending on mode and model.

No retraining required: compress any pretrained safetensors model
Reversible format: decompress back to safetensors (values are near-lossless, not bit-exact)
Broad compatibility: tested on LLMs, diffusion models, video models, and encoder-decoder models

Storage and transfer comparison

DMX is focused on reducing model size for storage and network transfer — not runtime inference. Here's how a 140 GB model (Llama 3 70B, FP16) compares across compression approaches:

Method	Compressed Size	Savings	Quality Loss	Purpose
safetensors	140 GB	0%	None	Original format
gzip	~134 GB	~4%	None	Generic compression (barely helps on floats)
zstd-19	~129 GB	~8%	None	Better generic compression (still limited)
DFloat11	~98 GB	~30%	None (lossless)	Lossless NN weight compression
ZipNN	~94 GB	~33%	None (lossless)	Lossless NN weight compression
DMX M=7	~63 GB	~55%	+0.03% PPL	Near-original quality, high compression
DMX M=6	~56 GB	~60%	+0.16% PPL	Aggressive storage compression

For reference, quantized inference formats like GGUF Q8 (~50%) and Q4 (~75%) achieve similar or greater compression but are designed for a different purpose — running models directly at reduced precision with fused kernels. DMX and GGUF serve different needs and are not interchangeable.

If lossless is enough, use DFloat11 or ZipNN. If you need to run inference at lower precision, use GGUF. If you need high compression with near-original quality for storage and distribution, that's where DMX lives.
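The weak results for generic compressors in the table above can be reproduced in spirit with a few lines of Python. This is only a sketch: the weights are synthetic (Gaussian, std 0.02, an assumption standing in for real trained weights), and zlib's DEFLATE is used as a stand-in for gzip, so the ratio it prints is not one of the measurements reported here.

```python
import zlib

import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for trained FP16 weights (assumption: Gaussian, std 0.02).
weights = rng.normal(0.0, 0.02, size=1_000_000).astype(np.float16)
raw = weights.tobytes()

# zlib implements DEFLATE, the same algorithm gzip uses.
packed = zlib.compress(raw, level=9)
ratio = len(packed) / len(raw)

print(f"raw: {len(raw)} bytes, deflate: {len(packed)} bytes, ratio: {ratio:.2f}")
```

The low mantissa byte of each value is close to random, which is why a byte-oriented compressor has little to work with.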

Without DMX	With DMX
Llama 3 70B: 140 GB download	~56-63 GB download (DMX M=6 / M=7)
4-5 models on 1 TB	10+ models on 1 TB

Training & DevOps use cases

Beyond individual model compression, DMX's aligned quantization enables delta encoding between related model files — useful for training infrastructure and model distribution at scale.

Use Case	How DMX Helps
Checkpoint storage	Delta-compress consecutive checkpoints (87.3% measured savings on GPT-2; 80% measured on a TinyLlama 1.1B base-to-chat delta). Both near-lossless (int16) and practically lossless (int32, error below FP32 noise floor) modes are available.
Model distribution	Distribute fine-tune variants as small deltas from a shared base model
Crash recovery	Smaller checkpoints = faster reload from storage after GPU failure
Model versioning	Aligned integer space enables meaningful diffs between model versions

Why not just use existing versioning tools?

Every existing ML versioning tool treats model files as opaque blobs:

Tool	Version tracking	Understands weight structure	Delta between versions
Git LFS / DVC	✓	✗	✗ (full copy each version)
HuggingFace Hub	✓	✗	✗ (full copy each version)
W&B / MLflow	✓	✗	✗ (full copy each version)
xdelta (binary diff)	✗	✗	8.5% savings (noise)
DMX	Planned	✓	80-87% savings

The difference: a bit-level diff of two raw float model files looks like noise, because in the IEEE 754 layout two numerically close values can still differ in many mantissa bits. DMX's aligned quantization creates a shared coordinate system where subtraction produces clean, sparse integers, enabling meaningful diffs, efficient deltas, and 80-87% compression between related models.
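A small experiment illustrates the contrast. This is a sketch under stated assumptions, not DMX's implementation: the two "checkpoints" are synthetic, the update magnitude (1e-6) is arbitrary, and the shared int16-style scale is a simplification of whatever the format actually does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic "checkpoints": the second differs from the first by a tiny update.
base = rng.normal(0.0, 0.02, size=100_000).astype(np.float32)
step = base + rng.normal(0.0, 1e-6, size=base.shape).astype(np.float32)

# Raw bit-level diff: XOR the IEEE 754 bit patterns of the two tensors.
xor = base.view(np.uint32) ^ step.view(np.uint32)
raw_zero_fraction = float(np.mean(xor == 0))

# Aligned diff: quantize both tensors on the SAME integer grid, then subtract.
scale = np.float32(np.abs(base).max() / 32767.0)  # shared scale (assumption)
q_base = np.round(base / scale).astype(np.int32)
q_step = np.round(step / scale).astype(np.int32)
delta = q_step - q_base
aligned_zero_fraction = float(np.mean(delta == 0))

print(f"raw XOR zeros:     {raw_zero_fraction:.1%}")
print(f"aligned int zeros: {aligned_zero_fraction:.1%}")
```

Almost every XOR word is nonzero, while most aligned integer deltas are exactly zero and the rest are small, which is the kind of stream an entropy coder handles well.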

These capabilities are under active development. See Research Directions for details and experimental results.

Key Properties

Up to 80% compression on FP32 models (SVD-XT: 9.1 GB -> 1.8 GB, verified roundtrip)
55-74% compression on FP16 models (Llama 3 8B, Mistral 7B, Wan 1.3B; Llama 3 8B measures 55% and Wan 1.3B 60% in the benchmarks below)
55-60% near-lossless compression on FP32 models (GPT-2, Phi-2 — +0.12-0.22% PPL)
GPU-accelerated decompression: up to 13.8x faster than CPU with --gpu flag (measured on Wan 1.3B)
Tested on: LLMs (GPT-2, Llama 3, TinyLlama), diffusion (Wan, SVD-XT), encoder-decoder (T5)
No training required: pure post-training compression, works on any pretrained model

How It Works

BFP Mode (for FP16/BF16 models — recommended)

Standard FP16:  16 bits per weight (5-bit exponent wasted on unused dynamic range)
DMX BFP:        ~7 bits per weight (shared exponent per group + truncated mantissa + entropy coding)

Trained weights cluster in a narrow magnitude range — 74% use only 3 of 31 possible exponents. DMX shares one exponent per group of 32 values, eliminating wasted dynamic range, then entropy-codes the mantissa stream.
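The shared-exponent idea can be sketched in NumPy. Everything here is illustrative: `bfp_roundtrip` and `bits_per_weight` are hypothetical names, entropy coding is omitted, and the bit accounting (1 sign bit, M mantissa bits, an 8-bit exponent shared by 32 values) is an inference that happens to match the bits/weight figures in the quality-per-bit table, not a layout taken from the spec.

```python
import numpy as np

GROUP = 32  # values sharing one exponent, as described above


def bits_per_weight(mantissa_bits: int, exponent_bits: int = 8, group: int = GROUP) -> float:
    """Raw cost before entropy coding: sign + mantissa + amortized shared exponent."""
    return 1 + mantissa_bits + exponent_bits / group


def bfp_roundtrip(x: np.ndarray, mantissa_bits: int, group: int = GROUP) -> np.ndarray:
    """Quantize to block floating point and decode again (x.size must divide by group)."""
    blocks = x.astype(np.float32).reshape(-1, group)
    max_abs = np.abs(blocks).max(axis=1, keepdims=True)
    # frexp gives max_abs = m * 2**exp with m in [0.5, 1), so every |value| < 2**exp.
    _, exp = np.frexp(max_abs)
    step = np.ldexp(np.ones_like(max_abs), exp - mantissa_bits)  # one step size per group
    limit = 2**mantissa_bits - 1
    q = np.clip(np.round(blocks / step), -limit, limit)  # signed integer mantissas
    return (q * step).reshape(x.shape)


rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=4096).astype(np.float32)  # synthetic weights
w_hat = bfp_roundtrip(w, mantissa_bits=6)
rel_err = float(np.linalg.norm(w - w_hat) / np.linalg.norm(w))
print(f"M=6: {bits_per_weight(6)} bits/weight, relative L2 error {rel_err:.4f}")
```

Each group stores one exponent instead of 32, so the per-value cost is dominated by the mantissa width.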

int16 Mode (for FP32 models — near-lossless)

Standard FP32:  32 bits per weight
DMX int16:      ~13 bits per weight (aligned cross-layer quantization + entropy coding)

Integer quantization as a preprocessing step (not a lossy final format) transforms float weights into a representation where entropy coding is effective. Aligned cross-layer quantization enforces a global coordinate system across layers, enabling structured compression.
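A minimal sketch of what "a global coordinate system across layers" could look like, assuming the simplest possible alignment: one scale shared by every layer. The function names and layer names are hypothetical, and the real scheme (see the format specification) may choose scales differently.

```python
import numpy as np

QMAX = 32767  # largest int16 magnitude


def aligned_quantize(layers: dict[str, np.ndarray]) -> tuple[dict[str, np.ndarray], float]:
    """Quantize every layer with ONE global scale, so all integers share a grid."""
    global_max = max(float(np.abs(w).max()) for w in layers.values())
    scale = global_max / QMAX if global_max > 0 else 1.0
    quantized = {
        name: np.round(w.astype(np.float64) / scale).astype(np.int16)
        for name, w in layers.items()
    }
    return quantized, scale


def dequantize(quantized: dict[str, np.ndarray], scale: float) -> dict[str, np.ndarray]:
    """Map the shared-grid integers back to float32 weights."""
    return {name: (q.astype(np.float64) * scale).astype(np.float32) for name, q in quantized.items()}


rng = np.random.default_rng(0)
layers = {  # hypothetical layer names and synthetic weights
    "attn.weight": rng.normal(0.0, 0.02, size=(64, 64)).astype(np.float32),
    "mlp.weight": rng.normal(0.0, 0.05, size=(64, 256)).astype(np.float32),
}
quantized, scale = aligned_quantize(layers)
restored = dequantize(quantized, scale)
worst_error = max(float(np.max(np.abs(restored[n] - layers[n]))) for n in layers)
print(f"scale={scale:.3e}, worst absolute error={worst_error:.3e}")
```

Because every tensor lands on the same grid, integers from different layers (or from two checkpoints quantized with the same scale) can be compared and subtracted directly.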

Installation

pip install dmx-compress

Or from source:

git clone https://github.com/willjriley/dmx.git && cd dmx && pip install -e .

Requirements: Python 3.10+, PyTorch 2.0+. GPU (CUDA) is optional — automatically used when available for faster compression and decompression.

Quick Start

# Compress any safetensors model (auto-detects FP16 vs FP32)
dmx compress model.safetensors model.dmx --mode auto

# Practically lossless compression (FP32 models — error below FP32 noise floor)
dmx compress model.safetensors model.dmx --mode int32

# Decompress back to safetensors (auto-uses GPU if available)
dmx decompress model.dmx model.safetensors

# Verify roundtrip quality (with JSON report)
dmx verify model.safetensors model.dmx --report verify.json

# View compression info
dmx info model.dmx

Delta compression (checkpoint / model versioning)

# Delta-compress a checkpoint against a base (near-lossless, ~87% savings)
dmx delta-compress base.safetensors checkpoint.safetensors delta.dmxd

# Practically lossless delta (error below FP32 noise floor, ~87% savings)
dmx delta-compress base.safetensors checkpoint.safetensors delta.dmxd --precision int32

# Reconstruct checkpoint from base + delta
dmx delta-reconstruct base.safetensors delta.dmxd restored.safetensors

# View delta file info (sparsity, compression, per-component breakdown)
dmx delta-info delta.dmxd

Example: Compress and verify a model from HuggingFace

# Download a model
pip install huggingface_hub
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-480P --local-dir ./wan_model

# Compress it
dmx compress ./wan_model/model.safetensors wan_compressed.dmx

# Verify the compressed file against the original and write a JSON report
dmx verify ./wan_model/model.safetensors wan_compressed.dmx --report report.json
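The same compress-then-verify flow can be scripted. The sketch below only assembles the two CLI commands shown above and defaults to a dry run; the helper names and the report file name are hypothetical, and it assumes (unconfirmed) that `dmx` exits with a non-zero status on failure.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(model: Path, out_dir: Path) -> list[list[str]]:
    """Return the compress and verify commands for one safetensors file."""
    dmx_file = out_dir / f"{model.stem}.dmx"
    report = out_dir / f"{model.stem}.verify.json"  # hypothetical report name
    return [
        ["dmx", "compress", str(model), str(dmx_file), "--mode", "auto"],
        ["dmx", "verify", str(model), str(dmx_file), "--report", str(report)],
    ]


def compress_and_verify(model: Path, out_dir: Path, dry_run: bool = True) -> list[list[str]]:
    """Print the commands; run them only when dry_run is False."""
    commands = build_commands(model, out_dir)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            if shutil.which("dmx") is None:
                raise RuntimeError("dmx CLI not found on PATH (pip install dmx-compress)")
            out_dir.mkdir(parents=True, exist_ok=True)
            # Assumption: dmx exits non-zero on failure, so check=True raises.
            subprocess.run(cmd, check=True)
    return commands


commands = compress_and_verify(Path("wan_model/model.safetensors"), Path("out"))
```

Pass `dry_run=False` once the printed commands look right.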

Decompression Speed

Model	Mode	CPU	GPU (--gpu)	Speedup
Wan 1.3B	BFP	185s	13.4s	13.8x
SVD-XT	BFP	281s	22.3s	12.5x
SVD-XT	int16	10.5s	--	CPU-bound

Benchmarked on RTX 4090 Laptop, Python 3.13. The BFP CPU bottleneck is NumPy bit manipulation; the GPU path uses PyTorch CUDA ops. A native C/CUDA decoder is expected to be faster still (estimated 10-50x).

Benchmarks

BFP Mode (FP16 models)

Model	Type	Original	DMX	Savings	Quality
Llama 3 8B	LLM	16 GB	~7.2 GB	55%	+0.16% PPL (wikitext-2)
Wan 2.2 shard	Video	7.2 GB	1.5 GB	79.5%	142/142 tensors pass
Wan 1.3B	Diffusion	2.7 GB	1.1 GB	60%	825/825 tensors pass
SVD-XT	Video	9.1 GB	1.8 GB	80%	Verified roundtrip

Note: The SVD-XT and Wan 2.2 sources are FP32, so their 80% and 79.5% savings include the FP32->FP16 conversion.

BFP Quality-per-Bit (Llama 3 8B, wikitext-2, 289K tokens)

Config	Bits/Weight	Perplexity	vs FP16
FP16 baseline	16.0	5.4958	--
BFP(M=8)	9.25	5.4964	+0.01%
BFP(M=7)	8.25	5.4973	+0.03%
BFP(M=6)	7.25	5.5045	+0.16%
GGUF Q8_0 (ref)	8.50	~5.55-5.58	~1.0-1.5% (different purpose — inference format)

int16 Mode (FP32 models)

Model	Type	Original	DMX	Savings	PPL Change
SVD-XT	Video	8.9 GB	4.0 GB	55.5%	Lossless
GPT-2	LLM	475 MB	201 MB	57.7%	+0.22%
Phi-2	LLM	10.6 GB	4.2 GB	60.1%	+0.12%

Why DMX beats generic compression

Method	Bits/value	Notes
gzip on safetensors	~15.5	Raw floats look like noise
zstd level 19	14.06	Dictionary matching, no prediction
DMX int16 + entropy	11.45	Aligned quantization enables structured entropy coding
DMX BFP + zstd	~4.2	Shared exponent eliminates wasted dynamic range

Pre-Compressed Models (Try It Now)

Download DMX-compressed models and decompress them yourself:

Model	Original	DMX	Savings	Verified	Link
Wan 1.3B (Diffusion)	2.7 GB	1.1 GB	60%	825/825 tensors	Download
Wan 2.2 14B Shard 6	7.2 GB	1.5 GB	79.5%	142/142 tensors	Download
SVD-XT (Video)	9.1 GB	1.8 GB	80%	Roundtrip verified	Download

Each includes a JSON verification report with SHA-256 hashes and per-tensor cosine similarity scores.
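For readers who want to run comparable checks on their own tensors, here is a minimal sketch of the two metrics. The field names in `entry` are hypothetical and are not the schema of the DMX verification report.

```python
import hashlib

import numpy as np


def tensor_sha256(t: np.ndarray) -> str:
    """SHA-256 of the tensor's raw bytes (C-contiguous layout)."""
    return hashlib.sha256(np.ascontiguousarray(t).tobytes()).hexdigest()


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two tensors, computed in float64."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # Two all-zero tensors count as identical; otherwise the value is undefined, report 0.
        return 1.0 if not a.any() and not b.any() else 0.0
    return float(np.dot(a, b) / denom)


rng = np.random.default_rng(0)
original = rng.normal(0.0, 0.02, size=(64, 64)).astype(np.float32)
restored = original + rng.normal(0.0, 1e-5, size=original.shape).astype(np.float32)

entry = {  # hypothetical field names, not the actual report schema
    "sha256_original": tensor_sha256(original),
    "sha256_restored": tensor_sha256(restored),
    "cosine_similarity": cosine_similarity(original, restored),
}
print(entry)
```

For a near-lossless roundtrip the hashes of original and restored tensors will generally differ, so the cosine similarity is the more informative number.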

Format Specification

See spec/dmx_spec_v1.md for the complete format specification.

Paper

DMX: Delta Multiplexed Compression for Neural Network Model Weights (PDF)

Background

DMX is based on the principle that floating-point weights should be transformed into multiple statistically distinct, independently modeled entropy domains prior to compression. Trained neural network weights exhibit extreme exponent clustering — 74% of FP16 values use only 3 of 31 possible exponents, wasting 2.4 bits per value. DMX decomposes the floating-point representation into separate exponent and mantissa streams, each with distinct statistical properties that benefit from independent entropy coding. For FP32 models, aligned cross-layer quantization enforces a global coordinate system across layers, enabling additional integer-domain compression. The format auto-profiles each model to select the optimal compression strategy per component.
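The exponent-clustering claim is easy to probe on any tensor. The sketch below extracts the 5-bit FP16 exponent field and reports its Shannon entropy and the share held by the three most common exponents. It runs on synthetic Gaussian weights (an assumption), so its numbers are not the measurements quoted above; `fp16_exponent_stats` is a hypothetical helper name.

```python
import numpy as np


def fp16_exponent_stats(weights: np.ndarray) -> tuple[float, float]:
    """Return (entropy of the exponent field in bits, share of the 3 most common exponents)."""
    bits = weights.astype(np.float16).ravel().view(np.uint16)
    # FP16 layout: 1 sign bit, 5 exponent bits, 10 mantissa bits.
    exponent = ((bits >> 10) & 0x1F).astype(np.int64)
    counts = np.bincount(exponent, minlength=32).astype(np.float64)
    p = counts / counts.sum()
    nonzero = p[p > 0]
    entropy = float(-(nonzero * np.log2(nonzero)).sum())
    top3_share = float(np.sort(p)[-3:].sum())
    return entropy, top3_share


rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=1_000_000)  # synthetic stand-in for trained weights
entropy, top3 = fp16_exponent_stats(w)
print(f"exponent entropy: {entropy:.2f} of 5 bits, top-3 share: {top3:.1%}")
```

The gap between 5 bits and the measured entropy is the per-value saving an ideal entropy coder could recover from the exponent stream alone.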

Validated Results: Checkpoint Delta Compression

All results are measured on real data using an NVIDIA A100-SXM4-80GB. Scripts are in experiments/checkpoint_delta/.

Compression across architectures

Model	Architecture	Params	Consecutive Delta Zeros	Entropy (bits)	Measured Savings
GPT-2	Decoder-only	124M	33-67%	1.76-3.02	87.3% (measured, 498→63 MB)
T5-small	Encoder-decoder	110M	89-94%	0.49-0.85	Not yet measured in bytes
TinyLlama	Decoder-only	1.1B	—	—	80% (measured, fine-tune base→chat)

Delta compression works across model architectures. T5 encoder-decoder shows higher sparsity than decoder-only models. Real-byte compression for T5 is pending.

Precision tiers

Both tiers achieve comparable compression — the aligned quantization produces similar entropy regardless of bit width:

Tier	Consecutive Entropy	Compression	Error	Use Case
int16 aligned	0.6-1.3 bits	87%	+0.06% RelL2	Maximum compression
int32 aligned	1.0-1.2 bits	~87%	1.87e-7 RelL2	Practically lossless (error below FP32 noise floor)
Raw bit XOR (no alignment)	14-16 bits	8.5%	Bit-exact	Baseline — alignment is essential

Full checkpoint including optimizer states

Training checkpoints include model weights plus Adam optimizer states (momentum and variance), so a full checkpoint is typically about 3x the weight size. Validated on GPT-2 124M, 1000 training steps:

Component	% of Checkpoint	Delta Sparsity	Entropy	Compression
Weights	33%	55-66% zeros	1.8-2.6 bits	~84%
Momentum (exp_avg)	33%	28-30% zeros	7.5-9.0 bits	~53%
Variance (exp_avg_sq)	33%	91-92% zeros	0.6 bits	~96%
Full checkpoint	100%	—	—	~79%

Safety for training resumption

Training from a DMX-reconstructed checkpoint produces 0.042% loss difference compared to the original — negligible for any practical purpose:

Step |   Original |  DMX Recon |       Diff
    1 |   0.783582 |   0.784023 | 0.00044072
   51 |   1.098088 |   1.098552 | 0.00046420
   91 |   0.537082 |   0.537364 | 0.00028241

Final avg loss (last 20 steps): 0.042% difference

Zero error accumulation in delta chains (Test 8)

Chained reconstruction (base → delta1 → delta2 → ... → deltaN) produces identical results to direct reconstruction (base + deltaN) — verified to 10 decimal places across both int16 and int32 modes. This is not an approximation: delta application is exact integer arithmetic, so error is mathematically constant regardless of chain length. Re-anchoring is needed only for delta size control, never for error control.
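The argument can be checked with a toy simulation. This is a sketch, not the Test 8 script: the checkpoints are synthetic and the shared step size `SCALE` is an assumed value.

```python
import numpy as np

rng = np.random.default_rng(0)
SCALE = 1e-5  # shared quantization step for every checkpoint (assumption)


def quantize(w: np.ndarray) -> np.ndarray:
    """Map float weights onto the shared integer grid."""
    return np.round(w / SCALE).astype(np.int64)


# Simulate a base checkpoint followed by 50 small updates.
checkpoints = [rng.normal(0.0, 0.02, size=10_000)]
for _ in range(50):
    checkpoints.append(checkpoints[-1] + rng.normal(0.0, 1e-5, size=10_000))

q = [quantize(c) for c in checkpoints]
deltas = [q[i + 1] - q[i] for i in range(len(q) - 1)]

# Chained reconstruction: base -> delta1 -> ... -> deltaN.
chained = q[0].copy()
for d in deltas:
    chained += d

# Direct reconstruction: base + (last - base).
direct = q[0] + (q[-1] - q[0])

chain_matches_direct = bool(np.array_equal(chained, direct))
# The only error is the one-time rounding onto the grid, at most SCALE / 2.
max_error = float(np.max(np.abs(chained * SCALE - checkpoints[-1])))
print(chain_matches_direct, max_error <= SCALE / 2 + 1e-12)
```

Integer addition is exact, so the chain length never enters the error bound; only the initial rounding onto the grid does.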

Fine-tune variant compression

TinyLlama 1.1B base → chat fine-tune: 80% savings (876 MB delta vs 4.4 GB full copy). Store the base model once, distribute each fine-tune variant as a small delta.

Projected savings at frontier scale

These are projections extrapolated from observed sparsity and scaling behavior across GPT-2 (124M), T5 (110M), and TinyLlama (1.1B). The underlying property (small per-step weight updates during gradient-based training) is expected to hold at larger scale, but real-byte validation at 70B+ scale is in progress.

Scenario	Raw Storage	Projected DMX	Projected Savings
200 checkpoints of Llama 70B (weights only)	28 TB	~3.6 TB	~87%
200 checkpoints of Llama 70B (full w/ optimizer)	84 TB	~18 TB	~79%
20 fine-tune variants of Llama 70B	2.8 TB	~700 GB	~75%

Caveats: Validation on 8B+ models with frontier training schedules is in progress. Momentum compression (53%) was measured on wikitext-2; diverse training data may yield 40-45%, reducing full-checkpoint savings to ~70-73%. The 87% weight compression is measured on GPT-2; larger models may differ.
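The table's arithmetic is simple enough to reproduce and to rerun with different savings assumptions. The sketch uses decimal units (1 TB = 1000 GB) and takes a full checkpoint as 3x the 140 GB weights; `projected_storage_tb` is a hypothetical helper, and the savings rates are the projected figures above, not new measurements.

```python
def projected_storage_tb(n_copies: int, size_gb: float, savings: float) -> tuple[float, float]:
    """Return (raw TB, projected DMX TB) using decimal units (1 TB = 1000 GB)."""
    raw_tb = n_copies * size_gb / 1000
    return raw_tb, raw_tb * (1.0 - savings)


scenarios = {
    "200 checkpoints, weights only": (200, 140, 0.87),
    "200 checkpoints, with optimizer": (200, 3 * 140, 0.79),
    "20 fine-tune variants": (20, 140, 0.75),
}
for name, args in scenarios.items():
    raw_tb, dmx_tb = projected_storage_tb(*args)
    print(f"{name}: {raw_tb:.1f} TB raw -> {dmx_tb:.2f} TB projected")
```

Swapping in the more conservative savings from the caveats shows how sensitive the totals are to the momentum term.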

Research Directions

DMX's underlying compression technique applies to structured floating-point data beyond individual model files. These are active research areas, not yet proven at scale. We welcome collaboration.

1. Training checkpoint compression (highest priority). Frontier training produces hundreds of near-identical high-precision checkpoints. Aligned cross-layer quantization enables efficient delta encoding between them. Early results are in the Validated Results section above. Key finding: alignment is critical — without it, deltas show almost no sparsity and compress poorly.

2. Model family distribution. Storing fine-tuned variants (chat, code, reasoning, etc.) as small deltas from a shared base model. Early result: TinyLlama base → chat = 80% savings (876 MB vs 4.4 GB).

3. Scientific and sensor data. Early tests on NOAA weather data show similar exponent clustering, suggesting potential applications in climate, seismic, and satellite data.

License & Patent

Code: MIT License — free to use, modify, and distribute.

Methods: Patent Pending (U.S. Provisional Applications filed April 2026). The pending applications cover aligned cross-layer quantization for neural network weight compression and stream-separated block floating point encoding with independent entropy coding. Personal, academic, and open-source use is unrestricted. Commercial use of these methods may require a license from the inventor; contact bill.riley@gmail.com.

Citation

@software{riley2026dmx,
  author = {Riley, William J},
  title = {DMX: Delta Multiplexed Model Format},
  year = {2026},
  url = {https://github.com/willjriley/dmx}
}
