Structure-aware neural network weight compression. Checkpoint deltas shrink by about 87% through delta encoding rather than generic file compression.

DMX — Delta Multiplexed Model Format

Structure-aware compression for neural networks.

DMX transforms weight tensors into aligned integer representations, enabling efficient storage and distribution of model variants. The headline capability is 55-80% compression of safetensors and checkpoint files, and about 87% compression of checkpoint deltas, with practically lossless reconstruction.

Safe for production training. Resuming from DMX-reconstructed checkpoints produces a 0.15% loss difference after 50 chain resumes over 10,000 training steps (verified on GPU with reproducible scripts). Delta chains use exact integer arithmetic, so error does not accumulate regardless of chain length.

Original:  9.1 GB  (SVD-XT, FP32 — 80% includes FP32→FP16 conversion)
DMX:       1.8 GB

Original:  7.2 GB  (Wan 2.2 14B shard, FP32 — 79.5% includes FP32→FP16)
DMX:       1.5 GB  (142/142 tensors verified)

Original:  16 GB   (Llama 3 8B, FP16 — 55% pure FP16 compression)
DMX:       ~7.2 GB (+0.16% perplexity on wikitext-2)

Try it now

pip install dmx-compress
dmx compress your_model.safetensors compressed.dmx

Download pre-compressed models

Model	Original	DMX	Savings	Verified
Wan 1.3B	2.7 GB	1.1 GB	60%	825/825 tensors
Wan 2.2 shard	7.2 GB	1.5 GB	79.5%	142/142 tensors
SVD-XT	9.1 GB	1.8 GB	80%	Roundtrip verified

What is DMX?

DMX is a structure-aware compression system for neural networks. It reduces model file sizes by 55-80% while preserving quality (+0.03-0.16% perplexity), with reversible decompression back to the original format.

DMX also supports delta-based model storage with deterministic reconstruction and ROI-driven adaptive rebasing — enabling efficient versioning across model families.

No retraining required — compress any pretrained safetensors model
Reversible — decompress back to the original format
Broad compatibility — tested on LLMs, diffusion models, video models, encoder-decoder

DMX reduces the cost of storing, moving, and resuming large models — without breaking training.

Compression & Delta Storage

Capability	Evidence
Single-file compression (55-80%)	6+ models, Llama 3 8B through SVD-XT
Checkpoint delta chains (87%)	GPT-2, T5, TinyLlama, Qwen 3B
Full checkpoint w/ optimizer (79%)	GPT-2 1000-step, weights + momentum + variance
Zero chain accumulation error	Exact integer arithmetic, 10K steps / 50 resumes
Fine-tune variant distribution (80%)	Qwen 2.5 3B delta on HuggingFace

How It Works

BFP Mode (for FP16/BF16 models — recommended)

Standard FP16:  16 bits per weight (5-bit exponent wasted on unused dynamic range)
DMX BFP:        ~7 bits per weight (shared exponent per group + truncated mantissa + entropy coding)

Trained weights cluster in a narrow magnitude range — 74% use only 3 of 31 possible exponents. DMX shares one exponent per group of 32 values, eliminating wasted dynamic range, then entropy-codes the mantissa stream.
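
The shared-exponent idea can be sketched in plain Python. This is a minimal illustration, not DMX's actual encoder: the names bfp_encode and bfp_decode, the rounding rule, and the clamping are assumptions, and the entropy-coding stage is left out.

```python
import math

def bfp_encode(group, mantissa_bits=7):
    """Return (shared_exponent, signed integer mantissas) for one group of floats."""
    max_abs = max(abs(v) for v in group)
    if max_abs == 0.0:
        return 0, [0] * len(group)
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1, so e bounds the whole group.
    _, shared_exp = math.frexp(max_abs)
    scale = 2.0 ** (shared_exp - mantissa_bits)
    limit = (1 << mantissa_bits) - 1
    # Round to the nearest step and clamp so each mantissa fits in mantissa_bits plus sign.
    return shared_exp, [max(-limit, min(limit, round(v / scale))) for v in group]

def bfp_decode(shared_exp, mantissas, mantissa_bits=7):
    """Rebuild approximate floats from the shared exponent and integer mantissas."""
    scale = 2.0 ** (shared_exp - mantissa_bits)
    return [m * scale for m in mantissas]

group = [0.0123, -0.0098, 0.0151, 0.0007] * 8  # 32 values of similar magnitude
exp, mants = bfp_encode(group)
restored = bfp_decode(exp, mants)
step = 2.0 ** (exp - 7)
max_err = max(abs(a - b) for a, b in zip(group, restored))
```

Because every value in the group shares one exponent, the per-value cost is only the mantissa and sign. The reconstruction error for values that are not clamped is at most half a quantization step.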

int16 Mode (for FP32 models — near-lossless)

Standard FP32:  32 bits per weight
DMX int16:      ~13 bits per weight (aligned cross-layer quantization + entropy coding)

Integer quantization as a preprocessing step (not a lossy final format) transforms float weights into a representation where entropy coding is effective. Aligned cross-layer quantization enforces a global coordinate system across layers, enabling structured compression.
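
A rough sketch of what a shared coordinate system across layers could look like follows. It assumes that "aligned" means a single quantization scale shared by all layers; the function names and the toy layer names are hypothetical and are not taken from the DMX code base.

```python
def shared_scale(layers, bits=16):
    """One scale for every layer, so equal integers mean equal weights everywhere."""
    max_abs = max(abs(v) for values in layers.values() for v in values)
    qmax = (1 << (bits - 1)) - 1  # 32767 for int16
    return max_abs / qmax if max_abs else 1.0

def quantize(values, scale):
    """Map floats to integer codes on the shared grid."""
    return [round(v / scale) for v in values]

def dequantize(codes, scale):
    """Map integer codes back to floats."""
    return [c * scale for c in codes]

layers = {
    "attn.weight": [0.021, -0.013, 0.004],
    "mlp.weight": [0.250, -0.190, 0.033],
}
scale = shared_scale(layers)
codes = {name: quantize(values, scale) for name, values in layers.items()}
worst = max(
    abs(w - r)
    for name, values in layers.items()
    for w, r in zip(values, dequantize(codes[name], scale))
)
```

With a per-layer scale the same integer would mean different weights in different layers. A shared scale keeps the integer streams comparable, which is the property the entropy coder and the delta encoder rely on in this sketch.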

Adaptive per-tensor compression

DMX automatically picks the best compression for each tensor in your model. You don't choose a compressor; DMX evaluates its candidate set per tensor and keeps the strongest result.

Actual savings depend on model architecture, source precision (FP16 / BF16 / FP32), and the quantization mode you select. Across the model families we have measured, savings typically fall in the 50–80% range vs the original safetensors file, with no manual tuning required.

Why DMX beats generic compression

Method	Bits/value	Notes
gzip on safetensors	~15.5	Raw floats look like noise
zstd level 19	14.06	Dictionary matching, no prediction
DMX int16 + entropy	11.45	Aligned quantization enables structured entropy coding
DMX BFP + zstd	~4.2	Shared exponent eliminates wasted dynamic range

Installation

pip install dmx-compress

Or from source:

git clone https://github.com/willjriley/dmx.git && cd dmx && pip install -e .

Requirements: Python 3.10+, PyTorch 2.0+. GPU (CUDA) is optional — automatically used when available for faster compression and decompression.

Quick Start

# Compress any safetensors model (auto-detects FP16 vs FP32)
dmx compress model.safetensors model.dmx --mode auto

# Practically lossless compression (FP32 models — error below FP32 noise floor)
dmx compress model.safetensors model.dmx --mode int32

# Compress with explicit parallel encoding (defaults to min(8, cpu_count) on CPU,
# 1 on GPU). zstd releases the GIL so threads give real parallelism.
dmx compress model.safetensors model.dmx --parallel-workers 8

# Decompress back to safetensors (auto-uses GPU if available)
dmx decompress model.dmx model.safetensors

# Verify roundtrip quality (with JSON report)
dmx verify model.safetensors model.dmx --report verify.json

# View compression info
dmx info model.dmx

Delta compression (checkpoint / model versioning)

# Delta-compress a checkpoint against a base (near-lossless, ~87% savings)
dmx delta-compress base.safetensors checkpoint.safetensors delta.dmxd

# Practically lossless delta (error below FP32 noise floor, ~87% savings)
dmx delta-compress base.safetensors checkpoint.safetensors delta.dmxd --precision int32

# Reconstruct checkpoint from base + delta
dmx delta-reconstruct base.safetensors delta.dmxd restored.safetensors

# View delta file info (sparsity, compression, per-component breakdown)
dmx delta-info delta.dmxd

Chain compression (training-run checkpoints, every-N-step cadences)

DMX chain compression takes a sequence of related checkpoints (training run, fine-tune steps, branch variants) and stores them as one or more anchors plus deltas. An automatic anchor-promotion policy guarantees that the chain is never larger than storing each checkpoint independently with dmx compress.

# Chain-compress a sequence of checkpoints into one output directory
dmx chain-compress step_1000.safetensors step_2000.safetensors step_3000.safetensors \
    --output-dir ./compressed_chain

# Reconstruct every checkpoint in the chain back to safetensors
dmx chain-reconstruct ./compressed_chain --output-dir ./restored

# Reconstruct only specific entries by index
dmx chain-reconstruct ./compressed_chain --output-dir ./restored --indices 0 2

The auto-anchor policy promotes a checkpoint to a fresh anchor whenever its delta would be larger than re-encoding the checkpoint from scratch, so the chain is self-calibrating across source dtypes and cadences. No manual tuning required.
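
The promotion rule described above can be sketched as a size comparison per checkpoint. This is a simplified model under stated assumptions: plan_chain is a hypothetical name, the sizes are made-up numbers, and each delta is taken against the immediately preceding checkpoint.

```python
def plan_chain(standalone_sizes, delta_sizes):
    """Choose 'anchor' or 'delta' for each checkpoint.

    standalone_sizes[i]: bytes if checkpoint i is compressed on its own.
    delta_sizes[i]: bytes if checkpoint i is stored as a delta against checkpoint i-1
                    (delta_sizes[0] is ignored; the first entry is always an anchor).
    """
    plan = []
    for i, (alone, delta) in enumerate(zip(standalone_sizes, delta_sizes)):
        if i == 0 or delta >= alone:
            plan.append(("anchor", alone))
        else:
            plan.append(("delta", delta))
    return plan

standalone = [500, 500, 500, 500]
deltas = [0, 60, 650, 70]  # the third checkpoint diverged, so its delta is oversized
plan = plan_chain(standalone, deltas)
total = sum(size for _, size in plan)
```

Each entry costs the smaller of its two options, so the chain total can never exceed the sum of the standalone sizes. That is the guarantee stated above, expressed in this toy model.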

Example: Compress and verify a model from HuggingFace

# Download a model
pip install huggingface_hub
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-480P --local-dir ./wan_model

# Compress it (adjust the path to the .safetensors file the download actually produced)
dmx compress ./wan_model/model.safetensors wan_compressed.dmx

# Verify the roundtrip against the original
dmx verify ./wan_model/model.safetensors wan_compressed.dmx --report report.json

Benchmarks

Storage and transfer comparison

How a 140 GB model (Llama 3 70B, FP16) compares across compression approaches:

Method	Compressed Size	Savings	Quality Loss	Purpose
safetensors	140 GB	0%	None	Original format
gzip	~134 GB	~4%	None	Generic compression (barely helps on floats)
zstd-19	~129 GB	~8%	None	Better generic compression (still limited)
DFloat11	~98 GB	~30%	None (lossless)	Lossless NN weight compression
ZipNN	~94 GB	~33%	None (lossless)	Lossless NN weight compression
DMX M=7	~63 GB	~55%	+0.03% PPL	Near-original quality, high compression
DMX M=6	~56 GB	~60%	+0.16% PPL	Aggressive storage compression

For reference, quantized inference formats like GGUF Q8 (~50%) and Q4 (~75%) achieve similar or greater compression but are designed for a different purpose — running models directly at reduced precision with fused kernels. DMX and GGUF serve different needs and are not interchangeable.

If lossless is enough, use DFloat11 or ZipNN. If you need to run inference at lower precision, use GGUF. If you need high compression with near-original quality for storage and distribution, that's where DMX lives.

Without DMX	With DMX
Llama 3 70B: 140 GB download	~56-63 GB download
About 7 models on 1 TB	15+ models on 1 TB

BFP Mode (FP16 models)

Model	Type	Original	DMX	Savings	Quality
Llama 3 8B	LLM	16 GB	~7.2 GB	55%	+0.16% PPL (wikitext-2)
Wan 2.2 shard	Video	7.2 GB	1.5 GB	79.5%	142/142 tensors pass
Wan 1.3B	Diffusion	2.7 GB	1.1 GB	60%	825/825 tensors pass
SVD-XT	Video	9.1 GB	1.8 GB	80%	Verified roundtrip

Note: The SVD-XT figure (80%) includes FP32→FP16 conversion, and the Wan 2.2 figure (79.5%) is BFP applied to an FP32 source, so neither is a pure FP16-to-DMX ratio.

BFP Quality-per-Bit (Llama 3 8B, wikitext-2, 289K tokens)

Config	Bits/Weight	Perplexity	vs FP16
FP16 baseline	16.0	5.4958	--
BFP(M=8)	9.25	5.4964	+0.01%
BFP(M=7)	8.25	5.4973	+0.03%
BFP(M=6)	7.25	5.5045	+0.16%
GGUF Q8_0 (ref)	8.50	~5.55-5.58	~1.0-1.5% (different purpose — inference format)
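
The Bits/Weight column is consistent with a simple accounting in which each value stores M mantissa bits plus one sign bit, and each group of 32 values shares one 8-bit exponent. That breakdown is an assumption inferred from the numbers; the table itself does not state it, and it does not include any further gain from entropy coding.

```python
def bfp_bits_per_weight(mantissa_bits, group_size=32, exponent_bits=8, sign_bits=1):
    """Raw BFP cost per value: mantissa + sign + amortized shared exponent."""
    return mantissa_bits + sign_bits + exponent_bits / group_size

for m in (8, 7, 6):
    print(f"M={m}: {bfp_bits_per_weight(m):.2f} bits/weight")
# M=8: 9.25 bits/weight
# M=7: 8.25 bits/weight
# M=6: 7.25 bits/weight
```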

int16 Mode (FP32 models)

Model	Type	Original	DMX	Savings	PPL Change
SVD-XT	Video	8.9 GB	4.0 GB	55.5%	Lossless
GPT-2	LLM	475 MB	201 MB	57.7%	+0.22%
Phi-2	LLM	10.6 GB	4.2 GB	60.1%	+0.12%

Decompression Speed

Model	Mode	CPU	GPU (--gpu)	Speedup
Wan 1.3B	BFP	185s	13.4s	13.8x
SVD-XT	BFP	281s	22.3s	12.5x
SVD-XT	int16	10.5s	--	CPU-bound

Benchmarked on RTX 4090 Laptop, Python 3.13. GPU path uses PyTorch CUDA ops.

Native CUDA kernels are available in kernel/dmx_kernels_v2.cu — 12 kernels covering the full compression and decompression pipeline (quantize, delta compute, BFP compress/decompress, dequantize, delta apply). Compiled and tested on A100. int32 roundtrip error: 9.3e-10.

Why DMX Matters for Training

Frontier training runs can cost $50M-$200M+ each. Checkpoint storage, bandwidth, and crash recovery are recurring operational costs that compound across every experiment and team. DMX addresses these directly.

What DMX enables

Safe resumption from compressed checkpoints — 0.15% loss difference after 50 chain resumes over 10K training steps
87% checkpoint storage reduction — 200 checkpoints of Llama 70B: ~28 TB raw → ~3.6 TB (projected)
Full checkpoint compression including optimizer — weights + momentum + variance: 79% savings (measured)
Dense checkpoint history — save 5-10x more often without the storage penalty
Fine-tune distribution — store base model once, each variant as a small delta (80% savings)
Weight-shift analytics — per-layer diffs show exactly what changed between any two checkpoints

No other tool does all of this

Tool	Delta between versions	Chain safety demonstrated
DMX	87% savings (structure-aware)	0.15% loss diff after 50 resumes / 10K steps
ZipNN	XOR-based delta (~44% savings)	Not published
DFloat11	✗ (per-file only, ~30%)	N/A
Git LFS / DVC	✗ (full copy each version)	N/A
HuggingFace Hub	✗ (full copy each version)	N/A
W&B / MLflow	✗ (full copy each version)	N/A
xdelta (binary diff)	~8.5% savings	Not published

Byte-level delta tools (xdelta, ZipNN XOR) operate on raw float bits, where the IEEE 754 layout means numerically close values can differ in many bits. DMX produces much smaller deltas (87% vs 44% savings) by encoding in a structure-aware representation where similar values map to similar integers.
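
A small worked example makes the bit-level problem concrete. It compares two adjacent float32 values that straddle an exponent boundary. The quantization step of 2^-15 is a hypothetical choice for illustration, not a DMX parameter.

```python
import struct

def f32_bits(x):
    """Raw IEEE 754 single-precision bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def f32_from_bits(bits):
    """Float value of a raw IEEE 754 single-precision bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

after = 0.5                         # bit pattern 0x3F000000
before = f32_from_bits(0x3EFFFFFF)  # the largest float32 below 0.5

# XOR of the raw bit patterns: how many bits a byte-level delta has to encode.
xor_bits_set = bin(f32_bits(before) ^ f32_bits(after)).count("1")

# Integer delta after quantizing both values on the same grid.
scale = 2.0 ** -15
int_delta = round(after / scale) - round(before / scale)
```

Here xor_bits_set is 25 even though the two values differ by less than one part in ten million, while int_delta is 0. Values this close are the common case between consecutive training checkpoints.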

The operational impact

Aspect	Current Status Quo	With DMX	Benefit
Checkpoint frequency	Sparse (forced by cost)	Dense and safe	Better science and debugging
Storage for 200 ckpts (70B)	~28 TB	~3.6 TB (projected)	~8x reduction
Crash recovery	Reload full checkpoint	Reload small delta	Minutes instead of hours
Fine-tune distribution	Full copy per variant	Small delta per variant	80% savings (measured)
Experimentation	Branching is expensive	Branch via small delta	5-10x more experimental forks

DMX turns checkpoint management into an operational advantage: it enables safe multi-step training resumption, preserves per-layer diffs, and sharply reduces the storage cost of model snapshots. Engineers and researchers get a usable model history that was previously impractical to keep, which reduces wasted GPU time, improves training continuity, and lowers cloud storage costs.

Efficient model distribution with deltas

DMX enables a new distribution model: send the base model once, then distribute only small aligned deltas for every variant.

Llama 70B base:          140 GB  (stored/downloaded once)
  → chat fine-tune:       ~28 GB  (delta only)
  → code fine-tune:       ~28 GB  (delta only)
  → medical fine-tune:    ~28 GB  (delta only)

Traditional: 4 × 140 GB = 560 GB
With DMX:    140 + 3 × 28 = 224 GB  (60% savings)
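
The arithmetic above generalizes to any number of variants. The helper below is only a restatement of that calculation; the function name is hypothetical and the 28 GB delta size is the figure used in the example, not a measurement.

```python
def distribution_sizes(base_gb, delta_gb, n_variants):
    """Total size with full copies vs one base plus one delta per variant."""
    full_copies = (1 + n_variants) * base_gb
    base_plus_deltas = base_gb + n_variants * delta_gb
    savings = 1 - base_plus_deltas / full_copies
    return full_copies, base_plus_deltas, savings

full, with_deltas, savings = distribution_sizes(base_gb=140, delta_gb=28, n_variants=3)
```

With three variants the base still dominates the total, which is why the overall figure is 60% rather than 80%. As the number of variants grows, the base is amortized and the overall savings approach the per-variant figure.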

This applies to model hubs (HuggingFace, CivitAI), enterprise model management, and any workflow where multiple variants share a common base. Reconstruction from deltas is verified safe across 10K-step training chains (0.15% loss difference after 50 resumes).

Where this matters today (estimated scale):

Platform	Hosted Models	Est. Fine-Tunes	Redundant Storage	Potential Savings with Deltas
HuggingFace	800K+	~500K (est. 60%)	Petabytes of duplicated base weights	~60-80% bandwidth reduction
CivitAI	100K+	Tens of thousands of SD variants	Each a full 2-4 GB copy of SD base	~80% per variant
Enterprise (per company)	10-100 variants	Per-customer or per-use-case fine-tunes	Full copy per deployment	~80% storage per variant

Estimates based on public model counts and observed fine-tune ratios. Actual savings depend on how much each fine-tune diverges from its base.

Validated: Qwen 2.5 3B model family

Measured on real HuggingFace models — reconstructable delta available:

Qwen/Qwen2.5-3B (base)           13.6 GB — stored once
  → Qwen2.5-3B-Instruct          2.88 GB delta (78.8% savings)
  → Qwen2.5-Coder-3B             5.60 GB delta (58.8% savings — fork, heavier retrain)

Variant	int16 Zeros	int16 Savings	int32 Savings	RelL2 from Base
Instruct (SFT+RLHF)	29.2%	90.7%	67.7%	0.014
Coder (domain retrain)	0.2%	58.8%	14.9%	0.828

The Coder variant has diverged significantly from the base (RelL2 = 0.83). When a variant drifts this far, DMX supports auto-forking — promoting it to a new base and restarting the delta chain. Coder → Coder-Instruct would delta efficiently from the Coder anchor.

Reconstruction quality (verified roundtrip):

Method	Precision Loss	Industry Acceptance
FP32 → FP16 conversion	Measurable (~1e-3)	Standard practice everywhere
GGUF Q8 quantization	~1% PPL increase	Widely deployed in production
DMX int16 delta	+0.06% RelL2	Less loss than FP32→FP16
DMX int32 delta	1.87e-7 RelL2	Below FP32 arithmetic noise
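
For readers who want to reproduce the RelL2 column, the sketch below assumes the usual definition: the L2 norm of the reconstruction error divided by the L2 norm of the original. The document does not spell out the formula, so treat this as an assumption.

```python
import math

def rel_l2(original, reconstructed):
    """Relative L2 error: ||original - reconstructed|| / ||original||."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(original, reconstructed)))
    den = math.sqrt(sum(a * a for a in original))
    return num / den if den else 0.0

example = rel_l2([3.0, 4.0], [3.0, 4.5])  # error norm 0.5 over original norm 5.0
```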

Try the distribution workflow yourself:

pip install dmx-compress

# Download base + delta from HuggingFace (base: 13.6 GB, delta: 2.9 GB)
huggingface-cli download Senat1/dmx-qwen2.5-3b-instruct-delta --local-dir ./qwen-delta

# Reconstruct the full Instruct model from base + delta
dmx delta-reconstruct ./qwen-delta/qwen2.5-3b-base.safetensors ./qwen-delta/instruct.dmxd qwen2.5-3b-instruct.safetensors

If you already have the base model locally, you only need the 2.9 GB delta — not the full 13.6 GB Instruct model.

For hubs and enterprises that maintain many fine-tuned model variants, storing and transmitting only small deltas instead of full checkpoints could substantially cut storage and bandwidth costs.

Validated Results: Checkpoint Delta Compression

All results are measured on real data using an NVIDIA A100-SXM4-80GB. Full result data is in benchmarks/.

Compression across architectures

Model	Architecture	Params	Consecutive Delta Zeros	Entropy (bits)	Measured Savings
GPT-2	Decoder-only	163M	33-67%	1.76-3.02	87.3% (measured, 498→63 MB)
T5-small	Encoder-decoder	110M	89-94%	0.49-0.85	Not yet measured in bytes
TinyLlama	Decoder-only	1.1B	16-63%	1.69-3.73	80% (measured, fine-tune base→chat)

Delta compression works across model architectures and scales. T5 encoder-decoder shows highest sparsity. TinyLlama 1.1B confirms the pattern holds at scale — sparsity increases as training progresses (16% → 63% zeros). int32 aligned entropy matches int16 at all scales tested (1.71 vs 1.69 bits at 1.1B).

Precision tiers

Both tiers achieve comparable compression — the aligned quantization produces similar entropy regardless of bit width:

Tier	Consecutive Entropy	Compression	Error	Use Case
int16 aligned	0.6-1.3 bits	87%	+0.06% RelL2	Maximum compression
int32 aligned	1.0-1.2 bits	~87%	1.87e-7 RelL2	Practically lossless (error below FP32 noise floor)
Raw bit XOR (no alignment)	14-16 bits	8.5%	Bit-exact	Baseline — alignment is essential

Full checkpoint including optimizer states

Training checkpoints include model weights plus Adam optimizer states (momentum and variance), so a full checkpoint is typically about 3x the weight size. Validated on GPT-2 124M, 1000 training steps:

Component	% of Checkpoint	Delta Sparsity	Entropy	Compression
Weights	33%	55-66% zeros	1.8-2.6 bits	~84%
Momentum (exp_avg)	33%	28-30% zeros	7.5-9.0 bits	~53%
Variance (exp_avg_sq)	33%	91-92% zeros	0.6 bits	~96%
Full checkpoint	100%	—	—	~79%

Safety for training resumption

Training from DMX-reconstructed checkpoints is safe for production use:

Test	Steps	DMX Resumes	Final Loss Diff	Result
Single resume (100 steps)	100	1	0.042%	Negligible
Long-run chain (10K steps)	10,000	50	0.15%	Production-safe

The 10K-step test reconstructed from a DMX delta chain every 200 steps — 50 total resumes over 10,000 training steps. Final loss tracks the clean baseline within 0.15%, with no divergence trend over time.

Zero error accumulation in delta chains (Test 8)

Chained reconstruction (base → delta1 → delta2 → ... → deltaN) produces identical results to direct reconstruction (base + deltaN) — verified to 10 decimal places across both int16 and int32 modes. This is not an approximation: delta application is exact integer arithmetic, so error is mathematically constant regardless of chain length. Re-anchoring is needed only for delta size control, never for error control.
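
The exactness argument can be illustrated with a toy chain of integer-quantized checkpoints. The helper names and the values are hypothetical; the point is only that integer addition is exact, whereas repeated floating-point addition is not.

```python
def make_deltas(checkpoints):
    """checkpoints: list of integer-quantized weight lists. Returns per-step deltas."""
    return [
        [cur - prev for prev, cur in zip(a, b)]
        for a, b in zip(checkpoints, checkpoints[1:])
    ]

def apply_chain(base, deltas):
    """Apply each delta in order to the base and return the final state."""
    state = list(base)
    for delta in deltas:
        state = [s + d for s, d in zip(state, delta)]
    return state

ckpts = [[100, -250, 7], [101, -250, 9], [103, -251, 9], [103, -249, 12]]
deltas = make_deltas(ckpts)
restored = apply_chain(ckpts[0], deltas)

# For contrast: ten floating-point steps of 0.1 do not sum to exactly 1.0.
float_sum = sum([0.1] * 10)
```

In this sketch restored equals the last checkpoint exactly, no matter how many deltas are chained. The only lossy step is the initial quantization, which happens once per checkpoint and does not compound.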

Fine-tune variant compression

TinyLlama 1.1B base → chat fine-tune: 80% savings (876 MB delta vs 4.4 GB full copy). Store the base model once, distribute each fine-tune variant as a small delta.

Projected Savings at Scale

These projections are extrapolated from observed sparsity and scaling behavior on GPT-2 (163M), T5 (110M), and TinyLlama (1.1B). The core property, very small per-step weight updates during gradient-based training, appears scale-invariant, but we are actively validating on 8B+ models with frontier-scale schedules.

Scenario	Raw Storage	Projected DMX	Projected Savings
200 checkpoints of Llama 70B (weights only)	28 TB	~3.6 TB	~87%
200 checkpoints of Llama 70B (full + optimizer)	84 TB	~18 TB	~79%
20 fine-tune variants of Llama 70B	2.8 TB	~700 GB	~75%
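
The first two rows of the table follow from straightforward multiplication. The sketch below redoes that arithmetic, assuming 140 GB per weights-only checkpoint, a full checkpoint of about 3x that size, and decimal terabytes; the function name is hypothetical.

```python
def projected_tb(n_checkpoints, size_gb, savings):
    """Return (raw TB, projected TB after applying the given savings fraction)."""
    raw_tb = n_checkpoints * size_gb / 1000
    return raw_tb, raw_tb * (1 - savings)

weights_raw, weights_dmx = projected_tb(200, 140, 0.87)    # weights only
full_raw, full_dmx = projected_tb(200, 3 * 140, 0.79)      # weights + optimizer states
```

This gives roughly 28 TB → 3.6 TB for weights only and 84 TB → 17.6 TB with optimizer states, in line with the table.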

Key Caveats

Validation on 8B+ models with real frontier training schedules is in progress.
Momentum compression (currently ~53%) may drop to 40–45% on highly diverse data, reducing full-checkpoint savings to ~70–73%.
All projections assume continued zero error accumulation (exact integer arithmetic), as demonstrated in long-chain tests.

These numbers suggest DMX could reduce checkpoint storage and I/O pressure by nearly an order of magnitude while keeping training resumption safe.

Research Directions

Multi-framework integration — DeepSpeed, FSDP, and Megatron-LM callbacks for production training pipelines
Checkpoint-efficient continual learning — delta chains for long-running training with minimal storage overhead

We welcome collaboration — reach out via GitHub Issues or Discussions.

Format Specification

See spec/dmx_spec_v1.md for the complete format specification.

Paper

DMX: Delta Multiplexed Compression for Neural Network Model Weights (PDF)

Background

DMX is based on the principle that floating-point weights should be transformed into multiple statistically distinct, independently modeled entropy domains prior to compression. Trained neural network weights exhibit extreme exponent clustering — 74% of FP16 values use only 3 of 31 possible exponents, wasting 2.4 bits per value. DMX decomposes the floating-point representation into separate exponent and mantissa streams, each with distinct statistical properties that benefit from independent entropy coding. For FP32 models, aligned cross-layer quantization enforces a global coordinate system across layers, enabling additional integer-domain compression. The format auto-profiles each model to select the optimal compression strategy per component.
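
One way to check the exponent-clustering claim on your own weights is to histogram the binary exponents and compute their Shannon entropy. The sketch below uses math.frexp, whose exponent differs from the stored IEEE 754 exponent field only by a constant offset for normal values. The function name and the sample values are hypothetical.

```python
import math
from collections import Counter

def exponent_stats(values, top_k=3):
    """Share of values covered by the top_k most common binary exponents,
    and the Shannon entropy (in bits) of the exponent distribution."""
    exps = Counter(math.frexp(v)[1] for v in values if v != 0.0)
    total = sum(exps.values())
    top_share = sum(n for _, n in exps.most_common(top_k)) / total
    entropy = -sum((n / total) * math.log2(n / total) for n in exps.values())
    return top_share, entropy

sample = [0.02, 0.03, -0.025, 0.017, 0.011, -0.009, 0.3, 0.0004]
share, entropy = exponent_stats(sample)
```

The gap between the 5 bits that FP16 reserves for the exponent and the measured entropy is the kind of per-value waste the paragraph above refers to.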

License & Patent

Code: MIT License — free to use, modify, and distribute.

Methods: Patent Pending (U.S. Provisional Applications filed April 2026). The patented methods cover aligned cross-layer quantization for neural network weight compression and stream-separated block floating point encoding with independent entropy coding. Personal, academic, and open-source use is unrestricted. Commercial use of the patented methods may require a license from the inventor — contact bill.riley@gmail.com.

Citation

@software{riley2026dmx,
  author = {Riley, William J},
  title = {DMX: Delta Multiplexed Model Format},
  year = {2026},
  url = {https://github.com/willjriley/dmx}
}
