Lossless compression for neural network weights. Bit-exact storage, content-addressable deltas, embedded provenance.
DMX
DevOps primitives for neural networks: lossless storage, content-addressable deltas, provenance manifests, and runtime derivation — one file format for the entire model lifecycle.
Contents
- What DMX is
- The gap DMX fills
- Storage: archival-grade lossless
- Deltas: efficient updates across versions
- Inference: runtime derivation
- Provenance: supply-chain visibility
- Beyond neural network weights
- Where DMX fits in ML DevOps
- Roadmap
- Status
- Technical details
- Quick start
- License and patent
- Citation
What DMX is
DMX is a file format and set of primitives for managing neural network artifacts the way the rest of IT has managed artifacts for decades. A single DMX file serves the entire model lifecycle: lossless storage for archival and distribution, content-addressable deltas for training chains and variant families, embedded provenance for supply-chain visibility, and runtime derivation for efficient inference across heterogeneous hardware.
The core idea is that lossless storage is the foundation. Once the stored weights are bit-exact, everything else — deltas between related checkpoints, efficient distribution of model variants, fast inference-time derivations, verifiable lineage — becomes straightforward and trustworthy. Quality tradeoffs belong at inference (where they're reversible by going back to the source), not at storage (where they corrupt the foundation every downstream operation depends on).
The gap DMX fills
Neural network artifact handling today looks nothing like modern software artifact handling. Multi-gigabyte files download as single blobs with no resume. Checkpoints duplicate full files instead of storing deltas. Variants of the same base model re-duplicate most of the base across every fork. There's no content-addressable storage, no standard provenance tracking, no built-in supply-chain verification. A new model variant means a new full download.
The rest of IT solved these problems decades ago. Resumable downloads, content-addressable storage, delta updates, dedup across related artifacts, signed provenance, versioned registries — these have been standard DevOps practice since the 1990s and early 2000s. ML adopted none of it.
This gap isn't because the problems don't exist in ML. It's because the file formats (safetensors, pickle) are blob-oriented with no structural awareness, and the infrastructure grew up inside research contexts where "download the weights file" was the whole workflow. With models now reaching hundreds of billions of parameters, the gap has become operationally expensive.
DMX is the set of primitives that closes this gap. Each DMX feature maps onto a standard DevOps practice:
- Archival integrity → lossless storage with content-addressable hashes
- Efficient updates → bit-exact deltas between related versions
- Supply-chain visibility → embedded provenance manifests
- Flexible deployment → runtime derivation to hardware-appropriate representations
- Registry compatibility → fits into existing distribution infrastructure
The sections below describe each primitive, with measured data on current behavior.
Storage: archival-grade lossless
DMX stores weights losslessly. When you save a model as DMX, the bits you get back on load are identical to the bits you put in. This is bit-exact, not "practically lossless," not "within noise tolerance" — the reconstructed tensors pass strict bitwise equality against the originals, including for NaN and infinity values.
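The distinction between value equality and bitwise equality can be made concrete. The sketch below is illustrative only and uses the Python standard library; the helper names `values_equal` and `bits_equal` are hypothetical and not part of the DMX API. It shows why a value comparison is the wrong check for a bit-exact claim: NaN never compares equal to itself, and +0.0 compares equal to -0.0 even though the stored bits differ.

```python
import struct

def values_equal(buf_a: bytes, buf_b: bytes) -> bool:
    """Element-wise float32 comparison; under IEEE 754, NaN never equals NaN."""
    if len(buf_a) != len(buf_b):
        return False
    n = len(buf_a) // 4
    a = struct.unpack(f"<{n}f", buf_a)
    b = struct.unpack(f"<{n}f", buf_b)
    return all(x == y for x, y in zip(a, b))

def bits_equal(buf_a: bytes, buf_b: bytes) -> bool:
    """Strict comparison of the stored bit patterns."""
    return buf_a == buf_b

# A small float32 buffer containing the awkward cases.
source = struct.pack("<4f", 1.5, -0.0, float("inf"), float("nan"))
restored = bytes(source)  # stand-in for a decompressed tensor buffer

pos_zero = struct.pack("<f", 0.0)
neg_zero = struct.pack("<f", -0.0)

print(values_equal(source, restored), bits_equal(source, restored))
print(values_equal(pos_zero, neg_zero), bits_equal(pos_zero, neg_zero))
```

The value comparison rejects an identical buffer because of the NaN and accepts two different zeros, while the byte comparison gets both cases right.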
Why this matters operationally: archival integrity. A model stored losslessly can be reproduced exactly years later. Training can resume from any checkpoint with bit-identical weights. Delta chains can extend indefinitely without error accumulation. Downstream consumers can verify they received what the source produced. None of this works if the storage foundation is lossy.
Measured compression
| Source format | Typical savings | Status |
|---|---|---|
| FP32 weights | ~16% | Measured on GPT-2 124M (byte transposition + zstd, bit-exact verified) |
| FP16 weights | ~24% | Measured on Qwen 1.5B (byte transposition + zstd, bit-exact verified) |
| BF16 weights | Measurement pending | — |
The FP16 figure reflects a real phenomenon: the compressibility of weights depends on how they were trained and downcast, not just on their stored format. Weights trained natively in FP16, or carefully downcast from BF16, compress differently from weights stored at reduced precision after mixed-precision training, so savings on other FP16 models may differ from the ~24% measured here.
Why the numbers look modest
Neural network weights are more compressible at the top of their bit representation (where structure lives) than at the bottom (where noise lives). Lossless compression can only exploit the structured part. The remaining bits are information-theoretically incompressible without losing precision.
The savings DMX achieves are at the ceiling of what's possible without giving up bit-exactness, not at the ceiling of what aggressive lossy compression could reach. This is by design — the storage tier is the archival foundation, and it has to be exact.
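The table above names byte transposition plus zstd as the measured method. The following is a minimal sketch of that general idea, not DMX's encoder: it uses `zlib` from the standard library as a stand-in for zstd, synthetic Gaussian values as a stand-in for real weights, and hypothetical helper names. Grouping the same byte position of every element together puts the structured sign and exponent bytes in one run and the noisy low mantissa bytes in another.

```python
import random
import struct
import zlib

def transpose_bytes(raw: bytes, width: int) -> bytes:
    """Group byte 0 of every element, then byte 1, and so on (byte planes)."""
    return b"".join(raw[i::width] for i in range(width))

def untranspose_bytes(planes: bytes, width: int) -> bytes:
    """Invert transpose_bytes by interleaving the planes back into elements."""
    n = len(planes) // width
    out = bytearray(len(planes))
    for i in range(width):
        out[i::width] = planes[i * n:(i + 1) * n]
    return bytes(out)

# Synthetic float32 "weights" (assumption: small zero-centred values).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(20000)]
raw = struct.pack(f"<{len(weights)}f", *weights)

plain = zlib.compress(raw, 9)
planes = zlib.compress(transpose_bytes(raw, 4), 9)
restored = untranspose_bytes(zlib.decompress(planes), 4)

print(len(raw), len(plain), len(planes), restored == raw)
```

The round trip is exact by construction. How much the transposed stream gains over the plain one depends on the data and the entropy coder, so the sizes printed here say nothing about DMX's measured ratios.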
For a deeper explanation of why these numbers are what they are, see the technical documentation (coming soon).
Deltas: efficient updates across versions
DMX supports lossless deltas between related DMX files. Given two DMX files representing related states of the same model — two training checkpoints, a base and a fine-tune, a model and one of its forks — DMX can compute a delta that reconstructs the target from the base exactly, with no floating-point error accumulation regardless of chain length or tree depth.
Why this matters operationally: this is the delta idea behind git and rsync, applied to model weights. Training runs store checkpoint chains instead of duplicated full files. Model registries store base-plus-deltas instead of full copies per variant. Users download only the changes when updating. It is a workflow pattern familiar from every other artifact class, now available for ML.
Deltas apply to two distinct use cases: training chains (time-ordered checkpoints during a training run) and fork/variant families (tree-ordered derivatives of a common base). The underlying mechanism is the same; the economics and workflows are different.
Training chains
A training run that checkpoints every N steps produces a time-ordered sequence of closely related model states. DMX stores the first checkpoint in full and each subsequent checkpoint as a delta from the previous one. Reconstructing any specific checkpoint is bit-exact — the same weights the training loop saved.
Validated measurements:
| Scenario | Source | Compression per delta |
|---|---|---|
| Training checkpoints (10-1000 steps apart) | GPT-2 FP32 | 35-43%, distance-dependent |
| Training checkpoints | FP16 sources | Measurement pending |
| Training checkpoints | BF16 sources | Measurement pending |
A finding worth noting: closer checkpoints compress better than distant ones. A 10-step delta compresses at 43%; a 1000-step delta at 35%. Frequent checkpointing — which is also safer and more debuggable — costs less per checkpoint under DMX than under full-file storage. The format rewards good checkpointing discipline rather than penalizing it.
Chain correctness: Delta reconstruction uses integer arithmetic with exact round-tripping. Applying the deltas in sequence reproduces every intermediate checkpoint exactly, so there is no error accumulation across chain length. A model reconstructed from base + delta_1 + delta_2 + ... + delta_N is bit-identical to the original checkpoint at position N, for any N.
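To see why integer deltas cannot accumulate error, here is a small sketch under one assumption: each float32 is reinterpreted as a 32-bit unsigned integer and the delta is a modular difference of those integers. The README does not specify DMX's actual delta encoding, so the scheme and the function names below are illustrative only.

```python
import struct

MASK = 0xFFFFFFFF  # float32 bit patterns treated as 32-bit unsigned integers

def to_bits(values):
    """Reinterpret float32 values as unsigned 32-bit integers."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return list(struct.unpack(f"<{len(values)}I", raw))

def make_delta(base_bits, target_bits):
    """Modular integer difference; exact for every bit pattern, NaN included."""
    return [(t - b) & MASK for b, t in zip(base_bits, target_bits)]

def apply_delta(base_bits, delta):
    """Exact inverse of make_delta."""
    return [(b + d) & MASK for b, d in zip(base_bits, delta)]

# Five toy "checkpoints" that drift a little at each step.
checkpoints = [
    to_bits([0.10 + 0.001 * step, -0.25 - 0.002 * step, float("nan")])
    for step in range(5)
]
deltas = [make_delta(a, b) for a, b in zip(checkpoints, checkpoints[1:])]

# Rebuild the whole chain from the first checkpoint plus the deltas.
state = checkpoints[0]
rebuilt = [state]
for delta in deltas:
    state = apply_delta(state, delta)
    rebuilt.append(state)

print(rebuilt == checkpoints)
```

Because addition and subtraction modulo 2^32 are exact inverses, the reconstruction at step N matches the original regardless of N. A floating-point subtraction of weight values would not give that guarantee.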
Chain self-calibration: When a checkpoint has drifted far enough from its anchor that the delta exceeds a size threshold, DMX drops a fresh anchor and the chain continues from there. No manual tuning required.
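The anchoring rule can be pictured as a simple size check. The sketch below is a guess at the shape of such a policy, not DMX's implementation: the function name, the byte sizes, and the `max_ratio` threshold of 0.8 are all made up for illustration.

```python
def plan_chain(full_size, delta_sizes, max_ratio=0.8):
    """Decide, per checkpoint, whether to store a delta or a fresh anchor.

    delta_sizes[i] is the encoded size of checkpoint i's delta, or None
    for the first checkpoint, which has nothing to diff against.
    """
    plan = []
    for size in delta_sizes:
        if size is None or size > max_ratio * full_size:
            plan.append("anchor")  # delta too large to be worth keeping
        else:
            plan.append("delta")
    return plan

# Hypothetical sizes in MB for a model whose full file is 1000 MB.
plan = plan_chain(1000, [None, 570, 600, 650, 900, 580])
print(plan)
```

With these made-up numbers the fifth checkpoint exceeds the threshold and becomes a new anchor, and the chain continues with deltas after it.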
Fork and variant families
A base model plus N variants usually means N full downloads. Under DMX, it means one download plus N small deltas.
Models rarely exist alone. A base model typically has many derived variants — instruction-tuned forks, domain-specialized fine-tunes, LoRA merges, community variants, quantized exports. Under conventional file formats, each variant costs a full model's worth of storage and bandwidth, even though most bits are shared with the base.
DMX delta compression applies directly to this case. A distributor or power user can store the base model once and maintain each variant as a small delta from that base. A user who already has the base and wants to try a new variant downloads only the delta and reconstructs the variant locally.
This changes the economics of model family distribution:
- For users: Trying multiple variants of a model family becomes cheap. The first variant costs a full base download; each subsequent variant costs only its delta.
- For distributors and model hubs: Storage and bandwidth costs for a model family scale with the number of distinct changes across the family, not with the number of variants. A 50-variant family of an 8B model no longer requires 50× the base model's storage per mirror.
- For teams: Maintaining many internal fine-tunes, customer-specific forks, or A/B candidates becomes a per-delta cost rather than a per-full-model cost.
Reconstructed variants are bit-identical to the original variant files. There is no quality difference between a variant downloaded as a full file and the same variant reconstructed from base plus delta.
Compression depends on how the variant was produced:
| Variant type | Expected delta size |
|---|---|
| LoRA merge into base | Small — LoRA changes are localized and sparse |
| Light fine-tune (few hundred steps) | Small-to-moderate |
| Full fine-tune (many epochs on new domain) | Moderate — most layers touched, changes stay structured |
| Format conversion (e.g. FP16→BF16 of same weights) | Near-zero when source bits align |
Measurements across common variant types are in progress.
Inference: runtime derivation — COMING SOON
Note: The inference runtime (dmx-vram) is not yet publicly released. The measurements below are preliminary. This section describes the target architecture; the shipped CLI currently supports lossless storage and deltas only.
DMX's stored file is lossless — bit-exact with the source weights. At load time, DMX works down a cascade, starting from full source fidelity and stepping to a compressed representation only when hardware constraints require it. Users don't pick precision modes; the loader walks the cascade and picks the point that fits.
Why this matters operationally: one canonical artifact serves heterogeneous hardware. A 12 GB laptop, a 24 GB workstation, an 80 GB server GPU, and a 32 GB edge device can all run the same DMX file, each picking the representation that fits its constraints. No per-hardware variants to maintain, no separate pre-exports for every deployment target. The same file that went into archival distribution runs directly on whatever infrastructure needs it.
The loading cascade

    Lossless DMX file on disk (FP32 or FP16 source, bit-exact)
                          │
                          ▼
    ┌────────────────────────────────────────────┐
    │ Fit at full precision in VRAM, with room   │
    │ for expected KV cache?                     │
    └────────────────────────────────────────────┘
           │ yes                     │ no
           ▼                         ▼
    ┌─────────────┐   ┌──────────────────────────────┐
    │ Load at     │   │ Derive M=7 compressed form.  │
    │ full        │   │ Fit in VRAM now?             │
    │ precision   │   └──────────────────────────────┘
    └─────────────┘          │ yes              │ no
    bit-exact with source    ▼                  ▼
                      ┌─────────────┐  ┌─────────────────┐
                      │ Compressed  │  │ Compressed      │
                      │ residency   │  │ residency       │
                      │ (M=7)       │  │ + weight pager  │
                      └─────────────┘  └─────────────────┘
                      weights as BFP   some weights paged
                      in VRAM          from RAM or disk

The cascade starts at full source fidelity and steps to a compressed representation only when hardware can't hold the full precision form. Advanced users can override the default (force compressed residency for bandwidth-bound speed gains, pin a specific mode, or use DMX files with a custom loader that ignores the cascade entirely).
The lossless file on disk is the same regardless of which runtime representation a given machine selects. The cascade happens per-machine at load time, using the same file.
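The cascade described above amounts to a short decision function. The sketch below is illustrative only: the function and mode names are hypothetical, the weight sizes are borrowed from the Llama 3.1 8B peak-VRAM figures in this document, the 2 GB KV cache allowance is made up, and it assumes the second check also reserves room for the KV cache, which the diagram does not state.

```python
def choose_mode(vram_gb, full_gb, compressed_gb, kv_cache_gb):
    """Walk the cascade from full precision down; return the first mode that fits."""
    if full_gb + kv_cache_gb <= vram_gb:
        return "full_precision"
    if compressed_gb + kv_cache_gb <= vram_gb:
        return "compressed_residency"
    return "compressed_residency_with_paging"

# Same file, three hypothetical machines.
for vram in (24.0, 12.0, 8.0):
    mode = choose_mode(vram, full_gb=15.0, compressed_gb=8.9, kv_cache_gb=2.0)
    print(f"{vram:>5.1f} GB -> {mode}")
```

The point of the design is that the inputs to this function come from the machine, not from the file, so one stored artifact can yield a different answer on each host.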
Measured fallback quality
When the cascade reaches M=7 compressed residency, the quality cost of the step-down has been measured across multiple architectures. Numbers below are from selective-roundtrip evaluation that matches the production dmx-vram loader's skip_compression pattern (embeddings, lm_head, and normalization layers kept at FP16; linear layers compressed to M=7 BFP).
Methodology: wikitext-2 full test split (288K+ tokens), sliding window PPL with max_len=1024 and stride=512, per-token NLL averaged, PPL = exp(mean NLL). All models loaded as explicit FP16. Same methodology applied to every row.
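For readers reproducing the methodology, the window schedule and the final formula can be sketched without a model. This follows the common sliding-window recipe in which each window feeds up to max_len tokens but only scores the tokens not already scored by the previous window; the README does not say how overlapping tokens were handled, so that detail and the function names are assumptions.

```python
import math

def windows(n_tokens, max_len=1024, stride=512):
    """Yield (begin, end, n_scored): tokens [begin, end) are fed to the model,
    and only the last n_scored of them contribute to the NLL sum."""
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + max_len, n_tokens)
        yield begin, end, end - prev_end
        prev_end = end
        if end == n_tokens:
            break

def perplexity(token_nlls):
    """PPL = exp(mean per-token negative log-likelihood)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

spans = list(windows(2500))
for span in spans:
    print(span)
print(sum(n for _, _, n in spans))
```

Each token is scored exactly once, and every scored token after the first window has at least 512 tokens of context, which is what keeps the metric comparable across rows.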
| Model | Parameters | Architecture | M=7 PPL delta | BFP compression ratio |
|---|---|---|---|---|
| GPT-2 | 124M | GPT-2 | −0.19% | 53.8% |
| GPT-2 Medium | 355M | GPT-2 | −0.52% | 53.8% |
| OLMo | 1B | OLMo | +0.76% | ~53% |
| Pythia | 1.4B | NeoX | +0.77% | ~53% |
| Qwen 2.5 | 1.5B | Qwen | +0.60% | ~53% |
| Phi-2 | 2.7B | Phi | +1.37% | ~53% |
| Qwen 2.5 | 3B | Qwen | +0.58% | ~53% |
| Mistral | 7B | Mistral | +0.16% | ~53% |
| Llama 3.1 | 8B | Llama | +0.29% | 53.2% |
Across the 9 models tested so far (7 architecture families), the M=7 selective-roundtrip delta stays under 1.5% on wikitext-2, with 8 of 9 models under 0.8%. The delta varies more by architecture than by size: within the Qwen family it is similar at 1.5B and 3B (+0.60% and +0.58%). Mistral 7B and Llama 3.1 8B, both standard modern transformer architectures, anchor the larger-model range at +0.16% and +0.29%. Phi-2 shows a larger delta (+1.37%) than other models in its size range; Phi's architectural choices around rotary embedding and layer structure appear to make it less tolerant of aggressive quantization. Users deploying Phi-family models or architectures not included in this measurement set (Mamba, MoE, encoder-only, multimodal, vision transformers, etc.) should verify PPL on their specific model and task.
Negative deltas on GPT-2 and GPT-2 Medium are within measurement noise — selective roundtrip at these sizes is statistically indistinguishable from FP16 inference.
What M=7 represents on DMX's curve
M=7 is not DMX's only operating point — it's the fallback point the cascade defaults to when the lossless source can't fit at FP16. M is a tunable parameter: the BFP mantissa bits preserved per block of 32 values. Lower M values produce smaller files and more aggressive VRAM savings, at increasing quality cost. M=7 is the point where the step-down from FP16 stays within the noise band of the standard FP32→FP16 conversion that production inference already performs.
Settings below M=7 (such as M=6) have not yet been characterized for quality. They would extend DMX's curve toward more aggressive savings, at quality costs appropriate to their precision level. Users who need more aggressive savings than M=7 can re-derive at a lower M; the lossless file remains the source of truth.
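A toy block floating point round trip shows what the M parameter controls. This is a generic BFP sketch, not DMX's encoder: it assumes one shared power-of-two exponent per block of 32 values, a sign plus M magnitude bits per value, and round-to-nearest. It does not handle NaN or infinity, and the function name is hypothetical.

```python
import math

BLOCK = 32  # values per block, as in the definition of M above

def bfp_roundtrip(values, m_bits=7):
    """Quantize each block to a shared exponent and m_bits of magnitude, then decode."""
    out = []
    q_max = (1 << m_bits) - 1
    for start in range(0, len(values), BLOCK):
        block = values[start:start + BLOCK]
        peak = max(abs(v) for v in block)
        if peak == 0.0:
            out.extend(0.0 for _ in block)
            continue
        _, exp = math.frexp(peak)             # peak = frac * 2**exp, 0.5 <= frac < 1
        step = math.ldexp(1.0, exp - m_bits)  # quantization step for this block
        for v in block:
            q = max(-q_max, min(q_max, round(v / step)))
            out.append(q * step)
    return out

values = [0.05 * math.sin(i) for i in range(64)]
approx = bfp_roundtrip(values, m_bits=7)
worst = max(abs(a - b) for a, b in zip(values, approx))
print(worst)
```

Every value in a block shares the step set by the block's largest magnitude, so lowering M by one bit doubles that step. This is why the quality cost rises as M falls.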
VRAM characterization (Llama 3.1 8B)
| Runtime mode | Peak VRAM | Quality vs FP16 |
|---|---|---|
| Inflate to FP16 | 15.0 GB | Bit-exact with source |
| Compressed residency (M=7) | 8.9 GB (~40% reduction) | +0.29% PPL delta |
| Compressed + paging | ~3-5 GB | +0.29% PPL delta, slower inference |
Beyond the default cascade
The cascade above describes DMX's default behavior. Underneath it, DMX exposes a composable set of runtime components: lossless representations (FP32/FP16, INT16 via bitcast), lossy compressed residency (M=7 default, M=6 more aggressive), pre-exported quantization formats (INT8, NF4, FP8 — see Export section), and a weight pager for VRAM that can't hold the chosen representation.
Power users can stack these differently than the default. Some examples:
- Lossless INT16 with paging. Weights stay bit-exact with source; when VRAM runs out, paging handles the overflow. No quality tradeoff at all.
- M=6 compressed residency. More aggressive compression than the default, accepting a measurable quality cost in exchange for additional VRAM headroom.
- Pre-exported INT8 or NF4. Generate a quantized file once, load it directly. Skips runtime derivation but locks in that format.
- Storage-only use. Use DMX purely for lossless archival and delta compression, loading into your own inference pipeline with no runtime derivation involved.
The tradeoff is between speed and how much model fits per unit of VRAM. Full precision gives speed at the nominal VRAM cost; heavier compression fits more per GB at some quality or latency cost. Deriving a non-default representation takes one-time compute; once derived, the result is cached and can be paged or switched between modes without re-derivation.
Detailed configuration is documented separately in technical design notes.
Export to other formats — COMING SOON
DMX can derive and export weights in other formats — INT8, NF4, FP8, and additional quantization schemes — from the lossless source. This is useful when a deployment target expects a specific format natively, or when a user wants to pre-materialize a derivation rather than computing it at load time.
Export produces a new file in the target format. The source DMX file remains unchanged. Each export is derived from the lossless source rather than from a previous export, so exports are independent of each other: exporting INT8 and then exporting NF4 yields the same NF4 file as exporting NF4 alone.
Export paths are implemented but not yet fully verified. Detailed documentation will follow once behavior on all source file types is confirmed.
Beyond neural network weights
The structural properties DMX exploits — byte-plane decomposition, exponent clustering, exact integer deltas — are not unique to neural network weights. They apply to any dense floating-point data with similar statistical structure.
3D Gaussian Splats
3DGS rendering is structurally the same pattern as model inference: a lossless source is stored at rest, and at runtime DMX derives a representation optimized for the specific downstream consumer. For model inference, the consumer is a transformer and the derivation targets hardware constraints (VRAM, bandwidth, supported precision formats). For 3DGS, the consumer is a browser renderer and the derivation targets rendering speed and streaming bandwidth, with quality held below the perceptual threshold.
The same DMX file can be the lossless source for either use. The derivation happens at load time, tuned to what the consumer needs.
Two workflows are supported:
- Lossless source with runtime derivation (recommended). Store the 3DGS scene as a lossless DMX file. At load time, the decoder derives a streaming-quality representation for rendering. The source remains available for archival, re-derivation at different quality targets, or any future downstream use. Provenance records the file as lossless.
- Direct lossy export (available when archival fidelity is not required). When file size is the priority and the 3DGS scene will only ever be consumed by a renderer at perceptual quality, DMX can compress directly to a lossy representation at rest. This is opt-in, not the default. The provenance manifest records that the file is lossy, so downstream consumers know they are not receiving an archival source.
Measured rendering quality on shipped lossy path: The lossy compression path (BFP with FLAC entropy coding) produces ~63% bandwidth reduction on a typical 3DGS scene (the bonsai scene, ~265 MB → 97.2 MB), with rendering quality at PSNR 48.50 dB mean and SSIM 0.9997 across 100 viewpoints. These numbers are well above the perceptual threshold for visible difference, so the rendered scene is indistinguishable from the source in practice. Lossless compression numbers under the current main encoder are pending measurement.
dmx-web is the companion Rust/WASM browser decoder (github.com/willjriley/dmx-web). It reconstructs DMX-compressed 3DGS data directly in the browser and includes a real-time Gaussian splatting viewer rendered via WebGL2. The same compressed file sitting on a server decodes and renders in a browser tab — no server-side processing, no intermediate format conversions, and no Python runtime required.
Live demo: huggingface.co/spaces/Senat1/dmx-3dgs-viewer
The decoder has been validated against 2,436 tensor roundtrips across all encoding paths and entropy codecs (zstd, brotli, FLAC).
Provenance: supply-chain visibility
Every DMX file carries an embedded provenance manifest that records what it is, where it came from, and what operations produced it. The manifest is part of the file itself, not a sidecar — it travels with the data and cannot be lost or replaced independently.
Why this matters operationally: supply-chain visibility for ML artifacts. The same pattern that SBOMs (Software Bill of Materials) provide for application builds and that signed commits provide for code, applied to neural network weights. A consumer receiving a DMX file can verify its source, trace its lineage, and detect whether lossy operations appear anywhere in its history. This matters for regulated deployment contexts, for distribution integrity, and for debugging when a model variant behaves unexpectedly in production.
The manifest provides three concrete capabilities:
Identity. Each file carries a hash of its source weights and a hash of its own compressed data. A user receiving a DMX file can verify that it came from the source it claims to come from and that its contents haven't been altered since creation.
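A content-hash check of this kind can be sketched in a few lines. The `sha256:` prefix and the `content_hash` field name match the `dmx inspect` output shown in Quick start; which bytes the hash covers, and the helper names, are assumptions made for illustration.

```python
import hashlib

def sha256_tag(data: bytes) -> str:
    """Format a digest the way the manifest displays it: 'sha256:<hex>'."""
    return "sha256:" + hashlib.sha256(data).hexdigest()

def verify_content(manifest: dict, payload: bytes) -> bool:
    """True when the payload still matches the hash recorded at creation."""
    return manifest.get("content_hash") == sha256_tag(payload)

payload = b"compressed tensor bytes"  # stand-in for the file's compressed data
manifest = {"content_hash": sha256_tag(payload)}

print(verify_content(manifest, payload))
print(verify_content(manifest, payload + b"\x00"))
```

Any change to the payload, even a single appended byte, produces a different digest, so the second check fails.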
Lineage. Every file records its parent (if derived), its delta base (if it's a delta), its lineage depth, and the hash of the original root source. A user can trace any DMX file back to the original checkpoint it descended from, through any number of intermediate operations.
Warnings at boundaries. When an operation produces a derivative file whose quality is bounded by an earlier lossy step — for example, a delta computed against a lossy base — the resulting file carries a warning in its manifest. Downstream tooling can detect these warnings and decide how to handle them (proceed with caveat, refuse to use as a lossless source, require explicit user confirmation). The format reports the state; consumers enforce policy.
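Lineage and warning propagation can be pictured as a rule applied whenever a file is derived from another. The sketch below is a simplified model, not the DMX schema: `lineage_depth`, `root_hash`, `export_warning`, and `content_hash` appear in the `dmx inspect` output, while the `parent` key, the warning text, and the function name are hypothetical.

```python
def derive_manifest(parent: dict, content_hash: str, lossy_step: bool = False) -> dict:
    """Build a child manifest: depth grows by one, the root is inherited,
    and a lossy warning, once set, is carried to every descendant."""
    warning = parent.get("export_warning")
    if lossy_step and warning is None:
        warning = "lossy step in lineage"
    return {
        "parent": parent["content_hash"],
        "root_hash": parent["root_hash"],
        "lineage_depth": parent["lineage_depth"] + 1,
        "export_warning": warning,
        "content_hash": content_hash,
    }

root = {
    "root_hash": "sha256:aaa",
    "lineage_depth": 0,
    "export_warning": None,
    "content_hash": "sha256:aaa",
}
lossy = derive_manifest(root, "sha256:bbb", lossy_step=True)
delta_on_lossy = derive_manifest(lossy, "sha256:ccc")
clean = derive_manifest(root, "sha256:ddd")

print(delta_on_lossy["lineage_depth"], delta_on_lossy["export_warning"])
print(clean["lineage_depth"], clean["export_warning"])
```

The delta computed against the lossy file inherits the warning even though the delta step itself was lossless, which is the boundary case the paragraph above describes.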
The manifest aligns with real standards where they exist: field conventions draw from the EU AI Act Article 13 transparency requirements, the NIST AI Risk Management Framework, and the C2PA content provenance standard. This is deliberate — DMX is designed to work in regulated and distribution-integrity-sensitive contexts, not only in research settings.
What Phase 1 delivers
The initial implementation focuses on the trust-critical subset:
- Source identity (source hash, content hash)
- Lineage (parent, delta base, lineage depth, root hash)
- Lossy-source warnings (automatic when an operation produces output bounded by an earlier lossy step)
- Integrity (cryptographic hash of the compressed data)
- Creation metadata (timestamp, model architecture, parameter count, source format)
What's coming later
Training-pipeline integration (checkpoint step, epoch, training config hash), cryptographic signing (C2PA-compatible detached signatures), regulatory metadata fields (license, author, intended use, known limitations), and merge-tracking for weight-averaged models are defined in the manifest schema and will ship in subsequent phases.
The full manifest schema is documented in DESIGN.md.
Where DMX fits in ML DevOps
DMX maps onto the stages that existing ML infrastructure already has. It doesn't ask operators to adopt new workflows — it improves the economics of stages they're already running.
Training and checkpointing
Training pipelines already produce checkpoints at intervals. DMX replaces bulk full-file checkpoint storage with lossless base-plus-deltas, with ~35-43% per-delta compression measured on GPT-2 FP32 training chains. Frequent checkpointing becomes affordable; checkpoint rollback is bit-exact reconstruction from any point in the chain. Phase 1 provenance records lineage to the root source; checkpoint step and training configuration hash are defined in the manifest schema and planned for a later phase.
Model registry and distribution
Model registries currently store full files per variant. DMX stores the base once and maintains variants as small deltas, scaling registry storage with the number of distinct changes across a model family rather than the number of variants. A registry with 50 variants of an 8B model stops requiring 50× the base model's storage per mirror. Distribution bandwidth drops accordingly — users pull the base once, variants arrive as small delta files.
Serving and deployment — COMING SOON
Note: Compressed residency requires the dmx-vram runtime, which is validated internally but not yet publicly released. The numbers below are from controlled testing.
Production inference infrastructure has a VRAM budget per GPU and wants to maximize throughput per dollar. DMX compressed residency (measured +0.29% PPL delta on Llama 3.1 8B) reduces weight VRAM by ~40%, freeing that capacity for larger batch sizes, longer contexts, multi-model tenancy, or deployment on smaller hardware tiers. One DMX file serves all of these use cases; the runtime picks the appropriate representation per machine.
Governance and compliance
Regulated deployment contexts (EU AI Act, NIST AI RMF, industry-specific compliance) require supply-chain documentation for model artifacts. DMX's embedded provenance manifests are designed to align with these standards: source identity, lineage tracing, lossy-operation warnings, integrity verification. The same infrastructure that provides DevOps convenience also supplies evidence for audits.
Edge and on-device inference — COMING SOON
Note: Requires the dmx-vram runtime. Not yet publicly released.
Mobile and embedded devices have sharply constrained memory. At DMX M=7 compressed residency, models that don't fit natively (3B-8B class on 8-12 GB mobile devices) become candidates. The same file that ships to server clusters ships to on-device deployment; the cascade picks compressed residency automatically on memory-constrained hardware. (Mobile runtime kernels are roadmap; current runtime is CUDA-based.)
When DMX is less interesting
DMX is less interesting if you only deploy a single model to a single hardware target and don't care about checkpoint history, distribution topology, or lineage verification. Conventional tools work fine for that case. DMX's value scales with the complexity of the artifact lifecycle you're managing.
Roadmap
These are directional ideas under consideration, not committed features. They describe where the format could go if the foundations hold up.
DMX-Server
A server component that holds DMX files (base models, variants, checkpoint chains) and serves derived artifacts on demand. Local or cloud-hosted.
A client asks for "this base model at this precision for this hardware," and the server derives the requested representation on the fly from the lossless source it holds, then streams the result. The same derivation mechanism that enables local load-time auto-selection (see Inference section) extends naturally to server-side on-demand export — the local DMX loader and DMX-Server are running the same transformation, just in different locations.
This inverts the conventional distribution model. Rather than pre-exporting every possible variant for every possible hardware target and storing them all, the server stores only the lossless canonical source plus any relevant deltas, and produces variants as needed. Popular derivations can be cached; cold derivations are cheap to recompute on demand.
DMX-REPO (or HuggingFace extension)
Version-control-like workflow for model families. Base models as commits, variants as branches, deltas as the diffs. Whether this lives as a standalone repository format or as an extension layered onto existing model hubs (HuggingFace in particular) is under consideration — the extension path is likely the more pragmatic route because it meets users where they already are.
Other directions under exploration
- Additional data modalities beyond neural network weights and 3D Gaussian Splats — any dense floating-point data with exploitable structure is a candidate.
- Provenance manifest expansion beyond Phase 1 — cryptographic signing (C2PA-compatible), training pipeline integration (checkpoint step, epoch, training config hash), regulatory metadata fields, and merge tracking for weight-averaged models. Schema defined; implementation staged.
- Format conversion utilities that make DMX a transit format between existing neural network file formats without requiring full re-export.
Status
DMX is in active development. The format, tooling, and documentation are evolving.
Currently shipped:
- Lossless storage on FP32, FP16, and BF16 sources
- Lossless storage as the default CLI behavior
- Lossless delta compression on FP32 training checkpoints via the CLI
- .dmx files accepted directly as delta inputs (base or target, any combination)
- Provenance manifest Phase 1 embedded in every new .dmx and .dmxd file (source identity, lineage, lossy-source warnings, integrity)
- Lineage chain reference through delta operations (content_hash-based, not file-SHA-based, when inputs are .dmx)
- dmx inspect command for manifest display and content-hash verification
- Legacy lossy delta path preserved as --lossy-quantized opt-in
- 3DGS streaming at ~60%+ bandwidth reduction with perceptual quality (PSNR 48-53 dB, SSIM 0.9997+); dmx-web Rust/WASM browser decoder with real-time viewer; decoder validated against 2,436 tensor roundtrips
Validated internally (not yet publicly released):
- Auto-selected inference mode on Llama-class models (requires dmx-vram runtime)
- M=7 compressed residency with measured PPL across 9 models (7 architecture families)
In validation:
- Lossless delta compression on FP16 and BF16 training checkpoints
- Fork and variant delta scenarios (LoRA merges, fine-tunes, format conversions)
- Additional model architectures for inference quality measurement
Planned (Phase 2 and beyond):
- Provenance manifest expansion: cryptographic signing, training pipeline integration, regulatory fields, merge tracking
- Published benchmark corpus and reproducibility scripts
- Expanded auto-selection heuristics for emerging hardware
Technical details
For the underlying mechanics — how storage compression works, how deltas are computed, how the inference auto-selection decides, and what the precision and correctness guarantees formally are — see the technical documentation (coming soon).
For measurement methodology and reproducibility — how the numbers in this document were produced, which benchmarks were used, and how to reproduce them — see the benchmark documentation (coming soon).
Quick start
Install

    pip install dmx-compress

Compress a model (lossless)

    dmx compress model.safetensors model.dmx

    Original: 474.7 MB (GPT-2 124M, FP32)
    Compressed: 397.6 MB (16.2% savings)
    Result: LOSSLESS — all 148 tensors exactly match

Decompress (bit-exact reconstruction)

    dmx decompress model.dmx restored.safetensors

Verify roundtrip

    dmx verify model.safetensors model.dmx

    Result: LOSSLESS - all tensors exactly match
    Overall verdict: PASS

Delta compression (training checkpoints)

    dmx delta-compress base.safetensors checkpoint.safetensors delta.dmxd
    dmx delta-reconstruct base.safetensors delta.dmxd restored.safetensors

Inspect provenance

    dmx inspect model.dmx

    {
      "dmx_version": "1.0",
      "source_format": "safetensors",
      "source_hash": "sha256:ae60e8b7...",
      "created": "2026-04-19T21:58:58Z",
      "param_count": 124439808,
      "lineage_depth": 0,
      "root_hash": "sha256:ae60e8b7...",
      "export_warning": null,
      "content_hash": "sha256:3107d014..."
    }

    dmx inspect model.dmx --verify

Download pre-compressed models
| Model | Original | DMX | Savings | Verified |
|---|---|---|---|---|
| GPT-2 124M FP32 | 474.7 MB | 397.6 MB | 16.2% | Bit-exact roundtrip |
License and patent
Code: MIT License — free to use, modify, and distribute.
Methods: Patent Pending (U.S. Provisional Applications filed April 2026). The patented methods cover aligned cross-layer quantization for neural network weight compression and stream-separated block floating point encoding with independent entropy coding. Personal, academic, and open-source use is unrestricted. Commercial use of the patented methods may require a license from the inventor — contact bill.riley@gmail.com.
Citation
@software{riley2026dmx,
author = {Riley, William J},
title = {DMX: Delta Multiplexed Model Format},
year = {2026},
url = {https://github.com/willjriley/dmx}
}
End of README.