Vortex-Codec: neural lossless byte-level codec
Vortex-Codec is a Python library for neural lossless compression using compressive transformers + arithmetic coding. Use it as a package in your projects or via the provided CLI tools.
Installation
```bash
# Install in editable/development mode
pip install -e .

# Or install runtime requirements only
pip install -r requirements.txt
```
Quick usage (library)
```python
import vortex
from vortex.models.optimized_transformer import OptimisedCompressiveTransformer

print(vortex.__version__)
model = OptimisedCompressiveTransformer()
```
Quick usage (CLI)
```bash
# Compress and decompress via installed console scripts
vortex-compress   --model PATH_TO_MODEL --input file.bin --output file.vxc      --config path/config.yaml
vortex-decompress --model PATH_TO_MODEL --input file.vxc --output recovered.bin --config path/config.yaml
```
Repository Layout
```
vortex-codec/
├── vortex/                              # core Python package
│   ├── models/
│   │   ├── __init__.py                  # re-exports all public symbols
│   │   ├── compressive_transformer.py   # base model (CompressiveTransformer)
│   │   └── optimized_transformer.py     # production model (OptimisedCompressiveTransformer)
│   ├── compression/
│   │   └── arithmetic_coding.py         # torchac encode/decode + BPD metric
│   ├── data/
│   │   └── dataset.py                   # make_loaders() for binary / HDF5 files
│   └── utils/
│       ├── training.py                  # LR schedule, checkpointing, EarlyStopping
│       └── zipnn.py                     # Huffman post-training weight compression
├── scripts/
│   ├── train.py                         # full training loop (CATWrapper, AMP, TensorBoard)
│   ├── compress.py                      # file → .vxc bitstream
│   ├── decompress.py                    # .vxc bitstream → file
│   ├── evaluate.py                      # BPD vs gzip / zlib / lzma baselines
│   └── compress_weights.py              # apply ZipNN compression to a checkpoint
├── experiments/
│   ├── atlas_experiment/                # ATLAS FTAG HDF5 -> .bin splits
│   ├── camel_experiment/                # CAMEL HDF5 -> raw + float32 .bin splits
│   ├── hepmc_experiment/                # ATLAS HEPMC tarballs -> .hepmc splits
│   ├── cms_experiment/                  # CMS NanoAOD ROOT -> padded float32 .bin
│   ├── cms_experiment_lg/               # original large-dataset CMS pipeline
│   └── alice_experiment/                # ALICE ROOT -> padded float32 .bin
├── configs/                             # hardware-specific base configs
│   ├── colab_t4.yaml
│   ├── rtx4070_8gb.yaml
│   ├── default.yaml
│   ├── rtx4090_24gb.yaml
│   └── amd_mi300x.yaml
├── tests/
│   └── test_basic.py
└── docs/
    ├── ARCHITECTURE_COMPARISON.md       # v1 vs v3 component-by-component diff
    └── HARDWARE_GUIDE.md
```
Architecture
Overview
Vortex-Codec is a byte-level autoregressive model: given a stream of bytes it predicts a probability distribution over the next byte, and uses arithmetic coding (torchac) to encode/decode the stream losslessly. Lower predicted cross-entropy = better compression.
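Concretely, for a byte stream x₁…x_N the model's cross-entropy in bits per byte is both the training objective and, up to a few bytes of coder overhead, the achievable compressed size:

```latex
\mathrm{BPD} = -\frac{1}{N}\sum_{t=1}^{N}\log_2 p_\theta\!\left(x_t \mid x_{<t}\right)
```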
The codebase contains two model variants, both in vortex/models/:
| Class | File | Use |
|---|---|---|
| `CompressiveTransformer` | `compressive_transformer.py` | Reference / lightweight |
| `OptimisedCompressiveTransformer` | `optimized_transformer.py` | Production (Flash Attention 2, KV cache, RMSNorm) |
| `CATWrapper` | `optimized_transformer.py` | Dynamic chunk scheduler wrapping either model |
compressive_transformer.py — Base Model
TDTEmbedding
Per-type embedding for IEEE-754 float32 byte streams.
Each of the 4 byte positions within a float32 (mantissa-low through sign/exponent-high) gets its own nn.Embedding(256, d_model) lookup table, since they have very different entropy profiles. An additional learnable type_scale vector (softmax-normalised) gates each table's contribution.
```
byte (0–255) ──► table[t % 4]  (one of 4 typed tables, scale-gated)
                      ↓
                 h (B, T, d_model)
```
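A minimal sketch of the idea, using a hypothetical `TypedByteEmbedding` class (the real `TDTEmbedding` lives in `compressive_transformer.py` and may differ in details):

```python
import torch
import torch.nn as nn

class TypedByteEmbedding(nn.Module):
    """Sketch: one embedding table per byte offset inside a float32, scale-gated."""
    def __init__(self, d_model: int = 512):
        super().__init__()
        self.tables = nn.ModuleList([nn.Embedding(256, d_model) for _ in range(4)])
        self.type_scale = nn.Parameter(torch.zeros(4))  # softmax-normalised gate

    def forward(self, byte_ids: torch.Tensor) -> torch.Tensor:
        # byte_ids: (B, T) integers in [0, 255]
        B, T = byte_ids.shape
        offset = torch.arange(T, device=byte_ids.device) % 4   # byte position within each float32
        scale = torch.softmax(self.type_scale, dim=0)          # (4,)
        h = torch.zeros(B, T, self.tables[0].embedding_dim, device=byte_ids.device)
        for t in range(4):
            mask = offset == t
            h[:, mask] = scale[t] * self.tables[t](byte_ids[:, mask])
        return h                                               # (B, T, d_model)
```

In the library itself, the typed embedding is selected on the model with `use_tdt=True`.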
LearnableTokenEviction (LTE)
Content-adaptive token selection replacing strided Conv1d downsampling.
A lightweight depthwise + pointwise scorer produces per-token importance scores; the top-k (where k = ceil(T / rate)) tokens are kept in original temporal order. A straight-through soft gate (sigmoid-weighted) keeps the operation end-to-end differentiable. A final Conv1d projection + LayerNorm produces the memory representation.
```
acts (B, T, D) ──► scorer ──► top-k ──► soft-gate ──► proj + norm ──► (B, k, D)
```
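A rough sketch of the selection step under the description above (class and attribute names are illustrative, and the straight-through estimator is reduced to a plain sigmoid gate here):

```python
import math
import torch
import torch.nn as nn

class TokenEvictionSketch(nn.Module):
    """Keep the k = ceil(T / rate) highest-scoring tokens, in temporal order."""
    def __init__(self, d_model: int, rate: int = 4):
        super().__init__()
        self.rate = rate
        self.scorer = nn.Sequential(
            nn.Conv1d(d_model, d_model, kernel_size=3, padding=1, groups=d_model),  # depthwise
            nn.Conv1d(d_model, 1, kernel_size=1),                                   # pointwise
        )
        self.proj = nn.Conv1d(d_model, d_model, kernel_size=1)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, acts: torch.Tensor) -> torch.Tensor:
        B, T, D = acts.shape
        k = math.ceil(T / self.rate)
        scores = self.scorer(acts.transpose(1, 2)).squeeze(1)        # (B, T) importance
        keep = scores.topk(k, dim=1).indices.sort(dim=1).values      # restore temporal order
        kept = acts.gather(1, keep.unsqueeze(-1).expand(-1, -1, D))  # (B, k, D)
        gate = torch.sigmoid(scores.gather(1, keep)).unsqueeze(-1)   # soft gate for differentiability
        out = self.proj((kept * gate).transpose(1, 2)).transpose(1, 2)
        return self.norm(out)                                        # (B, k, D)
```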
MemoryManager
Thin wrapper around LearnableTokenEviction. Provides a .compress(acts) method used by attention layers to build compressed memory from past activations.
CompressiveAttention
Multi-head attention with two-tier memory:
- Local stream: causal `scaled_dot_product_attention` over the current window (Q, K, V).
- Memory stream: cross-attention from current queries into the compressed past (`Km`, `Vm` from `MemoryManager`).
- Infini-β gating: a per-head learnable scalar `β = sigmoid(infini_beta)` mixes the two streams: `out = β·out_mem + (1−β)·out_local`. Initialised at 0 (β = 0.5, an even local/memory mix) so training starts stable; a gating sketch follows this list.
- Compressed memory is accumulated across chunks and capped at `window // 2` tokens (oldest dropped).
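A sketch of the gating step, assuming per-head attention outputs of shape (B, n_heads, T, d_head) and a learnable `infini_beta` of shape (n_heads,):

```python
import torch

def infini_gate(out_local: torch.Tensor, out_mem: torch.Tensor,
                infini_beta: torch.Tensor) -> torch.Tensor:
    """Blend memory and local attention outputs with a per-head gate β = sigmoid(infini_beta)."""
    beta = torch.sigmoid(infini_beta).view(1, -1, 1, 1)  # (1, n_heads, 1, 1)
    return beta * out_mem + (1.0 - beta) * out_local
```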
SwiGLU
Gated feed-forward block (Shazeer 2020). No bias, no dropout.
out = down( silu(gate(x)) * up(x) ) — two parallel projections to d_ff, one is SiLU-activated and used as a gate.
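A minimal reference implementation of that formula (attribute names are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """down(silu(gate(x)) * up(x)) with no bias and no dropout."""
    def __init__(self, d_model: int = 512, d_ff: int = 2048):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))
```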
TransformerBlock
LayerNorm → CompressiveAttention → residual → LayerNorm → SwiGLU → residual.
CompressiveTransformer
Full byte-level model:
- Embedding: standard `nn.Embedding` or `TDTEmbedding` (`use_tdt=True`)
- Sinusoidal `PositionalEncoding` (max 8192)
- Stack of `TransformerBlock` layers
- Final `LayerNorm` + linear projection to vocab logits
- Optional per-layer gradient checkpointing (`enable_gradient_checkpointing()`)
Default config: vocab_size=256, d_model=512, n_layers=8, n_heads=8, d_ff=2048, window=512, compression_rate=4.
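A hedged construction example, assuming the default-config names above map one-to-one onto constructor keyword arguments (check the class signature; the import path follows the repository layout):

```python
from vortex.models.compressive_transformer import CompressiveTransformer

# Keyword names mirror configs/default.yaml; they are assumed, not verified.
model = CompressiveTransformer(
    vocab_size=256, d_model=512, n_layers=8, n_heads=8,
    d_ff=2048, window=512, compression_rate=4,
)
```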
optimized_transformer.py — Production Model
All components from compressive_transformer.py are reused (imported directly). The optimised variant swaps or adds:
RMSNorm
Root-Mean-Square normalisation (no mean-centering). ~15 % faster than LayerNorm at the same quality.
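The standard RMSNorm formulation the description refers to, as a short sketch:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """x / RMS(x) * weight, with no mean-centering (unlike LayerNorm)."""
    def __init__(self, d_model: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(d_model))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * inv_rms * self.weight
```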
OptimisedCompressiveAttention
Extends CompressiveAttention with:
- Flash Attention 2 (`flash_attn_func`) for causal attention when CUDA is available; falls back to PyTorch `scaled_dot_product_attention` automatically.
- KV cache: concatenates previously seen `K`/`V` tensors for O(1)-per-step autoregressive inference. Returns `new_cache = {"k": K, "v": V}` each forward pass (see the cache sketch below).
- Infini-β init changed to `−3.0` (sigmoid ≈ 0.047) so training starts almost entirely local.
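A minimal sketch of how the cache described above can be maintained; the `{"k": K, "v": V}` layout comes from the list, while the tensor shapes and concat dimension are assumptions:

```python
from typing import Optional
import torch

def update_kv_cache(k_new: torch.Tensor, v_new: torch.Tensor,
                    cache: Optional[dict] = None) -> dict:
    """Append the current chunk's keys/values to the running cache.

    Assumes (B, n_heads, T, d_head) tensors, concatenated along the time axis.
    """
    if cache is not None:
        k_new = torch.cat([cache["k"], k_new], dim=2)
        v_new = torch.cat([cache["v"], v_new], dim=2)
    return {"k": k_new, "v": v_new}
```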
OptimisedBlock
RMSNorm → OptimisedCompressiveAttention → residual → RMSNorm → SwiGLU → residual.
Forward signature: (x, comp_mem, kv_cache) → (x, new_comp, new_cache).
OptimisedCompressiveTransformer
Drop-in replacement for CompressiveTransformer with all optimised components.
Extra method: vram_estimate_gb(batch_size, seq_len) — returns a dict with parameter, activation, optimizer-state, and total VRAM estimates in GB.
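A hedged usage sketch; the keyword arguments follow the method name above, and the exact dict keys are not documented here:

```python
from vortex.models.optimized_transformer import OptimisedCompressiveTransformer

model = OptimisedCompressiveTransformer()
est = model.vram_estimate_gb(batch_size=32, seq_len=512)
# Expected: a dict with parameter, activation, optimizer-state, and total estimates in GB.
print(est)
```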
CATWrapper
Dynamic chunk scheduler wrapping any model.
- Training: randomly samples a chunk size from `chunk_sizes=(128, 256, 512)` each forward pass, enabling multi-scale learning.
- Inference: defaults to the largest chunk size; override with the `chunk_size=` argument.
- Handles sequences longer than the chunk size by iterating and accumulating `memories` and `kv_caches` across chunks (detached between chunks to limit graph size).
- Transparent proxy: delegates `parameters()`, `named_parameters()`, `state_dict()`, `load_state_dict()`, `enable_gradient_checkpointing()`, and `vram_estimate_gb()` to the inner model, so checkpoints are portable without the wrapper (see the sketch after this list).
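A usage sketch, assuming the wrapper takes the inner model as its first constructor argument; the checkpoint portability shown relies on the `state_dict` delegation listed above:

```python
import torch
from vortex.models.optimized_transformer import (
    CATWrapper,
    OptimisedCompressiveTransformer,
)

# Assumption: CATWrapper(model, ...) wraps an already-constructed model.
model = CATWrapper(OptimisedCompressiveTransformer())

# state_dict()/load_state_dict() are delegated to the inner model,
# so the saved checkpoint loads into an unwrapped model as well.
torch.save(model.state_dict(), "checkpoint.pt")

bare = OptimisedCompressiveTransformer()
bare.load_state_dict(torch.load("checkpoint.pt"))
```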
vortex/compression/arithmetic_coding.py
Lossless arithmetic coding via torchac:
| Function | Description |
|---|---|
| `probs_to_cdf(probs)` | Converts model output probabilities to a cumulative CDF (with ε-smoothing) |
| `encode(probs, symbols)` | Encodes a (B, T) symbol tensor to bytes |
| `decode(bitstring, probs)` | Decodes bytes back to (B, T) int16 symbols |
| `theoretical_bpd(probs, symbols)` | Cross-entropy bits-per-byte (the training objective) |
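A hedged round-trip sketch using the names in the table; argument order, tensor shapes, and dtypes are assumptions inferred from the descriptions:

```python
import torch
from vortex.compression.arithmetic_coding import encode, decode, theoretical_bpd

probs = torch.softmax(torch.randn(1, 16, 256), dim=-1)   # model's next-byte distributions (B, T, 256)
symbols = torch.randint(0, 256, (1, 16))                  # the actual bytes (B, T)

bitstring = encode(probs, symbols)        # arithmetic-coded byte string
recovered = decode(bitstring, probs)      # decoding needs the same probabilities
assert torch.equal(recovered.long(), symbols)

print(theoretical_bpd(probs, symbols))    # cross-entropy in bits per byte
```

In real use the probabilities come from the autoregressive model, so the decoder regenerates them byte by byte as it decodes.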
vortex/utils/zipnn.py — Post-Training Weight Compression
Huffman-based lossless checkpoint size reduction (30–60 % smaller files).
Splits each float32 weight tensor into sign + exponent + mantissa bytes. Exponents and signs are Huffman-coded (low entropy); raw mantissa bytes are stored unmodified (near-random, high entropy). Decompression is exact.
```python
import torch
from vortex.utils.zipnn import compress_model_weights, decompress_model_weights

compressed = compress_model_weights(model)       # Huffman-code signs/exponents, keep mantissas raw
torch.save(compressed, "weights.zipnn.pt")

model2 = MyModel(...)                            # fresh instance with the same architecture
decompress_model_weights(model2, compressed)     # exact (lossless) restore
```
Hardware Configs
| File | GPU | VRAM | Params |
|---|---|---|---|
| `colab_t4.yaml` | T4 (Colab) | 15 GB | 3.2 M |
| `rtx4070_8gb.yaml` | RTX 4070 | 8 GB | 8.5 M |
| `default.yaml` | RTX 3090/80 | 12 GB | 14.8 M |
| `rtx4090_24gb.yaml` | RTX 4090 | 24 GB | 28 M |
| `amd_mi300x.yaml` | MI300X | 192 GB | 60 M+ |
Training Details
The training loop in `scripts/train.py` uses `OptimisedCompressiveTransformer` wrapped in `CATWrapper`.
Key features:
- Mixed precision (`torch.amp`) with `bfloat16` on ROCm/Ampere+, `float16` otherwise
- Cosine LR schedule with linear warmup (`vortex.utils.training.cosine_with_warmup`); a schedule sketch follows this list
- Gradient clipping (`grad_clip=1.0`) + AdamW weight decay
- EarlyStopping on validation BPD (`patience=5`, `min_delta=1e-4`)
- TensorBoard logging + live ASCII scoreboard with BPD trend vs baselines
- Gradient checkpointing (enabled per config; ~40 % VRAM reduction)
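A generic sketch of such a schedule as an LR multiplier; the real `vortex.utils.training.cosine_with_warmup` may differ in signature, and the decay floor here is an assumption. With PyTorch it would typically be attached via `torch.optim.lr_scheduler.LambdaLR`:

```python
import math

def cosine_with_warmup(step: int, warmup: int = 4000,
                       max_steps: int = 100_000, min_ratio: float = 0.1) -> float:
    """LR multiplier: linear warmup to 1.0, then cosine decay towards min_ratio."""
    if step < warmup:
        return step / max(1, warmup)
    progress = min(1.0, (step - warmup) / max(1, max_steps - warmup))
    return min_ratio + (1.0 - min_ratio) * 0.5 * (1.0 + math.cos(math.pi * progress))
```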
Default hyperparameters (configs/default.yaml):
```
d_model: 512   | n_layers: 8         | n_heads: 8    | d_ff: 2048
window: 512    | compression_rate: 4 | dropout: 0.1
batch_size: 32 | lr: 3e-4            | warmup: 4000  | max_steps: 100000
```
ATLAS Dataset
- Source: CERN EOS `root://eospublic.cern.ch//eos/opendata/atlas/datascience/ATLAS-FTAG-2023-05/`
- Format: HDF5 → extracted to raw binary (`atlas.bin`) via `download.py`
- Benchmark sample: `mc-flavtag-ttbar-medium.bin` (1 GB), used for both baseline and Vortex evaluation
- Structured dtype: 30 fields including `pt_btagJes`, `GN2v01_pb`, kinematics, labels
- See `docs/ARCHITECTURE_COMPARISON.md` for a detailed v1 → v3 component diff and BPD benchmarks.