Vortex-Codec: neural lossless byte-level codec

Vortex-Codec is a Python library for neural lossless compression using compressive transformers + arithmetic coding. Use it as a package in your projects or via the provided CLI tools.

Installation

# Install in editable/development mode
pip install -e .

# Or install runtime requirements only
pip install -r requirements.txt

Quick usage (library)

import vortex
from vortex.models.optimized_transformer import OptimisedCompressiveTransformer

print(vortex.__version__)
model = OptimisedCompressiveTransformer()

Quick usage (CLI)

# Compress and decompress via installed console scripts
vortex-compress --model PATH_TO_MODEL --input file.bin --output file.vxc --config path/config.yaml
vortex-decompress --model PATH_TO_MODEL --input file.vxc --output recovered.bin --config path/config.yaml

Repository Layout

vortex-codec/
├── vortex/                              # core Python package
│   ├── models/
│   │   ├── __init__.py                  # re-exports all public symbols
│   │   ├── compressive_transformer.py   # base model (CompressiveTransformer)
│   │   └── optimized_transformer.py     # production model (OptimisedCompressiveTransformer)
│   ├── compression/
│   │   └── arithmetic_coding.py         # torchac encode/decode + BPD metric
│   ├── data/
│   │   └── dataset.py                   # make_loaders() for binary / HDF5 files
│   └── utils/
│       ├── training.py                  # LR schedule, checkpointing, EarlyStopping
│       └── zipnn.py                     # Huffman post-training weight compression
├── scripts/
│   ├── train.py                         # full training loop (CATWrapper, AMP, TensorBoard)
│   ├── compress.py                      # file → .vxc bitstream
│   ├── decompress.py                    # .vxc bitstream → file
│   ├── evaluate.py                      # BPD vs gzip / zlib / lzma baselines
│   └── compress_weights.py              # apply ZipNN compression to a checkpoint
├── experiments/
│   ├── atlas_experiment/                # ATLAS FTAG HDF5 -> .bin splits
│   ├── camel_experiment/                # CAMEL HDF5 -> raw + float32 .bin splits
│   ├── hepmc_experiment/                # ATLAS HEPMC tarballs -> .hepmc splits
│   ├── cms_experiment/                  # CMS NanoAOD ROOT -> padded float32 .bin
│   ├── cms_experiment_lg/               # Original large-dataset CMS pipeline
│   └── alice_experiment/                # ALICE ROOT -> padded float32 .bin
├── configs/                             # hardware-specific base configs
│   ├── colab_t4.yaml
│   ├── rtx4070_8gb.yaml
│   ├── default.yaml
│   ├── rtx4090_24gb.yaml
│   └── amd_mi300x.yaml
├── tests/
│   └── test_basic.py
└── docs/
    ├── ARCHITECTURE_COMPARISON.md       # v1 vs v3 component-by-component diff
    └── HARDWARE_GUIDE.md

Architecture

Overview

Vortex-Codec is a byte-level autoregressive model: given a stream of bytes it predicts a probability distribution over the next byte, and uses arithmetic coding (torchac) to encode/decode the stream losslessly. Arithmetic coding spends roughly −log₂ p(s) bits on a symbol the model assigns probability p(s), so lower predicted cross-entropy means better compression.

The codebase contains two model variants, both in vortex/models/:

| Class | File | Use |
|---|---|---|
| CompressiveTransformer | compressive_transformer.py | Reference / lightweight |
| OptimisedCompressiveTransformer | optimized_transformer.py | Production (Flash Attention 2, KV cache, RMSNorm) |
| CATWrapper | optimized_transformer.py | Dynamic chunk scheduler wrapping either model |

compressive_transformer.py — Base Model

TDTEmbedding

Per-type embedding for IEEE-754 float32 byte streams.
Each of the 4 byte positions within a float32 (mantissa-low through sign/exponent-high) gets its own nn.Embedding(256, d_model) lookup table, since they have very different entropy profiles. An additional learnable type_scale vector (softmax-normalised) gates each table's contribution.

byte (0–255) ──► table[ t % 4 ]  (one of 4 typed tables, scale-gated)
                       ↓
                 h  (B, T, d_model)
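
A minimal sketch of the idea (not the library's exact implementation): four typed tables, gated by a softmax over the learnable type_scale vector.

import torch
import torch.nn as nn

class TDTEmbeddingSketch(nn.Module):
    """Sketch: one embedding table per byte position within a float32."""
    def __init__(self, d_model: int, n_types: int = 4, vocab: int = 256):
        super().__init__()
        self.tables = nn.ModuleList(nn.Embedding(vocab, d_model) for _ in range(n_types))
        self.type_scale = nn.Parameter(torch.zeros(n_types))  # softmax-normalised gate
        self.n_types = n_types

    def forward(self, byte_ids: torch.Tensor) -> torch.Tensor:  # (B, T) int64 in [0, 255]
        T = byte_ids.size(1)
        types = torch.arange(T, device=byte_ids.device) % self.n_types  # position within float32
        scale = torch.softmax(self.type_scale, dim=0)
        h = byte_ids.new_zeros(*byte_ids.shape, self.tables[0].embedding_dim, dtype=torch.float)
        for t, table in enumerate(self.tables):
            mask = types == t
            h[:, mask] = scale[t] * table(byte_ids[:, mask])
        return h  # (B, T, d_model)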

LearnableTokenEviction (LTE)

Content-adaptive token selection replacing strided Conv1d downsampling.
A lightweight depthwise + pointwise scorer produces per-token importance scores; the top-k (where k = ceil(T / rate)) tokens are kept in original temporal order. A straight-through soft gate (sigmoid-weighted) keeps the operation end-to-end differentiable. A final Conv1d projection + LayerNorm produces the memory representation.

acts (B, T, D) ──► scorer ──► topk ──► soft-gate ──► proj+norm ──► (B, k, D)
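
A rough sketch of the selection mechanism under assumed shapes; it simplifies the real module, which adds the final Conv1d projection + LayerNorm.

import math
import torch
import torch.nn as nn

class TokenEvictionSketch(nn.Module):
    """Sketch: keep the top-k tokens by learned score, soft-gated for gradients."""
    def __init__(self, d_model: int, rate: int = 4):
        super().__init__()
        self.rate = rate
        self.scorer = nn.Sequential(
            nn.Conv1d(d_model, d_model, kernel_size=3, padding=1, groups=d_model),  # depthwise
            nn.Conv1d(d_model, 1, kernel_size=1),  # pointwise -> one score per token
        )

    def forward(self, acts: torch.Tensor) -> torch.Tensor:  # (B, T, D)
        B, T, D = acts.shape
        k = math.ceil(T / self.rate)
        scores = self.scorer(acts.transpose(1, 2)).squeeze(1)     # (B, T) importance
        idx = scores.topk(k, dim=1).indices.sort(dim=1).values    # keep temporal order
        kept = acts.gather(1, idx.unsqueeze(-1).expand(B, k, D))  # (B, k, D)
        gate = torch.sigmoid(scores.gather(1, idx)).unsqueeze(-1) # soft gate in (0, 1)
        return kept * gate  # scorer stays trainable through the gate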

MemoryManager

Thin wrapper around LearnableTokenEviction. Provides a .compress(acts) method used by attention layers to build compressed memory from past activations.

CompressiveAttention

Multi-head attention with two-tier memory:

  • Local stream: causal scaled_dot_product_attention over the current window (Q, K, V).
  • Memory stream: cross-attention from current queries into compressed past (Km, Vm from MemoryManager).
  • Infini-β gating: a per-head learnable scalar β = sigmoid(infini_beta) mixes the two streams: out = β·out_mem + (1−β)·out_local (see the sketch after this list). infini_beta is initialised at 0, giving β = 0.5 (an even mix) as a stable starting point for training.
  • Compressed memory is accumulated across chunks and capped at window // 2 tokens (oldest dropped).
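
For reference, the gating reduces to a one-line mix; a sketch with assumed tensor shapes:

import torch

def infini_mix(out_local: torch.Tensor, out_mem: torch.Tensor,
               infini_beta: torch.Tensor) -> torch.Tensor:
    # Assumed shapes: out_local/out_mem (B, heads, T, head_dim);
    # infini_beta a learnable (heads, 1, 1) parameter.
    beta = torch.sigmoid(infini_beta)  # per-head weight in (0, 1); small beta = mostly local
    return beta * out_mem + (1.0 - beta) * out_local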

SwiGLU

Gated feed-forward block (Shazeer 2020). No bias, no dropout.
out = down( silu(gate(x)) * up(x) ) — two parallel projections to d_ff, one is SiLU-activated and used as a gate.
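
The formula maps directly to code; a sketch:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUSketch(nn.Module):
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)  # SiLU-activated gate
        self.up   = nn.Linear(d_model, d_ff, bias=False)  # parallel projection
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))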

TransformerBlock

LayerNorm → CompressiveAttention → residual add → LayerNorm → SwiGLU → residual add.

CompressiveTransformer

Full byte-level model:

  • Embedding: standard nn.Embedding or TDTEmbedding (use_tdt=True)
  • Sinusoidal PositionalEncoding (max 8192)
  • Stack of TransformerBlock layers
  • Final LayerNorm + linear projection to vocab logits
  • Optional per-layer gradient checkpointing (enable_gradient_checkpointing())

Default config: vocab_size=256, d_model=512, n_layers=8, n_heads=8, d_ff=2048, window=512, compression_rate=4.
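
Putting the defaults together (the keyword names here mirror the config keys above and are assumptions about the constructor):

from vortex.models.compressive_transformer import CompressiveTransformer

model = CompressiveTransformer(vocab_size=256, d_model=512, n_layers=8,
                               n_heads=8, d_ff=2048, window=512,
                               compression_rate=4, use_tdt=True)
model.enable_gradient_checkpointing()  # optional per-layer checkpointing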


optimized_transformer.py — Production Model

All components from compressive_transformer.py are reused (imported directly). The optimised variant swaps or adds:

RMSNorm

Root-Mean-Square normalisation (no mean-centering). ~15 % faster than LayerNorm at the same quality.
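
The usual formulation, as a sketch:

import torch
import torch.nn as nn

class RMSNormSketch(nn.Module):
    def __init__(self, d_model: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(d_model))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Rescale by root-mean-square; no mean subtraction, unlike LayerNorm.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight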

OptimisedCompressiveAttention

Extends CompressiveAttention with:

  • Flash Attention 2 (flash_attn_func) for causal attention when CUDA is available; falls back to PyTorch scaled_dot_product_attention automatically.
  • KV cache: concatenates previously seen K/V tensors for O(1)-per-step autoregressive inference. Returns new_cache = {"k": K, "v": V} each forward pass.
  • Infini-β init changed to −3.0 (sigmoid → ~0.047) so training starts almost entirely local.
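
The KV-cache mechanics above reduce to a concatenation along the time axis; a sketch assuming (B, heads, T, head_dim) tensors:

import torch

def extend_kv_cache(k_new, v_new, cache=None):
    if cache is not None:  # prepend previously seen keys/values
        k_new = torch.cat([cache["k"], k_new], dim=2)
        v_new = torch.cat([cache["v"], v_new], dim=2)
    return k_new, v_new, {"k": k_new, "v": v_new}  # attention inputs + new_cache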

OptimisedBlock

RMSNorm → OptimisedCompressiveAttention → residual add → RMSNorm → SwiGLU → residual add.
Forward signature: (x, comp_mem, kv_cache) → (x, new_comp, new_cache).

OptimisedCompressiveTransformer

Drop-in replacement for CompressiveTransformer with all optimised components.
Extra method: vram_estimate_gb(batch_size, seq_len) — returns a dict with parameter, activation, optimizer-state, and total VRAM estimates in GB.
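
Example use (the returned dict's exact keys are not documented here, so treat the output as illustrative):

from vortex.models.optimized_transformer import OptimisedCompressiveTransformer

model = OptimisedCompressiveTransformer()
est = model.vram_estimate_gb(batch_size=32, seq_len=512)
print(est)  # parameter / activation / optimizer-state / total estimates in GB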

CATWrapper

Dynamic chunk scheduler wrapping any model.

  • Training: randomly samples chunk size from chunk_sizes=(128, 256, 512) each forward pass, enabling multi-scale learning.
  • Inference: defaults to the largest chunk size; override with chunk_size= argument.
  • Handles sequences longer than the chunk size by iterating and accumulating memories and kv_caches across chunks (detached between chunks to limit graph size).
  • Transparent proxy: delegates parameters(), named_parameters(), state_dict(), load_state_dict(), enable_gradient_checkpointing(), and vram_estimate_gb() to the inner model, so checkpoints are portable without the wrapper.
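
A sketch of the chunk loop described above, assuming the inner model returns (logits, comp_mem, kv_cache):

import random
import torch

def chunked_forward(model, x, chunk_sizes=(128, 256, 512), training=True):
    size = random.choice(chunk_sizes) if training else max(chunk_sizes)
    comp_mem, kv_cache, logits = None, None, []
    for start in range(0, x.size(1), size):
        out, comp_mem, kv_cache = model(x[:, start:start + size], comp_mem, kv_cache)
        if comp_mem is not None:  # detach between chunks to limit graph size
            comp_mem = comp_mem.detach()
        if kv_cache is not None:
            kv_cache = {name: t.detach() for name, t in kv_cache.items()}
        logits.append(out)
    return torch.cat(logits, dim=1)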

vortex/compression/arithmetic_coding.py

Lossless arithmetic coding via torchac:

| Function | Description |
|---|---|
| probs_to_cdf(probs) | Converts model output probabilities to a cumulative distribution function (CDF), with ε-smoothing |
| encode(probs, symbols) | Encodes a (B, T) symbol tensor to bytes |
| decode(bitstring, probs) | Decodes bytes back to (B, T) int16 symbols |
| theoretical_bpd(probs, symbols) | Cross-entropy bits-per-byte — the training objective |
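
A round-trip using the signatures above; the stand-in tensors are placeholders for what the model would produce:

import torch
from vortex.compression.arithmetic_coding import encode, decode, theoretical_bpd

# Stand-ins: in practice probs is the model's softmax output over the bytes.
probs = torch.softmax(torch.randn(1, 16, 256), dim=-1)        # (B, T, vocab)
symbols = torch.randint(0, 256, (1, 16), dtype=torch.int16)   # (B, T) bytes

bitstring = encode(probs, symbols)       # model probs -> compressed bytes
recovered = decode(bitstring, probs)     # same probs -> exact symbols back
assert torch.equal(recovered.long(), symbols.long())  # lossless
print(theoretical_bpd(probs, symbols))   # cross-entropy bits-per-byte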

vortex/utils/zipnn.py — Post-Training Weight Compression

Huffman-based lossless checkpoint size reduction (30–60 % smaller files).
Splits each float32 weight tensor into sign + exponent + mantissa bytes. Exponents and signs are Huffman-coded (low entropy); raw mantissa bytes are stored unmodified (near-random, high entropy). Decompression is exact.

import torch
from vortex.utils.zipnn import compress_model_weights, decompress_model_weights

compressed = compress_model_weights(model)
torch.save(compressed, "weights.zipnn.pt")

model2 = MyModel(...)
decompress_model_weights(model2, compressed)
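
To see why the split helps, here is a hypothetical NumPy view of the byte planes (assuming little-endian float32; the library's actual split may group bits differently):

import numpy as np

# Stand-in weights; real use operates on each tensor in the state dict.
w = np.random.randn(1024).astype(np.float32)
planes = w.view(np.uint8).reshape(-1, 4)  # little-endian: byte 3 = sign + high exponent bits
low_entropy = planes[:, 3]                # sign/exponent bytes -> Huffman-coded
high_entropy = planes[:, :3].ravel()      # mantissa-dominated bytes -> stored raw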

Hardware Configs

| File | GPU | VRAM | Params |
|---|---|---|---|
| colab_t4.yaml | T4 (Colab) | 15 GB | 3.2 M |
| rtx4070_8gb.yaml | RTX 4070 | 8 GB | 8.5 M |
| default.yaml | RTX 3090/80 | 12 GB | 14.8 M |
| rtx4090_24gb.yaml | RTX 4090 | 24 GB | 28 M |
| amd_mi300x.yaml | MI300X | 192 GB | 60 M+ |

Training Details

The training loop in scripts/train.py uses OptimisedCompressiveTransformer wrapped in CATWrapper.
Key features:

  • Mixed precision (torch.amp) with bfloat16 on ROCm/Ampere+, float16 otherwise
  • Cosine LR schedule with linear warmup (vortex.utils.training.cosine_with_warmup; sketched after this list)
  • Gradient clipping (grad_clip=1.0) + AdamW weight decay
  • EarlyStopping on validation BPD (patience=5, min_delta=1e-4)
  • TensorBoard logging + live ASCII scoreboard with BPD trend vs baselines
  • Gradient checkpointing (enabled per config; ~40 % VRAM reduction)
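
The LR schedule typically looks like this (a sketch; the real cosine_with_warmup signature may differ):

import math

def cosine_with_warmup(step, warmup=4000, max_steps=100000, base_lr=3e-4, min_lr=0.0):
    if step < warmup:  # linear warmup
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, max_steps - warmup)  # cosine decay
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))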

Default hyperparameters (configs/default.yaml):

d_model: 512  |  n_layers: 8  |  n_heads: 8  |  d_ff: 2048
window: 512   |  compression_rate: 4          |  dropout: 0.1
batch_size: 32  |  lr: 3e-4  |  warmup: 4000  |  max_steps: 100000

ATLAS Dataset

  • Source: CERN EOS root://eospublic.cern.ch//eos/opendata/atlas/datascience/ATLAS-FTAG-2023-05/
  • Format: HDF5 → extracted to raw binary (atlas.bin) via download.py
  • Benchmark sample: mc-flavtag-ttbar-medium.bin (1 GB) — used for both baseline and Vortex evaluation
  • Structured dtype: 30 fields including pt_btagJes, GN2v01_pb, kinematics, labels
  • See docs/ARCHITECTURE_COMPARISON.md for a detailed v1 → v3 component diff and BPD benchmarks.

