
helix-substrate

Model weight compression and streaming decode library. Compress neural network weights into a compact format (CDNA), then run matrix operations directly from the compressed representation — without ever loading the full weight matrix into memory.

What it does

  1. CDNA Format — Quantize model weights into a 256-entry codebook + uint8 indices, with per-block brotli compression and SHA256 verification.

  2. Streaming Block Decode — Compute Y = X @ W where W is stored in CDNA format. W is never fully loaded. Instead, blocks of rows are decompressed one at a time, multiplied against the corresponding slice of X, and accumulated.

  3. Structural Entropy (Se) Routing — Measure tensor complexity via Se = H × U × D (entropy × unstructuredness × rank depth). The Se score maps to a compute routing decision: simple tensors → CPU, parallel tensors → GPU, complex unstructured tensors → QPU.

  4. Receipts — Every operation produces a tamper-evident receipt with SHA256 input/output hashes, timing, memory usage, and fidelity metrics. If you can't verify it, it didn't happen.
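The streaming decode in step 2 can be sketched in plain NumPy. This is a sketch, not the library's API: `decode_block` is a hypothetical stand-in for the CDNA block decoder, and each iteration materializes only one row-block of W.

```python
import numpy as np

def stream_matmul(X, decode_block, n_blocks, block_rows, n_cols):
    """Y = X @ W without ever holding all of W.

    decode_block(i) stands in for CDNA decompression: it returns the dense
    rows [i*block_rows : (i+1)*block_rows] of W, one block at a time.
    """
    Y = np.zeros((X.shape[0], n_cols), dtype=X.dtype)
    for i in range(n_blocks):
        W_blk = decode_block(i)        # only this block of W is in memory
        r0 = i * block_rows
        r1 = r0 + W_blk.shape[0]
        Y += X[:, r0:r1] @ W_blk       # accumulate the partial product
    return Y

# Toy check against the full matmul
rng = np.random.default_rng(0)
K, N, B = 128, 64, 32
W = rng.standard_normal((K, N)).astype(np.float32)
X = rng.standard_normal((4, K)).astype(np.float32)
Y = stream_matmul(X, lambda i: W[i * B:(i + 1) * B], K // B, B, N)
assert np.allclose(Y, X @ W, atol=1e-3)
```

Peak memory holds X, Y, and a single decompressed block rather than all of W, which is why the savings below grow with matrix size.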

Benchmarks

Measured peak memory for Y = X @ W with streaming vs loading full W:

| Matrix Size | Block Rows | Standard Peak | Streaming Peak | Ratio |
|-------------|-----------:|--------------:|---------------:|------:|
| 64 MB       | 64         | 64 MB         | 18 MB          | 3.5x  |
| 256 MB      | 64         | 256 MB        | 68 MB          | 3.8x  |
| 1 GB        | 64         | 1024 MB       | 137 MB         | 7.5x  |
| 1 GB        | 32         | 1024 MB       | 69 MB          | 14.9x |

Streaming peak memory is roughly constant for a given block size (~68 MB at block-rows 32), so the ratio improves as the matrix grows. At LLM scale (1 GB+ weight matrices), expect a 7-15x peak-memory reduction depending on block size.

Correctness: cosine similarity = 1.000000 (exact match to full-matrix computation).
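A back-of-envelope check on why the streaming peak barely grows: only one decompressed block of W is resident at a time. The numbers below use the benchmark's 1 GB fp32 configuration; the measured figures are higher because they also count X, Y, and decode buffers.

```python
rows, cols, block_rows = 16384, 16384, 32    # the 1 GB fp32 case above
full_mb = rows * cols * 4 / 2**20            # all of W resident: 1024 MB
block_mb = block_rows * cols * 4 / 2**20     # one decoded block: 2 MB
print(f"full W: {full_mb:.0f} MB, one block resident: {block_mb:.0f} MB")
```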

Run the benchmark yourself:

```bash
python tools/bench_memory.py --rows 16384 --cols 16384 --block-rows 32
```

Install

```bash
pip install helix-substrate
```

For HuggingFace model conversion:

```bash
pip install "helix-substrate[hf,brotli]"
```

(The quotes keep shells like zsh from interpreting the brackets.)

Required: numpy>=1.24. Optional: brotli, zstandard (compression), huggingface_hub, safetensors (model conversion).

Quick start

Convert a HuggingFace model

```bash
helix-substrate convert mistralai/Mistral-7B-v0.1 --output ./mistral-cdna
```

This downloads the model, quantizes each weight tensor to CDNA format, and saves to a directory. The model is now ready for streaming inference.

Compress a tensor

```python
import numpy as np
from helix_substrate import encode_tensor_to_cdna, decode_cdna_to_tensor

# Compress
W = np.random.randn(4096, 4096).astype(np.float32)
encode_tensor_to_cdna(W, "weight.cdna", tensor_name="my_layer")

# Decompress
W_decoded = decode_cdna_to_tensor("weight.cdna")
cos = np.dot(W.ravel(), W_decoded.ravel()) / (np.linalg.norm(W) * np.linalg.norm(W_decoded))
print(f"Cosine similarity: {cos:.6f}")
```
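Under the hood, a 256-entry codebook plus uint8 indices amounts to nearest-centroid quantization. A minimal sketch of the idea, using a uniform codebook for brevity (the library fits its codebook with k-means, so this is an illustration, not the CDNA encoder):

```python
import numpy as np

def quantize_256(W):
    """Replace each weight with the index of its nearest codebook entry."""
    codebook = np.linspace(W.min(), W.max(), 256, dtype=np.float32)
    midpoints = (codebook[:-1] + codebook[1:]) / 2
    idx = np.searchsorted(midpoints, W.ravel()).astype(np.uint8).reshape(W.shape)
    return codebook, idx            # 4x smaller than fp32, before brotli

W = np.random.randn(512, 512).astype(np.float32)
codebook, idx = quantize_256(W)
W_hat = codebook[idx]               # dequantize: a pure table lookup
step = (W.max() - W.min()) / 255
assert np.abs(W - W_hat).max() <= step / 2 + 1e-5
```

The reconstruction error is bounded by half a codebook step, which is why the cosine similarity above stays near 1.0 for well-behaved weight distributions.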

Measure tensor complexity

```python
from helix_substrate import compute_tensor_se

result = compute_tensor_se(W)
print(f"Se={result['Se']:.3f} → route to {result['routing_hint']}")
# Se=0.42 → route to gpu
```

Streaming decode (CDNAv2)

```python
from helix_substrate.stream_matmul import stream_xw_from_cdna

# Y = X @ W, but W is never fully loaded
X = np.random.randn(1, 256, 4096).astype(np.float32)
Y, receipt = stream_xw_from_cdna(X, "weight.cdna2.hxz")
print(f"Memory savings: {receipt.savings_factor:.1f}x")
```

Package structure

```
helix_substrate/
├── __init__.py          # Public API
├── cdna_encoder.py      # CDNA v1 encode/decode (k-means quantization)
├── cdna_reader.py       # CDNA v2 reader (block-indexed, brotli, SHA256)
├── sidecar.py           # HXZO outlier sidecar (high-precision corrections)
├── stream_matmul.py     # Core: Y = X @ W from CDNA (streaming, never loads W)
├── stream_attention.py  # Full attention layer (Q,K,V,O all streamed)
├── stream_ffn.py        # Full FFN layer (gate,up,down all streamed)
├── stream_block.py      # Full transformer block (attention + FFN + norms)
├── rope.py              # Rotary Position Embeddings
├── se.py                # Structural Entropy estimator and routing
└── receipt.py           # Tamper-evident execution receipts
```

The Se formula

Structural Entropy decomposes tensor complexity into three independent factors:

| Component            | Measures              | High means                                   |
|----------------------|-----------------------|----------------------------------------------|
| H (entropy)          | Singular value spread | Energy spread across many directions         |
| U (unstructuredness) | Neighbor coherence    | No spatial correlation between adjacent rows |
| D (depth)            | Effective rank ratio  | Many dimensions matter                       |

Se = H × U × D produces a 0-1 score. The 2D routing policy uses (Se, C_struct) jointly:

  • Zone 1: Se < 0.30, structured → CPU
  • Zone 2: 0.30 ≤ Se < 0.70 → GPU
  • Zone 3: Se ≥ 0.70, unstructured → QPU
  • Zone 4: Se ≥ 0.70, structured → GPU
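The zone table translates directly into a small decision function. This is a sketch, not the library's API: `structured` stands in for the C_struct test, and low-Se unstructured tensors (not covered by the table) are assumed here to stay on CPU.

```python
def route(se: float, structured: bool) -> str:
    """Map (Se, structural flag) to a compute target per the zone table."""
    if se < 0.30:
        return "cpu"                          # Zone 1 (unstructured: assumed CPU too)
    if se < 0.70:
        return "gpu"                          # Zone 2
    return "gpu" if structured else "qpu"     # Zones 4 and 3

assert route(0.15, structured=True) == "cpu"
assert route(0.42, structured=False) == "gpu"   # matches the quick-start example
assert route(0.85, structured=False) == "qpu"
```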

Inspiration

The mathematical patterns in this library draw from nature — Fibonacci sequences in the block structure, golden-ratio-inspired codebook initialization, and structural entropy as a measure of order vs chaos in weight matrices. The thesis: nature already solved the compression math we need, because it's the world's largest dataset.

License

MIT
