helix-substrate

Model weight compression and streaming decode library. Compress neural network weights into a compact format (CDNA), then run matrix operations directly from the compressed representation — without ever loading the full weight matrix into memory.

What it does

  1. CDNA Format — Quantize model weights into a 256-entry codebook + uint8 indices, with per-block brotli compression and SHA256 verification.

  2. Streaming Block Decode — Compute Y = X @ W where W is stored in CDNA format. W is never fully loaded. Instead, blocks of rows are decompressed one at a time, multiplied against the corresponding slice of X, and accumulated.

  3. Structural Entropy (Se) Routing — Measure tensor complexity via Se = H × U × D (entropy × unstructuredness × rank depth). The Se score maps to a compute routing decision: simple tensors → CPU, parallel tensors → GPU, complex unstructured tensors → QPU.

  4. Receipts — Every operation produces a tamper-evident receipt with SHA256 input/output hashes, timing, memory usage, and fidelity metrics. If you can't verify it, it didn't happen.
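The mechanics behind points 1 and 2 can be sketched in plain NumPy. This is a toy illustration, not the library's implementation: a uniform quantile grid stands in for the k-means codebook, and file I/O, brotli compression, and SHA256 hashing are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 256)).astype(np.float32)

# 1. Quantize: 256-entry codebook + uint8 indices.
#    A quantile grid stands in for the library's k-means quantizer.
codebook = np.quantile(W, np.linspace(0.0, 1.0, 256)).astype(np.float32)
idx = np.clip(np.searchsorted(codebook, W), 0, 255).astype(np.uint8)
# snap each index to whichever neighboring code value is closer
lo = np.maximum(idx.astype(np.int64) - 1, 0)
nearer_lo = np.abs(codebook[lo] - W) < np.abs(codebook[idx] - W)
idx = np.where(nearer_lo, lo, idx).astype(np.uint8)

# 2. Streaming decode: Y = X @ W without materializing all of W at once.
X = rng.standard_normal((8, 512)).astype(np.float32)
block_rows = 64
Y = np.zeros((X.shape[0], W.shape[1]), dtype=np.float32)
for r0 in range(0, W.shape[0], block_rows):
    r1 = min(r0 + block_rows, W.shape[0])
    W_block = codebook[idx[r0:r1]]   # decode just this block of rows
    Y += X[:, r0:r1] @ W_block       # multiply against the matching X slice

# The block-streamed result matches a one-shot product on the decoded weights.
Y_full = X @ codebook[idx]
print(np.allclose(Y, Y_full, rtol=1e-3, atol=1e-3))
```

Only `block_rows` decoded rows of W are live at any moment, which is where the memory savings in the benchmarks below come from.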

Benchmarks

Measured peak memory for Y = X @ W with streaming vs loading full W:

| Matrix size | Block rows | Standard | Streaming | Ratio |
|-------------|------------|----------|-----------|-------|
| 64 MB       | 64         | 64 MB    | 18 MB     | 3.5x  |
| 256 MB      | 64         | 256 MB   | 68 MB     | 3.8x  |
| 1 GB        | 64         | 1024 MB  | 137 MB    | 7.5x  |
| 1 GB        | 32         | 1024 MB  | 69 MB     | 14.9x |

Streaming overhead is roughly constant (~68 MB), so the ratio improves with matrix size. At LLM scale (1 GB+ weight matrices), expect a 7-15x memory reduction depending on block size.

Correctness: cosine similarity = 1.000000 (exact match to full-matrix computation).

Run the benchmark yourself:

python tools/bench_memory.py --rows 16384 --cols 16384 --block-rows 32

Install

pip install helix-substrate

For HuggingFace model conversion:

pip install helix-substrate[hf,brotli]

Required: numpy>=1.24. Optional: brotli or zstandard (compression); huggingface_hub and safetensors (model conversion).

Quick start

Convert a HuggingFace model

helix-substrate convert mistralai/Mistral-7B-v0.1 --output ./mistral-cdna

This downloads the model, quantizes each weight tensor to CDNA format, and saves to a directory. The model is now ready for streaming inference.

Compress a tensor

import numpy as np
from helix_substrate import encode_tensor_to_cdna, decode_cdna_to_tensor

# Compress
W = np.random.randn(4096, 4096).astype(np.float32)
encode_tensor_to_cdna(W, "weight.cdna", tensor_name="my_layer")

# Decompress
W_decoded = decode_cdna_to_tensor("weight.cdna")
cos = np.dot(W.ravel(), W_decoded.ravel()) / (np.linalg.norm(W) * np.linalg.norm(W_decoded))
print(f"Cosine similarity: {cos:.6f}")

Measure tensor complexity

from helix_substrate import compute_tensor_se

result = compute_tensor_se(W)
print(f"Se={result['Se']:.3f} → route to {result['routing_hint']}")
# Se=0.42 → route to gpu

Streaming decode (CDNAv2)

from helix_substrate.stream_matmul import stream_xw_from_cdna

# Y = X @ W, but W is never fully loaded
X = np.random.randn(1, 256, 4096).astype(np.float32)
Y, receipt = stream_xw_from_cdna(X, "weight.cdna2.hxz")
print(f"Memory savings: {receipt.savings_factor:.1f}x")
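The receipt object is opaque above, but the general pattern of a tamper-evident receipt is easy to illustrate. The sketch below shows the idea only; the field names and the functions `make_receipt` and `verify` are hypothetical, not the actual helix_substrate.receipt schema.

```python
import hashlib
import time
import numpy as np

def make_receipt(x, y, t0, t1):
    # Hypothetical receipt: SHA256 of the raw input/output bytes plus timing.
    return {
        "input_sha256": hashlib.sha256(x.tobytes()).hexdigest(),
        "output_sha256": hashlib.sha256(y.tobytes()).hexdigest(),
        "elapsed_s": t1 - t0,
    }

def verify(receipt, x, y):
    # Recompute both hashes; any change to x or y breaks verification.
    return (receipt["input_sha256"] == hashlib.sha256(x.tobytes()).hexdigest()
            and receipt["output_sha256"] == hashlib.sha256(y.tobytes()).hexdigest())

x = np.arange(6, dtype=np.float32)
t0 = time.time(); y = x * 2.0; t1 = time.time()
r = make_receipt(x, y, t0, t1)
print(verify(r, x, y))   # True
y[0] = 99.0              # tamper with the output...
print(verify(r, x, y))   # ...and verification fails: False
```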

Package structure

helix_substrate/
├── __init__.py          # Public API
├── cdna_encoder.py      # CDNA v1 encode/decode (k-means quantization)
├── cdna_reader.py       # CDNA v2 reader (block-indexed, brotli, SHA256)
├── sidecar.py           # HXZO outlier sidecar (high-precision corrections)
├── stream_matmul.py     # Core: Y = X @ W from CDNA (streaming, never loads W)
├── stream_attention.py  # Full attention layer (Q,K,V,O all streamed)
├── stream_ffn.py        # Full FFN layer (gate,up,down all streamed)
├── stream_block.py      # Full transformer block (attention + FFN + norms)
├── rope.py              # Rotary Position Embeddings
├── se.py                # Structural Entropy estimator and routing
└── receipt.py           # Tamper-evident execution receipts

The Se formula

Structural Entropy decomposes tensor complexity into three independent factors:

| Component             | Measures              | High means                                   |
|-----------------------|-----------------------|----------------------------------------------|
| H (entropy)           | Singular value spread | Energy spread across many directions         |
| U (unstructuredness)  | Neighbor coherence    | No spatial correlation between adjacent rows |
| D (depth)             | Effective rank ratio  | Many dimensions matter                       |

Se = H × U × D produces a 0-1 score. The 2D routing policy uses (Se, C_struct) jointly:

  • Zone 1: Se < 0.30, structured → CPU
  • Zone 2: 0.30 ≤ Se < 0.70 → GPU
  • Zone 3: Se ≥ 0.70, unstructured → QPU
  • Zone 4: Se ≥ 0.70, structured → GPU
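The exact estimators in helix_substrate.se are not documented here, so the sketch below uses plausible stand-ins matching the table: normalized singular-value entropy for H, adjacent-row decorrelation for U, and a participation-ratio rank for D. The fallthrough for low-Se unstructured tensors (not covered by the zones above) is also an assumption.

```python
import numpy as np

def se_sketch(W, eps=1e-12):
    # Illustrative Se = H * U * D; not the library's actual estimators.
    s = np.linalg.svd(W, compute_uv=False)
    p = s / (s.sum() + eps)
    # H: entropy of the singular value distribution, normalized to [0, 1]
    H = -(p * np.log(p + eps)).sum() / np.log(len(s))
    # U: 1 minus mean |cosine similarity| of adjacent rows
    rows = W / (np.linalg.norm(W, axis=1, keepdims=True) + eps)
    U = 1.0 - np.abs((rows[:-1] * rows[1:]).sum(axis=1)).mean()
    # D: participation-ratio effective rank over full rank, in (0, 1]
    D = (s.sum() ** 2) / ((s ** 2).sum() * len(s) + eps)
    return H * U * D

def route(se, structured):
    # 2D policy from the zones above; the low-Se unstructured
    # default to "gpu" is an assumption, not stated by the docs.
    if se < 0.30 and structured:
        return "cpu"
    if se >= 0.70:
        return "gpu" if structured else "qpu"
    return "gpu"

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)
se = se_sketch(W)
print(f"Se={se:.3f} -> {route(se, structured=False)}")
```

Because H, U, and D are each bounded by 1, their product stays in [0, 1], which is what lets a single threshold scheme carve the score into zones.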

Inspiration

The mathematical patterns in this library draw from nature — Fibonacci sequences in the block structure, golden-ratio-inspired codebook initialization, and structural entropy as a measure of order vs chaos in weight matrices. The thesis: nature already solved the compression math we need, because it's the world's largest dataset.

License

MIT
