Block-level model patching with verifiable receipts

These details have not been verified by PyPI

Project links

Homepage

Project description

Helix CDC - Block-Level Model Patching with Cryptographic Receipts

Version: v0.1.2 License: Evaluation (see LICENSE) Status: Pilot-ready

What It Does

Helix CDC enables block-level patching of deterministically regenerable models with cryptographic receipts for auditability.

Key features:

84% fewer blocks written (4 blocks vs 32 blocks for typical patch)
Triple-run deterministic (same seed → same SHA256, verified)
Fail-closed MAC validation (rejects tampered overlays, no degraded mode)
Provenance-bound receipts (git_commit + impl_sha256 + cpu_flags)
CPU-first, GPU-opportunistic (automatic hardware routing)

Why this is safer (and not just smaller)

Helix-CDC is trying to solve a nasty real-world problem: once you can run powerful models locally, the unsafe part isn't the math — it's everything around it:

silent model drift
"works on my machine" claims
unbounded tool access (files/network)
unverifiable outputs and "trust me bro" deployments

We make that safer by design, using three ideas:

1) Proof before power (verifiable math correctness)

We don't ask you to trust the implementation. We give you a one-command way to prove it locally.

make prove-lite verifies the core forward-pass math (HF block0 oracle) + Tier 0 sanity.
make prove verifies full 32-block parity against HuggingFace + Tier 0/1 gates.
make prove-agent adds Tier 2 tool-use verification (sandbox + receipts).

This means contributors can reproduce the same claims with the same scripts, not vibes.

2) Receipts everywhere (tamper-evident execution)

Every meaningful run can emit receipts (JSON) that bind:

input prompt + config
model identifiers (paths/hashes)
routing decisions (depth/backends)
timings
file outputs (SHA256)
tool calls (what ran, with what args)

Receipts turn "it worked" into "here is exactly what happened." That makes debugging and security audits tractable.

3) Capability-gated tool use (sandboxed by default)

When you let a model use tools, the model becomes an actor. That's where safety goes off the rails if you don't lock things down.

Tier 2 adds a tool-use acceptance suite that runs only inside an isolated sandbox:

writes are restricted to a workspace directory
tool calls are allowlisted per scenario
runs are time-bounded
files touched are hashed into the receipt

So you can prove: "this agent can perform real tasks without being able to spray writes all over the machine or phone home."

Smaller, same behavior: the honest version

CDNA is a fidelity dial. It can compress model shards and still preserve behavior to a chosen threshold.

We treat "same" as measurable gates, not a promise:

Tier 0: shard/shape/build sanity
Tier 1: logit similarity vs HF oracle (cosine/top-K)
Tier 2: behavioral + tool-use acceptance (task success + sandbox compliance)

If it passes the gates, it's "same enough" for that tier — with receipts to prove it.

We're trying to make "AI OS" mean "auditable runtime" not "mysterious black box that can delete your home directory."

Quick Start (7 Minutes to PASS)

Clone and verify math correctness:

git clone https://github.com/voidstr3m33/helix-cdc.git
cd helix-cdc

# Fast verification (Block0 oracle + Tier 0)
make prove-lite

# Full suite (HF oracle + fidelity gates)
make prove

Expected output:

=== HF Block0 Oracle ===
VERDICT: PASS (cosine=1.0 all checkpoints)

=== HF Full Oracle (32 blocks) ===
VERDICT: PASS (logits cosine=0.99999, top-5 match)

=== CDNA Fidelity Gate (Tier 0/1) ===
Tier 0 Result: PASS
Tier 1 Result: PASS

PROVE DONE ✅

Legacy proofs (optional):

# Block-level deterministic CDC
python3 scripts/prove_033_real.py

# CPU/GPU hardware routing
python3 scripts/probe_hw_route.py

# Compressed-domain computing (98× speedup)
python3 scripts/bench_cc_receipt.py

# Symbolic Entropy (internal metric)
python3 scripts/se_receipt.py

See: REPRODUCE.md for full instructions

Verification Policy

We don't re-prove on demand. We ship receipts and a witness pack.

To verify:

tar xzf witness_pack.tgz
cd witness_pack
./reproduce.sh

Expected: Same superglyph_id, same plan_sha256, deterministic sha256

See: VERIFICATION_POLICY.md for full policy See: FINISH_LINE_COMPLETE.md for technical details

The receipts are court-ready. Run the witness pack. Full stop.

✅ PROVEN: Model Compression Pipeline (2026-01-25)

This is the production-ready workflow with verified receipts.

Compress a GGUF Model

# Compress to Hybrid CDNA v2 + outlier sidecars
python3 -m helix_cdc compress \
    --gguf model.gguf \
    --out seeds/my_model/

# Result:
#   seeds/my_model/
#     manifest.json        # Full manifest with tensor metadata
#     cdna/                # CDNA shards (.cdna.hxz files)
#     sidecars/            # HXZO outlier sidecars (.hxzo files)

Proven metrics:

Compression: 2.12x (14GB F16 → 6.6GB CDNA + 34MB sidecars)
Shards: 291/291 OK
Max error with sidecar: 0.0005 (PASS < 0.001 threshold)

Rebuild the Model

# Rebuild GGUF from manifest
python3 -m helix_cdc rebuild \
    --manifest seeds/my_model/manifest.json \
    --reference original.gguf \    # For 1D tensors (norms, biases)
    --out rebuilt.gguf

Proven metrics:

Tensors: 291/291 reconstructed
Functional: Paris ✓, H2O ✓, 1945 ✓, Pangram ✓

Verify Behavioral Equivalence

# Two-phase teacher-forced verification
python3 -m helix_cdc verify \
    --baseline original.gguf \
    --rebuilt rebuilt.gguf \
    --output receipts/verification/

Proven metrics (teacher-forced, 2026-01-24):

Metric	Threshold	Actual	Status
Teacher in top-100	≥99%	100%	✅ PASS
Teacher logit gap (mean)	small	0.36	✅ near-tie
Top-1 agreement	-	76.6%	⚠️ tie-sensitive
Mean KL	<0.5	0.43	✅ PASS

Verdict: ACCEPTABLE_WITH_TAIL_RISK — Distributions close, teacher always in top-K.

Key Receipts

receipts/fidelity_checks/cdna_shards_f16.sha256
receipts/fidelity_checks/fp8_rebuild_20260107.json
receipts/fidelity_checks/functional_equivalence_20260107.txt
receipts/hybrid_v2_behavioral_teacher/behavioral_gate_teacher_v3.json

Helix Native Inference (Experimental)

⚠️ NOTE (2026-01-25): The "millions:1 compression" claims below were DISPROVEN. DNA seeds expand to pseudo-random tensors, NOT original model weights. See CLAUDE.md for the honest status. Use the CDNA pipeline above for proven compression.

Experimental compressed inference:

python3 scripts/demo_helix_infer.py --prompt "Explain compression"

What it does (aspirational):

Loads from superglyph seed
Regenerates tensors on-demand
Generates receipt for every inference

⚠️ DISPROVEN CLAIMS:

~~2,867,000:1 compression~~ → Actually expansion, not compression
~~Self-contained regeneration~~ → Needs vault or codebook

What's PROVEN instead:

2.12x compression via CDNA (use helix compress above)
Behavioral equivalence verified (teacher 100% in top-K)

See: CLAUDE.md for honest proof status

Use Cases

Regulated AI (Banks, Gov, Healthcare)

Problem: Need auditable model updates with cryptographic proof

Solution: CDC-033 provides:

Per-block MAC validation (fail-closed)
Provenance-bound receipts (git_commit + impl_sha256)
Triple-run determinism (reproducible builds)
Acceptance gates (impl_pinned, determinism_ok, blocks_ratio_ok)

Pilot scope: $50-150k to wire receipt format into model-ops pipeline

Edge/Fleet Ops (Retail, Robots, Kiosks)

Problem: Need minimal-write updates for bandwidth-constrained devices

Solution: HB-001 provides:

Block-level writes (84% reduction)
CPU-only mode (GPU optional)
Automatic hardware routing + fallback
Tiny receipts (<2KB provenance)

Pilot scope: $25-75k for deployment integration

Model Vendors / LLM Platforms

Problem: Need optimization path for large model updates

Solution: CC-098 provides:

98× speedup operating on compressed data
No full decompression required
Block-level CDC avoids full recompress
Receipt-bound provenance for compliance

License: Per-model or per-cluster

Architecture

CDC-033: Block-Level Deterministic CDC

How it works:

Original blocks regenerated from seed (SHAKE256-based)
Writes store XOR delta (patched ⊕ original)
Delta stored as base64 with per-block HMAC-SHA256
MAC validated on read (fail-closed on mismatch)

Security:

Seed never exposed (only SHA256 in receipts)
MAC uses seed as HMAC key (integrity without seed exposure)
Fail-closed validation (no degraded mode)

Evidence:

KAT 1: Triple-run determinism verified
KAT 2: Golden receipt with provenance fields
Receipt: artifacts/attn_o_033_receipt.json

HB-001: CPU/GPU Hardware Routing

How it works:

Detect available hardware (CPU always, GPU if CUDA)
Route operations to fastest available backend
Graceful fallback if GPU unavailable

Benchmarks:

CPU: 0.18s (2048×2048 matmul)
GPU: 0.12s (4096×4096 matmul on Quadro T2000)

Evidence:

Receipt: artifacts/hw_route.receipt.json
Environment: PyTorch 2.5.1+cu121, CUDA 12.1

CC-098: Compressed-Domain Computing

How it works:

Operate on compressed data without full decompression
Base64 vectoring enables operations in compressed space
Block-level CDC avoids full recompress

Benchmarks:

Average speedup: 97.6× (1MB-8MB tests)
Memory reduction: 98%
Compression ratio: ~50:1 maintained

Evidence:

Receipt: artifacts/cc_098_receipt.json

Receipt Schema

Every receipt includes:

{
  "protocol_version": "helix_cdc:v0.1.0",
  "schema_version": "<receipt_type>:v1",
  "timestamp_utc": "2025-10-21T...",
  "claim": {
    "component": "<IP-ID>",
    "description": "...",
    "status": "GREEN"
  },
  "provenance": {
    "git_commit": "49b826a...",
    "impl_sha256": "...",
    "cpu_flags": "avx2,avx,sse4_2",
    "schema_sha256": "...",
    "deterministic_build": true
  },
  "acceptance_gates": {
    "impl_pinned": true,
    "determinism_ok": true,
    "passes": true
  }
}

Security Model

Fail-Closed by Default:

STRICT_MAC_VALIDATION = True
Overlay integrity enforced cryptographically
No silent fallback on MAC failure
See SECURITY.md for full details

Determinism Guarantees:

Same seed + label → same SHA256 (verified)
Environment: PYTHONHASHSEED=0 enforced
SHAKE256 with domain separation

Offline Mode:

No network I/O
No telemetry or analytics
Air-gap compatible

Installation

Requirements:

Python 3.10+
PyTorch 2.0+ (optional, for GPU benchmarks)

Install:

# Clone repo
git clone https://github.com/voidstr3m33/helix-cdc.git
cd helix-cdc

# Optional: Install PyTorch for GPU benchmarks
pip install torch

Verify:

# Run KATs
bash tests/kat/run_kats.sh

# Expected: ✅ All KATs passed (2/2)

Integration Example

from helix_cdc.block_api import _write_block_33, _read_block_33
from helix_cdc.receipts import generate_receipt

# Apply patch to specific block
def apply_patch(capsule, block_idx, modified_data, seed, label):
    # Write with MAC validation
    capsule = _write_block_33(
        capsule,
        block_idx,
        modified_data,
        seed,
        label,
        block_size=32768
    )
    return capsule

# Validate with fail-closed MAC
def read_and_validate(capsule, block_idx, seed, label):
    try:
        block = _read_block_33(capsule, block_idx, seed, label)
        return block
    except OverlayIntegrityError:
        # MAC validation failed - reject
        raise

See SUPPORT.md for more integration examples.

Documentation

Quick Start:

REPRODUCE.md - 3-proof validation guide (HW, CC, SE)
RELEASE_NOTES.md - Full v0.1.2 documentation

Security:

SECURITY.md - Fail-closed MAC, determinism gates, offline mode
LICENSE - Evaluation license terms

Support:

SUPPORT.md - What we support during pilot
GitHub Issues: Bug reports and feature requests

IP & Patents:

IP_REGISTER.md - 14 components documented (confidential)
DEFENSIVE_PUBLICATION.md - Patent strategy (confidential)

SBOM & Notices:

SBOM.cdx.json - Software Bill of Materials (CycloneDX format)
THIRD_PARTY_NOTICES.md - Third-party licenses and notices
These are also copied inside buyer/ for offline review

Signature Verification

We sign artifacts with Ed25519 for provenance and integrity.

Verify signatures:

python3 tools/sign_receipts.py verify --pubkey keys/ed25519_pub.pem
# Expected: "✅ verified: N | ❌ failed: 0"

Public key: Included in keys/ed25519_pub.pem

What's Proven (GREEN)

CDC-033: Block-level deterministic CDC ✅

Per-block MAC validation (fail-closed)
Triple-run determinism verified
Receipt: artifacts/attn_o_033_receipt.json

HB-001: CPU/GPU routing ✅

CPU: 0.18s, GPU: 0.12s (Quadro T2000)
Automatic hardware detection
Receipt: artifacts/hw_route.receipt.json

CC-098: Compressed computing ✅

97.6× average speedup measured
Receipt: artifacts/cc_098_receipt.json

SE-728: Symbolic Entropy (internal metric) ✅

728× improvement with scaffolding
Receipt: artifacts/se_728_receipt.json

FT-001: FlowTorch DLPack braiding ✅

Zero-copy PyTorch↔TensorFlow
Production proven with receipts

What's Wired (AMBER - Optional)

HB-002: Quantum-Classical Bridge 🟡

D-Wave library installed
QUBO solver supports cpu/gpu/qpu backends
Graceful fallback when QPU unavailable

HB-003: TPU/NPU Path 🟡

XLA/JAX integration ready
Graceful fallback when TPU/NPU unavailable

Note: Both bridges fall back to CPU/GPU automatically. Optional hardware support available on request.

Pilot Program

What's included:

4 proofs reproducible in <5 minutes
2 known-answer tests (KATs)
Receipt generators + validation
Integration examples
2-3 buyer-side engineers enabled
Weekly check-ins (30 minutes)

Pricing:

Regulated AI (audit-focused): $50-150k
Edge/Fleet Ops: $25-75k
Model Vendors: License per-model or per-cluster

Contact: [To be provided]

Known Limitations

QPU (HB-002):

D-Wave library installed but no provider job run yet
Stays AMBER until provider receipt captured
Bridge ready, graceful fallback works

TPU/NPU (HB-003):

XLA/JAX not installed
Stays AMBER until XLA run receipt captured
Interface mapped, graceful fallback works

See RELEASE_NOTES.md for full details.

Contributing

This is proprietary software. See LICENSE for evaluation terms.

For bug reports: https://github.com/voidstr3m33/helix-cdc/issues

Credits

Inventor: voidstr3m33 IP Ownership: voidstr3m33 (sole inventor, all rights retained)

See ACKNOWLEDGMENTS.md for development assistance and third-party dependencies.

License

Evaluation License - 90-day evaluation period. See LICENSE file for full terms.

No production use without commercial license. Contact for commercial licensing inquiries.

Pilot inquiries: pilots@helix-cdc.dev (replace with your contact)

Version: v0.1.2 Release Date: 2025-10-21 Git Tag: v0.1.2

Quantum Router with Se Overlay (VALIDATED 2025-10-24)

New: Quantitative control system for hybrid CPU/GPU/QPU routing using Symbolic Entropy (Se = H × C × D).

Key Result: 47-51% runtime improvement with Se-steered backend selection, validated with deterministic receipts.

One-Command Verification

# Full-stack validation (~30 seconds)
./scripts/run_fullstack_validation.sh
jq . artifacts/fullstack/FULL_STACK_REPORT.json

# 3-point Se sweep (~90 seconds)
./scripts/se_sweep_3point.sh
jq . artifacts/se_sweep/SWEEP_SUMMARY.json

# Deterministic replay
python3 tools/receipt_replay.py \
  --receipts artifacts/fullstack/baseline_neal.json,artifacts/fullstack/baseline_dwave.json,artifacts/fullstack/se_steered.json \
  --seed 42 --out artifacts/replay_verify.json

Se Formula

Se = H(X) × C(X) × D(X)

Where:
  H(X) = Shannon entropy (byte-level, 0-8 bits)
  C(X) = Contextual coherence (determinism, 0-1)
  D(X) = Dimensional depth (FibPi3D + graph, 1-72D)

Router Policy (Locked Thresholds)

# tools/quantum_router_se.py

SE_LOW_THRESHOLD = 10.0   # Below: IBM QAOA
SE_HIGH_THRESHOLD = 80.0  # Above: D-Wave SA aggressive

# Routing Rules:
Se < 10     → IBM QAOA (layers=2, shots=400, reads=4)
10 ≤ Se < 80 → D-Wave SA (sweeps=500, reads=8)
Se ≥ 80     → D-Wave SA (sweeps=1000, reads=16)

Validated Claims

1. Se as Control Signal: Se=3.96 → IBM QAOA selection (47% faster than D-Wave SA) Evidence: artifacts/fullstack/se_report.json, artifacts/fullstack/se_comparison.json

2. Runtime Improvement: 47-51% savings vs naive routing Evidence: A/B comparison with 1.7-1.9× efficiency gain

3. Deterministic Receipts: 100% replay match (seed=42, 14/14 receipts) Evidence: All receipts contain receipt_sha256, random_state_chain, provenance

4. Semantic I/O: Whitespace-invariant hashing survives perturbations Evidence: artifacts/fullstack/semantic_diff_*.json

5. Guardian Caps: 0 violations, hard limits enforced Evidence: MAX_QPU_TIME=60s, MAX_NUM_READS=1000, MAX_SWEEPS=500

Documentation

VERIFICATION_BUNDLE.md - Complete receipt inventory + replay protocol
NEXT_STEPS_COMPLETE.md - Validation summary + citation data
EXPERIMENTS_COMPLETE.md - Detailed experiment results

Total validation runtime: <3 minutes for complete reproducibility

License & Weights Policy

Core Generator: Business Source License 1.1 (BSL 1.1) Replay Pack: MIT License

Weights Policy: We do not ship vendor weights. The poetry panel uses your local vault seeds (CDC-compressed models). You are responsible for compliance with the licenses of any third-party model weights you use.

For compressed model seeds, see your ECHO_VAULT directory. Panel receipts include seed hashes (engine_ids.seed_sha256) for provenance.

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

This version

0.2.0

Mar 6, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

helix_cdc-0.2.0.tar.gz (1.3 MB view details)

Uploaded Mar 6, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

helix_cdc-0.2.0-py3-none-any.whl (1.4 MB view details)

Uploaded Mar 6, 2026 Python 3

File details

Details for the file helix_cdc-0.2.0.tar.gz.

File metadata

Download URL: helix_cdc-0.2.0.tar.gz
Upload date: Mar 6, 2026
Size: 1.3 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for helix_cdc-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`8ad4dda3223a2ab1c85d7927771bcedf837e816b7194f048ad7569cc0e28a028`
MD5	`124f485afcba0fa781256b82a1618d13`
BLAKE2b-256	`8a4b01904a7eb82ed835cb181199dbb6c2abebcf2ca8b927f2eab08714ecec9d`

See more details on using hashes here.

File details

Details for the file helix_cdc-0.2.0-py3-none-any.whl.

File metadata

Download URL: helix_cdc-0.2.0-py3-none-any.whl
Upload date: Mar 6, 2026
Size: 1.4 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for helix_cdc-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5685746d4d8453b9e2ba1a922c0ef390935bc07ea591e7deb54156985be5ebb4`
MD5	`86c0478eae451fa0b1913fe0455c1249`
BLAKE2b-256	`1cd025641e28132d825e57d625b2b1d2f10b6e33694b468c93f8c7ce82b48600`

See more details on using hashes here.

helix-cdc 0.2.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Helix CDC - Block-Level Model Patching with Cryptographic Receipts

What It Does

Why this is safer (and not just smaller)

1) Proof before power (verifiable math correctness)

2) Receipts everywhere (tamper-evident execution)

3) Capability-gated tool use (sandboxed by default)

Smaller, same behavior: the honest version

Quick Start (7 Minutes to PASS)

Verification Policy

✅ PROVEN: Model Compression Pipeline (2026-01-25)

Compress a GGUF Model

Rebuild the Model

Verify Behavioral Equivalence

Key Receipts

Helix Native Inference (Experimental)

Use Cases

Regulated AI (Banks, Gov, Healthcare)

Edge/Fleet Ops (Retail, Robots, Kiosks)

Model Vendors / LLM Platforms

Architecture

CDC-033: Block-Level Deterministic CDC

HB-001: CPU/GPU Hardware Routing

CC-098: Compressed-Domain Computing

Receipt Schema

Security Model

Installation

Integration Example

Documentation

Signature Verification

What's Proven (GREEN)

What's Wired (AMBER - Optional)

Pilot Program

Known Limitations

Contributing

Credits

License

Quantum Router with Se Overlay (VALIDATED 2025-10-24)

One-Command Verification

Se Formula

Router Policy (Locked Thresholds)

Validated Claims

Documentation

License & Weights Policy

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes