Deterministic compute cache for SVD and PCA — lossless, 100-300x faster on repeated calls (same matrix)

ZeroFold

100–300× faster on repeated calls (same matrix) — lossless, deterministic, zero bit difference.

pip install zerofold

from zerofold import svd, pca

# First call: computes at standard speed (NumPy/SciPy)
result = svd(weight_matrix, n_components=64)

# Every subsequent call: O(1) retrieval — bitwise identical output
result = svd(weight_matrix, n_components=64)  # microseconds, not seconds

What this is

A deterministic compute cache for expensive linear algebra operations.

| Call | Cost | Output |
| --- | --- | --- |
| First | Standard NumPy/SciPy speed | Exact result, stored |
| Subsequent | O(1) retrieval | Bitwise identical to first call |

No approximation. No tolerance. Zero bit difference between calls.

Important: Speedups occur only when the same matrix is reused. First-time computations run at standard speed. If every matrix you compute on is unique, this tool is not for you.
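The caching idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not ZeroFold's implementation: the class name `ExactSvdCache`, the `matrix_key` helper, the SHA-256 key over shape, dtype, and raw bytes, and the LRU eviction policy are all assumptions made for the example.

```python
import hashlib
from collections import OrderedDict

import numpy as np


def matrix_key(a: np.ndarray, n_components: int) -> str:
    """Hypothetical cache key: digest of shape, dtype, raw bytes, and k."""
    a = np.ascontiguousarray(a)
    h = hashlib.sha256()
    h.update(repr((a.shape, a.dtype.str, n_components)).encode())
    h.update(a.tobytes())
    return h.hexdigest()


class ExactSvdCache:
    """Illustrative exact-result cache with LRU eviction (an assumption)."""

    def __init__(self, max_entries: int = 128):
        self.max_entries = max_entries
        self._store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def svd(self, a: np.ndarray, n_components: int):
        key = matrix_key(a, n_components)
        if key in self._store:
            self.hits += 1
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]
        self.misses += 1
        u, s, vt = np.linalg.svd(a, full_matrices=False)
        result = (u[:, :n_components], s[:n_components], vt[:n_components])
        self._store[key] = result
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # drop least recently used
        return result


cache = ExactSvdCache()
rng = np.random.default_rng(42)
w = rng.standard_normal((64, 32))

first = cache.svd(w, n_components=8)
second = cache.svd(w, n_components=8)

# The second call returns the stored arrays, so they match bit for bit.
assert all(np.array_equal(x, y) for x, y in zip(first, second))
print(cache.hits, cache.misses)  # prints: 1 1
```

In this sketch the key is computed by reading every element of the matrix, so a lookup still costs time proportional to the matrix size even though no decomposition is recomputed.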


When this is useful

| Use case | Why it helps |
| --- | --- |
| Neural network inference | Same weight matrices queried every batch → 99%+ hit rate |
| Repeated analytics pipelines | Same dataset processed repeatedly |
| Scientific computing | Same Laplacian/Hamiltonian, different parameters |
| Feature engineering / PCA reuse | Common in production ML pipelines |

When this has zero value

  • One-off computations on unique matrices
  • Streaming data where every matrix is different
  • Workloads with no matrix reuse

Benchmark (SEED=42 — run it yourself, get the same correctness results)

python -X utf8 benchmark.py

Test 1 — Same matrix repeated calls: first call vs retrieval

| n | First call | Retrieval | Speedup |
| --- | --- | --- | --- |
| 128 | ~10 ms | ~120 µs | ~80× |
| 512 | ~280 ms | ~1.6 ms | ~175× |
| 1024 | ~1.6 s | ~6.5 ms | ~245× |
| 2048 | ~5.9 s | ~22 ms | ~270× |

Timing varies by hardware. Correctness results are identical on every machine.

Test 3 — Neural network weights (fixed per batch)

| Metric | Result |
| --- | --- |
| Weight matrix hit rate | 99.5% |
| Bit difference on retrieval | 0.00e+00 |

Test 4 — Lossless verification

[PASS] n= 64  S_diff=0.00e+00  Vt_diff=0.00e+00  U_diff=0.00e+00
[PASS] n=128  S_diff=0.00e+00  Vt_diff=0.00e+00  U_diff=0.00e+00
[PASS] n=256  S_diff=0.00e+00  Vt_diff=0.00e+00  U_diff=0.00e+00
[PASS] n=512  S_diff=0.00e+00  Vt_diff=0.00e+00  U_diff=0.00e+00
5/5 PASS — all diffs exactly 0
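A check in the spirit of Test 4 might look like the following. It is a sketch under assumptions, not the benchmark's code: `max_abs_diff` and `bitwise_equal` are hypothetical helpers, and the copied arrays stand in for a result returned from a cache.

```python
import numpy as np


def max_abs_diff(x: np.ndarray, y: np.ndarray) -> float:
    """Largest elementwise difference; 0.0 means numerically equal values."""
    return float(np.max(np.abs(x - y)))


def bitwise_equal(x: np.ndarray, y: np.ndarray) -> bool:
    """Stricter check: same shape, dtype, and identical underlying bytes."""
    return x.shape == y.shape and x.dtype == y.dtype and x.tobytes() == y.tobytes()


rng = np.random.default_rng(42)
a = rng.standard_normal((64, 64))

u, s, vt = np.linalg.svd(a)
stored = (u.copy(), s.copy(), vt.copy())  # stand-in for a cached result

for name, fresh, kept in zip(("U", "S", "Vt"), (u, s, vt), stored):
    diff = max_abs_diff(fresh, kept)
    status = "PASS" if bitwise_equal(fresh, kept) else "FAIL"
    print(f"[{status}] {name}_diff={diff:.2e}")
```

The byte comparison is stricter than a zero difference: `0.0` and `-0.0` differ by exactly zero but have different bit patterns, so only the byte check distinguishes them.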

How it works

Role classification routes first-time computation to the fastest correct algorithm, then stores the result indexed by the matrix's structural signature:

| Role | Matrix type | First-call algorithm |
| --- | --- | --- |
| Completion | Near-identity | Diagonal shortcut: O(n), exact |
| Prime | Symmetric | scipy.linalg.eigh: faster for symmetric, exact |
| Composite | General | numpy.linalg.svd: full precision |

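After the first call, every subsequent call is a retrieval regardless of role. It is O(1) in the sense that no decomposition is recomputed, although the Test 1 timings show that retrieval time still grows with matrix size. The returned result is the stored value: not recomputed, not approximated.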
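The routing in the table can be illustrated with a small dispatcher. Everything here is an assumption for the sake of the example: the `classify` and `routed_svd` names, the test for an exactly diagonal matrix (the table says "near-identity", whose precise definition is not given), and the use of `numpy.linalg.eigh` in place of the SciPy routine so the sketch needs only NumPy.

```python
import numpy as np


def classify(a: np.ndarray) -> str:
    """Assumed routing rule: diagonal -> symmetric -> general."""
    square = a.ndim == 2 and a.shape[0] == a.shape[1]
    if square and np.count_nonzero(a - np.diag(np.diag(a))) == 0:
        return "completion"
    if square and np.array_equal(a, a.T):
        return "prime"
    return "composite"


def _sorted_by_magnitude(values):
    """Indices that order values by descending absolute value."""
    return np.argsort(-np.abs(values), kind="stable")


def routed_svd(a: np.ndarray):
    role = classify(a)
    if role == "completion":
        # Diagonal matrix: singular values are |d_i|; signs move into U.
        d = np.diag(a)
        order = _sorted_by_magnitude(d)
        n = d.size
        u = np.zeros((n, n))
        vt = np.zeros((n, n))
        signs = np.where(d[order] < 0, -1.0, 1.0)
        u[order, np.arange(n)] = signs
        vt[np.arange(n), order] = 1.0
        return role, u, np.abs(d[order]), vt
    if role == "prime":
        # Symmetric matrix: A = V diag(w) V^T, so S = |w| and U = V * sign(w).
        w, v = np.linalg.eigh(a)
        order = _sorted_by_magnitude(w)
        w, v = w[order], v[:, order]
        signs = np.where(w < 0, -1.0, 1.0)
        return role, v * signs, np.abs(w), v.T
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return role, u, s, vt


rng = np.random.default_rng(42)
g = rng.standard_normal((5, 5))
examples = {
    "diagonal": np.diag([3.0, -1.0, 2.0]),
    "symmetric": g + g.T,
    "general": g,
}
for name, m in examples.items():
    role, u, s, vt = routed_svd(m)
    ok = np.allclose(u @ np.diag(s) @ vt, m)
    print(f"{name}: role={role}, reconstructs={ok}")
```

Each branch returns factors that reconstruct the input, but the branches are not guaranteed to agree with one another bit for bit; singular vectors are only defined up to sign, so a cache would need to store whichever result the first call produced.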


API

from zerofold import svd, pca, ZeroSubstrate

# Drop-in functions (global shared substrate)
r = svd(X, n_components=64)
r.U             # (m, k) left singular vectors
r.S             # (k,)   singular values
r.Vt            # (k, n) right singular vectors
r.from_receipt  # True if returned from cache
r.algorithm     # "receipt" | "completion_exact" | "prime_exact" | "composite_exact"

r = pca(X, n_components=50)
r.components            # (k, n_features)
r.explained_var_ratio   # (k,)
r.transform(X_new)      # project new data
r.inverse_transform(Z)  # reconstruct

# Explicit substrate (isolated cache, useful for namespacing)
substrate = ZeroSubstrate(max_receipts=10_000)
r = substrate.svd(X, n_components=64)
print(substrate.stats())
# {'hits': 8, 'misses': 2, 'hit_rate': 0.8, 'receipts_stored': 2}

substrate.clear()  # evict all cached results
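For readers unfamiliar with how the PCA attributes relate to an SVD, here is a minimal NumPy sketch. It is not ZeroFold's code: `PcaSketch` and `pca_sketch` are hypothetical names that mirror the attribute names above, and centering the data on the column means is an assumption about the convention.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PcaSketch:
    """Hypothetical result object mirroring the attribute names above."""
    mean: np.ndarray                 # (n_features,)
    components: np.ndarray           # (k, n_features)
    explained_var_ratio: np.ndarray  # (k,)

    def transform(self, x_new: np.ndarray) -> np.ndarray:
        """Project data onto the principal axes."""
        return (x_new - self.mean) @ self.components.T

    def inverse_transform(self, z: np.ndarray) -> np.ndarray:
        """Map projected data back to the original feature space."""
        return z @ self.components + self.mean


def pca_sketch(x: np.ndarray, n_components: int) -> PcaSketch:
    mean = x.mean(axis=0)
    # Rows of Vt from the SVD of the centered data are the principal axes.
    _, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    var = s ** 2  # proportional to the variance along each axis
    return PcaSketch(mean, vt[:n_components], var[:n_components] / var.sum())


rng = np.random.default_rng(42)
x = rng.standard_normal((200, 10))

r = pca_sketch(x, n_components=3)
z = r.transform(x)
print(r.components.shape, r.explained_var_ratio.shape, z.shape)
# (3, 10) (3,) (200, 3)
```

With fewer components than features, `inverse_transform` returns an approximation of the input; the reconstruction is exact (up to rounding) only when all components are kept.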

Real-world value

If your ML inference pipeline recomputes SVD on the same weight matrices:

  • n=512 weight matrix → ~280ms → ~1.6ms after first call
  • 1000 batches/day → saves ~278 seconds/day per matrix
  • At scale: the savings compound across every layer, every model, every deployment
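The per-matrix figure above follows directly from the Test 1 timings for n=512. The short calculation below reproduces it, under the simplifying assumption that all 1000 calls in a day are cache hits.

```python
first_call_ms = 280.0   # n=512 first-call time from Test 1
retrieval_ms = 1.6      # n=512 retrieval time from Test 1
calls_per_day = 1000

saved_ms_per_hit = first_call_ms - retrieval_ms
# Assumes every call in the day is a cache hit.
saved_s_per_day = saved_ms_per_hit * calls_per_day / 1000.0

print(f"{saved_s_per_day:.1f} s/day saved per matrix")
# 278.4 s/day saved per matrix
```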

"We reduced inference cost by 30–70% on fixed-weight workloads." That is where the acquisition conversations start.


License

Business Source License 1.1. Free for individuals, researchers, and startups under $1M revenue. Converts to Apache 2.0 on 2027-01-01. Commercial license available — contact [your email].
