Bare-Metal AVX2 Inference Shield for LLMs
PROJECT RESIDUE: Bare-Metal AVX2 Inference Shield for LLMs
A real-time inference optimization tool that aims to cut LLM pre-filtering overhead to near zero by keeping the OS scheduler off the hot path and using branch-predicted AVX2 gating.
STATUS: V4.2.4 PRODUCTION READY - BARE METAL ISOLATION - REALITY-SYNCHRONIZED
The Origin: The Memory Wall
When massive sensor streams or high-frequency sparse data are processed before they reach the GPU, traditional Python/NumPy logic suffers from severe memory latency. Modern CPUs compute faster than RAM can feed them, so the L1 cache is starved of data and the execution pipeline stalls.
Project Residue solves this by operating as a "Shield" right before the neural network.
By heuristically analyzing the structure, complexity, and sparsity of raw data, Residue decides whether an input block is "dense enough" to wake up the GPU or is "sparse/noise" and can skip execution entirely. To do this without becoming a bottleneck itself, Residue V4.2.4 is written in C++ with AVX2 intrinsics, using low-latency techniques usually reserved for high-frequency trading.
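The gating decision can be sketched in a few lines. This is a minimal, pure-Python illustration assuming the max-abs rule described under Predicted Gating: a frame whose peak absolute value reaches a threshold is forwarded, anything quieter is skipped. `NOISE_THRESHOLD`, `max_abs`, and `should_wake_gpu` are hypothetical names, not part of Residue's API, and the threshold value is arbitrary.

```python
from typing import Sequence

# Hypothetical threshold (arbitrary value): frames whose peak magnitude stays below it count as noise.
NOISE_THRESHOLD = 0.05


def max_abs(frame: Sequence[float]) -> float:
    """Peak absolute value in the frame; 0.0 for an empty frame."""
    return max((abs(x) for x in frame), default=0.0)


def should_wake_gpu(frame: Sequence[float], threshold: float = NOISE_THRESHOLD) -> bool:
    """True if the frame is 'dense enough' to forward to the model, False if it can be skipped."""
    return max_abs(frame) >= threshold
```

For example, a 1024-sample frame of zeros is skipped, while the same frame with a single sample of -0.9 is forwarded, because the gate looks at magnitude rather than sign.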
V4.2 Architecture Features (Reality-Synchronized Engine)
V4.2 aims to close the gap between the lab and production: it accounts for real-world bottlenecks such as NUMA topology, thermal throttling, and Python GC pauses, with the goal of making Residue an industrial-grade engine.
1. The Hardened Isolation Zone (OS Bypass + NUMA)
Residue keeps the operating system's scheduler out of the hot path as far as the platform allows, and degrades safely when a technique is unavailable.
- Deterministic Memory Cascade: a 3-tier memory-locking strategy (`VirtualLock` -> Huge Pages -> `PrefetchVirtualMemory`) secures the highest memory priority the process is permitted, falling back tier by tier instead of crashing when administrator privileges are missing.
- SMT-Aware Core Pinning: the `AsyncObserver` thread detects Hyper-Threading (SMT) and pins itself to a specific physical core, avoiding contention with its hardware-thread sibling.
- NUMA Topology Hinting: Residue automatically detects multi-socket layouts and logs a `CRITICAL WARNING` if Python's memory allocations sit on a different NUMA node than the worker thread, flagging the latency penalty of cross-node traffic over the Infinity Fabric/QPI interconnect.
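Two of the ideas above can be sketched in portable Python: a fall-through cascade that tries each memory tier in order and reports which one succeeded, and a helper that picks one logical CPU per physical core from Linux-style sibling lists. Both are illustrations under stated assumptions, not Residue's code: the tier callables stand in for the real `VirtualLock`, Huge Pages, and `PrefetchVirtualMemory` calls, and `lock_memory_cascade`, `parse_cpu_list`, and `one_cpu_per_core` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def lock_memory_cascade(tiers: Sequence[Tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Try each memory tier in order and return the name of the first that succeeds.

    A tier's callable returns True on success; returning False or raising OSError
    (e.g. PermissionError when admin rights are missing) falls through to the next tier.
    Returns None if every tier fails, so the caller keeps running with default paging.
    """
    for name, attempt in tiers:
        try:
            if attempt():
                return name
        except OSError:
            continue  # degrade to the next tier instead of crashing
    return None


def parse_cpu_list(text: str) -> List[int]:
    """Parse a Linux-style CPU list such as '0,12' or '0-1' into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists: Dict[int, str]) -> List[int]:
    """From {logical_cpu: thread_siblings_list}, keep the lowest-numbered logical CPU of each physical core."""
    return sorted({min(parse_cpu_list(siblings)) for siblings in sibling_lists.values()})
```

On Linux, each logical CPU's sibling string can be read from `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`, and a chosen CPU can be applied to the calling process with `os.sched_setaffinity(0, {cpu})`; how Residue itself pins its worker on each platform is not documented here.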
2. Predicted Gating (Vectorized Full-Scan)
Data-dependent if/else checks for noise detection stall the instruction pipeline whenever the branch predictor guesses wrong. Residue V4.2 instead uses a prediction-friendly approach:
- Vectorized Full-Scan Gate: instead of relying on heuristic sampling probes, the engine uses `_mm256_max_ps` to compute the max-abs value of every float in the frame (1024 floats) in roughly 71 cycles. Because every value is inspected, a frame containing an above-threshold value is never skipped (zero false negatives).
- Static Branch Prediction: the C++20 `[[likely]]`/`[[unlikely]]` attributes steer the compiler's static code layout toward the common path. Dispatching through a direct branch rather than an indirect V-table call gives the CPU's branch predictor a pattern it learns quickly and avoids the roughly 20-cycle misprediction penalty that indirect calls can incur.
- The Result: 2,178,336 FPS on highly sparse (99%) data, a 14.67x speedup over the dense AVX2 baseline (see Performance Validation).
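The full-scan gate can be emulated in Python to show the data flow of an AVX2 max-abs reduction: eight running lane maxima (one 256-bit register of float32), updated once per 8-float load, then a horizontal max at the end. For a 1024-float frame that is 128 vector steps. This is a readable model, not the engine's C++; in real AVX2 code the absolute value is typically taken by clearing the sign bit (for example `_mm256_andnot_ps` with a `-0.0f` mask) and the lane update is `_mm256_max_ps`. `max_abs_8lane` is a hypothetical name.

```python
from typing import List, Sequence

LANES = 8  # a 256-bit AVX2 register holds 8 float32 values


def max_abs_8lane(frame: Sequence[float]) -> float:
    """Emulate an AVX2-style max-abs reduction over a frame whose length is a multiple of 8."""
    if len(frame) % LANES:
        raise ValueError("frame length must be a multiple of 8 (e.g. 1024)")
    acc: List[float] = [0.0] * LANES            # like _mm256_setzero_ps()
    for base in range(0, len(frame), LANES):     # one 256-bit load per iteration
        for lane in range(LANES):
            # abs() stands in for clearing the sign bit; max() for _mm256_max_ps
            acc[lane] = max(acc[lane], abs(frame[base + lane]))
    return max(acc)                              # horizontal reduction across the 8 lanes
```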
3. Asynchronous Lock-Free Ingestion & Intelligent Wait
Python acts purely as a data pipe, completely decoupled from the C++ worker logic.
- SPSC Ring Buffers: Python pushes data through a lock-free single-producer single-consumer (SPSC) queue. `recommended_push_size()` returns a chunk size chosen to minimize cache-line (MESI) coherency traffic between producer and consumer.
- Atomic Telemetry & Backpressure: the background C++ thread reports real-time metrics (FPS, incoming sparsity %, buffer fill level %, frames dropped). Python can back off dynamically when `backpressure_active` becomes true.
- Adaptive Exponential Backoff: if the buffer empties (e.g. during a Python GC pause), the C++ worker escalates from spin-waiting with `_mm_pause` to yielding with `SwitchToThread` instead of spinning at 100% while idle. This limits the heat buildup that leads to thermal throttling and helps keep Turbo Boost headroom available.
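The ring-buffer bookkeeping, fill-level backpressure, and escalating backoff described above can be sketched as follows. Python has no lock-free atomics, so this single-threaded model only shows the index arithmetic and decision logic; a C++ SPSC queue would keep `head`/`tail` in `std::atomic` counters with acquire/release ordering on separate cache lines. `SpscRing`, the 80% backpressure threshold, and `backoff_action` with its 64-spin cap are hypothetical, not Residue's actual values.

```python
from typing import List, Optional, Tuple


class SpscRing:
    """Single-producer/single-consumer ring buffer (index arithmetic only, not thread-safe in Python)."""

    def __init__(self, capacity: int, backpressure_pct: float = 80.0) -> None:
        # Power-of-two capacity lets the slot index be computed with a bit mask.
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self._buf: List[Optional[float]] = [None] * capacity
        self._mask = capacity - 1
        self._head = 0  # total items written; advanced only by the producer
        self._tail = 0  # total items read; advanced only by the consumer
        self._backpressure_pct = backpressure_pct  # hypothetical threshold
        self.dropped = 0

    def fill_pct(self) -> float:
        return 100.0 * (self._head - self._tail) / len(self._buf)

    def backpressure_active(self) -> bool:
        return self.fill_pct() >= self._backpressure_pct

    def push(self, item: float) -> bool:
        if self._head - self._tail == len(self._buf):  # full: count a drop instead of blocking
            self.dropped += 1
            return False
        self._buf[self._head & self._mask] = item
        self._head += 1
        return True

    def pop(self) -> Optional[float]:
        if self._tail == self._head:  # empty
            return None
        item = self._buf[self._tail & self._mask]
        self._tail += 1
        return item


def backoff_action(idle_polls: int, max_spins: int = 64) -> Tuple[str, int]:
    """What an idle consumer does after `idle_polls` consecutive empty polls.

    Spin counts double (1, 2, 4, ...), standing in for repeated _mm_pause; beyond
    max_spins the thread yields its time slice instead (e.g. SwitchToThread on Windows).
    """
    spins = 2 ** min(idle_polls, 63)
    if spins <= max_spins:
        return ("spin", spins)
    return ("yield", 0)
```

With a capacity of 4, three pushes put the buffer at 75% fill, a fourth trips the 80% backpressure flag, and a fifth is counted as dropped rather than blocking the producer.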
Quick Start
Installation
Requires a C++17/C++20 compiler with AVX2 support (MSVC on Windows, GCC/Clang on Linux).
```bash
git clone https://github.com/project-residue/residue.git
cd residue
# Build and install the V4.2.4 engine
python setup.py build_ext --inplace
python setup.py install
# OR install directly from PyPI
pip install residue-protocol
```
Advanced Usage (Async Active Observer Mode)
Use this mode to decouple Python ingestion from the C++ processing thread, for example when pipelining data ahead of a PyTorch model:
```python
import time

import numpy as np

from residue.core import AsyncObserver, print_isolation_report

# 1. Check OS-bypass telemetry (SMT detection, memory tiers)
print_isolation_report()

# 2. Spawn the background C++ worker thread
observer = AsyncObserver(frame_size=1024, buffer_capacity_frames=10_000)
observer.start()  # Enters the Isolation Zone

try:
    # 3. Push data from Python without blocking, in recommended batch sizes
    data = np.random.randn(500 * 1024).astype(np.float32)
    push_size = observer.recommended_push_size()

    # Chunks of this size are meant to minimize MESI coherency traffic
    for i in range(0, len(data), push_size):
        chunk = data[i:i + push_size]
        observer.push_data(chunk, len(chunk))

    # Give the worker a moment to drain the buffer before reading telemetry
    time.sleep(0.1)

    # 4. Read lock-free telemetry, including backpressure state
    telemetry = observer.poll_telemetry()
    print(f"Processed: {telemetry.total_samples_processed}")
    print(f"Skipped: {telemetry.total_samples_skipped} ({telemetry.sparsity_pct:.1f}%)")
    print(f"FPS: {telemetry.current_fps:.1f}")
    if telemetry.backpressure_active:
        print(f"WARNING: Buffer {telemetry.buffer_fill_pct:.1f}% full!")
    if telemetry.total_frames_dropped > 0:
        print(f"FATAL: Dropped {telemetry.total_frames_dropped} frames!")
finally:
    # 5. Stop the worker, even if pushing or polling raised
    observer.stop()
```
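If the producer can outrun the worker, the push loop can honor backpressure instead of letting frames drop. A minimal sketch, assuming only the `AsyncObserver` methods used in the example above (`recommended_push_size`, `push_data`, and `poll_telemetry` with its `backpressure_active` field); `push_with_backpressure`, `pause_s`, and `max_waits` are hypothetical.

```python
import time
from typing import Any


def push_with_backpressure(observer: Any, data: Any, pause_s: float = 0.001, max_waits: int = 1000) -> int:
    """Push `data` in recommended-size chunks, pausing while the worker reports backpressure.

    Returns how many pauses were taken. Once `max_waits` pauses are used up, the loop stops
    waiting and pushes regardless, so the producer never stalls forever; the worker may then
    drop frames (reported as total_frames_dropped).
    """
    push_size = observer.recommended_push_size()
    waits = 0
    for i in range(0, len(data), push_size):
        # Poll the lock-free telemetry and back off while the ring buffer is too full.
        while waits < max_waits and observer.poll_telemetry().backpressure_active:
            time.sleep(pause_s)
            waits += 1
        chunk = data[i:i + push_size]
        observer.push_data(chunk, len(chunk))
    return waits
```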
Performance Validation
Tested on an AMD Ryzen 9 5900X (DDR4-3600).
Benchmark script: `tests/test_dispatch_benchmark.py`
| Sparsity (Silence) | Mode | Peak Throughput | Speedup vs. Dense Baseline |
|---|---|---|---|
| 0% (Dense) | Baseline AVX2 Math | 148,523 FPS | 1.00x |
| 50% (Mixed) | Predicted Gating | 437,130 FPS | 2.94x |
| 90% (Sparse) | Predicted Gating | 1,370,433 FPS | 9.23x |
| 99% (Extreme) | Predicted Gating | 2,178,336 FPS | 14.67x |
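On sparse inputs, Residue skips the heavy math for frames the gate rejects. Each rejected frame still costs one max-abs scan, a constant amount of work for a fixed 1024-float frame, so throughput rises with sparsity without stalling the pipeline.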
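The table also implies a per-frame cost and an input bandwidth. The snippet below derives them from the published throughput numbers only, assuming 1024 float32 values (4 KiB) per frame as in the Quick Start example; it does not re-run the benchmark.

```python
FRAME_FLOATS = 1024   # frame_size used in the Quick Start example
BYTES_PER_FLOAT = 4   # float32

# Peak throughput (frames per second) copied from the table above.
BENCH_FPS = {
    "0% (Dense)": 148_523,
    "50% (Mixed)": 437_130,
    "90% (Sparse)": 1_370_433,
    "99% (Extreme)": 2_178_336,
}


def summarize(fps: int, baseline_fps: int) -> dict:
    """Speedup vs. baseline, time per frame, and implied float32 input bandwidth."""
    return {
        "speedup": round(fps / baseline_fps, 2),
        "ns_per_frame": round(1e9 / fps, 1),
        "input_gb_per_s": round(fps * FRAME_FLOATS * BYTES_PER_FLOAT / 1e9, 2),
    }


baseline = BENCH_FPS["0% (Dense)"]
for label, fps in BENCH_FPS.items():
    print(label, summarize(fps, baseline))
```

For the 99% case this works out to roughly 459 ns per frame and about 8.92 GB/s of float32 input, while the dense baseline spends about 6.7 µs per frame.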
License
MIT License - Free for commercial and research use.
See LICENSE for details.
Download files
File details
Details for the file residue_protocol-4.2.4.tar.gz.
File metadata
- Download URL: residue_protocol-4.2.4.tar.gz
- Upload date:
- Size: 47.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4a2e5c7cfbe4c3a7a339e40f5f8e3e209ff24cdd76686436457e3822238d360e |
| MD5 | b37c0bbdd866dfe8c9861e967019be30 |
| BLAKE2b-256 | ad12fbd492aea338fb58c9de1c306c20a7a1e1cf2b1563c93fdf3f785e94abe4 |
Provenance
The following attestation bundles were made for residue_protocol-4.2.4.tar.gz:
- Publisher: publish-to-pypi.yml on Orest-gt/residue
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: residue_protocol-4.2.4.tar.gz
- Subject digest: 4a2e5c7cfbe4c3a7a339e40f5f8e3e209ff24cdd76686436457e3822238d360e
- Sigstore transparency entry: 1057469471
- Sigstore integration time:
- Permalink: Orest-gt/residue@e9e209fe67b0af9798b8e66b67638d2459bdd288
- Branch / Tag: refs/tags/v4.2.4
- Owner: https://github.com/Orest-gt
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@e9e209fe67b0af9798b8e66b67638d2459bdd288
- Trigger Event: push
File details
Details for the file residue_protocol-4.2.4-cp312-cp312-win_amd64.whl.
File metadata
- Download URL: residue_protocol-4.2.4-cp312-cp312-win_amd64.whl
- Upload date:
- Size: 122.1 kB
- Tags: CPython 3.12, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0e84d2ade1f8c051921761d0bc01bc44f290ca8617405ba9c5aab501a057c2ea |
| MD5 | f5df1b78c45a3d6d80a27bd5b12cc0e9 |
| BLAKE2b-256 | 25b198474edb2743276a15aa2efa1101e16b2410e27e272a15a6e1d27a97fcee |
Provenance
The following attestation bundles were made for residue_protocol-4.2.4-cp312-cp312-win_amd64.whl:
- Publisher: publish-to-pypi.yml on Orest-gt/residue
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: residue_protocol-4.2.4-cp312-cp312-win_amd64.whl
- Subject digest: 0e84d2ade1f8c051921761d0bc01bc44f290ca8617405ba9c5aab501a057c2ea
- Sigstore transparency entry: 1057469477
- Sigstore integration time:
- Permalink: Orest-gt/residue@e9e209fe67b0af9798b8e66b67638d2459bdd288
- Branch / Tag: refs/tags/v4.2.4
- Owner: https://github.com/Orest-gt
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@e9e209fe67b0af9798b8e66b67638d2459bdd288
- Trigger Event: push
File details
Details for the file residue_protocol-4.2.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: residue_protocol-4.2.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 187.4 kB
- Tags: CPython 3.12, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f34c00a7b4aa9cfa597973c4413fbbfc787542822eff6b67419b1bff5d88a998 |
| MD5 | a6752958d3ab91bc489f75a2434d3ec2 |
| BLAKE2b-256 | e0d7f0e5f2ead9f30d4cf44cf59f06a0ff852bed41c6a3fd407ba5a4905cca03 |
Provenance
The following attestation bundles were made for residue_protocol-4.2.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
- Publisher: publish-to-pypi.yml on Orest-gt/residue
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: residue_protocol-4.2.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: f34c00a7b4aa9cfa597973c4413fbbfc787542822eff6b67419b1bff5d88a998
- Sigstore transparency entry: 1057469481
- Sigstore integration time:
- Permalink: Orest-gt/residue@e9e209fe67b0af9798b8e66b67638d2459bdd288
- Branch / Tag: refs/tags/v4.2.4
- Owner: https://github.com/Orest-gt
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@e9e209fe67b0af9798b8e66b67638d2459bdd288
- Trigger Event: push
File details
Details for the file residue_protocol-4.2.4-cp311-cp311-win_amd64.whl.
File metadata
- Download URL: residue_protocol-4.2.4-cp311-cp311-win_amd64.whl
- Upload date:
- Size: 120.3 kB
- Tags: CPython 3.11, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 391b3f71ca31fbf67d7cb9f201de65b18a4085605454749e05322d2575ca3a52 |
| MD5 | df83709ff855f8e969acc42bc81c0f27 |
| BLAKE2b-256 | 36ab96b6b31408cc16c64dd04cb4c3d369ada1cbd8d91fac4ca337ca2dcc789b |
Provenance
The following attestation bundles were made for residue_protocol-4.2.4-cp311-cp311-win_amd64.whl:
- Publisher: publish-to-pypi.yml on Orest-gt/residue
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: residue_protocol-4.2.4-cp311-cp311-win_amd64.whl
- Subject digest: 391b3f71ca31fbf67d7cb9f201de65b18a4085605454749e05322d2575ca3a52
- Sigstore transparency entry: 1057469474
- Sigstore integration time:
- Permalink: Orest-gt/residue@e9e209fe67b0af9798b8e66b67638d2459bdd288
- Branch / Tag: refs/tags/v4.2.4
- Owner: https://github.com/Orest-gt
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@e9e209fe67b0af9798b8e66b67638d2459bdd288
- Trigger Event: push
File details
Details for the file residue_protocol-4.2.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: residue_protocol-4.2.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 187.8 kB
- Tags: CPython 3.11, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eb35822ce13e80d9cafd1b7558813d0cc49042d50997c0a27fc833374561abb9 |
| MD5 | b2435a64bb53d79966f85d030869f1ff |
| BLAKE2b-256 | 3543f50ead56dad7b7947d053ebb7b5a41787999c17d19da546db2890a620054 |
Provenance
The following attestation bundles were made for residue_protocol-4.2.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
- Publisher: publish-to-pypi.yml on Orest-gt/residue
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: residue_protocol-4.2.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: eb35822ce13e80d9cafd1b7558813d0cc49042d50997c0a27fc833374561abb9
- Sigstore transparency entry: 1057469483
- Sigstore integration time:
- Permalink: Orest-gt/residue@e9e209fe67b0af9798b8e66b67638d2459bdd288
- Branch / Tag: refs/tags/v4.2.4
- Owner: https://github.com/Orest-gt
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@e9e209fe67b0af9798b8e66b67638d2459bdd288
- Trigger Event: push
File details
Details for the file residue_protocol-4.2.4-cp310-cp310-win_amd64.whl.
File metadata
- Download URL: residue_protocol-4.2.4-cp310-cp310-win_amd64.whl
- Upload date:
- Size: 119.4 kB
- Tags: CPython 3.10, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 447197b23e5fd418d5f33f43152580a4537d121f47a88d7998734380dd2da4e2 |
| MD5 | d40e576b32adfde6b70013aecca68c5e |
| BLAKE2b-256 | 2960d8c1ae4e7878f0a110b3d4fc91569dead06e33d1b645169d0d1119731fd5 |
Provenance
The following attestation bundles were made for residue_protocol-4.2.4-cp310-cp310-win_amd64.whl:
- Publisher: publish-to-pypi.yml on Orest-gt/residue
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: residue_protocol-4.2.4-cp310-cp310-win_amd64.whl
- Subject digest: 447197b23e5fd418d5f33f43152580a4537d121f47a88d7998734380dd2da4e2
- Sigstore transparency entry: 1057469485
- Sigstore integration time:
- Permalink: Orest-gt/residue@e9e209fe67b0af9798b8e66b67638d2459bdd288
- Branch / Tag: refs/tags/v4.2.4
- Owner: https://github.com/Orest-gt
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@e9e209fe67b0af9798b8e66b67638d2459bdd288
- Trigger Event: push
File details
Details for the file residue_protocol-4.2.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: residue_protocol-4.2.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 186.9 kB
- Tags: CPython 3.10, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 46b25db7c964c7516293e6085612be567a4b2b284223dc34da8af59e3b5bf5fb |
| MD5 | 3207862d9dd555a39d17af56562c163f |
| BLAKE2b-256 | d4c94ab50d95754e4c876215c8a3e9d022ff0a841066ac0afd610c8de42f43e3 |
Provenance
The following attestation bundles were made for residue_protocol-4.2.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
- Publisher: publish-to-pypi.yml on Orest-gt/residue
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: residue_protocol-4.2.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: 46b25db7c964c7516293e6085612be567a4b2b284223dc34da8af59e3b5bf5fb
- Sigstore transparency entry: 1057469479
- Sigstore integration time:
- Permalink: Orest-gt/residue@e9e209fe67b0af9798b8e66b67638d2459bdd288
- Branch / Tag: refs/tags/v4.2.4
- Owner: https://github.com/Orest-gt
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@e9e209fe67b0af9798b8e66b67638d2459bdd288
- Trigger Event: push