Bare-Metal AVX2 Inference Shield for LLMs

PROJECT RESIDUE: Bare-Metal AVX2 Inference Shield for LLMs

A real-time pre-filter for LLM inference pipelines. Residue aims to cut pre-filtering overhead to near zero by keeping the OS scheduler off the hot path and gating input frames with a vectorized AVX2 scan.
STATUS: V4.2.4 PRODUCTION READY - BARE METAL ISOLATION - REALITY-SYNCHRONIZED


The Origin: The Memory Wall

When massive sensor streams or high-frequency sparse data are processed before they reach the GPU, traditional Python/NumPy logic suffers from severe memory latency. Modern CPUs execute instructions faster than RAM can feed them, so the L1 cache starves and the execution pipeline stalls.

Project Residue solves this by operating as a "Shield" right before the neural network.

By analyzing the structure, complexity, and sparsity of raw data via heuristics, Residue decides whether an input block is "dense enough" to wake up the GPU or is sparse noise that should bypass execution entirely. To do this without becoming a bottleneck itself, Residue V4.2.4 is written in C++ with AVX2 intrinsics, using techniques usually reserved for high-frequency trading.


V4.2 Architecture Features (Reality-Synchronized Engine)

V4.2 closes the gap between the lab and production. By taking "real-world" bottlenecks like NUMA architecture, thermal throttling, and Python GC pauses into account, Residue is now an industrial-grade engine.

1. The Hardened Isolation Zone (OS Bypass + NUMA)

Residue keeps the operating system's scheduler off the hot path and degrades safely when a privilege or feature is unavailable.

  • Deterministic Memory Cascade: 3-Tier memory locking strategy (VirtualLock -> Huge Pages -> PrefetchVirtualMemory) guarantees the highest possible memory priority without crashing if admin privileges are missing.
  • SMT-Aware Core Pinning: The AsyncObserver thread detects Hyperthreading and locks itself to a specific physical core, actively avoiding contention with hardware thread siblings.
  • NUMA Topology Hinting: Automatically detects multi-socket layouts and logs a CRITICAL WARNING if the Python memory allocations map to a different CPU node than the worker thread, avoiding latency penalties over the Infinity Fabric/QPI bus.
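
The SMT-aware pinning idea above can be sketched in a few lines of Python. This is a minimal illustration, not Residue's implementation: the topology dictionary format and the function name choose_worker_cpu are hypothetical, and the real engine performs this step inside the C++ worker.

```python
from collections import defaultdict


def choose_worker_cpu(topology, busy_cpus):
    """Pick a logical CPU whose physical core has no busy hardware sibling.

    topology:  {logical_cpu: (package_id, core_id)}  (hypothetical format)
    busy_cpus: set of logical CPUs already used by other hot threads
    Returns a logical CPU id, or None if every physical core is contended.
    """
    # Group logical CPUs (hardware threads) by the physical core they share.
    siblings = defaultdict(list)
    for cpu, core in sorted(topology.items()):
        siblings[core].append(cpu)

    # Walk the physical cores in a stable order and take the first one
    # where none of the sibling threads is already busy.
    for core in sorted(siblings):
        cpus = siblings[core]
        if not any(c in busy_cpus for c in cpus):
            return cpus[0]
    return None


# Two physical cores, each exposing two hardware threads.
topology = {0: (0, 0), 1: (0, 0), 2: (0, 1), 3: (0, 1)}
worker_cpu = choose_worker_cpu(topology, busy_cpus={0})
```

With CPU 0 busy, the sketch skips its sibling CPU 1 and selects CPU 2 on the other physical core. On Linux, the result could then be applied with os.sched_setaffinity(0, {worker_cpu}); how Residue applies the affinity is not shown in this document.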

2. Predicted Gating (Vectorized Full-Scan)

Traditional if/else statements for detecting noise severely punish instruction pipelines due to branch mispredictions. Residue V4.2 utilizes a purely predictive approach:

  • Vectorized Full-Scan Gate: Instead of relying on heuristic sampling probes, the engine uses _mm256_max_ps to compute the Max-Abs value of every single float in the frame (1024 floats) in ~71 cycles. Zero false negatives.
  • Static Branch Prediction: Uses C++20 [[likely]]/[[unlikely]] attributes to guide static code layout. The gate is a direct branch rather than an indirect V-Table call, so the CPU's branch predictor learns it quickly and the roughly 20-cycle penalty of a mispredicted indirect call is avoided.
  • The Result: 2,178,336 FPS throughput on highly sparse data (a 14.67x speedup over the dense AVX2 baseline).
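
The full-scan gate can be illustrated with a NumPy equivalent. This is a sketch of the idea only: the function name gate_frames and the threshold value are assumptions, and the engine itself performs the scan in C++ with _mm256_max_ps rather than NumPy.

```python
import numpy as np


def gate_frames(samples, frame_size=1024, threshold=1e-3):
    """Return a boolean mask: True where a frame is worth processing.

    Every float in every frame is inspected (no sampling probes), so a
    single large value anywhere in a frame is enough to pass the gate.
    The threshold is a hypothetical parameter chosen for illustration.
    """
    samples = np.asarray(samples, dtype=np.float32)
    n_frames = samples.size // frame_size  # a trailing partial frame is ignored
    frames = samples[: n_frames * frame_size].reshape(n_frames, frame_size)
    max_abs = np.abs(frames).max(axis=1)   # one max-abs value per frame
    return max_abs > threshold


# Four frames of silence, with one spike in frame 1 and one in frame 3.
data = np.zeros(4 * 1024, dtype=np.float32)
data[1024 + 5] = 0.5
data[3 * 1024 + 1023] = -2.0
mask = gate_frames(data)
```

Because the maximum is taken over the absolute value of every element, the spike in the last position of frame 3 is caught just as reliably as one near the start, which is the property the "zero false negatives" claim relies on.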

3. Asynchronous Lock-Free Ingestion & Intelligent Wait

Python acts purely as a data pipe, completely decoupled from the C++ worker logic.

  • SPSC Ring Buffers: Python pushes data via a lock-free Single-Producer Single-Consumer queue. recommended_push_size() returns a chunk size chosen to minimize cache-line (MESI) coherency traffic between the producer and consumer cores.
  • Atomic Telemetry & Backpressure: The background C++ thread reports real-time metrics (FPS, incoming sparsity %, buffer fill level %, frames dropped). Python can dynamically back off if backpressure_active turns true.
  • Adaptive Exponential Backoff: If the buffer empties (e.g. during a Python GC pause), the C++ worker gradually relaxes its spin-wait (_mm_pause -> SwitchToThread). This avoids thermal throttling caused by spinning at 100% while idle and keeps Turbo Boost headroom available.
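
The ring-buffer and backpressure behavior described above can be modeled in plain Python. This is a single-threaded sketch under stated assumptions: the class name FrameRing and the high_water_pct parameter are hypothetical, and Python offers none of the atomic operations or memory ordering a real lock-free queue depends on, so it only shows the index arithmetic and the telemetry fields.

```python
import numpy as np


class FrameRing:
    """Single-producer single-consumer ring of fixed-size frames (illustrative)."""

    def __init__(self, frame_size, capacity_frames, high_water_pct=80.0):
        self._buf = np.zeros((capacity_frames, frame_size), dtype=np.float32)
        self._capacity = capacity_frames
        self._head = 0  # total frames written; only the producer advances it
        self._tail = 0  # total frames read; only the consumer advances it
        self._high_water_pct = high_water_pct
        self.frames_dropped = 0

    def _count(self):
        return self._head - self._tail

    def push(self, frame):
        """Producer side: store one frame, or drop it if the ring is full."""
        if self._count() == self._capacity:
            self.frames_dropped += 1
            return False
        self._buf[self._head % self._capacity] = frame
        self._head += 1
        return True

    def pop(self):
        """Consumer side: return the oldest frame, or None if the ring is empty."""
        if self._count() == 0:
            return None
        frame = self._buf[self._tail % self._capacity].copy()
        self._tail += 1
        return frame

    @property
    def fill_pct(self):
        return 100.0 * self._count() / self._capacity

    @property
    def backpressure_active(self):
        return self.fill_pct >= self._high_water_pct


ring = FrameRing(frame_size=4, capacity_frames=2, high_water_pct=50.0)
ring.push(np.ones(4))
```

Keeping head and tail as monotonically increasing counters, each written by exactly one side, is what lets a real SPSC queue avoid locks: the producer never writes the tail and the consumer never writes the head.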

Quick Start

Installation

Requires a C++17/C++20 compiler with AVX2 support (MSVC on Windows, GCC/Clang on Linux).

git clone https://github.com/project-residue/residue.git
cd residue

# Build and Install the V4.2.4 Engine
python setup.py build_ext --inplace
python setup.py install
# OR install directly from PyPI
pip install residue-protocol

Advanced Usage (Async Active Observer Mode)

For separating Python ingestion from the C++ processing thread (useful for pipelining before PyTorch runs):

import numpy as np
import time
from residue.core import AsyncObserver, print_isolation_report

# 1. Check OS Bypass Telemetry (SMT detection, Memory Tiers)
print_isolation_report()

# 2. Spawn Background Worker C++ Thread
observer = AsyncObserver(frame_size=1024, buffer_capacity_frames=10_000)
observer.start()  # Enters Isolation Zone

# 3. Python pushes data Non-Blocking in optimal batches
data = np.random.randn(500 * 1024).astype(np.float32)
push_size = observer.recommended_push_size()

# Push data in chunks that minimize MESI Coherency traffic
for i in range(0, len(data), push_size):
    chunk = data[i:i + push_size]
    observer.push_data(chunk, len(chunk))

# 4. Read Lock-Free Telemetry with Backpressure
telemetry = observer.poll_telemetry()
print(f"Processed: {telemetry.total_samples_processed}")
print(f"Skipped: {telemetry.total_samples_skipped} ({telemetry.sparsity_pct:.1f}%)")
print(f"FPS: {telemetry.current_fps:.1f}")

if telemetry.backpressure_active:
    print(f"WARNING: Buffer {telemetry.buffer_fill_pct:.1f}% full!")
if telemetry.total_frames_dropped > 0:
    print(f"FATAL: Dropped {telemetry.total_frames_dropped} frames!")

# 5. Stop Worker
observer.stop()
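
The example above pushes a single batch and checks telemetry afterwards. For a long-running producer, one option is to check backpressure_active before each push and back off while it is set. The helper below is a sketch, not part of the Residue API: push_with_backoff, max_sleep_s, and the injectable sleep argument are hypothetical, and it relies only on the push_data and poll_telemetry calls shown above.

```python
import time


def push_with_backoff(observer, data, push_size, max_sleep_s=0.01, sleep=time.sleep):
    """Push data in chunks, sleeping with exponential backoff under backpressure.

    observer must provide push_data(chunk, n) and poll_telemetry(), where the
    returned telemetry object has a backpressure_active attribute.
    Returns the number of samples pushed.
    """
    pushed = 0
    for start in range(0, len(data), push_size):
        delay = 0.0
        # Wait until the consumer has drained enough of the buffer.
        while observer.poll_telemetry().backpressure_active:
            delay = min(max_sleep_s, max(delay * 2, 1e-4))  # 0.1 ms, 0.2 ms, ...
            sleep(delay)
        chunk = data[start:start + push_size]
        observer.push_data(chunk, len(chunk))
        pushed += len(chunk)
    return pushed
```

It would be called as push_with_backoff(observer, data, observer.recommended_push_size()) in place of the plain loop in step 3. The sleep argument exists only so the helper can be exercised without real delays.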

Performance Validation

Tested on an AMD Ryzen 9 5900X (DDR4 3600MHz). Framework: tests/test_dispatch_benchmark.py

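| Sparsity (Silence) | Mode               | Peak Throughput | Execution Speedup |
|--------------------|--------------------|-----------------|-------------------|
| 0% (Dense)         | Baseline AVX2 Math | 148,523 FPS     | 1.00x             |
| 50% (Mixed)        | Predicted Gating   | 437,130 FPS     | 2.94x             |
| 90% (Sparse)       | Predicted Gating   | 1,370,433 FPS   | 9.23x             |
| 99% (Extreme)      | Predicted Gating   | 2,178,336 FPS   | 14.67x            |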

Residue absorbs extreme inputs by skipping mathematical processing on irrelevant or sparse segments. The gate costs a fixed-size scan per frame (constant for a given frame size), so skipped frames do not stall the pipeline.
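
The speedup column is each row's throughput divided by the dense baseline. The short script below reproduces that arithmetic from the figures in the table and also converts throughput to time per frame; the function and variable names are illustrative only.

```python
# Peak throughput in frames per second, copied from the table above.
RESULTS_FPS = {
    "0% (Dense)": 148_523,
    "50% (Mixed)": 437_130,
    "90% (Sparse)": 1_370_433,
    "99% (Extreme)": 2_178_336,
}


def speedups(results, baseline_key="0% (Dense)"):
    """Return {label: speedup over the baseline row}, rounded to 2 decimals."""
    baseline = results[baseline_key]
    return {label: round(fps / baseline, 2) for label, fps in results.items()}


def ns_per_frame(fps):
    """Convert frames per second into nanoseconds spent per frame."""
    return 1e9 / fps


table = speedups(RESULTS_FPS)
for label, fps in RESULTS_FPS.items():
    print(f"{label:14s} {table[label]:6.2f}x  {ns_per_frame(fps):8.1f} ns/frame")
```

The computed ratios match the published column (1.00x, 2.94x, 9.23x, 14.67x).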


License

MIT License - Free for commercial and research use. See LICENSE for details.
