Asynchronous Self-Healing KV Cache for Silicon-Native LLMs by GDI Nexus
Project description
ASH-KV: Dynamic Attention Steering & KV-Cache Integrity Middleware
ASH-KV (Asynchronous Self-Healing KV Cache) is a high-performance middleware layer designed for Runtime Manifold Integrity Enforcement. It leverages silicon-native kernels to monitor the mathematical uncertainty (Varentropy) of the Attention Manifold and surgically prunes logical drift at the hardware level.
Technical Methodology
Varentropy-Proxy Monitoring
ASH-KV implements a deterministic uncertainty detector by analyzing the mathematical variance across the KV-Cache. This real-time analysis identifies Manifold Collapse—the mathematical state where a model's transition probability distribution becomes unstable—allowing for intervention before semantic errors materialize.
Real-Time Attention Steering
When uncertainty exceeds the threshold, ASH-KV executes a Gaussian Manifold Mutation directly within the compute graph.
- Apple Silicon: Leverages
@mx.compileFused Metal kernels for sub-millisecond mutation. - NVIDIA: Implements synchronized PyTorch/CUDA tensor operations via the Hardware Abstraction Layer (HAL).
- Latency: Verified at < 0.9ms on Apple M4 hardware (negligible inference overhead).
Infinite Horizon (NVMe Paging)
To bypass physical VRAM constraints, ASH-KV utilizes an LRU-based paging protocol. Inactive context chunks are offloaded to NVMe storage using zero-copy memory mapping, enabling 100k+ token windows on consumer-grade hardware.
API Reference
protect(model, sensitivity=0.85, critic_model_path=None)
Initializes the ASH-KV Hypervisor for a given neural model.
| Parameter | Type | Default | Description |
|---|---|---|---|
model |
nn.Module |
Required | An MLX or PyTorch model instance. |
sensitivity |
float |
0.85 |
The Varentropy threshold (0.0 to 1.0). |
critic_model_path |
str |
None |
Optional path for ANE-accelerated manifold critics. |
Research & Reproducibility
Our benchmarks use time.perf_counter_ns() to track the exact overhead of the Fused Metal Mutations.
ash-kv install # Platform driver verification
ash-kv benchmark # Unified Latency & Integrity suite
Hardware Abstraction Layer (HAL)
MLXHealer: Fused Metal backends for macOS.CudaHealer: Synchronized tensor backends for NVIDIA/Linux.UniversalTensorCritic: Pure mathematical manifold evaluation.
DISCLAIMER
ASH-KV is a probabilistic reliability layer. It is NOT a substitute for professional clinical or legal judgment. All AI-generated outputs must be verified by qualified human professionals.
© 2026 GDI Nexus Software Solutions LLP. All rights reserved.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file mlx_ash_kv-8.3.2.tar.gz.
File metadata
- Download URL: mlx_ash_kv-8.3.2.tar.gz
- Upload date:
- Size: 15.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a7c001211e92c873e726911006579d5c14420c283591d1b2e71e2ca0caf378d7
|
|
| MD5 |
7cf977dec4a45f6ba425a8ee5270efee
|
|
| BLAKE2b-256 |
14db869ede9e6843f0d7f1bf9ad01338285e286d8907ff22ced2c79423c607da
|
File details
Details for the file mlx_ash_kv-8.3.2-py3-none-any.whl.
File metadata
- Download URL: mlx_ash_kv-8.3.2-py3-none-any.whl
- Upload date:
- Size: 16.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f4c7686c7eb13453e1c032942a93587abd9bbf309911cb1676072f73801112b4
|
|
| MD5 |
4d6005bd01d3658732866bda5af2633d
|
|
| BLAKE2b-256 |
dbf795d06aae3e4ad16b53ec441f70fe0ac136791cbd7a9fe5a4ba2e1e454d10
|