MemBank: Pointer-based neural activation memory for open-source LLMs
Instead of rescanning transcripts, MemBank stores content-addressed references to neural activations — giving any open-source model persistent, efficient long-term memory.
Built with HGNS (Hierarchical Gradient Number System) by chickenpie347 with contributions from Grok 3 (xAI).
Why MemBank?
Standard LLMs have no persistent memory beyond their context window. The typical workaround — stuffing transcripts back into the prompt — wastes tokens and compute on text the model has already "seen."
MemBank takes a different approach, inspired by how human memory actually works:
| Approach | What's stored | Retrieval | Token cost |
|---|---|---|---|
| Transcript stuffing | Raw text | Linear rescan | High — grows with history |
| Naive RAG | Text chunks + embeddings | Vector search | Medium — chunks per query |
| MemBank | Activation pointers | HGNS coarse-to-fine | Minimal — only relevant context |
The key insight: a pointer to an activation is ~64 bytes. The activation itself is ~16KB. You only need the full tensor for the final re-ranking step.
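A quick back-of-envelope makes the gap concrete (the 4096-dim figure below is illustrative — a 4096-dim float32 activation is what gives the ~16 KB number above):

```python
# Rough size comparison: storing a full activation vs. a 64-byte pointer to it.
# Assumes float32 activations; the 4096-dim hidden size is an illustrative choice.
POINTER_BYTES = 64
hidden_dim = 4096
activation_bytes = hidden_dim * 4          # float32 = 4 bytes/dim -> 16384 B (~16 KB)
ratio = activation_bytes // POINTER_BYTES  # how many pointers fit in one activation

print(f"activation: {activation_bytes} B, pointer: {POINTER_BYTES} B, ratio: {ratio}x")
```

At these sizes a pointer is 256× smaller than the tensor it references, which is why deferring the full-tensor read to the final re-ranking step pays off.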
Architecture
```
User Query
    │
    ▼
ModelAdapter.encode()            ← any HF transformer or mock
    │
    ▼
HGNSHierarchy.compress()         ← three resolution levels
    │  Level 0: full dim  (e.g. 256-dim)
    │  Level 1: ~33% dim  (e.g. 84-dim)
    │  Level 2: ~8% dim   (e.g. 21-dim)
    ▼
QueryEngine.search()             ← coarse-to-fine drill-down
    │  Step 1: Level 2 → top candidate_k × top_k results (fast)
    │  Step 2: Level 1 → re-rank candidates
    │  Step 3: Level 0 → final top_k ranking (precise)
    ▼
Registry.get() + Buffer.deref()  ← zero-copy pointer dereference
    │
    ▼
RecallResult list
```
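The drill-down step can be sketched in plain numpy. This is a simplified two-stage version under stated assumptions: the `coarse_to_fine` function, the truncation-based "projection", and the `candidate_k` parameter here are illustrative, not MemBank's internal API.

```python
import numpy as np

def cosine_scores(q, M):
    # Cosine similarity between query q (d,) and every row of M (n, d).
    q = q / (np.linalg.norm(q) + 1e-9)
    M = M / (np.linalg.norm(M, axis=1, keepdims=True) + 1e-9)
    return M @ q

def coarse_to_fine(q_coarse, q_fine, coarse, fine, top_k=3, candidate_k=15):
    # Stage 1: cheap scoring at the coarse (low-dim) level, over-fetching candidates.
    cand = np.argsort(-cosine_scores(q_coarse, coarse))[:candidate_k]
    # Stage 2: precise re-ranking of only the survivors at full resolution.
    final = cosine_scores(q_fine, fine[cand])
    order = np.argsort(-final)[:top_k]
    return cand[order], final[order]

# Toy corpus: 500 full-resolution vectors; the coarse level is a stand-in
# 21-dim truncation (real HGNS selects dimensions by gradient magnitude).
rng = np.random.default_rng(0)
fine = rng.standard_normal((500, 256)).astype(np.float32)
coarse = fine[:, :21]
q = fine[42] + 0.01 * rng.standard_normal(256).astype(np.float32)  # query near row 42
ids, scores = coarse_to_fine(q[:21], q, coarse, fine)
```

The point of the over-fetch (`candidate_k` > `top_k`) is to tolerate ranking error at the coarse level: the cheap pass only has to keep the true neighbours somewhere in its candidate set, and the full-resolution pass fixes the order.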
Core Components
| Module | Responsibility |
|---|---|
| `core/pointer.py` | `PointerRecord` — SHA256 content-addressed metadata (64 B each) |
| `core/buffer.py` | `MemMapBuffer` — flat memmap file, zero-copy reads |
| `core/registry.py` | `Registry` — SQLite metadata store, dedup, GC, invalidation |
| `hgns/gradient.py` | HGNS gradient approximation: ∇f(x) ≈ [f(x + 1/kˡ) − f(x)] / (1/kˡ) |
| `hgns/hierarchy.py` | `HGNSHierarchy` — three-level compression pipeline |
| `retrieval/index.py` | `VectorIndex` — FAISS (or numpy fallback) per HGNS level |
| `retrieval/query.py` | `QueryEngine` — multi-level drill-down search |
| `adapters/base.py` | `ModelAdapter` — abstract interface (3 methods) |
| `adapters/hf_adapter.py` | `HuggingFaceAdapter` — forward hook on any HF transformer |
| `adapters/mock_adapter.py` | `MockAdapter` — deterministic testing without GPU |
| `membank.py` | `MemBank` — public API: `ingest()`, `recall()`, `integrate()` |
Installation
Minimal (numpy only, no model loading):

```bash
pip install mambank
```

With FAISS for fast retrieval:

```bash
pip install "mambank[retrieval]"
```

With HuggingFace support:

```bash
pip install "mambank[full]"
```

From source:

```bash
git clone https://github.com/chickenpie347/mambank
cd mambank
pip install -e ".[full]"
```
Quick Start
With MockAdapter (no GPU, for testing)
```python
from mambank import MemBank
from mambank.adapters.mock_adapter import MockAdapter

bank = MemBank(adapter=MockAdapter(hidden_dim=256))

# Ingest — compresses to 3 HGNS levels, stores content-addressed pointers
bank.ingest("The HGNS framework introduces recursive sub-steps between integers.")
bank.ingest("Quantum chaos is tamed by iterative HGNS refinement.")
bank.ingest("MemBank stores activations as zero-copy pointer references.")

# Recall — coarse-to-fine HGNS retrieval
results = bank.recall("HGNS butterfly effect", top_k=3)
for r in results:
    print(f"[{r.rank}] score={r.final_score:.4f} chunk={r.source_text_hash[:16]}")

# Integrate — recall + ingest in one call (active working memory)
output = bank.integrate("How does HGNS suppress chaotic sensitivity?", top_k=2)
print(output["augmented_prompt"])
```
With a Real HuggingFace Model
```python
from mambank import MemBank
from mambank.adapters.hf_adapter import HuggingFaceAdapter

# Loads GPT-2 (117M) — replace with any HF model
adapter = HuggingFaceAdapter("gpt2")
adapter.warmup()

bank = MemBank(
    adapter=adapter,
    buffer_path="./my_memory.mmap",    # persistent across sessions
    registry_path="./my_registry.db",  # persistent across sessions
)

bank.ingest("Turn 1: We discussed the HGNS paper on recursive tensor systems.")
bank.ingest("Turn 2: The butterfly effect is suppressed via adaptive resolution.")
results = bank.recall("What did we discuss about HGNS?", top_k=3)
```
Persistent Memory (Across Sessions)
```python
from mambank import MemBank
from mambank.adapters.hf_adapter import HuggingFaceAdapter

# Session 1 — build the memory
bank = MemBank(
    adapter=HuggingFaceAdapter("gpt2"),
    buffer_path="./memory.mmap",
    registry_path="./memory.db",
)
bank.ingest("Important fact: HGNS was invented in September 2025.")
bank.close()

# Session 2 — restore and recall
bank = MemBank(
    adapter=HuggingFaceAdapter("gpt2"),
    buffer_path="./memory.mmap",
    registry_path="./memory.db",
)
bank.rebuild_indexes()  # Rebuilds in-memory FAISS indexes from the persisted registry
results = bank.recall("When was HGNS invented?", top_k=1)
```
Conversational Auto-Population
```python
# Every integrate() call adds the new turn AND retrieves relevant past context
for user_message in conversation:
    output = bank.integrate(user_message, top_k=3)
    # output["augmented_prompt"] contains:
    #   [MemBank Context]
    #   [1] (score=0.9231) chunk:a3f9c2d1e4b5...
    #   [2] (score=0.8874) chunk:7bc2f1a3d9e0...
    #   [Query]
    #   <user_message>

    # Feed output["augmented_prompt"] to your LLM
    response = llm.generate(output["augmented_prompt"])
    bank.ingest(response)  # Also ingest the model's response
```
API Reference
MemBank
```python
MemBank(
    adapter,                              # ModelAdapter instance (required)
    buffer_path="./mambank_buffer.mmap",
    registry_path=":memory:",             # Use a file path for persistence
    buffer_capacity_bytes=128 * 1024 * 1024,
    hgns_k=10,                            # HGNS base (graduations per level)
    hgns_gradient_levels=4,               # Gradient recursion depth
    candidate_multiplier=5,               # L2 over-fetch factor
    min_recall_score=0.0,                 # Discard results below this score
    auto_validate_adapter=True,
)
```
| Method | Description |
|---|---|
| `ingest(text, metadata=None)` | Encode, compress, store. Returns `{ptr_ids, is_new, dims, ingest_ms}` |
| `recall(query, top_k=5, levels=None)` | Find top-k relevant activations. Returns `List[RecallResult]` |
| `integrate(text, top_k=3)` | `recall()` + `ingest()` in one call. Returns `{query, recalled, augmented_prompt, ingest_result}` |
| `rebuild_indexes()` | Restore search indexes after reopening a persistent store |
| `gc()` | Collect dead pointers (ref_count ≤ 0) |
| `stats()` | Full telemetry dict |
| `close()` | Flush and close buffer + registry |
RecallResult
| Field | Type | Description |
|---|---|---|
| `rank` | `int` | 0-indexed position in result list |
| `final_score` | `float` | Cosine similarity at finest available HGNS level |
| `score_l0/l1/l2` | `float` | Per-level similarity scores |
| `ptr_level0/1/2` | `PointerRecord` | Pointer at each level (for direct buffer access) |
| `source_text_hash` | `str` | SHA256 of original text chunk |
| `best_ptr` | `PointerRecord` | Finest-resolution pointer available |
ModelAdapter (implement to add a new model)
```python
import numpy as np

from mambank.adapters.base import ModelAdapter

class MyAdapter(ModelAdapter):
    @property
    def hidden_dim(self) -> int:
        return 768  # your model's embedding dimension

    def model_id(self) -> str:
        return "my-model-v1"  # stable across restarts

    def encode(self, text: str) -> np.ndarray:
        # Run your model, return a (hidden_dim,) float32 array
        return my_model.encode(text).astype(np.float32)
```
HGNS Integration
MemBank's compression is built on the Hierarchical Gradient Number System (HGNS) from "A Recursive Framework for Multiscale Tensor Calculations Tames the Butterfly Effect" (chickenpie347, Sep 2025).
The key idea: instead of fixed-width projections (PCA, linear layers), MemBank uses HGNS gradient magnitude to select the most information-dense dimensions of each activation — content-aware, parameter-free, no training required.
HGNS gradient approximation (from the paper):
∇f(x) ≈ [f(x + 1/kˡ) - f(x)] / (1/kˡ)
Multi-level gradient (geometric decay across levels):
∇ᴸf = Σₗ₌₁ᴸ ∂⁽ˡ⁾f / kˡ
Attribute convergence (from the companion Classical/Quantum paper):
v_{n+1} = v_n + η ∇_attr v_n, stop when ‖v_{n+1} - v_n‖ < ε
The three HGNS levels in MemBank map naturally to the HGNS number representation:

- Level 0 (integer part `n`): full activation, maximum fidelity
- Level 1 (first sub-steps `m₁/k`): sentence-level compressed representation
- Level 2 (coarse `m₂/k²`): topic-level summary, fast initial search
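As a concrete sketch of the selection step, the finite-difference gradient can rank dimensions by information density. The probe function `f` and the top-|∇| pick below are illustrative assumptions; MemBank's actual pipeline lives in `hgns/gradient.py` and `hgns/hierarchy.py`.

```python
import numpy as np

def hgns_select(activation, k=10, level=1, keep_frac=0.33):
    # Finite-difference gradient at step 1/k^level, per the HGNS approximation:
    #   ∇f(x) ≈ [f(x + 1/kˡ) − f(x)] / (1/kˡ)
    # f here is a simple per-dimension squared-magnitude probe (an assumption
    # for illustration; the real system derives gradients from the activation map).
    h = 1.0 / (k ** level)
    f = lambda v: v ** 2
    grad = (f(activation + h) - f(activation)) / h  # == 2*activation + h, exactly
    # Keep the dimensions with the largest gradient magnitude.
    n_keep = max(1, int(len(activation) * keep_frac))
    idx = np.sort(np.argsort(-np.abs(grad))[:n_keep])
    return idx, activation[idx]

x = np.array([0.1, -2.0, 0.05, 1.5, -0.2, 3.0], dtype=np.float32)
idx, compressed = hgns_select(x, keep_frac=0.5)
```

With this probe the surviving dimensions are the large-magnitude ones, which is the content-aware, parameter-free behaviour described above: no training, just a gradient estimate per dimension.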
Benchmark Results
Measured on CPU with numpy fallback (no FAISS, no GPU), hidden_dim=256, 500-chunk corpus.
Throughput & Latency
| Metric | Value |
|---|---|
| Ingest throughput | ~165 chunks/sec |
| Recall p50 latency | 15.6 ms |
| Recall p99 latency | 26.7 ms |
Memory Efficiency (256-dim model)
| Level | Dimensions | Size/activation | vs. Full |
|---|---|---|---|
| Level 0 (full) | 256 | 1.0 KB | 100% |
| Level 1 (sentence) | 84 | 336 B | 33% |
| Level 2 (topic) | 21 | 84 B | 8% |
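The per-activation sizes in the table follow directly from float32 storage (4 bytes per dimension):

```python
# Verify the per-level sizes for a 256-dim float32 activation.
BYTES_PER_DIM = 4  # float32
levels = {"L0": 256, "L1": 84, "L2": 21}
sizes = {name: dims * BYTES_PER_DIM for name, dims in levels.items()}
# 256 dims -> 1024 B (1.0 KB), 84 dims -> 336 B, 21 dims -> 84 B
```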
Installing `faiss-cpu` drops recall latency to ~1–2 ms for corpora up to 100k entries.
HGNS Level Contribution
| Search config | Precision@5 (vs naive RAG) |
|---|---|
| L2 only (coarse) | 8% |
| L2 + L1 (mid) | 12% |
| L2 + L1 + L0 (full drill-down) | 16% |
Accuracy improves 2× from coarse-only to full drill-down, confirming the HGNS hierarchy adds retrieval value at each level. Accuracy numbers reflect MockAdapter semantics — real HF models with proper embeddings achieve significantly higher overlap with naive RAG.
Project Structure
```
mambank/
├── mambank/
│   ├── __init__.py          ← public API: from mambank import MemBank
│   ├── membank.py           ← MemBank main class
│   ├── core/
│   │   ├── pointer.py       ← PointerRecord, content hashing
│   │   ├── buffer.py        ← MemMapBuffer, zero-copy storage
│   │   └── registry.py      ← SQLite metadata, dedup, GC
│   ├── hgns/
│   │   ├── gradient.py      ← HGNS gradient math from the paper
│   │   └── hierarchy.py     ← Three-level compression pipeline
│   ├── retrieval/
│   │   ├── index.py         ← VectorIndex (FAISS / numpy)
│   │   └── query.py         ← QueryEngine, coarse-to-fine search
│   └── adapters/
│       ├── base.py          ← Abstract ModelAdapter interface
│       ├── hf_adapter.py    ← HuggingFace transformer adapter
│       └── mock_adapter.py  ← Deterministic testing adapter
├── tests/
│   ├── test_week1.py        ← pointer, buffer, registry (36 tests)
│   ├── test_week2.py        ← HGNS, adapters (140 tests)
│   └── test_week3.py        ← retrieval, MemBank e2e (87 tests)
├── benchmarks/
│   └── benchmark.py         ← throughput, latency, accuracy
├── setup.py
└── README.md
```
Running Tests
```bash
# All tests (requires pytest)
pytest tests/ -v

# Individual weeks
python tests/test_week2.py
python tests/test_week3.py

# Benchmarks
python benchmarks/benchmark.py
```
Test coverage: 263 tests, 0 failures across pointer system, HGNS hierarchy, model adapters, retrieval, and end-to-end pipeline.
Roadmap
- Week 4: PyPI release, CI/CD, GitHub Actions
- Async ingest for non-blocking pipeline integration
- IVF index auto-upgrade for corpora > 10k entries
- Buffer defragmentation / slot reclamation for GC'd pointers
- HGNS-guided clustering for topic-level memory organisation
- Adapters for vLLM, llama.cpp, Ollama
- Multi-modal support (image + text activations)
- Quantum hardware implementation (per HGNS paper §5)
Citation
If you use MemBank or HGNS in your research:
```bibtex
@software{membank2025,
  author = {chickenpie347},
  title  = {MemBank: Pointer-Based Neural Activation Memory for Open-Source LLMs},
  year   = {2025},
  url    = {https://github.com/chickenpie347/mambank},
}

@article{hgns2025,
  author = {chickenpie347 and {Grok 3 (xAI)}},
  title  = {The Hierarchical Gradient Number System: A Recursive Framework
            for Multiscale Tensor Calculations Tames the Butterfly Effect},
  year   = {2025},
  note   = {arXiv:XXXX.XXXXX [physics.comp-ph]},
}
```
License
MIT © chickenpie347
Core HGNS mathematics from "The Hierarchical Gradient Number System" papers, 2025.
File details
Details for the file mambank-0.1.0a0.tar.gz.
File metadata
- Download URL: mambank-0.1.0a0.tar.gz
- Upload date:
- Size: 35.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `62153901982cb5d431bf05427f335b6141c9f9734506a8a4e5f9e385df36716d` |
| MD5 | `87d60aa453f4cd8039de53157083d94b` |
| BLAKE2b-256 | `89d3b523bfac72710406eef43ce5c79f53b54dc8c5d006cac7f46f5c3b843608` |
|
File details
Details for the file mambank-0.1.0a0-py3-none-any.whl.
File metadata
- Download URL: mambank-0.1.0a0-py3-none-any.whl
- Upload date:
- Size: 12.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `ab6f18fa08d9a02cf96d312823503dfc4b80ba2aa512897d0dd311a8ca66dce5` |
| MD5 | `ea5a415e059ffbdb1c53d9be05b7cdf6` |
| BLAKE2b-256 | `ce223120ae09f761b95d749ec1a0d4299726a029bde22e1ff90cb2a77a4e1fc8` |