Skip to main content

MemBank: Pointer-based neural activation memory for open-source LLMs

Project description

MemBank™

Pointer-based neural activation memory for open-source LLMs.

Instead of rescanning transcripts, MemBank stores content-addressed references to neural activations — giving any open-source model persistent, efficient long-term memory.

Built with HGNS (Hierarchical Gradient Number System) by chickenpie347 with contributions from Grok 3 (xAI).


Why MemBank?

Standard LLMs have no persistent memory beyond their context window. The typical workaround — stuffing transcripts back into the prompt — wastes tokens and compute on text the model has already "seen."

MemBank takes a different approach, inspired by how human memory actually works:

Approach What's stored Retrieval Token cost
Transcript stuffing Raw text Linear rescan High — grows with history
Naive RAG Text chunks + embeddings Vector search Medium — chunks per query
MemBank Activation pointers HGNS coarse-to-fine Minimal — only relevant context

The key insight: a pointer to an activation is ~64 bytes. The activation itself is ~16KB. You only need the full tensor for the final re-ranking step.


Architecture

User Query
    │
    ▼
ModelAdapter.encode()          ← any HF transformer or mock
    │
    ▼
HGNSHierarchy.compress()       ← three resolution levels
    │                             Level 0: full dim  (e.g. 256-dim)
    │                             Level 1: ~33% dim  (e.g. 84-dim)
    │                             Level 2: ~8% dim   (e.g. 21-dim)
    ▼
QueryEngine.search()           ← coarse-to-fine drill-down
    │  Step 1: Level 2 → top candidate_k × top_k results (fast)
    │  Step 2: Level 1 → re-rank candidates
    │  Step 3: Level 0 → final top_k ranking (precise)
    ▼
Registry.get() + Buffer.deref()  ← zero-copy pointer dereference
    │
    ▼
RecallResult list

Core Components

Module Responsibility
core/pointer.py PointerRecord — SHA256 content-addressed metadata (64B each)
core/buffer.py MemMapBuffer — flat memmap file, zero-copy reads
core/registry.py Registry — SQLite metadata store, dedup, GC, invalidation
hgns/gradient.py HGNS gradient approximation: ∇f(x) ≈ [f(x+1/kˡ) - f(x)] / (1/kˡ)
hgns/hierarchy.py HGNSHierarchy — three-level compression pipeline
retrieval/index.py VectorIndex — FAISS (or numpy fallback) per HGNS level
retrieval/query.py QueryEngine — multi-level drill-down search
adapters/base.py ModelAdapter — abstract interface (3 methods)
adapters/hf_adapter.py HuggingFaceAdapter — forward hook on any HF transformer
adapters/mock_adapter.py MockAdapter — deterministic testing without GPU
membank.py MemBank — public API: ingest(), recall(), integrate()

Installation

Minimal (numpy only, no model loading):

pip install mambank

With FAISS for fast retrieval:

pip install mambank[retrieval]

With HuggingFace support:

pip install mambank[full]

From source:

git clone https://github.com/chickenpie347/mambank
cd mambank
pip install -e ".[full]"

Quick Start

With MockAdapter (no GPU, for testing)

from mambank import MemBank
from mambank.adapters.mock_adapter import MockAdapter

bank = MemBank(adapter=MockAdapter(hidden_dim=256))

# Ingest — compresses to 3 HGNS levels, stores content-addressed pointers
bank.ingest("The HGNS framework introduces recursive sub-steps between integers.")
bank.ingest("Quantum chaos is tamed by iterative HGNS refinement.")
bank.ingest("MemBank stores activations as zero-copy pointer references.")

# Recall — coarse-to-fine HGNS retrieval
results = bank.recall("HGNS butterfly effect", top_k=3)
for r in results:
    print(f"[{r.rank}] score={r.final_score:.4f}  chunk={r.source_text_hash[:16]}")

# Integrate — recall + ingest in one call (active working memory)
output = bank.integrate("How does HGNS suppress chaotic sensitivity?", top_k=2)
print(output["augmented_prompt"])

With a Real HuggingFace Model

from mambank import MemBank
from mambank.adapters.hf_adapter import HuggingFaceAdapter

# Loads GPT-2 (117M) — replace with any HF model
adapter = HuggingFaceAdapter("gpt2")
adapter.warmup()

bank = MemBank(
    adapter=adapter,
    buffer_path="./my_memory.mmap",    # persistent across sessions
    registry_path="./my_registry.db",  # persistent across sessions
)

bank.ingest("Turn 1: We discussed the HGNS paper on recursive tensor systems.")
bank.ingest("Turn 2: The butterfly effect is suppressed via adaptive resolution.")

results = bank.recall("What did we discuss about HGNS?", top_k=3)

Persistent Memory (Across Sessions)

# Session 1 — build the memory
bank = MemBank(
    adapter=HuggingFaceAdapter("gpt2"),
    buffer_path="./memory.mmap",
    registry_path="./memory.db",
)
bank.ingest("Important fact: HGNS was invented in September 2025.")
bank.close()

# Session 2 — restore and recall
bank = MemBank(
    adapter=HuggingFaceAdapter("gpt2"),
    buffer_path="./memory.mmap",
    registry_path="./memory.db",
)
bank.rebuild_indexes()  # Rebuilds in-memory FAISS indexes from persisted registry
results = bank.recall("When was HGNS invented?", top_k=1)

Conversational Auto-Population

# Every integrate() call adds the new turn AND retrieves relevant past context
for user_message in conversation:
    output = bank.integrate(user_message, top_k=3)

    # output["augmented_prompt"] contains:
    # [MemBank Context]
    # [1] (score=0.9231) chunk:a3f9c2d1e4b5...
    # [2] (score=0.8874) chunk:7bc2f1a3d9e0...
    # [Query]
    # <user_message>

    # Feed output["augmented_prompt"] to your LLM
    response = llm.generate(output["augmented_prompt"])
    bank.ingest(response)  # Also ingest the model's response

API Reference

MemBank

MemBank(
    adapter,                          # ModelAdapter instance (required)
    buffer_path="./mambank_buffer.mmap",
    registry_path=":memory:",         # Use a file path for persistence
    buffer_capacity_bytes=128*1024*1024,
    hgns_k=10,                        # HGNS base (graduations per level)
    hgns_gradient_levels=4,           # Gradient recursion depth
    candidate_multiplier=5,           # L2 over-fetch factor
    min_recall_score=0.0,             # Discard results below this score
    auto_validate_adapter=True,
)
Method Description
ingest(text, metadata=None) Encode, compress, store. Returns {ptr_ids, is_new, dims, ingest_ms}
recall(query, top_k=5, levels=None) Find top-k relevant activations. Returns List[RecallResult]
integrate(text, top_k=3) recall() + ingest() in one call. Returns {query, recalled, augmented_prompt, ingest_result}
rebuild_indexes() Restore search indexes after reopening a persistent store
gc() Collect dead pointers (ref_count ≤ 0)
stats() Full telemetry dict
close() Flush and close buffer + registry

RecallResult

Field Type Description
rank int 0-indexed position in result list
final_score float Cosine similarity at finest available HGNS level
score_l0/l1/l2 float Per-level similarity scores
ptr_level0/1/2 PointerRecord Pointer at each level (for direct buffer access)
source_text_hash str SHA256 of original text chunk
best_ptr PointerRecord Finest-resolution pointer available

ModelAdapter (implement to add a new model)

from mambank.adapters.base import ModelAdapter
import numpy as np

class MyAdapter(ModelAdapter):

    @property
    def hidden_dim(self) -> int:
        return 768   # your model's embedding dimension

    def model_id(self) -> str:
        return "my-model-v1"   # stable across restarts

    def encode(self, text: str) -> np.ndarray:
        # Run your model, return shape (hidden_dim,) float32 array
        return my_model.encode(text).astype(np.float32)

HGNS Integration

MemBank's compression is built on the Hierarchical Gradient Number System (HGNS) from "A Recursive Framework for Multiscale Tensor Calculations Tames the Butterfly Effect" (chickenpie347, Sep 2025).

The key idea: instead of fixed-width projections (PCA, linear layers), MemBank uses HGNS gradient magnitude to select the most information-dense dimensions of each activation — content-aware, parameter-free, no training required.

HGNS gradient approximation (from the paper):

∇f(x) ≈ [f(x + 1/kˡ) - f(x)] / (1/kˡ)

Multi-level gradient (geometric decay across levels):

∇ᴸf = Σₗ₌₁ᴸ ∂⁽ˡ⁾f / kˡ

Attribute convergence (from Classical__Quantum paper):

v_{n+1} = v_n + η ∇_attr v_n,   stop when ‖v_{n+1} - v_n‖ < ε

The three HGNS levels in MemBank map naturally to the HGNS number representation:

  • Level 0 (integer part n): full activation, maximum fidelity
  • Level 1 (first sub-steps m₁/k): sentence-level compressed representation
  • Level 2 (coarse m₂/k²): topic-level summary, fast initial search

Benchmark Results

Measured on CPU with numpy fallback (no FAISS, no GPU), hidden_dim=256, 500-chunk corpus.

Throughput & Latency

Metric Value
Ingest throughput ~165 chunks/sec
Recall p50 latency 15.6 ms
Recall p99 latency 26.7 ms

Memory Efficiency (256-dim model)

Level Dimensions Size/activation vs. Full
Level 0 (full) 256 1.0 KB 100%
Level 1 (sentence) 84 336 B 33%
Level 2 (topic) 21 84 B 8%

Installing faiss-cpu drops recall latency to ~1–2 ms for corpora up to 100k entries.

HGNS Level Contribution

Search config Precision@5 (vs naive RAG)
L2 only (coarse) 8%
L2 + L1 (mid) 12%
L2 + L1 + L0 (full drill-down) 16%

Accuracy improves 2× from coarse-only to full drill-down, confirming the HGNS hierarchy adds retrieval value at each level. Accuracy numbers reflect MockAdapter semantics — real HF models with proper embeddings achieve significantly higher overlap with naive RAG.


Project Structure

mambank/
├── mambank/
│   ├── __init__.py           ← public API: from mambank import MemBank
│   ├── membank.py            ← MemBank main class
│   ├── core/
│   │   ├── pointer.py        ← PointerRecord, content hashing
│   │   ├── buffer.py         ← MemMapBuffer, zero-copy storage
│   │   └── registry.py       ← SQLite metadata, dedup, GC
│   ├── hgns/
│   │   ├── gradient.py       ← HGNS gradient math from paper
│   │   └── hierarchy.py      ← Three-level compression pipeline
│   ├── retrieval/
│   │   ├── index.py          ← VectorIndex (FAISS / numpy)
│   │   └── query.py          ← QueryEngine, coarse-to-fine search
│   └── adapters/
│       ├── base.py           ← Abstract ModelAdapter interface
│       ├── hf_adapter.py     ← HuggingFace transformer adapter
│       └── mock_adapter.py   ← Deterministic testing adapter
├── tests/
│   ├── test_week1.py         ← pointer, buffer, registry (36 tests)
│   ├── test_week2.py         ← HGNS, adapters (140 tests)
│   └── test_week3.py         ← retrieval, MemBank e2e (87 tests)
├── benchmarks/
│   └── benchmark.py          ← throughput, latency, accuracy
├── setup.py
└── README.md

Running Tests

# All tests (requires pytest)
pytest tests/ -v

# Individual weeks
python tests/test_week2.py
python tests/test_week3.py

# Benchmarks
python benchmarks/benchmark.py

Test coverage: 263 tests, 0 failures across pointer system, HGNS hierarchy, model adapters, retrieval, and end-to-end pipeline.


Roadmap

  • Week 4: PyPI release, CI/CD, GitHub Actions
  • Async ingest for non-blocking pipeline integration
  • IVF index auto-upgrade for corpora > 10k entries
  • Buffer defragmentation / slot reclamation for GC'd pointers
  • HGNS-guided clustering for topic-level memory organisation
  • Adapters for vLLM, llama.cpp, Ollama
  • Multi-modal support (image + text activations)
  • Quantum hardware implementation (per HGNS paper §5)

Citation

If you use MemBank or HGNS in your research:

@software{membank2025,
  author  = {chickenpie347},
  title   = {MemBank: Pointer-Based Neural Activation Memory for Open-Source LLMs},
  year    = {2025},
  url     = {https://github.com/chickenpie347/mambank},
}

@article{hgns2025,
  author  = {chickenpie347 and {Grok 3 (xAI)}},
  title   = {The Hierarchical Gradient Number System: A Recursive Framework
             for Multiscale Tensor Calculations Tames the Butterfly Effect},
  year    = {2025},
  note    = {arXiv:XXXX.XXXXX [physics.comp-ph]},
}

License

MIT © chickenpie347

Core HGNS mathematics from "The Hierarchical Gradient Number System" papers, 2025.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mambank-0.1.0a0.tar.gz (35.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mambank-0.1.0a0-py3-none-any.whl (12.6 kB view details)

Uploaded Python 3

File details

Details for the file mambank-0.1.0a0.tar.gz.

File metadata

  • Download URL: mambank-0.1.0a0.tar.gz
  • Upload date:
  • Size: 35.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for mambank-0.1.0a0.tar.gz
Algorithm Hash digest
SHA256 62153901982cb5d431bf05427f335b6141c9f9734506a8a4e5f9e385df36716d
MD5 87d60aa453f4cd8039de53157083d94b
BLAKE2b-256 89d3b523bfac72710406eef43ce5c79f53b54dc8c5d006cac7f46f5c3b843608

See more details on using hashes here.

File details

Details for the file mambank-0.1.0a0-py3-none-any.whl.

File metadata

  • Download URL: mambank-0.1.0a0-py3-none-any.whl
  • Upload date:
  • Size: 12.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for mambank-0.1.0a0-py3-none-any.whl
Algorithm Hash digest
SHA256 ab6f18fa08d9a02cf96d312823503dfc4b80ba2aa512897d0dd311a8ca66dce5
MD5 ea5a415e059ffbdb1c53d9be05b7cdf6
BLAKE2b-256 ce223120ae09f761b95d749ec1a0d4299726a029bde22e1ff90cb2a77a4e1fc8

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page