MemBank: Pointer-based neural activation memory for open-source LLMs

These details have not been verified by PyPI

Project links

Homepage

Project description

MemBank™

Pointer-based neural activation memory for open-source LLMs.

Instead of rescanning transcripts, MemBank stores content-addressed references to neural activations — giving any open-source model persistent, efficient long-term memory.

Built with HGNS (Hierarchical Gradient Number System) by chickenpie347 with contributions from Grok 3 (xAI).

Why MemBank?

Standard LLMs have no persistent memory beyond their context window. The typical workaround — stuffing transcripts back into the prompt — wastes tokens and compute on text the model has already "seen."

MemBank takes a different approach, inspired by how human memory actually works:

Approach	What's stored	Retrieval	Token cost
Transcript stuffing	Raw text	Linear rescan	High — grows with history
Naive RAG	Text chunks + embeddings	Vector search	Medium — chunks per query
MemBank	Activation pointers	HGNS coarse-to-fine	Minimal — only relevant context

The key insight: a pointer to an activation is ~64 bytes. The activation itself is ~16KB. You only need the full tensor for the final re-ranking step.

Architecture

User Query
    │
    ▼
ModelAdapter.encode()          ← any HF transformer or mock
    │
    ▼
HGNSHierarchy.compress()       ← three resolution levels
    │                             Level 0: full dim  (e.g. 256-dim)
    │                             Level 1: ~33% dim  (e.g. 84-dim)
    │                             Level 2: ~8% dim   (e.g. 21-dim)
    ▼
QueryEngine.search()           ← coarse-to-fine drill-down
    │  Step 1: Level 2 → top candidate_k × top_k results (fast)
    │  Step 2: Level 1 → re-rank candidates
    │  Step 3: Level 0 → final top_k ranking (precise)
    ▼
Registry.get() + Buffer.deref()  ← zero-copy pointer dereference
    │
    ▼
RecallResult list

Core Components

Module	Responsibility
`core/pointer.py`	`PointerRecord` — SHA256 content-addressed metadata (64B each)
`core/buffer.py`	`MemMapBuffer` — flat memmap file, zero-copy reads
`core/registry.py`	`Registry` — SQLite metadata store, dedup, GC, invalidation
`hgns/gradient.py`	HGNS gradient approximation: `∇f(x) ≈ [f(x+1/kˡ) - f(x)] / (1/kˡ)`
`hgns/hierarchy.py`	`HGNSHierarchy` — three-level compression pipeline
`retrieval/index.py`	`VectorIndex` — FAISS (or numpy fallback) per HGNS level
`retrieval/query.py`	`QueryEngine` — multi-level drill-down search
`adapters/base.py`	`ModelAdapter` — abstract interface (3 methods)
`adapters/hf_adapter.py`	`HuggingFaceAdapter` — forward hook on any HF transformer
`adapters/mock_adapter.py`	`MockAdapter` — deterministic testing without GPU
`membank.py`	`MemBank` — public API: `ingest()`, `recall()`, `integrate()`

Installation

Minimal (numpy only, no model loading):

pip install mambank

With FAISS for fast retrieval:

pip install mambank[retrieval]

With HuggingFace support:

pip install mambank[full]

From source:

git clone https://github.com/chickenpie347/mambank
cd mambank
pip install -e ".[full]"

Quick Start

With MockAdapter (no GPU, for testing)

from mambank import MemBank
from mambank.adapters.mock_adapter import MockAdapter

bank = MemBank(adapter=MockAdapter(hidden_dim=256))

# Ingest — compresses to 3 HGNS levels, stores content-addressed pointers
bank.ingest("The HGNS framework introduces recursive sub-steps between integers.")
bank.ingest("Quantum chaos is tamed by iterative HGNS refinement.")
bank.ingest("MemBank stores activations as zero-copy pointer references.")

# Recall — coarse-to-fine HGNS retrieval
results = bank.recall("HGNS butterfly effect", top_k=3)
for r in results:
    print(f"[{r.rank}] score={r.final_score:.4f}  chunk={r.source_text_hash[:16]}")

# Integrate — recall + ingest in one call (active working memory)
output = bank.integrate("How does HGNS suppress chaotic sensitivity?", top_k=2)
print(output["augmented_prompt"])

With a Real HuggingFace Model

from mambank import MemBank
from mambank.adapters.hf_adapter import HuggingFaceAdapter

# Loads GPT-2 (117M) — replace with any HF model
adapter = HuggingFaceAdapter("gpt2")
adapter.warmup()

bank = MemBank(
    adapter=adapter,
    buffer_path="./my_memory.mmap",    # persistent across sessions
    registry_path="./my_registry.db",  # persistent across sessions
)

bank.ingest("Turn 1: We discussed the HGNS paper on recursive tensor systems.")
bank.ingest("Turn 2: The butterfly effect is suppressed via adaptive resolution.")

results = bank.recall("What did we discuss about HGNS?", top_k=3)

Persistent Memory (Across Sessions)

# Session 1 — build the memory
bank = MemBank(
    adapter=HuggingFaceAdapter("gpt2"),
    buffer_path="./memory.mmap",
    registry_path="./memory.db",
)
bank.ingest("Important fact: HGNS was invented in September 2025.")
bank.close()

# Session 2 — restore and recall
bank = MemBank(
    adapter=HuggingFaceAdapter("gpt2"),
    buffer_path="./memory.mmap",
    registry_path="./memory.db",
)
bank.rebuild_indexes()  # Rebuilds in-memory FAISS indexes from persisted registry
results = bank.recall("When was HGNS invented?", top_k=1)

Conversational Auto-Population

# Every integrate() call adds the new turn AND retrieves relevant past context
for user_message in conversation:
    output = bank.integrate(user_message, top_k=3)

    # output["augmented_prompt"] contains:
    # [MemBank Context]
    # [1] (score=0.9231) chunk:a3f9c2d1e4b5...
    # [2] (score=0.8874) chunk:7bc2f1a3d9e0...
    # [Query]
    # <user_message>

    # Feed output["augmented_prompt"] to your LLM
    response = llm.generate(output["augmented_prompt"])
    bank.ingest(response)  # Also ingest the model's response

API Reference

`MemBank`

MemBank(
    adapter,                          # ModelAdapter instance (required)
    buffer_path="./mambank_buffer.mmap",
    registry_path=":memory:",         # Use a file path for persistence
    buffer_capacity_bytes=128*1024*1024,
    hgns_k=10,                        # HGNS base (graduations per level)
    hgns_gradient_levels=4,           # Gradient recursion depth
    candidate_multiplier=5,           # L2 over-fetch factor
    min_recall_score=0.0,             # Discard results below this score
    auto_validate_adapter=True,
)

Method	Description
`ingest(text, metadata=None)`	Encode, compress, store. Returns `{ptr_ids, is_new, dims, ingest_ms}`
`recall(query, top_k=5, levels=None)`	Find top-k relevant activations. Returns `List[RecallResult]`
`integrate(text, top_k=3)`	`recall()` + `ingest()` in one call. Returns `{query, recalled, augmented_prompt, ingest_result}`
`rebuild_indexes()`	Restore search indexes after reopening a persistent store
`gc()`	Collect dead pointers (ref_count ≤ 0)
`stats()`	Full telemetry dict
`close()`	Flush and close buffer + registry

`RecallResult`

Field	Type	Description
`rank`	`int`	0-indexed position in result list
`final_score`	`float`	Cosine similarity at finest available HGNS level
`score_l0/l1/l2`	`float`	Per-level similarity scores
`ptr_level0/1/2`	`PointerRecord`	Pointer at each level (for direct buffer access)
`source_text_hash`	`str`	SHA256 of original text chunk
`best_ptr`	`PointerRecord`	Finest-resolution pointer available

`ModelAdapter` (implement to add a new model)

from mambank.adapters.base import ModelAdapter
import numpy as np

class MyAdapter(ModelAdapter):

    @property
    def hidden_dim(self) -> int:
        return 768   # your model's embedding dimension

    def model_id(self) -> str:
        return "my-model-v1"   # stable across restarts

    def encode(self, text: str) -> np.ndarray:
        # Run your model, return shape (hidden_dim,) float32 array
        return my_model.encode(text).astype(np.float32)

HGNS Integration

MemBank's compression is built on the Hierarchical Gradient Number System (HGNS) from "A Recursive Framework for Multiscale Tensor Calculations Tames the Butterfly Effect" (chickenpie347, Sep 2025).

The key idea: instead of fixed-width projections (PCA, linear layers), MemBank uses HGNS gradient magnitude to select the most information-dense dimensions of each activation — content-aware, parameter-free, no training required.

HGNS gradient approximation (from the paper):

∇f(x) ≈ [f(x + 1/kˡ) - f(x)] / (1/kˡ)

Multi-level gradient (geometric decay across levels):

∇ᴸf = Σₗ₌₁ᴸ ∂⁽ˡ⁾f / kˡ

Attribute convergence (from Classical__Quantum paper):

v_{n+1} = v_n + η ∇_attr v_n,   stop when ‖v_{n+1} - v_n‖ < ε

The three HGNS levels in MemBank map naturally to the HGNS number representation:

Level 0 (integer part n): full activation, maximum fidelity
Level 1 (first sub-steps m₁/k): sentence-level compressed representation
Level 2 (coarse m₂/k²): topic-level summary, fast initial search

Benchmark Results

Measured on CPU with numpy fallback (no FAISS, no GPU), hidden_dim=256, 500-chunk corpus.

Throughput & Latency

Metric	Value
Ingest throughput	~165 chunks/sec
Recall p50 latency	15.6 ms
Recall p99 latency	26.7 ms

Memory Efficiency (256-dim model)

Level	Dimensions	Size/activation	vs. Full
Level 0 (full)	256	1.0 KB	100%
Level 1 (sentence)	84	336 B	33%
Level 2 (topic)	21	84 B	8%

Installing faiss-cpu drops recall latency to ~1–2 ms for corpora up to 100k entries.

HGNS Level Contribution

Search config	Precision@5 (vs naive RAG)
L2 only (coarse)	8%
L2 + L1 (mid)	12%
L2 + L1 + L0 (full drill-down)	16%

Accuracy improves 2× from coarse-only to full drill-down, confirming the HGNS hierarchy adds retrieval value at each level. Accuracy numbers reflect MockAdapter semantics — real HF models with proper embeddings achieve significantly higher overlap with naive RAG.

Project Structure

mambank/
├── mambank/
│   ├── __init__.py           ← public API: from mambank import MemBank
│   ├── membank.py            ← MemBank main class
│   ├── core/
│   │   ├── pointer.py        ← PointerRecord, content hashing
│   │   ├── buffer.py         ← MemMapBuffer, zero-copy storage
│   │   └── registry.py       ← SQLite metadata, dedup, GC
│   ├── hgns/
│   │   ├── gradient.py       ← HGNS gradient math from paper
│   │   └── hierarchy.py      ← Three-level compression pipeline
│   ├── retrieval/
│   │   ├── index.py          ← VectorIndex (FAISS / numpy)
│   │   └── query.py          ← QueryEngine, coarse-to-fine search
│   └── adapters/
│       ├── base.py           ← Abstract ModelAdapter interface
│       ├── hf_adapter.py     ← HuggingFace transformer adapter
│       └── mock_adapter.py   ← Deterministic testing adapter
├── tests/
│   ├── test_week1.py         ← pointer, buffer, registry (36 tests)
│   ├── test_week2.py         ← HGNS, adapters (140 tests)
│   └── test_week3.py         ← retrieval, MemBank e2e (87 tests)
├── benchmarks/
│   └── benchmark.py          ← throughput, latency, accuracy
├── setup.py
└── README.md

Running Tests

# All tests (requires pytest)
pytest tests/ -v

# Individual weeks
python tests/test_week2.py
python tests/test_week3.py

# Benchmarks
python benchmarks/benchmark.py

Test coverage: 263 tests, 0 failures across pointer system, HGNS hierarchy, model adapters, retrieval, and end-to-end pipeline.

Roadmap

Week 4: PyPI release, CI/CD, GitHub Actions
Async ingest for non-blocking pipeline integration
IVF index auto-upgrade for corpora > 10k entries
Buffer defragmentation / slot reclamation for GC'd pointers
HGNS-guided clustering for topic-level memory organisation
Adapters for vLLM, llama.cpp, Ollama
Multi-modal support (image + text activations)
Quantum hardware implementation (per HGNS paper §5)

Citation

If you use MemBank or HGNS in your research:

@software{membank2025,
  author  = {chickenpie347},
  title   = {MemBank: Pointer-Based Neural Activation Memory for Open-Source LLMs},
  year    = {2025},
  url     = {https://github.com/chickenpie347/mambank},
}

@article{hgns2025,
  author  = {chickenpie347 and {Grok 3 (xAI)}},
  title   = {The Hierarchical Gradient Number System: A Recursive Framework
             for Multiscale Tensor Calculations Tames the Butterfly Effect},
  year    = {2025},
  note    = {arXiv:XXXX.XXXXX [physics.comp-ph]},
}

License

MIT © chickenpie347

Core HGNS mathematics from "The Hierarchical Gradient Number System" papers, 2025.

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

This version

0.1.0a0 pre-release

Feb 27, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mambank-0.1.0a0.tar.gz (35.7 kB view details)

Uploaded Feb 27, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

mambank-0.1.0a0-py3-none-any.whl (12.6 kB view details)

Uploaded Feb 27, 2026 Python 3

File details

Details for the file mambank-0.1.0a0.tar.gz.

File metadata

Download URL: mambank-0.1.0a0.tar.gz
Upload date: Feb 27, 2026
Size: 35.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for mambank-0.1.0a0.tar.gz
Algorithm	Hash digest
SHA256	`62153901982cb5d431bf05427f335b6141c9f9734506a8a4e5f9e385df36716d`
MD5	`87d60aa453f4cd8039de53157083d94b`
BLAKE2b-256	`89d3b523bfac72710406eef43ce5c79f53b54dc8c5d006cac7f46f5c3b843608`

See more details on using hashes here.

File details

Details for the file mambank-0.1.0a0-py3-none-any.whl.

File metadata

Download URL: mambank-0.1.0a0-py3-none-any.whl
Upload date: Feb 27, 2026
Size: 12.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for mambank-0.1.0a0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ab6f18fa08d9a02cf96d312823503dfc4b80ba2aa512897d0dd311a8ca66dce5`
MD5	`ea5a415e059ffbdb1c53d9be05b7cdf6`
BLAKE2b-256	`ce223120ae09f761b95d749ec1a0d4299726a029bde22e1ff90cb2a77a4e1fc8`

See more details on using hashes here.

mambank 0.1.0a0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

MemBank™

Why MemBank?

Architecture

Core Components

Installation

Quick Start

With MockAdapter (no GPU, for testing)

With a Real HuggingFace Model

Persistent Memory (Across Sessions)

Conversational Auto-Population

API Reference

MemBank

RecallResult

ModelAdapter (implement to add a new model)

HGNS Integration

Benchmark Results

Throughput & Latency

Memory Efficiency (256-dim model)

HGNS Level Contribution

Project Structure

Running Tests

Roadmap

Citation

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`MemBank`

`RecallResult`

`ModelAdapter` (implement to add a new model)