
Routing Memory (RM)

Lightweight long-term memory for LLM agents via vector-quantized routing.

RM replaces brute-force dense retrieval with a VQ codebook that compresses N items into K centroid buckets. Queries probe only the top-n centroids, rerank by dot product, and return results — achieving 768x per-item compression and 99%+ recall at a fraction of the latency and memory cost.

pip install routing-memory

Quick Start

from rm import RoutingMemory

memory = RoutingMemory()

# Store memories
memory.add("User prefers dark mode for all applications")
memory.add("Meeting with Alice scheduled for March 15 at 2pm")
memory.add("Project deadline is end of Q1 2026")

# Search
results = memory.search("what are the user's UI preferences?", top_k=3)
for r in results:
    print(f"  [{r['score']:.3f}] {r['text']}")

Features

| Feature | Description |
| --- | --- |
| VQ Codebook | MiniBatchKMeans clustering with adaptive K = ceil(N/B_target) |
| Multi-probe retrieval | Query top-n centroids, collect candidates, rerank by dot product |
| Score filtering | Threshold-based filtering saves tokens by dropping low-relevance results |
| Drift detection | Rolling qerr monitoring with automatic alarm when distribution shifts |
| Online adaptation | EMA centroid updates, bucket splits, idle centroid pruning |
| Persistence | SQLite backend for durable storage across sessions |
| Pluggable backends | Swap embedding models or storage engines via clean interfaces |
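
The adaptive-K rule above (K = ceil(N/B_target)) works out as follows; the `adaptive_k` helper and the default bucket target of 64 are illustrative, not the package's internals:

```python
import math

def adaptive_k(n_items: int, b_target: int = 64) -> int:
    """Pick a codebook size so each bucket holds ~b_target items on average."""
    return max(1, math.ceil(n_items / b_target))

print(adaptive_k(5000))  # 79 centroids for 5K items at a bucket target of 64
```

Scaling K with N keeps bucket occupancy roughly constant, so per-query candidate counts stay bounded as the memory grows.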

Architecture

Query ──> Encode ──> Top-n Centroids ──> Collect Candidates ──> Dot-Product Rerank ──> Filter ──> Results
                          |                                           |
                     VQ Codebook                               Score Threshold
                     (K centroids)                               (tau >= 0.3)
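
The pipeline above can be sketched end-to-end with NumPy; the toy codebook, random data, and `search` function below are illustrative, not RM's internals:

```python
import numpy as np

rng = np.random.default_rng(42)
dim, n_items, k = 8, 100, 5

# Unit-normalized item embeddings and a toy codebook of k centroids.
items = rng.normal(size=(n_items, dim))
items /= np.linalg.norm(items, axis=1, keepdims=True)
centroids = rng.normal(size=(k, dim))
centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)

# Each item stores only its nearest-centroid ID.
assign = (items @ centroids.T).argmax(axis=1)

def search(query, n_probes=2, top_k=3, tau=0.0):
    q = query / np.linalg.norm(query)
    # 1. Probe the top-n centroids for the query.
    probe = (centroids @ q).argsort()[::-1][:n_probes]
    # 2. Collect candidates from those buckets only.
    cand = np.where(np.isin(assign, probe))[0]
    # 3. Rerank by dot product, then filter by score threshold.
    scores = items[cand] @ q
    order = scores.argsort()[::-1][:top_k]
    return [(int(cand[i]), float(scores[i])) for i in order if scores[i] >= tau]

hits = search(items[0])  # querying with item 0's own embedding returns item 0 first
```

Only the probed buckets are scored, which is why the candidate set (and hence latency) stays small even as N grows.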

Compression: Each item needs only a 2-byte centroid assignment vs 384x4 = 1536 bytes for dense fp32. That's 768x compression.
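
The compression figure is straightforward arithmetic:

```python
dense_bytes = 384 * 4   # 384-dim fp32 embedding
vq_bytes = 2            # one uint16 centroid ID per item
print(dense_bytes // vq_bytes)  # 768
```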

Recall: With n=4 probes on 5K items: R@5 = 0.9916 (99.2% of dense baseline).

API Reference

RoutingMemory

RoutingMemory(
    db_path="rm_memory.db",      # SQLite path (None for in-memory)
    embedding_model="all-MiniLM-L6-v2",  # any sentence-transformers model
    n_probes=3,                  # centroids to probe per query
    score_threshold=0.3,         # minimum retrieval score
    seed=42,                     # random seed
)

Methods:

| Method | Description |
| --- | --- |
| add(text, item_id=None, metadata=None) | Store a memory item; returns the item ID |
| search(query, top_k=5, threshold=None) | Semantic search; returns a list of dicts |
| search_with_signals(query, top_k=5) | Search with routing signals (confidence, margin, qerr) |
| stats() | Memory statistics (item count, K, compression, drift) |
| codebook_info() | Codebook details (K, dim, Gini, dead codes) |
| save() | Persist codebook state |
| close() | Close storage connection |

Low-level Components

from rm import Codebook, L1Retriever, DriftMonitor

# Direct codebook access
cb = Codebook(dim=384, seed=42)
cb.fit(embeddings, item_ids)
centroid_id, qerr = cb.encode(query_embedding)
conf = cb.conf(query_embedding)
margin = cb.margin(query_embedding)

# Retriever
retriever = L1Retriever(cb, n_probes=4, top_k=10, score_threshold=0.3)
result = retriever.query(query_embedding)  # returns L1Result

# Drift monitor
monitor = DriftMonitor()
alarm = monitor.record(qerr, margin)  # returns DriftAlarm or None
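
A minimal illustration of the rolling-qerr idea behind drift detection; this toy class is not the package's DriftMonitor, and the window and factor are illustrative defaults:

```python
from collections import deque

class RollingDrift:
    """Alarm when the recent mean quantization error exceeds the
    long-run mean by a fixed factor."""

    def __init__(self, window=10, factor=1.5):
        self.recent = deque(maxlen=window)
        self.factor = factor
        self.total = 0.0
        self.count = 0

    def record(self, qerr):
        self.recent.append(qerr)
        self.total += qerr
        self.count += 1
        if self.count < 2 * self.recent.maxlen:
            return False  # still warming up
        baseline = self.total / self.count
        return sum(self.recent) / len(self.recent) > self.factor * baseline

monitor = RollingDrift()
stable = [monitor.record(1.0) for _ in range(30)]   # no alarms on a stable stream
shifted = [monitor.record(2.0) for _ in range(10)]  # alarm fires after the shift
```

A persistent rise in qerr means incoming items sit far from every centroid, which is the signal that the codebook should be refit or adapted online.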

Experiment Suite

RM ships with 13 reproducible experiments (7 hypothesis tests + 6 application benchmarks).

# Run all experiments
python -m rm.experiments.run_all

# Run specific experiments
python -m rm.experiments.run_all --select H1 H2 A4

Results Summary

| Exp | Name | Key Metric | Result |
| --- | --- | --- | --- |
| H1 | Codebook Fundamentals | Fidelity@K=64 | 0.7415 |
| H2 | Retrieval Quality | R@5 (n=4 probes) | 0.9916 |
| H3 | Score-Based Filtering | Savings@tau=0.7 | 59.1% (R@5=0.958) |
| H4 | Adaptive K Heuristics | Best heuristic | sqrtN (lowest Gini) |
| H5 | Drift Detection | Alarm latency | 11 episodes |
| H6 | Multi-Encoder Robustness | RM/Dense ratio spread | 0.0036 |
| H7 | Storage & Latency | Per-item compression | 768x vs fp32 |
| A1 | MS-MARCO Passage Retrieval | R@5 (n=4) | 0.9585 |
| A2 | LoCoMo Conversational Memory | R@5 | 0.9934 |
| A3 | Enrichment Generalization | Delta RM | +0.053 |
| A4 | Million-Scale (1M items) | R@5 | 0.8556 |
| A5 | Pareto Frontier | RM dominant at n>=4 | 91.8% R@5 @ 2.8ms |
| A6 | Bucket Imbalance | Gini (100K, K=256) | 0.3628 |

All experiments use real embeddings (all-MiniLM-L6-v2, d=384) and seed=42.

Project Structure

rm/
  rm/                     # Core package
    __init__.py
    codebook.py           # VQ codebook (MiniBatchKMeans, adaptive K)
    retrieval.py          # Multi-probe retrieval with dot-product rerank
    filtering.py          # Score-based result filtering
    drift.py              # Distribution drift detection
    memory.py             # RoutingMemory high-level API
    embeddings/           # Pluggable embedding backends
      base.py             # Abstract interface
      local.py            # sentence-transformers wrapper
    storage/              # Pluggable storage backends
      base.py             # Abstract interface
      sqlite.py           # SQLite persistence
  experiments/            # 13 reproducible experiments
    run_all.py            # Experiment runner (--select support)
    shared/               # Data generation, plotting utilities
    h1_codebook/ .. h7_storage/   # Hypothesis tests
    a1_msmarco/ .. a6_imbalance/  # Application benchmarks
  tests/                  # pytest test suite
  pyproject.toml          # Package configuration
  LICENSE                 # MIT

Development

git clone https://github.com/AhmetYSertel/routing-memory.git
cd routing-memory
pip install -e ".[dev]"
pytest tests/ -v

Citation

If you use RM in your research, please cite the HGA paper:

@article{sertel2026hga,
  title={Hybrid Governance Architecture: Structured Memory and Adaptive Routing for LLM Agents},
  author={Sertel, Ahmet Yigit},
  year={2026}
}

License

MIT
