Self-improving memory store for LLM agents: hybrid retrieval, clustered retrieval-induced forgetting, optional LLM enrichment

These details have not been verified by PyPI

Project description

lethe

Λήθη: the ancient Greek personification of forgetfulness, and one of the five rivers of the underworld.

A memory store for LLM agents. Hybrid BM25 + dense retrieval, cross-encoder reranking, clustered retrieval-induced forgetting (RIF), and an optional LLM enrichment layer at write time.

Read this before the benchmark numbers

Most memory-tool READMEs advertise NDCG or recall in the 95 to 99% range. Those numbers come from evaluating on a per-query fresh database of ~50 sessions, at session granularity, with recall@5. That's a roughly 2000× easier task than "find this specific turn among 200,000". Some implementations additionally leak ground-truth annotations into the index.

lethe's numbers are reported on the full 199,509-turn LongMemEval S corpus, turn-level retrieval, NDCG@10, no leakage. Random baseline is 0.005%. Vector-only search gets 0.14. BM25 alone gets 0.24. When other tools are run on this setup, they land in the same 0.1 to 0.4 range. The 99% numbers don't translate.

No 99%. No cherry-picking. These are the honest numbers on a hard benchmark:

Stage	NDCG@10	notes
Hybrid BM25 + vector (RRF)	0.217	basic retrieval (most popular)
+ cross-encoder reranking	0.293	+35% from semantic reranking
+ clustered+gap RIF (checkpoint 13)	0.312	+6.5% from retrieval-induced forgetting
+ LLM enrichment, on covered queries	0.473	+21% on the 75 queries where the answer turn was Haiku-enriched

Enrichment only helps where it's been applied; the full-corpus number is diluted by the uncovered 425 queries. Full methodology in BENCHMARKS.md. The research path, including 10 failed approaches before anything worked, is in RESEARCH_JOURNEY.md.

https://github.com/user-attachments/assets/ea6e9657-7525-4fed-ba40-df7a14e6b9f4

Lethe full stack vs basic hybrid RRF on the 199,509-turn LongMemEval corpus. NDCG@10 over 5,000 queries.

What's new here: retrieval-induced forgetting

RIF is a well-studied phenomenon in cognitive psychology (Anderson, 1994). In plain words:

When you retrieve one thing, related things that could have been retrieved but weren't get quietly suppressed. Ask someone to name a fruit; they say "apple". A few minutes later, "pear" and "orange" are measurably harder for them to recall, even though nothing asked about them. Remembering is also an act of forgetting.

lethe brings this mechanism to vector memory:

Every retrieval surfaces a candidate pool (BM25 ∪ vector top-k).
The cross-encoder picks the winners.
Entries that reached the pool but lost don't disappear. They accumulate a per-cluster suppression score.
On future retrievals in the same query cluster ("travel-like queries", "debugging-like queries"), those entries get their scores penalised before the cross-encoder sees them, freeing slots for entries that were previously crowded out.

The key detail: suppression is cue-dependent, not global. An entry suppressed for travel queries stays fully available for food queries. Global suppression was ~5× weaker in our benchmarks. The clustering is what made it work.

Why this matters: a memory that evolves

Most memory stores are static caches. You put strings in; you retrieve them by similarity. The retrieval function never changes.

RIF makes retrieval behaviour adapt over time without retraining any model:

Entries that keep losing to more relevant alternatives get quieter in their own cluster.
Entries that keep winning get amplified via tier promotion (naive → gc → memory) and exemption from decay.
The system learns "in the context of this kind of query, you usually mean that chunk, not this one" from its own retrieval history. No fine-tuning. No extra LLM calls. Just O(n_clusters × n_entries) bookkeeping.

Concrete example: you store both "I prefer window seats on flights" and "my wife needs aisle seats". Travel queries keep picking your preference first. The system quietly penalises your wife's preference for travel-sounding queries, not globally, just in that cluster. Ask about your spouse's preferences and both entries remain equally findable.

Checkpoint 13 (clustered + rank-gap RIF) gives +6.5% NDCG and +9.5% recall@30 on LongMemEval over the baseline hybrid pipeline. As far as I can find, this is the first implementation of clustered RIF in a production retrieval system.

Claude Code plugin

Lethe ships as a Claude Code plugin that persists memory across sessions per-project:

/plugin marketplace add teimurjan/lethe
/plugin install lethe

Hooks write daily markdown to .lethe/memory/YYYY-MM-DD.md, summarize each turn with claude -p --model haiku (no extra API key), reindex via the lethe CLI, and a memory-recall skill surfaces prior-session context on demand.

See plugins/claude-code/README.md for hook behavior, CLI reference, and local-install instructions.

Quick start

from lethe import MemoryStore
from sentence_transformers import SentenceTransformer, CrossEncoder

store = MemoryStore(
    "./my_memories",
    bi_encoder=SentenceTransformer("all-MiniLM-L6-v2"),
    cross_encoder=CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2"),
)

store.add("I prefer window seats on flights", session_id="trip")
store.add("My wife needs aisle seats", session_id="trip")
store.add("I work at Google as a software engineer", session_id="work")

results = store.retrieve("What are my travel preferences?", k=5)
for entry_id, content, score in results:
    print(f"  [{score:.1f}] {content}")

store.save()
store.close()

Architecture

Query
  │
  ├── FAISS top-30 (dense vector similarity)
  ├── BM25 top-30 (sparse keyword match)
  │
  └── Merge (RRF)
        │
        └── RIF suppression penalty (per-cluster, gap-based)
              │
              └── Cross-encoder rerank → top-k
                    │
                    └── Update suppression state, affinities, tier

Optional write-time LLM enrichment layer (src/lethe/enrichment.py): before indexing, each memory can be processed by an LLM (default claude-haiku-4-5) to produce a gist, 3 anticipated queries, entities, and temporal markers. All fields index alongside the original text; cross-encoder still scores against original. Attacks the vocabulary-mismatch failure mode.

RIF: technical details

The formula that made it work (checkpoint 13 of 17):

competition_strength = max(0, xenc_rank - initial_rank) / pool_size × sigmoid(-xenc_score)
suppression[eid, cluster] += learning_rate × competition_strength
score_adjusted = base_score - alpha × suppression[eid, cluster]

The rank-gap term only penalises entries that dropped in rank between the initial hybrid retrieval and the cross-encoder rerank AND were actively rejected (negative xenc score). Entries that just lost a close race aren't suppressed. That prevents the system from forgetting legitimately useful context because it happened to rank second once.

k-means runs once (30 clusters) on the bi-encoder query embedding at retrieval time; the cluster assignment is the "cue" coordinate. Based on Anderson's inhibition theory (1994) and the SAM competitive-sampling model (Raaijmakers & Shiffrin, 1981).

Three storage layers

Layer	File	Purpose
DuckDB	`lethe.duckdb`	Entries, suppression state, rescue cache, stats. Many project DBs can be `ATTACH`ed simultaneously for cross-project search.
numpy + FAISS	`embeddings.npz`, `faiss.index`	Vector storage
BM25	In-memory, rebuilt on startup	Sparse keyword index

Entry lifecycle (germinal-center inspired)

NAIVE → GC → MEMORY
              ↓
         APOPTOTIC

Naive: new entries, unproven
GC: retrieved 3+ times, actively evaluated
Memory: high affinity + frequently retrieved, stable, exempt from decay
Apoptotic: low affinity + idle > 1000 steps, excluded from search

Useful for long-running agents; doesn't directly improve retrieval quality (that's what RIF and enrichment do).

Deduplication (on add)

Exact: SHA-256 content hash (free)
Near-duplicate: cosine similarity > 0.95 (keeps the longer entry)

Install

From PyPI (the distribution name is lethe-memory; the import name is still lethe):

pip install lethe-memory
# or
uv pip install lethe-memory

As the lethe CLI on PATH without a venv:

uv tool install lethe-memory
lethe --version

From source, for development:

git clone https://github.com/teimurjan/lethe && cd lethe
uv venv --python 3.12 && uv pip install -e .

As a Claude Code plugin (see plugin README):

/plugin marketplace add teimurjan/lethe
/plugin install lethe

Global search across projects

Every lethe index registers the project in ~/.lethe/projects.json. lethe search --all then ATTACHes each registered lethe.duckdb as a read-only schema and runs the full hybrid + RIF + cross-encoder pipeline across the union.

# In each project you care about:
lethe index

# Search across all registered projects:
lethe search "mongodb pool" --all --top-k 5
# [p_site_fix_pentest_8a99] [+4.63] 06e6...  - MongoDB connection pool issue at 100% rollout…
# [p_lethe_77be]            [+3.41] 4670...  - lethe search returning real hybrid results…

# Restrict to a subset:
lethe search "auth flow" --projects p_site_fix_pentest_8a99,p_api_server_12ab

# Inspect / manage the registry:
lethe projects list
lethe projects add /path/to/other-project
lethe projects remove <slug-or-path>
lethe projects prune   # drop entries whose roots no longer exist

Opt out of auto-registration in three ways:

Per invocation: lethe index --no-register
Per project: lethe config set auto_register false
Globally: export LETHE_DISABLE_GLOBAL_REGISTRY=1

The registry only stores absolute project paths, never contents. Cross-project search is read-only; it does not mutate RIF state. Local lethe search continues to learn as normal.

DuckDB has no cap on attached databases, so the 10/125 ceiling in SQLite's ATTACH DATABASE doesn't apply. Real-world bottleneck before you'd feel it is the per-project BM25 rebuild, which scales linearly.

Benchmark

# prep LongMemEval
uv run python experiments/data_prep.py --dataset longmemeval

# retrieval-only baseline + RIF variants
uv run python benchmarks/run_benchmark.py
uv run python benchmarks/run_rif_benchmark.py

# LLM enrichment layer (needs ANTHROPIC_API_KEY)
export ANTHROPIC_API_KEY=sk-ant-...
uv run python experiments/enrich_longmemeval.py     # one-time, ~$16 for 10k entries
uv run python benchmarks/run_rif_enriched.py         # 3-arm benchmark

See BENCHMARKS.md for results tables and methodology details.

Project structure

src/lethe/             # Production library (162 tests, ~95% coverage)
├── memory_store.py
├── markdown_store.py  # Markdown chunker for .lethe/memory/*.md
├── cli.py             # `lethe` console script
├── union_store.py     # Cross-project read-only search via DuckDB ATTACH
├── db.py              # DuckDB persistence
├── vectors.py         # FAISS + BM25 index
├── reranker.py        # Cross-encoder + adaptive depth
├── rif.py             # Retrieval-induced forgetting (clustered + gap)
├── enrichment.py      # Optional LLM write-time enrichment (Anthropic SDK)
├── dedup.py           # Hash + cosine deduplication
└── entry.py           # MemoryEntry dataclass + Tier enum

plugins/claude-code/   # Claude Code plugin (hooks + memory-recall skill)
.claude-plugin/        # Marketplace manifest

benchmarks/            # Per-checkpoint benchmark scripts
├── run_*.py
├── _lib/              # Benchmark-only helpers (metrics, etc.)
└── results/           # Raw per-run output markdowns

scripts/               # Reproducibility utilities (dataset prep, enrichment runner)

research/              # Experimental / research code (not production)
├── gc_mutation/       # Germinal-center mutation thread (checkpoints 1-10)
└── sdm/               # Sparse Distributed Memory prototype (checkpoint 15)

tests/                 # Production unit tests

Research background

This project started as an experiment porting the immune system's germinal-center mechanism to vector memory. 17 checkpoints so far:

Checkpoints 1-10 (GC-mutation approaches): all failed to improve retrieval quality. Useful for lifecycle management, not retrieval.
Checkpoints 11-13 (RIF): cognitive-science inspired. Clustered + gap-formula variant gives +6.5% NDCG, +9.5% recall@30 on the full 500-query eval. First positive learned-retrieval result.
Checkpoints 14-15 (exploration/rescue, Sparse Distributed Memory prototype): both negative at full scale.
Checkpoint 16 (extended behavior metrics for checkpoint 13): RIF's gain is primarily from reducing cross-topic retrieval (−1.6pp wrong_family); sibling confusion and stale-fact rate unchanged.
Checkpoint 17 (LLM enrichment layer): write-time structured extraction via Haiku. On covered queries: +8.3pp NDCG over checkpoint 13, the largest single-lever gain in the journey. Overall diluted by partial coverage; scaling to full coverage is pending.

Full journey in RESEARCH_JOURNEY.md.

License

MIT.

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.3.0

Apr 17, 2026

0.2.2

Apr 17, 2026

This version

0.2.1

Apr 17, 2026

0.2.0

Apr 17, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lethe_memory-0.2.1.tar.gz (2.8 MB view details)

Uploaded Apr 17, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

lethe_memory-0.2.1-py3-none-any.whl (44.4 kB view details)

Uploaded Apr 17, 2026 Python 3

File details

Details for the file lethe_memory-0.2.1.tar.gz.

File metadata

Download URL: lethe_memory-0.2.1.tar.gz
Upload date: Apr 17, 2026
Size: 2.8 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lethe_memory-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`ee496ffbc1b19cda44b123d03d698d4504d374d02a6789f14eb8312278e54350`
MD5	`3592defb9137e868b0e08e6540c856e0`
BLAKE2b-256	`71f5c327af6ad9f31afce66428ced698c8eaf26a43bfa07d3b2b0a2fc9513c0b`

See more details on using hashes here.

Provenance

The following attestation bundles were made for lethe_memory-0.2.1.tar.gz:

Publisher: release-please.yml on teimurjan/lethe

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lethe_memory-0.2.1.tar.gz
- Subject digest: ee496ffbc1b19cda44b123d03d698d4504d374d02a6789f14eb8312278e54350
- Sigstore transparency entry: 1328228992
- Sigstore integration time: Apr 17, 2026
Source repository:
- Permalink: teimurjan/lethe@ba1116105467eaf2e2aefb4ff45fc7d3a17e6aa6
- Branch / Tag: refs/heads/main
- Owner: https://github.com/teimurjan
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-please.yml@ba1116105467eaf2e2aefb4ff45fc7d3a17e6aa6
- Trigger Event: push

File details

Details for the file lethe_memory-0.2.1-py3-none-any.whl.

File metadata

Download URL: lethe_memory-0.2.1-py3-none-any.whl
Upload date: Apr 17, 2026
Size: 44.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lethe_memory-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`40ef2c9bc9bdc1b654cd88a1dda1de448351bde95a3cc6e19cc36acab7fc4d3b`
MD5	`0065e31323a7434684abc5d49895b1c9`
BLAKE2b-256	`5abe538f8e3c793bba3db0c131cc0809531b66c1ea060efc5b089b4d3ec48157`

See more details on using hashes here.

Provenance

The following attestation bundles were made for lethe_memory-0.2.1-py3-none-any.whl:

Publisher: release-please.yml on teimurjan/lethe

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lethe_memory-0.2.1-py3-none-any.whl
- Subject digest: 40ef2c9bc9bdc1b654cd88a1dda1de448351bde95a3cc6e19cc36acab7fc4d3b
- Sigstore transparency entry: 1328229002
- Sigstore integration time: Apr 17, 2026
Source repository:
- Permalink: teimurjan/lethe@ba1116105467eaf2e2aefb4ff45fc7d3a17e6aa6
- Branch / Tag: refs/heads/main
- Owner: https://github.com/teimurjan
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-please.yml@ba1116105467eaf2e2aefb4ff45fc7d3a17e6aa6
- Trigger Event: push

lethe-memory 0.2.1

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

lethe

Read this before the benchmark numbers

What's new here: retrieval-induced forgetting

Why this matters: a memory that evolves

Claude Code plugin

Quick start

Architecture

RIF: technical details

Three storage layers

Entry lifecycle (germinal-center inspired)

Deduplication (on add)

Install

Global search across projects

Benchmark

Project structure

Research background

License

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance