Self-curating memory for LLM agents: MeMo-style external memory kept honest by survival-based selection instead of reward models or judges.
darwin-memo
Self-curating memory for LLM agents. Knowledge lives outside the frozen model, and it stays alive only while it keeps earning real, measurable outcomes. Wrong, stale, and useless entries go extinct on their own: no reward model, no LLM judge, no human curation.
This is a practical mix of two papers:
| Paper | What this repo takes from it |
|---|---|
| MeMo: Memory as a Model (Quek et al.) | Keep the main LLM frozen and put knowledge in a dedicated memory. The reflection-QA encoding pipeline (fact extraction, consolidation, self-containment verification, entity surfacing, cross-document synthesis) and the three-stage query protocol (grounding, entity identification, answer seeking). |
| Survival is the Only Reward (Dodgson et al.) | Environment-mediated selection. The only signal is a conserved, physically measurable resource delta. Behaviors that persist get reinforced, everything else is pruned (Negative-Space Learning). Reward hacking becomes evolutionarily unstable because there is no proxy to hack. |
The mix: MeMo says what memory is, the survival paper says what gets to stay in it.
flowchart LR
subgraph encode [MeMo encoding]
C[Corpus] --> R[Reflection QA pipeline] --> S[(Memory store)]
end
subgraph loop [Survival loop]
S -->|3-stage query protocol| A[Answer + provenance]
A --> E[Environment acts and MEASURES]
E -->|resource delta along provenance| S
S -->|upkeep every cycle| S
S -->|consolidate + prune| S
end
Why
Agent memory systems rot. They accumulate stale facts, poisoned inputs, and overgeneralized lessons, and the usual fixes (relevance scores from a judge model, human review, TTLs) either reintroduce the proxy-optimization problem or do not scale. The survival paper's answer is to make persistence itself the filter: an entry that cannot pay its upkeep with real outcomes does not get to exist. This repo applies that filter to a MeMo-shaped memory and shows it working end to end on a real filesystem.
Quickstart
Requires Python 3.10+. The core has zero dependencies and every example runs offline with no API keys.
pip install darwin-memo
To run the examples, clone the repo:
git clone https://github.com/rogermsc/darwin-memo
cd darwin-memo
pip install -e .
python examples/01_encode_memory.py # corpus -> reflection-QA memory
python examples/02_query_protocol.py # interrogate it, with provenance
python examples/03_survival_loop.py # the headline demo
python examples/04_agent_loop.py # memory as a tool in an agent loop
python examples/05_testsuite_env.py # selection pressure from a test suite
The headline demo
The example corpus contains an ops runbook, platform notes, and one poisoned document: a forum post claiming database files are "redundant and safe to remove". Example 02 shows the memory confidently repeating that poison, because before selection pressure exists, retrieval has no reason to doubt it.
Example 03 then runs 30 survival cycles against StorageEnv, a disk
cleanup sandbox where the selection signal is actual bytes on an actual
disk. Deleting a disposable file frees its size. Deleting a protected file
triggers a restore that costs three times the size. Nothing grades the
answers, the filesystem just responds:
cycle pop births deaths merges energy resource Δ
0 17 1 0 0 17.11 -12288
1 16 0 1 0 17.27 -808960 <- poison being executed
...
19 5 0 7 0 15.60 338944 <- unused knowledge starves
...
29 4 0 0 0 15.10 346112 <- stable, positive forever
Poisoned entries still alive: 0
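The measurement rule behind those deltas is small enough to sketch. This is an illustration only, not StorageEnv's code: the helper name `storage_delta` and its arguments are hypothetical, and the zero delta for a kept file is an assumption. The only facts taken from the description above are that deleting a disposable file frees its size and deleting a protected file costs three times its size.

```python
def storage_delta(size_bytes: int, protected: bool, deleted: bool) -> int:
    """Resource delta in bytes for one decision (hypothetical helper)."""
    if not deleted:
        return 0  # assumption: keeping a file neither frees nor costs bytes
    if protected:
        return -3 * size_bytes  # restore penalty: three times the size
    return size_bytes  # disposable file: its size is freed


# (size in bytes, protected?, deleted?) for three made-up decisions
decisions = [
    (4096, False, True),   # disposable file, deleted
    (8192, True, True),    # protected file, deleted
    (2048, True, False),   # protected file, kept
]
cycle_delta = sum(storage_delta(*d) for d in decisions)
print(cycle_delta)  # -20480
```

One wrong deletion outweighs several correct ones, which is why the early cycles in the table go sharply negative while poison is still alive.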
Three death modes show up in the graveyard, and the distinction matters:
- executed: the poisoned entries. They decided real actions, the environment measured real damage, and the negative delta flowed back along provenance until they died. Cycles 0 to 3 are the price of the lesson.
- starved: cafeteria trivia and facts the agent never needed. Nothing punished them, they just never earned their upkeep.
- merged: near-duplicate survivors absorbed into consolidated entries. Their energy is pooled and their lineage is recorded. This is Negative-Space Learning: the population shrinks while capability per entry rises.
Using it
from darwin_memo import (
    Document, LocalEncoder, MemoryStore, QueryProtocol,
    StorageEnv, SurvivalConfig, SurvivalLoop,
)

store = MemoryStore(upkeep=0.05)
for entry in LocalEncoder().encode([Document("runbook", open("runbook.txt").read())]):
    store.add(entry)

loop = SurvivalLoop(store, StorageEnv(), config=SurvivalConfig(cycles=30))
report = loop.run()
print(report.summary())
store.save("memory.json")  # survivors only carry forward
With an LLM, encoding and querying use the model-driven paths from the
MeMo paper. Install the extra with pip install -e ".[anthropic]" and set
ANTHROPIC_API_KEY; the examples pick it up automatically:
from darwin_memo import ReflectionEncoder, QueryProtocol
from darwin_memo.llm import AnthropicClient
client = AnthropicClient() # or OpenAICompatClient(model=..., base_url=...)
encoder = ReflectionEncoder(client) # 5-step reflection QA synthesis
protocol = QueryProtocol(store, client) # grounding -> entities -> answer seeking
Three environments ship
- StorageEnv: bytes freed on a real disk (the headline demo).
- TestSuiteEnv: passing tests in a generated micro-project. Each cycle plants seeded defects and offers patches: real fixes, cosmetic no-ops, and destructive edits dressed as cleanup. The delta is the change in passing-test count, measured by running the suite. examples/05_testsuite_env.py shows poisoned "this helper is dead code" advice going extinct the moment the tests execute it.
- VerifiableQAEnv: exact containment of known answers, the weakest grounding but still a measurement.
Bring your own selection pressure
The environment is the whole trick, and yours is probably better than the
demos. Implement two methods, and keep the one rule: verify must
measure, never grade.
class BudgetEnv:
    resource_scale = 100.0

    def tasks(self, cycle):
        ...  # questions the agent must act on this cycle

    def verify(self, task, answer_text):
        ...  # read the answer, act, return Outcome(delta=dollars_saved)
The environment owns the whole contract: it phrases the task, it reads
the answer (reuse decision_polarity for binary actions, or write your
own reading), it decides what silence means, it acts, and it measures.
Good conserved resources: tests passing, bytes freed, requests served under budget, rows deduplicated, dollars of spend avoided. Bad ones: anything a model scored.
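To make the "measure, never grade" rule concrete, here is a toy environment whose resource is rows deduplicated. It is a sketch under assumptions, not library code: the `Outcome` dataclass is a local stand-in for the library's class (only a `delta` field is assumed), tasks are assumed to be plain strings, `DedupEnv` is a hypothetical name, and the crude `startswith("yes")` check stands in for whatever answer reading your environment uses.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Local stand-in for the library's Outcome; only `delta` is assumed."""
    delta: float


class DedupEnv:
    """Toy environment: the conserved resource is rows removed from a table."""

    resource_scale = 10.0  # same attribute name as BudgetEnv above

    def __init__(self):
        self.rows = ["a", "b", "a", "c", "b", "a"]

    def tasks(self, cycle):
        # One question per cycle; plain strings are an assumption here.
        return ["Is it safe to drop exact duplicate rows?"]

    def verify(self, task, answer_text):
        before = len(self.rows)
        if answer_text.strip().lower().startswith("yes"):
            # Act on the answer, preserving first-seen order.
            self.rows = list(dict.fromkeys(self.rows))
        # Measure what actually changed; nothing here judges the wording.
        return Outcome(delta=float(before - len(self.rows)))


env = DedupEnv()
task = env.tasks(0)[0]
first = env.verify(task, "Yes, duplicates are redundant.")
second = env.verify(task, "Yes, duplicates are redundant.")
print(first.delta, second.delta)  # 3.0 0.0
```

The same answer earns 3.0 the first time and nothing the second, because the delta comes from the state of the table rather than from how good the answer sounds.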
Retrieval modes
Retrieval is pluggable through the Retriever protocol; the store stays
the single owner of the energy ledger, and no retriever may read energy
when scoring (selection pressure comes from outcomes, never from
retrieval preferring incumbents).
from darwin_memo import EmbeddingRetriever, HashingEmbedder, MemoryStore
store = MemoryStore() # lexical IDF, the default
store = MemoryStore(retriever=EmbeddingRetriever(HashingEmbedder()))
store = MemoryStore(retriever=EmbeddingRetriever(my_model.encode))
- Lexical (default): smoothed IDF overlap with a relevance floor. Zero dependencies, deterministic, fine for runbook-scale corpora.
- HashingEmbedder: zero-dependency character n-gram hashing. Buys typo and morphology robustness ("databse" still finds database entries), not synonym recall.
- Any real embedding: pass any text -> list[float] function (sentence-transformers, an API endpoint). Vectors persist inside memory.json so paid embeddings are never recomputed on load.
Honest scaling note: ranking is pure-Python O(population x dims), fine
to a few thousand entries. Past that you want numpy or an ANN index,
which is out of scope for the zero-dependency core. With cosine
retrievers, raise merge_threshold to roughly 0.85 or unrelated
entries will consolidate.
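For a feel of what character n-gram hashing buys, here is a self-contained function of the text -> list[float] shape described above. It is a sketch, not the library's HashingEmbedder: the names `ngram_embed` and `cosine`, the 1024 dimensions, and the use of trigrams with CRC32 are all assumptions chosen for illustration.

```python
import math
import zlib


def ngram_embed(text: str, dims: int = 1024, n: int = 3) -> list[float]:
    """Map text to a fixed-size vector by hashing character n-grams."""
    padded = f" {text.lower()} "  # spaces mark word boundaries
    vec = [0.0] * dims
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n]
        # crc32 is stable across runs, unlike Python's salted hash().
        vec[zlib.crc32(gram.encode("utf-8")) % dims] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


query = ngram_embed("databse")
print(cosine(query, ngram_embed("database")))   # typo still shares most trigrams
print(cosine(query, ngram_embed("cafeteria")))  # unrelated word shares none
```

The typo scores well because five of its seven trigrams also occur in "database". A synonym such as "datastore" versus "repository" would get no such help, which matches the limitation noted above.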
Distill survivors into a parametric memory (optional)
MeMo's memory is a small fine-tuned model, not a store. After selection
has cleaned the population, training/train_memory_model.py fine-tunes a
small model on the surviving QA pairs with LoRA, conditioning on questions
only, the same supervised objective as the paper. Survival curates the
dataset, MeMo's recipe compresses it into weights.
Benchmarks
The claim is benchmarked against four baselines across 10 seeds, with
ablations and a scaling probe, all reproducible offline from bench/.
The sharpest comparison is against random_matched: identical per-cycle
eviction counts, random victims.
| arm | kill rate | kill cycle (med) | damage before kill | tail delta | cum delta |
|---|---|---|---|---|---|
| survival | 1.00 | 0 | -751k | +435k | +12.0M |
| random_matched | 0.80 | 19 | -8.97M | -75k | -5.25M |
| keep_everything | 0.00 | never | -10.6M | -287k | -7.29M |
Same pruning rate, 12x the damage, negative steady state: outcome direction is the active ingredient, not eviction itself. Full tables, every baseline's best metric stated plainly, ablations over every knob, and honest caveats: docs/benchmarks.md.
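The random_matched control is easy to misread, so here is a minimal sketch of the idea: take the number of deaths the survival arm produced in each cycle and evict that many entries chosen at random. This is not the code in bench/. The function name, the list-of-ids population, and the example counts are hypothetical, and births are ignored for brevity.

```python
import random


def random_matched_victims(population, deaths_per_cycle, seed=0):
    """Evict as many entries per cycle as the survival arm did, at random."""
    rng = random.Random(seed)  # seeded so a run is reproducible
    alive = list(population)
    graveyard = []
    for cycle, n_deaths in enumerate(deaths_per_cycle):
        n = min(n_deaths, len(alive))  # cannot evict more than remain
        victims = rng.sample(alive, n)
        for v in victims:
            alive.remove(v)
        graveyard.append((cycle, victims))
    return alive, graveyard


survival_deaths = [0, 1, 0, 2]  # illustrative counts, not benchmark data
alive, graveyard = random_matched_victims(range(10), survival_deaths, seed=7)
print(len(alive))  # 7
```

Because the eviction schedule is identical, any gap between this arm and the survival arm has to come from which entries die, not from how many.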
Design notes
- Energy ledger: entries spawn at 1.0 energy, pay 0.05 upkeep per cycle, earn 0.6 * tanh(delta / resource_scale) when they decide a task (supporting entries get 25% of that), and are capped at 5.0. Death is at zero. All tunable via MemoryStore and SurvivalConfig.
- Credit flows along provenance. The query protocol reports which entries decided and supported each answer, and only those entries are touched by the outcome. In LLM mode no single entry decides a synthesized answer, so credit spreads evenly across everything consulted instead of inventing a winner. tanh keeps one disaster from executing an entry that was right ninety-nine times, and one jackpot from making an entry immortal.
- Memory silence is a feature. Retrieval has a relevance floor, and an earlier version of this repo demonstrated why: entries matching only structural tokens ("safe", "file") were deciding questions they knew nothing about, getting executed for it, and being reborn. Better for memory to say nothing than to guess.
- Silence is conservative. When memory is silent, StorageEnv keeps the file: the safe reading of an irreversible action. A side effect worth knowing: protective knowledge ("never delete X") eventually starves because it is redundant with that default. The population converges to exactly the knowledge that changes behavior.
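The energy arithmetic in the first note can be worked through in a few lines. The constants below are the defaults quoted above; everything else is an assumption made for illustration, including the function names and the order of operations (credit, then upkeep, then cap), which may differ from the actual store.

```python
import math

# Defaults quoted in the design notes.
SPAWN, UPKEEP, GAIN, SUPPORT_SHARE, CAP = 1.0, 0.05, 0.6, 0.25, 5.0


def credit(delta: float, resource_scale: float, decided: bool) -> float:
    """Energy earned or lost from one measured outcome."""
    base = GAIN * math.tanh(delta / resource_scale)  # bounded to +/- 0.6
    return base if decided else SUPPORT_SHARE * base


def step(energy: float, earned: float = 0.0) -> float:
    """One cycle (assumed order): add credit, charge upkeep, apply the cap."""
    return min(CAP, energy + earned - UPKEEP)


def alive(energy: float) -> bool:
    return energy > 0.0  # death is at zero


# A maxed-out entry hit by one enormous loss, and a newcomer hitting a jackpot.
veteran = step(CAP, credit(-1e9, 100.0, decided=True))
newcomer = step(SPAWN, credit(1e9, 100.0, decided=True))
print(round(veteran, 2), round(newcomer, 2))  # 4.35 1.55
print(alive(veteran))  # True
```

However large the delta, a single outcome moves a deciding entry by at most 0.6, so killing a proven entry takes repeated damage and no entry reaches the cap from a single win.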
The full concept-to-code mapping, including honest deviations from both papers, is in docs/paper-to-code.md.
Tests
pip install -e ".[dev]"
pytest
The load-bearing test is tests/test_survival.py: poisoned advice must
die, useful advice must survive, and late cycles must stop destroying
protected data, all with no labels anywhere.
Citations
This repo is an independent practical interpretation, not the official code of either paper. If you build on the ideas, cite the originals:
@misc{quek2026memo,
title = {MeMo: Memory as a Model},
author = {Quek, Ryan Wei Heng and Lee, Sanghyuk and Leong, Alfred Wei Lun and
Verma, Arun and Prakash, Alok and Chen, Nancy F. and
Low, Bryan Kian Hsiang and Rus, Daniela and Solar-Lezama, Armando},
year = {2026},
eprint = {2605.15156},
archivePrefix = {arXiv},
url = {https://arxiv.org/abs/2605.15156}
}
@misc{dodgson2026survival,
title = {Survival is the Only Reward: Sustainable Self-Training Through
Environment-Mediated Selection},
author = {Dodgson, Jennifer and Alhajir, Alfath Daryl and Joedhitya, Michael and
Pattirane, Akira Rafhael Janson and Kumar, Surender Suresh and
Lim, Joseph and Peh, C.H. and Ramdas, Adith and Zhexu, Steven Zhang},
year = {2026},
eprint = {2601.12310},
archivePrefix = {arXiv},
url = {https://arxiv.org/abs/2601.12310}
}
License
MIT