
Persistent, graph-backed memory for AI coding agents. Bayesian confidence, FTS5 search, HRR vocabulary bridge, entity-index retrieval, MCP server.

Project description

agentmemory

Persistent memory for AI coding agents. Your agent remembers what you discussed, decided, and corrected, so the next session does not start from scratch.

License: MIT · Python 3.12+

Read the handbook · Install · Workflow · Architecture · Benchmarks · Project writeup


Why

When a session ends, your agent forgets everything. You end up re-explaining the project, re-stating the same preferences, and watching the same mistakes happen again.

agentmemory captures decisions, corrections, and context as you work, and hands them back to the agent next time. No manual notes. No context files. Just memory.

Install

uv pip install git+https://github.com/robot-rocket-science/agentmemory.git
agentmemory setup

Restart Claude Code, then in any project:

/mem:onboard .

Full prerequisites and troubleshooting: docs/INSTALL.md.

What it does

  • Remembers automatically. Captures decisions, corrections, and preferences from your conversations without you lifting a finger.
  • Learns what matters. Memories that help get stronger over time. Memories that hurt get weaker. The system tunes itself to your project.
  • Stays on your machine. Everything lives in local SQLite. No cloud, no vector database, no telemetry unless you opt in.
  • Works with any MCP agent. Claude Code is the primary target, but any MCP-compatible client can connect to the server.
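
For a non-Claude MCP client, registration is typically a JSON entry pointing at the server command. A sketch of what that might look like (the `serve` subcommand is an assumption here, not a documented interface; check docs/INSTALL.md for the real command):

```json
{
  "mcpServers": {
    "agentmemory": {
      "command": "agentmemory",
      "args": ["serve"]
    }
  }
}
```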

A sketch of what using it feels like

Session 1
─────────
you    We decided to use uv for this project, not poetry.
agent  Got it.

   ...session ends, days pass, new session opens...

Session 2
─────────
you    Set up the environment please.
agent  Using uv, per the project decision from last week.
       Pinning Python 3.12 as configured. Proceeding.

The second session starts already knowing. That is the whole pitch.

How it works

Conversations become scored beliefs in a local graph. Each belief gets stronger or weaker based on whether it helped. Retrieval pulls the most relevant subset into the agent's context on every turn, within a fixed token budget.
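The scoring model described above can be sketched with a Beta distribution: each belief carries success/failure counts, confidence is the Beta mean, and variance captures uncertainty. The class and field names below are illustrative, not the actual API:

```python
from dataclasses import dataclass

@dataclass
class Belief:
    """Illustrative belief record; alpha/beta are Beta-distribution counts."""
    text: str
    alpha: float = 1.0  # prior + times the belief helped
    beta: float = 1.0   # prior + times the belief hurt

    @property
    def confidence(self) -> float:
        # Beta mean: expected probability the belief is useful
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        # Beta variance: high variance means an uncertain belief
        n = self.alpha + self.beta
        return (self.alpha * self.beta) / (n * n * (n + 1))

    def feedback(self, helped: bool) -> None:
        # Each turn's outcome nudges the counts up or down
        if helped:
            self.alpha += 1
        else:
            self.beta += 1

b = Belief("use uv, not poetry")
for _ in range(3):
    b.feedback(helped=True)
b.feedback(helped=False)
print(round(b.confidence, 2))  # (1+3)/(1+3+1+1) = 4/6 -> 0.67
```

A new belief starts at confidence 0.5 with high variance; repeated positive feedback pulls it toward 1.0 while shrinking the variance, which is exactly the "gets stronger over time" behavior.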

[Diagram: agentmemory pipeline -- ingestion, retrieval, and feedback]

Deep dive in the handbook: Chapter 5 - Architecture.

Documentation

The full handbook lives at docs/README.md and is structured as a short book with prev/next navigation on every page.

Wonder and Reason

agentmemory includes two graph-aware research commands that go beyond simple keyword search. They use the belief graph -- edges like SUPPORTS, CONTRADICTS, SUPERSEDES, CITES -- to surface connected evidence and detect reasoning gaps.

/mem:wonder <topic> -- Deep Research

Wonder is exploratory. You give it a topic and it fans out across the belief graph to collect everything relevant, even things you did not directly search for.

  1. Retrieves seed beliefs via FTS5 keyword search
  2. Expands outward along graph edges (BFS, configurable depth)
  3. Scores uncertainty for each belief using Beta distribution variance
  4. Detects contradictions between beliefs in the result set
  5. Outputs a structured context block with three sections: Known Facts (direct hits), Connected Evidence (reached via graph traversal), and Open Questions (high-uncertainty beliefs)
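Steps 2-3 above can be sketched roughly as follows. The graph representation, names, and variance threshold are assumptions for illustration, not the real internals:

```python
from collections import deque

# Toy belief graph: node -> list of (edge_type, neighbor)
GRAPH = {
    "use-uv": [("SUPPORTS", "pin-py312"), ("SUPERSEDES", "use-poetry")],
    "pin-py312": [("CITES", "ci-config")],
    "use-poetry": [],
    "ci-config": [],
}

# (alpha, beta) Beta counts per belief, as in the confidence model
COUNTS = {"use-uv": (9, 1), "pin-py312": (2, 2),
          "use-poetry": (1, 9), "ci-config": (1, 1)}

def beta_variance(a: float, b: float) -> float:
    n = a + b
    return (a * b) / (n * n * (n + 1))

def expand(seeds, max_depth=2):
    """BFS outward from seed beliefs, collecting connected evidence."""
    seen, frontier = set(seeds), deque((s, 0) for s in seeds)
    connected = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for edge, nbr in GRAPH.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                connected.append((node, edge, nbr))
                frontier.append((nbr, depth + 1))
    return connected

hits = expand(["use-uv"])
# High-variance beliefs surface as "Open Questions"
open_questions = [n for n in COUNTS if beta_variance(*COUNTS[n]) > 0.03]
print(hits)
print(open_questions)
```

Here the seed `use-uv` pulls in `use-poetry` and `pin-py312` at one hop and `ci-config` at two, while the low-evidence beliefs (near-uniform Beta counts) land in the open-questions bucket.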

Use wonder when you want to survey what the system knows about a topic before making a decision. It answers: "what do we know, what is connected, and where are we uncertain?"

/mem:reason <question> -- Hypothesis Testing

Reason is focused. You give it a question or hypothesis and it builds branching consequence paths to evaluate whether the evidence supports it.

  1. Retrieves seed beliefs, then checks relevance (content-word overlap filter)
  2. Builds consequence paths -- chains of beliefs linked by edges, with compound confidence decay at each hop
  3. Checks constraints -- compares paths against locked beliefs for conflicts
  4. Detects impasses -- four types: ties (contradicting beliefs at similar confidence), gaps (dead-end paths), constraint failures (conflicts with locked beliefs), and no-change (all low-confidence evidence)
  5. Issues a verdict: SUFFICIENT, INSUFFICIENT, UNCERTAIN, CONTRADICTORY, or PARTIAL
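The compound confidence decay in step 2 can be read as multiplying per-belief confidences along the chain, discounted at every traversed edge. A minimal sketch, assuming a fixed per-hop decay factor (the factor and function name are illustrative):

```python
def path_confidence(confidences, hop_decay=0.9):
    """Compound confidence of a consequence path: product of
    per-belief confidences, decayed once per traversed edge."""
    score = 1.0
    for i, c in enumerate(confidences):
        score *= c
        if i > 0:  # decay applies to each hop, not the seed belief
            score *= hop_decay
    return score

# A 3-belief chain: strong seed, two weaker consequences
print(round(path_confidence([0.9, 0.8, 0.7]), 3))  # 0.9 * (0.8*0.9) * (0.7*0.9) = 0.408
```

The multiplicative form means long chains of even fairly confident beliefs decay quickly, which is what makes gap and no-change impasses detectable: a path whose compound score falls below threshold cannot move the verdict.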

Use reason when you need to evaluate a specific claim or decision. It answers: "does the evidence support this, and if not, where does the reasoning break down?"

The difference

Wonder is divergent -- cast a wide net, see what is out there. Reason is convergent -- evaluate a specific claim against the evidence. Together they form a research loop: wonder to survey the landscape, reason to test specific hypotheses that emerge from it.

Benchmarks

[!NOTE] About these numbers. I run and publish benchmarks because I believe objective, replicable methodology and transparent result reporting matter, and that readers deserve to see them. I place limited personal weight on the numbers themselves. V&V for agent memory systems is a specialized area where I do not have deep hands-on experience, and I cannot be fully confident that Claude and I have exercised these systems as rigorously as a dedicated evaluator would.

What I can commit to is the scientific rigor I was trained on and the professional engineering standards I am obligated to uphold: pre-registered hypotheses, contamination protocols, protocol-correct evaluation, and full methodology disclosure.

I welcome constructive criticism, independent replication, and analysis that refutes or supports any of these claims, and I would be glad to collaborate with anyone interested in strengthening the evaluation.

Evaluated across 5 published benchmarks. All results are protocol-correct with contamination-proof isolation (separate GT files, verified by verify_clean.py, enforced by 65 pytest protocol tests). No embeddings, no vector DB. Methodology follows the Lin checklist for reproducibility.

Results by version

| Benchmark | Metric | v1.0 | v1.1 | v1.2.1 | v2.2.2 |
|---|---|---|---|---|---|
| MAB SH 262K | SEM | 60% | 90% | 90% | 92% |
| MAB MH 262K | SEM | 6% | 35%* | 60% | 58% |
| StructMemEval | Accuracy | 29% | 100% | 100% | 100% |
| LongMemEval | Opus judge | -- | -- | 59.0% | 59.6% |
| LoCoMo | F1 | -- | 66.1% | 66.1% | 50.8%** |

* chain-valid score; raw SEM was 47%
** reader variance; retrieval code unchanged from v1.2.1 (see analysis below)

Compared to published systems

| Benchmark | agentmemory (best) | Paper ceiling / SOTA | Other published systems |
|---|---|---|---|
| MAB SH 262K | 92% SEM | 88% GPT-4o, 45% GPT-4o-mini | o4-mini 100% (6K context only) |
| MAB MH 262K | 58% SEM | <=7% all methods (paper ceiling) | -- |
| StructMemEval | 100% (14/14) | vector stores fail | -- |
| LongMemEval | 59.6% | 60.6% GPT-4o pipeline | -- |
| LoCoMo | 66.1% (v1.2.1) | 87.9% human ceiling | 92.3% EverMemOS, 74.0% Letta, 68.5% Mem0, 51.6% GPT-4-turbo |

LoCoMo comparison note: EverMemOS (92.3%), Letta (74.0%), and Mem0 (68.5%) use different retrieval architectures (embeddings, filesystem grep, LLM extraction respectively). agentmemory uses FTS5 keyword retrieval only, no embeddings. The v2.2.2 LoCoMo regression (50.8%) is driven by LLM reader variance from sub-agent batching, not retrieval quality. Per Lin's methodology, single-run results are insufficient when the reader is a variable; >=5 runs with mean +/- std are needed.
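Reporting mean and standard deviation across reader runs is straightforward; a sketch with hypothetical scores (these five numbers are illustrative, not measured results):

```python
from statistics import mean, stdev

# Hypothetical F1 scores from 5 independent reader runs
runs = [50.8, 58.2, 61.5, 55.0, 63.1]
print(f"{mean(runs):.1f} +/- {stdev(runs):.1f}")  # sample std dev, n-1 denominator
```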

Methodology, per-benchmark details, and audit trails: Chapter 8 - Benchmark Results.

Session metrics

Beyond benchmarks, agentmemory tracks real-world usage metrics from conversation logs. Run agentmemory metrics for the full report.

| Metric | Value | Note |
|---|---|---|
| Correction rate | 0.72% | FP-adjusted, ~90% precision |
| Retrieval tokens/search | ~1,800 | Stable (2K budget cap) |
| Retrieval budget fill | 73% -> 100% | Improving as belief store grows |
| Correction trend | 1.7% -> 0.5% | Suggestive, not yet significant |
| Fix commit rate | 12% | 50/404 commits in dev period |

Evaluation protocol: docs/EVALUATION_PROTOCOL.md -- three-part framework covering benchmarks, acceptance tests (872+), and session metrics.

Development

git clone https://github.com/robot-rocket-science/agentmemory.git
cd agentmemory
uv sync --all-groups
uv run pytest tests/ -x -q
uv run pyright src/

Contributions welcome. See CONTRIBUTING.md.

Citation

If you use agentmemory in your research or project, please cite:

@software{agentmemory2026,
  author    = {robotrocketscience},
  title     = {agentmemory: Persistent Memory for AI Coding Agents},
  year      = {2026},
  url       = {https://github.com/robot-rocket-science/agentmemory},
  license   = {MIT}
}

License

MIT -- free for personal, commercial, and any other use. Citation appreciated.

Project details


Download files

Download the file for your platform.

Source Distribution

agentmemory_rrs-2.5.0.tar.gz (4.9 MB)


Built Distribution


agentmemory_rrs-2.5.0-py3-none-any.whl (195.5 kB)


File details

Details for the file agentmemory_rrs-2.5.0.tar.gz.

File metadata

  • Download URL: agentmemory_rrs-2.5.0.tar.gz
  • Size: 4.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agentmemory_rrs-2.5.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | d62c7da3fdf5f32d94832303b6587a5ce6ffc269abba0892f3a05bb198f2476a |
| MD5 | f9eff67c2496f68ffdc7882d52fb3be9 |
| BLAKE2b-256 | e1fc44920cc46f311f6c21b5c304cc8b86a9fcf7728504d186ccf2df886e525b |


Provenance

The following attestation bundles were made for agentmemory_rrs-2.5.0.tar.gz:

Publisher: publish.yml on robot-rocket-science/agentmemory

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file agentmemory_rrs-2.5.0-py3-none-any.whl.

File metadata

  • Download URL: agentmemory_rrs-2.5.0-py3-none-any.whl
  • Size: 195.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agentmemory_rrs-2.5.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 424d390346234686e181ed729ac389667af7c55faf28952ee291eb00ba90ea14 |
| MD5 | 0f59c9b328733c261b454fefc51ad2d2 |
| BLAKE2b-256 | 2968dd931258c8dd1b0036c50447ec0b1f1eb267168696ef1dd9d79fdf01ff21 |


Provenance

The following attestation bundles were made for agentmemory_rrs-2.5.0-py3-none-any.whl:

Publisher: publish.yml on robot-rocket-science/agentmemory

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
