Local-first agent memory: an Obsidian markdown vault as source of truth, with a rebuildable DuckDB index.
Project description
🪨 agentcairn
Local-first memory for AI agents: memory you can actually read, edit, and own.
cairn /kɛən/ · noun: a stack of stones raised to mark a trail or a place worth remembering, left for whoever comes next.
agentcairn gives your coding agent durable, high-quality memory, but instead of locking it in an opaque database or a cloud service, your memories live as plain Markdown in an Obsidian vault you own. A fast, rebuildable DuckDB index sits on top for retrieval. Open your vault, read what the agent remembered, fix a wrong fact by hand, or drop in your own notes, and the agent picks it all up.
Why agentcairn is different
Most agent-memory systems make a database or cloud store the source of truth and treat files (if any) as a one-way export. agentcairn inverts that:
- Your vault is the source of truth, not an export. Memory is human-readable Markdown with frontmatter and [[wikilinks]]. Edit it in Obsidian; the index honors your edits.
- The index is disposable. DuckDB is a rebuildable cache (cairn reindex). Your memory survives a model upgrade, a corrupted index, a schema change, or uninstalling the tool with zero data loss, because the truth is just files on disk.
- Non-lossy by construction. The full note is always retained. Distillation only adds derived notes that link back to the source; it never silently drops facts it didn't think to extract at write time.
- Redaction before every write. Secrets are scrubbed (regex + entropy + URL-credential detection) before anything (body, title, or tags) reaches the plaintext vault. We write files you can read, so we treat a leaked credential as the worst failure mode.
- A free, deterministic knowledge graph. Your [[wikilinks]] and frontmatter are the graph: no LLM extraction, no hallucinated entities.
- Daemonless, zero external DB. One embedded DuckDB file does semantic vector search, BM25 full-text, and graph traversal. No always-on server, no Neo4j/Postgres/Qdrant, no required cloud key; just a cairn CLI and an on-demand MCP server.
- Honestly measured. A reproducible LongMemEval-S + LoCoMo harness ships in benchmarks/, with real numbers, ablations, and explicit caveats instead of one cherry-picked headline (see below).
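The redaction bullet above names three detectors: regex, entropy, and URL-credential detection. As a rough illustration of how such a pass can be layered, here is a minimal sketch. The patterns, the `[REDACTED]` placeholder, and the 4.0-bit entropy threshold are all assumptions made for this example; they are not agentcairn's actual rules.

```python
import math
import re
from collections import Counter

# Hypothetical patterns for illustration only; not agentcairn's real rule set.
KEY_PATTERN = re.compile(r"\b(?:sk|ghp|AKIA)[A-Za-z0-9_\-]{16,}")
URL_CRED_PATTERN = re.compile(r"(\w+://)[^/\s:@]+:[^/\s@]+@")
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9+/=_\-]{24,}")


def shannon_entropy(s: str) -> float:
    """Bits per character of the string's empirical character distribution."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def redact(text: str, entropy_threshold: float = 4.0) -> str:
    """Scrub URL credentials, known key prefixes, then long high-entropy tokens."""
    text = URL_CRED_PATTERN.sub(r"\1[REDACTED]@", text)
    text = KEY_PATTERN.sub("[REDACTED]", text)

    def scrub_if_random(match: re.Match) -> str:
        token = match.group(0)
        if shannon_entropy(token) >= entropy_threshold:
            return "[REDACTED]"
        return token

    return TOKEN_PATTERN.sub(scrub_if_random, text)
```

The entropy check is what catches secrets that match no known prefix. It will also flag some harmless long identifiers, which is the conservative direction for a plaintext vault.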
Install
The easiest way to use agentcairn is the Claude Code plugin: one install wires up the MCP server, ambient memory (recall at session start, capture at session end), a memory skill, and slash commands:
claude plugin marketplace add ccf/agentcairn
claude plugin install agentcairn@agentcairn
On install you pick a vault path (default ~/agentcairn); it's auto-created on the first session, with no Obsidian setup required. From then on agentcairn surfaces relevant memory at the start of each session, distills each session into your vault, and gives you /agentcairn:recall, /remember, /memory, /savings, and /ingest. Nothing to pip-install: the plugin runs the published package via uvx.
Not on Claude Code? agentcairn is also a standalone MCP server + CLI for any host; see Using it directly.
How it works
flowchart LR
T["Session transcripts<br/>(out-of-band)"]
H["You · Obsidian<br/>(hand edits)"]
V["Obsidian vault<br/>Markdown + frontmatter + wikilinks<br/><b>source of truth</b>"]
I["DuckDB index<br/>vector + BM25 + graph<br/><b>rebuildable cache</b>"]
M["MCP tools<br/>remember · recall · search · build_context · recent"]
T -- "redact → dedup → distill" --> V
H -- "edit" --> V
V -- "parse / reconcile-on-spawn" --> I
I -- "READ_ONLY hybrid recall" --> M
M -. "remember (redacted write)" .-> V
classDef truth fill:#eaf1ff,stroke:#317cff,color:#191919;
classDef cache fill:#f5f5f3,stroke:#999999,color:#191919;
class V truth
class I cache
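The "parse" edge from vault to index in the flowchart is what makes the index rebuildable: everything it needs can be read back out of the Markdown files. A minimal sketch of that step, assuming flat `key: value` frontmatter and standard Obsidian wikilink syntax. The function names are hypothetical and a real parser would use a YAML library.

```python
import re

# Matches [[Target]], [[Target|alias]] and [[Target#Heading]]; captures the target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a note into (frontmatter, body). Handles flat key: value pairs only."""
    if not text.startswith("---\n"):
        return {}, text
    head, sep, body = text[4:].partition("\n---\n")
    if not sep:
        return {}, text
    meta = {}
    for line in head.splitlines():
        key, colon, value = line.partition(":")
        if colon:
            meta[key.strip()] = value.strip()
    return meta, body


def outgoing_links(body: str) -> list[str]:
    """Wikilink targets in order of first appearance, deduplicated."""
    seen: dict[str, None] = {}
    for match in WIKILINK.finditer(body):
        seen.setdefault(match.group(1).strip(), None)
    return list(seen)
```

Because both functions are pure string operations, re-running them over every file yields the same graph each time, which is the sense in which the graph is deterministic.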
- Capture reads your agent harness's session transcripts (append-only, already on disk) out-of-band, which is robust by design because there are no fragile live hooks. It then redacts → dedups → importance-gates → distills into the vault, non-lossily. An agent-driven remember tool covers curated, high-value memories.
- Retrieval fuses BM25 + semantic vectors with Reciprocal Rank Fusion, applies an optional graph-boost, and degrades gracefully down to keyword-only when no embedding model is available, so recall is never silently dead. An optional cross-encoder reranker adds precision.
- Hybrid intelligence: offline local embeddings (FastEmbed / nomic-embed-text-v1.5 by default) work out of the box and are strong both on their own and in the hybrid fusion (with nomic, vector-only edges out BM25 even on short turns; see the benchmark). Set CAIRN_EMBED_MODEL to pick another FastEmbed model, or run CAIRN_EMBEDDER=ollama / a cloud tier to go further.
- Temporal memory: notes may carry valid_from / valid_until / superseded_by frontmatter. Recall is validity-aware: it soft-demotes superseded and expired facts (the current fact wins) without ever hiding them (non-lossy), and annotates each result's status (current / superseded / expired / not_yet_valid) plus an as_of anchor so the agent can reason over time. Inert for notes with no validity fields.
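To make the temporal-memory bullet concrete, here is a minimal sketch of validity-aware ranking. It assumes ISO-8601 date strings in frontmatter, gives superseded_by precedence over the date checks, treats valid_until as inclusive, and uses an arbitrary 0.5 demotion factor. None of those choices are confirmed by the source; only the four status names and the "demote, never hide" behaviour are.

```python
from datetime import date


def validity_status(meta: dict, as_of: date) -> str:
    """Classify a note as current / superseded / expired / not_yet_valid."""
    if meta.get("superseded_by"):
        return "superseded"
    valid_from = meta.get("valid_from")
    valid_until = meta.get("valid_until")
    if valid_from and as_of < date.fromisoformat(valid_from):
        return "not_yet_valid"
    if valid_until and as_of > date.fromisoformat(valid_until):
        return "expired"
    return "current"  # also the result for notes with no validity fields


# Hypothetical factors: stale notes are down-weighted, never dropped.
DEMOTION = {"current": 1.0, "not_yet_valid": 0.5, "expired": 0.5, "superseded": 0.5}


def rerank_by_validity(hits: list[tuple[str, float, dict]], as_of: date) -> list[dict]:
    """hits are (note_id, score, frontmatter); returns annotated hits, best first."""
    annotated = []
    for note_id, score, meta in hits:
        status = validity_status(meta, as_of)
        annotated.append({
            "id": note_id,
            "score": score * DEMOTION[status],
            "status": status,
            "as_of": as_of.isoformat(),
        })
    return sorted(annotated, key=lambda hit: hit["score"], reverse=True)
```

Every input hit appears in the output, so a superseded fact stays retrievable and is simply ranked below its replacement.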
Using it directly
The plugin is the easiest path, but agentcairn is just a package: use it without Claude Code via the on-demand MCP server (for any MCP host) or the cairn CLI:
uvx agentcairn # on-demand MCP server for any MCP host
cairn ingest --vault ~/vault # distill recent agent sessions into the vault
cairn sweep --vault ~/vault # ingest + reindex in one pass (cron-friendly)
cairn recall "how did we fix the auth bug?" # hybrid recall from the CLI
cairn savings # how much context recall has saved you
cairn reindex ~/vault # rebuild the index from Markdown (always safe)
cairn doctor # health-check the index
Agents supported
agentcairn works at two levels. Claude Code gets a first-class plugin: the full ambient loop (recall at session start, capture at session end), a memory skill, and slash commands. Every other MCP host gets the same recall/search/remember tools via the portable MCP server; cairn install wires it in non-destructively (your other servers are preserved, the original is backed up to <config>.bak). The vault stays a single global ~/agentcairn, so memory is shared across every host.
| Host | Support | Set up with | Ambient capture |
|---|---|---|---|
| Claude Code | First-class plugin | claude plugin install agentcairn@agentcairn | Yes: recall-at-start + capture-at-end |
| Cursor | MCP server | cairn install cursor | No |
| Claude Desktop | MCP server | cairn install claude-desktop | No |
| VS Code (Copilot) | MCP server | cairn install vscode | No |
| Gemini CLI | MCP server | cairn install gemini | No |
| Antigravity | MCP server | cairn install antigravity | No |
| Codex CLI | MCP server | cairn install codex | No |
| Any other MCP host | MCP server | uvx agentcairn (paste the cairn install … --print snippet) | No |
cairn install # detect installed hosts + preview (writes nothing)
cairn install cursor # configure one host
cairn install --all # configure every detected host
cairn install codex --print # just print the snippet, change nothing
Most hosts take a JSON mcpServers entry (VS Code uses its servers key); Codex takes a TOML [mcp_servers.agentcairn] table (comments and other tables preserved). Ambient memory (auto recall-at-start, capture-at-end) is Claude-Code-only today; cross-host capture is tracked in #36.
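For hosts that take a JSON entry, a non-destructive install amounts to "back up, merge one key, write back". The sketch below illustrates that idea only. The function name is hypothetical and the entry shape (`command` / `args`) is an assumption based on the common MCP host config layout; the authoritative snippet is whatever cairn install <host> --print emits.

```python
import json
import shutil
from pathlib import Path


def add_mcp_server(config_path: Path, key: str = "mcpServers") -> dict:
    """Merge an agentcairn entry into a host's JSON config, keeping other servers."""
    config: dict = {}
    if config_path.exists():
        # Back up the original first, mirroring the <config>.bak behaviour.
        backup = config_path.with_name(config_path.name + ".bak")
        shutil.copy2(config_path, backup)
        config = json.loads(config_path.read_text(encoding="utf-8"))
    servers = config.setdefault(key, {})
    # Assumed entry shape; not taken from agentcairn's source.
    servers["agentcairn"] = {"command": "uvx", "args": ["agentcairn"]}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For VS Code, which uses a servers key, the same call would be `add_mcp_server(path, key="servers")`. Codex's TOML config would need a comment-preserving TOML writer instead of json.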
Benchmarks measured
We benchmark agentcairn the way we'd want a memory system measured: reproducibly, with ablations, and without a single cherry-picked headline number. The harness (benchmarks/) runs LongMemEval-S and LoCoMo through a version-pinned downloader (datasets are never vendored), scores retrieval deterministically (recall/nDCG@k, MRR; no API key needed, runs in CI on a synthetic fixture), and offers an opt-in LLM-judged QA layer.
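For readers unfamiliar with the metrics, recall@k and MRR are both computable from a ranked list of ids and a set of gold evidence ids, which is why no API key is needed. This is a sketch of the standard definitions, not the harness's code; the function names are hypothetical, and the harness may define recall differently (for example as "any gold item in the top k").

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant ids that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def macro_average(values: list[float]) -> float:
    """Unweighted mean over queries; MRR is the macro-average of reciprocal ranks."""
    return sum(values) / len(values) if values else 0.0
```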
Retrieval: LoCoMo
Full LoCoMo set, turn-level, macro-avg, FastEmbed nomic-embed-text-v1.5 (the default embedder):
| arm | recall@5 | recall@10 | MRR |
|---|---|---|---|
| BM25 only | 0.527 | 0.604 | 0.459 |
| vector only | 0.536 | 0.637 | 0.433 |
| hybrid (RRF) | 0.562 | 0.648 | 0.477 |
| hybrid + graph-boost | 0.562 | 0.648 | 0.477 |
| hybrid + reranker | 0.662 | 0.735 | 0.608 |
What we read from this, and say out loud:
- Hybrid beats either arm alone; RRF fusion is worth it.
- The cross-encoder reranker is the biggest lever (+0.10 recall@5 over hybrid); the "ms-marco domain-shift might hurt" worry didn't materialize on conversational data.
- The embedder default now pulls its weight: with nomic, vector-only edges out BM25 on recall@5 (0.536 vs 0.527), though BM25 still leads on MRR (0.459 vs 0.433). Switching from the old bge-small default (which trailed at 0.483) closed the gap. A 5-model FastEmbed sweep settled the pick: nomic (768-d) wins on quality-per-dim; bigger 1024-d models don't beat it. Full table: benchmarks/README.md.
- graph-boost is inert on these corpora: LoCoMo/LongMemEval have no native [[wikilink]] graph, so the boost has nothing to fire on. It's for real interlinked vaults, not chat logs, and we don't pretend otherwise.
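Reciprocal Rank Fusion, the "hybrid (RRF)" arm above, combines rankings using only rank positions, so BM25 scores and cosine similarities never need to be put on a common scale. A minimal sketch follows; k=60 is the constant conventionally used with RRF, and whether agentcairn uses that value or this exact tie-break is an assumption.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked id lists: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["a", "b", "c"]
vector_hits = ["b", "c", "d"]
print(rrf_fuse([bm25_hits, vector_hits]))  # ['b', 'c', 'a', 'd']
```

If the vector arm returns nothing, for instance because no embedding model is available, `rrf_fuse([bm25_hits, []])` reproduces the BM25 order unchanged. That is one simple way the keyword-only fallback can fall out of the same code path.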
Retrieval: LongMemEval-S
Full 500-instance set, an easier task with well-separated evidence sessions. Session level is the granularity prior work reports; turn level is the finer, corpus-revealing slice:
| arm | session r@5 | session MRR | turn r@5 | turn r@10 | turn MRR |
|---|---|---|---|---|---|
| BM25 only | 0.920 | 0.918 | 0.680 | 0.791 | 0.638 |
| vector only | 0.936 | 0.916 | 0.507 | 0.692 | 0.454 |
| hybrid (RRF) | 0.954 | 0.938 | 0.640 | 0.798 | 0.544 |
| hybrid + reranker | 0.969 | 0.963 | 0.788 | 0.891 | 0.716 |
Read honestly:
- Our 0.969 session recall@5 sits right alongside prior work's ≈0.95 over the same full 500-question set, and at full scale it discriminates (0.920 BM25 → 0.969 reranker) rather than saturating the way a small sample does.
- The reranker is again the biggest lever: turn r@5 0.640 → 0.788, session r@5 0.954 → 0.969.
- Turn level is corpus-revealing: here BM25-only (0.680) beats the RRF hybrid (0.640) because vector-only is weak on these single-turn evidence spans (0.507); the reranker is what pulls the default ahead. (Contrast LoCoMo, where vector-only edges out BM25.)
Context efficiency
How much smaller is the context agentcairn recalls than the full history you'd otherwise carry into the model? Default config (hybrid + reranker, k=10):
| dataset | queries | mean haystack | mean recalled (k=10) | context reduction |
|---|---|---|---|---|
| LoCoMo (3 convos) | 497 | 25,646 tok | 529 tok | 51.1× mean / 50.3× median |
| LongMemEval-S (full 500) | 470 | 136,552 tok | 2,207 tok | 64.7× mean / 61.6× median |
Estimate (~4 chars/token), not a billed cost; "haystack" = the full indexed history, "recalled" = the top-k chunks returned. It measures context size, independent of retrieval quality.
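A note on reading the reduction column: dividing the table's mean haystack by its mean recalled size gives about 48.5× for LoCoMo (25,646 / 529) and about 61.9× for LongMemEval-S (136,552 / 2,207), which differs from the reported means. That is consistent with the ratio being computed per query and then averaged, although the source does not say so explicitly. A sketch under that assumption, with hypothetical function names:

```python
from statistics import mean, median


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> float:
    """Rough token estimate (~4 chars/token); not a tokenizer and not a billed cost."""
    return len(text) / chars_per_token


def reduction_ratios(pairs: list[tuple[float, float]]) -> list[float]:
    """Per-query haystack / recalled ratios; queries with nothing recalled are skipped."""
    return [haystack / recalled for haystack, recalled in pairs if recalled > 0]


# Two made-up queries as (haystack tokens, recalled tokens).
ratios = reduction_ratios([(1000, 100), (3000, 50)])
print(mean(ratios), median(ratios))  # 35.0 35.0
```

In this toy example the mean of per-query ratios is 35×, while the ratio of the means is 4000 / 150 ≈ 26.7×, which shows why the two ways of summarizing need not agree.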
QA accuracy
QA-accuracy numbers (LLM-judged) are available too, but use an Anthropic judge rather than the papers' GPT-4o, so they are not comparable to published leaderboards; they are valid for relative ablation signal only. See benchmarks/README.md for how to run it and how to read the numbers.
Roadmap
- v1: done. The core loop: transcript ingestion → redaction → Markdown → rebuildable DuckDB index → hybrid recall; MCP server + CLI; secret redaction; local embeddings; reproducible benchmark harness.
- v1.1: next, prioritized by the benchmark above:
  - ✅ Reranker on by default: the largest measured retrieval lever; CAIRN_RERANK=0 to disable. (shipped)
  - Ollama embedding tier: ✅ local models via CAIRN_EMBEDDER=ollama (CAIRN_EMBED_MODEL / OLLAMA_HOST); cloud (OpenAI/Voyage) still pending.
  - ✅ Bi-temporal validity: frontmatter valid_from / valid_until / superseded_by; recall soft-demotes superseded/expired facts (non-lossy, never hidden) and annotates each result's currency + an as_of anchor, so the current fact wins and the agent can reason over time. (shipped)
  - In-memory HNSW for large-vault retrieval latency.
- v2: Obsidian plugin surface, MotherDuck cloud sync, optional LLM entity extraction.
Development
agentcairn uses uv exclusively for dependency management and tooling.
Do not use pip, poetry, or global virtual environments.
# First-time setup
uv sync # create .venv and install all deps (including dev)
uv run pre-commit install # install git hooks (ruff + pytest run on every commit)
# Daily use
uv run pytest # run the test suite
uv run cairn --help # run the CLI
uvx agentcairn # run the installed tool ephemerally (as the MCP server does)
# Formatting and linting
uv run ruff format . # format all Python files
uv run ruff check --fix . # lint with auto-fix
uv run pre-commit run --all-files
# Benchmarks (offline retrieval metrics need no API key)
uv run pytest benchmarks/tests/ # offline synthetic-fixture suite
PYTHONPATH=benchmarks uv run --group bench python -m cairn_bench.run --dataset locomo
The MCP server is launched via uvx agentcairn; no global install is required.
License
Apache License 2.0: permissive, with an explicit patent grant. Copyright © 2026 Charles C. Figueiredo.