
Dory

Persistent memory for AI agents. Graph-based spreading activation retrieval, principled forgetting. Single SQLite file. No server required.

pip install dory-memory

from dory import DoryMemory

mem = DoryMemory()
mem.observe("User prefers local-first AI")
mem.observe("User switched from llama.cpp to MLX — 25% faster")

print(mem.query("what does the user prefer for inference?"))
# → MLX (updated preference, supersedes llama.cpp)

LongMemEval (500q, oracle split): 80.6% with Claude Code MCP backend / 79.8% with direct Sonnet API.


The problem

Every session, your agent starts from zero. Systems that claim to "remember" typically do keyword search through a flat list of notes. That's not memory — it's ctrl+F.

The deeper problem: naive context injection makes things worse. Research (Chroma, 2025) shows all major frontier models degrade starting at 500–750 tokens of context. Dumping everything into a prompt creates noise that degrades performance on the things that actually matter.

What Dory does

Four memory types

Type        What it stores
Episodic    Past events, sessions, experiences
Semantic    Facts, preferences, entities, relationships
Procedural  Skills, workflows, repeatable processes
Working     In-context window (managed by your LLM)

Spreading activation retrieval — not vector similarity search. Relevant memories pull in connected memories through the graph. "AllergyFind" activates "Giovanni's" activates "FastAPI" activates "menu endpoint" because those things co-occurred. That's how human associative memory works.
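
Here is the idea in miniature (a toy sketch, not Dory's internal code): activation starts at the query's seed nodes and flows along weighted edges with decay, so connected memories surface even when they share no keywords with the query.

from collections import defaultdict

def spread(graph, seeds, decay=0.5, threshold=0.05, max_hops=3):
    """graph: {node: [(neighbor, edge_weight), ...]}; seeds: {node: initial activation}."""
    activation = defaultdict(float, seeds)
    frontier = dict(seeds)
    for _ in range(max_hops):
        next_frontier = {}
        for node, energy in frontier.items():
            for neighbor, weight in graph.get(node, []):
                pulse = energy * weight * decay
                if pulse >= threshold:
                    activation[neighbor] += pulse
                    next_frontier[neighbor] = max(next_frontier.get(neighbor, 0.0), pulse)
        frontier = next_frontier
    return sorted(activation.items(), key=lambda kv: -kv[1])

graph = {
    "AllergyFind": [("Giovanni's", 0.9)],
    "Giovanni's": [("FastAPI", 0.8)],
    "FastAPI": [("menu endpoint", 0.7)],
}
print(spread(graph, {"AllergyFind": 1.0}))
# "menu endpoint" surfaces even though it never matched the query text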

Cacheable prefix output — Dory splits its output into a stable prefix (unchanged until memory changes, so it hits the prompt cache) and a dynamic suffix (query-specific). The result is cache hits on every turn where memory hasn't changed, which can make an agent with memory substantially cheaper to run than one without.

Principled forgetting — three decay zones: active, archived, expired. Scores based on recency + frequency + relevance. Archived memories are queryable for historical context ("what was true in January?"). Nothing is ever deleted — only decayed.

Bi-temporal conflict resolution — when a fact changes, the old version is archived with a SUPERSEDES edge and a timestamp. Full provenance for every update.

Zero-server stack — single SQLite file. FTS5 for keyword search, adjacency tables for the graph. No Postgres, no Neo4j, no Redis. Works offline.
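
For intuition, the whole stack fits in a few DDL statements. The table and column names below are illustrative, not Dory's actual schema (see store.py for the real one):

import sqlite3

conn = sqlite3.connect("engram.db")   # one file, no server process
conn.executescript("""
CREATE TABLE IF NOT EXISTS nodes (
    id        INTEGER PRIMARY KEY,
    node_type TEXT,                   -- ENTITY, CONCEPT, EVENT, PREFERENCE, ...
    content   TEXT,
    zone      TEXT DEFAULT 'active'   -- active / archived / expired
);
CREATE TABLE IF NOT EXISTS edges (    -- adjacency table for the graph
    src INTEGER, dst INTEGER,
    edge_type TEXT,                   -- USES, SUPERSEDES, CO_OCCURS, ...
    weight REAL
);
CREATE VIRTUAL TABLE IF NOT EXISTS nodes_fts USING fts5(content);  -- keyword search
""")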


Quick start

from dory import DoryMemory

mem = DoryMemory()

# Add memories manually
mem.observe("Alice is migrating payments from Stripe to a custom processor", node_type="EVENT")
mem.observe("Alice prefers async Python over synchronous frameworks", node_type="PREFERENCE")
mem.observe("The migration deadline is end of Q2", node_type="EVENT")

# Query — returns context to inject into your LLM prompt
context = mem.query("payment migration deadline")
print(context)

# End of session: consolidate, decay, promote core memories
mem.flush()

# See your graph in the browser
mem.visualize()

Or from the command line:

dory visualize          # opens graph in browser
dory show               # print stats + core memories
dory query "topic"      # spreading activation from the terminal

With auto-extraction (Dory extracts memories from conversation turns automatically):

mem = DoryMemory(extract_model="qwen3:8b")                  # local via Ollama (5 GB)
mem = DoryMemory(extract_model="qwen3:14b")                 # local via Ollama (9 GB, better quality)
mem = DoryMemory(                                           # Claude
    extract_model="claude-haiku-4-5-20251001",
    extract_backend="anthropic",
    extract_api_key="sk-ant-...",
)
mem = DoryMemory(                                           # GPT / Grok / any compat
    extract_model="gpt-4o-mini",
    extract_backend="openai",
    extract_api_key="sk-...",
)

# Log turns — extraction happens automatically every N turns
mem.add_turn("user", "I'm working on AllergyFind today, need to add a menu endpoint")
mem.add_turn("assistant", "What authentication approach are you using?")

# Build API-ready messages with prompt caching
result = mem.build_context("menu endpoint authentication")
messages = result.as_anthropic_messages(user_query)   # Anthropic SDK w/ cache_control
messages = result.as_openai_messages(user_query)      # OpenAI / compat

MCP server (Claude Code / Claude Desktop)

pip install 'dory-memory[mcp]'

# Find the installed binary path (needed if installed in a venv)
which dory-mcp

# Register globally across all Claude Code projects
claude mcp add --scope user dory -- /full/path/to/dory-mcp --db ~/.dory/engram.db

The --db path defaults to ~/.dory/engram.db if omitted. You can also set DORY_DB_PATH as an environment variable.

Verify the server connected:

claude mcp list   # should show dory ✓ Connected

Five tools are exposed: dory_query, dory_observe, dory_consolidate, dory_visualize, dory_stats.

Claude Desktop — add to claude_desktop_config.json:

{
  "mcpServers": {
    "dory": {
      "command": "/full/path/to/dory-mcp",
      "args": ["--db", "/Users/you/.dory/engram.db"]
    }
  }
}

Interactive demo

Live graph visualization →

Force-directed knowledge graph with spreading activation query mode, edge-type coloring, archived/superseded nodes, and a session summary chain.


Framework adapters

LangChain — drop-in BaseMemory replacement:

from dory.adapters.langchain import DoryMemoryAdapter
from langchain.chains import ConversationChain
from langchain_anthropic import ChatAnthropic

memory = DoryMemoryAdapter(
    extract_model="claude-haiku-4-5-20251001",
    extract_backend="anthropic",
    extract_api_key="sk-ant-...",
)
chain = ConversationChain(llm=ChatAnthropic(model="claude-sonnet-4-6"), memory=memory)

LangGraph — graph nodes with the (state) -> state signature:

from dory.adapters.langgraph import DoryMemoryNode, MemoryState
from langgraph.graph import StateGraph, START, END

mem = DoryMemoryNode(extract_model="claude-haiku-4-5-20251001", extract_backend="anthropic")

builder = StateGraph(MemoryState)
builder.add_node("load_memory", mem.load_context)
builder.add_node("record_turn", mem.record_turn)
builder.add_edge(START, "load_memory")
builder.add_edge("load_memory", "record_turn")
builder.add_edge("record_turn", END)
graph = builder.compile()

Multi-agent — shared memory pool with thread-safe writes and agent attribution:

from dory.adapters.multi_agent import SharedMemoryPool

pool = SharedMemoryPool(db_path="shared.db")
pool.observe("User prefers dark mode", agent_id="agent-1")
pool.add_turn("user", "Let's ship it", agent_id="agent-2", session_id="s1")
results = pool.query("UI preferences")

Async API

All DoryMemory methods have async counterparts — safe to await from FastAPI, LangGraph, and any async framework:

context = await mem.aquery("current topic")
result  = await mem.abuild_context("current topic")
await mem.aadd_turn("user", "message")
node_id = await mem.aobserve("User prefers JWT", node_type="PREFERENCE")
stats   = await mem.aflush()

Export / import

from dory.export.jsonld import JSONLDExporter

exporter = JSONLDExporter(graph)
exporter.export("memory.jsonld.json")
JSONLDExporter.import_into(graph, "memory.jsonld.json")

How it works

Knowledge graph

Every piece of information is a typed node: ENTITY, CONCEPT, EVENT, PREFERENCE, BELIEF, PROCEDURE, SESSION (episodic narrative), SESSION_SUMMARY (structured episodic). Edges between them are typed and weighted: USES, WORKS_ON, PREFERS, SUPERSEDES, CO_OCCURS, SUPPORTS_FACT, TEMPORALLY_AFTER, etc.

Salience is computed from connectivity, activation frequency, and recency. High-salience nodes become core memories — they anchor the stable context prefix.
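
A sketch of that computation (weights, half-life, and normalization here are illustrative, not Dory's shipped values):

import math, time

def salience(degree, activation_count, last_activated_ts,
             w_conn=0.4, w_freq=0.3, w_rec=0.3, half_life_days=30.0):
    """Toy salience score: connectivity + activation frequency + recency."""
    connectivity = math.log1p(degree)                 # how wired-in the node is
    frequency = math.log1p(activation_count)          # how often retrieval touched it
    days_ago = (time.time() - last_activated_ts) / 86400
    recency = math.exp(-math.log(2) * days_ago / half_life_days)
    return w_conn * connectivity + w_freq * frequency + w_rec * recency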

Observer

Every N conversation turns, the Observer calls an LLM to extract structured memories. Extractions carry confidence scores — anything below threshold is logged but not written to the graph.

Backends: Ollama (default), Anthropic (Claude), or any OpenAI-compatible endpoint.
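
The confidence gate amounts to a filter like this sketch (the Extraction shape, threshold value, and write call are assumptions for illustration):

from dataclasses import dataclass

@dataclass
class Extraction:
    content: str
    node_type: str
    confidence: float

def ingest(extractions, graph, log, threshold=0.6):
    for ex in extractions:
        if ex.confidence >= threshold:
            graph.add_node(ex.content, node_type=ex.node_type)   # hypothetical write API
        else:
            log.info("below threshold, logged only: %r", ex.content)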

Prefixer

Builds context in two parts:

[stable prefix]         ← core memories + key relationships
                          same bytes across turns → prompt cache hits

[dynamic suffix]        ← spreading activation for this specific query
                          + recent episodic observations

Decayer

score = recency_weight  × exp(-λ × days_since_activation)
      + frequency_weight × log(1 + activation_count)
      + relevance_weight × salience

Nodes below the active floor → archived. Below the archive floor → expired. Core memories are shielded with a configurable multiplier.
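
A direct transcription of that scoring into Python; the weights, λ, and floor values below are illustrative defaults, not Dory's configuration:

import math

def decay_score(days_since_activation, activation_count, salience,
                recency_weight=0.5, frequency_weight=0.3, relevance_weight=0.2,
                lam=0.05):
    return (recency_weight   * math.exp(-lam * days_since_activation)
          + frequency_weight * math.log(1 + activation_count)
          + relevance_weight * salience)

def zone_for(score, is_core, active_floor=0.3, archive_floor=0.1, core_shield=2.0):
    effective = score * (core_shield if is_core else 1.0)   # core memories are shielded
    if effective >= active_floor:
        return "active"
    if effective >= archive_floor:
        return "archived"
    return "expired"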

Reflector

Near-duplicate detection (Jaccard ≥ 0.82) merges duplicates, keeping the higher-salience node and rewiring its edges. Supersession detection (Jaccard in [0.45, 0.82) with a shared subject) archives the older node and adds a SUPERSEDES provenance edge. Old observations are compressed into summaries.
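
The routing logic, sketched with word-set Jaccard (Dory's tokenization and subject check may differ):

def jaccard(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def classify(new_text: str, old_text: str, same_subject: bool) -> str:
    j = jaccard(new_text, old_text)
    if j >= 0.82:
        return "merge"       # near-duplicate: keep higher-salience node, rewire edges
    if 0.45 <= j < 0.82 and same_subject:
        return "supersede"   # archive the old node, add a SUPERSEDES edge
    return "keep_both"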


Architecture

dory/
├── graph.py          ← nodes, edges, salience computation
├── schema.py         ← NodeType, EdgeType, zone constants
├── activation.py     ← spreading activation engine
├── consolidation.py  ← edge decay, strengthen, prune, promote/demote core
├── session.py        ← session-level helpers: query, observe, write_turn, end_session
├── memory.py         ← DoryMemory — high-level API (sync + async)
├── visualize.py      ← D3.js interactive graph visualization
├── mcp_server.py     ← MCP tools (dory_query, dory_observe, dory_consolidate, …)
├── store.py          ← SQLite backend (nodes, edges, FTS5, observations)
│
├── pipeline/
│   ├── observer.py   ← LLM extraction of memories from conversation turns
│   ├── summarizer.py ← episodic layer: SESSION nodes from conversation turns
│   ├── prefixer.py   ← stable prefix + dynamic suffix builder
│   ├── decayer.py    ← node decay scoring + zone management
│   └── reflector.py  ← dedup, supersession, observation compression
│
├── adapters/
│   ├── langchain.py   ← DoryMemoryAdapter (BaseMemory drop-in)
│   ├── langgraph.py   ← DoryMemoryNode (StateGraph integration)
│   └── multi_agent.py ← SharedMemoryPool (thread-safe multi-agent)
│
└── export/
    └── jsonld.py      ← JSON-LD round-trip export/import

Local LLM setup

ollama pull qwen3:14b          # extraction
ollama pull nomic-embed-text   # embeddings (768-dim, offline after pull)

OpenAI-compatible endpoint (llama.cpp server, vLLM, etc.):

from dory.pipeline.observer import Observer

obs = Observer(graph, backend="openai", base_url="http://localhost:8000", model="qwen3")

Vector search activates automatically once nomic-embed-text is available. Falls back to FTS5 BM25 if no embedding model is running.
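
The probe-and-fall-back pattern looks roughly like this sketch (endpoint, table name, and error handling are illustrative):

import json, sqlite3, urllib.request

def try_embed(text: str):
    """Ask local Ollama for an embedding; None means no embedding model is serving."""
    try:
        req = urllib.request.Request(
            "http://localhost:11434/api/embeddings",
            data=json.dumps({"model": "nomic-embed-text", "prompt": text}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=2) as resp:
            return json.load(resp)["embedding"]
    except OSError:
        return None

def keyword_fallback(conn: sqlite3.Connection, query: str):
    # FTS5 MATCH ranked by BM25 (lower bm25() means a better match)
    return conn.execute(
        "SELECT content FROM nodes_fts WHERE nodes_fts MATCH ? ORDER BY bm25(nodes_fts)",
        (query,),
    ).fetchall()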


Decay zones

Zone      Behavior                          How to query
active    Retrieved in all normal queries   graph.all_nodes() (default)
archived  Invisible to normal queries       graph.all_nodes(zone="archived")
expired   Completely invisible              graph.all_nodes(zone=None)

Memory is never deleted — only decayed. Archived and expired nodes retain full provenance and can be restored if reactivated. The one exception: exact structural duplicates detected by the Reflector are hard-merged (lower-salience copy removed, edges rewired to the winner).
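
For example, answering "what was true in January?" against archived history (the node attributes shown are assumptions; only all_nodes(zone=...) is documented above):

from datetime import datetime

for node in graph.all_nodes(zone="archived"):
    archived_at = datetime.fromisoformat(node.archived_at)   # attribute names assumed
    if archived_at.year == 2026 and archived_at.month == 1:
        print(node.node_type, node.content)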


Comparison

                                 mem0     Zep      Letta    Mastra        Dory
Principled forgetting                                                     ✓
Spreading activation retrieval                                            ✓
Cacheable prefix output                                     ✓ (TS only)   ✓
Bi-temporal conflict resolution                                           ✓
Zero-server local stack          partial  partial                         ✓
Drop-in Python library                             partial                ✓
Apache 2.0                                                                ✓

Graph topology — what flat search can't do

Run examples/demo_topology.py to see six live graph traversals:

Q1 · Supersession — "What was the inference backend before MLX replaced it?"

  ┌ BEFORE  [PREFERENCE]  Prefers llama.cpp — cross-platform, well-supported
  │         zone=archived  archived=2026-03-01
  ├─SUPERSEDES──▶
  └ AFTER   [PREFERENCE]  Prefers MLX over llama.cpp on Apple Silicon (20-30% faster)

  ✗ Flat search: returns both nodes with equal score. No directionality. No timestamp.

──────────────────────────────────────────────────────────────────────
Q4 · Semantic Path — "How does local-first philosophy connect to the 80.6% result?"

  ● [CONCEPT]    Local-first AI — data stays on device, no cloud
    └─[CO_OCCURS]──▶
  ● [PREFERENCE] Prefers local-first — no data leaves device unless necessary
    └─[PREFERS]──▶
  ● [ENTITY]     Developer — solo, Apple Silicon
    └─[WORKS_ON]──▶
  ● [ENTITY]     Dory — agent memory library
    └─[CO_OCCURS]──▶
  ● [EVENT]      [2026-03-28] v0.5 temporal spot check — 90.0% temporal-reasoning

  ✗ Flat search: returns both endpoints as separate results. No connecting path.

Query                 Traversal                  What it answers
Q1 Supersession       SUPERSEDES edges           What changed and when
Q2 Chronicle          TEMPORALLY_AFTER chain     Full session history in order
Q3 Dependencies       USES traversal (depth 2)   What a project actually needs
Q4 Semantic Path      BFS across typed edges     How two concepts connect
Q5 Provenance         SUPPORTS_FACT traversal    What proves a specific fact
Q6 Belief Grounding   SUPPORTS_FACT + BELIEF     Which beliefs have evidence

Benchmark results

LongMemEval (ICLR 2025), oracle split, 500 questions.

Version  Extract  Answer               n     Score
v0.1     Haiku    Haiku                500   54.4%
v0.1     Sonnet   Sonnet               500   66.8%
v0.3     Sonnet   Sonnet (direct API)  500   79.8%
v0.4     Haiku    Claude Code (MCP)    500   80.6%
v0.5     Haiku    Claude Code (MCP)    30*   90.0% temporal

* v0.5 temporal spot check (30 questions, temporal-reasoning category only). Full 500q run pending.

Per-category (v0.3 vs v0.4):

Category                   v0.3 Sonnet   v0.4 Claude Code MCP   Δ       n
knowledge-update           84.6%         89.7%                  +5.1    78
multi-session              80.5%         79.7%                  -0.8    133
single-session-assistant   87.5%         83.9%                  -3.6    56
single-session-preference  46.7%         63.3%                  +16.6   30
single-session-user        88.6%         92.9%                  +4.3    70
temporal-reasoning         75.9%         72.2%                  -3.7    133
Overall                    79.8%         80.6%                  +0.8    500

The 0.8pp overall difference is within the noise floor (±3.5pp binomial CI on 500 questions). The category-level results are the meaningful signal — in particular, single-session-preference (+16.6pp) is consistent across two independent runs. Full methodology and failure analysis: benchmarks/REPORT_claudecode_mcp_v04.md.
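
The ±3.5pp figure is the standard normal-approximation binomial interval at the observed accuracy:

import math

p, n = 0.80, 500
ci = 1.96 * math.sqrt(p * (1 - p) / n)   # 95% CI half-width
print(f"±{ci:.3f}")                      # ±0.035 → ±3.5 percentage points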

Published scores for reference: Mem0 68.4%, Zep 71.2%, Mastra 94.87%¹.

¹ Mastra's published run uses GPT-4o-mini and a TypeScript stack. Architecturally different; not directly comparable.

Note: LongMemEval oracle split uses pre-filtered context (~15K tokens per question). Performance with live, unfiltered conversations will differ.


Roadmap

v0.1

  • MCP server — dory_query, dory_observe, dory_consolidate, dory_visualize, dory_stats
  • LangChain adapter — DoryMemoryAdapter implements BaseMemory
  • LangGraph adapter — DoryMemoryNode for StateGraph integration
  • Procedural memory — PROCEDURE node type
  • Multi-agent shared memory — SharedMemoryPool with thread-safe writes
  • JSON-LD export/import

v0.2

  • Episodic layer — SESSION_SUMMARY nodes with salient_counts metadata
  • Retrieval fusion — three-mode routing (graph / episodic / hybrid)
  • Behavioral preference synthesis — Reflector synthesizes PREFERENCE nodes from repeated patterns without LLM calls

v0.3

  • 79.8% on LongMemEval full 500-question run (Sonnet)
  • Temporal arithmetic prompt — step-by-step date math before answering
  • Count cross-validation via salient_counts
  • Graph topology demo — demo_topology.py
  • Ollama demo — demo_ollama.py, fully local, no API key required

v0.4

  • 80.6% on LongMemEval full 500-question run (Claude Code MCP backend)
  • --answer-backend claude-code-mcp benchmark option — Claude Code queries Dory autonomously via MCP
  • Preference context improvements — FTS-ranked retrieval, deduplication, event elevation
  • Extended PREFERENCE extraction guidance in Observer

v0.5

  • Observer async extraction — ThreadPoolExecutor, LLM calls parallel, writes serialized; flush() is sync point
  • Temporal date-anchoring — REFERENCE DATE: at top of MCP system prompt; inclusive day-counting rule
  • Confidence-seeded activation_count — high-confidence extractions start with higher salience, decay slower
  • Session diversity weighting — distinct_sessions field; cross-session nodes score higher than single-session enthusiasm
  • Archived node isolation — archived nodes no longer surface in dory_query results
  • Removed behavioral synthesis noise — Reflector keyword-synthesis disabled; signal handled structurally

v0.6 (planned)

  • Full 500q benchmark run with v0.5 pipeline
  • Multi-session counting improvements

License

Apache 2.0 — see LICENSE.


Named after Dory from Finding Nemo, because your AI agent right now is Dory. This fixes it.
