Adaptive memory for LLM agents — adaptive decay, knowledge graph, zero infrastructure
elfmem
Adaptive memory for LLM agents. Knowledge that evolves through use.
elfmem began as an experiment: could a trading bot develop a concept of self, learning which strategies succeeded, which failed, and evolving its approach through experimentation? That question led to a fundamental insight: agents don't need a database of facts. They need memory that adapts. Knowledge that gets reinforced through successful use should grow stronger. Knowledge that misleads should fade. And the agent's identity, its values, style, and hard-won lessons, should persist across every session.
We built elfmem from the ground up to make this real. It's a memory system modelled on how biological memory works: fast ingestion, deep consolidation at pauses, adaptive decay at rest, and a knowledge graph where related ideas strengthen each other over time. Every design decision was derived from first principles across 26 structured explorations, not borrowed from existing patterns, but built from axioms about how agent memory should work.
One SQLite file. Zero infrastructure. Any LLM provider.
An agent's knowledge after several sessions. Nodes are memory blocks, sized by confidence and coloured by decay tier. Edges are semantic relationships discovered during consolidation. Identity blocks (permanent tier) anchor the centre. Knowledge that gets used grows; knowledge that doesn't fades toward the periphery.
Why elfmem exists
To build memory that truly evolves, we had to innovate in areas that existing tools don't address.
Agents need identity, not just storage. Your agent isn't a search index. It has values, a style, and preferences that should persist across every session. elfmem introduces the SELF frame: a persistent identity layer where core beliefs get near-permanent decay rates. Your agent remembers who it is.
Knowledge must earn its place. In most memory systems, everything stored is equally permanent. In elfmem, knowledge lives or dies based on whether it's useful. Blocks that guide successful decisions get reinforced: their confidence rises, their connections strengthen. Blocks that mislead get penalised and eventually archived. After a few sessions, the memory is measurably better than when it started.
Retrieval depends on context. Looking up a quick fact, exploring a novel problem, and checking your values require fundamentally different strategies. elfmem provides six retrieval frames, each a pre-configured scoring pipeline that weights similarity, confidence, recency, centrality, and reinforcement differently for the task at hand.
Related knowledge should surface together. If your agent knows "use Redis for caching" and "Redis requires careful memory management", retrieving one should surface the other, even if the query only matches the first. elfmem builds a knowledge graph where semantic edges form during consolidation and strengthen through co-retrieval.
Time should be meaningful. Wall-clock decay punishes agents for being idle. elfmem's session-aware clock means knowledge only decays during active use. Holidays and downtime don't kill what your agent has learned.
Built agent-first
elfmem is designed for the agent's one-shot loop: read, call, interpret, act. Every surface is optimised for non-human consumers.
- Every operation returns a typed result with `__str__()`, `.summary`, and `.to_dict()`
- Every exception carries a `.recovery` field with the exact code or command to fix the problem
- `guide()` provides runtime self-documentation so the agent can teach itself the API
- Duplicate `learn()` returns a graceful reject, not an error. Empty `dream()` returns zero counts, not a crash
- All reasoning (alignment scoring, contradiction detection, tag inference) uses official SDKs only: `anthropic` and `openai`, no third-party gateways
See it work
import asyncio
from elfmem import MemorySystem

async def main():
    system = await MemorySystem.from_config("agent.db")
    async with system.session():
        # 1. Give your agent an identity
        result = await system.setup(
            identity="I am a backend engineer. I write clean, tested Python.",
            values=["I prefer simple solutions over clever ones."],
        )
        print(result)  # "Setup complete: 2/2 new blocks created."

        # 2. Learn from experience (fast, no API calls)
        result = await system.learn("Redis connection pooling: set max to 20 in production.")
        print(result)  # "Stored block a1b2c3d4. Status: created."
        result = await system.learn("Deploy failed when pool size was left at default (10).")
        print(result)  # "Stored block e5f6g7h8. Status: created."

        # 3. Consolidate: embed, deduplicate, detect contradictions, build graph
        result = await system.dream()
        print(result)  # "Consolidated 2: 2 promoted, 0 deduped, 3 edges."

        # 4. Recall through the right frame
        identity = await system.frame("self")
        print(identity)  # "self frame: 2 blocks returned."
        context = await system.frame("attention", query="Redis production config")
        print(context)  # "attention frame: 2 blocks returned."
        for block in context.blocks:
            print(f"  [{block.score:.2f}] {block.content}")
        # [0.87] Redis connection pooling: set max to 20 in production.
        # [0.72] Deploy failed when pool size was left at default (10).

        # 5. Signal what helped (this is where knowledge evolves)
        block_ids = [b.id for b in context.blocks]
        result = await system.outcome(block_ids, signal=0.85, source="deploy_fix")
        print(result)  # "Outcome recorded: 2 blocks updated (+0.042 avg confidence), 1 edges reinforced."

        # 6. Check memory health
        status = await system.status()
        print(status)
        # Session: active (0.1h) | Inbox: 0/10 | Active: 4 blocks | Health: good
        # Tokens this session: LLM: 1,240 tokens (2 calls) | Embed: 680 tokens (3 calls)
        # Suggestion: Memory is healthy.

asyncio.run(main())
Core concepts
SELF: persistent agent identity
result = await system.setup(
    identity="I am a senior backend engineer. I write clean, tested Python.",
    values=[
        "I prefer simple solutions over clever ones.",
        "I always explain my reasoning before giving recommendations.",
        "I never skip error handling at system boundaries.",
    ],
)
print(result)  # "Setup complete: 4/4 new blocks created."
# In any future session, your agent remembers who it is
identity = await system.frame("self")
print(identity.text)
# ## SELF - Agent Identity
# - I am a senior backend engineer. I write clean, tested Python.
# - I prefer simple solutions over clever ones.
# - I always explain my reasoning before giving recommendations.
# - I never skip error handling at system boundaries.
Identity blocks use permanent decay with a half-life of ~80,000 hours. They anchor the centre of the knowledge graph. Regular knowledge uses standard decay (~69 hours) and must be reinforced through use to survive.
| Decay tier | Half-life | Use case |
|---|---|---|
| Permanent | ~80,000 hours | Core identity, constitutional beliefs |
| Durable | ~693 hours | Stable preferences, learned values |
| Standard | ~69 hours | General knowledge |
| Ephemeral | ~14 hours | Session observations, temporary facts |
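The half-lives above translate to a decay rate of λ = ln 2 / half-life over active session time. A minimal sketch of the curve (the function is illustrative, not elfmem's internal API):

```python
import math

def decayed_confidence(confidence: float, half_life_hours: float, active_hours: float) -> float:
    """Exponential decay: confidence halves every half_life_hours of *active* session time."""
    decay_rate = math.log(2) / half_life_hours
    return confidence * math.exp(-decay_rate * active_hours)

# Standard-tier knowledge (~69 h half-life) after one half-life of active use:
print(round(decayed_confidence(0.8, 69.0, 69.0), 2))   # 0.4
# The same block on the durable tier barely moves in that time:
print(round(decayed_confidence(0.8, 693.0, 69.0), 2))  # 0.75
```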
Six frames: retrieval shaped by intent
Each frame is a pre-configured scoring pipeline. The same knowledge scores differently depending on what the agent needs:
# "Who am I?" - weights confidence and reinforcement
identity = await system.frame("self")
# "What do I know about this?" - weights similarity and recency
context = await system.frame("attention", query="async error handling")
# "What should I do?" - balanced across all signals
plan = await system.frame("task", query="refactor the API layer")
# "What's the broader picture?" - weights similarity and graph centrality
background = await system.frame("world", query="Python best practices")
# "What just happened?" - weights recency above all
recent = await system.frame("short_term")
# "What would they do?" - inhabit another agent's perspective
perspective = await system.frame("simulate", query="how will the user react?")
Every block is scored across five dimensions:
Score = w_similarity * cosine_similarity(query, block)
+ w_confidence * block.confidence
+ w_recency * exp(-lambda * hours_since_reinforced)
+ w_centrality * normalized_weighted_degree(block)
+ w_reinforcement * log(1 + count) / log(1 + max_count)
The self frame heavily weights confidence and reinforcement, because identity is what you've consistently believed. The attention frame weights similarity and recency: what's relevant right now. The task frame balances everything for the goal at hand. The simulate frame uses score boosts to prioritise identity, mind models, and predictions — see below.
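As a worked illustration of that weighted sum, here is a sketch with hypothetical weight values (elfmem's actual frame presets are not published in this README):

```python
import math
from dataclasses import dataclass

@dataclass
class FrameWeights:
    similarity: float
    confidence: float
    recency: float
    centrality: float
    reinforcement: float

def score_block(w: FrameWeights, *, similarity: float, confidence: float,
                hours_since_reinforced: float, centrality: float,
                count: int, max_count: int,
                decay_rate: float = math.log(2) / 69.0) -> float:
    """Composite relevance score: five dimensions, each in [0, 1], weighted per frame."""
    recency = math.exp(-decay_rate * hours_since_reinforced)
    reinforcement = math.log1p(count) / math.log1p(max_count) if max_count else 0.0
    return (w.similarity * similarity
            + w.confidence * confidence
            + w.recency * recency
            + w.centrality * centrality
            + w.reinforcement * reinforcement)

# An attention-style frame favours similarity and recency (weights illustrative):
attention = FrameWeights(0.5, 0.1, 0.3, 0.05, 0.05)
fresh = score_block(attention, similarity=0.9, confidence=0.6,
                    hours_since_reinforced=1.0, centrality=0.2, count=3, max_count=10)
stale = score_block(attention, similarity=0.9, confidence=0.6,
                    hours_since_reinforced=200.0, centrality=0.2, count=3, max_count=10)
print(fresh > stale)  # True: same content, recency breaks the tie
```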
Theory of Mind: modelling other agents
elfmem can model other agents, users, or stakeholders as mind blocks — structured representations of their goals, beliefs, fears, and motivations. Attach falsifiable predictions to test your model, then close the loop with outcomes to calibrate.
# 1. Create a mind model
result = await system.mind_create(
    subject="Alice",
    goals=["Ship the API refactor by Friday"],
    beliefs=["Microservices are overengineered for our scale"],
    fears=["Breaking the mobile app integration"],
)
print(result)  # "Stored block a1b2c3d4. Status: created."

# 2. Make a falsifiable prediction
pred = await system.mind_predict(
    mind_block_id=result.block_id,
    prediction="Alice will push back on splitting the monolith",
    verify_at="2026-05-02",
    reasoning="Her belief about microservices + fear of breaking mobile",
)
print(pred)  # "Prediction d5e6f7g8 linked to mind a1b2…"

# 3. Retrieve through the simulate frame
perspective = await system.frame("simulate", query="how will Alice react to the proposal?")
# Returns: SELF blocks (10× boost), mind blocks (6×), predictions (5×)
# Grouped by role: Identity → Minds → Decisions → Context

# 4. Close the loop when the prediction resolves
outcome = await system.mind_outcome(
    prediction_block_id=pred.prediction_block_id,
    hit=True,
    reason="Alice vetoed the split in Thursday's meeting, as predicted",
)
print(outcome) # "Prediction hit. Mind confidence: 0.50 → 0.58"
The simulate frame uses score boosts — per-category and per-tag multipliers applied during retrieval — to surface the most relevant identity and mind blocks:
| Boost target | Multiplier | Why |
|---|---|---|
| `tag:self/` prefix | 10× | Ground perspective in agent's own values |
| `mind` category | 6× | Surface the mind model being simulated |
| `decision` category | 5× | Surface linked predictions |
Mind blocks use DURABLE decay (~693-hour half-life, the durable tier above), so mental models persist across many sessions. Predictions are tracked as decision blocks linked via predicts edges. On outcome closure, validates edges are created and confidence is updated via Bayesian calibration.
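One simple way to realise Bayesian calibration is a Beta-style hit/miss estimate; the pseudocount below is illustrative, not elfmem's actual prior:

```python
def calibrated_confidence(hits: int, misses: int, pseudocount: float = 2.0) -> float:
    """Beta-mean estimate: starts at 0.5 with no evidence, moves with each resolved prediction."""
    return (hits + pseudocount) / (hits + misses + 2 * pseudocount)

print(calibrated_confidence(0, 0))            # 0.5, no predictions resolved yet
print(calibrated_confidence(1, 0))            # 0.6, first hit nudges the model up
print(round(calibrated_confidence(1, 3), 2))  # 0.38, repeated misses pull it down
```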
Peer communication: agents that talk to each other
elfmem instances can exchange knowledge and messages. Pull-based, file-mediated, zero infrastructure. Each instance remains sovereign — it owns its blocks, shares selectively, and learns from exchanges through outcome closure.
# 1. Set your identity and register a peer
await system.peer_init("research-elf")
await system.peer_add("elf:trader", "Trading Elf")
# 2. Direct delivery: register with the peer's inbox path (no transport needed)
await system.peer_add(
    "elf:vault", "Vault Elf",
    delivery_path="/shared/vaults/elf_vault_proj/.elfmem/inbox",
)
# 3. Send a message (heartbeat speed, no LLM)
result = await system.peer_send("elf:vault", "What's your gilt view this week?")
print(result) # "Sent m_a1b2c3d4 to elf:vault → /shared/vaults/.../inbox/research-elf/"
# 4. Export shareable knowledge as a bundle
await system.export_blocks(share_level="public", output_path="knowledge.json")
# 5. Import knowledge from another instance (blocks enter inbox)
result = await system.import_blocks("peer_knowledge.json", from_peer="elf:trader")
print(result) # "Imported 12 blocks (3 skipped) from peer (elf:trader), 4 edges"
# 6. Check inbox for messages
inbox = await system.peer_inbox(import_all=True)
print(inbox) # "Found 2 messages from 1 peer(s). Imported 2, skipped 0."
# 7. Trust evolves through outcomes — no manual scoring needed
await system.outcome([imported_block_id], signal=0.9, source="gilt prediction confirmed")
# → Trust on elf:trader rises automatically
Routing: If a peer has a delivery_path, messages go directly to that directory using your identity slug as the subdirectory. Without it, messages go to your local outbox for manual transport. Self-federation (same identity across machines) uses --self-merge with trust 1.0.
Inbox/outbox location: Peer messaging is project-scoped. Your inbox is always <project>/.elfmem/inbox (and outbox <project>/.elfmem/outbox), derived from the project root (the directory containing .elfmem/config.yaml). Run elfmem init once per project to initialise it; peer operations outside any project raise ProjectNotFound with a recovery hint.
Trust is outcome-driven: when peer-originated knowledge leads to good outcomes, trust rises. When it misleads, trust falls. Peer trust also decays slowly over inactivity (90 days), incentivising regular exchange.
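A sketch of how outcome-driven trust with slow inactivity decay could behave (the update rule and rates here are assumptions, not elfmem's internals):

```python
import math

def update_trust(trust: float, signal: float, learning_rate: float = 0.1) -> float:
    """Move trust toward the outcome signal: good outcomes raise it, bad ones lower it."""
    return min(1.0, max(0.0, trust + learning_rate * (signal - 0.5)))

def decay_trust(trust: float, idle_days: float, half_life_days: float = 90.0) -> float:
    """Trust drifts down over inactivity, incentivising regular exchange."""
    return trust * math.exp(-math.log(2) / half_life_days * idle_days)

trust = 0.5
trust = update_trust(trust, 0.9)         # peer knowledge led to a good outcome
print(round(trust, 2))                   # 0.54
print(round(decay_trust(trust, 90), 2))  # 0.27 after 90 idle days
```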
# CLI equivalents
elfmem peer init research-elf
elfmem peer add elf:vault --name "Vault Elf" \
--delivery-path ~/shared/vaults/elf_vault_proj/.elfmem/inbox
elfmem peer send elf:vault "What's your view on UK gilts?"
elfmem peer inbox --import-all
elfmem peer list
elfmem export knowledge.json --share public
elfmem import peer_knowledge.json --from elf:trader
Three rhythms: learn, dream, curate
Every operation maps to one of three biological rhythms:
# HEARTBEAT - milliseconds, no API calls
# Call constantly. Fast inbox insert with content-hash deduplication.
await system.learn("Deploy failed: Redis connection timeout on staging.")
await system.learn("The fix was to increase the connection pool size to 20.")
# BREATHING - seconds, LLM-powered
# Call at natural pauses. Embeds, deduplicates, detects contradictions, builds graph edges.
if system.should_dream:
    result = await system.dream()
    print(result)  # "Consolidated 2: 2 promoted, 0 deduped, 4 edges."
# SLEEP - minutes, maintenance
# Call on schedule. Archives decayed blocks, prunes weak edges, reinforces top knowledge.
result = await system.curate()
print(result) # "Curated: 2 archived, 3 edges pruned, 5 reinforced."
learn() is instant because it defers all expensive work to dream(). dream() does the heavy lifting (embedding, deduplication, contradiction detection, graph construction) in a single batch. curate() is the gardener: archiving what's faded, pruning weak connections, reinforcing what matters most.
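The heartbeat can stay at millisecond latency because deduplication needs only a hash lookup. A sketch of content-hash dedup (the normalisation and hash choice are assumptions):

```python
import hashlib

def content_hash(text: str) -> str:
    """Normalise whitespace, then hash: identical knowledge maps to one inbox row."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

inbox: dict[str, str] = {}

def learn(content: str) -> str:
    key = content_hash(content)
    if key in inbox:
        return "duplicate_rejected"  # graceful reject, not an error
    inbox[key] = content
    return "created"

print(learn("Deploy failed: Redis connection timeout on staging."))   # created
print(learn("Deploy  failed: Redis connection timeout on staging."))  # duplicate_rejected
```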
Knowledge graph: connections that strengthen through use
When dream() processes blocks, it discovers semantic relationships and builds a knowledge graph. When blocks are co-retrieved across multiple sessions, those connections are further strengthened through Hebbian learning:
await system.learn("Use Redis for caching frequently accessed data.")
await system.learn("Redis requires careful memory management in production.")
await system.learn("Set maxmemory-policy to allkeys-lru for cache workloads.")
await system.dream()
# Retrieving one surfaces the others through graph expansion
context = await system.frame("attention", query="caching strategy")
for block in context.blocks:
    expanded = " (via graph)" if block.was_expanded else ""
    print(f"  [{block.score:.2f}] {block.content}{expanded}")
# [0.91] Use Redis for caching frequently accessed data.
# [0.74] Set maxmemory-policy to allkeys-lru for cache workloads.
# [0.58] Redis requires careful memory management in production. (via graph)
The third block wasn't a direct match for "caching strategy", but it's connected to blocks that are. Graph expansion recovers related-but-not-similar knowledge that pure vector search misses.
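Conceptually, graph expansion is a one-hop walk from the direct hits, with each neighbour inheriting a discounted score; the discount rule below is illustrative:

```python
def expand(direct_hits: dict[str, float], edges: dict[tuple[str, str], float],
           damping: float = 0.8) -> dict[str, float]:
    """One-hop expansion: neighbours inherit a damped, edge-weighted share of a hit's score."""
    scores = dict(direct_hits)
    for (a, b), weight in edges.items():
        for src, dst in ((a, b), (b, a)):  # edges are undirected here
            if src in direct_hits:
                candidate = damping * weight * direct_hits[src]
                if candidate > scores.get(dst, 0.0):
                    scores[dst] = candidate
    return scores

hits = {"use-redis-for-caching": 0.91}
edges = {("use-redis-for-caching", "redis-memory-management"): 0.8}
result = expand(hits, edges)
print(round(result["redis-memory-management"], 2))  # 0.58: surfaced via graph, not similarity
```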
Edges can also be created manually:
result = await system.connect(block_id_a, block_id_b, relation="contradicts")
print(result) # "Created contradicts edge: a1b2c3d4…→e5f6g7h8… (weight=0.50)."
Calibration: the feedback loop that makes memory evolve
This is the mechanism that turns elfmem from a store into a learning system. When your agent uses recalled knowledge, signal back whether it helped:
# 1. Recall before acting
context = await system.frame("attention", query="database migration strategy")
block_ids = [b.id for b in context.blocks]
# 2. Use the knowledge
response = generate_response(context.text, user_query)
# 3. Signal the outcome
result = await system.outcome(
    block_ids,
    signal=0.85,  # 0.0 = harmful, 1.0 = perfect
    source="migration_task",
)
print(result) # "Outcome recorded: 3 blocks updated (+0.042 avg confidence), 2 edges reinforced."
Blocks that guided good decisions get stronger. Blocks that misled get weaker. Edges between co-used blocks are reinforced. After a few sessions, the highest-scoring blocks are genuinely the most useful, not just the most similar.
| Signal | Meaning | When to use |
|---|---|---|
| 0.80 -- 0.95 | Guided successful work | Used it, outcome was good |
| 0.55 -- 0.70 | Relevant but not decisive | Informed thinking, didn't drive action |
| 0.40 -- 0.50 | Retrieved but not needed | Recalled, ignored |
| 0.10 -- 0.20 | Set wrong expectation | Relied on it, outcome contradicted it |
| 0.00 -- 0.10 | Caused failure | Followed its guidance, things broke |
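One way to picture the update: a signed delta around the neutral point 0.5 (the learning rate here is illustrative, not elfmem's actual constant):

```python
def apply_outcome(confidence: float, signal: float, weight: float = 1.0,
                  learning_rate: float = 0.12) -> float:
    """Signals above 0.5 reinforce a block; signals below 0.5 penalise it."""
    delta = learning_rate * weight * (signal - 0.5)
    return min(1.0, max(0.0, confidence + delta))

print(round(apply_outcome(0.60, 0.85), 3))  # 0.642, guided successful work
print(round(apply_outcome(0.60, 0.10), 3))  # 0.552, set wrong expectation
```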
Knowledge lifecycle
Every block follows the same path:
BIRTH → learn(): fast inbox insert, no API calls
GROWTH → dream(): embedded, scored, deduplicated, graph edges built
MATURITY → frame()/outcome(): reinforced on retrieval, confidence rises
DECAY → session-aware clock ticks; unused knowledge loses confidence
ARCHIVE → curate(): blocks below threshold archived, not deleted
Decay is session-aware: the clock only ticks during active use. Knowledge survives holidays and downtime. Reinforcement resets the decay clock. A single successful use can save a fading block.
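The session-aware clock behaves like a stopwatch that only runs inside a session; a sketch (class name and API are hypothetical):

```python
import time

class SessionClock:
    """Accumulates hours only while a session is open; idle wall-clock time never counts."""
    def __init__(self) -> None:
        self.active_hours = 0.0
        self._opened_at: float | None = None

    def open_session(self) -> None:
        self._opened_at = time.monotonic()

    def close_session(self) -> None:
        if self._opened_at is not None:
            self.active_hours += (time.monotonic() - self._opened_at) / 3600.0
            self._opened_at = None

clock = SessionClock()
clock.open_session()
clock.close_session()
# A two-week holiday between sessions adds nothing to active_hours:
print(clock.active_hours < 0.001)  # True
```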
How it compares
| Feature | elfmem | mem0 | LangChain Memory | Chroma/Weaviate |
|---|---|---|---|---|
| Infrastructure required | None (SQLite) | Postgres/Redis | In-memory | Vector DB server |
| Adaptive decay | Yes | No | No | No |
| Knowledge graph | Yes | No | No | No |
| Agent identity (SELF) | Yes | No | No | No |
| Contradiction detection | Yes | No | No | No |
| Feedback loop (outcome) | Yes | No | No | No |
| Session-aware clock | Yes | No | No | No |
| Theory of Mind | Yes | No | No | No |
| Peer communication | Yes | No | No | No |
| Automatic migration | Yes | No | No | No |
| Retrieval frames | 6 optimised | No | No | No |
| MCP native | Yes | No | No | No |
| Official SDKs only | Yes | No | Varies | No |
Installation
pip install elfmem # Python library only
pip install 'elfmem[cli]' # + CLI commands
pip install 'elfmem[tools]' # + CLI + MCP server (recommended)
pip install 'elfmem[viz]' # + interactive visualization dashboard
Or with uv:
uv add elfmem
uv add 'elfmem[tools]'
Requires Python 3.11+. Set your API keys:
export ANTHROPIC_API_KEY=sk-ant-... # for Claude (LLM reasoning)
export OPENAI_API_KEY=sk-... # for embeddings (text-embedding-3-small)
Both are needed for the default setup. See Local models for a fully local alternative with Ollama.
Three interfaces
MCP: for AI agents with MCP support
The fastest way to give Claude (or any MCP-compatible agent) persistent, evolving memory. Works with Claude Code, Claude Desktop, Cursor, VS Code + Cline, and any MCP host.
# One-time project setup (detects root, writes config, updates CLAUDE.md)
elfmem init
# Start the server (reads config from .elfmem/config.yaml)
elfmem serve
Add to your MCP config (e.g. ~/.claude.json):
{
  "mcpServers": {
    "elfmem": {
      "command": "elfmem",
      "args": ["serve", "--config", "/path/to/.elfmem/config.yaml"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
Ten tools are exposed to the agent:
| Tool | Purpose |
|---|---|
| `elfmem_setup` | Bootstrap agent identity (run once) |
| `elfmem_remember` | Store knowledge for future retrieval |
| `elfmem_recall` | Retrieve relevant knowledge, rendered for prompt injection |
| `elfmem_outcome` | Signal how well recalled knowledge helped |
| `elfmem_dream` | Deep consolidation (embed, dedup, build graph) |
| `elfmem_curate` | Archive decayed blocks, prune weak edges |
| `elfmem_status` | Memory health snapshot |
| `elfmem_connect` | Create or strengthen an edge between two blocks |
| `elfmem_disconnect` | Remove an edge between two blocks |
| `elfmem_guide` | Runtime documentation for any tool |
CLI: for shell access
elfmem init # project setup
elfmem doctor # check config and health
elfmem remember "User prefers dark mode" --tags ui # store knowledge
elfmem recall "code style preferences" --json # retrieve knowledge
elfmem status # memory health
elfmem guide recall # runtime docs
Python library: for full control
See the examples throughout this README, the API reference below, and the complete agent implementations in examples/.
Project setup
elfmem init makes the CLI and MCP server project-aware. Idempotent and state-aware — safe to run anytime.
cd ~/projects/my-agent
elfmem init
What it does — state-aware (one verb, three behaviours selected by detection):
- Fresh install (no config / no DB / empty DB):
  - Detects your project root (walks up to find `.git`, `pyproject.toml`, etc.)
  - Creates `.elfmem/config.yaml` with project settings
  - Creates a database at `~/.elfmem/databases/{project-name}.db` (outside the repo)
  - Seeds the constitutional cognitive loop (10 role-tagged blocks)
  - Writes an elfmem section into `CLAUDE.md`/`AGENTS.md`
  - Prints the MCP JSON snippet to paste into `~/.claude.json`
- Established instance (config + populated DB): refresh-only mode. Reads the live `.elfmem/config.yaml`, re-renders the agent doc section from it (never from inferred defaults), runs the constitutional seed idempotently (no-op when role slots are filled), and applies any pending schema migration with a row-count-validated backup. The config and existing blocks are preserved.
- Orphaned DB (configured DB is empty but a populated DB exists at a neighbour path): refuses with a pointer to `elfmem rescue`. No data loss.
After init, every elfmem command in that directory tree discovers config automatically.
Discovery chain
| Priority | Config | Database |
|---|---|---|
| 1 | `--config PATH` flag | `--db PATH` flag |
| 2 | `ELFMEM_CONFIG` env var | `ELFMEM_DB` env var |
| 3 | `.elfmem/config.yaml` (walk up from cwd) | `project.db` in discovered config |
| 4 | `~/.elfmem/config.yaml` | `~/.elfmem/agent.db` (global fallback) |
Doctor
$ elfmem doctor
Config: /path/to/.elfmem/config.yaml [project-local (.elfmem/config.yaml)]
Database: /Users/you/.elfmem/databases/my-agent.db [project.db in config]
Project: my-agent
Agent doc: CLAUDE.md ✓ elfmem section found
MCP config: .claude.json ✓ elfmem entry found
Backups ✓ 2 backup(s), 1,240.0 KB total. Latest: my-agent.before-v2.20260430-120000.bak
Clean up with: rm ~/.elfmem/databases/*.bak
Schema migration and backups
elfmem databases migrate automatically when you upgrade. On first startup after an upgrade, elfmem detects schema changes, backs up your database, then applies the migration. Your data is never lost.
# Check migration status and backup health
elfmem doctor
# Create a manual backup (VACUUM INTO — clean, WAL-free copy)
elfmem backup
# Backups are created automatically before any schema migration
# Format: my-agent.before-v2.20260430-120000.bak
Backup files live alongside the database. elfmem doctor reports count and total size, and suggests cleanup when you have more than three backups.
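A clean, WAL-free copy via `VACUUM INTO` needs nothing beyond the standard library; a sketch (function name is illustrative, requires SQLite ≥ 3.27):

```python
import os
import sqlite3
import tempfile

def backup_database(db_path: str, backup_path: str) -> None:
    """VACUUM INTO writes a compacted, WAL-free copy of the database."""
    con = sqlite3.connect(db_path)
    try:
        con.execute("VACUUM INTO ?", (backup_path,))
    finally:
        con.close()

# Demonstrate on a throwaway database:
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "agent.db")
    dst = os.path.join(tmp, "agent.before-v2.bak")
    con = sqlite3.connect(src)
    con.execute("CREATE TABLE blocks (content TEXT)")
    con.execute("INSERT INTO blocks VALUES ('Redis pool size: 20')")
    con.commit()
    con.close()
    backup_database(src, dst)
    copy = sqlite3.connect(dst)
    print(copy.execute("SELECT content FROM blocks").fetchone()[0])  # Redis pool size: 20
```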
Building agents with elfmem
Minimal agent
The simplest useful pattern: recall before acting, remember surprises.
from elfmem import MemorySystem
async def agent_turn(system: MemorySystem, user_message: str) -> str:
    async with system.session():
        context = await system.frame("attention", query=user_message)
        response = await llm.complete(f"{context.text}\n\nUser: {user_message}")
        if worth_remembering(response):
            await system.learn(extract_knowledge(response))
        return response
Full discipline loop
Memory only self-improves if the agent closes the feedback loop:
RECALL → EXPECT → ACT → OBSERVE → CALIBRATE → ENCODE
async def agent_turn(system: MemorySystem, user_message: str) -> str:
    async with system.session():
        # 1. Recall: get relevant knowledge
        context = await system.frame("attention", query=user_message, top_k=5)
        block_ids = [b.id for b in context.blocks]

        # 2. Act: generate response with context
        response = await llm.complete(f"{context.text}\n\nUser: {user_message}")

        # 3. Calibrate: signal which blocks actually helped
        await system.outcome(
            block_ids,
            signal=0.85,  # 0.0 (harmful) → 1.0 (perfect)
            source="used_in_response",
        )

        # 4. Encode: store transferable lessons
        if response_was_surprising:
            await system.learn(
                "Expected X, observed Y. Lesson: <transferable insight>",
                tags=["pattern/discovered"],
            )

        # 5. Consolidate at natural pauses
        if system.should_dream:
            await system.dream()

        return response
Claude-powered agent with persistent memory
import anthropic
from elfmem import MemorySystem
client = anthropic.Anthropic()
async def coding_agent(system: MemorySystem, task: str) -> str:
    async with system.session():
        identity = await system.frame("self")
        context = await system.frame("attention", query=task, top_k=5)
        block_ids = [b.id for b in context.blocks]

        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            system=f"""{identity.text}

Relevant knowledge:
{context.text}""",
            messages=[{"role": "user", "content": task}],
        )
        result = response.content[0].text

        await system.outcome(block_ids, signal=0.85, source="coding_task")
        await system.learn(f"Task: {task[:80]}. Approach: {result[:200]}", tags=["task/completed"])
        if system.should_dream:
            await system.dream()
        return result
Reference implementations
examples/ contains two complete, tested agent implementations:
examples/calibrating_agent.py: Self-calibrating agent with session metrics, per-block verdict tracking, and session reflection. Tracks hit rate, surprise rate, and gap rate.
examples/decision_maker.py: Multi-frame decision maker. Synthesises SELF, TASK, and ATTENTION frames to choose between options, then calibrates from objective outcomes.
examples/agent_discipline.md: Copy-pasteable system prompt instructions at three tiers:
- Tier 1 (2 rules): Recall before acting, remember surprises.
- Tier 2 (6 rules): Adds frame selection, inline calibration.
- Tier 3 (12 rules): Full session lifecycle, metrics, and reflection.
Configuration
Zero config (just works)
system = await MemorySystem.from_config("agent.db")
# Uses claude-haiku-4-5-20251001 for LLM, text-embedding-3-small for embeddings
# Requires ANTHROPIC_API_KEY + OPENAI_API_KEY
YAML config file
# elfmem.yaml
llm:
  model: "claude-sonnet-4-6"
  contradiction_model: "claude-opus-4-6"  # higher precision for contradictions
embeddings:
  model: "text-embedding-3-small"
  dimensions: 1536
memory:
  inbox_threshold: 10
  curate_interval_hours: 40
  self_alignment_threshold: 0.70
  prune_threshold: 0.05
system = await MemorySystem.from_config("agent.db", "elfmem.yaml")
Local models (no API key)
Run Ollama locally for a fully offline setup:
llm:
  model: "llama3.2"
  base_url: "http://localhost:11434/v1"
embeddings:
  model: "nomic-embed-text"
  dimensions: 768
  base_url: "http://localhost:11434/v1"
ollama pull llama3.2
ollama pull nomic-embed-text
No API keys needed.
Any LLM provider
elfmem uses the official anthropic and openai SDKs. Any OpenAI-compatible API works with a base_url:
# OpenAI models
export OPENAI_API_KEY=sk-...
# config: llm.model: "gpt-4o-mini"
# Groq
export GROQ_API_KEY=...
# config: llm.model: "llama-3.1-70b-versatile", llm.base_url: "https://api.groq.com/openai/v1"
# Together, Fireworks, etc. - any OpenAI-compatible endpoint
Domain-specific prompts
Override the LLM prompts for specialised agents:
prompts:
  process_block: |
    You are evaluating a memory block for a medical AI assistant.
    Only flag blocks as self-aligned if they relate to patient safety,
    clinical evidence, or regulatory compliance.

    ## Agent Identity
    {self_context}

    ## Memory Block
    {block}

    Respond with JSON: {"alignment_score": <float>, "tags": [<strings>], "summary": "<string>"}

valid_self_tags:
  - "self/constitutional"
  - "self/domain/oncology"
  - "self/regulatory/hipaa"
Custom adapters
Implement the port protocols for full control:
from elfmem.ports.services import LLMService, EmbeddingService
class MyLLMService:
    async def process_block(self, block: str, self_context: str) -> BlockAnalysis: ...
    async def detect_contradiction(self, block_a: str, block_b: str) -> float: ...

class MyEmbeddingService:
    async def embed(self, text: str) -> np.ndarray: ...
    async def embed_batch(self, texts: list[str]) -> list[np.ndarray]: ...
system = MemorySystem(engine=engine, llm_service=MyLLMService(), embedding_service=MyEmbeddingService())
Visualization
Explore your knowledge graph with an interactive dashboard:
uv run scripts/visualise.py ~/.elfmem/agent.db # your database
uv run scripts/visualise.py ~/.elfmem/agent.db --archived # include archived blocks
uv run scripts/visualise.py # demo data
Dashboard panels:
- Knowledge Graph: Force-directed layout with zoom-dependent labels. Click nodes for detail. Toggle tiers and status with filter pills.
- Lifecycle Flow: Track blocks through inbox, active, and archived stages.
- Decay Curves: Half-lives by tier. Scatter plot shows blocks at risk of archival.
- Scoring Breakdown: Radar chart of frame weights across all five dimensions.
- Health Status: Consolidation suggestions and memory health.
Requires pip install 'elfmem[viz]'.
API reference
MemorySystem
# Factory
system = await MemorySystem.from_config(db_path, config=None)
system = await MemorySystem.from_env(db_path)
# Lifecycle context managers
async with MemorySystem.managed("agent.db") as system:  # full lifecycle
    ...

async with system.session():  # session only
    ...
# Write
result = await system.learn(content, tags=None, category="knowledge")
# → LearnResult(block_id="a1b2...", status="created")
result = await system.remember(content, tags=None) # alias; also checks should_dream
# → LearnResult(block_id="c3d4...", status="created")
# Read
frame_result = await system.frame(name, query=None, top_k=5)
# → FrameResult(text="...", blocks=[ScoredBlock, ...], frame_name="attention")
blocks = await system.recall(query=None, top_k=5, frame="attention")
# → list[ScoredBlock] (raw, no rendering, no side effects)
# Feedback
result = await system.outcome(block_ids, signal, weight=1.0, source="")
# → OutcomeResult(blocks_updated=3, mean_confidence_delta=0.042, ...)
# Consolidation & maintenance
result = await system.dream() # consolidate inbox → active
# → ConsolidateResult(processed=5, promoted=5, deduplicated=0, edges_created=8)
result = await system.curate() # archive decayed, prune edges
# → CurateResult(archived=2, edges_pruned=3, reinforced=5)
# Identity
result = await system.setup(identity=None, values=None)
# → SetupResult(blocks_created=4, total_attempted=4)
# Graph
result = await system.connect(source, target, relation="similar")
# → ConnectResult(action="created", relation="similar", weight=0.50, ...)
result = await system.disconnect(source, target)
# → DisconnectResult(action="removed", ...)
# Theory of Mind
result = await system.mind_create(subject, goals=None, beliefs=None, fears=None, motivations=None)
# → LearnResult(block_id="...", status="created")
result = await system.mind_predict(mind_block_id, prediction, verify_at, reasoning=None)
# → MindPredictResult(prediction_block_id="...", mind_block_id="...")
result = await system.mind_list()
# → list[MindSummary(subject, block_id, confidence, prediction_count, hit_count, miss_count)]
result = await system.mind_show(mind_block_id)
# → MindShowResult(subject, block_id, content, predictions=[PredictionDetail, ...])
result = await system.mind_outcome(prediction_block_id, hit, reason)
# → MindOutcomeResult(prediction_id, hit, mind_block_id, new_confidence, ...)
# Peer communication
result = await system.peer_init(name)
# → str (identity DID)
result = await system.peer_add(did, name, *, is_self=False, delivery_path=None)
# → PeerInfo(did, name, trust, is_self, delivery_path, ...)
result = await system.peer_remove(did)
# → bool
peers = await system.peer_list()
# → list[PeerInfo]
result = await system.peer_trust(did, set_value=None)
# → PeerInfo (or updates trust when set_value given)
result = await system.peer_send(did, content, *, in_reply_to=None)
# → PeerSendResult(msg_id, to_peer, delivery_path)
result = await system.peer_inbox(*, from_peer=None, import_all=False)
# → PeerInboxResult(messages_found, messages_imported, messages_skipped, peers)
result = await system.export_blocks(*, share_level="public", output_path, min_confidence=0.3)
# → ExportResult(blocks_exported, edges_exported, output_path)
result = await system.import_blocks(path, *, from_peer=None, is_self_merge=False)
# → ImportResult(blocks_imported, blocks_skipped, edges_imported, from_peer)
# Introspection
status = await system.status()
# → SystemStatus(health="good", suggestion="Memory is healthy.", ...)
print(status)
# Session: active (1.2h) | Inbox: 0/10 | Active: 47 blocks | Health: good
# Tokens this session: LLM: 2,340 tokens (3 calls) | Embed: 1,200 tokens (5 calls)
# Suggestion: Memory is healthy.
text = system.guide() # overview of all operations
text = system.guide("learn") # detailed guide for one method
ready = system.should_dream # True when inbox needs consolidation
Return types
All result types implement __str__() (one-line summary), .summary (same), and .to_dict() (JSON-serialisable). All exceptions carry a .recovery field with the exact command or code to fix the problem.
LearnResult(block_id, status)
# status: "created" | "duplicate_rejected" | "near_duplicate_superseded"
FrameResult(text, blocks, frame_name, cached, edges_promoted)
# text: rendered prompt-ready string; blocks: list[ScoredBlock]
ScoredBlock(id, content, score, confidence, similarity, recency, centrality, reinforcement, tags, was_expanded)
ConsolidateResult(processed, promoted, deduplicated, edges_created)
CurateResult(archived, edges_pruned, reinforced, edges_decayed)
OutcomeResult(blocks_updated, mean_confidence_delta, edges_reinforced, blocks_penalized)
ConnectResult(action, source_id, target_id, relation, weight)
DisconnectResult(action, source_id, target_id)
SetupResult(blocks_created, total_attempted)
MindPredictResult(prediction_block_id, mind_block_id, edge_id)
MindShowResult(subject, block_id, content, confidence, predictions)
MindSummary(subject, block_id, confidence, prediction_count, hit_count, miss_count)
MindOutcomeResult(prediction_id, hit, mind_block_id, new_confidence, old_confidence)
PredictionDetail(block_id, content, status, hit, reason)
PeerInfo(did, name, trust, is_self, delivery_path, messages_in, messages_out, ...)
PeerSendResult(msg_id, to_peer, delivery_path)
PeerInboxResult(messages_found, messages_imported, messages_skipped, peers)
ExportResult(blocks_exported, edges_exported, output_path)
ImportResult(blocks_imported, blocks_skipped, edges_imported, from_peer)
SystemStatus(session_active, inbox_count, active_count, health, suggestion, session_tokens, lifetime_tokens)
TokenUsage(llm_input_tokens, llm_output_tokens, embedding_tokens, llm_calls, embedding_calls)
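The contract is uniform across every operation. A minimal sketch of what that protocol implies for a consumer (illustrative only; the real types live in `elfmem.types` and carry more fields):

```python
import json
from dataclasses import dataclass, asdict

# Illustrative sketch of the uniform result protocol described above,
# not elfmem's actual implementation.
@dataclass
class LearnResult:
    block_id: str
    status: str  # "created" | "duplicate_rejected" | "near_duplicate_superseded"

    def __str__(self) -> str:
        # One-line summary, suitable for logs and agent transcripts.
        return f"learn: {self.status} ({self.block_id[:8]})"

    @property
    def summary(self) -> str:
        return str(self)

    def to_dict(self) -> dict:
        # JSON-serialisable form for structured agent consumption.
        return asdict(self)

result = LearnResult(block_id="c3d4aa10", status="created")
print(result)                        # one-line summary
print(json.dumps(result.to_dict())) # structured form
```

The same three accessors on every result mean an agent can log, inspect, or serialise any operation's outcome without per-type handling.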
Architecture
src/elfmem/
├── api.py # MemorySystem: all public operations
├── config.py # ElfmemConfig: Pydantic configuration
├── project.py # Project root detection, config/DB discovery
├── mcp.py # FastMCP server: 10 agent tools
├── cli.py # Typer CLI
├── scoring.py # Composite scoring formula
├── types.py # Domain types: shared vocabulary
├── guide.py # AgentGuide: runtime documentation
├── exceptions.py # ElfmemError hierarchy with recovery hints
├── prompts.py # LLM prompt templates
├── session.py # Session lifecycle, active hours tracking
├── token_counter.py # Token usage accumulator
├── ports/
│ └── services.py # LLMService + EmbeddingService protocols
├── adapters/
│ ├── anthropic.py # Claude via official SDK
│ ├── openai.py # OpenAI + any compatible API
│ ├── factory.py # Adapter factory from config
│ └── mock.py # Deterministic mocks for testing
├── db/
│ ├── models.py # SQLAlchemy Core tables
│ ├── engine.py # Async engine factory
│ ├── migrate.py # Schema migration + backup utilities
│ └── queries.py # All database operations
├── memory/
│ ├── blocks.py # Block state, content hashing, decay tiers
│ ├── dedup.py # Near-duplicate detection and resolution
│ ├── graph.py # Centrality, expansion, edge reinforcement
│ └── retrieval.py # 4-stage hybrid retrieval pipeline
├── context/
│ ├── frames.py # Frame definitions, registry, cache
│ ├── rendering.py # Blocks → rendered text
│ └── contradiction.py # Contradiction suppression
└── operations/
├── learn.py # learn(): fast-path ingestion
├── consolidate.py # dream(): batch promotion
├── recall.py # recall(): retrieval + reinforcement
├── curate.py # curate(): maintenance
├── mind.py # mind_create/predict/list/show/outcome
└── peer.py # export, import, send, inbox, peer roster
Four layers, clear boundaries:
| Layer | Responsibility | Side effects |
|---|---|---|
| Storage (db/) | Tables, queries, engine | Database writes |
| Memory (memory/) | Blocks, dedup, graph, retrieval | None (pure) |
| Context (context/) | Frames, rendering, contradictions | None (pure) |
| Operations (operations/) | Orchestration, lifecycle | All side effects |
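The practical consequence of the split can be sketched as follows. Both functions and their weights are hypothetical illustrations, not elfmem internals: the point is that the memory layer only computes, while the operations layer owns every write.

```python
# Hypothetical illustration of the layer boundary: pure computation below,
# side effects confined to the orchestrating operation.

def composite_score(similarity: float, recency: float,
                    centrality: float, confidence: float) -> float:
    """Memory-layer style function: a pure weighted blend (invented weights)."""
    return 0.5 * similarity + 0.2 * recency + 0.15 * centrality + 0.15 * confidence

def recall_operation(db: dict, candidates: dict[str, tuple]) -> list[str]:
    """Operations-layer style function: calls pure code, then performs writes."""
    ranked = sorted(candidates,
                    key=lambda b: composite_score(*candidates[b]),
                    reverse=True)
    for block_id in ranked:
        # The reinforcement write lives here, never in composite_score().
        db[block_id] = db.get(block_id, 0) + 1
    return ranked
```

Keeping scoring pure makes it trivially testable without a database, which is what lets the whole test suite run against mocks.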
Development
git clone https://github.com/emson/elfmem.git
cd elfmem
uv sync --extra dev
uv run pytest # all tests (no API key needed)
uv run mypy src/elfmem/ # type checking
uv run ruff check src/ tests/ # lint
All tests run against deterministic mock services. No API keys, no network calls, fully reproducible.
from elfmem.adapters.mock import make_mock_llm, make_mock_embedding
llm = make_mock_llm(
alignment_overrides={"identity": 0.95},
tag_overrides={"identity": ["self/value"]},
)
embedding = make_mock_embedding(
similarity_overrides={
frozenset({"cats are great", "dogs are great"}): 0.85,
},
)
Design decisions
| Decision | Rationale |
|---|---|
| SQLAlchemy Core, not ORM | Bulk updates, embedding BLOBs, N+1 centrality queries |
| Session-aware decay | Knowledge survives holidays and downtime |
| Soft bias for identity | Everything is learned; self-aligned knowledge just survives longer |
| Retrieval is pure; reinforcement is separate | Clean read path / side effect separation |
| Calibration is opt-in | Useful without it; dramatically better with it |
| Official SDKs only | anthropic and openai packages, no third-party gateway |
| Mock-first testing | All logic verified without API keys |
| Exceptions carry .recovery | Every error tells the agent exactly what to do next |
Migrating between versions
elfmem ships a structured migration system for resolving config drift across releases — env var renames, MCP launch-pattern changes, and project config updates. The flow is plan → review → apply, with backups and atomic writes throughout.
elfmem migrate status # one-line summary; exit 0 if clean
elfmem migrate plan # full diff per step (read-only)
elfmem migrate plan --json # structured plan for agents
elfmem migrate apply --dry-run # show what would happen
elfmem migrate apply # interactive: prompts to confirm
elfmem migrate apply --yes # non-interactive; for scripts and agents
elfmem migrate apply --id <step> # apply one specific step
Properties of the system:
- Idempotent — re-running after success is a no-op. Already-canonical entries return `skipped`.
- Hash-gated — every step records the source file's SHA256 at plan time; apply refuses if the file changed in between. Re-run `plan` to recover.
- Atomic + backed up — each apply writes a `<file>.elfmem-bak-<step_id>-<timestamp>` backup, then commits the new contents via tmp-file rename. Reverting is a single `mv`.
- Per-step granularity — agents can apply migrations one at a time. Per-step failure does not block other steps.
- Read-only by default — `status` and `plan` never write; `apply` prompts unless `--yes` is passed.
For agent invocation, elfmem migrate plan --json is the contract:
{
"elfmem_version": "0.12.0",
"pending_count": 1,
"steps": [{
"id": "mcp-elfmem@claude_code_config-2dacbee7",
"kind": "claude_mcp_config",
"summary": "Update 'elfmem' MCP entry: …",
"file": "/Users/.../claude_code_config.json",
"file_sha256": "e48877…",
"issues": ["renamed env var ELFMEM_CONFIG_PATH → ELFMEM_CONFIG", …],
"before": { … },
"after": { … },
"json_pointer": "/mcpServers/elfmem",
"reversible": true,
"post_apply_step": "Restart Claude Code so MCP servers reload.",
"apply_command": "elfmem migrate apply --id mcp-elfmem@… --yes"
}],
"next_action": "elfmem migrate apply --yes # apply all"
}
Per-version migration notes (env var renames, removed APIs, schema changes)
live in CHANGELOG.md under each release's ### Migration
heading. Database schema migrations run automatically on startup via
MemorySystem.from_config() — no manual step needed for those.
API stability
Stable (no breaking changes within 0.x):
MemorySystem public methods, all result types in elfmem.types, all exception types, ElfmemConfig, ConsolidationPolicy.
Internal (may change):
elfmem.operations.*, elfmem.memory.*, elfmem.db.*, elfmem.context.*, elfmem.adapters.*.
Embedding model lock-in: The embedding model is fixed on first use. Changing `embeddings.model` on an existing database raises `ConfigError`. Choose your embedding model before storing knowledge.
Contributing
Contributions welcome. Please read CONTRIBUTING.md before opening a PR.
- Bug reports / feature requests: GitHub Issues
- Design questions: GitHub Discussions
- Security: see SECURITY.md
- Updates and announcements: follow @emson on X
Changelog
See CHANGELOG.md.
License
MIT. See LICENSE.