Local-first semantic memory for AI agents — MLX embeddings + sqlite-vec, MCP server. No cloud, no API keys. Apple Silicon.
memo
Local-first semantic memory for AI agents — with time-travel, contradiction radar, and automatic synthesis.
memo gives any MCP-aware agent (Claude Code, Codex, Devin, OpenCode, Cursor, Cline, Continue, …) a long-term memory that runs entirely on your Mac. Each memory is a plain Markdown file; embeddings live in a single sqlite file; the LLM, embedder, and reranker run in-process via Apple MLX — no Ollama, no Qdrant, no cloud API, no keys. Your prompts and memories never leave the machine.
What makes memo different
| Capability | memo | mem0 | letta | cognee | engram |
|---|---|---|---|---|---|
| 100% local (no cloud API) | ✅ | ❌ | ⚠️ | ⚠️ | ✅ |
| Time-machine (rewind corpus to any date) | ✅ | ❌ | ❌ | ❌ | ❌ |
| Contradiction radar (detect + resolve conflicts) | ✅ | ❌ | ❌ | ⚠️ | ❌ |
| Synthesis pipeline (auto-infer cross-cluster insights) | ✅ | ❌ | ❌ | ❌ | ❌ |
| Cross-Mac git sync (shared corpus, no server) | ✅ | ❌ | ❌ | ❌ | ✅ |
| Cloud sync (opt-in replication) | ✅ | ❌ | ✅ | ❌ | ✅ |
| TUI (terminal UI) | ✅ | ❌ | ❌ | ❌ | ✅ |
| Obsidian as source-of-truth | ✅ | ❌ | ❌ | ❌ | ❌ |
| Knowledge graph + entity extraction | ✅ | ⚠️ | ⚠️ | ✅ | ❌ |
| Eval regression gate (pre-commit wireable) | ✅ | ❌ | ❌ | ❌ | ❌ |
| Multi-modal (images, audio OCR) | ✅ | ⚠️ | ❌ | ❌ | ❌ |
| MCP surface profiles (token economy) | ✅ | ❌ | ❌ | ❌ | ❌ |
| Passive capture (auto-extract from transcripts) | ✅ | ❌ | ❌ | ❌ | ✅ |
| Session timeline (context before/after) | ✅ | ❌ | ❌ | ❌ | ✅ |
Why it pays for itself — in tokens
memo is built to spend fewer tokens, not more.
- 92% smaller MCP surface. The default `agent` profile exposes 10 tools / ~1.2k schema tokens, versus 123 tools / ~15k tokens for the full surface. That overhead is paid every session, in every client; memo trims it to almost nothing.
- Recall injects the answer instead of re-deriving it. Ambient recall surfaces the top memory before the agent answers, on a tight ~160-token budget. The agent stops re-explaining what it already figured out last week.
On a ~200-memory corpus, memo roi estimates ~80k tokens of model work avoided per session. The number is corpus-specific; it grows as memo learns more.
| Technique | How to enable | Typical saving |
|---|---|---|
| Compact recall format | `export MEMO_RECALL_FORMAT=compact` | ~65% per injection |
| Trivial prompt gate | On by default | ~25% fewer injections |
| Context file compression | `memo compress-context CLAUDE.md` | 30–40% smaller context |
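The 92% figure quoted in this section follows from the two schema-token counts given for the `agent` and `full` profiles. A quick check, using only those rounded numbers (so the result is approximate):

```python
# Schema-token counts as quoted in this README (rounded values).
AGENT_PROFILE_TOKENS = 1_200   # default `agent` profile, 10 tools
FULL_PROFILE_TOKENS = 15_000   # `full` profile, 123 tools


def surface_reduction(small: int, large: int) -> float:
    """Fraction of schema tokens saved by exposing the smaller surface."""
    return 1 - small / large


reduction = surface_reduction(AGENT_PROFILE_TOKENS, FULL_PROFILE_TOKENS)
print(f"{reduction:.0%} smaller")  # prints "92% smaller"
```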
Use cases
- Continuity across sessions. Decide "we use Postgres, not Mongo" today; tomorrow, in a fresh session, the agent recalls it on its own — recall injects the decision before it answers, so you never re-explain it.
- Shared memory across agents. Save something while working in Claude Code; Codex, Cursor, or Cline pick it up later. They all read the same local store over MCP.
- Memory that follows you across Macs. Start on the laptop, continue on the desktop. The corpus travels over serverless git sync and the agent starts with the same context on both.
- Preferences and conventions that stick. "Tests first", "commit messages in English", "don't touch the auth module" — say it once, the agent applies it every future session.
- Contradiction radar. Change your mind on an old decision and memo flags the now-stale version — the agent won't reintroduce what you already discarded.
- Time-machine / audit. "What did we know about this bug last month?" Rewind the corpus to any date and see the state of knowledge at that point.
- Instant project onboarding. A cold agent gets the project's durable decisions, facts, and preferences up front via the session-start briefing.
- Fewer tokens, not more. Instead of re-deriving what you solved last week, recall injects the answer on a tight budget — and the default MCP surface is ~10 tools, not ~120.
Requirements
- macOS on Apple Silicon (M1–M4) — MLX is the load-bearing piece and the only path with the reranker + LLM features (ask / synthesize / dream).
- Linux / Ubuntu / Intel Mac: supported as a standalone install via a CPU `sentence-transformers` backend (search + recall + save, no MLX). One command: `pipx install "mlx-memo[cpu]"`. See docs/ubuntu.md for what works and the trade-offs.
- ~8 GB free disk for the default model set (the installer downloads it).
- Optional: an Obsidian vault. Without one, memo defaults to `~/Documents/memo/`.
Python ≥ 3.13 is required if you install without uv. The `curl | bash` installer handles this automatically: it detects `uv` and uses its managed Python if no system Python ≥ 3.13 is on PATH.
Install — one step
curl -fsSL https://raw.githubusercontent.com/jagoff/memo/master/install.sh | bash
The installer auto-detects uv (preferred) or falls back to pipx. It downloads the MLX models and wires memo into every agent client it finds (Claude Code, Codex, Devin, OpenCode, Windsurf).
Prefer a manual install? Any of these expose the same two binaries — memo (CLI) and memo-mcp (MCP server):
uv tool install mlx-memo # recommended
pipx install mlx-memo
brew tap jagoff/memo && brew install mlx-memo
Keep memo isolated as its own tool (uv tool / pipx / Homebrew). Don't vendor it inside another project's `.venv`. `memo doctor --strict-runtime` verifies the install.
First install downloads ~8 GB of MLX models (5–15 min); later installs hit the HuggingFace cache. Full installer knobs and "move to a new Mac" steps: docs/reference.md › Install.
Migrating from another Mac? Install first, then restore your corpus:
curl -fsSL https://raw.githubusercontent.com/jagoff/memo/master/install.sh | bash
memo sync bootstrap git@github.com:yourname/memo-sync.git # restore from git
Hand it to your agent
memo installs itself if you hand the repo (or just the install line) to an AI agent:
curl -fsSL https://raw.githubusercontent.com/jagoff/memo/master/install.sh | bash
memo doctor --strict-runtime # verify runtime is healthy
After install, tools surface as mcp__memo__memo_* (memo_save, memo_search, memo_ask, memo_get, memo_unified_briefing). Per-client setup (Claude Desktop, Cursor, Cline, Continue, manual JSON) is in docs/reference.md › MCP setup.
Quick start
memo doctor # self-check: models, vault, sqlite-vec
memo save 'MLX prefill ~30% faster than Ollama on M3 Max' --title 'MLX bench' -t mlx -t bench
memo search 'how fast was the MLX benchmark' # search by meaning, not just keywords
memo list --limit 5 # most recent
memo ask 'what changed in the embedder this month?' # RAG — cites memories by id
Core features
- Ambient recall: every prompt silently consults memory and injects top hits as context. A warm recall daemon keeps latency under 200 ms. No `/remember` calls.
- Auto-capture: a `Stop` hook extracts durable insights from each exchange through a quality gate. The corpus grows on its own.
- Session briefing: `SessionStart` surfaces open loops, a memory of the day, and one-line crash recovery.
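One way to picture ambient recall's tight token budget is a greedy packer that keeps ranked hits until the budget is spent. This is only a sketch under assumptions: `pack_hits` is a hypothetical name, the whitespace split is a crude stand-in for a real tokenizer, and memo's actual selection logic is not described in this README.

```python
def pack_hits(ranked_hits: list[str], budget_tokens: int = 160) -> list[str]:
    """Keep hits in rank order while an approximate token count fits the budget."""
    packed: list[str] = []
    used = 0
    for hit in ranked_hits:
        cost = len(hit.split())  # crude estimate: one token per whitespace-separated word
        if used + cost > budget_tokens:
            break  # stop at the first hit that would overflow the budget
        packed.append(hit)
        used += cost
    return packed


hits = [
    "We use Postgres, not Mongo",
    "Commit messages in English",
    "Do not touch the auth module",
]
context = pack_hits(hits, budget_tokens=9)
print(context)
```

With a 9-token budget the first two hits fit (5 + 4 words) and the third is dropped, so the injected context stays bounded no matter how large the corpus grows.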
Key capabilities
🕰️ Time-machine
Rewind the corpus to any past date and query it as it was then:
memo as-of ask "what was the deployment strategy?" --date 2026-02-01
memo as-of search "redis config" --date 2026-01-15
memo diff --from 2026-01-01 --to 2026-03-01 # what changed
None of the systems in the comparison table above offers this. Historical state is reconstructed by reverse-replaying history.db.
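The reverse-replay idea can be illustrated with an in-memory change log: start from the current corpus and undo every change newer than the target date, newest first. The `Change` record and `corpus_as_of` function below are hypothetical and do not reflect the real history.db schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One recorded edit. Hypothetical shape, not memo's real schema."""
    when: date
    memory_id: str
    before: Optional[str]  # None: the memory did not exist before this change
    after: Optional[str]   # None: this change deleted the memory


def corpus_as_of(current: dict[str, str], log: list[Change], as_of: date) -> dict[str, str]:
    """Undo every change made after `as_of`, newest first."""
    state = dict(current)
    # Sort oldest-to-newest (stable), then walk backwards through time.
    for change in reversed(sorted(log, key=lambda c: c.when)):
        if change.when <= as_of:
            break  # this and all older changes belong to the target state
        if change.before is None:
            state.pop(change.memory_id, None)        # undo a creation
        else:
            state[change.memory_id] = change.before  # undo an edit or a deletion
    return state


log = [
    Change(date(2026, 1, 10), "deploy", None, "Deploy with blue/green"),
    Change(date(2026, 2, 20), "deploy", "Deploy with blue/green", "Deploy with canary"),
    Change(date(2026, 3, 5), "redis", None, "Redis maxmemory 2gb"),
]
current = {"deploy": "Deploy with canary", "redis": "Redis maxmemory 2gb"}

snapshot = corpus_as_of(current, log, date(2026, 2, 1))
print(snapshot)  # {'deploy': 'Deploy with blue/green'}
```

Replaying backwards from the present means only the changes after the target date are touched, which keeps recent rewinds cheap.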
⚡ Contradiction radar
memo contradict scan # detect conflicting facts corpus-wide
memo contradict triage # resolve interactively: fuse / newer-wins / dismiss
The LLM classifies each candidate pair. Results persist in contradictions.db; resolved conflicts inform future saves.
🔮 Synthesis pipeline
memo synthesize # generate cross-cluster insights (LLM)
memo dream # nightly: signal gather → prune → orient
MEMO_SYNTHESIS_ENABLED=1 runs synthesis automatically during memo maintain.
🌐 Cross-Mac git sync
memo sync bootstrap git@github.com:yourname/memo-sync.git # wire a shared corpus
memo sync once # push/pull now
Sync pulls and rebases before it pushes. An flock-based lock enforces a single owner per machine. Async debounced hooks keep the corpus current without blocking.
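The single-owner guarantee can be sketched with Python's standard `fcntl.flock` (Unix only). The lock-file path and the helper names `try_acquire` and `release` are assumptions for illustration; the git pull/rebase/push steps are omitted, and memo's real locking code may differ.

```python
import fcntl
import os
import tempfile
from typing import Optional


def try_acquire(lock_path: str) -> Optional[int]:
    """Return an fd holding an exclusive lock, or None if another owner holds it."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # LOCK_NB makes the call fail immediately instead of waiting for the lock.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def release(fd: int) -> None:
    """Drop the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


lock_path = os.path.join(tempfile.mkdtemp(), "sync.lock")  # hypothetical location

owner = try_acquire(lock_path)    # first sync worker becomes the owner
second = try_acquire(lock_path)   # a second would-be owner is turned away
release(owner)
third = try_acquire(lock_path)    # succeeds once the first owner lets go
print(owner is not None, second is None, third is not None)
release(third)
```

A non-blocking lock suits debounced hooks: a worker that finds the lock taken can simply skip its turn rather than queue up behind the current owner.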
📚 Obsidian vault as source-of-truth
MEMO_MEMORIES_IN_VAULT=1 memo init # store memories inside your vault
memo migrate --into-vault # non-destructive migration
Human edits in Obsidian win on the next memo reindex. The sqlite index is always rebuildable from the .md files.
🕸️ Knowledge graph
memo graph neighbors "MLX" # what's related
memo graph path "embedder" "reranker" # how two concepts connect
memo entities # list extracted entities
memo links --id abc123 # backlinks + outlinks
Entity extraction uses a dependency-free regex backend; Codegraph integration provides a fallback for code graphs.
🏥 Health scoring & eval gates
memo health # grounded rate, ROI, usefulness verdict
memo eval recall --labels eval/regression_labels.json --k 5
memo eval recall --gate # exit non-zero if precision drops
memo eval recall --update-baseline # snapshot current best
Wire --gate into a pre-commit hook to catch retrieval regressions before they ship.
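The gate's exit-code behavior can be sketched as follows. This is an illustration under assumptions: `precision_at_k`, `gate`, and the sample ids are hypothetical, and memo's actual metric definition and baseline storage may differ.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Share of the top-k retrieved ids that are labelled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def gate(current: float, baseline: float, tolerance: float = 0.0) -> int:
    """Process exit code: 0 if precision held, 1 if it regressed past the tolerance."""
    return 0 if current >= baseline - tolerance else 1


retrieved = ["a", "b", "c", "d", "e"]   # ids returned for one labelled query
relevant = {"a", "c", "x"}              # ids the labels mark as correct

score = precision_at_k(retrieved, relevant, k=5)
exit_code = gate(score, baseline=0.6)
print(score, exit_code)  # 0.4 1
```

A pre-commit hook only needs the exit code: any non-zero value blocks the commit, so a retrieval regression surfaces before it ships.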
🖼️ Multi-modal ingestion
memo ocr-image screenshot.png # macOS Vision OCR
memo multimodal add-image photo.jpg --title "whiteboard"
memo search "whiteboard diagram" # finds it
Daemons
memo runs four background daemons:
| Daemon | Command | Purpose |
|---|---|---|
| recall-daemon | `memo recall-daemon start` | Warm MLX embedder over socket (<200 ms recall) |
| idle-daemon | auto-started by `memo-mcp` | Auto-capture for MCP-only clients (Devin, OpenCode) |
| ingest-daemon | `memo ingest-daemon start` | Bulk vault ingestion |
| maint-daemon | `memo maint-daemon start` | Background cleanup + synthesis |
All 95 CLI commands
Core: save search ask get edit delete list
Recall & Hooks: recall recall-hook briefing continuity prewarm capture-tick capture-stop
Session & History: history as-of diff record-history session resume reflect mine-history
Maintenance: reindex maintain dream consolidate synthesize dedupe cross-dedup retier contradict temporal
Analysis & Quality: health stats doctor lint analytics eval roi token-savings usefulness gaps outcome profile
Knowledge Graph: graph entities entity extract-entities links version
Advanced Search: embed rerank contextual chat chat-ask multimodal repo
Import / Export / Sync: import export backup restore sync ingest
Visualization: tui dashboard map logs hook-log
Setup & Config: init config install-mcp install-watcher uninstall-watcher install-slash install-statusline install-shell-wrapper install-shims startup-banner migrate migrate-vault update watch mcp-command
Daemons: recall-daemon ingest-daemon maint-daemon embed-daemon
MCP surface profiles
| Profile | Tools | Schema tokens | Use when |
|---|---|---|---|
| `agent` (default) | 10 | ~1.2k | Standard agent work, max token economy |
| `core` | 30 | ~2.8k | Constrained clients (Codex, OpenCode) |
| `full` | 123 | ~15k | Power users, debugging |
Set via MEMO_MCP_PROFILE=full or in each client's MCP env config.
Retrieval architecture
Hybrid search: vec leg (MLX embedding) + BM25 leg (FTS5/Tantivy, diacritic-folding for Spanish) fused via Reciprocal Rank Fusion → optional MLX cross-encoder rerank.
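Reciprocal Rank Fusion combines the two legs using ranks only, so the vector and BM25 scores never need to share a scale. A minimal sketch: the function name, the sample ids, and the constant `k = 60` (the conventional default for RRF) are assumptions, not values confirmed for memo.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked id lists: each list contributes 1 / (k + rank) per id."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vec_leg = ["m3", "m1", "m7"]    # ids ranked by embedding similarity
bm25_leg = ["m1", "m9", "m3"]   # ids ranked by keyword match

fused = reciprocal_rank_fusion([vec_leg, bm25_leg])
print(fused)  # ['m1', 'm3', 'm9', 'm7']
```

Ids that appear in both legs (`m1`, `m3`) rise above ids found by only one, which is the behavior a hybrid search wants before the optional rerank step.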
Markdown is the source of truth. The .md files are canonical; sqlite is a rebuildable index. A hand-edit in Obsidian wins on the next memo reindex. delete() removes the index first, then the file — no silent data loss.
Embedding models:
| Model | Dims | Disk | Use |
|---|---|---|---|
| `Qwen3-Embedding-0.6B-4bit` | 1024 | ~0.4 GB | Default (fast, good) |
| `Qwen3-Embedding-4B-4bit` | 2560 | ~2.5 GB | Higher recall quality |
| `Qwen3-Embedding-8B-4bit` | 4096 | ~5 GB | Maximum quality |
Switch with MEMO_EMBEDDER_MODEL + MEMO_EMBEDDER_DIMS (requires memo reindex --rebuild).
Documentation
| Topic | Where |
|---|---|
| Full install detail, installer knobs, new-Mac migration | docs/reference.md › Install |
| Per-client MCP setup + the `/memo` slash command | docs/reference.md › MCP setup |
| All MCP tools reference | docs/reference.md › MCP tools |
| Ambient memory, recall daemon, capture & recall tuning | docs/reference.md › Ambient memory |
| Time-machine, session briefing, semantic map | docs/reference.md › Surfaces |
| Full CLI reference + live dashboard (`memo tui`) | docs/reference.md › CLI |
| All `MEMO_*` flags, model profiles, upgrading the embedder | docs/reference.md › Configuration |
| Architecture, sync tiers, design notes | docs/reference.md › Design & comparison |
Contributors: git clone https://github.com/jagoff/memo && cd memo && uv pip install -e '.[dev]'. See CONTRIBUTING.md.
License & provenance
MIT — see LICENSE. Forked philosophically from mem-vault (storage layout + frontmatter schema); the MLX backend pieces are ported from obsidian-rag. memo is one of three sovereign systems in a wider stack (Memflow, Synapse) — the integration is opt-in everywhere; single-Mac users see zero behaviour change.