
Local-first semantic memory for AI agents — MLX embeddings + sqlite-vec, MCP server. No cloud, no API keys. Apple Silicon.

memo 2.0

Local-first semantic memory for AI agents — with time-travel, contradiction radar, and automatic synthesis.

memo gives any MCP-aware agent (Claude Code, Codex, Devin, OpenCode, Cursor, Cline, Continue, …) a long-term memory that runs entirely on your Mac. Each memory is a plain Markdown file; embeddings live in a single sqlite file; the LLM, embedder, and reranker run in-process via Apple MLX — no Ollama, no Qdrant, no cloud API, no keys. Your prompts and memories never leave the machine.

What makes 2.0 different

Capability memo 2.0 mem0 letta cognee
100% local (no cloud API) ⚠️ ⚠️
Time-machine (rewind corpus to any date)
Contradiction radar (detect + resolve conflicts) ⚠️
Synthesis pipeline (auto-infer cross-cluster insights)
Cross-Mac git sync (shared corpus, no server)
Obsidian as source-of-truth
Knowledge graph + entity extraction ⚠️ ⚠️
Eval regression gate (pre-commit wireable)
Multi-modal (images, audio OCR) ⚠️
MCP surface profiles (token economy)

Why it pays for itself — in tokens

memo is built to spend fewer tokens, not more.

  • 91% smaller MCP surface. The default agent profile exposes 10 tools / ~2.4k schema tokens, versus 122 tools / ~28k tokens for the full surface — that overhead is paid every session, in every client. memo trims it to almost nothing.
  • Recall injects the answer instead of re-deriving it. Ambient recall surfaces the top memory before the agent answers, on a tight ~160-token budget. The agent stops re-explaining what it already figured out last week.
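
The 91% figure follows directly from the two schema sizes quoted above. A minimal check of the arithmetic, using only the approximate numbers from this README (the function and constant names are illustrative, not part of memo):

```python
def surface_reduction(small_tokens: float, full_tokens: float) -> float:
    """Fraction of schema tokens saved by exposing the smaller tool surface."""
    return 1.0 - small_tokens / full_tokens


# Approximate schema sizes quoted in this README.
AGENT_PROFILE_TOKENS = 2_400   # 10 tools
FULL_PROFILE_TOKENS = 28_000   # 122 tools

saving = surface_reduction(AGENT_PROFILE_TOKENS, FULL_PROFILE_TOKENS)
print(f"{saving:.0%} smaller")  # prints "91% smaller"
```

Because the schema is sent at the start of every session, the saving applies per session and per client rather than once.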

On a ~200-memory corpus, memo roi estimates ~80k tokens of model work avoided per session. The number is corpus-specific; it grows as memo learns more.

| Technique | How to enable | Typical saving |
| --- | --- | --- |
| Compact recall format | export MEMO_RECALL_FORMAT=compact | ~65% per injection |
| Trivial prompt gate | On by default | ~25% fewer injections |
| Context file compression | memo compress-context CLAUDE.md | 30–40% smaller context |

Requirements

  • macOS on Apple Silicon (M1–M4) — MLX is the load-bearing piece. memo does not run on Linux / Windows / Intel Macs.
  • ~8 GB free disk for the default model set (the installer downloads it).
  • Optional: an Obsidian vault. Without one, memo defaults to ~/Documents/memo/.

Python ≥ 3.13 is required if you install without uv. The curl | bash installer handles this automatically — it detects uv and uses its managed Python if no system Python ≥ 3.13 is on PATH.

Install — one step

curl -fsSL https://raw.githubusercontent.com/jagoff/memo/master/install.sh | bash

The installer auto-detects uv (preferred) and falls back to pipx. It then downloads the MLX models and wires memo into every agent client it finds (Claude Code, Codex, Devin, OpenCode, Windsurf).

Prefer a manual install? Any of these expose the same two binaries — memo (CLI) and memo-mcp (MCP server):

uv tool install mlx-memo          # recommended
pipx install mlx-memo
brew tap jagoff/memo && brew install mlx-memo

Keep memo isolated as its own tool (uv tool / pipx / Homebrew). Don't vendor it inside another project's .venv. memo doctor --strict-runtime verifies the install.

First install downloads ~8 GB of MLX models (5–15 min); later installs hit the HuggingFace cache. Full installer knobs and "move to a new Mac" steps: docs/reference.md › Install.

Migrating from another Mac? Install first, then restore your corpus:

curl -fsSL https://raw.githubusercontent.com/jagoff/memo/master/install.sh | bash
memo sync bootstrap git@github.com:yourname/memo-sync.git   # restore from git

Hand it to your agent

memo installs itself if you hand the repo (or just the install line) to an AI agent:

curl -fsSL https://raw.githubusercontent.com/jagoff/memo/master/install.sh | bash
memo doctor --strict-runtime     # verify runtime is healthy

After install, tools surface as mcp__memo__memo_* (memo_save, memo_search, memo_ask, memo_get, memo_unified_briefing). Per-client setup (Claude Desktop, Cursor, Cline, Continue, manual JSON) is in docs/reference.md › MCP setup.

Quick start

memo doctor                                            # self-check: models, vault, sqlite-vec
memo save 'MLX prefill ~30% faster than Ollama on M3 Max' --title 'MLX bench' -t mlx -t bench
memo search 'how fast was the MLX benchmark'           # search by meaning, not just keywords
memo list --limit 5                                    # most recent
memo ask 'what changed in the embedder this month?'   # RAG — cites memories by id

Core features

  • Ambient recall: every prompt silently consults memory and injects the top hits as context. A warm recall daemon keeps latency under 200 ms. No /remember calls.
  • Auto-capture: a Stop hook extracts durable insights from each exchange through a quality gate. The corpus grows on its own.
  • Session briefing: SessionStart surfaces open loops, a memory of the day, and one-line crash recovery.

What's new in 2.0

🕰️ Time-machine

Rewind the corpus to any past date and query it as it was then:

memo as-of ask "what was the deployment strategy?" --date 2026-02-01
memo as-of search "redis config" --date 2026-01-15
memo diff --from 2026-01-01 --to 2026-03-01    # what changed

No other agent-memory system offers this. memo reconstructs the full historical state by reverse-replaying history.db.
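
The schema of history.db is not shown in this README, so the following is only a sketch of what reverse-replay means in general: start from the current state and undo every change newer than the target date, newest first. The `Change` record, its fields, and the in-memory dictionaries are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One hypothetical history row: a memory's body before and after a change."""
    at: datetime
    memory_id: str
    before: Optional[str]  # None means the memory did not exist yet
    after: Optional[str]   # None means the memory was deleted


def rewind(current: dict[str, str], log: list[Change], as_of: datetime) -> dict[str, str]:
    """Undo every change made after `as_of`, newest first."""
    state = dict(current)
    for change in sorted(log, key=lambda c: c.at, reverse=True):
        if change.at <= as_of:
            break  # older changes are already reflected in the state
        if change.before is None:
            state.pop(change.memory_id, None)        # undo a creation
        else:
            state[change.memory_id] = change.before  # undo an edit or a delete
    return state


log = [
    Change(datetime(2026, 1, 10), "a1", None, "deploy with rsync"),
    Change(datetime(2026, 2, 20), "a1", "deploy with rsync", "deploy with containers"),
    Change(datetime(2026, 3, 5), "b2", None, "redis maxmemory 2gb"),
]
today = {"a1": "deploy with containers", "b2": "redis maxmemory 2gb"}

print(rewind(today, log, datetime(2026, 2, 1)))  # {'a1': 'deploy with rsync'}
```

Replaying backwards from the present only touches changes newer than the requested date, so recent dates are cheap to reconstruct.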

⚡ Contradiction radar

memo contradict scan                  # detect conflicting facts corpus-wide
memo contradict triage                # resolve interactively: fuse / newer-wins / dismiss

The LLM classifies each candidate pair. Results persist in contradictions.db; resolved conflicts inform future saves.

🔮 Synthesis pipeline

memo synthesize                       # generate cross-cluster insights (LLM)
memo dream                            # nightly: signal gather → prune → orient

MEMO_SYNTHESIS_ENABLED=1 runs synthesis automatically during memo maintain.

🌐 Cross-Mac git sync

memo sync bootstrap git@github.com:yourname/memo-sync.git   # wire a shared corpus
memo sync once                                                # push/pull now

Sync pulls and rebases before it pushes. A flock-based lock enforces a single owner per machine, and asynchronous debounced hooks keep the corpus current without blocking.
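
A flock-based single-owner guard can be sketched with the standard library alone. This is an illustration of the general technique, not memo's code; the lock-file path and helper names are assumptions.

```python
import fcntl
import os
import tempfile
from typing import Optional


def try_acquire(lock_path: str) -> Optional[int]:
    """Return a descriptor holding the exclusive lock, or None if it is taken."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def release(fd: int) -> None:
    """Drop the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


lock_path = os.path.join(tempfile.mkdtemp(), "sync.lock")  # hypothetical location
owner = try_acquire(lock_path)
second = try_acquire(lock_path)  # a second would-be owner is refused
print(owner is not None, second is None)  # True True
release(owner)
```

The kernel drops a flock when the holding process exits, so a crashed sync run does not leave a stale lock behind.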

📚 Obsidian vault as source-of-truth

MEMO_MEMORIES_IN_VAULT=1 memo init                # store memories inside your vault
memo migrate --into-vault                          # non-destructive migration

Human edits in Obsidian win on the next memo reindex. The sqlite index is always rebuildable from the .md files.

🕸️ Knowledge graph

memo graph neighbors "MLX"             # what's related
memo graph path "embedder" "reranker"  # how two concepts connect
memo entities                          # list extracted entities
memo links --id abc123                 # backlinks + outlinks

Entity extraction uses a dependency-free regex backend; Graphify integration provides a fallback for code graphs.
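
The actual patterns memo uses are not listed here. As a rough idea of what a dependency-free regex extractor can look like, the sketch below counts capitalised terms and acronyms with Python's `re` module; the pattern and function name are illustrative assumptions.

```python
import re
from collections import Counter

# Illustrative pattern only: capitalised words, acronyms, and dotted or hyphenated names.
ENTITY_RE = re.compile(r"\b[A-Z][A-Za-z0-9]*(?:[-.][A-Za-z0-9]+)*\b")


def extract_entities(text: str) -> Counter:
    """Count candidate entities in one memory body."""
    return Counter(ENTITY_RE.findall(text))


body = "MLX prefill is faster than Ollama on M3 Max. MLX also runs the reranker."
entities = extract_entities(body)
print(entities["MLX"], sorted(entities))
```

A pattern this simple also picks up ordinary sentence-initial words, so a real extractor would need stop-word filtering or stricter rules.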

🏥 Health scoring & eval gates

memo health                                         # grounded rate, ROI, usefulness verdict
memo eval recall --labels eval/regression_labels.json --k 5
memo eval recall --gate                             # exit non-zero if precision drops
memo eval recall --update-baseline                  # snapshot current best

Wire --gate into a pre-commit hook to catch retrieval regressions before they ship.
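
One way to wire this up is a small Python hook script. Git aborts a commit when the pre-commit hook exits non-zero, so the script only has to forward the gate's verdict. The command is the one shown above; the helper names are assumptions, and the runner is injectable so the sketch executes without memo installed.

```python
import subprocess
from typing import Callable, Sequence

GATE_CMD = ("memo", "eval", "recall", "--gate")  # command as shown in this README


def run_subprocess(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(cmd), check=False).returncode


def gate_exit_code(run: Callable[[Sequence[str]], int] = run_subprocess) -> int:
    """Return 0 to allow the commit, 1 to block it."""
    return 0 if run(GATE_CMD) == 0 else 1


# Stand-in runners so the sketch executes without memo installed.
print(gate_exit_code(lambda cmd: 0))  # prints 0: gate passed, commit proceeds
print(gate_exit_code(lambda cmd: 2))  # prints 1: gate failed, commit is blocked

# In a real .git/hooks/pre-commit script, end with:
#     raise SystemExit(gate_exit_code())
```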

🖼️ Multi-modal ingestion

memo ocr-image screenshot.png               # macOS Vision OCR
memo multimodal add-image photo.jpg --title "whiteboard"
memo search "whiteboard diagram"            # finds it

Daemons

memo runs four background daemons:

| Daemon | Command | Purpose |
| --- | --- | --- |
| recall-daemon | memo recall-daemon start | Warm MLX embedder over socket (<200 ms recall) |
| idle-daemon | auto-started by memo-mcp | Auto-capture for MCP-only clients (Devin, OpenCode) |
| ingest-daemon | memo ingest-daemon start | Bulk vault ingestion |
| maint-daemon | memo maint-daemon start | Background cleanup + synthesis |

All 95 CLI commands

Core: save search ask get edit delete list

Recall & Hooks: recall recall-hook briefing continuity prewarm capture-tick capture-stop

Session & History: history as-of diff record-history session resume reflect mine-history

Maintenance: reindex maintain dream consolidate synthesize dedupe cross-dedup retier contradict temporal

Analysis & Quality: health stats doctor lint analytics eval roi token-savings usefulness gaps outcome profile

Knowledge Graph: graph entities entity extract-entities links version

Advanced Search: embed rerank contextual chat chat-ask multimodal repo

Import / Export / Sync: import export backup restore sync ingest

Visualization: tui dashboard map logs hook-log

Setup & Config: init config install-mcp install-watcher uninstall-watcher install-slash install-statusline install-shell-wrapper install-shims startup-banner migrate migrate-vault update watch mcp-command

Daemons: recall-daemon ingest-daemon maint-daemon embed-daemon

MCP surface profiles

| Profile | Tools | Schema tokens | Use when |
| --- | --- | --- | --- |
| agent (default) | 10 | ~2.4k | Standard agent work, max token economy |
| core | ~30 | ~7.2k | Constrained clients (Codex, OpenCode) |
| full | 122 | ~28k | Power users, debugging |

Set via MEMO_MCP_PROFILE=full or in each client's MCP env config.

Retrieval architecture

Hybrid search: vec leg (MLX embedding) + BM25 leg (FTS5/Tantivy, diacritic-folding for Spanish) fused via Reciprocal Rank Fusion → optional MLX cross-encoder rerank.
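
Reciprocal Rank Fusion combines ranked lists using only rank positions: each document scores the sum of 1 / (k + rank) over the lists it appears in. The sketch below shows the standard formula; k = 60 is the conventional default from the RRF literature, and whether memo uses that value (or per-leg weights) is not stated here. The memory ids are made up.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


vec_leg = ["m3", "m1", "m7"]    # ids ranked by embedding similarity
bm25_leg = ["m1", "m9", "m3"]   # ids ranked by keyword match

print(rrf([vec_leg, bm25_leg]))  # ['m1', 'm3', 'm9', 'm7']
```

Because only ranks enter the formula, the two legs never need their raw scores (cosine similarity and BM25) put on a common scale.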

Markdown is the source of truth. The .md files are canonical; sqlite is a rebuildable index. A hand-edit in Obsidian wins on the next memo reindex. delete() removes the index first, then the file — no silent data loss.
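
The delete ordering can be illustrated with sqlite3 and pathlib. This is a sketch under assumptions: the `memories` table, the `<id>.md` file naming, and the function signature are hypothetical, not memo's actual layout.

```python
import sqlite3
import tempfile
from pathlib import Path


def delete_memory(db: sqlite3.Connection, vault: Path, memory_id: str) -> None:
    """Drop the index row first, then the canonical Markdown file."""
    with db:  # commits on success, rolls back on error
        db.execute("DELETE FROM memories WHERE id = ?", (memory_id,))
    # If this unlink fails, the file survives and a reindex can restore the row.
    (vault / f"{memory_id}.md").unlink(missing_ok=True)


vault = Path(tempfile.mkdtemp())
(vault / "abc123.md").write_text("# MLX bench\n")
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (id TEXT PRIMARY KEY, title TEXT)")
db.execute("INSERT INTO memories VALUES ('abc123', 'MLX bench')")
db.commit()

delete_memory(db, vault, "abc123")
remaining = db.execute("SELECT COUNT(*) FROM memories").fetchone()[0]
print(remaining, list(vault.iterdir()))  # 0 []
```

With this ordering, an interruption between the two steps leaves the Markdown file in place, so the canonical copy is never lost while an index row still points at it.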

Embedding models:

| Model | Dims | Disk | Use |
| --- | --- | --- | --- |
| Qwen3-Embedding-0.6B-4bit | 1024 | ~0.4 GB | Default (fast, good) |
| Qwen3-Embedding-4B-4bit | 2560 | ~2.5 GB | Higher recall quality |
| Qwen3-Embedding-8B-4bit | 4096 | ~5 GB | Maximum quality |

Switch with MEMO_EMBEDDER_MODEL + MEMO_EMBEDDER_DIMS (requires memo reindex --rebuild).

Documentation

| Topic | Where |
| --- | --- |
| Full install detail, installer knobs, new-Mac migration | docs/reference.md › Install |
| Per-client MCP setup + the /memo slash command | docs/reference.md › MCP setup |
| All MCP tools reference | docs/reference.md › MCP tools |
| Ambient memory, recall daemon, capture & recall tuning | docs/reference.md › Ambient memory |
| Time-machine, session briefing, semantic map | docs/reference.md › Surfaces |
| Full CLI reference + live dashboard (memo tui) | docs/reference.md › CLI |
| All MEMO_* flags, model profiles, upgrading the embedder | docs/reference.md › Configuration |
| Architecture, sync tiers, design notes | docs/reference.md › Design & comparison |

Contributors: git clone https://github.com/jagoff/memo && cd memo && uv pip install -e '.[dev]'. See CONTRIBUTING.md.

License & provenance

MIT — see LICENSE. Forked philosophically from mem-vault (storage layout + frontmatter schema); the MLX backend pieces are ported from obsidian-rag. memo is one of three sovereign systems in a wider stack (Memflow, Synapse) — the integration is opt-in everywhere; single-Mac users see zero behaviour change.


Summary

memo 2.0 is persistent semantic memory for AI agents: 100% local, on Apple Silicon with MLX. Each memory is a Markdown file; the embeddings live in a single sqlite file; the LLM, the embedder, and the reranker run in-process via MLX, with no Ollama, no cloud, and no API keys. Your prompts and memories never leave the Mac.

What's new in 2.0: time-machine (memo as-of), contradiction radar (memo contradict), synthesis pipeline (memo synthesize), cross-Mac sync via git, Obsidian vault as source of truth, knowledge graph, health scoring (memo health), retrieval regression gates (memo eval --gate), and multi-modal ingestion (images + audio OCR).

Why it saves tokens: the default MCP surface is 10 tools (~2.4k tokens) versus 122 (~28k) → ~91% less context per session; and recall injects the answer instead of having the agent re-derive it. On a corpus of ~200 memories, memo roi estimates ~80k tokens of model work avoided per session.

Install:

curl -fsSL https://raw.githubusercontent.com/jagoff/memo/master/install.sh | bash
memo doctor --strict-runtime    # verify the runtime

Requirements: macOS on Apple Silicon (M1–M4), ~8 GB of disk for the models. The full documentation is in docs/reference.md.

Migrating from another Mac:

curl -fsSL https://raw.githubusercontent.com/jagoff/memo/master/install.sh | bash
memo sync bootstrap git@github.com:tuusuario/memo-sync.git
