PMB (Personal Memory Brain) - local-first persistent memory for AI coding agents (MCP, beats mem0/Letta/Zep on retrieval)

PMB · Personal Memory Brain

Local-first persistent memory for AI agents - Claude Code, Cursor, Codex.

94.5% LoCoMo recall@10 · 70ms p50 · multilingual · Apache 2.0 · zero API keys.

📸 Screenshots - every claim above, captured from a real run

pmb connect - wire memory into Claude Code & Codex
One command. Both Claude Code and Codex now share the same workspace.

LoCoMo benchmark: 94.5% recall@10
Reproducible LoCoMo: python scripts/benchmarks/benchmark_locomo.py --n-conversations 10 → 94.5%.

Multilingual atomic extraction across English, Spanish, German
25+ regex patterns for the primary languages, plus a multilingual embedder that covers 50+ languages out of the box.

Mega stress test - 900 queries, multi-language, top-10 = 99.2%
900-query multi-language stress test including cross-lingual pairs: top-10 = 99.2%, p50 = 70 ms.

More screenshots: pmb stats, pmb recall, pmb doctor ↓

📖 The problem

Your AI agent forgets everything between sessions. You paste the same context every morning. You keep a separate notes file the agent can't see. You repeat decisions you made last week.

PMB fixes this in 3 commands. Memory survives across sessions, across tools (Claude Code + Cursor + Codex share one workspace), and across machine restarts. Nothing leaves your disk.

⚡ What makes PMB different

	PMB	mem0	Letta	Zep
LoCoMo recall@10	94.5 % (reproducible, see below)	~67-70 %	~76-80 %	~80 %
p50 warm recall	70 ms	1-3 s	1-3 s	1-3 s
MCP cold start (boot)	~3.7 s	n/a	n/a	n/a
First recall on empty workspace	~0 ms (skips LanceDB import)	n/a	n/a	n/a
Multilingual (EN + RU + UK + 50+)	✅ 81-83% top-1 on RU/UK	EN-mostly	EN-mostly	EN-mostly
Cross-lingual recall (RU query → UK fact)	✅ 100% on bench	⚠️	⚠️	⚠️
Per-call cost	$0	metered	metered	metered
Runs offline	✅ no network	❌ cloud	partial	partial
API key required	❌	✅	✅	✅
MCP-native	✅ Claude Code / Cursor / Codex	❌	⚠️	⚠️
Storage	SQLite + LanceDB on disk	proprietary	proprietary	proprietary
Portable (USB / Dropbox)	✅ just copy `~/.pmb/`	❌	partial	partial
License	Apache 2.0	Apache 2.0	Apache 2.0	Apache 2.0

Numbers for mem0/Letta/Zep are taken from their own published LoCoMo benchmarks; we have not reproduced them locally. PMB numbers reproduce in one command: python scripts/benchmarks/benchmark_locomo.py --n-conversations 10 (~6 min, no graders, no LLM, just retrieval scoring).

🚀 Quickstart

TL;DR

pip install pmb-ai                    # CLI command remains `pmb`
pmb connect codex                     # or claude / cursor
# restart your agent and say "remember - I prefer Postgres"

Or install from source for the latest unreleased changes:

git clone https://github.com/oleksiijko/pmb.git && cd pmb
python -m venv .venv && source .venv/bin/activate
pip install -e .

Detailed

1. Install (Python 3.11+ required).

git clone <repo-url> pmb
cd pmb
python -m venv .venv

# Activate
source .venv/bin/activate                  # Linux / macOS
.venv\Scripts\activate                     # Windows PowerShell

pip install -e .

You now have a pmb command on your $PATH. Sanity-check:

pmb doctor
pmb stats

2. Hook up your AI agent. One command per agent:

pmb connect claude        # Anthropic Claude Code
pmb connect codex         # OpenAI Codex CLI
pmb connect cursor        # Cursor

This writes an MCP server entry into the agent's config (e.g. ~/.codex/config.toml) and appends a tiny rule block to AGENTS.md / CLAUDE.md.

3. Use your agent normally. PMB activates only on explicit memory triggers:

What you say	What PMB does
`"remember - my cat is allergic to chicken"`	record a pinned fact (importance 0.95)
`"I work on the pmb-dashboard project"`	record a fact about you/your project
`"what did I research about Next.js?"`	pulls your latest research summaries
`"why did we pick Postgres?"`	recalls the project decision
`"what is JWT?"`	does nothing - general questions bypass PMB

4. Inspect what's stored.

pmb tui            # terminal UI: Memory · Recall · Stats · Dedup · Tune
pmb dashboard      # web UI on http://127.0.0.1:8765

📊 Benchmarks

1. LoCoMo (the standard) - 94.5% recall@10

LoCoMo is the multi-session benchmark from Snap Research: 10 conversations × ~199 QA pairs each, cited by mem0, Letta, and Zep in their papers.

mean evidence_recall@10 = 94.5% (full 10-conv run, v0.1.0 defaults)

   conv-26  █████████████████████████  96.0%   conv-44  █████████████████████████  96.2%
   conv-30  █████████████████████████  95.2%   conv-47  ████████████████████████   93.2%
   conv-41  ███████████████████████    90.7%   conv-48  █████████████████████████  96.7%
   conv-42  █████████████████████████  94.6%   conv-49  ████████████████████████   92.9%
   conv-43  █████████████████████████  95.0%   conv-50  █████████████████████████  94.6%
                                                                  all 10 ≥ 90.7%
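As a rough illustration of how the headline relates to the per-conversation bars, the sketch below assumes evidence_recall@10 is the share of a question's gold evidence turns (IDs in LoCoMo's D1:3 style) that appear in the top 10 results, and that the headline is the unweighted mean of the ten per-conversation scores. The function name is hypothetical, and benchmark_locomo.py may aggregate differently.

```python
from statistics import mean


def evidence_recall_at_k(retrieved: list[str], gold: set[str], k: int = 10) -> float:
    """Fraction of gold evidence IDs found among the top-k retrieved IDs (assumed definition)."""
    if not gold:
        # Questions without evidence score 0 here; a real harness might skip them instead.
        return 0.0
    return len(gold & set(retrieved[:k])) / len(gold)


# Per-conversation scores copied from the chart above (percent).
PER_CONV = {
    "conv-26": 96.0, "conv-30": 95.2, "conv-41": 90.7, "conv-42": 94.6, "conv-43": 95.0,
    "conv-44": 96.2, "conv-47": 93.2, "conv-48": 96.7, "conv-49": 92.9, "conv-50": 94.6,
}

if __name__ == "__main__":
    # One question supported by two dialogue turns, only one of which was retrieved.
    print(evidence_recall_at_k(["D1:3", "D2:7", "D4:1"], {"D1:3", "D5:2"}))  # 0.5
    print(round(mean(PER_CONV.values()), 1))  # 94.5
    print(min(PER_CONV.values()))  # 90.7
```

The unweighted mean of the ten bars is 94.51%, which rounds to the reported 94.5%.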

Reproduce in one command:

python scripts/benchmarks/benchmark_locomo.py --n-conversations 10

Latency: p50 ranges 65-95 ms across conversations, p95 96-142 ms.

2. Mega stress test - 900 queries, multi-language, all features on

A harder bench than LoCoMo: 30 base queries × 30 paraphrases each, mixing English coding, Russian personal, Ukrainian personal, and cross-lingual pairs. Runs with the full PAMVR + auto-vocab + atomic-fact pipeline.

HEADLINE: top-1 = 73.3% · top-3 = 87.3% · top-10 = 99.2%
Latency: p50 70ms · p95 183ms · p99 292ms

per-language top-1 (n=queries):
   en           300   79.3%   ████████████████
   ru           300   81.0%   ████████████████
   uk           180   82.8%   ████████████████
   ru→uk        30   100.0%   ████████████████████ ←  cross-lingual works

Reproduce:

python scripts/benchmarks/mega_stress_test.py --n-paraphrases 30

3. What actually carries the LoCoMo number

Honest take from a full ablation (scripts/benchmarks/ablation_full.py):

BM25 lexical retrieval is the dominant signal. Disabling it costs 18 points. Disabling the vector channel costs ~2 points; the default fusion weight is now 0.7 BM25 / 0.3 vector.
The cross-encoder reranker regresses 17 points on LoCoMo. It is available as an opt-in flag (recall.rerank = True) but is not recommended for this workload.
Twelve of nineteen ablated layers show 0.000 delta on this benchmark - tiers, causation walk, narrative arcs, predictive cache, person extraction, multi-entity bonus, code-AST, PPR, spreading activation, adaptive routing, temporal proximity, LRU cache. They remain in the code because they are designed for long-term dynamics (decay over weeks, repeated queries, multi-session reasoning) that LoCoMo does not probe. We do not claim they are responsible for the LoCoMo score.
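To make the 0.7 BM25 / 0.3 vector weighting concrete, here is a minimal sketch of a weighted score fusion. It assumes each channel's raw scores are min-max normalised per query before mixing, which is one common choice; PMB's actual fusion may normalise or combine the channels differently, and fuse_scores is a hypothetical name.

```python
def _min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one channel's raw scores to [0, 1] so BM25 and cosine values are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse_scores(bm25: dict[str, float], vector: dict[str, float],
                bm25_weight: float = 0.7) -> list[tuple[str, float]]:
    """Weighted sum of normalised channel scores, highest first.

    A document missing from one channel contributes 0 for that channel.
    """
    b, v = _min_max(bm25), _min_max(vector)
    fused = {
        doc: bm25_weight * b.get(doc, 0.0) + (1.0 - bm25_weight) * v.get(doc, 0.0)
        for doc in b.keys() | v.keys()
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    bm25 = {"evt-postgres": 12.4, "evt-redis": 3.1, "evt-kafka": 0.5}
    vector = {"evt-redis": 0.83, "evt-postgres": 0.79, "evt-mysql": 0.75}
    for doc, score in fuse_scores(bm25, vector):
        print(f"{doc:14s} {score:.3f}")
```

In this toy run evt-postgres ranks first (0.85) on the strength of its lexical match, even though evt-redis has the higher cosine score. Setting bm25_weight to 1.0 or 0.0 reduces the mix to pure BM25 or pure vector ranking.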

4. Latency

operation                                  p50       p95       notes
──────────────────────────────────────────────────────────────────────────
recall (warm engine, mega-stress avg)       70 ms   183 ms    hybrid BM25 + vector + PAMVR
recall (LoCoMo per-conv avg)                65-95 ms 96-142 ms one workspace, ~25 events
recall (cache hit)                          <1 ms     5 ms    LRU cache
record_batch via MCP (fire-and-forget)       2 ms    11 ms    returns instantly; embed async
record_batch via direct API (sync, n=1)    ~40 ms   113 ms    one fact, one embedding call
recent_activity / list_goals                 3 ms    10 ms    pure SQL
pin / unpin                                  5 ms    15 ms    single SQLite UPDATE
pmb stats / pmb list / pmb config           ~900 ms ~1100 ms  full CLI invocation incl. Python boot
──────────────────────────────────────────────────────────────────────────
MCP server boot (Codex / Claude Code)       3.7 s              async prewarm runs in background
MCP first recall on EMPTY workspace         <50 ms             SQL count short-circuits LanceDB import
MCP first recall AFTER `pmb warmup`         <100 ms            model + LanceDB + BM25 all preloaded

import lancedb (~22 s on Windows) is now fully deferred: read-only CLI commands never pay that cost, and the MCP server prewarms it asynchronously, so boot returns in ~4 s instead of blocking for 45 s.
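The deferral pattern described above can be sketched with importlib: import the heavy module only when a query needs it, answer an empty workspace from a SQL count alone, and optionally start the import on a background thread at boot. LazyVectorBackend, the events table with a content column, and the LIKE query standing in for real vector search are all assumptions for illustration, not PMB's code.

```python
import importlib
import sqlite3
import threading
from types import ModuleType
from typing import Optional


class LazyVectorBackend:
    """Hypothetical sketch: defer a slow import (e.g. lancedb) until a query needs it."""

    def __init__(self, db: sqlite3.Connection, module_name: str = "lancedb") -> None:
        self._db = db
        self._module_name = module_name
        self._module: Optional[ModuleType] = None
        self._lock = threading.Lock()

    @property
    def loaded(self) -> bool:
        return self._module is not None

    def _load(self) -> ModuleType:
        # The lock ensures the import runs once even if prewarm and a query race.
        with self._lock:
            if self._module is None:
                self._module = importlib.import_module(self._module_name)
            return self._module

    def prewarm_async(self) -> threading.Thread:
        # Start the slow import in the background so server boot can return at once.
        thread = threading.Thread(target=self._load, daemon=True)
        thread.start()
        return thread

    def recall(self, query: str) -> list[str]:
        # Cheap SQL count first: an empty workspace never needs the vector library.
        (count,) = self._db.execute("SELECT COUNT(*) FROM events").fetchone()
        if count == 0:
            return []
        self._load()
        # Stand-in for the real vector search: a plain LIKE match on the events table.
        rows = self._db.execute(
            "SELECT content FROM events WHERE content LIKE ?", (f"%{query}%",)
        ).fetchall()
        return [content for (content,) in rows]
```

The empty-workspace check is what lets a first recall return without touching the slow import at all.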

5. Reproduce locally

python scripts/benchmarks/benchmark_locomo.py --n-conversations 10  # 94.5%
python scripts/benchmarks/mega_stress_test.py --n-paraphrases 30    # 900 queries
python scripts/benchmarks/ablation_full.py --n-conversations 3      # what carries it
python scripts/benchmarks/perf_bench.py                             # latency / throughput

🌍 Multilingual

PMB ships the multilingual paraphrase-multilingual-MiniLM-L12-v2 embedder by default, covering 50+ languages. The recall pipeline (PAMVR, atomic fact extraction, auto-vocab bridges) adds explicit regex patterns for the most common languages (English, plus Russian and Ukrainian) and falls back to embedder-only matching for everything else.

Real numbers from `mega_stress_test.py` (n=900 queries)

Language                  n       top-1     top-3
────────────────────────────────────────────────────
English (coding)         300     79.3%     90.0%
Russian                  300     81.0%     99.7%   ← multilingual embedder shines
Ukrainian                180     82.8%     87.2%
Cross-lingual RU → UK     30    100.0%    100.0%   ← embedder bridges related languages

Atomic fact extraction without LLM

Input (EN):  "Alice is the tech lead. She lives in Berlin. We use Cloud Run for deployment."
PMB extracts:
  • Alice is the tech lead
  • She lives in Berlin
  • We use Cloud Run for deployment

Input (ES):  "Mi nombre es Carlos. Vivo en Madrid. Trabajo como ingeniero."
PMB extracts (via multilingual embedder + structural patterns):
  • Name: Carlos
  • Lives in Madrid
  • Works as engineer

Input (DE):  "Ich heiße Anna. Ich wohne in München. Mein Geburtstag ist 7. Juni."
PMB extracts:
  • Name: Anna
  • Lives in München
  • Birthday: 7. Juni

25+ regex patterns cover name, location, work, birthday, preference, family, ownership across the three primary languages. The embedder handles the rest. Enable atomic extraction per-workspace:

pmb config set write.atomic_fact_extract true
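For intuition about how structural patterns turn sentences into atomic facts without an LLM, here is a minimal regex sketch covering the name, location and work examples above. The PATTERNS list and extract_atomic_facts are hypothetical; PMB's 25+ shipped patterns are not reproduced here and will differ.

```python
import re

# Hypothetical patterns for illustration only; PMB's shipped set is larger and may differ.
PATTERNS: list[tuple[str, re.Pattern[str]]] = [
    ("Name", re.compile(r"\b(?:my name is|mi nombre es|ich heiße)\s+(\w+)", re.IGNORECASE)),
    ("Lives in", re.compile(
        r"\b(?:I live in|vivo en|ich wohne in)\s+(\w[\w ]*?)(?=[.,;!?]|$)", re.IGNORECASE)),
    ("Works as", re.compile(
        r"\b(?:I work as(?: an?)?|trabajo como|ich arbeite als)\s+(\w+)", re.IGNORECASE)),
]


def extract_atomic_facts(text: str) -> list[str]:
    """Return 'Slot: value' strings for every pattern match, in pattern order."""
    facts = []
    for slot, pattern in PATTERNS:
        for match in pattern.finditer(text):
            facts.append(f"{slot}: {match.group(1)}")
    return facts
```

For example, extract_atomic_facts("Mi nombre es Carlos. Vivo en Madrid. Trabajo como ingeniero.") yields Name: Carlos, Lives in: Madrid and Works as: ingeniero. Facts no pattern covers (the German birthday, in this sketch) are left to the embedder.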

Fact replacement (when life changes)

# `eng` is an already-constructed PMB Engine instance.
eng.record_keyed_fact("user", "residence", "Kyiv")
eng.record_keyed_fact("user", "residence", "Warsaw")   # archives Kyiv

# Recall now returns ONLY Warsaw; Kyiv stays in history:
eng.get_keyed_fact_history("user", "residence")
# → [{"value": "Warsaw", "is_current": True},
#    {"value": "Kyiv",   "is_current": False}]
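To make the replacement semantics concrete, here is a minimal sketch using plain sqlite3: writing a new value for the same (subject, key) archives the old row instead of deleting it. The keyed_facts table, its columns, and these free functions are assumptions for illustration; they are not PMB's Engine methods or schema.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Hypothetical schema: one row per version of a (subject, key) fact."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE keyed_facts (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               subject TEXT NOT NULL,
               key TEXT NOT NULL,
               value TEXT NOT NULL,
               is_current INTEGER NOT NULL DEFAULT 1
           )"""
    )
    return db


def record_keyed_fact(db: sqlite3.Connection, subject: str, key: str, value: str) -> None:
    # Archive the previous current value and insert the new one in a single transaction.
    with db:
        db.execute(
            "UPDATE keyed_facts SET is_current = 0 "
            "WHERE subject = ? AND key = ? AND is_current = 1",
            (subject, key),
        )
        db.execute(
            "INSERT INTO keyed_facts (subject, key, value) VALUES (?, ?, ?)",
            (subject, key, value),
        )


def get_keyed_fact_history(db: sqlite3.Connection, subject: str, key: str) -> list[dict]:
    # Newest first, mirroring the output shown above.
    rows = db.execute(
        "SELECT value, is_current FROM keyed_facts "
        "WHERE subject = ? AND key = ? ORDER BY id DESC",
        (subject, key),
    ).fetchall()
    return [{"value": value, "is_current": bool(flag)} for value, flag in rows]
```

Recall would then filter on is_current = 1, while the archived rows remain available as history.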

Multilingual safety: `pmb doctor` flags mismatched embedder

If your workspace has ≥5% non-Latin characters AND you've configured an English-only embedder (e.g. all-MiniLM-L6-v2), pmb doctor shows:

Multilingual fit  │ warn  │ Workspace has 81% non-Latin chars but uses
                  │       │ all-MiniLM-L6-v2 (English-only). Switch to a
                  │       │ multilingual model: pmb config set embedding.model
                  │       │ paraphrase-multilingual-MiniLM-L12-v2

📸 Screenshots - CLI reference

pmb stats - workspace overview

pmb recall - hybrid search from the shell

pmb doctor - health check with multilingual warning

📊 Web Dashboard - `pmb dashboard`

Launch the local web UI at http://127.0.0.1:8765 - no auth, no cloud, just a window into your memory:

pmb dashboard

PMB Dashboard - overview
Overview tab: total events, active / pinned / archived counts, entity graph stats.

PMB Dashboard - Events tab
Events tab: timeline of recorded facts, activities, decisions. Each row is sortable.

PMB Dashboard - Recall Debug tab
Recall Debug tab: test any query against the workspace and see the ranked results with a PAMVR score breakdown - useful for tuning recall.* knobs.

Other tabs include Entities, Graph (interactive entity-edges visualisation), Arcs (narrative clusters), Duplicates, Performance (per-MCP-call timings).

🏛 Architecture

flowchart TB
    A["AI agent<br/>Claude Code · Cursor · Codex"] -->|MCP protocol| B["PMB MCP server<br/>12 tools by default"]
    B --> C[Engine]
    C -->|read pipeline| R["Hybrid recall<br/>BM25 + vector + graph<br/>+ PAMVR boosts"]
    C -->|write path 2ms| W["Persist + async embed<br/>SQLite first, vector later"]
    R --> D[(SQLite events)]
    R --> E[(LanceDB vectors)]
    R --> F[(BM25 pickle)]
    W --> D
    W --> E
    W --> F
    style A fill:#e0f2fe,color:#0c4a6e
    style B fill:#ddd6fe,color:#4c1d95
    style C fill:#fef3c7,color:#78350f
    style R fill:#d1fae5,color:#064e3b
    style W fill:#fed7aa,color:#7c2d12
    style D fill:#f3f4f6,color:#111
    style E fill:#f3f4f6,color:#111
    style F fill:#f3f4f6,color:#111

Text-only architecture (same content as the diagram above)

                      ┌─────────────────────────────────────────────┐
                      │              AI agent                       │
                      │   (Codex CLI · Claude Code · Cursor · …)    │
                      └──────────────────┬──────────────────────────┘
                                         │  MCP (Model Context Protocol)
                                         ▼
                      ┌─────────────────────────────────────────────┐
                      │  PMB MCP server  -  12 tools by default     │
                      │  record_batch · recall · pin · list_goals · │
                      │  recent_activity · what_just_happened · …   │
                      └──────────────────┬──────────────────────────┘
                                         │
                                         ▼
                ┌────────────────────────────────────────────────┐
                │  Engine                                        │
                │  ─────────────────────────────────────────     │
                │  READ pipeline (12 stages, all gated):         │
                │   embed → BM25 → vector → graph traversal      │
                │   → causation walk → arc expansion → PPR       │
                │   → reranker → adaptive decompose → fusion     │
                │                                                │
                │  WRITE path (≤ 2 ms MCP return):               │
                │   sync: SQLite insert                          │
                │   async: embed → LanceDB → entity graph        │
                │   dedup: L1 exact + L2 cosine + L2.5 LLM-verify│
                └────────────────────────────────────────────────┘
                                         │
                       ┌─────────────────┴──────────────────┐
                       ▼                                    ▼
              ┌─────────────────┐                  ┌────────────────┐
              │     SQLite       │                 │    LanceDB     │
              │  events          │                 │  vectors       │
              │  graph_entities  │                 │  CLIP (images) │
              │  graph_edges     │                 └────────────────┘
              │  mcp_calls       │
              │  dedup_pending   │
              │  predictive_cache│
              └─────────────────┘

Thirteen storage layers

Honest note: these are the types of data PMB can store and reason over, not thirteen ranking signals each pulling its weight. Ablation on LoCoMo (see Benchmarks) shows that BM25 over raw text + the entity co-occurrence graph (layers 1, 2, 5) carry essentially all of the single-session retrieval quality. Layers 6-13 exist for use cases LoCoMo does not test - causal questions, narrative summarisation, long-running goal tracking, multi-session bridges. Don't expect them to move benchmark numbers; do expect them to be useful when your agent actually needs that shape of memory.

Layer	What	Where
1. Raw events	every fact/qa/decision the user records	`events` table
2. Entities	tech names, files, concepts (regex-extracted)	`graph_entities`
3. Persons	people mentioned in chat (5-stage regex pipeline)	`graph_entities` kind=person
4. Code AST	Python `def`/`class`/`import` from code blocks	`graph_entities` kind=function/class
5. Co-occurrence graph	"A & B were in the same event" edges	`graph_edges`
6. Typed causation edges	`references`, `supersedes`, `caused_by`	`event_edges`
7. Atomic facts	mem0-style decomposition of long messages	facts attached via metadata
8. Fact trees	one main event + N linked subfacts	metadata.parent_ulid
9. Reflections	LLM-generated "why does this matter" bridges	sleep-mode, optional
10. Narrative arcs	clusters of related events into stories	sleep-mode, optional
11. Bi-temporal index	`event_time` vs `system_time` (when vs recorded)	metadata.event_time
12. Activity log	working-memory tier (3-day decay)	event_type=activity
13. Goals + milestone chains	explicit goals with status + tracked metric evolution	event_type=goal/milestone

Five access paths at recall time

                                                     ┌→ BM25 (lexical)
                                                     │
                                                     ├→ vector (cosine, multilingual)
   query  →  classify  →  pick weights  →  fuse  →   ┼→ graph traversal
                              ↑                      │
                              │                      ├→ Personalized PageRank
                       (adaptive routing)            │
                                                     └→ predictive cache (sleep-baked)

All five fire in parallel where independent; results are merged with importance × recency × graph weights.
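A minimal sketch of that final merge, assuming a plain product of the fused channel score with importance, an exponential recency factor, and a graph weight. The 30-day half-life mirrors the documented recall.recency_half_life_days default; Candidate, final_score, rank and the exact formula are hypothetical, and PMB's weighting may differ.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    event_id: str
    relevance: float           # fused channel score for this query
    importance: float          # 0..1, e.g. 0.95 for pinned facts
    age_days: float
    graph_weight: float = 1.0  # assumed: >1 boosts events linked to query entities


def final_score(c: Candidate, recency_half_life_days: float = 30.0) -> float:
    # Recency halves every `recency_half_life_days` days.
    recency = 0.5 ** (c.age_days / recency_half_life_days)
    return c.relevance * c.importance * recency * c.graph_weight


def rank(candidates: list[Candidate], top_k: int = 5) -> list[str]:
    ordered = sorted(candidates, key=final_score, reverse=True)
    return [c.event_id for c in ordered[:top_k]]
```

Because the factors multiply, a pinned, graph-linked fact a few days old can outrank a fresher but less important event with the same relevance.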

Three memory tiers

                  tier        half-life     use
                  ──────────  ────────────  ───────────────────────
                  working     ~2 days       recent edits, AI logs
                  episodic    ~46 days      facts, events
                  semantic    ~346 days     pinned, goals, identity

The tiers govern long-term importance decay: events that aren't re-accessed lose importance gradually, faster in working than in semantic. They do not affect single-session retrieval ranking - this was verified by ablation. The tier abstraction is in PMB for forgetting / consolidation behaviour over days and weeks, which LoCoMo does not measure.
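The half-lives in the tier table follow from a per-day multiplicative factor via half-life = ln 0.5 / ln(factor). The documented decay.factor_per_day default of 0.985 gives about 46 days, matching the episodic row. The 0.70 and 0.998 factors below are back-solved from the table rather than read from PMB's code, and the helper names are hypothetical.

```python
import math

# Per-day decay factors per tier. 0.985 is the documented decay.factor_per_day default;
# the working and semantic factors are assumptions back-solved from the listed half-lives.
TIER_FACTORS = {"working": 0.70, "episodic": 0.985, "semantic": 0.998}


def half_life_days(factor: float) -> float:
    """Days until importance halves when multiplied by `factor` once per day."""
    return math.log(0.5) / math.log(factor)


def decayed_importance(importance: float, tier: str, days_idle: float) -> float:
    """Importance after `days_idle` days without being re-accessed."""
    return importance * TIER_FACTORS[tier] ** days_idle


if __name__ == "__main__":
    for tier, factor in TIER_FACTORS.items():
        print(f"{tier:9s} factor={factor:<6} half-life ≈ {half_life_days(factor):.0f} days")
```

Run directly, it reports half-lives of about 2, 46 and 346 days, matching the tier table.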

🛠 What gets stored, when (and what doesn't)

PMB is lazy by default. The AI only touches it on explicit triggers:

┌──────────────────────────────────────┬─────────────────────────────────────────┐
│ Trigger phrase                       │ PMB action                              │
├──────────────────────────────────────┼─────────────────────────────────────────┤
│ "remember / save / pin"              │ record + pin (importance 0.95)          │
│ "I work on X"  •  "we use Y"         │ record fact (importance 0.7)            │
│ "my cat is X"  •  personal facts     │ record fact tree if there are subfacts  │
│ "I want to ship X by Y"              │ record goal with due_at                 │
│ "we switched from X to Y"            │ record decision + maybe milestone       │
├──────────────────────────────────────┼─────────────────────────────────────────┤
│ Agent's own decision/edit/fix        │ activity(kind=decision/edit/completed)  │
│ Tracked metric changed               │ milestone in named chain                │
│ User asked an info question          │ optional 1-line research summary        │
├──────────────────────────────────────┼─────────────────────────────────────────┤
│ "what is Next.js?" (general Q)       │ ❌ no save, no recall - answers directly│
│ "how do I write a for loop?"         │ ❌ no save, no recall                   │
│ Debugging / coding help              │ ❌ no save, no recall                   │
└──────────────────────────────────────┴─────────────────────────────────────────┘

This is the design - PMB is a memory for you, not a log of every Q&A.

⚠️ Important: the "lazy by default" gate lives in the agent, not in PMB. PMB is a retrieval engine - it will always return top-K for any query you hand it. The decision to not call recall() on general questions like "what is JWT?" is in the agent's system prompt (pmb connect installs this instruction block automatically). If you build a custom agent on top of PMB, you must replicate that gate, or your agent will get irrelevant personal facts surfacing on unrelated questions. See src/pmb/cli/connect.py for the canonical instruction block PMB injects.
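If you build a custom agent and want a cheap pre-filter in code, a crude keyword gate that mirrors the trigger table might look like this. It is a hypothetical heuristic, not what PMB ships: the canonical gate is the natural-language instruction block in src/pmb/cli/connect.py, interpreted by the agent's own LLM, which handles phrasing these patterns will miss.

```python
import re
from typing import Literal

Action = Literal["record", "recall", "skip"]

# Hypothetical heuristics; PMB's real gate is an LLM instruction block, not regexes.
RECORD_PATTERNS = [
    re.compile(r"^\s*(remember|save|pin)\b", re.IGNORECASE),
    re.compile(r"\b(I work on|we use|we switched from)\b", re.IGNORECASE),
    re.compile(r"\bmy \w+ is\b", re.IGNORECASE),
]
RECALL_PATTERNS = [
    re.compile(r"\b(did|have|had)\s+(I|we)\b", re.IGNORECASE),  # questions about our past
    re.compile(r"\b(my|our)\b", re.IGNORECASE),                 # questions about us
]


def gate(user_turn: str) -> Action:
    """Decide whether a turn should write to memory, read from it, or bypass PMB."""
    if any(p.search(user_turn) for p in RECORD_PATTERNS):
        return "record"
    if any(p.search(user_turn) for p in RECALL_PATTERNS):
        return "recall"
    return "skip"  # general questions are answered without touching memory
```

With this gate, "why did we pick Postgres?" triggers recall, while "what is JWT?" and "how do I write a for loop?" never reach PMB.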

Which features help which use case

Different memory workloads benefit from different parts of the system. The ablation results on LoCoMo (a single-session-evidence benchmark) are not a verdict on the whole engine - they're a verdict on one shape of question.

Use case	What the agent asks	What helps	Settings to enable
Single-session evidence recall ("who said what about X in this thread?")	"what did we decide about Postgres?"	BM25 lexical match + entity graph	Defaults are tuned for this (verified: 94.5% on LoCoMo). Leave `recall.bm25_weight = 0.7`, `recall.typo_correction = False`.
Multi-hop reasoning across events ("who introduced X, and why did Y reject it?")	"why did we move away from microservices?"	Causation walk + adaptive query decomposition	`recall.causation_walk = True` (default), `recall.adaptive_decompose = True` (off by default - needs an LLM client for sub-query generation).
Long-running goal tracking ("am I closer to my Q2 target?")	"what's the status of the launch?"	Goals + milestone chains	Use `record_batch [{"type":"goal", ...}, {"type":"milestone", ...}]`. `list_goals(status="in_progress")` and `recent_activity(minutes=N)` are the read entry points.
Narrative / "history of X" queries ("walk me through how we got here")	"tell me the story of the auth rewrite"	Narrative arcs + reflections	`pmb arcs cluster` to seed; `recall.arc_expansion = True` (default). Requires `pmb reflect` runs to produce bridges.
Cross-session bridges ("you said something like this a month ago...")	open-ended "this reminds me of..."	Reflections-as-edges + spreading activation	`recall.reflection_to_edges = True` (default), `recall.spreading_activation = True` (default). Run `pmb reflect` periodically.
Date-anchored questions ("what was I doing in March?")	"what did I work on last week?"	Temporal proximity + bi-temporal index	`recall.temporal_enabled = True` (default). Auto-extracts dates from text.
Code memory ("which file imports module X?")	function/class/import retrieval	AST entity extraction	`recall.code_ast_extraction = True` (default). Python only today.
Multilingual / cross-lingual (query in one language, fact in another)	"wann haben wir Postgres gewählt?"	Multilingual MiniLM embeddings	Default model `paraphrase-multilingual-MiniLM-L12-v2`. 50+ languages.
Decay / "forget what's stale" (low-importance items aging out)	n/a (background)	Three tiers + per-tier decay rates	Run `pmb decay` (manual) or enable `consolidate.auto_trigger = True`.
Sleep-mode generalisation (extract patterns from many small facts)	n/a (background)	LLM consolidation	`pmb consolidate` with Anthropic or Ollama backend.
General Q&A ("what is JWT?")	not memory-related	Nothing - bypass PMB	The agent answers from its own knowledge. PMB stays out of the loop.

If your workload doesn't appear here, that doesn't mean PMB can't help - it means we haven't benchmarked it. Open an issue with a description and we'll tell you what to enable (or admit we don't know).

💻 CLI reference

pmb stats                  workspace summary (event count, by type, graph stats)
pmb list                   last N events
pmb recall "<query>"       search memory from the shell
pmb fact "<content>"       record a standalone fact
pmb pin <ulid>             pin a memory (max importance, no decay)
pmb forget <ulid>          archive (reversible)
pmb feedback <ulid> useful|wrong   tune importance based on real outcomes

pmb tui                    full TUI: Memory · Recall · Stats · Dedup · Tune
pmb dashboard              web UI on :8765
pmb tune                   settings-only TUI (67 knobs)

pmb connect codex|claude|cursor    auto-wire MCP into the agent
pmb ollama status|use|test         local LLM integration

pmb dedupe                 one-shot duplicate sweep
pmb regraph                rebuild the entity graph from events
pmb prune-graph            drop weak co-occurrence edges
pmb reindex                re-embed all events (after model change)
pmb reflect                LLM-generated bridges (sleep-mode)
pmb arcs cluster|list|show narrative arcs

pmb config get|set|list    flat-key tuning from the shell
pmb doctor                 health check (model, DB, MCP, …)

⚙️ Configuration

67 settings, organised by category. Browse / edit them three ways:

pmb tui              # interactive TUI, tab [5] Tune
pmb tune             # settings-only TUI
pmb config set recall.top_k 10           # one-liner from shell

What you'll most likely want to tune

Setting	Default	What it does
`recall.top_k`	5	results returned per query
`recall.bm25_weight`	0.7	BM25 vs vector mix (0 = pure vector, 1 = pure BM25)
`recall.rerank`	false	add cross-encoder reranker (~840 ms extra p50; cost 17 points on LoCoMo in ablation)
`recall.recency_half_life_days`	30	how fast recent events outweigh old ones
`dedup.cosine_high`	0.92	merge threshold (higher = more conservative)
`dedup.enable_semantic`	true	turn off to rely on exact-text dedup only
`embedding.backend`	sentence-transformers	switch to `fastembed` for 3-5× faster embed
`mcp.record_batch_async`	true	fire-and-forget MCP writes
`decay.factor_per_day`	0.985	set to 1.0 to disable forgetting
`consolidate.auto_trigger`	false	turn on for nightly LLM consolidation

Full list: pmb config list or open the TUI Tune tab.

🦙 Fully local with Ollama

PMB doesn't need a cloud LLM, ever. The vector embedder is local (sentence-transformers). The optional LLM-powered ops (consolidation, dedup verification, the pmb-chat standalone loop) can all run through Ollama:

# 1. Install Ollama → https://ollama.com/download
ollama serve &
ollama pull llama3.1:8b              # ~5 GB, balanced default

# 2. Point PMB at it
pmb ollama use balanced              # configures all LLM-using ops
pmb ollama status                    # health check
pmb ollama test                      # 1-shot PONG smoke test

Now PMB is 100% offline:

┌────────────────────────────┬─────────────────────────────┐
│  Operation                 │  Runs where?                 │
├────────────────────────────┼─────────────────────────────┤
│  Embedding                 │  your machine (CPU/GPU)      │
│  Vector + BM25 + graph     │  your machine                │
│  record_batch / recall     │  your machine                │
│  Dedup L1+L2               │  your machine                │
│  Dedup L2.5 (LLM verify)   │  your machine via Ollama     │
│  Consolidation             │  your machine via Ollama     │
│  pmb-chat                  │  your machine via Ollama     │
└────────────────────────────┴─────────────────────────────┘

Full guide: docs/SETUP_OLLAMA.md.

🔒 Privacy & security

Local only. PMB itself doesn't open any network connections. All data sits in ~/.pmb/.
No telemetry. PMB doesn't phone home, has no analytics, no usage reporting.
The agent has its own networking. Claude Code talks to api.anthropic.com, Codex to OpenAI, etc. PMB has no control over that - but PMB doesn't add a second channel.
Secret redaction. record_fact runs a regex scrubber over content (API keys, tokens, AWS/GCP creds patterns). It's not bulletproof; don't deliberately feed PMB secrets.
Single-user model. Anyone with read access to ~/.pmb/workspaces/<id>/events.sqlite can read all your memory.
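The secret scrubber is described only in broad terms (API keys, tokens, cloud credentials). The sketch below shows the general technique with a few publicly documented token shapes: AWS access key IDs start with AKIA (long-term) or ASIA (temporary), and classic GitHub personal access tokens start with ghp_. SECRET_PATTERNS and redact are hypothetical and are not PMB's pattern list; like PMB's own scrubber, this is not bulletproof.

```python
import re

# Illustrative patterns only (not PMB's list); real scrubbers cover many more shapes.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "bearer_token": re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/-]{20,}=*"),
}


def redact(text: str) -> str:
    """Replace anything matching a known secret shape with a labelled placeholder."""
    for label, pattern in SECRET_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text
```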

See SECURITY.md for the full threat model and vulnerability reporting.

🗺 Roadmap

Shipped in v0.1

13 storage layers, 5 retrieval signals, 3 decay tiers
MCP server with 50+ tools (12 exposed by default)
Web dashboard + 5-tab TUI
Async fire-and-forget writes (~2 ms MCP response)
BM25 fallback for cold reads (no blocking model load)
Multi-layer dedup (exact + cosine + LLM-verify)
Cross-lingual recall (multilingual MiniLM by default)
Per-MCP-call performance tracking
Ollama backend for fully-local LLM ops
LoCoMo evidence-recall@10: 94.5% on the full 10-conversation run with v0.1.0 defaults (up from 91.6% under previous defaults)
Lazy package imports - import pmb takes 48 ms (was ~14 s)
Lazy LanceDB import - Engine() no longer pays the 22 s import lancedb cost up front; CLI commands pmb stats / list / config / pin / forget now run in ~1 s end-to-end (was ~14 s)

Known issues / on the roadmap for v0.2

Sync record_batch(100) still takes ~11 s even with batched embedding. The per-item cost is graph indexing + temporal/causation edge inserts + L1 dedup, not embedding (already batched). Fix: a record_batch_bulk mode that defers graph work. Affects bulk imports, not agent traffic (MCP returns in 2 ms).
Long-term ablation untested. The tier / decay / arc / causation features are designed for multi-session dynamics but PMB has no benchmark for that scenario yet. Either build one or be more conservative about claims.
Reranker regression on LoCoMo. Cross-encoder is off by default after ablation; investigate which workloads (if any) it actually helps.
Persistent daemon mode - pmb daemon start, every Codex session connects to a hot process (no cold start)
PyPI publication - pip install pmb
Web dashboard: workspace switcher, settings tab
LLM-judge benchmark wired into CI for regression catching
Auto-backup / export-import commands
First-class macOS / Linux testing (Windows is the primary CI target today)

Not planned

Multi-user, multi-device, cloud sync. PMB is single-machine on purpose.
A new GUI framework. The dashboard stays vanilla HTML+JS; the TUI stays Textual.
Plugin marketplaces, model hubs, third-party tool stores.

❓ FAQ

How is this different from just pasting context every time?

Pasting works for one or two facts. PMB survives across every session of every agent that supports MCP, indefinitely. And it surfaces context you forgot you ever mentioned.

Why not just use mem0 / Letta / Zep?

They're cloud services with per-call costs and rate limits.
They send your conversations to their servers.
On the public LoCoMo benchmark, PMB recalls competitively with their published numbers - and the methodology (run the same benchmark_locomo.py locally) is auditable, not a marketing slide.
Hot-path latency is ~10-30× lower, although the PMB MCP server has a ~3.7 s cold start at boot that the cloud services don't.

If their trade-offs are fine for your use case, use them. PMB exists for people who want local + auditable + cheap, knowing that the "single process owns the memory" model is a real constraint.

Will PMB slow down my AI agent?

Hot path (MCP server keeps the engine warm):

Writes: record_batch returns in ~2 ms (fire-and-forget; embedding happens in the background).
Reads: ~70 ms p50 warm, ~100 ms cold-query (BM25 fallback while the model finishes loading).
The agent's own LLM thinking is the dominant latency in any chat turn, by 10-100×.

Cold path (every short-lived CLI invocation):

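The MCP server boots in ~3.7 s (the async prewarm continues in the background) and then keeps the engine warm. Short-lived CLI invocations pay Python start-up every time: pmb stats / list / config take about 900 ms end-to-end, down from ~14 s before imports were made lazy. A persistent daemon mode, on the v0.2 roadmap, would remove the remaining per-invocation cost.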

If you suspect PMB itself is slow, open pmb tui → tab [3] Stats. It shows the actual per-call timings recorded in the mcp_calls table.
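Outside the TUI, similar numbers could be pulled straight from SQLite. The sketch below computes per-tool p50 latency; the column names `tool` and `duration_ms` are assumptions, since the mcp_calls schema isn't documented here, so check it with `.schema mcp_calls` in the sqlite3 shell first.

```python
import sqlite3
import statistics
from collections import defaultdict


def p50_by_tool(conn: sqlite3.Connection) -> dict[str, float]:
    """Median call latency per tool. The column names are hypothetical."""
    timings: dict[str, list[float]] = defaultdict(list)
    for tool, duration_ms in conn.execute("SELECT tool, duration_ms FROM mcp_calls"):
        timings[tool].append(duration_ms)
    # SQLite has no built-in median, so compute it in Python.
    return {tool: statistics.median(values) for tool, values in timings.items()}


# Usage (replace <id> with your workspace id; the file location is an assumption):
# conn = sqlite3.connect(str(Path.home() / ".pmb/workspaces/<id>/events.sqlite"))
# print(p50_by_tool(conn))
```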

Should I enable the cross-encoder reranker?

Probably not on LoCoMo-like workloads. Our ablation showed the reranker lowers evidence-recall@10 by 17 points and adds ~840 ms p50 latency: it ranks fluent paraphrases above the source events that carry the dia-id evidence. The flag (recall.rerank = True) stays in the code because reranking can help when the candidate set is wide and lexical/semantic matching alone produces ties; if your workload looks like that, measure first.
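To "measure first", one option is to compute evidence-recall@k on your own labelled queries with the reranker off and then on. This is a generic definition (mean fraction of gold evidence ids found in each query's top k), assuming every query comes with a set of gold ids such as dia-ids; it is not lifted from benchmark_locomo.py, whose exact scoring may differ.

```python
from collections.abc import Sequence


def evidence_recall_at_k(
    retrieved: Sequence[Sequence[str]],
    gold: Sequence[set[str]],
    k: int = 10,
) -> float:
    """Mean fraction of gold evidence ids that appear in each query's top-k results."""
    if len(retrieved) != len(gold):
        raise ValueError("need exactly one gold set per query")
    scores = []
    for ranked, evidence in zip(retrieved, gold):
        if not evidence:
            continue  # queries without gold evidence can't be scored
        hits = len(evidence & set(ranked[:k]))
        scores.append(hits / len(evidence))
    return sum(scores) / len(scores) if scores else 0.0
```

Run the same queries twice, once per setting, and compare the two scores alongside the latency you measured; a reranker is only worth keeping if it moves the first number more than it costs in the second.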

What if I use multiple projects?

PMB defaults to one global workspace (your personal memory follows you across projects). If you want isolation per project, drop a .pmb/workspace.yaml in each project root with a unique id - PMB picks it up automatically.
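The per-project override can be pictured as a lookup that walks up from the current directory to the nearest .pmb/workspace.yaml and otherwise falls back to the global workspace. A hypothetical sketch of that resolution order, not PMB's code: it assumes the file holds a top-level `id:` key and reads it with a one-line parser to stay dependency-free (a real loader would use a YAML library).

```python
from pathlib import Path
from typing import Optional


def find_workspace_id(start: Path) -> Optional[str]:
    """Return the id from the nearest .pmb/workspace.yaml, or None for the global workspace."""
    for directory in (start, *start.parents):
        config = directory / ".pmb" / "workspace.yaml"
        if not config.is_file():
            continue
        for line in config.read_text(encoding="utf-8").splitlines():
            key, sep, value = line.partition(":")
            # Assumed format: an unindented line such as  id: my-project
            if sep and key.rstrip() == "id":
                return value.strip().strip("\"'")
        # A file without an id is ignored and the search keeps walking up.
    return None  # nothing found: use the global workspace
```

Walking up the parents means the override also applies inside subdirectories of a project, not only at its root.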

Does it work with [my agent]?

Anything that speaks MCP: Claude Code, Codex CLI, Cursor, and any future tool that adopts the protocol. For custom agents (Ollama wrappers, your own loop) see docs/SETUP_OLLAMA.md for the call patterns.
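For a custom agent loop, MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method with a tool `name` and an `arguments` object. The helper below only builds such a request; the `record_batch` argument shape is a placeholder assumption, so take the real tool names and input schemas from the server's `tools/list` response or docs/SETUP_OLLAMA.md.

```python
import itertools
import json
from typing import Any

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def tools_call(name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a single-line JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)  # no embedded newlines by default


# Hypothetical payload: check the real input schema via tools/list first.
request = tools_call("record_batch", {"events": [{"text": "Project uses PostgreSQL 16"}]})
```

Over the stdio transport each message is written as one newline-terminated line, and a real client must complete the `initialize` handshake before it may call tools.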

Can I see what was stored?

Three ways: pmb tui (Memory tab), pmb dashboard (Events), or open ~/.pmb/workspaces/<id>/events.sqlite in the sqlite3 shell and run SQL directly. The store is plain SQLite - nothing proprietary.
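Because the store is plain SQLite, Python's built-in sqlite3 module is enough to look around. This sketch lists every table with its row count using only sqlite_master, so it assumes nothing about PMB's columns; it opens the file read-only (SQLite's `mode=ro` URI parameter) to avoid accidental writes while the MCP server is running. The helper names are illustrative.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    # mode=ro makes SQLite reject any write through this connection.
    return sqlite3.connect(db_path.expanduser().resolve().as_uri() + "?mode=ro", uri=True)


def quote_ident(name: str) -> str:
    # Double any embedded quotes so the identifier is always well-formed.
    return '"' + name.replace('"', '""') + '"'


def table_row_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Row count for every user table, discovered via sqlite_master."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    return {n: conn.execute("SELECT COUNT(*) FROM " + quote_ident(n)).fetchone()[0] for n in names}


# Usage (replace <id> with your workspace id):
# conn = open_read_only(Path("~/.pmb/workspaces/<id>/events.sqlite"))
# print(table_row_counts(conn))
```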

How do I delete a memory?

pmb forget <ulid> archives it (reversible). To purge it entirely, open the SQLite file and DELETE the row. Separately, pmb dedupe --undo restores memories you didn't mean to merge.
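A manual purge might look like the sketch below: take a copy with sqlite3's online backup API, then delete inside a transaction so an error rolls back. The key column name `ulid` is a hypothetical stand-in (the FAQ only says pmb forget takes a <ulid>), so confirm the real events schema first, and ideally run this with the MCP server stopped, since a single process is meant to own the memory.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def purge_event(db_path: Path, ulid: str, backup_path: Path) -> int:
    """Hard-delete one event after taking a backup. Returns the number of rows removed."""
    with closing(sqlite3.connect(db_path)) as src, closing(sqlite3.connect(backup_path)) as dst:
        src.backup(dst)  # consistent page-level copy of the whole database
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back if the DELETE raises
            # Hypothetical column name; check `.schema events` first.
            cur = conn.execute("DELETE FROM events WHERE ulid = ?", (ulid,))
        return cur.rowcount
    finally:
        conn.close()
```

A return value of 0 means nothing matched, which is a cue to re-check the id or the column name rather than a success.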

What if my workspace gets corrupted?

SQLite is robust, and the mcp_calls and events tables are append-mostly. Worst case, copy ~/.pmb/workspaces/<id>/ somewhere safe and start fresh; nothing else depends on this state.

Auto-backup is on the v0.2 roadmap.
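Until auto-backup ships, a manual routine could run SQLite's built-in `PRAGMA integrity_check` and copy the workspace directory aside under a timestamp. A sketch assuming the workspace is a directory of SQLite files and is copied while nothing is writing to it; the function names are hypothetical.

```python
import shutil
import sqlite3
from contextlib import closing
from datetime import datetime
from pathlib import Path


def integrity_ok(db_path: Path) -> bool:
    """True if PRAGMA integrity_check reports a single 'ok' row."""
    try:
        with closing(sqlite3.connect(db_path)) as conn:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError:
        return False  # e.g. "file is not a database"
    return rows == [("ok",)]


def backup_workspace(workspace: Path, backups_root: Path) -> Path:
    """Copy the whole workspace directory to a timestamped folder and return its path."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backups_root / f"{workspace.name}-{stamp}"
    shutil.copytree(workspace, target)  # creates parents; fails if target exists
    return target
```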

Why "Personal Memory Brain"?

Because it's personal (not a team product), it stores memory (not just chat history), and it's a "brain" because the architecture is loosely inspired by the working → episodic → semantic memory transitions described in actual neuroscience. The marketing department was overruled.

🤝 Contributing

PRs welcome. Please read CONTRIBUTING.md first - it explains where things go, what's in scope, and what's not.

In short:

One concern per PR.
New write-path code must stay sub-100 ms on warm cache.
If recall accuracy could change, include a LoCoMo number with the PR.
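One way to check the sub-100 ms rule locally is to time the write call many times after a warm-up and look at the median. A generic sketch: `write_fn` stands in for whatever write-path function a PR touches, and treating the median as the number to keep under budget is an assumption; only the 100 ms figure comes from the guidelines above.

```python
import statistics
import time
from typing import Callable


def time_write_path(write_fn: Callable[[], object], runs: int = 50, warmup: int = 5) -> dict[str, float]:
    """Return median and max latency in ms for write_fn after a warm-up phase."""
    for _ in range(warmup):
        write_fn()  # warm caches so the samples reflect the warm path
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        write_fn()
        samples.append((time.perf_counter() - start) * 1000)
    return {"p50_ms": statistics.median(samples), "max_ms": max(samples)}


def assert_within_budget(stats: dict[str, float], budget_ms: float = 100.0) -> None:
    if stats["p50_ms"] >= budget_ms:
        raise AssertionError(f"warm write p50 {stats['p50_ms']:.1f} ms exceeds {budget_ms} ms")
```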

📄 License

Apache License 2.0 - see LICENSE and NOTICE.

It's the same license used by the community editions of mem0, Letta, and Zep. Apache 2.0 includes an explicit patent grant from every contributor, which matters for AI/ML projects where patent ambiguity can otherwise scare off enterprise users.

If you use PMB in a paper or product, citation is appreciated but not required - see CITATION.cff.

Built to forget less.


Project details


Release history

0.1.1 - May 27, 2026

0.1.0 - May 27, 2026 (this version)

Download files


Source Distribution

pmb_ai-0.1.0.tar.gz (2.5 MB)

Uploaded May 27, 2026 Source

Built Distribution

pmb_ai-0.1.0-py3-none-any.whl (335.8 kB)

Uploaded May 27, 2026 Python 3

File details

Details for the file pmb_ai-0.1.0.tar.gz.

File metadata

Download URL: pmb_ai-0.1.0.tar.gz
Upload date: May 27, 2026
Size: 2.5 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pmb_ai-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`f1b6ac48b5230b284c0a83b510c18176d812513a54786a8793ea14d75ffcd623`
MD5	`448121d40c1a2d49bf059e126f2e5f8e`
BLAKE2b-256	`d3012e67b50163881701650534e8d222da8cf2cd46753e8f616a8d2b7963e31a`
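To check a downloaded file against these digests, hash it locally and compare. A minimal sketch with hashlib; `EXPECTED_SHA256` is the SHA256 listed above for pmb_ai-0.1.0.tar.gz, and the wheel has its own, different digest.

```python
import hashlib
from pathlib import Path

EXPECTED_SHA256 = "f1b6ac48b5230b284c0a83b510c18176d812513a54786a8793ea14d75ffcd623"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large downloads don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    # hexdigest() is lowercase, so normalise the expected value before comparing.
    return sha256_of(path) == expected.strip().lower()
```

pip can enforce the same check at install time in hash-checking mode: put `--hash=sha256:<digest>` options on the `pmb-ai==0.1.0` line of a requirements file, listing both the sdist and wheel digests so either artifact is accepted.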


Provenance

The following attestation bundles were made for pmb_ai-0.1.0.tar.gz:

Publisher: publish.yml on oleksiijko/pmb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pmb_ai-0.1.0.tar.gz
- Subject digest: f1b6ac48b5230b284c0a83b510c18176d812513a54786a8793ea14d75ffcd623
- Sigstore transparency entry: 1648484496
- Sigstore integration time: May 27, 2026
Source repository:
- Permalink: oleksiijko/pmb@82ccc451984b50daadfeb260f76ce4aae73a5525
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/oleksiijko
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@82ccc451984b50daadfeb260f76ce4aae73a5525
- Trigger Event: push

File details

Details for the file pmb_ai-0.1.0-py3-none-any.whl.

File metadata

Download URL: pmb_ai-0.1.0-py3-none-any.whl
Upload date: May 27, 2026
Size: 335.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pmb_ai-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fc830957f55620e9a8c774a83f2d06e65b1495dc9c5fb89ebd6673fdc9076221`
MD5	`fc8e66357d9ee0eb59d699690f15de6c`
BLAKE2b-256	`a900a508bc910105d94b6a1a6f1fcee0f2b3edbdd588bc1528cb115a4d3e2a88`


Provenance

The following attestation bundles were made for pmb_ai-0.1.0-py3-none-any.whl:

Publisher: publish.yml on oleksiijko/pmb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pmb_ai-0.1.0-py3-none-any.whl
- Subject digest: fc830957f55620e9a8c774a83f2d06e65b1495dc9c5fb89ebd6673fdc9076221
- Sigstore transparency entry: 1648484571
- Sigstore integration time: May 27, 2026
Source repository:
- Permalink: oleksiijko/pmb@82ccc451984b50daadfeb260f76ce4aae73a5525
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/oleksiijko
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@82ccc451984b50daadfeb260f76ce4aae73a5525
- Trigger Event: push

pmb-ai 0.1.0
