THOUGHT

Temporal Hierarchical Object Union & Graph Hybrid Toolkit — a local MCP memory server that gives any LLM a persistent, auditable memory fabric on your own machine.

OB1 stores your thoughts. Karpathy's wiki compiles your knowledge. THOUGHT remembers with provenance, understands relationships, detects contradictions, never forgets what used to be true — and routes every query to the right mathematical structure before touching a single byte of data.


✨ New in v0.2 — Memory for AI coding agents

v0.2 specialises the same architecture for the workflow with the strongest natural fit: AI-assisted coding. THOUGHT now parses your source with tree-sitter, builds a real function-call graph as typed edges, and stamps every fact with its git commit. The bi-temporal as_of queries you already had now answer "what did the codebase look like at commit X?" for free.

thought ingest-code src/                      # tree-sitter ingest, multi-language
thought ingest-git . --mode full              # stamp every commit
thought callers GraphLayer.personalized_pagerank
#  #  score    type    entity                              file
#  1  0.0132   method  Dispatcher._dispatch_code           Dispatcher
#  2  0.0130   method  Dispatcher._dispatch_fact           Dispatcher
#  3  0.0122   method  CodeLayer.impact_set                CodeLayer
#  4  0.0110   method  CodeLayer.callers_of                CodeLayer
thought impact authenticate_user              # what's affected if I change this?
thought diff --from v1.0 --to HEAD            # set diff between two commits

Real measurement on this codebase: 38 files → 425 entities → 575 CALLS edges in <250 ms. The killer-demo query "who calls GraphLayer.personalized_pagerank" returns the four real callers ranked by Personalized PageRank in 60 ms on a 1086-edge graph.

What's new:

  • AST-aware ingest via tree-sitter — Python + TypeScript / JavaScript out of the box, multi-language plugin shape for the rest.
  • Function-call-graph edges: CALLS, IMPORTS, INHERITS_FROM, OVERRIDES, DEFINES as typed edges. The Graph Layer's HippoRAG-style PageRank then powers ranked callers / impact-set queries.
  • Git-history stamping — every entity carries code_commit_sha. thought diff --from <sha1> --to <sha2> returns the set difference of functions between two commits (a sketch follows this list).
  • New Router CODE class — natural-language queries like "who calls authenticate_user" route through the call-graph machinery automatically.
  • 5 new CLI commands: ingest-code, ingest-git, callers, impact, diff.
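To make the diff semantics concrete, here is a minimal sketch of a set difference over commit-stamped entities. The dict shape and names below are hypothetical illustrations, not THOUGHT's internals; only the "set difference of functions between two commits" behaviour comes from the bullet above:

# Hypothetical sketch — data shape invented for illustration.
def commit_diff(entities_by_sha: dict[str, set[str]],
                sha_from: str, sha_to: str) -> dict[str, set[str]]:
    before, after = entities_by_sha[sha_from], entities_by_sha[sha_to]
    return {"added": after - before, "removed": before - after}

snapshots = {"v1.0": {"login", "logout"}, "HEAD": {"login", "refresh_token"}}
print(commit_diff(snapshots, "v1.0", "HEAD"))
# {'added': {'refresh_token'}, 'removed': {'logout'}}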

See CHANGELOG.md for the full v0.2 list. The v0.1 horizontal-memory surface below is unchanged — v0.2 is purely additive.


What is THOUGHT?

THOUGHT is a memory server for LLMs. You install it on your machine, wire it into your AI coding assistant (Claude Code, Cursor, Cline, Continue, Windsurf), and now your assistant has a brain that persists across conversations and across projects.

Everything runs locally — your memory is a single SQLite file on your laptop. No cloud, no account, no sync service, no API key.

The problem it solves

Out of the box, AI coding assistants have goldfish memory. Every new conversation starts blank. If you told it last week to "always use Postgres for v2 features," you'll be telling it again today. If you decided in March that "the auth module is being rewritten," by April that context is gone.

Existing fixes don't really solve this — they trade one problem for another:

  • Stuff context into your system prompt: you hit token limits fast, and the model can't tell what's current vs. obsolete.
  • Cloud memory (ChatGPT, Claude Projects): locked to one vendor, no audit log, can't query "as of last week," no contradiction handling.
  • RAG over your notes (mem0, Letta, …): stores facts as flat vectors. No relationships between facts, no time tracking, no provenance, no notion of "this used to be true."
  • An LLM-maintained Markdown wiki (Karpathy's gist): lossy by design (the LLM summarises everything), grows linearly, no semantic search, no temporal queries.

THOUGHT fixes the structural issues, not just the symptoms.

What you get

Once installed, your AI assistant gains two new tools it can call automatically when the conversation implies it:

  • remember(content): "note that we decided X." THOUGHT extracts the entities and relationships, embeds them for similarity search, and links everything to its source so you can audit later.
  • recall(query): "what did we decide about X?" THOUGHT figures out what kind of question you asked, routes it to the right retrieval strategy, and returns at most 10 hits — each tagged with how trustworthy it is.

You can also drive it from your terminal (CLI) or use the Python API directly.
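For a feel of the non-MCP path, a hypothetical sketch of driving the same remember/recall semantics from Python. The Memory class and import path are invented placeholders; only the parameters (content, query, scope, as_of) are documented elsewhere in this README:

from thought import Memory   # hypothetical import path, for illustration only

mem = Memory(db_path=".thought/thought.db")        # hypothetical constructor
mem.remember("We decided to use Postgres for v2.", scope="shared")
hits = mem.recall("what's our database plan?")     # ≤ 10 hits, each with confidence_class
past = mem.recall("pricing", as_of="2026-01-15")   # bi-temporal time travel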

Why it's better than existing solutions

The TL;DR, in plain English:

  • It knows when facts changed. Every fact carries two timestamps: when it was true in the world, and when the system learned it. "What did we say about pricing on Jan 15?" actually works — even if pricing changed on Feb 3.

  • It tracks how facts relate. Functions, classes, people, projects, decisions — they're all entities in a typed graph (CALLS, OWNS, INHERITS_FROM, CONTRADICTS, …). Asking "who calls authenticate_user?" is a real graph query, not a fuzzy text match.

  • It refuses to hallucinate relationships. Every edge has a mandatory pointer back to the source document that produced it. If a fact has no source, it doesn't exist. No more "the model invented a connection that was never in the data."

  • It surfaces contradictions instead of silently overwriting. When you say "auth is now using sessions" after previously saying "auth is JWT," both facts stay. A CONTRADICTS edge is created. recall can then answer "what facts about auth are currently disputed?" (A sketch of this write path follows the list.)

  • It picks the right retrieval method per question. Fuzzy associative queries hit vector similarity. Relationship queries hit graph traversal. Time-travel queries hit the temporal layer. The wrong question never hits the wrong index.

  • It bounds output. No matter how big the knowledge base gets, recall returns at most 10 hits. Your context window doesn't get blown up by a runaway retrieval.

  • It's append-only. Nothing is ever deleted. When facts go stale, they're retired (their validity window closes), not erased. Full forensic audit of every change.

  • It's natively multi-user. scope='shared' for project-wide facts, scope='private' with owner_id for personal notes. Five devs on one repo each get their own private memory plus a shared common pool.
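A minimal sketch of the write path the contradiction and append-only bullets describe. The Fact shape and helper names are illustrative, not THOUGHT's schema:

from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Fact:
    subject: str
    claim: str
    valid_until: datetime | None = None     # open window = currently true

facts: list[Fact] = []
edges: list[tuple] = []

def remember(new: Fact) -> None:
    # Append-only: conflicting facts are linked, never overwritten.
    for old in facts:
        if old.subject == new.subject and old.valid_until is None and old.claim != new.claim:
            edges.append(("CONTRADICTS", old, new, datetime.now(timezone.utc)))
    facts.append(new)

def retire(fact: Fact) -> None:
    # Supersession closes the validity window; the row stays for audit.
    fact.valid_until = datetime.now(timezone.utc)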

Plus eleven cutting-edge retrieval techniques from 2024–2026 literature (Anthropic Contextual Retrieval, HippoRAG-style PageRank, bi-temporal Graphiti, CRAG, MetaRAG confidence, …) stacked on top — see the Frontier techniques section below for the full list with citations.

The technical capability matrix vs. the closest comparable systems:

                      OB1 (pgvector)   Karpathy LLM-Wiki   THOUGHT
Relationship logic    flat rows        flat markdown       typed graph edges
Temporal awareness    none             none                bi-temporal (world-time + learned-time)
Provenance            informal tag     informal citation   mandatory source_ref on every edge
Multi-user            RLS bolted on    single-user         native two-zone graph
Query routing         always vector    always inject       VIBE / FACT / CHANGE / CODE / HYBRID router
Contradiction model   absent           LLM lint only       CONTRADICTS typed edge, write-time
Bounded result size   unbounded        unbounded           ≤10 enforced

What THOUGHT is not

  • Not a cloud service. Everything runs locally. No data leaves your machine.
  • Not a vector DB replacement. It uses one (sqlite-vec by default, pgvector optional), but adds the graph + temporal layers on top.
  • Not a fine-tuner. It doesn't change your model. It changes what your model can remember.
  • Not retrieval-quality magic. No single 10× win exists in 2024–2026 LLM-retrieval literature; THOUGHT compounds several 1.5-3× gains across orthogonal dimensions. Expect 2-3× better recall on questions that actually need the typed graph or temporal layer; expect roughly parity on pure-vibe semantic queries.

How to use THOUGHT

This section walks through everything from install to advanced workflows, with explanations of why each step exists. If you just want the 30-second version, skip to Quickstart.

Install

Three ways. Pick one:

# Option 1 (recommended) — full bundle, everything you'll use
pip install 'thought-mcp[all]'

# Option 2 — minimal: CLI + MCP server only (no production embeddings)
pip install thought-mcp

# Option 3 — zero install: uvx fetches it on demand
uvx thought-mcp install --client cursor

uvx is what the MCP client configs use internally, so option 3 is fine if you don't want a global install. After install, verify with:

thought doctor

You should see all green. Any red items will print the exact command to fix them.

Quickstart

The one-line happy path for connecting THOUGHT to your AI client:

thought start --client cursor   # or claude-code, cline, continue, windsurf

Then restart your AI client (close every window, reopen). Done. The next conversation will have the remember and recall tools available.

If you're not sure which client to pick, run thought install --detect first — it shows every supported client's config path and whether it's installed on your machine.

What thought start actually does

Knowing what changed makes troubleshooting easier later:

  1. Creates the SQLite database at .thought/thought.db in your current directory. This is your memory. Back it up like any database.
  2. Writes CLAUDE.md in your current directory. This tells your AI assistant how the memory tools work and when to use them. You can edit it to add project-specific conventions like "always tag finance decisions with scope=private."
  3. Writes thought.toml with sensible defaults. Most people never need to touch it.
  4. Updates your AI client's MCP config to register thought as a server. The previous config is backed up to <config>.thought.bak.
  5. Starts the MCP server in the background, listening on 127.0.0.1:8765.

After your AI client restarts, it discovers thought and gains the two new tools.

Day-to-day usage — letting your AI use the memory

Once wired up, your AI assistant calls remember and recall automatically when the conversation implies it. You usually don't need to invoke them explicitly. Here's what that looks like:

Telling it to remember something:

You:  We've decided to use Postgres for the v2 release.
AI:   Got it. [calls remember internally]
      Stored. We can recall this with "database plan" or similar later.

The AI parses out the facts (v2 → Postgres), links them to the conversation as the source, and writes them to memory. You don't see the tool call.

Asking about something it remembered:

You:  What's our database plan?
AI:   [calls recall internally]
      Based on what's in memory: for v2 we're using Postgres
      (source_grounded, recorded 3 days ago).

The AI also sees how trustworthy each hit is via its confidence_class (source_grounded, inferred, or hallucination_risk). If a hit comes back as hallucination_risk, a well-prompted AI will flag the uncertainty rather than presenting it as fact.

Time travel — what was true at some past date:

You:  What was our pricing on January 15?
AI:   [calls recall with as_of="2026-01-15"]
      On Jan 15: Free / Pro $19 / Team $99.
      Pricing changed on Feb 3 to Free / Pro $25 / Team $129, so the
      current prices are different.

This is THOUGHT's most distinctive capability. Existing memory tools structurally can't answer this — they overwrite old facts. THOUGHT keeps both, marks the older one retired, and as_of queries see the world as it was.

Catching contradictions automatically:

You:  Remember that auth is now using sessions.
AI:   Noted. [calls remember]
      Heads up: a previously-stored fact says "auth is JWT."
      Both are now in memory, linked by a CONTRADICTS edge.
      Want me to retire the JWT fact?

The AI sees the conflict at write time and prompts you. The JWT fact isn't deleted — it's marked as superseded but still queryable for audit.

Private vs. shared scope (multi-user / multi-project):

You:  Remember as a private note: I prefer 4-space indentation.
AI:   Stored in your private scope. Won't surface in shared recalls.

Use scope='private' for personal preferences. Use scope='shared' for project decisions everyone on the team should see. A shared recall returns public facts plus the requester's own private facts; never another user's.
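That visibility rule fits in one predicate. A minimal sketch, with illustrative field names:

def visible_to(scope: str, owner_id: str | None, requester_id: str) -> bool:
    # Shared facts are visible to the whole project; private facts only to
    # their owner, never to another user.
    return scope == "shared" or (scope == "private" and owner_id == requester_id)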

How to nudge the AI when it doesn't reach for memory

If your AI is being lazy and skipping recall, try phrases like:

  • "According to memory..."
  • "What do we have on..."
  • "As of last week, ..."
  • "Check memory for..."
  • "@thought what's our..." (in clients that support tool-prefix syntax)

To insist on storing something:

  • "Note this down: ..."
  • "Remember that..."
  • "Store this for later: ..."
  • "Add to memory: ..."

The single highest-leverage thing is the CLAUDE.md file that thought init drops in your project. Edit it to add project-specific conventions. The AI reads it on every session start, so rules like "always remember architectural decisions, never remember code snippets" are honored consistently.

Working with code — the v0.2 capabilities

If you're using THOUGHT for AI-assisted coding (the v0.2 specialisation), there's a separate ingest path that parses your source files via tree-sitter and builds a real function-call graph:

# Ingest a codebase — entities are functions / classes / methods / modules
thought ingest-code src/

# Ingest with full git history so as_of queries work for code
thought ingest-git . --mode full

# Ask who calls a function (ranked by importance)
thought callers authenticate_user

# Ask what's affected if you change a function
thought impact authenticate_user

# Show the set difference of entities between two commits
thought diff --from v1.0 --to HEAD

After ingestion, the AI's regular recall tool also gains code awareness. Natural-language questions like "who calls authenticate_user?" route through the call-graph machinery automatically.

Using the CLI directly (no AI involved)

THOUGHT works fine from a terminal without an AI assistant. The CLI is most useful for bulk operations and inspection:

# Add a single fact
thought ingest "Alice owns Acme Corp. Acme is part of HoldCo."

# Bulk-ingest a directory of Markdown notes
thought ingest --glob 'docs/**/*.md'

# Pipe in from any tool that emits one fact per line
git log --since='1 week ago' --format='%s' | thought ingest --stdin

# Query directly
thought recall "who owns Acme"

# Open an interactive REPL — type queries, type +text to add facts
thought repl

# See what's currently in memory
thought stats

# Soft-delete entities matching a SQL LIKE pattern (audit-logged, not destroyed)
thought forget "kendra%"

Upgrading

When a new version of THOUGHT ships:

pip install --upgrade thought-mcp     # pull the new package
thought upgrade --all                 # re-pin every MCP client config to the new version
# Restart your AI client to pick up the new server.

thought upgrade --all solves the "uvx is still using its cached old version" problem by re-pinning your MCP client configs to the exact version you just installed (with the required extras included).

MCP client config paths (manual install if --detect can't find your client)

If thought install --detect doesn't find a client you have installed, the JSON block to add manually is:

{
  "mcpServers": {
    "thought": {
      "command": "uvx",
      "args": ["--from", "thought-mcp[mcp,sqlite-vec]", "thought", "serve"]
    }
  }
}

Per-client locations:

  • Claude Code: ~/.claude.json (top-level mcpServers block)
  • Cursor: ~/.cursor/mcp.json
  • Cline: VS Code globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json (or ~/.cline/cline_mcp_settings.json)
  • Continue: ~/.continue/config.json
  • Windsurf: ~/.codeium/windsurf/mcp_config.json

Standing on the shoulders of

THOUGHT exists because of:

Frontier techniques incorporated (with credits)

  1. Contextual Retrieval — LLM-generated chunk context prepended before embedding (Anthropic, Sept 2024)
  2. HippoRAG 2 — Personalized PageRank memory (Gutiérrez et al., NeurIPS 2024; repo)
  3. Bi-temporal Graphiti — separate valid-time and transaction-time (Zep, arXiv 2501.13956; repo)
  4. Atomic fact decomposition + Jaccard dedup (Wanner et al., 2024)
  5. BGE-M3 hybrid embeddings, sparse + dense + ColBERT (BAAI)
  6. Matryoshka two-pass retrieval (Kusupati et al.; OpenAI text-embedding-3)
  7. CRAG (Corrective RAG) — retrieval evaluator + fallback (Yan et al., 2024)
  8. MetaRAG epistemic uncertainty → confidence_class per hit (arXiv 2504.14045)
  9. Ebbinghaus decay scoring — strength × e^(−λ·days) × recall-boost (@sachitrafa/YourMemory)
  10. Context-engineering budget per query class (Karpathy & community, 2025)
  11. Append-only writes — never UPDATE/DELETE (Mem0 State of Memory 2026)
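As an illustration of technique 9, a minimal sketch of the decay shape from the formula above (λ and the recall-boost policy are invented defaults here, not THOUGHT's tuned values):

import math

def decay_score(strength: float, days_since_access: float, recalls: int,
                lam: float = 0.05) -> float:
    # strength × e^(−λ·days) × recall-boost: memories fade exponentially
    # unless recall events keep boosting them (boost policy is illustrative).
    recall_boost = 1.0 + 0.1 * recalls
    return strength * math.exp(-lam * days_since_access) * recall_boost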

Built on: MCP Python SDK (@modelcontextprotocol), sqlite-vec (Alex Garcia), pgvector (Andrew Kane), Pydantic, Typer, structlog. spaCy (Explosion AI) is an optional extra.


Architecture

   Claude Code · Cursor · Cline · Continue · Windsurf
   ┬───────────────────────────────────────────────────
   │                  (auto-wired by `thought install`)
   ▼
┌──────────────────────────────────────────────────────────────────┐
│         MCP server  (Streamable HTTP · async handlers)           │
│            remember(content, ...)    recall(query, ...)          │
└──────────────────────────┬───────────────────────────────────────┘
                           │
                           ▼
              ┌─────────────────────────────┐    LRU recall cache
              │           Router            │    (write-version keyed)
              │ VIBE FACT CHANGE CODE HYBRID│  ↳ rules.yaml (user-editable)
              │   + CRAG confidence eval    │
              └───────────┬─────────────────┘
              ┌───────────┼───────────────┐
              ▼           ▼               ▼
      ┌─────────────┐ ┌──────────┐ ┌────────────┐
      │  Vector L.  │ │ Graph L. │ │ Temporal L.│
      │ Matryoshka  │ │ HippoRAG │ │ bi-temporal│
      │  + GraphRAG │ │ PPR (+   │ │  as_of     │
      │  + sqlite-  │ │ scipy.   │ │ (valid +   │
      │  vec MATCH  │ │ sparse + │ │  learned)  │
      │             │ │ local    │ │            │
      │             │ │ push)    │ │            │
      └──────┬──────┘ └────┬─────┘ └─────┬──────┘
             │             │              │
             ▼             ▼              ▼
        ┌───────────────────────────────────────┐
        │      StorageBackend (ABC)             │
        │  SQLite + sqlite-vec  |  pgvector     │
        │  sources · entities · edges · triples │
        │  embeddings · strength_cache · log    │
        │  + bulk source-provenance JOIN        │
        │  + touch-access flush queue           │
        └──────────────┬────────────────────────┘
                       │
                       ▼
         ┌─────────────────────────┐
         │  Consolidation Engine   │  background thread
         │  Ebbinghaus · cold/warm │  + `thought consolidate` CLI
         │  · dedup · audit log    │
         └─────────────────────────┘

Bi-temporal axis: every entity and edge tracks (valid_from, valid_until) (world-time) and (learned_at, unlearned_at) (transaction-time). "What did we know about X on date Y" and "what was true about X on date Y" are different queries; THOUGHT answers both via recall(..., as_of=Y, as_of_kind='valid' | 'learned').
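A minimal sketch of the two filters this implies. The field names match the axis described above; the helper itself is illustrative:

from dataclasses import dataclass
from datetime import datetime

@dataclass
class Edge:
    valid_from: datetime              # world-time: when it became true
    valid_until: datetime | None      # world-time: when it stopped being true
    learned_at: datetime              # transaction-time: when THOUGHT ingested it
    unlearned_at: datetime | None     # transaction-time: when it was retracted

def visible_as_of(e: Edge, when: datetime, kind: str = "valid") -> bool:
    # "What was true on date Y" and "what did we know on date Y" are
    # different filters over different timestamp pairs.
    if kind == "valid":
        return e.valid_from <= when and (e.valid_until is None or when < e.valid_until)
    return e.learned_at <= when and (e.unlearned_at is None or when < e.unlearned_at)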


What makes THOUGHT qualitatively different

These are capabilities neither OB1 nor the Karpathy wiki structurally supports — adding them would require rewriting their data layer:

  • recall(query, as_of=<past>) returns the world as it was, not as it is.
  • Every hit carries confidence_class ∈ {source_grounded, inferred, hallucination_risk} so the LLM knows what to trust.
  • Contradictions are first-class data: a CONTRADICTS typed edge with detected_at and confidence_score, queryable, not LLM lint notes.
  • Multi-user scope is structural: a (scope, owner_id) filter at the storage layer, inherited by every retrieval path.
  • All writes are append-only. Supersession is a new edge plus a valid_until close, never an UPDATE/DELETE — full forensic audit is guaranteed.
  • The query router classifies before searching — wrong question never hits the wrong index.
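For flavor, a minimal sketch of rule-based routing in this spirit. The query classes are the documented ones, but these regex rules are invented stand-ins for the real, user-editable rules.yaml (the HYBRID class, for queries needing multiple layers, is omitted for brevity):

import re

# Illustrative routing rules; first match wins, anything unmatched is VIBE.
RULES = [
    ("CHANGE", re.compile(r"\b(as of|used to|changed|what was)\b", re.I)),
    ("CODE",   re.compile(r"\b(who calls|callers of|impact of)\b", re.I)),
    ("FACT",   re.compile(r"\b(who owns|what is|which)\b", re.I)),
]

def classify(query: str) -> str:
    for cls, pattern in RULES:
        if pattern.search(query):
            return cls          # dispatch to temporal / graph / vector layer
    return "VIBE"               # default: fuzzy vector similarity

assert classify("who calls authenticate_user?") == "CODE"
assert classify("what was our pricing on January 15?") == "CHANGE"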

Measured results

These numbers come from tests/comparison/run.py — same workload, same deterministic embedder, three architectures. Reproducible: python -m tests.comparison.run.

Recall@10 by query class

System         VIBE   FACT   CHANGE   HYBRID   overall
THOUGHT        100%   100%    68%      66%     83.5%
OB1            100%   100%    32%     100%     83.0%
Karpathy wiki  100%    30%     0%     100%     57.5%

THOUGHT and OB1 tie on overall recall@10, but the CHANGE column (68% vs 32%) is the headline number — THOUGHT is 2.1× more accurate on the queries where temporal correctness matters. Karpathy wiki is 0% on temporal: it has no notion of time.

Temporal correctness on CHANGE queries (strict — penalizes returning contemporary answer for historical query)

System         rate
THOUGHT         68%
OB1             32%
Karpathy wiki    0%

Contradictions detected at write-time

System         count
THOUGHT          2
OB1              0
Karpathy wiki    0

Ablation — marginal contribution of each frontier technique

(From python -m tests.comparison.ablation → docs/ablation.md)

Variant                          Overall   FACT   CHANGE   HYBRID
Full v0.1 (all Tier A)            83.5%    100%    68%      66%
− HippoRAG bidirectional PPR      66.0%     30%    68%      66%
− Bi-temporal edge retirement     75.0%    100%    34%      66%
− Query router (force VIBE)       65.5%     30%    32%     100%

Each disabled technique costs THOUGHT real, measurable accuracy on the dimension it was added to improve. HippoRAG is worth +70pp on FACT queries; bi-temporal supersession is worth +34pp on CHANGE; the router is worth +18pp overall.

Performance

THOUGHT went through three performance passes. Each one targeted the bottleneck the previous one exposed.

v0.2 pass — architectural (sqlite-vec + scipy.sparse + local push PPR):

  1. sqlite-vec C/SIMD MATCH for vector ANN (was Python brute-force over the embeddings table).
  2. Binary sign-quantized index mirror (Charikar 2002 LSH) for dense embeddings — opt-in via use_binary_quantization=True; another ~8-16× over the float path on production models.
  3. scipy.sparse vectorised Personalized PageRank — one CSR matvec per iteration in place of the dict-of-lists power loop (a sketch follows this list).
  4. Andersen-Chung-Lang local push PPR (2006) — ε-approximate PPR touching only O(1/(ε·(1−α))) nodes, automatically used when the in-scope KB exceeds 5k entities.
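As a reference point for item 3, a minimal sketch of what "one CSR matvec per iteration" looks like. This is not THOUGHT's implementation; the damping factor and iteration count are illustrative:

import numpy as np
from scipy.sparse import csr_matrix, diags

def personalized_pagerank(adj: csr_matrix, seed: np.ndarray,
                          alpha: float = 0.85, iters: int = 30) -> np.ndarray:
    # Row-normalise the adjacency matrix into a transition matrix P, then
    # run the classic power iteration: p ← (1−α)·s + α·Pᵀp.
    out_deg = np.asarray(adj.sum(axis=1)).ravel()
    out_deg[out_deg == 0] = 1.0                    # keep dangling rows finite
    P = diags(1.0 / out_deg) @ adj                 # one-off CSR scaling
    s = seed / seed.sum()
    p = s.copy()
    for _ in range(iters):
        p = (1.0 - alpha) * s + alpha * (P.T @ p)  # one CSR matvec per iteration
    return p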

v0.3 pass — system + UX:

  5. Batched ingest — all writes from one remember() in one transaction; remember_many() batches across N items in one transaction with one embed_many call → 2-4× ingest throughput.
  6. LRU recall cache keyed by (write_version, query, ...) — repeat queries become µs-scale (~130,000× over cold-recall p50).
  7. Touch-access batched flush queue — eliminates the per-hit UPDATE on the recall hot path, batches into one executemany periodically.
  8. PPR transition-matrix cache with write_version invalidation — repeat FACT recalls skip the COO→CSR matrix rebuild entirely.
  9. One-query bulk source-provenance fetch — replaced N+M roundtrips (edges_to per hit + SELECT per source) with a single JOIN.
  10. WAL tuning — 64 MiB page cache, 256 MiB mmap, synchronous=NORMAL, busy_timeout=5s.
  11. Async MCP tool handlers — asyncio.to_thread lets the Streamable HTTP transport service concurrent recalls.
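A minimal sketch of the write-version keying from item 6: bumping a counter on every write makes stale cache entries unreachable, with no explicit invalidation needed (names here are illustrative):

from functools import lru_cache

write_version = 0                    # illustrative global; bumped on every write

def run_cold_recall(query: str, as_of: str | None) -> list:
    return []                        # stand-in for the real retrieval pipeline

@lru_cache(maxsize=1024)
def _cached(version: int, query: str, as_of: str | None) -> tuple:
    return tuple(run_cold_recall(query, as_of))

def recall(query: str, as_of: str | None = None) -> tuple:
    # Any write bumped write_version, so old cache keys simply never match again.
    return _cached(write_version, query, as_of)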

Measured progression

Same workload (Entity{i} owns Company{i%50} Corp.), same Windows laptop, deterministic embedder, 30 unique queries (no cache hits) for cold recall measurement:

KB size    v0.1 recall p50   v0.2 recall p50   v0.3 recall p50   v0.3 ingest (bulk)   v0.3 cache-hit p50
1,000           50.3 ms           12.3 ms            8.5 ms            0.67 s               0.7 µs
5,000          261.6 ms           42.5 ms           37.8 ms            3.73 s               0.7 µs
10,000         521.4 ms           61.6 ms           93.6 ms¹           7.47 s               0.7 µs
25,000       ~1,300 ms²          171.8 ms          186.0 ms           17.18 s               0.7 µs

¹ v0.3 honest-cold-cache numbers are slightly higher than v0.2's warm-cache numbers at the same KB size — v0.2 measured 20 repeats of the same query without a cache, which our profiler flattered. With the v0.3 LRU cache, repeated queries become essentially free (0.7 µs), so the real-world latency curve is the cold-cache row for first-time queries and the cache-hit column for everything else.

² Original v0.1 took >10s per recall at 25k entities; numbers extrapolated from the linear growth pattern.

Overall vs v0.1: 5-7× faster cold recalls, ~10,000-130,000× faster cache hits, 2-4× faster ingest (bulk).

Growth pattern: 25× more data → ~22× more latency in v0.3 — closer to linear at the high end because the deterministic embedder is itself O(N) on the brute-force fallback; with sentence-transformers/all-MiniLM-L6-v2 (production embedder, dense vectors), sqlite-vec's index becomes sub-linear and you get the full architectural win.

Also unchanged:

  • Result bound: len(hits) ≤ 10 always, verified at every KB size.
  • Comparison-harness latency dropped from 7.78 ms → 2.75 ms with full accuracy preserved (FACT 100%, CHANGE 68%).

Structural capability matrix (none of these are accuracy claims — they're either present or absent)

Capability                          THOUGHT   OB1             Karpathy wiki
bi-temporal as_of                   ✓         —               —
source-grounded confidence class    ✓         —               —
contradiction as typed edge         ✓         —               —
multi-user scope isolation          ✓         partial (RLS)   —
append-only audit log               ✓         —               —
Personalized PageRank retrieval     ✓         —               —
Ebbinghaus decay scoring            ✓         —               —
CRAG-style low-confidence flag      ✓         —               —
Matryoshka 2-pass ANN               ✓         —               —
Anthropic Contextual Retrieval      ✓         —               —
query router (VIBE/FACT/CHANGE)     ✓         —               —
forecasting (TLogic, v0.2)          planned   —               —

Design rationale

Full architectural discussion in plan.md. Short version of the philosophy:

A memory system should know what kind of question is being asked before it searches anything, store facts with their origin and validity, and never lose history in the act of updating.

The three-layer split (Vector / Graph / Temporal) plus the Router is the architectural answer: each query class is dispatched to the mathematical structure that fits it. The eleven frontier techniques stack 1.5-3× gains on orthogonal axes; together they take the system from "pgvector wrapper" to "memory fabric."

Honest framing: no single 2024-2026 technique gives a 10× recall jump. The "1000× more useful" claim isn't about recall@10; it's about capabilities competitors structurally cannot have (the matrix above) compounded with stacked accuracy gains (the ablation table).


Configuration

Default config (thought.toml, written by thought init):

db_path = ".thought/thought.db"

[embedding]
choice = "auto"           # "auto" picks sentence-transformers if installed,
                          # else deterministic (zero-dep test embedder).
                          # Override: "minilm" | "bge-m3" | "openai" | "deterministic"
dim = 384

[server]
host = "127.0.0.1"
port = 8765

[consolidation]
enabled = true
cycle_seconds = 60.0
cold_demotion_days = 30
staleness_days = 30
batch_size = 100

[llm]                     # optional — enables Contextual Retrieval enrichment
enabled = false
provider = "none"         # "anthropic" | "openai" | "ollama"

thought walks the directory tree (git-style) looking for a thought.toml, so you don't need a --config flag when running from a subfolder of your project.
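The discovery walk is easy to picture. A minimal sketch, assuming only what the sentence above states:

from pathlib import Path

def find_config(start: Path | None = None) -> Path | None:
    # Walk up from the current directory, git-style, until a thought.toml appears.
    start = start or Path.cwd()
    for directory in (start, *start.parents):
        candidate = directory / "thought.toml"
        if candidate.is_file():
            return candidate
    return None          # fall back to defaults / THOUGHT_* env overrides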

Environment overrides: THOUGHT_DB_PATH, THOUGHT_EMBEDDER.


CLI reference

Setup / lifecycle

thought init [--quick] [--embedder auto|minilm|deterministic]
                                  # write config + db + CLAUDE.md
thought install --detect          # show every detected MCP client config path
thought install --client cursor   # wire one client (with backup, idempotent)
thought install --all             # wire every detected client
thought start [--client cursor]   # init-if-needed + install + serve in one command
thought serve [--host ... --port ... --skip-precheck]
                                  # start MCP server on Streamable HTTP
thought doctor                    # deep environment health check
thought --version

Ingest

thought ingest "Alice owns Acme Corp."
thought ingest --file notes.md
thought ingest --glob 'docs/**/*.md'
cat changelog.txt | thought ingest --stdin

# Per-item scope
thought ingest --file private-notes.md --scope private --owner-id alice

Recall

thought recall "who owns Acme"
thought recall "what did we say about pricing" --as-of 2026-01-01
thought recall "auth changes" --as-of 2026-01-01 --as-of-kind learned
thought recall "alice" --json     # raw JSON for piping into other tools

Inspect + maintenance

thought stats                     # entities / edges / sources / contradictions / top accessed
thought repl                      # interactive shell — type queries, +text to remember
thought forget 'kendra%'          # soft-delete by SQL LIKE pattern (audit-logged)
thought consolidate               # run one consolidation cycle

Code-vertical commands (v0.2)

thought ingest-code <path> [--glob '**/*.py'] [--lang python|typescript|auto]
                                  # tree-sitter ingest — functions / classes / methods as entities
thought ingest-git <repo> [--mode snapshot|full] [--paths '*.py,*.ts']
                                  # commit-stamped ingest; --mode full walks every commit
thought callers <name> [--file path] [--limit 10]
                                  # direct callers ranked by HippoRAG PageRank
thought impact  <name> [--file path] [--limit 20]
                                  # transitive impact set: what's affected if you change <name>
thought diff   --from <sha1> --to <sha2> [--file path]
                                  # set diff of entities between two ingested commits

Docker

docker build -t thought-mcp .
docker run --rm -p 8765:8765 -v thought-data:/data thought-mcp

The image runs as a non-root user, exposes :8765, persists state at /data, and runs thought serve as the default command. Once tagged releases are pushed, an upstream image is published at ghcr.io/<owner>/thought-mcp:<version> and :latest.


Troubleshooting

thought install --detect says my client path doesn't exist

Most clients only create their config file after first launch. Open the client once, then re-run thought install --client <name>. The installer will create the file if its parent directory exists.

sqlite enable_load_extension reports NO in thought doctor

You're on a Python build without loadable-extension support — most commonly Anaconda's bundled Python. Two fixes:

# Option A — install python.org Python and use that interpreter
# Option B — use pysqlite3-binary
pip install pysqlite3-binary

THOUGHT falls back to a pure-Python ANN path automatically, so this is a performance issue, not a correctness one.

Recall returns low_confidence: true with no results

The CRAG evaluator flags this when the top hit's score is below threshold. Common causes:

  • Knowledge base is empty or lacks anything relevant. Try thought stats to confirm.
  • You're using the deterministic embedder (the test default). Set choice = "auto" in the [embedding] section of thought.toml and install sentence-transformers: pip install 'thought-mcp[embeddings-local]'.
  • Query phrasing doesn't match indexed entity names. Use the repl to iterate.

MCP client can't find the server

thought doctor                              # confirm MCP SDK + vec extension load
thought serve --skip-precheck               # try without the precheck
# Then inspect the client's MCP logs — most surface "failed to start" with a path

If uvx thought-mcp serve is in your mcpServers config and uvx isn't on PATH for the GUI client, switch the command to an absolute path to the thought entrypoint (which thought / where thought).

First recall after startup is slow

The first call lazy-loads the embedder (downloads all-MiniLM-L6-v2, ~80 MB, on first run). After that it's warm. Use thought init (without --quick) to pre-download.

Windows console garbles output

The CLI reconfigures stdout/stderr to UTF-8 at startup. If you're piping through a tool that still uses cp1252, set PYTHONIOENCODING=utf-8 in your shell.


Testing & development

pytest tests/unit -q                 # 56 unit tests
pytest tests/perf -m perf            # 4 performance benchmarks
python -m tests.comparison.run       # rebuilds docs/comparison.md
python -m tests.comparison.ablation  # rebuilds docs/ablation.md

Coverage target: 85% on src/thought. CI matrix runs Python 3.11/3.12/3.13 × Ubuntu/macOS/Windows on every push (see .github/workflows/ci.yml). Tagging v* triggers release.yml (PyPI trusted publishing) and docker.yml (multi-arch GHCR image).


Roadmap

Current (shipped) — 11 Tier A frontier techniques (Contextual Retrieval, HippoRAG PageRank, bi-temporal Graphiti, atomic-fact triples + Jaccard dedup, BGE-M3 hybrid embeddings, Matryoshka 2-pass retrieval, CRAG evaluator, MetaRAG confidence class, Ebbinghaus decay, context-engineering budget per query class, append-only writes); comparison + ablation harnesses; two MCP tools; multi-platform CLI with auto-install for five MCP clients; LRU recall cache + PPR matrix cache + sqlite-vec + scipy.sparse PageRank + local push PPR + batched ingest (the three perf passes described above); Docker + PyPI release workflows.

v0.2 fast-follow — RAPTOR hierarchical summary trees at WARM→COLD demotion (Sarthi et al., ICLR 2024); sleep-time compute pre-computation (Letta + UCB, April 2025); TLogic temporal-rule forecasting (arXiv 2112.08025); Reflexion-style self-edit (Shinn et al., NeurIPS 2023); multi-hop deep recall (IRCoT/PRISM); introspective thought audit (transformer-circuits, 2025).

v0.3+ — RankZephyr local reranker, PIKE-RAG domain rationale extraction, DSPy-learned retrieval policies, real Postgres backend, REST API alongside MCP, encryption-at-rest (SQLCipher / pgcrypto), tenant isolation, OpenTelemetry traces/metrics.


License

MIT — see LICENSE.

