
Local-first, token-efficient memory system for Claude Code via MCP



Cortex Claude

Memory that remembers what matters. Built for Claude Code.



Cortex gives Claude Code persistent memory through a local MCP server. Unlike solutions that dump everything into context, Cortex uses progressive recall — a 3-layer retrieval system that returns only what's relevant, using the minimum tokens needed.


Cortex Claude Dashboard — an interactive knowledge graph for exploring entities, facts, and memories visually.


The Problem

Memory solutions for AI assistants today waste tokens. They inject entire memory banks into every prompt, regardless of relevance. Cortex takes a different approach:

Save once:

"The auth service uses JWT tokens with 24-hour expiry. Refresh tokens are stored in httpOnly cookies."

Ask later, get back only what matters:

Layer 1: Facts (cheapest)
  auth service → use → jwt tokens
  auth service → use → 24-hour expiry

Layer 2: Summary (~25% of original)

Layer 3: Full content (only if needed)

The system stops at the cheapest layer that answers the question. 66% fewer tokens on average.


Key Features

  • Progressive recall — 3 layers (facts → summaries → full content), stops at the cheapest sufficient layer
  • Knowledge graph — auto-extracts structured facts via spaCy NLP with multi-hop traversal
  • Smart extraction — handles bullet lists, key:value pairs, comma lists, slash-separated tech, parentheticals, passive voice
  • Claude fallback — optional Claude-assisted extraction when local NLP isn't enough
  • Entity normalization — "postgres", "PostgreSQL", "pg" all resolve to the same entity
  • Graph traversal — navigate entity connections across multiple hops (A → B → C)
  • Hybrid search — vector similarity + FTS5 keyword search combined
  • Configurable scopes — global, per-project, or custom memory boundaries
  • Deduplication — detects and merges near-identical memories automatically
  • Fact merging — when multiple memories mention the same fact, it's consolidated into one with boosted confidence
  • Temporal awareness — detects when things happened ("in April 2024", "yesterday", "since v2") and attaches timestamps to the extracted facts
  • Confidence recalibration — facts gain confidence when accessed frequently, lose it when contradicted by newer information
  • Decay system — unused memories lose relevance over time, keeping results fresh
  • Auto-capture — hooks automatically save tool results (Bash, Read, Grep, Edit, Write) to memory
  • Session context injection — injects memory stats and known facts at session start
  • Background daemon — pre-loaded model for instant saves via Unix socket (~0.3s vs ~5s cold start)
  • Multi-language — EN and PT (auto-detected); ES, DE, and FR supported with additional spaCy models
  • Embedding model choice — 11 models supported, from fast (all-MiniLM, 384d) to high-quality (bge-large, 1024d). Auto-detects dimensions
  • Local-first — SQLite + local embeddings + local NLP. Zero API calls, zero network, zero cost
  • Privacy tags — wrap sensitive content in <private>...</private> to exclude it from memory
  • Fully configurable — all thresholds, ratios, and behaviors via config.json

Quick Start

One-Command Install

pip install cortex-claude && cortex-claude setup

That's it. The setup command:

  • Configures the MCP server globally for Claude Code
  • Installs auto-capture hooks (SessionStart + PostToolUse)
  • Creates ~/.claude/CLAUDE.md with instructions for Claude to use Cortex automatically
  • Downloads the embedding model (~80MB) and spaCy model (~12MB)
  • Starts the background daemon for instant saves

Restart Claude Code and it works in every project, no per-project config needed. Claude will automatically consult Cortex before saying "I don't know" and save important context to memory.

With Claude-assisted extraction (optional)

pip install cortex-claude[claude]

Manual Setup (alternative)

If you prefer manual configuration, add a .mcp.json to your project root:

{
  "mcpServers": {
    "cortex": {
      "type": "stdio",
      "command": "python3",
      "args": ["-m", "cortex_claude"]
    }
  }
}

Usage

Just talk to Claude naturally. Cortex works in the background:

"Remember that the API uses rate limiting at 500 req/min"
"What do you know about rate limiting?"
"What facts do you have about the API?"
"What's connected to the auth service?"
"Forget what I said about the old API key"
"Show me the memory status"

Tools

Cortex exposes 7 MCP tools to Claude Code:

| Tool | What it does | Token cost |
|---|---|---|
| `cortex_save` | Store memory with auto fact extraction, summarization, and embedding | N/A |
| `cortex_recall` | Progressive retrieval: facts → summaries → full content | Controlled via `max_tokens` |
| `cortex_facts` | Direct knowledge graph query, returns structured triplets | ~5-15 tokens per fact |
| `cortex_traverse` | Navigate the knowledge graph across multiple hops | ~5-15 tokens per connection |
| `cortex_forget` | Delete memories by query or ID (dry-run by default) | N/A |
| `cortex_scopes` | Manage scopes: list, create, delete, link/unlink directories | N/A |
| `cortex_status` | Dashboard: memory count, fact count, storage size per scope | N/A |

Recall Depth Modes

| Mode | Returns | When to use |
|---|---|---|
| `auto` | Starts cheap, escalates if needed | Default — best for most queries |
| `facts` | Only knowledge graph triplets | Quick lookups, minimal token use |
| `summaries` | Facts + compressed summaries | Medium detail needed |
| `full` | All layers including original text | Full context needed |
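As an illustration, a recall constrained to the cheapest layer might be invoked with arguments shaped like this. The `depth` modes and `max_tokens` cap come from the docs above; the `query` field name and overall envelope are assumptions about the tool schema, not a documented format:

```json
{
  "tool": "cortex_recall",
  "arguments": {
    "query": "rate limiting on the API",
    "depth": "facts",
    "max_tokens": 100
  }
}
```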

How It Works

Save: content → embedding + fact extraction + summarization → SQLite

Recall (progressive):
  1. Facts layer     (~5-15 tokens/fact)   → sufficient? stop
  2. Summaries layer (~25% of original)    → sufficient? stop
  3. Full chunks     (original content)    → return
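The progressive loop above can be sketched in a few lines of Python. The layer data and the coverage-based sufficiency check are toy stand-ins, not Cortex's actual internals:

```python
# Toy stand-in for the three retrieval layers, cheapest first.
LAYERS = [
    ("facts", ["auth service -> uses -> jwt tokens"]),            # ~5-15 tokens/fact
    ("summaries", ["Auth uses JWT with 24h expiry."]),            # ~25% of original
    ("full", ["The auth service uses JWT tokens with 24-hour "
              "expiry. Refresh tokens live in httpOnly cookies."]),
]

def is_sufficient(results, query, coverage_threshold=0.7):
    """Toy sufficiency check: fraction of query words covered by results."""
    words = set(query.lower().split())
    text = " ".join(results).lower()
    covered = sum(1 for w in words if w in text)
    return covered / max(len(words), 1) >= coverage_threshold

def recall(query):
    """Walk the layers cheapest-first; stop at the first sufficient one."""
    gathered = []
    for name, results in LAYERS:
        gathered += results
        if is_sufficient(gathered, query):
            return name, gathered
    return "full", gathered

layer, results = recall("auth jwt tokens")
print(layer)  # facts — the cheapest layer already covers this query
```

A query the facts layer cannot answer (say, one about refresh cookies) falls through to the summaries and full layers, which is exactly the escalation the `auto` depth mode performs.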

Fact Extraction

Three methods combined for maximum coverage:

  1. spaCy NLP — dependency parsing, NER, passive voice handling, conjunction expansion
  2. Pattern matching — bullet lists (- X for Y), key:value, comma lists, slash-separated (React/TypeScript), parentheticals (FastAPI (Python)), with/for constructs
  3. Claude fallback (opt-in) — when local extraction produces < 2 high-confidence facts, falls back to Claude Haiku. Off by default.
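The pattern-matching layer can be sketched with a couple of regexes that turn `key: value` pairs and slash-separated tech lists into triplets. The real Cortex patterns are more extensive; these two rules and the relation labels are illustrative only:

```python
import re

def extract_patterns(text: str):
    """Toy pattern-based fact extraction: returns (subject, relation, object) triplets."""
    facts = []
    # "key: value" lines  ->  (key, "is", value)
    for m in re.finditer(r"^\s*([\w -]+?):\s*(.+)$", text, re.MULTILINE):
        facts.append((m.group(1).strip(), "is", m.group(2).strip()))
    # slash-separated pairs like "React/TypeScript"  ->  two linked entities
    for m in re.finditer(r"\b(\w+)/(\w+)\b", text):
        facts.append((m.group(1), "paired with", m.group(2)))
    return facts

for fact in extract_patterns("frontend: React/TypeScript"):
    print(fact)
```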

Entities are normalized and deduplicated: "postgres" → "postgresql", "js" → "javascript", "k8s" → "kubernetes".
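Conceptually, normalization is an alias table consulted before an entity enters the graph. The table below is a tiny illustrative subset, not Cortex's actual alias list:

```python
# Illustrative alias table: every alias maps to one canonical entity name.
ALIASES = {
    "postgres": "postgresql", "pg": "postgresql",
    "js": "javascript",
    "k8s": "kubernetes",
}

def normalize_entity(name: str) -> str:
    """Lowercase, trim, and resolve known aliases to a canonical form."""
    key = name.strip().lower()
    return ALIASES.get(key, key)

print(normalize_entity("Postgres"))  # postgresql
print(normalize_entity("k8s"))       # kubernetes
```

Cortex additionally applies fuzzy matching on top of exact aliases, which this sketch omits.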

Graph Traversal

Navigate entity connections across multiple hops:

auth → JWT → express-jwt → middleware
  ↓
  httpOnly cookies

Query cortex_traverse("auth") and discover everything connected.
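Multi-hop traversal over a triplet store is a breadth-first search. The graph data below mirrors the diagram above; the function is a conceptual sketch of what `cortex_traverse` does, not its implementation:

```python
from collections import deque

# Triplets mirroring the diagram above.
FACTS = [
    ("auth", "uses", "JWT"),
    ("JWT", "verified by", "express-jwt"),
    ("express-jwt", "is a", "middleware"),
    ("auth", "stores", "httpOnly cookies"),
]

def traverse(start, max_hops=3):
    """Return every connection reachable from `start` within max_hops, BFS order."""
    seen, frontier, found = {start}, deque([(start, 0)]), []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for subj, rel, obj in FACTS:
            if subj == node and obj not in seen:
                seen.add(obj)
                found.append((node, rel, obj, depth + 1))
                frontier.append((obj, depth + 1))
    return found

for subj, rel, obj, hop in traverse("auth"):
    print(f"hop {hop}: {subj} --{rel}--> {obj}")
```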

Decay

Memories that aren't accessed lose relevance over time:

score = e^(-lambda * days) * (1 + log(access_count))

Recalculated on server startup. Frequently accessed memories get boosted. Stale ones fade.
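The formula translates directly to Python. `lam=0.05` matches the config default shown later; using the natural logarithm is an assumption (the README writes `log` without a base):

```python
import math

def decay_score(days_since_access: float, access_count: int,
                lam: float = 0.05) -> float:
    """score = e^(-lambda * days) * (1 + log(access_count)), access_count >= 1."""
    return math.exp(-lam * days_since_access) * (1 + math.log(access_count))

print(round(decay_score(days_since_access=0, access_count=1), 3))   # 1.0
print(round(decay_score(days_since_access=30, access_count=1), 3))  # 0.223
print(round(decay_score(days_since_access=30, access_count=50), 3))
```

Note the boost term at work: a memory untouched for 30 days but accessed 50 times scores higher than a memory accessed once yesterday would after the same gap.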

Auto-Capture & Hooks

Cortex uses Claude Code hooks to work automatically:

  • SessionStart — injects memory stats and known facts when you open a session. Claude knows it has memory and consults it before saying "I don't know".
  • PostToolUse — captures results from all tools in background: Bash, Read, Edit, Write, Grep, Glob, Agent, WebSearch, WebFetch, and all third-party MCP tools. No manual save needed.
  • Background Daemon — keeps the embedding model pre-loaded via Unix socket. First save after boot: ~5s (model load). Subsequent saves: ~0.3s.

All hooks are installed globally by cortex-claude setup. No per-project config needed.

Privacy

Wrap sensitive content in <private> tags to exclude it from memory:

The API uses JWT for auth. <private>API_KEY=sk-abc123secret</private> Tokens expire in 24h.

Cortex strips everything between <private>...</private> before saving. Works in both manual saves and auto-capture. If the entire content is private, nothing is saved.

Hybrid Search

Combines vector similarity (semantic meaning) with FTS5 (exact keyword match) for best recall. FTS5 synced automatically via SQLite triggers.
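Hybrid ranking boils down to blending two scores per document. The 50/50 weights and the term-overlap stand-in for FTS5 below are illustrative, not Cortex's actual ranking function:

```python
def keyword_score(query: str, doc: str) -> float:
    """Fraction of query terms appearing verbatim (toy stand-in for FTS5)."""
    terms = query.lower().split()
    return sum(t in doc.lower() for t in terms) / max(len(terms), 1)

def hybrid_score(vector_sim: float, query: str, doc: str,
                 w_vec: float = 0.5, w_kw: float = 0.5) -> float:
    """Blend semantic similarity with exact keyword overlap."""
    return w_vec * vector_sim + w_kw * keyword_score(query, doc)

# A doc with moderate semantic similarity but exact keyword hits can outrank
# one with higher similarity and no keyword overlap.
print(hybrid_score(0.60, "rate limiting", "API rate limiting at 500 req/min"))
print(hybrid_score(0.70, "rate limiting", "throughput caps on requests"))
```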

Web Dashboard

Interactive knowledge graph visualization and memory browser:

cortex-claude web
# Opens at http://localhost:37800

Features:

  • Interactive knowledge graph — nodes are entities, edges are relations. Click a node to see all its facts and related memories.
  • Memory browser — browse all memories with search, tags, scope, and decay score.
  • Facts list — all extracted triplets, click to focus on the graph node.
  • Live stats — memory count, fact count, scopes, storage size.
  • Dark theme — designed for developers.

Cortex vs Traditional Memory (claude-mem, etc.)

Most memory solutions for AI assistants follow the same pattern: capture observations, compress them, inject into every prompt. Cortex takes a fundamentally different approach.

| | Traditional (claude-mem) | Cortex |
|---|---|---|
| Storage model | Compressed text / summaries | Knowledge graph (structured triplets) + summaries + full text |
| Retrieval | Auto-inject into every session (~500-2000 tokens) | Progressive recall: facts → summaries → full (~50-100 tokens) |
| Intelligence | Linear summaries | Structured facts with entity relationships + graph traversal |
| Search | Keyword / vector | Hybrid: semantic embedding + FTS5 keyword + graph traversal |
| Extraction | Captures tool call observations | NLP fact extraction (spaCy + patterns + optional Claude) |
| Entity awareness | None | "postgres" = "postgresql" = "pg" (normalized + fuzzy match) |
| Graph navigation | None | Multi-hop traversal (A → B → C) |
| Staleness | No decay | Unused memories lose relevance over time |
| Duplicates | Can accumulate | Auto-merged (cosine similarity > 0.92) |
| Auto-capture | All tool calls via hooks | All tools (Bash, Read, Edit, Write, Grep, Glob, Agent, MCP, Web) via hooks + daemon |
| Session injection | Full context dump on start | Lightweight facts injection via direct SQLite (<1s) |
| Dependencies | Node.js, Bun, Chroma vector DB | Python + SQLite only (zero external services) |
| Web UI | Viewer on localhost:37777 | Interactive graph dashboard on localhost:37800 |

Where Cortex wins

  • Token efficiency — 66% fewer tokens returned. Progressive recall stops at the cheapest sufficient layer.
  • Structured understanding — knowledge graph with traversal vs flat text. Ask "what's connected to auth?" and get real graph navigation.
  • Entity intelligence — normalizes and deduplicates entities across memories. "postgres", "PostgreSQL", and "pg" are the same thing.
  • Freshness — decay system ensures frequently accessed memories rank higher. Old, unused memories fade.
  • Zero external services — no Chroma, no Bun, no Node.js. Just Python + SQLite.
  • Multi-language — fact extraction works in EN, PT (auto-detected), with ES/DE/FR support.

Where claude-mem wins

  • Broader auto-capture — captures all tool calls including MCP tools from third-party servers.
  • Web viewer — real-time memory browser (Cortex has an interactive graph dashboard, but claude-mem's is more mature).
  • One-command install — npx claude-mem install vs manual MCP config.
  • Larger community — more stars, Discord, extensive documentation site.

The fundamental difference

Traditional memory asks "what happened?" — Cortex asks "what matters?"

Traditional solutions observe and replay. Cortex understands, structures, and retrieves surgically. The result: fewer tokens, more relevant answers, and a knowledge graph that grows smarter over time.


Benchmarks

With 10 stored memories (244 total tokens):

| Depth | Tokens returned | Reduction | Latency |
|---|---|---|---|
| facts | 82 | 66% | ~10ms |
| auto | 82 | 66% | ~10ms |
| full | 244 | 0% | ~12ms |

| Operation | Latency |
|---|---|
| cortex_facts query | 0.1ms |
| Graph traversal (2 hops) | 0.2ms |
| Save (after model load) | ~30ms |

Run benchmarks yourself:

uv run python scripts/benchmark.py

Configuration

All behavior is customizable via ~/.cortex-claude/config.json:

{
  "recall": {
    "default_max_tokens": 200,
    "default_depth": "auto",
    "sufficiency": {
      "coverage_threshold": 0.7,
      "confidence_threshold": 0.6
    }
  },
  "embeddings": {
    "model": "all-MiniLM-L6-v2",
    "batch_size": 32,
    "_available_models": [
      "all-MiniLM-L6-v2 (384d, 80MB, fast)",
      "all-mpnet-base-v2 (768d, 420MB, medium)",
      "BAAI/bge-small-en-v1.5 (384d, 130MB, fast)",
      "BAAI/bge-base-en-v1.5 (768d, 440MB, medium)",
      "BAAI/bge-large-en-v1.5 (1024d, 1.3GB, best quality)",
      "intfloat/e5-large-v2 (1024d, 1.3GB, best quality)",
      "intfloat/multilingual-e5-small (384d, 470MB, multilingual)"
    ]
  },
  "facts": {
    "extraction_method": "local",
    "min_confidence": 0.5,
    "claude_fallback": false,
    "claude_confidence_threshold": 0.5
  },
  "decay": {
    "lambda": 0.05,
    "recalculate_interval_hours": 6,
    "min_score": 0.01
  },
  "deduplication": {
    "similarity_threshold": 0.92,
    "merge_strategy": "append"
  },
  "scopes": {
    "mappings": {
      "/path/to/project-a": "project:a",
      "/path/to/project-b": "project:b"
    },
    "default": "global",
    "search_order": "project_first"
  },
  "storage": {
    "max_db_size_mb": 500
  }
}

All fields are optional. Defaults are used for anything not specified.
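"All fields are optional" implies the loader deep-merges a partial user config over built-in defaults. A sketch of that pattern, with only an illustrative subset of the real defaults:

```python
import json
from pathlib import Path

# Illustrative subset of defaults; the real set covers every config section.
DEFAULTS = {
    "recall": {"default_max_tokens": 200, "default_depth": "auto"},
    "decay": {"lambda": 0.05, "min_score": 0.01},
}

def deep_merge(base: dict, override: dict) -> dict:
    """Recursively overlay `override` onto `base` without mutating either."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def load_config(path: Path) -> dict:
    user = json.loads(path.read_text()) if path.exists() else {}
    return deep_merge(DEFAULTS, user)

# A user config specifying only one field keeps every other default.
cfg = deep_merge(DEFAULTS, {"recall": {"default_depth": "facts"}})
print(cfg["recall"])           # {'default_max_tokens': 200, 'default_depth': 'facts'}
print(cfg["decay"]["lambda"])  # 0.05
```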

See examples/ for ready-to-use configuration files.


Development

git clone https://github.com/rafaelaugustos/cortex-claude.git
cd cortex-claude
uv venv --python python3.13
uv sync --all-extras
uv run python -m spacy download en_core_web_sm
uv run pytest
# Run the demo
uv run python scripts/demo.py

# Run benchmarks
uv run python scripts/benchmark.py

See CONTRIBUTING.md for contribution guidelines.

Architecture

See ARCHITECTURE.md for the full technical specification.


Something Missing?

We'd love to hear from you. Open an issue on GitHub:

Found a bug?

  1. Go to Issues
  2. Describe what happened vs. what you expected
  3. Include your Python version, OS, and steps to reproduce

Have a feature request?

  1. Go to Issues
  2. Describe the problem you're trying to solve
  3. Suggest a solution if you have one

Want to contribute?

  1. Check CONTRIBUTING.md for guidelines
  2. Look for issues labeled good first issue
  3. Fork, branch, code, test, PR

License

MIT — see LICENSE.


Built with Python · Powered by Claude Code · Made by Rafael Augusto
