Local-first, token-efficient memory system for Claude Code via MCP
Memory that remembers what matters. Built for Claude Code.
Quick Start • How It Works • Tools • Configuration • Benchmarks • Development
Cortex gives Claude Code persistent memory through a local MCP server. Unlike solutions that dump everything into context, Cortex uses progressive recall — a 3-layer retrieval system that returns only what's relevant, using the minimum tokens needed.
The Problem
Memory solutions for AI assistants today waste tokens. They inject entire memory banks into every prompt, regardless of relevance. Cortex takes a different approach:
Save once:
"The auth service uses JWT tokens with 24-hour expiry. Refresh tokens are stored in httpOnly cookies."
Ask later, get back only what matters:
Layer 1: Facts (cheapest)
auth service → use → jwt tokens
auth service → use → 24-hour expiry
Layer 2: Summary (~25% of original)
Layer 3: Full content (only if needed)
The system stops at the cheapest layer that answers the question. 66% fewer tokens on average.
Key Features
- Progressive recall — 3 layers (facts → summaries → full content), stops at the cheapest sufficient layer
- Knowledge graph — auto-extracts structured facts via spaCy NLP with multi-hop traversal
- Smart extraction — handles bullet lists, key:value pairs, comma lists, slash-separated tech, parentheticals, passive voice
- Claude fallback — optional Claude-assisted extraction when local NLP isn't enough
- Entity normalization — "postgres", "PostgreSQL", "pg" all resolve to the same entity
- Graph traversal — navigate entity connections across multiple hops (A → B → C)
- Hybrid search — vector similarity + FTS5 keyword search combined
- Configurable scopes — global, per-project, or custom memory boundaries
- Deduplication — detects and merges near-identical memories automatically
- Decay system — unused memories lose relevance over time, keeping results fresh
- Auto-capture — hooks automatically save tool results (Bash, Read, Grep, Edit, Write) to memory
- Session context injection — injects memory stats and known facts at session start
- Background daemon — pre-loaded model for instant saves via Unix socket (~0.3s vs ~5s cold start)
- Multi-language — EN, PT (auto-detected). ES, DE, FR with additional spaCy models
- Local-first — SQLite + local embeddings + local NLP. Zero API calls, zero network, zero cost
- Fully configurable — all thresholds, ratios, and behaviors via config.json
Quick Start
Install
pip install cortex-claude
# With Claude-assisted extraction (optional)
pip install cortex-claude[claude]
Configure Claude Code
Add a .mcp.json to your project root (or ~/.claude.json for global):
{
"mcpServers": {
"cortex": {
"type": "stdio",
"command": "python",
"args": ["-m", "cortex_claude"]
}
}
}
First run downloads the embedding model (~80MB) and spaCy model (~12MB) automatically.
Use
Just talk to Claude naturally. Cortex works in the background:
"Remember that the API uses rate limiting at 500 req/min"
"What do you know about rate limiting?"
"What facts do you have about the API?"
"What's connected to the auth service?"
"Forget what I said about the old API key"
"Show me the memory status"
Tools
Cortex exposes 7 MCP tools to Claude Code:
| Tool | What it does | Token cost |
|---|---|---|
| `cortex_save` | Store memory with auto fact extraction, summarization, and embedding | N/A |
| `cortex_recall` | Progressive retrieval: facts → summaries → full content | Controlled via `max_tokens` |
| `cortex_facts` | Direct knowledge graph query, returns structured triplets | ~5-15 tokens per fact |
| `cortex_traverse` | Navigate the knowledge graph across multiple hops | ~5-15 tokens per connection |
| `cortex_forget` | Delete memories by query or ID (dry-run by default) | N/A |
| `cortex_scopes` | Manage scopes: list, create, delete, link/unlink directories | N/A |
| `cortex_status` | Dashboard: memory count, fact count, storage size per scope | N/A |
Recall Depth Modes
| Mode | Returns | When to use |
|---|---|---|
| `auto` | Starts cheap, escalates if needed | Default — best for most queries |
| `facts` | Only knowledge graph triplets | Quick lookups, minimal token use |
| `summaries` | Facts + compressed summaries | Medium detail needed |
| `full` | All layers including original text | Full context needed |
How It Works
Save: content → embedding + fact extraction + summarization → SQLite
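In code, the save flow looks roughly like this; a sketch with placeholder helpers and an assumed two-table schema (`memories`, `facts`), not the actual cortex-claude internals:

```python
import json
import sqlite3

def embed(text: str) -> list[float]:
    # Stand-in: Cortex uses a local sentence-transformers model (all-MiniLM-L6-v2).
    return [0.0] * 384

def extract_facts(text: str) -> list[tuple[str, str, str]]:
    # Stand-in for the spaCy + pattern extraction described below.
    return [("auth service", "use", "jwt tokens")]

def summarize(text: str) -> str:
    # Stand-in: the real summary targets roughly 25% of the original length.
    return text[: max(1, len(text) // 4)]

def save_memory(db: sqlite3.Connection, content: str, scope: str = "global") -> int:
    cur = db.execute(
        "INSERT INTO memories (scope, content, summary, embedding) VALUES (?, ?, ?, ?)",
        (scope, content, summarize(content), json.dumps(embed(content))),
    )
    memory_id = cur.lastrowid
    db.executemany(
        "INSERT INTO facts (memory_id, subject, relation, object) VALUES (?, ?, ?, ?)",
        [(memory_id, *fact) for fact in extract_facts(content)],
    )
    db.commit()
    return memory_id
```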
Recall (progressive):
1. Facts layer (~5-15 tokens/fact) → sufficient? stop
2. Summaries layer (~25% of original) → sufficient? stop
3. Full chunks (original content) → return
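The escalation behind `auto` mode can be sketched like this, with placeholder layer inputs and a toy sufficiency check standing in for the coverage/confidence scoring configured in `config.json`:

```python
def is_sufficient(query: str, results: list[str], coverage_threshold: float = 0.7) -> bool:
    # Toy check: what fraction of query terms show up anywhere in the results?
    query_terms = set(query.lower().split())
    if not query_terms:
        return False
    hits = sum(1 for term in query_terms if any(term in r.lower() for r in results))
    return hits / len(query_terms) >= coverage_threshold

def progressive_recall(query: str, facts: list[str], summaries: list[str], chunks: list[str]) -> dict:
    # Layer 1: knowledge-graph facts (~5-15 tokens each)
    if is_sufficient(query, facts):
        return {"depth": "facts", "results": facts}
    # Layer 2: compressed summaries (~25% of the original)
    if is_sufficient(query, facts + summaries):
        return {"depth": "summaries", "results": facts + summaries}
    # Layer 3: full original content
    return {"depth": "full", "results": facts + summaries + chunks}
```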
Fact Extraction
Three methods combined for maximum coverage:
- spaCy NLP — dependency parsing, NER, passive voice handling, conjunction expansion
- Pattern matching — bullet lists (`- X for Y`), key:value, comma lists, slash-separated (React/TypeScript), parentheticals (FastAPI (Python)), with/for constructs
- Claude fallback (opt-in) — when local extraction produces < 2 high-confidence facts, falls back to Claude Haiku. Off by default.
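To illustrate the pattern-matching method, here is a minimal sketch of that style of rule; the regexes and the way matches map to triplets are illustrative assumptions, not the shipped patterns:

```python
import re

def pattern_facts(subject: str, text: str) -> list[tuple[str, str, str]]:
    facts = []
    # key: value pairs, e.g. "database: PostgreSQL"
    for key, value in re.findall(r"^\s*[-*]?\s*([\w ]+):\s*(.+)$", text, re.MULTILINE):
        facts.append((subject, key.strip().lower(), value.strip()))
    # slash-separated tech, e.g. "React/TypeScript"
    for a, b in re.findall(r"\b([A-Za-z.+#]+)/([A-Za-z.+#]+)\b", text):
        facts.append((subject, "use", a))
        facts.append((subject, "use", b))
    # parentheticals, e.g. "FastAPI (Python)"
    for name, qualifier in re.findall(r"\b([\w.-]+)\s*\(([^)]+)\)", text):
        facts.append((name.lower(), "is", qualifier.strip().lower()))
    return facts
```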
Entities are normalized and deduplicated: "postgres" → "postgresql", "js" → "javascript", "k8s" → "kubernetes".
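Normalization can be pictured as lowercasing plus an alias map (the real system also applies fuzzy matching); a tiny sketch with an illustrative subset of aliases:

```python
ALIASES = {
    "postgres": "postgresql",
    "pg": "postgresql",
    "js": "javascript",
    "k8s": "kubernetes",
}

def normalize_entity(name: str) -> str:
    # Lowercase and trim, then collapse known aliases onto one canonical form,
    # so "PostgreSQL", "postgres", and "pg" all land on "postgresql".
    key = name.strip().lower()
    return ALIASES.get(key, key)
```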
Graph Traversal
Navigate entity connections across multiple hops:
auth → JWT → express-jwt → middleware
↓
httpOnly cookies
Query cortex_traverse("auth") and discover everything connected.
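Multi-hop traversal amounts to a bounded breadth-first walk over the stored triplets. A sketch over an in-memory fact list, assuming hops only follow subject → object edges:

```python
from collections import deque

def traverse(facts: list[tuple[str, str, str]], start: str, max_hops: int = 2) -> list[tuple[str, str, str, int]]:
    """Breadth-first walk over fact triplets, up to max_hops away from the start entity."""
    found, seen = [], {start}
    queue = deque([(start, 0)])
    while queue:
        entity, depth = queue.popleft()
        if depth == max_hops:
            continue
        for subj, rel, obj in facts:
            if subj == entity and obj not in seen:
                found.append((subj, rel, obj, depth + 1))
                seen.add(obj)
                queue.append((obj, depth + 1))
    return found

# Example: traverse(facts, "auth") surfaces jwt, express-jwt, httpOnly cookies, ...
```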
Decay
Memories that aren't accessed lose relevance over time:
score = e^(-lambda * days) * (1 + log(access_count))
Recalculated on server startup. Frequently accessed memories get boosted. Stale ones fade.
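The formula translates directly to code; the sketch below assumes a natural logarithm and the default lambda of 0.05 from the config:

```python
import math

def decay_score(days_since_access: float, access_count: int, lam: float = 0.05) -> float:
    # score = e^(-lambda * days) * (1 + log(access_count))
    # log(0) is undefined, so unaccessed memories get only the exponential term.
    boost = 1 + math.log(access_count) if access_count > 0 else 1.0
    return math.exp(-lam * days_since_access) * boost

# A memory untouched for 30 days but accessed 10 times:
# decay_score(30, 10) ≈ e^(-1.5) * (1 + ln 10) ≈ 0.22 * 3.30 ≈ 0.74
```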
Hybrid Search
Combines vector similarity (semantic meaning) with FTS5 (exact keyword match) for best recall. FTS5 synced automatically via SQLite triggers.
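A simplified sketch of how the two signals could be combined, assuming a `memories_fts` FTS5 table whose rowids line up with `memories.id` and an illustrative keyword bonus; the real ranking may differ:

```python
import json
import math
import sqlite3

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def hybrid_search(db: sqlite3.Connection, query: str, query_vec: list[float], k: int = 5) -> list[tuple[float, int]]:
    # Keyword hits from the FTS5 index (kept in sync with the memories table by triggers).
    fts_ids = {row[0] for row in db.execute(
        "SELECT rowid FROM memories_fts WHERE memories_fts MATCH ?", (query,))}
    # Semantic similarity over stored embeddings, plus a bonus for keyword matches.
    scored = []
    for mem_id, vec_json in db.execute("SELECT id, embedding FROM memories"):
        score = cosine(query_vec, json.loads(vec_json))
        if mem_id in fts_ids:
            score += 0.2  # illustrative keyword bonus, not the real weighting
        scored.append((score, mem_id))
    return sorted(scored, reverse=True)[:k]
```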
Cortex vs Traditional Memory (claude-mem, etc.)
Most memory solutions for AI assistants follow the same pattern: capture observations, compress them, inject into every prompt. Cortex takes a fundamentally different approach.
| | Traditional (claude-mem) | Cortex |
|---|---|---|
| Storage model | Compressed text / summaries | Knowledge graph (structured triplets) + summaries + full text |
| Retrieval | Auto-inject into every session (~500-2000 tokens) | Progressive recall: facts → summaries → full (~50-100 tokens) |
| Intelligence | Linear summaries | Structured facts with entity relationships + graph traversal |
| Search | Keyword / vector | Hybrid: semantic embedding + FTS5 keyword + graph traversal |
| Extraction | Captures tool call observations | NLP fact extraction (spaCy + patterns + optional Claude) |
| Entity awareness | None | "postgres" = "postgresql" = "pg" (normalized + fuzzy match) |
| Graph navigation | None | Multi-hop traversal (A → B → C) |
| Staleness | No decay | Unused memories lose relevance over time |
| Duplicates | Can accumulate | Auto-merged (cosine similarity > 0.92) |
| Auto-capture | All tool calls via hooks | Bash, Read, Grep, Edit, Write via hooks + background daemon |
| Session injection | Full context dump on start | Lightweight facts injection via direct SQLite (<1s) |
| Dependencies | Node.js, Bun, Chroma vector DB | Python + SQLite only (zero external services) |
| Web UI | Viewer on localhost:37777 | CLI only (planned) |
Where Cortex wins
- Token efficiency — 66% fewer tokens returned. Progressive recall stops at the cheapest sufficient layer.
- Structured understanding — knowledge graph with traversal vs flat text. Ask "what's connected to auth?" and get real graph navigation.
- Entity intelligence — normalizes and deduplicates entities across memories. "postgres", "PostgreSQL", and "pg" are the same thing.
- Freshness — decay system ensures frequently accessed memories rank higher. Old, unused memories fade.
- Zero external services — no Chroma, no Bun, no Node.js. Just Python + SQLite.
- Multi-language — fact extraction works in EN, PT (auto-detected), with ES/DE/FR support.
Where claude-mem wins
- Broader auto-capture — captures all tool calls including MCP tools from third-party servers.
- Web viewer — real-time memory browser at localhost:37777.
- One-command install — `npx claude-mem install` vs manual MCP config.
- Larger community — more stars, Discord, extensive documentation site.
The fundamental difference
Traditional memory asks "what happened?" — Cortex asks "what matters?"
Traditional solutions observe and replay. Cortex understands, structures, and retrieves surgically. The result: fewer tokens, more relevant answers, and a knowledge graph that grows smarter over time.
Benchmarks
With 10 stored memories (244 total tokens):
| Depth | Tokens returned | Reduction | Latency |
|---|---|---|---|
| `facts` | 82 | 66% | ~10ms |
| `auto` | 82 | 66% | ~10ms |
| `full` | 244 | 0% | ~12ms |

| Operation | Latency |
|---|---|
| `cortex_facts` query | 0.1ms |
| Graph traversal (2 hops) | 0.2ms |
| Save (after model load) | ~30ms |
Run benchmarks yourself:
uv run python scripts/benchmark.py
Configuration
All behavior is customizable via ~/.cortex-claude/config.json:
{
"recall": {
"default_max_tokens": 200,
"default_depth": "auto",
"sufficiency": {
"coverage_threshold": 0.7,
"confidence_threshold": 0.6
}
},
"embeddings": {
"model": "all-MiniLM-L6-v2",
"batch_size": 32
},
"facts": {
"extraction_method": "local",
"min_confidence": 0.5,
"claude_fallback": false,
"claude_confidence_threshold": 0.5
},
"decay": {
"lambda": 0.05,
"recalculate_interval_hours": 6,
"min_score": 0.01
},
"deduplication": {
"similarity_threshold": 0.92,
"merge_strategy": "append"
},
"scopes": {
"mappings": {
"/path/to/project-a": "project:a",
"/path/to/project-b": "project:b"
},
"default": "global",
"search_order": "project_first"
},
"storage": {
"max_db_size_mb": 500
}
}
All fields are optional. Defaults are used for anything not specified.
See examples/ for ready-to-use configuration files.
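The optional-fields behavior can be pictured as a recursive merge of the user file over built-in defaults; a sketch in which the defaults dict shows only a couple of keys:

```python
import json
from pathlib import Path

DEFAULTS = {
    "recall": {"default_max_tokens": 200, "default_depth": "auto"},
    "decay": {"lambda": 0.05},
}

def deep_merge(base: dict, override: dict) -> dict:
    # User-supplied values win; nested dicts are merged key by key.
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def load_config(path: Path = Path.home() / ".cortex-claude" / "config.json") -> dict:
    user = json.loads(path.read_text()) if path.exists() else {}
    return deep_merge(DEFAULTS, user)
```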
Development
git clone https://github.com/rafaelaugustos/cortex-claude.git
cd cortex-claude
uv venv --python python3.13
uv sync --all-extras
uv run python -m spacy download en_core_web_sm
uv run pytest
# Run the demo
uv run python scripts/demo.py
# Run benchmarks
uv run python scripts/benchmark.py
See CONTRIBUTING.md for contribution guidelines.
Architecture
See ARCHITECTURE.md for the full technical specification.
License
MIT — see LICENSE.
Built with Python • Powered by Claude Code • Made by Rafael Augusto