
Local-first, token-efficient memory system for Claude Code via MCP

Cortex Claude

Memory that remembers what matters. Built for Claude Code.



Cortex gives Claude Code persistent memory through a local MCP server. Unlike solutions that dump everything into context, Cortex uses progressive recall — a 3-layer retrieval system that returns only what's relevant, using the minimum tokens needed.


The Problem

Memory solutions for AI assistants today waste tokens. They inject entire memory banks into every prompt, regardless of relevance. Cortex takes a different approach:

Save once:

"The auth service uses JWT tokens with 24-hour expiry. Refresh tokens are stored in httpOnly cookies."

Ask later, get back only what matters:

Layer 1: Facts (cheapest)
  auth service → use → jwt tokens
  auth service → use → 24-hour expiry

Layer 2: Summary (~25% of original)

Layer 3: Full content (only if needed)

The system stops at the cheapest layer that answers the question. 66% fewer tokens on average.


Key Features

  • Progressive recall — 3 layers (facts → summaries → full content), stops at the cheapest sufficient layer
  • Knowledge graph — auto-extracts structured facts via spaCy NLP with multi-hop traversal
  • Smart extraction — handles bullet lists, key:value pairs, comma lists, slash-separated tech, parentheticals, passive voice
  • Claude fallback — optional Claude-assisted extraction when local NLP isn't enough
  • Entity normalization — "postgres", "PostgreSQL", "pg" all resolve to the same entity
  • Graph traversal — navigate entity connections across multiple hops (A → B → C)
  • Hybrid search — vector similarity + FTS5 keyword search combined
  • Configurable scopes — global, per-project, or custom memory boundaries
  • Deduplication — detects and merges near-identical memories automatically
  • Decay system — unused memories lose relevance over time, keeping results fresh
  • Auto-capture — hooks automatically save tool results (Bash, Read, Grep, Edit, Write) to memory
  • Session context injection — injects memory stats and known facts at session start
  • Background daemon — pre-loaded model for instant saves via Unix socket (~0.3s vs ~5s cold start)
  • Multi-language — EN and PT auto-detected; ES, DE, and FR supported with additional spaCy models
  • Local-first — SQLite + local embeddings + local NLP. Zero API calls, zero network, zero cost
  • Fully configurable — all thresholds, ratios, and behaviors via config.json

Quick Start

Install

pip install cortex-claude

# With Claude-assisted extraction (optional)
pip install cortex-claude[claude]

Configure Claude Code

Add a .mcp.json to your project root (or ~/.claude.json for global):

{
  "mcpServers": {
    "cortex": {
      "type": "stdio",
      "command": "python",
      "args": ["-m", "cortex_claude"]
    }
  }
}

First run downloads the embedding model (~80MB) and spaCy model (~12MB) automatically.

Use

Just talk to Claude naturally. Cortex works in the background:

"Remember that the API uses rate limiting at 500 req/min"
"What do you know about rate limiting?"
"What facts do you have about the API?"
"What's connected to the auth service?"
"Forget what I said about the old API key"
"Show me the memory status"

Tools

Cortex exposes 7 MCP tools to Claude Code:

| Tool | What it does | Token cost |
|------|--------------|------------|
| cortex_save | Store memory with auto fact extraction, summarization, and embedding | N/A |
| cortex_recall | Progressive retrieval: facts → summaries → full content | Controlled via max_tokens |
| cortex_facts | Direct knowledge graph query, returns structured triplets | ~5-15 tokens per fact |
| cortex_traverse | Navigate the knowledge graph across multiple hops | ~5-15 tokens per connection |
| cortex_forget | Delete memories by query or ID (dry-run by default) | N/A |
| cortex_scopes | Manage scopes: list, create, delete, link/unlink directories | N/A |
| cortex_status | Dashboard: memory count, fact count, storage size per scope | N/A |

Recall Depth Modes

| Mode | Returns | When to use |
|------|---------|-------------|
| auto | Starts cheap, escalates if needed | Default — best for most queries |
| facts | Only knowledge graph triplets | Quick lookups, minimal token use |
| summaries | Facts + compressed summaries | Medium detail needed |
| full | All layers including original text | Full context needed |

How It Works

Save: content → embedding + fact extraction + summarization → SQLite

Recall (progressive):
  1. Facts layer     (~5-15 tokens/fact)   → sufficient? stop
  2. Summaries layer (~25% of original)    → sufficient? stop
  3. Full chunks     (original content)    → return
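
The recall loop above can be sketched as follows. The layer functions and coverage scores here are hypothetical stand-ins; the real implementation lives inside the cortex_claude package:

```python
def progressive_recall(query, layers, coverage_threshold=0.7):
    """Try each retrieval layer (cheapest first) and stop at the
    first one whose coverage of the query meets the threshold."""
    results = []
    for layer in layers:
        results, coverage = layer(query)
        if coverage >= coverage_threshold:  # cheapest sufficient layer wins
            return results
    return results  # fall through to the most expensive layer

# Stub layers standing in for facts / summaries / full content
def facts(q):     return (["auth service -> use -> jwt tokens"], 0.8)
def summaries(q): return (["Auth uses JWT with 24-hour expiry."], 0.9)
def full(q):      return (["<original memory text>"], 1.0)

print(progressive_recall("jwt", [facts, summaries, full]))
# stops at the facts layer: ['auth service -> use -> jwt tokens']
```

Because the facts layer already clears the coverage threshold, the summaries and full layers are never touched, which is where the token savings come from.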

Fact Extraction

Three methods combined for maximum coverage:

  1. spaCy NLP — dependency parsing, NER, passive voice handling, conjunction expansion
  2. Pattern matching — bullet lists (- X for Y), key:value, comma lists, slash-separated (React/TypeScript), parentheticals (FastAPI (Python)), with/for constructs
  3. Claude fallback (opt-in) — when local extraction produces < 2 high-confidence facts, falls back to Claude Haiku. Off by default.

Entities are normalized and deduplicated: "postgres" → "postgresql", "js" → "javascript", "k8s" → "kubernetes".
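
A minimal alias-table sketch of that normalization step (the alias entries shown are illustrative, not the package's full table):

```python
# Hypothetical alias table mapping shorthand forms to canonical entities
ALIASES = {
    "postgres": "postgresql", "pg": "postgresql",
    "js": "javascript",
    "k8s": "kubernetes",
}

def normalize_entity(name: str) -> str:
    """Lowercase, strip, and resolve known aliases to one canonical form."""
    key = name.strip().lower()
    return ALIASES.get(key, key)

print(normalize_entity("Postgres"))  # postgresql
print(normalize_entity("PG"))        # postgresql
print(normalize_entity("Redis"))     # redis (unknown names pass through)
```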

Graph Traversal

Navigate entity connections across multiple hops:

auth → JWT → express-jwt → middleware
  ↓
  httpOnly cookies

Query cortex_traverse("auth") and discover everything connected.
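
Multi-hop traversal like this is a breadth-first walk over the entity graph. A minimal sketch, assuming an adjacency-list representation (the graph data mirrors the example above):

```python
from collections import deque

def traverse(graph, start, max_hops=2):
    """Breadth-first walk over an {entity: neighbours} map, up to max_hops."""
    seen, frontier = {start}, deque([(start, 0)])
    reached = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # don't expand past the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                reached.append(neighbour)
                frontier.append((neighbour, depth + 1))
    return reached

graph = {
    "auth": ["jwt", "httpOnly cookies"],
    "jwt": ["express-jwt"],
    "express-jwt": ["middleware"],
}
print(traverse(graph, "auth"))  # ['jwt', 'httpOnly cookies', 'express-jwt']
```

With max_hops=2, "middleware" (three hops out) stays out of the result, which keeps token cost bounded.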

Decay

Memories that aren't accessed lose relevance over time:

score = e^(-lambda * days) * (1 + log(access_count))

Recalculated on server startup. Frequently accessed memories get boosted. Stale ones fade.
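
The formula translates directly to a few lines of Python (parameter names here are mine; lambda defaults to 0.05 as in the configuration below, and log is the natural logarithm):

```python
import math

def decay_score(days_since_access: float, access_count: int, lam: float = 0.05) -> float:
    """score = e^(-lambda * days) * (1 + log(access_count))"""
    boost = 1 + math.log(access_count) if access_count >= 1 else 1.0
    return math.exp(-lam * days_since_access) * boost

# A memory untouched for 30 days with a single access:
print(round(decay_score(30, 1), 3))   # 0.223
# Same age, but accessed 20 times -- the log boost keeps it ranked high:
print(round(decay_score(30, 20), 3))
```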

Hybrid Search

Combines vector similarity (semantic meaning) with FTS5 (exact keyword match) for best recall. FTS5 synced automatically via SQLite triggers.
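
One common way to fuse the two signals is a weighted sum of normalized scores. This is a sketch of the idea, not Cortex's exact ranking; the 0.6 weight and the candidate scores are made up for illustration:

```python
def hybrid_score(vector_sim: float, keyword_score: float, alpha: float = 0.6) -> float:
    """Blend semantic similarity (0..1) with a normalized keyword score
    (0..1, e.g. derived from an FTS5 bm25 rank). alpha weights semantics."""
    return alpha * vector_sim + (1 - alpha) * keyword_score

candidates = {
    "mem-1": (0.82, 0.40),  # semantically close, weak keyword match
    "mem-2": (0.55, 0.95),  # exact keyword hit, weaker semantics
}
ranked = sorted(candidates, key=lambda m: hybrid_score(*candidates[m]), reverse=True)
print(ranked)  # ['mem-2', 'mem-1']
```

The point of the hybrid: a memory that nails the exact keyword can outrank one that is only semantically adjacent, and vice versa when the query is paraphrased.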


Cortex vs Traditional Memory (claude-mem, etc.)

Most memory solutions for AI assistants follow the same pattern: capture observations, compress them, inject into every prompt. Cortex takes a fundamentally different approach.

| | Traditional (claude-mem) | Cortex |
|---|---|---|
| Storage model | Compressed text / summaries | Knowledge graph (structured triplets) + summaries + full text |
| Retrieval | Auto-inject into every session (~500-2000 tokens) | Progressive recall: facts → summaries → full (~50-100 tokens) |
| Intelligence | Linear summaries | Structured facts with entity relationships + graph traversal |
| Search | Keyword / vector | Hybrid: semantic embedding + FTS5 keyword + graph traversal |
| Extraction | Captures tool call observations | NLP fact extraction (spaCy + patterns + optional Claude) |
| Entity awareness | None | "postgres" = "postgresql" = "pg" (normalized + fuzzy match) |
| Graph navigation | None | Multi-hop traversal (A → B → C) |
| Staleness | No decay | Unused memories lose relevance over time |
| Duplicates | Can accumulate | Auto-merged (cosine similarity > 0.92) |
| Auto-capture | All tool calls via hooks | Bash, Read, Grep, Edit, Write via hooks + background daemon |
| Session injection | Full context dump on start | Lightweight facts injection via direct SQLite (<1s) |
| Dependencies | Node.js, Bun, Chroma vector DB | Python + SQLite only (zero external services) |
| Web UI | Viewer on localhost:37777 | CLI only (planned) |

Where Cortex wins

  • Token efficiency — 66% fewer tokens returned. Progressive recall stops at the cheapest sufficient layer.
  • Structured understanding — knowledge graph with traversal vs flat text. Ask "what's connected to auth?" and get real graph navigation.
  • Entity intelligence — normalizes and deduplicates entities across memories. "postgres", "PostgreSQL", and "pg" are the same thing.
  • Freshness — decay system ensures frequently accessed memories rank higher. Old, unused memories fade.
  • Zero external services — no Chroma, no Bun, no Node.js. Just Python + SQLite.
  • Multi-language — fact extraction works in EN, PT (auto-detected), with ES/DE/FR support.

Where claude-mem wins

  • Broader auto-capture — captures all tool calls including MCP tools from third-party servers.
  • Web viewer — real-time memory browser at localhost:37777.
  • One-command install — npx claude-mem install vs manual MCP config.
  • Larger community — more stars, Discord, extensive documentation site.

The fundamental difference

Traditional memory asks "what happened?" — Cortex asks "what matters?"

Traditional solutions observe and replay. Cortex understands, structures, and retrieves surgically. The result: fewer tokens, more relevant answers, and a knowledge graph that grows smarter over time.


Benchmarks

With 10 stored memories (244 total tokens):

| Depth | Tokens returned | Reduction | Latency |
|-------|-----------------|-----------|---------|
| facts | 82 | 66% | ~10ms |
| auto | 82 | 66% | ~10ms |
| full | 244 | 0% | ~12ms |

| Operation | Latency |
|-----------|---------|
| cortex_facts query | 0.1ms |
| Graph traversal (2 hops) | 0.2ms |
| Save (after model load) | ~30ms |

Run benchmarks yourself:

uv run python scripts/benchmark.py

Configuration

All behavior is customizable via ~/.cortex-claude/config.json:

{
  "recall": {
    "default_max_tokens": 200,
    "default_depth": "auto",
    "sufficiency": {
      "coverage_threshold": 0.7,
      "confidence_threshold": 0.6
    }
  },
  "embeddings": {
    "model": "all-MiniLM-L6-v2",
    "batch_size": 32
  },
  "facts": {
    "extraction_method": "local",
    "min_confidence": 0.5,
    "claude_fallback": false,
    "claude_confidence_threshold": 0.5
  },
  "decay": {
    "lambda": 0.05,
    "recalculate_interval_hours": 6,
    "min_score": 0.01
  },
  "deduplication": {
    "similarity_threshold": 0.92,
    "merge_strategy": "append"
  },
  "scopes": {
    "mappings": {
      "/path/to/project-a": "project:a",
      "/path/to/project-b": "project:b"
    },
    "default": "global",
    "search_order": "project_first"
  },
  "storage": {
    "max_db_size_mb": 500
  }
}

All fields are optional. Defaults are used for anything not specified.
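
That "defaults for anything not specified" behavior implies a deep merge of the user file over built-in defaults. A sketch of how such a merge works (helper name and the default values shown are illustrative):

```python
def merge_config(defaults: dict, user: dict) -> dict:
    """Recursively overlay user-provided keys onto the defaults, so any
    field omitted from config.json keeps its default value."""
    merged = dict(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)  # merge nested sections
        else:
            merged[key] = value  # user value wins for leaf fields
    return merged

defaults = {"recall": {"default_max_tokens": 200, "default_depth": "auto"}}
user = {"recall": {"default_depth": "facts"}}
print(merge_config(defaults, user))
# {'recall': {'default_max_tokens': 200, 'default_depth': 'facts'}}
```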

See examples/ for ready-to-use configuration files.


Development

git clone https://github.com/rafaelaugustos/cortex-claude.git
cd cortex-claude
uv venv --python python3.13
uv sync --all-extras
uv run python -m spacy download en_core_web_sm
uv run pytest
# Run the demo
uv run python scripts/demo.py

# Run benchmarks
uv run python scripts/benchmark.py

See CONTRIBUTING.md for contribution guidelines.

Architecture

See ARCHITECTURE.md for the full technical specification.


License

MIT — see LICENSE.


Built with Python · Powered by Claude Code · Made by Rafael Augusto
