Local RAG memory for Claude Code -- reduce prompt tokens by 80%

TokenKeeper

Local RAG memory for Claude Code. Reduce prompt token consumption by ~80% on knowledge-heavy projects.

TokenKeeper is an MCP server that indexes your project's documents and code, then exposes semantic search tools to Claude Code. Instead of loading entire files into context, your agents query for only the relevant chunks.

The Problem

On a project with 34 phases of planning docs, a single agent cycle loads 141K tokens (70% of context) just for background knowledge — before it starts working. Quality degrades as context fills up.

The Solution

TokenKeeper replaces "load everything" with "query what's relevant":

| | Traditional | With TokenKeeper |
|---|---|---|
| Prompt tokens | 141,345 | 26,959 |
| Context used | 70.7% | 13.5% |
| Tokens saved | | 114,386 (80.9%) |

Your agents stay in the high-quality zone of their context window.
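The figures in the table follow from simple arithmetic, which the short sketch below reproduces. The 200,000-token context window is an assumption inferred from the reported percentages (141,345 tokens at 70.7%); it is not stated in the table itself.

```python
# Reproduce the savings figures from the comparison table.
# CONTEXT_WINDOW is an assumption inferred from the reported percentages.
CONTEXT_WINDOW = 200_000

traditional = 141_345       # prompt tokens when loading whole files
with_tokenkeeper = 26_959   # prompt tokens when querying for chunks

saved = traditional - with_tokenkeeper
saved_pct = round(100 * saved / traditional, 1)
context_before = round(100 * traditional / CONTEXT_WINDOW, 1)
context_after = round(100 * with_tokenkeeper / CONTEXT_WINDOW, 1)

print(f"saved {saved} tokens ({saved_pct}%)")
print(f"context used: {context_before}% -> {context_after}%")
```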

How It Works

Your project files
    |
    v
[Indexer] --> Chunks with embeddings --> ChromaDB (persistent vectors)
                                              |
                                              v
Claude Code agent --> search_knowledge("topic") --> Top-k relevant chunks
  • Hybrid search — semantic similarity (vector) + keyword matching (BM25), merged via Reciprocal Rank Fusion
  • Local-first — Ollama for embeddings, ChromaDB for storage. No cloud, no API keys required
  • Auto-indexing — file watcher detects changes and re-indexes automatically
  • Per-project isolation — each project gets its own .rag/ directory
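
To make the hybrid search step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of chunk ids. It is not TokenKeeper's actual search.py: the function name rrf_merge, the constant k=60 (a commonly used RRF default), and the way alpha weights the two lists are all assumptions made for illustration.

```python
from collections import defaultdict


def rrf_merge(semantic, keyword, alpha=0.5, k=60, top_k=10):
    """Merge two best-first rankings with weighted Reciprocal Rank Fusion.

    A chunk at 1-based rank r in a list contributes weight / (k + r).
    Weighting the lists by alpha and (1 - alpha) is an assumption about
    how the alpha setting could be applied.
    """
    scores = defaultdict(float)
    for weight, ranking in ((alpha, semantic), (1.0 - alpha, keyword)):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += weight / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ordered[:top_k]


# Hypothetical chunk ids, for illustration only.
semantic_hits = ["auth.md#2", "session.md#0", "readme.md#5"]
keyword_hits = ["session.md#0", "login.py#1", "auth.md#2"]
print(rrf_merge(semantic_hits, keyword_hits))
```

Chunks that rank well in both lists rise to the top, which is why a chunk ranked second and first can outrank one ranked first and third.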

Quick Start

Package name: TokenKeeper is the project brand name. The PyPI package is tokenkeeper:

pip install tokenkeeper

If the package is not available from PyPI in your environment, install from source with uv sync (see Install below).

Prerequisites

  • Python 3.10+
  • Ollama installed and running
  • uv (Python package manager)

Install

git clone https://github.com/admin-sosys/TokenKeeper.git
cd TokenKeeper
uv sync
ollama pull nomic-embed-text

Add to Any Project

Create .mcp.json in your project root:

{
  "mcpServers": {
    "tokenkeeper": {
      "command": "/path/to/TokenKeeper/.venv/bin/python",
      "args": ["-m", "tokenkeeper"],
      "env": {
        "TOKENKEEPER_PROJECT": "${workspaceFolder}"
      }
    }
  }
}

Windows: Use .venv\Scripts\python.exe instead of .venv/bin/python
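
If you set up several projects, the config can be generated rather than written by hand. The sketch below builds the same structure and picks the interpreter path per platform; build_mcp_config is a hypothetical helper, not part of TokenKeeper, and /path/to/TokenKeeper stays a placeholder for your own checkout.

```python
import json
import sys
from pathlib import Path


def build_mcp_config(tokenkeeper_root, platform=sys.platform):
    """Return the .mcp.json contents for a TokenKeeper checkout."""
    root = Path(tokenkeeper_root)
    if platform.startswith("win"):
        python = root / ".venv" / "Scripts" / "python.exe"
    else:
        python = root / ".venv" / "bin" / "python"
    return {
        "mcpServers": {
            "tokenkeeper": {
                "command": str(python),
                "args": ["-m", "tokenkeeper"],
                "env": {"TOKENKEEPER_PROJECT": "${workspaceFolder}"},
            }
        }
    }


# Print the config; redirect the output to .mcp.json in your project root.
print(json.dumps(build_mcp_config("/path/to/TokenKeeper"), indent=2))
```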

Start (or restart) Claude Code in that project. TokenKeeper will:

  1. Create a .rag/ directory for index data
  2. Index all markdown, JSON, and code files
  3. Expose 4 MCP tools for search and management

Add .rag/ to your project's .gitignore.

Verify

Ask Claude Code:

Check the indexing status

Then test a search:

Search the knowledge base for "authentication flow and session management"

MCP Tools

| Tool | Purpose |
|---|---|
| search_knowledge | Hybrid semantic + keyword search across indexed content |
| indexing_status | Check if indexing is complete, in progress, or failed |
| reindex_documents | Trigger manual reindexing (all or specific files) |
| get_index_stats | Index statistics: file count, chunk count, timestamps |

search_knowledge Parameters

| Param | Type | Default | Description |
|---|---|---|---|
| query | string | required | Natural language search query |
| top_k | int | 10 | Results to return (1-50) |
| alpha | float | 0.5 | Hybrid weight: 0.0 = keyword only, 1.0 = semantic only |
| mode | string | "hybrid" | "hybrid", "semantic", or "keyword" |
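
Claude Code constructs tool calls for you, but it can help to see what one looks like on the wire. The sketch below builds a JSON-RPC 2.0 tools/call message in the shape the MCP specification describes, using the parameters and ranges from the table above. The helper search_request and its client-side validation are illustrative assumptions; how the server itself treats out-of-range values is not documented here.

```python
import json

VALID_MODES = {"hybrid", "semantic", "keyword"}


def search_request(query, top_k=10, alpha=0.5, mode="hybrid", request_id=1):
    """Build a JSON-RPC 2.0 tools/call message for search_knowledge."""
    if not query:
        raise ValueError("query is required")
    if not 1 <= top_k <= 50:
        raise ValueError("top_k must be between 1 and 50")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "search_knowledge",
            "arguments": {
                "query": query,
                "top_k": top_k,
                "alpha": alpha,
                "mode": mode,
            },
        },
    }


# Lean toward keyword matching and ask for five results.
print(json.dumps(search_request("authentication flow", top_k=5, alpha=0.3), indent=2))
```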

Configuration

TokenKeeper auto-creates .rag/.rag-config.json on first run:

{
  "content_mode": "docs",
  "chunk_size": 1000,
  "overlap": 200,
  "alpha": 0.5,
  "mode": "hybrid",
  "watch_enabled": true,
  "debounce_seconds": 3.0
}

| Setting | Default | Description |
|---|---|---|
| content_mode | "docs" | "docs" (md/json), "code" (source files), or "both" |
| chunk_size | 1000 | Characters per chunk (100-10000) |
| overlap | 200 | Character overlap between chunks |
| alpha | 0.5 | Hybrid search weight |
| mode | "hybrid" | Search strategy |
| watch_enabled | true | Auto-reindex on file changes |
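
To show how chunk_size and overlap interact, here is a minimal sketch of fixed-window character chunking. It is an illustration only: chunk_text is a hypothetical function, and the real indexer may well split on headings, sentences, or code structure rather than raw character offsets.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into windows of chunk_size characters that share
    `overlap` characters with the previous window."""
    if not 100 <= chunk_size <= 10000:
        raise ValueError("chunk_size must be between 100 and 10000")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)
print([len(c) for c in chunks])
```

With the defaults, each window advances 800 characters, so the last 200 characters of one chunk reappear at the start of the next. That repetition is what keeps a sentence that straddles a boundary searchable.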

Architecture

TokenKeeper/
  src/tokenkeeper/
    server.py          # FastMCP server + lifespan
    indexer.py         # Discovery -> ingestion -> embedding -> storage
    search.py          # Hybrid search with RRF fusion
    embeddings.py      # Ollama (local) or Google Gemini (cloud)
    storage.py         # ChromaDB persistent client
    bm25_index.py      # BM25 keyword index
    watcher.py         # File system monitoring with debounce
    config.py          # Pydantic configuration
    health.py          # Startup health checks

Stack: Python 3.10+ | FastMCP | ChromaDB 1.5.0 | Ollama | BM25
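
The debounce in watcher.py pairs with the debounce_seconds setting. As a rough picture of the idea, the sketch below collects changed paths and releases them as one batch only after a quiet period with no new events. The Debouncer class is hypothetical and takes timestamps as arguments to stay deterministic; it is not the project's watcher implementation.

```python
class Debouncer:
    """Batch file-change events until no new event arrives for quiet_seconds."""

    def __init__(self, quiet_seconds=3.0):
        self.quiet_seconds = quiet_seconds
        self._pending = set()
        self._last_event = None

    def record(self, path, now):
        """Remember a changed path and restart the quiet period."""
        self._pending.add(path)
        self._last_event = now

    def flush_if_quiet(self, now):
        """Return the pending batch if the quiet period has elapsed, else []."""
        if self._last_event is None:
            return []
        if now - self._last_event < self.quiet_seconds:
            return []
        batch = sorted(self._pending)
        self._pending.clear()
        self._last_event = None
        return batch


debouncer = Debouncer(quiet_seconds=3.0)
debouncer.record("docs/a.md", now=0.0)
debouncer.record("docs/b.md", now=1.0)
debouncer.record("docs/a.md", now=2.5)     # editor saved the same file again
early = debouncer.flush_if_quiet(now=4.0)  # only 1.5 s since the last event
late = debouncer.flush_if_quiet(now=5.5)   # 3.0 s of quiet has passed
print(early, late)
```

Batching this way means a burst of saves triggers one re-index of each changed file instead of one per save.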

Embedding Providers

Ollama (Default, Local)

  • Model: nomic-embed-text (768 dimensions)
  • No API key needed
  • Runs on CPU (no GPU required)
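
For a sense of what a local embedding call involves, the sketch below talks to Ollama's HTTP API directly using only the standard library. It assumes Ollama's default address (localhost:11434) and its /api/embeddings endpoint; whether TokenKeeper uses this endpoint, a newer one, or a client library is not stated here, so treat the helper names as hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def build_embedding_request(text, model="nomic-embed-text"):
    """Build (but do not send) a POST request asking Ollama to embed text."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, model="nomic-embed-text", timeout=30):
    """Send the request and return the embedding as a list of floats."""
    request = build_embedding_request(text, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["embedding"]


# Requires a running Ollama server with the model pulled:
# vector = embed("authentication flow and session management")
# print(len(vector))
```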

Google Gemini (Optional, Cloud)

  • Model: gemini-embedding-001 (3072 dimensions)
  • Requires GOOGLE_API_KEY environment variable
  • Higher quality embeddings, but requires internet

File Types Indexed

| Mode | Extensions |
|---|---|
| "docs" | .md, .mdx, .json |
| "code" | .ts, .tsx, .js, .jsx, .py, .mjs, .go, .rs, .java, .rb, .c, .cpp, .h |
| "both" | All of the above |

Always excluded: node_modules/, .git/, .next/, __pycache__/, .rag/, dist/, build/
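
The selection rules above can be sketched as a directory walk that filters by extension and prunes excluded directories. The extension and exclusion sets are copied from this section; discover_files and extensions_for are hypothetical names, and the real discovery step in indexer.py may apply further rules.

```python
import os
from pathlib import Path

DOC_EXTS = {".md", ".mdx", ".json"}
CODE_EXTS = {".ts", ".tsx", ".js", ".jsx", ".py", ".mjs", ".go",
             ".rs", ".java", ".rb", ".c", ".cpp", ".h"}
EXCLUDED_DIRS = {"node_modules", ".git", ".next", "__pycache__",
                 ".rag", "dist", "build"}


def extensions_for(mode):
    """Map a content_mode value to the set of extensions it indexes."""
    if mode == "docs":
        return DOC_EXTS
    if mode == "code":
        return CODE_EXTS
    if mode == "both":
        return DOC_EXTS | CODE_EXTS
    raise ValueError(f"unknown content_mode: {mode!r}")


def discover_files(project_root, mode="docs"):
    """Return sorted paths under project_root that match the given mode."""
    wanted = extensions_for(mode)
    found = []
    for dirpath, dirnames, filenames in os.walk(project_root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            if Path(name).suffix.lower() in wanted:
                found.append(Path(dirpath) / name)
    return sorted(found)


if __name__ == "__main__":
    for path in discover_files(".", mode="both"):
        print(path)
```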

Performance

| Metric | Value |
|---|---|
| First index (500 files) | ~3-5 minutes |
| Subsequent startups | ~5 seconds (cached) |
| Search latency | ~150ms per query |
| Storage | ~100-200 MB per 2000-file project |

Testing

# All tests (Ollama-dependent tests are skipped when Ollama is not running)
uv run pytest tests/ -v --tb=short

# Token savings benchmark
uv run pytest tests/test_practical_token_savings.py -v -s

# Agent comparison (RAG vs traditional)
uv run pytest tests/test_agent_comparison.py -v -s

Troubleshooting

| Issue | Fix |
|---|---|
| "Ollama connection refused" | Run ollama serve to start the server |
| "nomic-embed-text not found" | Run ollama pull nomic-embed-text |
| Claude Code doesn't show RAG tools | Ensure .mcp.json is in the project root, then restart Claude Code |
| 0 chunks indexed | Check that the TOKENKEEPER_PROJECT env var points to your project root |
| Slow first index | Normal; subsequent starts load the cached ChromaDB index in ~5 seconds |
| Search returns irrelevant results | Try mode: "keyword" or lower alpha to 0.3 |
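
The first two rows can be checked from a script before starting Claude Code. The sketch below queries Ollama's /api/tags endpoint, which lists locally available models, and maps the result to the fixes in the table. It assumes the default Ollama address; fetch_tags, model_available, and suggest_fix are hypothetical helpers and not the checks in health.py.

```python
import json
import urllib.request


def fetch_tags(base_url="http://localhost:11434", timeout=5):
    """Return the parsed /api/tags response, or None if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return json.load(resp)
    except OSError:  # urllib.error.URLError is a subclass of OSError
        return None


def model_available(tags, model="nomic-embed-text"):
    """Check a tags response for the model, ignoring any ':tag' suffix."""
    names = [entry.get("name", "") for entry in tags.get("models", [])]
    return any(name.split(":", 1)[0] == model for name in names)


def suggest_fix(tags, model="nomic-embed-text"):
    """Return the fix from the troubleshooting table, or None if all is well."""
    if tags is None:
        return "Run `ollama serve` to start the server"
    if not model_available(tags, model):
        return f"Run `ollama pull {model}`"
    return None


if __name__ == "__main__":
    print(suggest_fix(fetch_tags()) or "Ollama and the embedding model look ready")
```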

License

MIT
