Local RAG memory for Claude Code -- reduce prompt tokens by 80%
TokenKeeper
Local RAG memory for Claude Code. Reduce prompt token consumption by ~80% on knowledge-heavy projects.
TokenKeeper is an MCP server that indexes your project's documents and code, then exposes semantic search tools to Claude Code. Instead of loading entire files into context, your agents query for only the relevant chunks.
The Problem
On a project with 34 phases of planning docs, a single agent cycle loads 141K tokens (70% of context) just for background knowledge — before it starts working. Quality degrades as context fills up.
The Solution
TokenKeeper replaces "load everything" with "query what's relevant":
| | Traditional | With TokenKeeper |
|---|---|---|
| Prompt tokens | 141,345 | 26,959 |
| Context used | 70.7% | 13.5% |
| Tokens saved | — | 114,386 (80.9%) |
Your agents stay in the high-quality zone of their context window.
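The percentages in the table can be reproduced with a few lines of arithmetic. The sketch below assumes a 200,000-token context window; that size is not stated above and is inferred from the 70.7% figure.

```python
# Assumed context window size (inferred from the table, not stated in it).
CONTEXT_WINDOW = 200_000

traditional = 141_345  # prompt tokens when loading everything
with_rag = 26_959      # prompt tokens when querying TokenKeeper


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


saved = traditional - with_rag
print(saved, pct(saved, traditional))    # 114386 80.9
print(pct(traditional, CONTEXT_WINDOW))  # 70.7
print(pct(with_rag, CONTEXT_WINDOW))     # 13.5
```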
How It Works
Your project files
|
v
[Indexer] --> Chunks with embeddings --> ChromaDB (persistent vectors)
|
v
Claude Code agent --> search_knowledge("topic") --> Top-k relevant chunks
- Hybrid search: semantic similarity (vector) + keyword matching (BM25), merged via Reciprocal Rank Fusion
- Local-first: Ollama for embeddings, ChromaDB for storage. No cloud, no API keys required
- Auto-indexing: file watcher detects changes and re-indexes automatically
- Per-project isolation: each project gets its own .rag/ directory
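To make the hybrid merge step concrete, here is a minimal sketch of weighted Reciprocal Rank Fusion over two ranked lists of chunk IDs. The function name `rrf_merge`, the constant `k=60`, and the way `alpha` weights the two lists are assumptions for illustration; TokenKeeper's actual fusion logic in search.py may differ.

```python
from collections import defaultdict


def rrf_merge(semantic: list[str], keyword: list[str],
              alpha: float = 0.5, k: int = 60) -> list[str]:
    """Merge two ranked lists of chunk IDs with weighted RRF.

    Each list contributes weight / (k + rank) per item, with rank starting
    at 1. Weighting the semantic list by alpha and the keyword list by
    (1 - alpha) is an assumption, not confirmed TokenKeeper behavior.
    """
    scores: dict[str, float] = defaultdict(float)
    for weight, ranking in ((alpha, semantic), (1.0 - alpha, keyword)):
        if weight == 0:
            continue  # a zero-weight list contributes no candidates
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += weight / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


merged = rrf_merge(["a", "b", "c"], ["c", "a", "d"])
print(merged)  # ['a', 'c', 'b', 'd']
```

Chunks that rank well in both lists ("a" and "c" here) rise above chunks that appear in only one, which is the point of fusing ranks rather than raw scores.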
Quick Start
Package name: TokenKeeper is the project brand name. The PyPI package is tokenkeeper:
pip install tokenkeeper
Until published to PyPI, install from source with uv sync.
Prerequisites
Install
git clone https://github.com/admin-sosys/TokenKeeper.git
cd TokenKeeper
uv sync
ollama pull nomic-embed-text
Add to Any Project
Create .mcp.json in your project root:
{
  "mcpServers": {
    "tokenkeeper": {
      "command": "/path/to/TokenKeeper/.venv/bin/python",
      "args": ["-m", "tokenkeeper"],
      "env": {
        "TOKENKEEPER_PROJECT": "${workspaceFolder}"
      }
    }
  }
}
Windows: Use .venv\Scripts\python.exe instead of .venv/bin/python.
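If you set up several projects, the configuration above can be generated rather than hand-edited. The sketch below is a hypothetical helper (`build_mcp_config` is not part of TokenKeeper) that picks the interpreter path per platform. It emits forward slashes on Windows as well, on the assumption that Windows accepts them and to avoid escaping backslashes in JSON.

```python
import json
import sys
from pathlib import PurePosixPath


def build_mcp_config(tokenkeeper_dir: str, platform: str = sys.platform) -> dict:
    """Return the .mcp.json structure shown above for one TokenKeeper checkout."""
    base = PurePosixPath(tokenkeeper_dir)
    if platform.startswith("win"):
        python = base / ".venv" / "Scripts" / "python.exe"
    else:
        python = base / ".venv" / "bin" / "python"
    return {
        "mcpServers": {
            "tokenkeeper": {
                "command": str(python),
                "args": ["-m", "tokenkeeper"],
                "env": {"TOKENKEEPER_PROJECT": "${workspaceFolder}"},
            }
        }
    }


# "/path/to/TokenKeeper" is the same placeholder used in the example above.
print(json.dumps(build_mcp_config("/path/to/TokenKeeper"), indent=2))
```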
Start (or restart) Claude Code in that project. TokenKeeper will:
- Create a .rag/ directory for index data
- Index all markdown, JSON, and code files
- Expose 4 MCP tools for search and management
Add .rag/ to your project's .gitignore.
Verify
Ask Claude Code:
Check the indexing status
Then test a search:
Search the knowledge base for "authentication flow and session management"
MCP Tools
| Tool | Purpose |
|---|---|
| search_knowledge | Hybrid semantic + keyword search across indexed content |
| indexing_status | Check if indexing is complete, in progress, or failed |
| reindex_documents | Trigger manual reindexing (all or specific files) |
| get_index_stats | Index statistics: file count, chunk count, timestamps |
search_knowledge Parameters
| Param | Type | Default | Description |
|---|---|---|---|
| query | string | required | Natural language search query |
| top_k | int | 10 | Results to return (1-50) |
| alpha | float | 0.5 | Hybrid weight: 0.0 = keyword only, 1.0 = semantic only |
| mode | string | "hybrid" | "hybrid", "semantic", or "keyword" |
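The parameter constraints above can be expressed as a small validated container. This is a sketch only: `SearchArgs` and `to_arguments` are hypothetical names, and the server presumably performs its own validation. It can be useful when building tool-call arguments in tests or scripts.

```python
from dataclasses import asdict, dataclass

VALID_MODES = ("hybrid", "semantic", "keyword")


@dataclass(frozen=True)
class SearchArgs:
    """Arguments for search_knowledge, with the documented defaults."""
    query: str
    top_k: int = 10
    alpha: float = 0.5
    mode: str = "hybrid"

    def __post_init__(self) -> None:
        # Enforce the ranges listed in the parameter table.
        if not self.query.strip():
            raise ValueError("query is required")
        if not 1 <= self.top_k <= 50:
            raise ValueError("top_k must be between 1 and 50")
        if not 0.0 <= self.alpha <= 1.0:
            raise ValueError("alpha must be between 0.0 and 1.0")
        if self.mode not in VALID_MODES:
            raise ValueError(f"mode must be one of {VALID_MODES}")

    def to_arguments(self) -> dict:
        """Return a plain dict suitable for a tool-call payload."""
        return asdict(self)


print(SearchArgs("authentication flow", top_k=5).to_arguments())
```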
Configuration
TokenKeeper auto-creates .rag/.rag-config.json on first run:
{
  "content_mode": "docs",
  "chunk_size": 1000,
  "overlap": 200,
  "alpha": 0.5,
  "mode": "hybrid",
  "watch_enabled": true,
  "debounce_seconds": 3.0
}
| Setting | Default | Description |
|---|---|---|
| content_mode | "docs" | "docs" (md/json), "code" (source files), or "both" |
| chunk_size | 1000 | Characters per chunk (100-10000) |
| overlap | 200 | Character overlap between chunks |
| alpha | 0.5 | Hybrid search weight |
| mode | "hybrid" | Search strategy |
| watch_enabled | true | Auto-reindex on file changes |
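To show how chunk_size and overlap interact, here is a minimal sketch of fixed-size character chunking. It assumes plain sliding-window splitting; the real indexer may split on headings or other structure, and `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks of chunk_size characters that share `overlap`
    characters with the previous chunk (sliding-window assumption)."""
    if not 100 <= chunk_size <= 10_000:
        raise ValueError("chunk_size must be between 100 and 10000")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

With the defaults, each window advances 800 characters, so a larger overlap means more chunks (and more storage) for the same document.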
Architecture
TokenKeeper/
  src/tokenkeeper/
    server.py       # FastMCP server + lifespan
    indexer.py      # Discovery -> ingestion -> embedding -> storage
    search.py       # Hybrid search with RRF fusion
    embeddings.py   # Ollama (local) or Google Gemini (cloud)
    storage.py      # ChromaDB persistent client
    bm25_index.py   # BM25 keyword index
    watcher.py      # File system monitoring with debounce
    config.py       # Pydantic configuration
    health.py       # Startup health checks
Stack: Python 3.10+ | FastMCP | ChromaDB 1.5.0 | Ollama | BM25
Embedding Providers
Ollama (Default, Local)
- Model: nomic-embed-text (768 dimensions)
- No API key needed
- Runs on CPU (no GPU required)
Google Gemini (Optional, Cloud)
- Model: gemini-embedding-001 (3072 dimensions)
- Requires GOOGLE_API_KEY environment variable
- Higher quality embeddings, but requires internet
File Types Indexed
| Mode | Extensions |
|---|---|
| "docs" | .md, .mdx, .json |
| "code" | .ts, .tsx, .js, .jsx, .py, .mjs, .go, .rs, .java, .rb, .c, .cpp, .h |
| "both" | All of the above |
Always excluded: node_modules/, .git/, .next/, __pycache__/, .rag/, dist/, build/
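The extension table and exclusion list above translate into a simple discovery pass. The sketch below is an illustration under the assumption that filtering is purely by file suffix and directory name; `discover_files` is a hypothetical name, not the indexer's actual function.

```python
from pathlib import Path

EXTENSIONS = {
    "docs": {".md", ".mdx", ".json"},
    "code": {".ts", ".tsx", ".js", ".jsx", ".py", ".mjs", ".go", ".rs",
             ".java", ".rb", ".c", ".cpp", ".h"},
}
EXTENSIONS["both"] = EXTENSIONS["docs"] | EXTENSIONS["code"]

EXCLUDED_DIRS = {"node_modules", ".git", ".next", "__pycache__",
                 ".rag", "dist", "build"}


def discover_files(root: Path, content_mode: str = "docs") -> list[Path]:
    """Return files under root whose suffix matches content_mode,
    skipping anything inside an excluded directory."""
    wanted = EXTENSIONS[content_mode]
    found = []
    for path in root.rglob("*"):
        parent_parts = path.relative_to(root).parts[:-1]
        if any(part in EXCLUDED_DIRS for part in parent_parts):
            continue
        if path.is_file() and path.suffix.lower() in wanted:
            found.append(path)
    return sorted(found)


if __name__ == "__main__":
    for file in discover_files(Path("."), "both"):
        print(file)
```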
Performance
| Metric | Value |
|---|---|
| First index (500 files) | ~3-5 minutes |
| Subsequent startups | ~5 seconds (cached) |
| Search latency | ~150ms per query |
| Storage | ~100-200 MB per 2000-file project |
Testing
# All tests (skip Ollama-dependent if not running)
uv run pytest tests/ -v --tb=short
# Token savings benchmark
uv run pytest tests/test_practical_token_savings.py -v -s
# Agent comparison (RAG vs traditional)
uv run pytest tests/test_agent_comparison.py -v -s
Troubleshooting
| Issue | Fix |
|---|---|
| "Ollama connection refused" | Run ollama serve to start the server |
| "nomic-embed-text not found" | Run ollama pull nomic-embed-text |
| Claude Code doesn't show RAG tools | Ensure .mcp.json is in project root, restart Claude Code |
| 0 chunks indexed | Check TOKENKEEPER_PROJECT env var points to your project root |
| Slow first index | Normal — subsequent starts load cached ChromaDB in ~5 seconds |
| Search returns irrelevant results | Try mode: "keyword" or lower alpha to 0.3 |
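The first two rows of the table can be checked from a script. The sketch below queries Ollama's /api/tags endpoint on its default address (http://localhost:11434) and looks for the embedding model. The helper names are hypothetical, and this is not the check performed by TokenKeeper's own health.py.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address


def model_available(tags_payload: dict, model: str = "nomic-embed-text") -> bool:
    """Return True if `model` appears in an Ollama /api/tags response."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    # Ollama reports names with a tag suffix, e.g. "nomic-embed-text:latest".
    return any(name.split(":")[0] == model for name in names)


def check_ollama(base_url: str = OLLAMA_URL) -> str:
    """Return "ok" or a hint matching the troubleshooting table."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=3) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError):
        return "Ollama not reachable: run `ollama serve`"
    if not model_available(payload):
        return "Model missing: run `ollama pull nomic-embed-text`"
    return "ok"


if __name__ == "__main__":
    print(check_ollama())
```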
Docs
- QUICKSTART.md — Setup, toggling, A/B testing, GSD workflow integration
- IMPLEMENTATION-GUIDE.md — Architecture deep dive, cost analysis, integration patterns
License
MIT