Local RAG memory for Claude Code -- reduce prompt tokens by 80%

Maintainers

astrium

TokenKeeper

Local RAG memory for Claude Code. Reduce prompt token consumption by ~80% on knowledge-heavy projects.

TokenKeeper is an MCP server that indexes your project's documents and code, then exposes semantic search tools to Claude Code. Instead of loading entire files into context, your agents query for only the relevant chunks.

The Problem

On a project with 34 phases of planning docs, a single agent cycle loads 141K tokens (70% of context) just for background knowledge — before it starts working. Quality degrades as context fills up.

The Solution

TokenKeeper replaces "load everything" with "query what's relevant":

Metric	Traditional	With TokenKeeper
Prompt tokens	141,345	26,959
Context used	70.7%	13.5%
Tokens saved	—	114,386 (80.9%)

Your agents stay in the high-quality zone of their context window.
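
The figures in the table follow from simple arithmetic, and the sketch below recomputes them. The two token counts come from the table; the 200,000-token context window is an assumption inferred from the percentages (141,345 tokens is 70.7% of roughly 200,000), not a value stated by the project.

```python
# Recompute the savings table from the two measured prompt sizes.
TRADITIONAL_TOKENS = 141_345   # from the table above
TOKENKEEPER_TOKENS = 26_959    # from the table above
CONTEXT_WINDOW = 200_000       # assumption: inferred from the percentages


def savings(before: int, after: int) -> tuple[int, float]:
    """Return (tokens saved, percent saved) going from `before` to `after`."""
    saved = before - after
    return saved, 100.0 * saved / before


def context_used(tokens: int, window: int = CONTEXT_WINDOW) -> float:
    """Return the share of the context window consumed, in percent."""
    return 100.0 * tokens / window


saved, pct = savings(TRADITIONAL_TOKENS, TOKENKEEPER_TOKENS)
print(f"Tokens saved: {saved:,} ({pct:.1f}%)")
print(f"Context used: {context_used(TRADITIONAL_TOKENS):.1f}% -> "
      f"{context_used(TOKENKEEPER_TOKENS):.1f}%")
```

Swapping in your own before/after prompt sizes gives a quick estimate for a different project.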

How It Works

Your project files
    |
    v
[Indexer] --> Chunks with embeddings --> ChromaDB (persistent vectors)
                                              |
                                              v
Claude Code agent --> search_knowledge("topic") --> Top-k relevant chunks

Hybrid search — semantic similarity (vector) + keyword matching (BM25), merged via Reciprocal Rank Fusion
Local-first — Ollama for embeddings, ChromaDB for storage. No cloud, no API keys required
Auto-indexing — file watcher detects changes and re-indexes automatically
Per-project isolation — each project gets its own .rag/ directory
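
To make the hybrid search bullet concrete, here is a minimal sketch of weighted Reciprocal Rank Fusion. It is not TokenKeeper's actual code: the function name rrf_fuse, the constant k=60 (a common default for RRF), and the way alpha weights the two lists are all assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(semantic, keyword, alpha=0.5, k=60):
    """Merge two ranked lists of chunk ids with weighted Reciprocal Rank Fusion.

    Each list contributes weight / (k + rank) per chunk, with rank starting at 1.
    Assumption: alpha weights the semantic list and (1 - alpha) the keyword list.
    """
    scores = defaultdict(float)
    for weight, ranking in ((alpha, semantic), (1.0 - alpha, keyword)):
        if weight == 0:
            continue  # a zero-weight list contributes nothing
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += weight / (k + rank)
    # Highest fused score first; ties broken by chunk id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical chunk ids, best match first in each list.
semantic_hits = ["auth.md#2", "session.md#1", "intro.md#0"]
keyword_hits = ["session.md#1", "tokens.md#4", "auth.md#2"]
print(rrf_fuse(semantic_hits, keyword_hits))
```

Chunks that rank well in both lists rise to the top, which is why a fused ranking tends to be more robust than either the vector or the BM25 ranking alone.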

Quick Start

Package name: TokenKeeper is the project brand name. The PyPI package is tokenkeeper:
pip install tokenkeeper
Version 0.1.0 is published on PyPI. To run from a source checkout instead, use uv sync as described under Install.

Prerequisites

Python 3.10+
Ollama installed and running
uv (Python package manager)

Install

git clone https://github.com/admin-sosys/TokenKeeper.git
cd TokenKeeper
uv sync
ollama pull nomic-embed-text

Add to Any Project

Create .mcp.json in your project root:

{
  "mcpServers": {
    "tokenkeeper": {
      "command": "/path/to/TokenKeeper/.venv/bin/python",
      "args": ["-m", "tokenkeeper"],
      "env": {
        "TOKENKEEPER_PROJECT": "${workspaceFolder}"
      }
    }
  }
}

Windows: Use .venv\Scripts\python.exe instead of .venv/bin/python

Start (or restart) Claude Code in that project. TokenKeeper will:

Create a .rag/ directory for index data
Index markdown and JSON files by default (code files as well when content_mode is "code" or "both")
Expose 4 MCP tools for search and management

Add .rag/ to your project's .gitignore.
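
If you set up several projects, the .mcp.json file can be generated rather than written by hand. The helper below is a sketch: build_mcp_config and write_mcp_config are hypothetical names, not part of TokenKeeper. It reproduces the configuration shown above and picks the interpreter path for the current platform.

```python
import json
import os
from pathlib import Path


def build_mcp_config(tokenkeeper_dir: Path, windows: bool = os.name == "nt") -> dict:
    """Build the .mcp.json content shown above for a given TokenKeeper checkout."""
    venv = Path(tokenkeeper_dir) / ".venv"
    # Windows virtualenvs keep the interpreter under Scripts\, others under bin/.
    python = venv / "Scripts" / "python.exe" if windows else venv / "bin" / "python"
    return {
        "mcpServers": {
            "tokenkeeper": {
                "command": str(python),
                "args": ["-m", "tokenkeeper"],
                "env": {"TOKENKEEPER_PROJECT": "${workspaceFolder}"},
            }
        }
    }


def write_mcp_config(project_root: Path, tokenkeeper_dir: Path) -> Path:
    """Write .mcp.json into project_root and return its path."""
    target = Path(project_root) / ".mcp.json"
    config = build_mcp_config(tokenkeeper_dir)
    target.write_text(json.dumps(config, indent=2) + "\n")
    return target
```

Calling write_mcp_config(Path("."), Path("/path/to/TokenKeeper")) from a project root writes the file; replace the second argument with the location of your own checkout.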

Verify

Ask Claude Code:

Check the indexing status

Then test a search:

Search the knowledge base for "authentication flow and session management"

MCP Tools

Tool	Purpose
`search_knowledge`	Hybrid semantic + keyword search across indexed content
`indexing_status`	Check if indexing is complete, in progress, or failed
`reindex_documents`	Trigger manual reindexing (all or specific files)
`get_index_stats`	Index statistics — file count, chunk count, timestamps

search_knowledge Parameters

Param	Type	Default	Description
`query`	string	required	Natural language search query
`top_k`	int	10	Results to return (1-50)
`alpha`	float	0.5	Hybrid weight: 0.0 = keyword only, 1.0 = semantic only
`mode`	string	"hybrid"	`"hybrid"`, `"semantic"`, or `"keyword"`
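
The parameter table can be read as a small set of validation rules. The sketch below encodes them in a hypothetical SearchArgs dataclass; it is not how the server itself is implemented. The 0.0 to 1.0 range for alpha is inferred from the description of its two endpoints.

```python
from dataclasses import dataclass

VALID_MODES = ("hybrid", "semantic", "keyword")


@dataclass(frozen=True)
class SearchArgs:
    """Hypothetical container mirroring the documented search_knowledge parameters."""
    query: str
    top_k: int = 10
    alpha: float = 0.5
    mode: str = "hybrid"

    def __post_init__(self):
        if not self.query.strip():
            raise ValueError("query is required")
        if not 1 <= self.top_k <= 50:
            raise ValueError("top_k must be between 1 and 50")
        # Assumption: alpha is bounded by its documented endpoints.
        if not 0.0 <= self.alpha <= 1.0:
            raise ValueError("alpha must be between 0.0 and 1.0")
        if self.mode not in VALID_MODES:
            raise ValueError(f"mode must be one of {VALID_MODES}")


args = SearchArgs(query="authentication flow and session management", top_k=5, alpha=0.3)
print(args)
```

Lowering alpha, as in this example, leans the ranking toward exact keyword matches while keeping some semantic signal.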

Configuration

TokenKeeper auto-creates .rag/.rag-config.json on first run:

{
  "content_mode": "docs",
  "chunk_size": 1000,
  "overlap": 200,
  "alpha": 0.5,
  "mode": "hybrid",
  "watch_enabled": true,
  "debounce_seconds": 3.0
}

Setting	Default	Description
`content_mode`	`"docs"`	`"docs"` (md/json), `"code"` (source files), or `"both"`
`chunk_size`	`1000`	Characters per chunk (100-10000)
`overlap`	`200`	Character overlap between chunks
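`alpha`	`0.5`	Hybrid search weight: 0.0 = keyword only, 1.0 = semantic only
`mode`	`"hybrid"`	Search strategy: `"hybrid"`, `"semantic"`, or `"keyword"`
`watch_enabled`	`true`	Auto-reindex on file changes
`debounce_seconds`	`3.0`	Seconds to wait after a file change before reindexing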
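
To show how chunk_size and overlap interact, here is a minimal character-window chunker. It is a sketch under the assumption of plain fixed-size windows; the real indexer may split on headings or syntax boundaries instead, and chunk_text is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters, each sharing
    `overlap` characters with the previous window."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


doc = "x" * 2600
print([len(c) for c in chunk_text(doc)])
```

With the defaults each window advances 800 characters, so a sentence that straddles a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.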

Architecture

TokenKeeper/
  src/tokenkeeper/
    server.py          # FastMCP server + lifespan
    indexer.py         # Discovery -> ingestion -> embedding -> storage
    search.py          # Hybrid search with RRF fusion
    embeddings.py      # Ollama (local) or Google Gemini (cloud)
    storage.py         # ChromaDB persistent client
    bm25_index.py      # BM25 keyword index
    watcher.py         # File system monitoring with debounce
    config.py          # Pydantic configuration
    health.py          # Startup health checks

Stack: Python 3.10+ | FastMCP | ChromaDB 1.5.0 | Ollama | BM25
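
The watcher.py entry mentions debouncing, which pairs with the debounce_seconds setting. The sketch below shows the general idea using only the standard library: a burst of change events is collapsed into a single callback once the files have been quiet for the delay. The Debouncer class is hypothetical and is not taken from the project's source.

```python
import threading


class Debouncer:
    """Collect changed paths and call `callback` once, `delay` seconds
    after the most recent change."""

    def __init__(self, delay, callback):
        self.delay = delay
        self.callback = callback
        self._pending = set()
        self._timer = None
        self._lock = threading.Lock()

    def notify(self, path):
        """Record a change and restart the countdown."""
        with self._lock:
            self._pending.add(path)
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.delay, self._flush)
            self._timer.daemon = True
            self._timer.start()

    def _flush(self):
        # Swap out the pending set under the lock, then call back outside it.
        with self._lock:
            batch, self._pending = self._pending, set()
        if batch:
            self.callback(sorted(batch))
```

A watcher would call notify(path) for every file system event and pass a reindexing function as the callback, so saving a file several times in quick succession triggers one reindex rather than several.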

Embedding Providers

Ollama (Default, Local)

Model: nomic-embed-text (768 dimensions)
No API key needed
Runs on CPU (no GPU required)

Google Gemini (Optional, Cloud)

Model: gemini-embedding-001 (3072 dimensions)
Requires GOOGLE_API_KEY environment variable
Higher quality embeddings, but requires internet

File Types Indexed

Mode	Extensions
`"docs"`	`.md`, `.mdx`, `.json`
`"code"`	`.ts`, `.tsx`, `.js`, `.jsx`, `.py`, `.mjs`, `.go`, `.rs`, `.java`, `.rb`, `.c`, `.cpp`, `.h`
`"both"`	All of the above

Always excluded: node_modules/, .git/, .next/, __pycache__/, .rag/, dist/, build/
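
The extension table and exclusion list translate directly into a file discovery routine. The following is a sketch built from the values documented above; discover_files is a hypothetical name, and the project's own indexer may apply further rules.

```python
from pathlib import Path

EXTENSIONS = {
    "docs": {".md", ".mdx", ".json"},
    "code": {".ts", ".tsx", ".js", ".jsx", ".py", ".mjs", ".go", ".rs",
             ".java", ".rb", ".c", ".cpp", ".h"},
}
EXTENSIONS["both"] = EXTENSIONS["docs"] | EXTENSIONS["code"]
EXCLUDED_DIRS = {"node_modules", ".git", ".next", "__pycache__", ".rag", "dist", "build"}


def discover_files(root: Path, content_mode: str = "docs") -> list[Path]:
    """Return files under root whose extension matches content_mode,
    skipping anything inside an excluded directory."""
    root = Path(root)
    wanted = EXTENSIONS[content_mode]
    found = []
    for path in root.rglob("*"):
        # Directory names between root and the file itself.
        parent_dirs = path.relative_to(root).parts[:-1]
        if EXCLUDED_DIRS.intersection(parent_dirs):
            continue
        if path.is_file() and path.suffix.lower() in wanted:
            found.append(path)
    return sorted(found)
```

This version still walks into excluded directories before discarding their contents, which keeps the code short; pruning them during traversal would be faster on projects with a large node_modules/.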

Performance

Metric	Value
First index (500 files)	~3-5 minutes
Subsequent startups	~5 seconds (cached)
Search latency	~150ms per query
Storage	~100-200 MB per 2000-file project

Testing

# All tests (Ollama-dependent tests are skipped when Ollama is not running)
uv run pytest tests/ -v --tb=short

# Token savings benchmark
uv run pytest tests/test_practical_token_savings.py -v -s

# Agent comparison (RAG vs traditional)
uv run pytest tests/test_agent_comparison.py -v -s

Troubleshooting

Issue	Fix
"Ollama connection refused"	Run `ollama serve` to start the server
"nomic-embed-text not found"	Run `ollama pull nomic-embed-text`
Claude Code doesn't show RAG tools	Ensure `.mcp.json` is in project root, restart Claude Code
0 chunks indexed	Check `TOKENKEEPER_PROJECT` env var points to your project root
Slow first index	Normal — subsequent starts load cached ChromaDB in ~5 seconds
Search returns irrelevant results	Try `mode: "keyword"` or lower `alpha` to 0.3

Docs

QUICKSTART.md — Setup, toggling, A/B testing, GSD workflow integration
IMPLEMENTATION-GUIDE.md — Architecture deep dive, cost analysis, integration patterns

License

MIT

Release history

0.1.0 (this version), released Feb 21, 2026

Download files

Source Distribution

tokenkeeper-0.1.0.tar.gz (306.1 kB)

Uploaded Feb 21, 2026 Source

Built Distribution

tokenkeeper-0.1.0-py3-none-any.whl (54.3 kB)

Uploaded Feb 21, 2026 Python 3

File details

Details for the file tokenkeeper-0.1.0.tar.gz.

File metadata

Download URL: tokenkeeper-0.1.0.tar.gz
Upload date: Feb 21, 2026
Size: 306.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tokenkeeper-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`81f1264f2016084a275465267f41c355b75fb5bb81dc0bfccb15efb8d6ce8341`
MD5	`6978e31083d1bbe75d397a46aa947129`
BLAKE2b-256	`9de4cc7d345a7a841d39b97bfeca33369aae089db193b1d97957ff27e46146a4`

Provenance

The following attestation bundles were made for tokenkeeper-0.1.0.tar.gz:

Publisher: publish.yml on admin-sosys/TokenKeeper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tokenkeeper-0.1.0.tar.gz
- Subject digest: 81f1264f2016084a275465267f41c355b75fb5bb81dc0bfccb15efb8d6ce8341
- Sigstore transparency entry: 975662026
- Sigstore integration time: Feb 21, 2026
Source repository:
- Permalink: admin-sosys/TokenKeeper@78f7fae7e88e375eada7d16953ae84c61b979342
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/admin-sosys
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@78f7fae7e88e375eada7d16953ae84c61b979342
- Trigger Event: push

File details

Details for the file tokenkeeper-0.1.0-py3-none-any.whl.

File metadata

Download URL: tokenkeeper-0.1.0-py3-none-any.whl
Upload date: Feb 21, 2026
Size: 54.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tokenkeeper-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`96cba667a80bf324cdf1608ca7db0eb8079d9de88a33acc4cedcfd1b233623d9`
MD5	`bbf7b19b6cac67da5002be7af0ce3518`
BLAKE2b-256	`b68151a07f21e53ecd9bd25a8ba25603b483f729ae4a4d7c3f583ab9ac1d4f3e`

Provenance

The following attestation bundles were made for tokenkeeper-0.1.0-py3-none-any.whl:

Publisher: publish.yml on admin-sosys/TokenKeeper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tokenkeeper-0.1.0-py3-none-any.whl
- Subject digest: 96cba667a80bf324cdf1608ca7db0eb8079d9de88a33acc4cedcfd1b233623d9
- Sigstore transparency entry: 975662027
- Sigstore integration time: Feb 21, 2026
Source repository:
- Permalink: admin-sosys/TokenKeeper@78f7fae7e88e375eada7d16953ae84c61b979342
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/admin-sosys
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@78f7fae7e88e375eada7d16953ae84c61b979342
- Trigger Event: push
