Token-saving memory layer for AI coding assistants — local hybrid RAG (BM25 + semantic + reranker) over your project, exposed via MCP.

Tokengram — Hybrid RAG Memory Layer for AI Assistants

Azerbaijani version: README.az.md

Local MCP server that lets Claude Code (and other MCP-capable AI assistants) search your project via hybrid RAG instead of Read-ing whole files. Measured token savings: ~70% on a typical Python project.

Progressive RAM: 93 MB (BM25-only) → 700 MB (semantic) → 900 MB (+ reranker).

What it is

When you ask Claude Code a question about your code, the assistant normally reads whole files into its context window, which costs 5–10K input tokens per query. Tokengram indexes your project ahead of time and returns only the 3–5 most relevant chunks when Claude asks. The result:

⚡ ~70% fewer input tokens → cheaper API calls
🔒 Local-only — your code never leaves the machine (local embeddings, local search)
♻️ Auto re-index — saved files are picked up by the MCP server automatically
🎯 Hybrid search — semantic (meaning) + BM25 (keyword) + cross-encoder reranker
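
How the BM25 and semantic rankings are merged is not described here. One common way to combine two ranked lists is reciprocal rank fusion (RRF). The sketch below assumes that technique and uses made-up chunk IDs, so read it as an illustration of hybrid ranking rather than Tokengram's actual scoring.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of chunk IDs into one list.

    Each chunk scores 1 / (k + rank) for every list it appears in
    (rank starts at 1). k=60 is the value commonly used with RRF;
    it is an assumption here, not a Tokengram setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results for the query "where is authentication handled?"
bm25_hits = ["auth.py:login", "views.py:index", "utils.py:hash_pw"]
semantic_hits = ["middleware.py:check_token", "auth.py:login", "utils.py:hash_pw"]

fused = reciprocal_rank_fusion([bm25_hits, semantic_hits])
print(fused[:3])
```

Chunks that appear in both lists (here `auth.py:login` and `utils.py:hash_pw`) rise above chunks found by only one retriever, which is the point of a hybrid search.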

When it works

✅ Medium-to-large Python / JS / Go / Rust projects (50+ files)
✅ Contextual questions: "where is this function called from?"
✅ Refactoring (gathering all touch points)

When it doesn't (yet)

❌ Projects with < 20 files — Claude can just read them all
❌ Machines with < 4 GB RAM (embedding + reranker need headroom)
❌ Systems without Python installed

Install (~3 minutes)

Requirements

Python 3.10+
VS Code + Claude Code extension
4 GB RAM (8 GB recommended)
~500 MB disk (for model cache)

Windows

git clone <repo-url> mla-agent
cd mla-agent
./setup.ps1

macOS / Linux

git clone <repo-url> mla-agent
cd mla-agent
./setup.sh

After install

1. Close VS Code completely and reopen it (so it picks up the new MCP server entry).
2. In Claude Code, run /mcp; mla-agent should appear as connected.
3. Ask a question:

"Use mla-agent.search_code — where is authentication handled?"

Claude calls the MCP tool automatically, receives the top-4 matching chunks as context, and answers.

Indexing another project

Each project needs its own index.

Option 1 — run setup again with a project path

./setup.ps1 -ProjectDir "C:\path\to\my-project"

Option 2 — direct command

python main.py index /path/to/my-project --fresh

Note: chroma_db/ currently lives inside the Tokengram folder, so only one project can be indexed at a time. Multi-project support is on the roadmap.

CLI

python main.py index <dir> [--fresh]    # Index a project
python main.py stats                     # Index statistics
python main.py route <dir>               # Smart router suggestion
python main.py reset --yes               # Wipe the index
python main.py watch <dir>               # Auto re-index (Ctrl+C to stop)
python main.py ask "question"            # CLI-only (requires API key)

MCP tools

Claude Code discovers these automatically and calls them on demand:

Tool	What it does
`search_code(query, top_k, mode, file_type)`	3-tier search (see below)
`index_directory(path, fresh)`	Index a project
`index_file(path)`	Re-index a single file
`get_stats()`	Index statistics
`route_project(path)`	Project-size-aware recommendation
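
For readers who want to see what a tool call looks like on the wire, here is a minimal sketch of the JSON-RPC 2.0 `tools/call` message that an MCP client sends. The parameter names come from the table above; the value types, the defaults, and the `file_type` format (".py") are assumptions made for illustration.

```python
import json


def build_search_request(query, top_k=4, mode="fast", file_type=None, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` message for the search_code tool."""
    arguments = {"query": query, "top_k": top_k, "mode": mode}
    if file_type is not None:
        arguments["file_type"] = file_type  # value format is an assumption
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search_code", "arguments": arguments},
    }


request = build_search_request("where is authentication handled?", file_type=".py")
print(json.dumps(request, indent=2))
```

In normal use Claude Code builds and sends this message itself; you never have to write it by hand.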

`search_code` modes (progressive RAM)

`mode`	RAM delta	First call	Best for
`"fast"` (default)	0 MB	~50 ms	Function / class / file-name lookup, syntactic search
`"semantic"`	+400–700 MB	30–60 s cold	Conceptual queries ("how does auth work?")
`"rerank"`	+200 MB	+30 s first	Maximum ranking quality

Ideally, Claude starts with "fast" and stops there if a syntactic match answers the question. For conceptual questions it escalates to "semantic", and "rerank" is reserved for cases where ranking quality matters most.
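
The escalation policy can be sketched as a small loop. The `search` callable below is a hypothetical stand-in for the search_code tool, and the "too few hits" rule is an assumption; in practice the assistant decides when to escalate.

```python
def search_with_escalation(search, query, min_hits=1):
    """Try the cheap mode first and escalate only when it returns too little.

    `search` is any callable taking (query, mode) and returning a list of
    chunks. It stands in for the search_code tool and is hypothetical.
    """
    for mode in ("fast", "semantic"):
        hits = search(query, mode)
        if len(hits) >= min_hits:
            return mode, hits
    return "semantic", []


def fake_search(query, mode):
    # Stand-in backend: keyword mode only matches a literal identifier.
    if mode == "fast":
        return ["auth.py:login"] if "login" in query else []
    return ["auth.py:login", "middleware.py:check_token"]


print(search_with_escalation(fake_search, "login"))
print(search_with_escalation(fake_search, "how does auth work?"))
```

The first query is answered in "fast" mode, so the embedding model is never loaded; only the conceptual query pays the RAM and latency cost of "semantic".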

Slash commands

Drop-in actions inside Claude Code (each wraps the back-end Python script of the same name; the VS Code status-bar QuickPick calls them too):

Command	What it does
`/MlaOn`	Enable Read-enforcement (block large indexed files)
`/MlaOff`	Disable enforcement
`/MlaRefreshBar`	Force the status bar to refresh
`/MlaResetBar`	Reset savings history (old history is backed up)
`/MlaArch`	Show the auto-generated architecture doc
`/MlaSync`	Manually flush session log → history
`/MlaReindex`	Full re-index after extension list changes

See SLASH_COMMANDS.md for behaviour details and the 4-source token-tracking model.

Supported file types (36 extensions)

.py .sol .sql .md
.ts .tsx .js .jsx .mjs .cjs
.go .rs .java .kt .cs .cpp .cc .cxx .c .h .hpp
.rb .php .swift .dart
.html .css .scss
.json .yaml .yml .toml
.sh .bash .ps1 .tf

Excluded by default: node_modules/, dist/, build/, target/, .venv/, .next/, … plus lock files (package-lock.json, Cargo.lock, go.sum, …), minified bundles (*.min.js, *.bundle.js), source maps, and any file > 1 MB. All filters live in mla_constants.py.
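
A minimal sketch of how such filters could be applied to a candidate file. The sets below are small illustrative subsets, the `.map` suffix for source maps and the exact byte limit are assumptions, and `is_indexable` is a hypothetical helper; the real lists live in mla_constants.py.

```python
from pathlib import PurePosixPath

# Illustrative subsets only; the real lists live in mla_constants.py.
INCLUDED_EXTENSIONS = {".py", ".ts", ".go", ".rs", ".md"}
EXCLUDED_DIRS = {"node_modules", "dist", "build", "target", ".venv", ".next"}
EXCLUDED_NAMES = {"package-lock.json", "Cargo.lock", "go.sum"}
EXCLUDED_SUFFIXES = (".min.js", ".bundle.js", ".map")  # ".map" is an assumption
MAX_FILE_BYTES = 1_000_000  # "1 MB" read as 10**6 bytes (assumption)


def is_indexable(path, size_bytes):
    """Return True if a file passes the directory, name, size and extension filters."""
    p = PurePosixPath(path)
    if any(part in EXCLUDED_DIRS for part in p.parts[:-1]):
        return False  # lives under an excluded directory
    if p.name in EXCLUDED_NAMES or p.name.endswith(EXCLUDED_SUFFIXES):
        return False  # lock file, minified bundle, or source map
    if size_bytes > MAX_FILE_BYTES:
        return False  # too large to index
    return p.suffix in INCLUDED_EXTENSIONS


for candidate, size in [("src/app.py", 2_000), ("node_modules/x/index.ts", 500),
                        ("src/big.py", 2_000_000), ("Cargo.lock", 10_000)]:
    print(candidate, is_indexable(candidate, size))
```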

Localization

User-facing strings are localized. Two languages are bundled out of the box: English (default) and Azerbaijani.

# Linux / macOS
export MLA_LANG=az   # Azerbaijani
export MLA_LANG=en   # English (default)

# Windows PowerShell
$env:MLA_LANG = 'az'

The VS Code extension picks its locale from vscode.env.language (your VS Code display language).

To add another language: copy messages/en.json to messages/<code>.json, translate the values, and add the code to _SUPPORTED in i18n.py. For the extension, add a new entry to MESSAGES and to detectLocale() in vscode-extension/src/i18n.ts.
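
The lookup behind a function like tr() can be sketched as follows. This is not the contents of i18n.py: the message keys, the texts, and the fall-back-to-English behaviour are assumptions, and the dictionaries stand in for messages/en.json and messages/az.json.

```python
import os

_SUPPORTED = ("en", "az")
# Inline stand-ins for messages/<code>.json; keys and texts are invented.
_MESSAGES = {
    "en": {"index.done": "Indexed {count} files", "index.start": "Indexing..."},
    "az": {"index.done": "{count} fayl indeksləndi"},
}


def current_lang(environ=os.environ):
    """Read MLA_LANG, falling back to English for unset or unknown codes."""
    lang = environ.get("MLA_LANG", "en").lower()
    return lang if lang in _SUPPORTED else "en"


def tr(key, lang=None, **params):
    """Look up a message, falling back to English and then to the key itself."""
    lang = lang or current_lang()
    text = _MESSAGES[lang].get(key) or _MESSAGES["en"].get(key, key)
    return text.format(**params)


print(tr("index.done", lang="az", count=3))
print(tr("index.start", lang="az"))  # missing in az, so the English text is used
```

With a fallback like this, a partially translated messages/<code>.json still works: untranslated keys simply show up in English.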

VS Code status bar

The widget at the bottom-right shows a live X.Xk saved · YY% counter based on the last 24 hours of activity. Hovering reveals a breakdown (search / Grep / Read-block) plus lifetime totals.

Click the widget to open a QuickPick with all 6 actions (toggle, refresh, arch doc, sync, re-index, reset) — no terminal typing needed.

Auto re-index

Two parallel mechanisms feed the same queue (.mla_pending.txt); the MCP server's background worker drains it:

1. Built-in watcher (default; works in every IDE)

The MCP server uses watchdog to watch the project folder. VS Code, Antigravity, Cursor, plain editors — all good. No hooks needed.

Override the watched directory: MLA_WATCH_DIR=/path/to/project
Disable: MLA_DISABLE_WATCHER=1
Debounce: 2 s (rapid saves are coalesced)
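
To make "coalesced" concrete, here is a minimal debounce sketch in plain Python. It does not use watchdog and is not the server's implementation; the `Debouncer` class and its methods are hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
class Debouncer:
    """Coalesce rapid save events: a path is released only after it has
    been quiet for `delay` seconds."""

    def __init__(self, delay=2.0):
        self.delay = delay
        self._last_seen = {}  # path -> timestamp of the most recent event

    def touch(self, path, now):
        """Record a save event; a repeat event restarts the quiet period."""
        self._last_seen[path] = now

    def ready(self, now):
        """Return (and forget) the paths that have been quiet long enough."""
        due = sorted(p for p, t in self._last_seen.items() if now - t >= self.delay)
        for p in due:
            del self._last_seen[p]
        return due


d = Debouncer(delay=2.0)
d.touch("auth.py", now=0.0)
d.touch("auth.py", now=0.5)   # second save of the same file
d.touch("views.py", now=1.0)
print(d.ready(now=2.0))  # [] because neither file has been quiet for 2 s
print(d.ready(now=3.0))  # ['auth.py', 'views.py'], each queued once
```

Two quick saves of auth.py produce a single re-index request, which keeps the indexer from redoing work while you are still typing.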

2. Claude Code hooks (Claude Code only)

PostToolUse (~50 ms) after every Edit / Write / MultiEdit — queues the file
Stop — drains the queue and updates the architecture doc

Both mechanisms cooperate: the drainer hashes and deduplicates queue entries, so a file queued by both the watcher and a hook is re-indexed only once.
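
The hash-and-deduplicate step might look like the sketch below. The function name, the use of SHA-256, and the in-memory `seen_hashes` dictionary are assumptions for illustration; the queue is passed in as a list instead of being read from .mla_pending.txt.

```python
import hashlib


def drain_queue(queued_paths, read_bytes, seen_hashes):
    """Return the paths that actually need re-indexing.

    queued_paths: entries from the pending queue, possibly with duplicates.
    read_bytes:   callable path -> file content as bytes (injected for testing).
    seen_hashes:  dict path -> SHA-256 of the content indexed last time.
    """
    to_index = []
    for path in dict.fromkeys(queued_paths):  # drop duplicates, keep order
        digest = hashlib.sha256(read_bytes(path)).hexdigest()
        if seen_hashes.get(path) != digest:  # new file or changed content
            seen_hashes[path] = digest
            to_index.append(path)
    return to_index


files = {"auth.py": b"def login(): ...", "views.py": b"def index(): ..."}
seen = {}
# Both the watcher and a hook queued auth.py.
first = drain_queue(["auth.py", "views.py", "auth.py"], files.__getitem__, seen)
# auth.py is queued again, but its content has not changed.
second = drain_queue(["auth.py"], files.__getitem__, seen)
print(first, second)
```

The duplicate entry is dropped in the first pass, and the unchanged file is skipped in the second because its hash matches the stored one.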

FAQ

Do I need an API key?

No. When you use Tokengram via the MCP server inside Claude Code, all reasoning happens in Claude. An API key is only needed for the standalone CLI command python main.py ask.

Why is the first search slow?

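The first search_code call in "semantic" mode takes 10–30 s (30–60 s on a cold start) because the embedding model has to load into RAM. Subsequent calls are ~100 ms. The default "fast" mode is BM25-only and answers in ~50 ms without loading the model.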

What's the disk footprint?

Embedding model: ~130 MB (bge-small-en-v1.5)
Reranker model: ~90 MB (ms-marco-MiniLM-L-6-v2)
ChromaDB index: ~5–10 KB per file
Total for a 100-file project: ~220 MB.
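
The total is simply the two models plus a small per-file index cost. A quick back-of-the-envelope check, using the figures above and 1 MB = 1000 KB (`footprint_mb` is a hypothetical helper):

```python
EMBEDDING_MODEL_MB = 130      # bge-small-en-v1.5
RERANKER_MODEL_MB = 90        # ms-marco-MiniLM-L-6-v2
INDEX_KB_PER_FILE = (5, 10)   # low and high estimate per indexed file


def footprint_mb(n_files):
    """Return the (low, high) disk estimate in MB for a project of n_files."""
    models = EMBEDDING_MODEL_MB + RERANKER_MODEL_MB
    low, high = (n_files * kb / 1000 for kb in INDEX_KB_PER_FILE)
    return models + low, models + high


print(footprint_mb(100))    # (220.5, 221.0)
print(footprint_mb(5000))   # (245.0, 270.0)
```

The models dominate: even at 5,000 files the index adds only tens of megabytes.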

Something broke?

Check PROBLEMS.md — 14 issues encountered during development and how each was resolved. The test pass record is in TEST_REPORT.md (66/66 green).

Project layout

mla-agent/
├── config.py              # All parameters (re-exports from mla_constants)
├── mla_constants.py       # Single source for extension / dir / file filters
├── i18n.py                # tr() — localized message lookup
├── messages/
│   ├── en.json            # English strings (default)
│   └── az.json            # Azerbaijani strings
├── document_loader.py     # AST + regex + fallback chunking
├── vector_store.py        # ChromaDB wrapper
├── search_engine.py       # Hybrid (semantic + BM25 + reranker)
├── llm_agent.py           # Standalone CLI agent (optional)
├── smart_router.py        # Project-size-aware mode selector
├── file_watcher.py        # Standalone watchdog (optional)
├── main.py                # CLI entry point
├── mcp_server.py          # MCP server for Claude Code
├── hooks/                 # All Stop / PreToolUse / PostToolUse hooks
├── tests/                 # 5 phase suites, 66 cases total
├── vscode-extension/      # Status-bar widget + QuickPick menu
├── .mcp.json              # Claude Code MCP registration
├── .claude/
│   ├── settings.json      # Hook configuration
│   └── commands/          # Slash command definitions
├── setup.ps1, setup.sh    # Install scripts
├── TEST_PLAN.md, TEST_REPORT.md
└── PROBLEMS.md
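
As an illustration of the "AST + fallback" idea noted next to document_loader.py, the sketch below splits Python source into one chunk per top-level function or class using the standard ast module, and falls back to a single whole-file chunk when the source does not parse. It is not the project's loader; the function name and the chunk shape are assumptions.

```python
import ast


def chunk_python_source(source):
    """Split Python source into (name, text) chunks, one per top-level
    function or class. Falls back to one whole-file chunk on a syntax
    error or when no such definitions exist."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return [("<file>", source)]
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno and end_lineno are 1-based and inclusive.
            text = "\n".join(lines[node.lineno - 1:node.end_lineno])
            chunks.append((node.name, text))
    return chunks or [("<file>", source)]


sample = "import os\n\ndef a():\n    return 1\n\nclass B:\n    x = 2\n"
for name, text in chunk_python_source(sample):
    print(name, "->", repr(text))
```

Chunking along definition boundaries keeps each retrieved snippet self-contained, which matters when only a handful of chunks are sent back to the assistant.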

License

Commercial. See vscode-extension/LICENSE.txt. Feedback: n.ilkin.humbatov@gmail.com

Project details

Version 0.1.0, released May 18, 2026

Built distribution: tokengram-0.1.0-py3-none-any.whl (736.2 kB), uploaded May 18, 2026, Python 3

File details

Details for the file tokengram-0.1.0-py3-none-any.whl.

File metadata

Download URL: tokengram-0.1.0-py3-none-any.whl
Upload date: May 18, 2026
Size: 736.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for tokengram-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8fa4056ce365b99fe4bcebcf82864214df9a906f2a3e21a2a460592ad4ed5ff5`
MD5	`c34658714ae6d5a5a5f05e5ba1f7c572`
BLAKE2b-256	`44bd4909875205a2c3288091efb5b328ebc2670c4ce999ef1f278565e239da5e`
