# Token Reducer

Open-source, local-first context compression and token reduction pipeline for Claude Code, with hybrid retrieval (BM25 + vectors), reranking, and AST-aware chunking.

Cut Claude API costs by 90%+ with intelligent context compression. The open-source alternative to expensive context management tools.
## The Problem
Every time you use Claude with a large codebase, you're paying for thousands of tokens that aren't relevant to your query. Most context management tools either:
- Send everything (expensive)
- Truncate blindly (loses important context)
- Require heavy Language Servers (slow, resource-intensive)
## The Solution
Token Reducer is a local-first, intelligent context compression pipeline that:
- Reduces tokens by 90-98% while preserving semantic relevance
- Runs entirely locally — no API calls, no data leaving your machine
- Works in milliseconds — faster than Language Server alternatives
- Understands code semantically — AST parsing, not just text matching
```
┌──────────────────┐     ┌───────────────┐     ┌──────────────────┐
│  Your Codebase   │────▶│ Token Reducer │────▶│    Compressed    │
│ (50,000 tokens)  │     │    Pipeline   │     │  Context (500t)  │
└──────────────────┘     └───────────────┘     └──────────────────┘
                                 │
                       ┌─────────┴─────────┐
                       │ - AST Chunking    │
                       │ - BM25 + Vector   │
                       │ - TextRank        │
                       │ - Import Graph    │
                       │ - 2-Hop Symbols   │
                       └───────────────────┘
```
## Easy Install

### Option 1 — Claude Code `/plugin` Command (Recommended)

**Step 1:** Register the marketplace (one-time setup):

```
/plugin marketplace add Madhan230205/token-reducer
```

This registers the marketplace as `Madhan230205-token-reducer`.

**Step 2:** Install:

```
/plugin install token-reducer@Madhan230205-token-reducer
```

For a project-scoped install:

```
/plugin install token-reducer@Madhan230205-token-reducer --scope project
```

Already ran Step 1 before? Just run `/plugin install token-reducer@Madhan230205-token-reducer` — no need to add the marketplace again.
### Option 2 — Git Clone (Manual)

```bash
# 1. Clone into your Claude plugins folder
git clone https://github.com/Madhan230205/token-reducer.git ~/.claude/plugins/token-reducer

# 2. Install dependencies (optional but recommended for best results)
pip install -r ~/.claude/plugins/token-reducer/requirements-optional.txt
```

Windows users: replace `~/.claude/plugins/` with `%USERPROFILE%\.claude\plugins\`.

Then open `~/.claude/settings.json` and add:

```json
{
  "plugins": ["~/.claude/plugins/token-reducer"]
}
```

Restart Claude Code. Done.
What `requirements-optional.txt` installs:

| Package | Purpose |
|---|---|
| `sentence-transformers` | Neural embeddings for smarter retrieval |
| `hnswlib` / `faiss-cpu` | Fast approximate nearest-neighbor search |
| `tree-sitter` + language grammars | AST-based code chunking (Python, JS, TS, Go, Rust, Java, C/C++, Ruby) |

If you skip this step, Token Reducer still works using hash embeddings and regex chunking — no ML libraries required.
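To make the hash-embedding fallback concrete, here is a minimal sketch of the general technique (function names are illustrative, not the project's actual API): each token is hashed into a bucket of a fixed-size vector, which is then L2-normalised so cosine similarity works without any ML libraries.

```python
import hashlib
import math

def hash_embedding(text: str, dim: int = 1024) -> list[float]:
    """Deterministic bag-of-words hash embedding (illustrative sketch):
    each token is hashed into one of `dim` buckets, then the vector
    is L2-normalised so dot products behave like cosine similarity."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))
```

Texts sharing tokens with the query tend to score higher, which is enough for keyword-flavoured retrieval when neural embeddings are unavailable.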
### Option 3 — Zero-Dependency Quick Start

No pip, no ML libs — runs immediately after cloning:

```bash
git clone https://github.com/Madhan230205/token-reducer.git
cd token-reducer
python scripts/context_pipeline.py run \
  --inputs ./src \
  --query "Find auth logic" \
  --embedding-backend hash \
  --db .cache/index.db
```
## Features

### Core Pipeline
- Hybrid Retrieval — BM25 + semantic vector search with intelligent fallback
- AST-Based Chunking — Tree-sitter parsing for Python, TypeScript, Go, Rust, Java, and more
- TextRank Compression — Graph-based sentence scoring for intelligent summarization
- Sub-100ms Queries — SQLite FTS5 + HNSW indexes for instant results
- Local-First — Everything runs on your machine, no external APIs
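The "intelligent fallback" in hybrid retrieval can be sketched roughly like this (the searcher callables are hypothetical stand-ins; a real implementation would also normalise BM25 and cosine scores before merging them):

```python
def hybrid_retrieve(query, bm25_search, vector_search, k=5, min_hits=3):
    """Fallback-style hybrid retrieval sketch: trust BM25 first, and
    only pay for a vector search when keyword matching comes back thin.
    `bm25_search` / `vector_search` return lists of (doc_id, score)."""
    results = bm25_search(query, k)
    if len(results) < min_hits:
        # Keyword search was too sparse: merge in vector hits,
        # keeping the best score seen per document.
        merged = dict(results)
        for doc_id, score in vector_search(query, k):
            merged[doc_id] = max(score, merged.get(doc_id, 0.0))
        results = sorted(merged.items(), key=lambda p: -p[1])[:k]
    return results
```

The design point is cost: BM25 over an FTS5 index is nearly free, so the vector index is only consulted when it is likely to add value.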
### LSP-Killer Features
- Import Graph — Automatically maps file dependencies without Language Server
- 2-Hop Symbol Expansion — Auto "go-to-definition" for referenced functions
- Diff Protocol — SEARCH/REPLACE edit format with automatic application
- Semantic Clustering — Groups similar chunks to avoid redundancy
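2-hop symbol expansion is essentially a bounded breadth-first walk over a reference graph. A minimal sketch, assuming a precomputed map from each symbol to the symbols its body mentions (the map and names here are hypothetical):

```python
from collections import deque

def two_hop_expand(seed_symbols, references, max_hops=2):
    """Expand a set of symbols by following reference edges up to
    `max_hops` away: a lightweight go-to-definition without an LSP.
    `references` maps a symbol to the symbols its body mentions."""
    seen = set(seed_symbols)
    frontier = deque((s, 0) for s in seed_symbols)
    while frontier:
        symbol, depth = frontier.popleft()
        if depth == max_hops:
            continue  # hop budget exhausted along this path
        for ref in references.get(symbol, ()):
            if ref not in seen:
                seen.add(ref)
                frontier.append((ref, depth + 1))
    return seen

# Hypothetical reference map for illustration:
refs = {
    "login": ["check_auth"],
    "check_auth": ["hash_password", "load_user"],
    "hash_password": ["hmac_digest"],  # 3 hops from "login": excluded
}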
### Enterprise Ready

- Fully Configurable — 40+ tunable parameters in `settings.json`
- Embedding Flexibility — ML models or hash fallback (zero dependencies)
- Query Caching — Intelligent TTL-based caching for repeated queries
- Session Memory — Tracks context across conversation turns
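A TTL-based query cache is simple to picture; a minimal sketch (not the plugin's actual class) is a dict of `(value, stored_at)` pairs with lazy expiry on read:

```python
import time

class TTLCache:
    """Minimal TTL-based cache sketch: entries expire `ttl` seconds
    after being stored, and expired entries are evicted lazily on read."""

    def __init__(self, ttl: float = 300.0):
        self.ttl = ttl
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def put(self, key, value):
        self._store[key] = (value, time.monotonic())
```

Using `time.monotonic()` rather than `time.time()` keeps expiry correct even if the system clock is adjusted.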
## Documentation

### How It Works

```
Query → FTS (BM25) → (vector fallback if needed) → Merge → Top 5 → Compress
```

Full pipeline:

```
PREPROCESS → INDEX → RETRIEVE → RE-RANK → COMPRESS → CONTEXT PACKET
```
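The COMPRESS stage uses TextRank-style sentence scoring. A generic word-overlap TextRank sketch (a textbook version, not the project's tuned implementation): build a sentence-similarity graph, then run PageRank-style power iteration over it and keep the highest-scoring sentences.

```python
def textrank(sentences, iterations=30, damping=0.85):
    """Score sentences by word-overlap TextRank: build a similarity
    graph, then run PageRank-style power iteration over it."""
    words = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    # Edge weight = shared words, normalised by the sentence lengths.
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and words[i] and words[j]:
                sim[i][j] = len(words[i] & words[j]) / (len(words[i]) + len(words[j]))
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                total = sum(sim[j])
                if sim[j][i] and total:
                    # Sentence j passes rank to i in proportion to edge weight.
                    rank += scores[j] * sim[j][i] / total
            new.append((1 - damping) / n + damping * rank)
        scores = new
    return scores
```

Sentences that share vocabulary with many other sentences accumulate rank; isolated sentences stay at the damping floor and are the first to be dropped under a word budget.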
### Basic Usage

```bash
# Index your codebase
python scripts/context_pipeline.py index --inputs ./src --db .cache/index.db

# Query with compression
python scripts/context_pipeline.py query \
  --query "How does authentication work?" \
  --db .cache/index.db \
  --json

# One-shot: index + query
python scripts/context_pipeline.py run \
  --inputs ./src \
  --query "Find the database connection logic" \
  --db .cache/index.db
```
### Configuration

All settings live in `settings.json`:

```json
{
  "tokenReducer": {
    "chunkSizeWords": 220,
    "embeddingModel": "jinaai/jina-embeddings-v2-base-code",
    "hybridMode": "fallback",
    "astChunkingEnabled": true,
    "textRankEnabled": true,
    "lspFeatures": {
      "importGraphEnabled": true,
      "twoHopExpansionEnabled": true
    }
  }
}
```
### Full Configuration Reference

| Setting | Default | Description |
|---|---|---|
| `chunkSizeWords` | `220` | Target words per chunk |
| `embeddingBackend` | `"ml"` | `"ml"` for neural, `"hash"` for zero-dep |
| `embeddingModel` | `jina-v2-code` | Code-optimized embeddings |
| `hybridMode` | `"fallback"` | `"fallback"` or `"always"` for vector search |
| `astChunkingEnabled` | `true` | Use tree-sitter AST parsing |
| `textRankEnabled` | `true` | Graph-based sentence scoring |
| `importGraphEnabled` | `true` | Track file dependencies |
| `twoHopExpansionEnabled` | `true` | Auto-expand referenced symbols |
| `compressionWordBudget` | `350` | Max words in compressed output |
### Zero-Dependency Mode

Run without any ML libraries:

```bash
python scripts/context_pipeline.py run \
  --inputs ./src \
  --query "Find auth logic" \
  --embedding-backend hash \
  --db .cache/index.db
```
### Apply Code Edits

```bash
# Apply SEARCH/REPLACE blocks from a saved Claude response to ./src
python scripts/apply_diff.py --input claude_response.txt --dir ./src

# Preview the changes without writing any files
python scripts/apply_diff.py --input response.txt --dry-run
```
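The exact block markers are defined by the plugin; as an illustration only, here is how a common SEARCH/REPLACE layout (marker syntax assumed, not taken from the project) can be parsed and applied:

```python
import re

# Assumed marker layout, for illustration:
# <<<<<<< SEARCH / ======= / >>>>>>> REPLACE
BLOCK = re.compile(
    r"<<<<<<< SEARCH\n(.*?)\n=======\n(.*?)\n>>>>>>> REPLACE",
    re.DOTALL,
)

def apply_diff(source: str, response: str) -> str:
    """Apply every SEARCH/REPLACE block found in `response` to `source`.
    Each block's SEARCH text must match the source verbatim, exactly once."""
    for search, replace in BLOCK.findall(response):
        if source.count(search) != 1:
            raise ValueError(f"ambiguous or missing search text: {search!r}")
        source = source.replace(search, replace)
    return source
```

Requiring exactly one verbatim match is the key safety property: an ambiguous or stale SEARCH block fails loudly instead of editing the wrong location.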
## Architecture

### Technology Stack
- Storage: SQLite with FTS5 + custom embeddings table
- Chunking: Tree-sitter AST parsing with regex fallback
- Embeddings: Jina Code v2 (or zero-dependency hash embeddings)
- ANN Search: HNSW via hnswlib (with FAISS fallback)
- Compression: TextRank + query-relevance scoring
### Repository Structure

```
token-reducer/
├── .claude-plugin/plugin.json
├── .mcp.json
├── .env.example
├── settings.json
├── requirements-optional.txt
├── scripts/
├── hooks/
├── commands/
├── agents/
├── skills/
└── evals/
```
## Contributing

Contributions are welcome; see `contribute.md` for contribution guidelines.

```bash
git clone https://github.com/Madhan230205/token-reducer.git
cd token-reducer
pip install -e ".[dev]"
python scripts/context_pipeline.py self-test
```
## License
MIT License — see LICENSE for details.
## Acknowledgments
- Tree-sitter for AST parsing
- Sentence Transformers for embeddings
- SQLite FTS5 for blazing-fast text search
- hnswlib for approximate nearest neighbors
Star this repo if Token Reducer saves you money!
## File details

### claude_token_reducer-1.4.0.tar.gz

- Size: 43.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

| Algorithm | Hash digest |
|---|---|
| SHA256 | `f49a56b79a604e6ac0397f667735cbf8b9388b7124fa30a96be7102bc0569045` |
| MD5 | `94f139ec1ce7c3b12d0dcfb31481eb9e` |
| BLAKE2b-256 | `3561e333808affab23f600d698a7e3dd0c258f26b0dfdd87abe5a2170da425bf` |

#### Provenance

The following attestation bundle was made for claude_token_reducer-1.4.0.tar.gz:

- Publisher: publish.yml on Madhan230205/token-reducer
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: claude_token_reducer-1.4.0.tar.gz
- Subject digest: `f49a56b79a604e6ac0397f667735cbf8b9388b7124fa30a96be7102bc0569045`
- Sigstore transparency entry: 1228548639
- Permalink: Madhan230205/token-reducer@d403d3c171fd4ed44a003859f4a138ce529da6d2
- Branch / Tag: refs/tags/v1.4.0
- Owner: https://github.com/Madhan230205
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d403d3c171fd4ed44a003859f4a138ce529da6d2
- Trigger Event: push
### claude_token_reducer-1.4.0-py3-none-any.whl

- Size: 47.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

| Algorithm | Hash digest |
|---|---|
| SHA256 | `0904efd9f3eb7545dc04194404223b259498a6e5d6c6781b8b545a12b397ded1` |
| MD5 | `d84aebdef0a18167eafead3e32d897fb` |
| BLAKE2b-256 | `f450b8af928f176385a690aaeb6fb61443c8cbce10be9091ecbfa34ee4a32c9f` |

#### Provenance

The following attestation bundle was made for claude_token_reducer-1.4.0-py3-none-any.whl:

- Publisher: publish.yml on Madhan230205/token-reducer
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: claude_token_reducer-1.4.0-py3-none-any.whl
- Subject digest: `0904efd9f3eb7545dc04194404223b259498a6e5d6c6781b8b545a12b397ded1`
- Sigstore transparency entry: 1228548643
- Permalink: Madhan230205/token-reducer@d403d3c171fd4ed44a003859f4a138ce529da6d2
- Branch / Tag: refs/tags/v1.4.0
- Owner: https://github.com/Madhan230205
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d403d3c171fd4ed44a003859f4a138ce529da6d2
- Trigger Event: push