
Open-source local-first context compression and token reduction pipeline for Claude Code with hybrid retrieval (BM25 + vectors), reranking, and AST-aware chunking.


Token Reducer

Cut Claude API costs by 90%+ with intelligent context compression

Claude Code Plugin · MIT License · Python 3.11+ · SQLite

The open-source alternative to expensive context management tools.

Easy Install · Features · Documentation · Contributing


The Problem

Every time you use Claude with a large codebase, you're paying for thousands of tokens that aren't relevant to your query. Most context management tools either:

  • Send everything (expensive)
  • Truncate blindly (loses important context)
  • Require heavy Language Servers (slow, resource-intensive)

The Solution

Token Reducer is a local-first, intelligent context compression pipeline that:

  • Reduces tokens by 90-98% while preserving semantic relevance
  • Runs entirely locally — no API calls, no data leaving your machine
  • Works in milliseconds — faster than Language Server alternatives
  • Understands code semantically — AST parsing, not just text matching
┌─────────────────┐     ┌───────────────┐     ┌──────────────────┐
│  Your Codebase  │────▶│ Token Reducer │────▶│  Compressed      │
│  (50,000 tokens)│     │   Pipeline    │     │  Context (500t)  │
└─────────────────┘     └───────────────┘     └──────────────────┘
                              │
                    ┌─────────┴─────────┐
                    │  - AST Chunking   │
                    │  - BM25 + Vector  │
                    │  - TextRank       │
                    │  - Import Graph   │
                    │  - 2-Hop Symbols  │
                    └───────────────────┘

Easy Install

Option 1 — Claude Code /plugin Command (Recommended)

Step 1: Register the marketplace (one-time setup):

/plugin marketplace add Madhan230205/token-reducer

This registers the marketplace as Madhan230205-token-reducer.

Step 2: Install:

/plugin install token-reducer@Madhan230205-token-reducer

For project-scoped install:

/plugin install token-reducer@Madhan230205-token-reducer --scope project

Already ran Step 1 before? Just run /plugin install token-reducer@Madhan230205-token-reducer — no need to add the marketplace again.


Option 2 — Git Clone (Manual)

# 1. Clone into your Claude plugins folder
git clone https://github.com/Madhan230205/token-reducer.git ~/.claude/plugins/token-reducer

# 2. Install dependencies (optional but recommended for best results)
pip install -r ~/.claude/plugins/token-reducer/requirements-optional.txt

Windows users: Replace ~/.claude/plugins/ with %USERPROFILE%\.claude\plugins\

Then open ~/.claude/settings.json and add:

{
  "plugins": ["~/.claude/plugins/token-reducer"]
}

Restart Claude Code. Done.


What requirements-optional.txt installs:

Package                          Purpose
sentence-transformers            Neural embeddings for smarter retrieval
hnswlib / faiss-cpu              Fast approximate nearest-neighbor search
tree-sitter + language grammars  AST-based code chunking (Python, JS, TS, Go, Rust, Java, C/C++, Ruby)

If you skip this step, Token Reducer still works using hash embeddings and regex chunking — no ML libraries required.
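To give a feel for what a zero-dependency hash embedding can look like, here is a minimal feature-hashing sketch in pure Python. The plugin's actual implementation may differ; the function names and the 256-bucket dimension are illustrative assumptions, not the project's API.

```python
import hashlib
import math
import re

def hash_embed(text: str, dim: int = 256) -> list[float]:
    """Map text to a fixed-size vector by hashing tokens into buckets.

    No ML libraries needed: each token's MD5 digest picks a bucket
    (and a sign), and the result is L2-normalized. Illustrative
    sketch only -- not the plugin's actual embedding code.
    """
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode()).digest()
        bucket = int.from_bytes(digest[:4], "little") % dim
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))
```

Hash embeddings are deterministic and fast, at the cost of treating words as opaque tokens; neural embeddings from sentence-transformers recover real semantic similarity.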


Option 3 — Zero-Dependency Quick Start

No pip, no ML libs — runs immediately after cloning:

git clone https://github.com/Madhan230205/token-reducer.git
cd token-reducer
python scripts/context_pipeline.py run \
  --inputs ./src \
  --query "Find auth logic" \
  --embedding-backend hash \
  --db .cache/index.db

Features

Core Pipeline

  • Hybrid Retrieval — BM25 + semantic vector search with intelligent fallback
  • AST-Based Chunking — Tree-sitter parsing for Python, TypeScript, Go, Rust, Java, and more
  • TextRank Compression — Graph-based sentence scoring for intelligent summarization
  • Sub-100ms Queries — SQLite FTS5 + HNSW indexes for instant results
  • Local-First — Everything runs on your machine, no external APIs
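The BM25 half of the pipeline rides on SQLite's built-in FTS5 extension. A minimal demo of that layer (the plugin's actual schema and column names may differ):

```python
import sqlite3

# In-memory demo of FTS5 + BM25 ranking. FTS5 ships with most
# Python builds of SQLite, so no extra install is needed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(path, body)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [
        ("auth.py", "def login(user, password): verify credentials and issue a token"),
        ("db.py", "def connect(): open a pooled database connection"),
        ("ui.py", "def render(): draw the login button"),
    ],
)

# bm25() assigns lower scores to better matches, so ORDER BY ascending.
rows = conn.execute(
    "SELECT path, bm25(chunks) FROM chunks "
    "WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT 5",
    ("login OR credentials",),
).fetchall()
# auth.py matches both terms, so it ranks first.
```

Because this is plain SQLite, the index lives in a single file (the --db path) and queries stay in-process, which is where the sub-100ms latency comes from.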

LSP-Killer Features

  • Import Graph — Automatically maps file dependencies without Language Server
  • 2-Hop Symbol Expansion — Auto "go-to-definition" for referenced functions
  • Diff Protocol — SEARCH/REPLACE edit format with automatic application
  • Semantic Clustering — Groups similar chunks to avoid redundancy
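The exact wire format of the diff protocol isn't shown here, but SEARCH/REPLACE edit blocks are commonly applied along these lines. The marker syntax below is a hypothetical example, not necessarily the plugin's:

```python
import re

# Hypothetical SEARCH/REPLACE markers; the plugin's actual protocol
# (see scripts/apply_diff.py) may use a different format.
EDIT_RE = re.compile(
    r"<<<<<<< SEARCH\n(.*?)\n=======\n(.*?)\n>>>>>>> REPLACE",
    re.DOTALL,
)

def apply_edits(source: str, response: str) -> str:
    """Apply each SEARCH/REPLACE block found in a model response.

    Each SEARCH body must match the file verbatim exactly once;
    otherwise we refuse to edit rather than guess.
    """
    for search, replace in EDIT_RE.findall(response):
        if source.count(search) != 1:
            raise ValueError(f"SEARCH block not unique: {search!r}")
        source = source.replace(search, replace)
    return source
```

Requiring a unique verbatim match is what makes this style of edit safe to apply automatically: an ambiguous or stale SEARCH block fails loudly instead of corrupting the file.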

Enterprise Ready

  • Fully Configurable — 40+ tunable parameters in settings.json
  • Embedding Flexibility — ML models or hash fallback (zero dependencies)
  • Query Caching — Intelligent TTL-based caching for repeated queries
  • Session Memory — Tracks context across conversation turns
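TTL-based query caching can be as simple as a dict of expiry timestamps. A minimal sketch (the plugin's actual cache may differ in keying and eviction policy):

```python
import time

class TTLCache:
    """Minimal TTL cache of the kind query caching might use.

    Sketch only: entries expire after ttl_seconds and are evicted
    lazily on the next lookup.
    """

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: drop and report a miss
            return None
        return value

    def put(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)
```

Repeated queries over an unchanged index return the cached context packet instead of re-running retrieval and compression.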

Documentation

How It Works

Query → FTS(BM25) → (Vector fallback if needed) → Merge → Top 5 → Compress

Full pipeline:

PREPROCESS → INDEX → RETRIEVE → RE-RANK → COMPRESS → CONTEXT PACKET
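The Merge step has to reconcile two differently-scaled rankings (BM25 scores vs. cosine similarities). One common technique for this is Reciprocal Rank Fusion, sketched below; the plugin's actual merge step may use a different scheme:

```python
def rrf_merge(bm25_hits: list[str], vector_hits: list[str],
              k: int = 60) -> list[str]:
    """Merge two ranked lists with Reciprocal Rank Fusion.

    Each list contributes 1/(k + rank) per document, so a document
    ranked well by both retrievers floats to the top without any
    score normalization. Illustrative sketch, not the plugin's code.
    """
    scores: dict[str, float] = {}
    for hits in (bm25_hits, vector_hits):
        for rank, doc in enumerate(hits, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

RRF only looks at ranks, never raw scores, which is exactly what makes it robust when fusing lexical and vector retrieval.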

Basic Usage

# Index your codebase
python scripts/context_pipeline.py index --inputs ./src --db .cache/index.db

# Query with compression
python scripts/context_pipeline.py query \
  --query "How does authentication work?" \
  --db .cache/index.db \
  --json

# One-shot: index + query
python scripts/context_pipeline.py run \
  --inputs ./src \
  --query "Find the database connection logic" \
  --db .cache/index.db

Configuration

All settings in settings.json:

{
  "tokenReducer": {
    "chunkSizeWords": 220,
    "embeddingModel": "jinaai/jina-embeddings-v2-base-code",
    "hybridMode": "fallback",
    "astChunkingEnabled": true,
    "textRankEnabled": true,
    "lspFeatures": {
      "importGraphEnabled": true,
      "twoHopExpansionEnabled": true
    }
  }
}

Full Configuration Reference

Setting                 Default        Description
chunkSizeWords          220            Target words per chunk
embeddingBackend        "ml"           "ml" for neural, "hash" for zero-dep
embeddingModel          jina-v2-code   Code-optimized embeddings
hybridMode              "fallback"     "fallback" or "always" for vector search
astChunkingEnabled      true           Use tree-sitter AST parsing
textRankEnabled         true           Graph-based sentence scoring
importGraphEnabled      true           Track file dependencies
twoHopExpansionEnabled  true           Auto-expand referenced symbols
compressionWordBudget   350            Max words in compressed output

Zero-Dependency Mode

Run without any ML libraries:

python scripts/context_pipeline.py run \
  --inputs ./src \
  --query "Find auth logic" \
  --embedding-backend hash \
  --db .cache/index.db

Apply Code Edits

python scripts/apply_diff.py --input claude_response.txt --dir ./src
python scripts/apply_diff.py --input response.txt --dry-run

Architecture

Technology Stack

  • Storage: SQLite with FTS5 + custom embeddings table
  • Chunking: Tree-sitter AST parsing with regex fallback
  • Embeddings: Jina Code v2 (or zero-dependency hash embeddings)
  • ANN Search: HNSW via hnswlib (with FAISS fallback)
  • Compression: TextRank + query-relevance scoring
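For intuition, classic TextRank scores sentences by running power iteration over a word-overlap similarity graph. A self-contained sketch follows; the plugin's scorer also folds in query relevance, so treat this as the textbook algorithm rather than the project's exact code:

```python
import math
import re

def textrank_scores(sentences: list[str], damping: float = 0.85,
                    iterations: int = 30) -> list[float]:
    """Score sentences with TextRank (PageRank over sentence similarity).

    Edge weights are word overlap normalized by sentence length, per
    the original TextRank formulation. Illustrative sketch only.
    """
    token_sets = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    n = len(sentences)

    def sim(i: int, j: int) -> float:
        overlap = len(token_sets[i] & token_sets[j])
        denom = (math.log(len(token_sets[i]) + 1)
                 + math.log(len(token_sets[j]) + 1))
        return overlap / denom if denom > 0 else 0.0

    weights = [[sim(i, j) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                out_weight = sum(weights[j])
                if weights[j][i] > 0 and out_weight > 0:
                    rank += weights[j][i] / out_weight * scores[j]
            new.append((1 - damping) / n + damping * rank)
        scores = new
    return scores
```

Sentences that overlap heavily with many others score high and survive compression; off-topic sentences fall below the word budget and are dropped.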

Repository Structure

token-reducer/
├── .claude-plugin/plugin.json
├── .mcp.json
├── .env.example
├── settings.json
├── requirements-optional.txt
├── scripts/
├── hooks/
├── commands/
├── agents/
├── skills/
└── evals/

Contributing

Contributions are welcome. Please see contribute.md for contribution guidelines.

git clone https://github.com/Madhan230205/token-reducer.git
cd token-reducer
pip install -e ".[dev]"
python scripts/context_pipeline.py self-test

License

MIT License — see LICENSE for details.


Star this repo if Token Reducer saves you money!

Report Bug · Request Feature · Discussions
