High-performance MCP server for minimizing LLM token usage and API costs via structural code analysis and precision chunking
🚀 Token Optimizer MCP
A high-performance Model Context Protocol (MCP) server that slashes LLM token consumption and API overhead. Using structural code summarization, intelligent file chunking, and semantic vector search, it ensures your AI agent receives high-fidelity context with minimal payload—maximizing both speed and cost-efficiency.
📖 Introduction
Working with agentic LLMs and large codebases usually means burning through massive context windows, which leads to exorbitant API costs and slow response times.
Token Optimizer MCP solves this with a suite of highly optimized tools that intercept requests and ensure the LLM receives only what it needs. It uses structural chunking, code skeleton extraction, unified Git diffs, and strict character limits to compress payloads while preserving the semantic context the LLM requires to function.
Additionally, the built-in Token Tracker persistently estimates how many tokens (and dollars) you save on every request.
🛠️ Installation
Requirements: Python 3.10+. You can install the package via pip. Choose the installation method based on whether you need semantic search capabilities.
1. Base Installation (Recommended)
This installs the core MCP server and all standard token-optimization tools. It is lightweight and sufficient for most users.
```bash
pip install token-optimizer-mcp
```
2. Installation with Embeddings
Use this command if you want to enable the Semantic Search (Vector Store) feature. This installation includes additional dependencies like faiss-cpu and numpy.
> [!TIP]
> Already installed the base package? No problem. Running this command simply "upgrades" your installation by adding the missing embedding-related dependencies.
```bash
pip install "token-optimizer-mcp[embeddings]"
```
⚡ Features
- Smart File Chunking: Never send a 10,000-line file again. Read exact line ranges with hard caps.
- Structural Summarization: Extract classes, imports, and function signatures to give the LLM a topological map of a file without the payload of raw code.
- Trimmed Search: Recursive codebase searching that returns only file paths and micro-snippets.
- Delta Diffing: Returns only the modified lines (`git diff`), ensuring unmodified code is never re-processed.
- Memory Summarization: A persistent JSON store for the LLM to stash compressed conversational context, preventing the need to replay huge histories.
- Semantic Code Search (Embeddings):
  - On-the-fly Indexing: Automatically chunks and indexes your codebase into a vector space.
  - Similarity Retrieval: Allows the LLM to find relevant code sections by meaning rather than just keywords, using an optimized FAISS vector store.
  - Efficient Context: Bridges the gap between "knowing the file exists" and "finding the exact relevant snippet" without reading the whole repo.
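As an illustration of structural summarization, here is a minimal sketch for Python files using the standard-library `ast` module. The function names `summarize_python_source` and `_signature` are hypothetical, and the package may use a different parser or cover other languages; the `max_chars` default simply mirrors `MAX_SUMMARY_CHARS`.

```python
from __future__ import annotations

import ast


def _signature(fn: ast.FunctionDef | ast.AsyncFunctionDef) -> str:
    # ast.unparse (Python 3.9+) renders the argument node back into source text.
    prefix = "async def" if isinstance(fn, ast.AsyncFunctionDef) else "def"
    sig = f"{prefix} {fn.name}({ast.unparse(fn.args)})"
    if fn.returns is not None:
        sig += f" -> {ast.unparse(fn.returns)}"
    return sig


def summarize_python_source(source: str, max_chars: int = 2000) -> str:
    """Hypothetical helper: keep imports, class names, and signatures; drop bodies."""
    tree = ast.parse(source)
    out: list[str] = []
    for node in tree.body:  # top-level statements only
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            out.append(ast.unparse(node))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            out.append(_signature(node))
        elif isinstance(node, ast.ClassDef):
            out.append(f"class {node.name}:")
            for item in node.body:  # methods, one level deep
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    out.append("    " + _signature(item))
    # Hard truncation in the spirit of MAX_SUMMARY_CHARS.
    return "\n".join(out)[:max_chars]
```

The result has one line per import, class, and signature, so the LLM sees the shape of a module without paying for its function bodies.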
🚀 Usage
Once installed, the CLI tool acts as both the MCP Server entry point and a management interface.
1. MCP Client Configuration
To hook the optimizer up to your agent (e.g., Claude Desktop, Cursor, Codex, or Antigravity), point the client's `command` at `token-optimizer-mcp` and define any environment variables in the `env` block:
```json
{
  "mcpServers": {
    "token-optimizer": {
      "command": "token-optimizer-mcp",
      "args": ["run"],
      "env": {
        "ENABLE_TOKEN_TRACKING": "true",
        "TOKEN_COST_PER_1K": "0.003"
      }
    }
  }
}
```
2. Management CLI Commands
You can run these commands manually in your terminal to manage the server's cache and view your savings.
| Command | Description |
|---|---|
| `token-optimizer-mcp run` | Starts the MCP server via stdio (standard). |
| `token-optimizer-mcp run --sse` | Starts the MCP server via SSE (HTTP). |
| `token-optimizer-mcp stats` | Prints your lifetime token and monetary savings. |
| `token-optimizer-mcp reset-stats` | Wipes the historical token tracker data. |
| `token-optimizer-mcp clear-memory` | Purges the persisted LLM memory stash. |
> [!NOTE]
> The `run` commands are typically handled automatically by your MCP client (like Claude Desktop) once it is configured. You generally don't need to run them manually unless you are testing or debugging.
Example: Checking your savings
```console
$ token-optimizer-mcp stats
📊 Lifetime Token Savings
======================================
Total Requests: 24
Tokens Used: 8,421
Tokens Saved: 114,290
Estimated Savings: $0.3428
Tracking Since: 2024-05-12T10:00:00Z
--------------------------------------
```
⚙️ Configuration (Environment Variables)
The optimizer is configured entirely through environment variables. Set them in the `env` block of your `mcpServers` config to tune how aggressively tool output is capped and truncated.
| Variable | Default | Description |
|---|---|---|
| `PROJECT_ROOT` | `.` | Absolute path to the repository you are analyzing. |
| `MAX_FILE_LINES` | `300` | Hard cap on the number of lines returned by file-read tools. |
| `MAX_OUTPUT_CHARS` | `8000` | Global character cutoff applied to all tool responses. |
| `MAX_PREVIEW_CHARS` | `200` | Snippet length returned alongside search hits. |
| `MAX_SUMMARY_CHARS` | `2000` | Truncation limit for structural file summaries. |
| `ENABLE_TOKEN_TRACKING` | `true` | Toggles the persistent `~/.cache` token tracker on or off. |
| `TOKEN_COST_PER_1K` | `0.003` | **Critical for cost estimation.** The cost (in USD) of 1,000 input tokens for the model you are using. The tracker uses this value to calculate your monetary savings. |
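The variables above map naturally onto a small settings object. The sketch below assumes they are read from the process environment at startup; it loads them with the documented defaults and applies the global `MAX_OUTPUT_CHARS` cutoff. `OptimizerSettings`, `load_settings`, `cap_output`, and the accepted spellings for `ENABLE_TOKEN_TRACKING` are hypothetical; the package may parse these values differently.

```python
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass(frozen=True)
class OptimizerSettings:
    # Hypothetical class; field defaults copy the documented defaults.
    project_root: str = "."
    max_file_lines: int = 300
    max_output_chars: int = 8000
    max_preview_chars: int = 200
    max_summary_chars: int = 2000
    enable_token_tracking: bool = True
    token_cost_per_1k: float = 0.003


def _as_bool(value: str) -> bool:
    # Assumed parsing rule: "true", "1", "yes", "on" (any case) mean enabled.
    return value.strip().lower() in {"true", "1", "yes", "on"}


def load_settings(env: dict[str, str] | None = None) -> OptimizerSettings:
    """Read settings from `env` (default: os.environ), falling back to defaults."""
    env = dict(os.environ) if env is None else env
    return OptimizerSettings(
        project_root=env.get("PROJECT_ROOT", "."),
        max_file_lines=int(env.get("MAX_FILE_LINES", "300")),
        max_output_chars=int(env.get("MAX_OUTPUT_CHARS", "8000")),
        max_preview_chars=int(env.get("MAX_PREVIEW_CHARS", "200")),
        max_summary_chars=int(env.get("MAX_SUMMARY_CHARS", "2000")),
        enable_token_tracking=_as_bool(env.get("ENABLE_TOKEN_TRACKING", "true")),
        token_cost_per_1k=float(env.get("TOKEN_COST_PER_1K", "0.003")),
    )


def cap_output(text: str, settings: OptimizerSettings) -> str:
    """Apply the global MAX_OUTPUT_CHARS cutoff to a tool response."""
    return text[: settings.max_output_chars]
```

Environment variables are always strings, which is why the JSON config quotes `"true"` and `"0.003"` and why each value is converted explicitly here.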
💡 Understanding TOKEN_COST_PER_1K
Token savings are calculated using a standard heuristic: 1 token ≈ 4 characters. To get accurate dollar savings, match this value to your specific model's input-token pricing:
- Claude Sonnet 4.6: `0.003` (default)
- Claude Opus 4.6: `0.015`
- Google Gemini 3 Pro: `0.0025`
- GPT-5-codex: `0.0020`
> [!IMPORTANT]
> Check your provider's current pricing. The values above are illustrative, and model pricing changes frequently. Always verify the latest "Input Token" price in your AI provider's official documentation.

> [!NOTE]
> The metric used is always input (prompt) tokens, as that is where the optimization (and savings) occur.
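The savings arithmetic is simple enough to write out. This is a sketch assuming the 4-characters-per-token heuristic with integer rounding (the rounding mode is an assumption); `estimate_tokens` and `savings_report` are hypothetical names, not the package's API.

```python
def estimate_tokens(text: str) -> int:
    # Heuristic from this README: 1 token ≈ 4 characters (rounded down by assumption).
    return len(text) // 4


def savings_report(original: str, returned: str, cost_per_1k: float = 0.003) -> dict[str, float]:
    """Estimate tokens and USD saved by sending `returned` instead of `original`."""
    tokens_saved = max(estimate_tokens(original) - estimate_tokens(returned), 0)
    # TOKEN_COST_PER_1K is the USD price of 1,000 input tokens.
    usd_saved = tokens_saved / 1000 * cost_per_1k
    return {"tokens_saved": tokens_saved, "usd_saved": usd_saved}
```

For example, trimming a 40,000-character file down to 4,000 characters saves an estimated 9,000 tokens. At the default `0.003` that is about $0.027; at the Claude Opus 4.6 value of `0.015` the same trim is worth about $0.135.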
Optional Embeddings Variables
(Only relevant if installed via `[embeddings]`)
- `EMBEDDING_CHUNK_SIZE` (default: `50`): Lines per chunk for vectorization.
- `FAISS_INDEX_PATH`: Absolute path at which to store the FAISS `.faiss` database.
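Since `EMBEDDING_CHUNK_SIZE` is measured in lines, code is presumably split into line windows before vectorization. The sketch below assumes simple, non-overlapping windows; the package may overlap chunks or align them to code structure. `CodeChunk` and `chunk_lines` are hypothetical, and the embedding and FAISS indexing steps are deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CodeChunk:
    path: str
    start_line: int  # 1-based, inclusive
    end_line: int  # 1-based, inclusive
    text: str


def chunk_lines(path: str, source: str, chunk_size: int = 50) -> list[CodeChunk]:
    """Split `source` into consecutive, non-overlapping windows of `chunk_size` lines."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    lines = source.splitlines()
    chunks: list[CodeChunk] = []
    for start in range(0, len(lines), chunk_size):
        window = lines[start : start + chunk_size]
        chunks.append(
            CodeChunk(
                path=path,
                start_line=start + 1,
                end_line=start + len(window),
                text="\n".join(window),
            )
        )
    return chunks
```

Keeping `start_line` and `end_line` on each chunk lets a similarity hit be turned into a precise line-range read rather than a whole-file read.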
🧠 Under the Hood
Because Claude, ChatGPT, and most agent frameworks behave as if context were unlimited, they frequently ask for the entire `app.js` file just to check an import. Token Optimizer MCP intercepts this behavior:
- Tool Invocation: The LLM calls `read_file_chunk(path="app.js", start_line=1, end_line=5000)`.
- Cap Enforcement: The server intercepts the request and limits it to `MAX_FILE_LINES` (e.g., 300).
- Savings Calculation: `token_tracker.py` estimates the token delta between sending all 5,000 requested lines and the 300 lines actually returned, using the common heuristic `len(str) / 4`.
- Piggybacked Stats: The server returns the payload alongside a `_token_stats` object, so the LLM sees how many tokens it just saved by being constrained.
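These steps can be condensed into a short sketch. `read_file_chunk` mirrors the call shown above, but its body, the helper `cap_line_range`, and the fields inside `_token_stats` are assumptions for illustration, not the server's actual implementation. Token counts use the same `len / 4` heuristic, rounded down.

```python
from __future__ import annotations

MAX_FILE_LINES = 300  # documented default


def _estimate_tokens(text: str) -> int:
    # 1 token ≈ 4 characters, as described in this README.
    return len(text) // 4


def cap_line_range(
    lines: list[str], start_line: int, end_line: int, max_file_lines: int = MAX_FILE_LINES
) -> dict:
    """Return at most `max_file_lines` lines of a 1-based, inclusive range."""
    start = max(start_line, 1)
    requested = lines[start - 1 : end_line]
    returned = requested[:max_file_lines]  # cap enforcement
    requested_text = "\n".join(requested)
    returned_text = "\n".join(returned)
    tokens_returned = _estimate_tokens(returned_text)
    return {
        "content": returned_text,
        "start_line": start,
        "end_line": start + len(returned) - 1,
        "truncated": len(returned) < len(requested),
        # Hypothetical shape for the piggybacked stats object.
        "_token_stats": {
            "tokens_returned": tokens_returned,
            "tokens_saved": max(_estimate_tokens(requested_text) - tokens_returned, 0),
        },
    }


def read_file_chunk(path: str, start_line: int = 1, end_line: int | None = None) -> dict:
    """Hypothetical tool body: read the file, then apply the line cap."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        lines = fh.read().splitlines()
    return cap_line_range(lines, start_line, len(lines) if end_line is None else end_line)
```

Separating the pure `cap_line_range` from file I/O keeps the capping and savings logic easy to test on its own.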
🐛 Bug Reports & Feature Requests
If you encounter a bug or unexpected behavior, please open an issue to report it.
Likewise, if you need a new feature or have an idea for an improvement, you can request it by opening a feature request on GitHub.
📝 License
Distributed under the MIT License. See LICENSE for more information.
Download files
Source Distribution
Built Distribution
File details
Details for the file token_optimizer_mcp-0.1.0.tar.gz.
File metadata
- Download URL: token_optimizer_mcp-0.1.0.tar.gz
- Upload date:
- Size: 20.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `8206c838f1bcc300a526b1ffde4dc3572ed25f47f3c11f6d1f4addfba70d6a30` |
| MD5 | `b2e3377e95efb653b3cd0126a7c3c6ba` |
| BLAKE2b-256 | `78ea9d2974353cd955dd7fd0afe22e5390db8be96ede4e31bc4ab4b33e0fd7a8` |
File details
Details for the file token_optimizer_mcp-0.1.0-py3-none-any.whl.
File metadata
- Download URL: token_optimizer_mcp-0.1.0-py3-none-any.whl
- Upload date:
- Size: 27.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `42c27bc1b1896ee7d0f8e321f121c6a87d2385f6b46e875e4fbd45c69e15c852` |
| MD5 | `f5e836648e311d848638fb1e7f5c0571` |
| BLAKE2b-256 | `0bbd3904bc049b88bdbfac647ede0bfdc42efa3085b7c0bd7254d8ba2665c55b` |