High-performance MCP server for minimizing LLM token usage and API costs via structural code analysis and precision chunking


🚀 Token Optimizer MCP

A high-performance Model Context Protocol (MCP) server that slashes LLM token consumption and API overhead. Using structural code summarization, intelligent file chunking, and semantic vector search, it gives your AI agent high-fidelity context with a minimal payload, improving both speed and cost-efficiency.

📖 Introduction

Working with agentic LLMs on large codebases usually means burning through massive context windows, which leads to exorbitant API costs and slow response times.

Token Optimizer MCP solves this with a suite of optimized tools that intercept requests so the LLM receives only what it needs. It combines structural chunking, code skeleton extraction, unified Git diffs, and strict character limits to compress payloads while preserving the semantic context the LLM needs to do its job.

Additionally, the built-in Token Tracker persistently estimates how many tokens (and how many dollars) you save on every request.

🛠️ Installation

Requirements: Python 3.10+. You can install the package via pip. Choose the installation method based on whether you need semantic search capabilities.

1. Base Installation (Recommended)

This installs the core MCP server and all standard token-optimization tools. It is lightweight and sufficient for most users.

```bash
pip install token-optimizer-mcp
```

2. Installation with Embeddings

Use this command if you want to enable the Semantic Search (Vector Store) feature. This installation includes additional dependencies like faiss-cpu and numpy.

[!TIP] Already installed the base package? No problem. Running this command will simply "upgrade" your installation by adding the missing embedding-related dependencies.

```bash
pip install "token-optimizer-mcp[embeddings]"
```

⚡ Features

- Smart File Chunking: Never send a 10,000-line file again. Read exact line ranges with hard caps.
- Structural Summarization: Extract classes, imports, and function signatures to give the LLM a structural map of a file without the payload of raw code.
- Trimmed Search: Recursive codebase search that returns only file paths and short snippets.
- Delta Diffing: Returns only the modified lines (via `git diff`), so unchanged code is not sent again.
- Memory Summarization: A persistent JSON store where the LLM can stash compressed conversational context instead of replaying long histories.
- Semantic Code Search (Embeddings):
  - On-the-fly Indexing: Automatically chunks and indexes your codebase into a vector space.
  - Similarity Retrieval: Lets the LLM find relevant code sections by meaning rather than by keywords alone, using an optimized FAISS vector store.
  - Efficient Context: Bridges the gap between "knowing the file exists" and "finding the exact relevant snippet" without reading the whole repo.

🚀 Usage

Once installed, the CLI tool acts as both the MCP Server entry point and a management interface.

1. MCP Client Configuration

To hook the optimizer up to your agent (e.g., Claude Desktop, Cursor, Codex, or Antigravity), point the `command` at `token-optimizer-mcp` with the `run` argument and set any environment variables you want to override:

```json
{
  "mcpServers": {
    "token-optimizer": {
      "command": "token-optimizer-mcp",
      "args": ["run"],
      "env": {
        "ENABLE_TOKEN_TRACKING": "true",
        "TOKEN_COST_PER_1K": "0.003"
      }
    }
  }
}
```

2. Management CLI Commands

You can run these commands manually in your terminal to manage the server's cache and view your savings.

| Command | Description |
| --- | --- |
| `token-optimizer-mcp run` | Starts the MCP server over `stdio` (standard). |
| `token-optimizer-mcp run --sse` | Starts the MCP server over SSE (HTTP). |
| `token-optimizer-mcp stats` | Prints your lifetime token and monetary savings. |
| `token-optimizer-mcp reset-stats` | Wipes the historical token tracker data. |
| `token-optimizer-mcp clear-memory` | Purges the persisted LLM memory stash. |

[!NOTE] The run commands are typically handled automatically by your MCP client (like Claude Desktop) once configured. You generally don't need to run them manually unless you are testing or debugging.

Example: Checking your savings

```text
$ token-optimizer-mcp stats

📊 Lifetime Token Savings
======================================
Total Requests:      24
Tokens Used:         8,421
Tokens Saved:        114,290
Estimated Savings:   $0.3428
Tracking Since:      2024-05-12T10:00:00Z
--------------------------------------
```

⚙️ Configuration File (Environment Variables)

The optimizer is configured entirely via environment variables. Set them in the `env` object of your `mcpServers` entry to tune how aggressively tool output is trimmed.

| Variable | Default | Description |
| --- | --- | --- |
| `PROJECT_ROOT` | `.` | Absolute path to the repository you are analyzing. |
| `MAX_FILE_LINES` | `300` | Hard cap on the number of lines returned by file read tools. |
| `MAX_OUTPUT_CHARS` | `8000` | Global cutoff applied to all tool responses. |
| `MAX_PREVIEW_CHARS` | `200` | Snippet length returned alongside search hits. |
| `MAX_SUMMARY_CHARS` | `2000` | Truncation limit for structural file summaries. |
| `ENABLE_TOKEN_TRACKING` | `true` | Turns the persistent `~/.cache` token tracker on or off. |
| `TOKEN_COST_PER_1K` | `0.003` | Critical for cost estimation: the cost in USD of 1,000 input tokens for the model you use. The tracker uses it to calculate your monetary savings. |
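A minimal sketch of how these variables might be read, assuming the server parses them from `os.environ` and falls back to the defaults in the table above. `Settings`, `from_env`, `_env_bool`, and `cap_output` are hypothetical names, and the set of accepted boolean spellings is an assumption.

```python
import os
from dataclasses import dataclass


def _env_bool(name: str, default: bool) -> bool:
    """Interpret common truthy strings; an unset variable keeps the default."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    # Defaults mirror the documented table.
    project_root: str = "."
    max_file_lines: int = 300
    max_output_chars: int = 8000
    max_preview_chars: int = 200
    max_summary_chars: int = 2000
    enable_token_tracking: bool = True
    token_cost_per_1k: float = 0.003

    @classmethod
    def from_env(cls) -> "Settings":
        env = os.environ
        # Every value arrives as a string, so parse each one explicitly.
        return cls(
            project_root=env.get("PROJECT_ROOT", "."),
            max_file_lines=int(env.get("MAX_FILE_LINES", "300")),
            max_output_chars=int(env.get("MAX_OUTPUT_CHARS", "8000")),
            max_preview_chars=int(env.get("MAX_PREVIEW_CHARS", "200")),
            max_summary_chars=int(env.get("MAX_SUMMARY_CHARS", "2000")),
            enable_token_tracking=_env_bool("ENABLE_TOKEN_TRACKING", True),
            token_cost_per_1k=float(env.get("TOKEN_COST_PER_1K", "0.003")),
        )


def cap_output(text: str, settings: Settings) -> str:
    """Apply the global MAX_OUTPUT_CHARS cutoff to a tool response."""
    return text[: settings.max_output_chars]
```

Values in the JSON `env` block are strings (note `"0.003"` rather than `0.003`), which is why each field is converted explicitly instead of being used as-is.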

💡 Understanding `TOKEN_COST_PER_1K`

Token savings are estimated with a common heuristic: 1 token ≈ 4 characters. For meaningful dollar estimates, set this value to your model's input-token price:

- Claude Sonnet 4.6: 0.003 (default)
- Claude Opus 4.6: 0.015
- Google Gemini 3 Pro: 0.0025
- GPT-5-codex: 0.0020

[!IMPORTANT] Check your provider's current pricing. The values above are illustrative and model pricing changes frequently. Always verify the latest "Input Token" price from your AI provider's official documentation.

[!NOTE] The metric used is always Input/Prompt tokens, as that is where the optimization (and savings) occur.
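The arithmetic behind the estimate is simple enough to write out. This sketch applies the 4-characters-per-token heuristic and then scales by `TOKEN_COST_PER_1K`. The function names are hypothetical, and rounding down with integer division is an assumption, since the README only gives `len(str) / 4`.

```python
def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token heuristic."""
    # Integer division is an assumption; the README only states len(str) / 4.
    return len(text) // 4


def estimate_savings_usd(tokens_saved: int, cost_per_1k: float = 0.003) -> float:
    """Convert saved input tokens into dollars at TOKEN_COST_PER_1K."""
    return tokens_saved / 1000 * cost_per_1k


# Example: not sending 1,000,000 characters of input saves ~250,000 tokens.
saved = estimate_tokens("a" * 1_000_000)
print(saved)                                          # 250000
print(round(estimate_savings_usd(saved), 4))          # 0.75 (at 0.003 per 1K)
print(round(estimate_savings_usd(saved, 0.015), 4))   # 3.75 (at 0.015 per 1K)
```

Because the heuristic ignores the model's real tokenizer, treat the dollar figure as an order-of-magnitude estimate rather than a billing-accurate number.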

Optional Embeddings Variables

(Only relevant if installed with the `[embeddings]` extra)

- `EMBEDDING_CHUNK_SIZE` (default: `50`): Lines per chunk for vectorization.
- `FAISS_INDEX_PATH`: Absolute path where the FAISS `.faiss` index file is stored.
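Assuming `EMBEDDING_CHUNK_SIZE` means fixed, non-overlapping windows of lines (the README does not say whether chunks overlap), the chunking step before vectorization could look like the sketch below. Each chunk keeps its 1-based line range so a similarity hit can be mapped back to a `read_file_chunk` call. `Chunk` and `chunk_lines` are hypothetical, and the embedding model and FAISS index are left out.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    path: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    text: str


def chunk_lines(path: str, source: str, chunk_size: int = 50) -> list[Chunk]:
    """Split a file into consecutive, non-overlapping windows of chunk_size lines."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    lines = source.splitlines()
    chunks: list[Chunk] = []
    for start in range(0, len(lines), chunk_size):
        block = lines[start:start + chunk_size]
        # The last chunk may be shorter than chunk_size.
        chunks.append(Chunk(path, start + 1, start + len(block), "\n".join(block)))
    return chunks
```

A 120-line file with the default size yields chunks covering lines 1-50, 51-100, and 101-120; each chunk's `text` would then be embedded and added to the index.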

🧠 Under the Hood

Because Claude, ChatGPT, and most agent frameworks behave as if context were unlimited, they frequently request the entire app.js file just to check a single import. Token Optimizer MCP intercepts this behavior:

1. Tool Invocation: The LLM calls `read_file_chunk(path="app.js", start_line=1, end_line=5000)`.
2. Cap Enforcement: The server intercepts the request and limits it to `MAX_FILE_LINES` (e.g., 300).
3. Savings Calculation: `token_tracker.py` estimates the token delta between sending the 5,000 requested lines and the 300 returned lines, using the common `len(str) / 4` heuristic.
4. Piggybacked Stats: The server returns the payload together with a `_token_stats` object, so the LLM can see roughly how many tokens the cap just saved.
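The four steps can be sketched in one function. This is an illustration, not the server's code: it takes file text instead of a path so it stays self-contained, `read_chunk_with_cap` is a hypothetical name, and while `_token_stats` is named in the README, the fields inside it here are assumptions.

```python
def read_chunk_with_cap(text: str, start_line: int, end_line: int,
                        max_file_lines: int = 300) -> dict:
    """Return lines start_line..end_line (1-based, inclusive), capped at max_file_lines."""
    if start_line < 1 or end_line < start_line:
        raise ValueError("invalid line range")
    lines = text.splitlines()

    # Step 1: what the LLM asked for (clipped to the real file length).
    requested = lines[start_line - 1:end_line]
    # Step 2: cap enforcement.
    capped_end = min(end_line, start_line + max_file_lines - 1)
    returned = lines[start_line - 1:capped_end]

    # Step 3: estimate the delta with the len / 4 heuristic.
    requested_text = "\n".join(requested)
    returned_text = "\n".join(returned)
    tokens_saved = len(requested_text) // 4 - len(returned_text) // 4

    # Step 4: piggyback the stats on the payload.
    return {
        "content": returned_text,
        "_token_stats": {
            "lines_returned": len(returned),
            "truncated": len(returned) < len(requested),
            "estimated_tokens_saved": tokens_saved,
        },
    }
```

Because the returned lines are always a prefix of the requested ones, the estimate never goes negative. A request that already fits under the cap reports zero savings.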

🐛 Bug Reports & Feature Requests

If you encounter a bug or unexpected behavior, please open an issue to report it.

Likewise, if you need a new feature or have an idea for an improvement, you can request it by opening a feature request on GitHub.

📝 License

Distributed under the MIT License. See LICENSE for more information.

Project details


This version: 0.1.0 (Feb 28, 2026)

Download files


Source Distribution

token_optimizer_mcp-0.1.0.tar.gz (20.7 kB)

Uploaded Feb 28, 2026 Source

Built Distribution


token_optimizer_mcp-0.1.0-py3-none-any.whl (27.7 kB)

Uploaded Feb 28, 2026 Python 3

File details

Details for the file token_optimizer_mcp-0.1.0.tar.gz.

File metadata

Download URL: token_optimizer_mcp-0.1.0.tar.gz
Upload date: Feb 28, 2026
Size: 20.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for token_optimizer_mcp-0.1.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `8206c838f1bcc300a526b1ffde4dc3572ed25f47f3c11f6d1f4addfba70d6a30` |
| MD5 | `b2e3377e95efb653b3cd0126a7c3c6ba` |
| BLAKE2b-256 | `78ea9d2974353cd955dd7fd0afe22e5390db8be96ede4e31bc4ab4b33e0fd7a8` |


File details

Details for the file token_optimizer_mcp-0.1.0-py3-none-any.whl.

File metadata

Download URL: token_optimizer_mcp-0.1.0-py3-none-any.whl
Upload date: Feb 28, 2026
Size: 27.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for token_optimizer_mcp-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `42c27bc1b1896ee7d0f8e321f121c6a87d2385f6b46e875e4fbd45c69e15c852` |
| MD5 | `f5e836648e311d848638fb1e7f5c0571` |
| BLAKE2b-256 | `0bbd3904bc049b88bdbfac647ede0bfdc42efa3085b7c0bd7254d8ba2665c55b` |
