High-performance MCP server for minimizing LLM token usage and API costs via structural code analysis and precision chunking

🚀 Token Optimizer MCP

A high-performance Model Context Protocol (MCP) server that slashes LLM token consumption and API overhead. Using structural code summarization, intelligent file chunking, and semantic vector search, it ensures your AI agent receives high-fidelity context with minimal payload—maximizing both speed and cost-efficiency.

Python >= 3.10 | License: MIT | MCP SDK


📖 Introduction

Working with Agentic LLMs and large codebases usually means burning through massive context windows, leading to exorbitant API costs and slow response times.

Token Optimizer MCP solves this by providing a suite of highly optimized tools that intercept requests and ensure the LLM receives only what it needs. It uses structural chunking, code skeleton extraction, unified Git diffing, and strict character limits to compress payloads while preserving the semantic context the LLM requires to function.

Additionally, the built-in Token Tracker persistently calculates exactly how many tokens (and estimated dollars) you save on every request.


🛠️ Installation

Requirements: Python 3.10+. You can install the package via pip. Choose the installation method based on whether you need semantic search capabilities.

1. Base Installation (Recommended)

This installs the core MCP server and all standard token-optimization tools. It is lightweight and sufficient for most users.

pip install token-optimizer-mcp

2. Installation with Embeddings

Use this command if you want to enable the Semantic Search (Vector Store) feature. This installation includes additional dependencies like faiss-cpu and numpy.

[!TIP] Already installed the base package? No problem. Running this command will simply "upgrade" your installation by adding the missing embedding-related dependencies.

pip install "token-optimizer-mcp[embeddings]"

⚡ Features

  • Smart File Chunking: Never send a 10,000-line file again. Read exact line ranges with hard caps.
  • Structural Summarization: Extract classes, imports, and function signatures to give the LLM a topological map of a file without the payload of raw code.
  • Trimmed Search: Recursive codebase searching that returns only file paths and micro-snippets.
  • Delta Diffing: Returns only the modified lines (git diff), ensuring unmodified code is never re-processed.
  • Memory Summarization: A persistent JSON store for the LLM to stash compressed conversational context, preventing the need to replay huge histories.
  • Semantic Code Search (Embeddings):
    • On-the-fly Indexing: Automatically chunks and indexes your codebase into a vector space.
    • Similarity Retrieval: Allows the LLM to find relevant code sections by meaning rather than just keywords, using an optimized FAISS vector store.
    • Efficient Context: Bridges the gap between "knowing the file exists" and "finding the exact relevant snippet" without reading the whole repo.
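
To make the Structural Summarization idea concrete, here is a minimal sketch of skeleton extraction for Python sources using the standard-library ast module. It is an illustration only, not the server's actual implementation: the function name summarize_python_source and its output format are assumptions, and the real tool may handle other languages and more node types.

```python
import ast


def summarize_python_source(source: str) -> list[str]:
    """Return imports, class headers, and function signatures, without bodies.

    Hypothetical helper for illustration; not part of token-optimizer-mcp.
    """
    lines: list[str] = []

    def visit(body: list[ast.stmt], indent: int) -> None:
        pad = "    " * indent
        for node in body:
            if isinstance(node, (ast.Import, ast.ImportFrom)):
                lines.append(pad + ast.unparse(node))
            elif isinstance(node, ast.ClassDef):
                lines.append(f"{pad}class {node.name}:")
                # Descend so that method signatures are listed under the class.
                visit(node.body, indent + 1)
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                keyword = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
                # Function bodies are deliberately skipped: only the signature is kept.
                lines.append(f"{pad}{keyword} {node.name}({ast.unparse(node.args)})")

    visit(ast.parse(source).body, 0)
    return lines


SAMPLE = (
    "import os\n"
    "\n"
    "class Reader:\n"
    "    def read(self, path):\n"
    "        return open(path).read()\n"
    "\n"
    "def main():\n"
    "    pass\n"
)

skeleton = summarize_python_source(SAMPLE)
print("\n".join(skeleton))
```

The skeleton keeps the shape of the file (what is imported, which classes and functions exist) while dropping every function body, which is where most of the characters usually are.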

🚀 Usage

Once installed, the CLI tool acts as both the MCP Server entry point and a management interface.

1. MCP Client Configuration

To hook the optimizer up to your agent (e.g., Claude Desktop, Cursor, Codex, or Antigravity), define the environment variables and point the command to token-optimizer-mcp:

{
  "mcpServers": {
    "token-optimizer": {
      "command": "token-optimizer-mcp",
      "args": ["run"],
      "env": {
        "ENABLE_TOKEN_TRACKING": "true",
        "TOKEN_COST_PER_1K": "0.003"
      }
    }
  }
}

2. Management CLI Commands

You can run these commands manually in your terminal to manage the server's cache and view your savings.

| Command | Description |
| --- | --- |
| token-optimizer-mcp run | Starts the MCP server via stdio (Standard). |
| token-optimizer-mcp run --sse | Starts the MCP server via SSE (HTTP). |
| token-optimizer-mcp stats | Prints your lifetime token and monetary savings. |
| token-optimizer-mcp reset-stats | Wipes the historical token tracker data. |
| token-optimizer-mcp clear-memory | Purges the persisted LLM memory stash. |

[!NOTE] The run commands are typically handled automatically by your MCP client (like Claude Desktop) once configured. You generally don't need to run them manually unless you are testing or debugging.

Example: Checking your savings

$ token-optimizer-mcp stats

📊 Lifetime Token Savings
======================================
Total Requests:      24
Tokens Used:         8,421
Tokens Saved:        114,290
Estimated Savings:   $0.3428
Tracking Since:      2024-05-12T10:00:00Z
--------------------------------------

⚙️ Configuration (Environment Variables)

The optimizer is configured entirely via environment variables. Edit these in the env section of your mcpServers config block to tune how aggressively the optimizer trims responses.

| Variable | Default | Description |
| --- | --- | --- |
| PROJECT_ROOT | . | Absolute path to the repository you are analyzing. |
| MAX_FILE_LINES | 300 | Hard cap on the number of lines returned by file read tools. |
| MAX_OUTPUT_CHARS | 8000 | Global cutoff limit applied to all tool responses. |
| MAX_PREVIEW_CHARS | 200 | Snippet length returned alongside search hits. |
| MAX_SUMMARY_CHARS | 2000 | Truncation limit for structural file summaries. |
| ENABLE_TOKEN_TRACKING | true | Toggle the persistent ~/.cache token tracker on/off. |
| TOKEN_COST_PER_1K | 0.003 | Critical for Cost Estimation. This value represents the cost (in USD) of 1,000 input tokens for the model you are using. The tracker uses this to calculate your monetary savings. |
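
As a rough illustration of how these variables and defaults fit together, the sketch below reads them from a mapping and falls back to the documented defaults. The OptimizerSettings class, the load_settings function, and the boolean parsing rule are assumptions made for this example; the server's own loader may behave differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class OptimizerSettings:
    """Hypothetical container mirroring the documented variables."""
    project_root: str
    max_file_lines: int
    max_output_chars: int
    max_preview_chars: int
    max_summary_chars: int
    enable_token_tracking: bool
    token_cost_per_1k: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> OptimizerSettings:
    """Build settings from environment variables, using the documented defaults."""
    env = os.environ if env is None else env
    return OptimizerSettings(
        project_root=env.get("PROJECT_ROOT", "."),
        max_file_lines=int(env.get("MAX_FILE_LINES", "300")),
        max_output_chars=int(env.get("MAX_OUTPUT_CHARS", "8000")),
        max_preview_chars=int(env.get("MAX_PREVIEW_CHARS", "200")),
        max_summary_chars=int(env.get("MAX_SUMMARY_CHARS", "2000")),
        # Assumption: only the string "true" (case-insensitive) enables tracking.
        enable_token_tracking=env.get("ENABLE_TOKEN_TRACKING", "true").strip().lower() == "true",
        token_cost_per_1k=float(env.get("TOKEN_COST_PER_1K", "0.003")),
    )


defaults = load_settings({})
tuned = load_settings({"MAX_FILE_LINES": "100", "ENABLE_TOKEN_TRACKING": "false"})
print(defaults.max_file_lines, tuned.max_file_lines)
```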

💡 Understanding TOKEN_COST_PER_1K

Token savings are calculated using a standard heuristic: 1 token ≈ 4 characters. To get accurate dollar savings, you should match this value to your specific model's pricing:

  • Claude Sonnet 4.6: 0.003 (Default)
  • Claude Opus 4.6: 0.015
  • Google Gemini 3 Pro: 0.0025
  • GPT-5-codex: 0.0020

[!IMPORTANT] Check your provider's current pricing. The values above are illustrative and model pricing changes frequently. Always verify the latest "Input Token" price from your AI provider's official documentation.

[!NOTE] The metric used is always Input/Prompt tokens, as that is where the optimization (and savings) occur.
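
The arithmetic behind the dollar estimate can be sketched as follows, assuming the 4-characters-per-token heuristic and a per-1K input price as described above. The function names are hypothetical, and the use of integer division is an assumption; the tracker may round differently.

```python
def estimate_tokens(char_count: int) -> int:
    """Approximate a token count with the 1 token ~ 4 characters heuristic."""
    return char_count // 4


def estimate_savings_usd(chars_avoided: int, cost_per_1k: float = 0.003) -> float:
    """Convert characters that were NOT sent into an estimated USD saving."""
    tokens_saved = estimate_tokens(chars_avoided)
    return tokens_saved / 1000 * cost_per_1k


# 40,000 characters avoided is roughly 10,000 input tokens.
tokens = estimate_tokens(40_000)
default_rate = estimate_savings_usd(40_000)            # at 0.003 per 1K tokens
higher_rate = estimate_savings_usd(40_000, 0.015)      # at 0.015 per 1K tokens
print(tokens, round(default_rate, 4), round(higher_rate, 4))
```

The same number of avoided characters is worth five times as much at the higher rate, which is why matching TOKEN_COST_PER_1K to your model matters for the reported savings.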

Optional Embeddings Variables

(Only relevant if installed via [embeddings])

  • EMBEDDING_CHUNK_SIZE (Default: 50): Lines per chunk for vectorization.
  • FAISS_INDEX_PATH: Absolute path to store the FAISS .faiss database.
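
To show what "lines per chunk" means in practice, here is a small sketch that splits a file's text into fixed-size line chunks before any embedding step. The chunk_lines function and its return shape are assumptions for illustration; the embedding and FAISS indexing stages are intentionally left out.

```python
def chunk_lines(text: str, chunk_size: int = 50) -> list[tuple[int, int, str]]:
    """Split text into consecutive chunks of at most chunk_size lines.

    Returns (start_line, end_line, chunk_text) tuples with 1-based, inclusive
    line numbers. Hypothetical helper; not the package's API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    lines = text.splitlines()
    chunks: list[tuple[int, int, str]] = []
    for start in range(0, len(lines), chunk_size):
        block = lines[start:start + chunk_size]
        chunks.append((start + 1, start + len(block), "\n".join(block)))
    return chunks


sample_text = "\n".join(f"line {i}" for i in range(1, 121))  # a 120-line file
chunks = chunk_lines(sample_text)
print([(start, end) for start, end, _ in chunks])
```

A smaller EMBEDDING_CHUNK_SIZE yields more, finer-grained chunks; a larger value yields fewer chunks that each carry more surrounding context.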

🧠 Under the Hood

Because Claude, ChatGPT, and most agent frameworks act as though they have unlimited context, they frequently ask for the entire app.js file just to check an import. Token Optimizer MCP intercepts this behavior:

  1. Tool Invocation: The LLM calls read_file_chunk(path="app.js", start_line=1, end_line=5000).
  2. Cap Enforcement: The server intercepts the request and limits it to MAX_FILE_LINES (e.g. 300).
  3. Savings Calculation: token_tracker.py calculates the token delta between sending the 5000 lines vs the returned 300 lines using an industry-standard heuristic (len(str) / 4).
  4. Piggybacked Stats: The server returns the payload alongside the _token_stats object. The LLM sees exactly how many tokens it just saved by being constrained.
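
The four steps above can be sketched in a few lines. This is a simplified, in-memory illustration under stated assumptions: read_chunk_capped is a hypothetical stand-in for the real tool, it takes a list of lines instead of a path, and the field names inside _token_stats are invented for the example.

```python
def read_chunk_capped(
    lines: list[str],
    start_line: int,
    end_line: int,
    max_file_lines: int = 300,
) -> dict:
    """Return at most max_file_lines lines, plus an estimate of tokens saved."""
    start = max(start_line, 1)
    requested_end = min(end_line, len(lines))
    # Cap enforcement: never return more than max_file_lines lines.
    capped_end = min(requested_end, start + max_file_lines - 1)

    requested_text = "\n".join(lines[start - 1:requested_end])
    returned_text = "\n".join(lines[start - 1:capped_end])

    # Savings calculation with the len(str) / 4 heuristic (integer division here).
    tokens_requested = len(requested_text) // 4
    tokens_returned = len(returned_text) // 4

    return {
        "content": returned_text,
        "start_line": start,
        "end_line": capped_end,
        # Piggybacked stats; the key names inside are assumptions.
        "_token_stats": {
            "tokens_used": tokens_returned,
            "tokens_saved": tokens_requested - tokens_returned,
        },
    }


fake_file = [f"line {i}" for i in range(1, 5001)]
result = read_chunk_capped(fake_file, start_line=1, end_line=5000)
print(result["end_line"], result["_token_stats"])
```

Because the cap is applied relative to the requested start line, the LLM can still page through a large file by issuing follow-up calls with a later start_line.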

🐛 Bug Reports & Feature Requests

If you encounter a bug or unexpected behavior, please open an issue to report it.

Likewise, if you need a new feature or have an idea for an improvement, you can request it by opening a feature request on GitHub.


📝 License

Distributed under the MIT License. See LICENSE for more information.
