TokenOptim

Reduce LLM costs by 10-30% through tokenizer-aware prompt compression. Compression runs on the prompt text before you send it, so it works with any LLM provider.

If you find TokenOptim useful, consider giving it a star on GitHub — it helps others discover the project and motivates continued development.

Quick Start
Compression Examples
How It Works
Dashboard
Configuration
Advisor Utilities
Supported Models
Installation Options
Project Structure
Development
Limitations
Contributing
License

Quick Start

pip install tokenoptim

Compress a prompt

import tokenoptim

result = tokenoptim.optimize("your long prompt here", model="gpt-4")
print(result.text)           # compressed text
print(result.savings_pct)    # e.g. 28.0
print(result.cost_saved_usd) # e.g. 0.000630

Use with any provider

import tokenoptim
from openai import OpenAI

client = OpenAI()
result = tokenoptim.optimize("your long prompt here", model="gpt-4")

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": result.text}],
)

Works the same way with Anthropic, DeepSeek, Mistral, Google, or any other provider:

import tokenoptim
from anthropic import Anthropic

client = Anthropic()
result = tokenoptim.optimize("your long prompt here", model="claude-sonnet-4")

response = client.messages.create(
    model="claude-sonnet-4",  # if the API rejects this short name, use the full model ID from Anthropic's model list
    max_tokens=1024,
    messages=[{"role": "user", "content": result.text}],
)

Compress chat messages

import tokenoptim
from openai import OpenAI

client = OpenAI()

optimized = tokenoptim.optimize_messages([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in detail."},
], model="gpt-4")

# Use optimized.messages directly in your API call
response = client.chat.completions.create(
    model="gpt-4",
    messages=optimized.messages,
)
print(f"Saved {optimized.tokens_saved} tokens ({optimized.savings_pct}%)")

Track cumulative savings

import tokenoptim

with tokenoptim.session(model="gpt-4") as s:
    s.optimize("first prompt")
    s.optimize("second prompt")
    s.optimize_messages([{"role": "user", "content": "third prompt"}])

print(f"Saved {s.total_tokens_saved} tokens across {s.call_count} calls")
print(f"Cost saved: ${s.total_cost_saved_usd:.4f}")
print(f"Avg savings: {s.avg_savings_pct}%")

API Reference

Function	Description
`tokenoptim.optimize(text, model=, ...)`	Compress a string. Returns `CompressionResult`
`tokenoptim.optimize_messages(messages, model=, ...)`	Compress chat messages. Returns `MessagesResult`
`tokenoptim.session(model=, ...)`	Context manager tracking cumulative stats
`tokenoptim.suggest_cache_split(text, model=)`	Suggest prefix-caching split point. Returns `CacheSplitResult`
`tokenoptim.suggest_output_format(text)`	Detect verbose output patterns. Returns `list[OutputFormatSuggestion]`
`tokenoptim.compare_models(text, models=)`	Compare token counts and costs. Returns `list[ModelCostComparison]`

Options accepted by `optimize()`, `optimize_messages()`, and `session()`:

Parameter	Default	Description
`model`	`"gpt-4"`	Target model (for tokenizer and cost calculation)
`enable_contractions`	`True`	Apply contractions (`do not` -> `don't`)
`enable_filler_removal`	`True`	Strip filler phrases
`enable_phrase_shortening`	`True`	Replace verbose phrases (`due to the fact that` -> `because`)
`enable_numeric_normalization`	`True`	Normalize numbers (`1,000` -> `1000`, `3.00` -> `3`)
`enable_separator_removal`	`True`	Remove separator lines (`---`, `===`) and boilerplate phrases
`enable_html_stripping`	`False`	Strip HTML/XML tags and unescape entities
`enable_code_comment_stripping`	`False`	Strip `# ...` and `// ...` end-of-line comments
`enable_json_minification`	`False`	Minify JSON blocks and inline objects
`enable_duplicate_removal`	`False`	Remove consecutive duplicate lines/paragraphs
`enable_abbreviations`	`False`	Replace common long words (`configuration` -> `config`)
`enable_markdown_stripping`	`False`	Strip markdown formatting (preserves code blocks)
`enable_semantic_dedup`	`False`	Remove near-duplicate sentences via TF-IDF similarity
`semantic_dedup_threshold`	`0.8`	Cosine similarity threshold for semantic dedup
`enable_indentation_compaction`	`False`	Reduce 4-space/tab indentation to 2-space
`enable_url_shortening`	`False`	Replace URLs with domain only (preserves code blocks)
`enable_article_trimming`	`False`	Remove redundant articles after prepositions/verbs
`enable_list_compaction`	`False`	Convert short bullet/numbered lists to comma-separated
`enable_xml_minification`	`False`	Minify XML in fenced blocks and inline
`enable_yaml_minification`	`False`	Minify YAML in fenced blocks (strip comments, reduce indent)
`track`	`True`	Log metrics to the dashboard database
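
The `semantic_dedup_threshold` option is a cosine-similarity cutoff. The following is a minimal standard-library sketch of TF-IDF near-duplicate removal, not TokenOptim's implementation: the tokenizer regex, the smoothed IDF formula, and the choice to keep the first of two similar sentences are all assumptions, and `dedup_sentences` is a hypothetical name.

```python
import math
import re
from collections import Counter


def _tokens(sentence: str) -> list[str]:
    # Assumed tokenization: lowercase words and digits.
    return re.findall(r"[a-z0-9']+", sentence.lower())


def _cosine(a: dict, b: dict) -> float:
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dedup_sentences(sentences: list[str], threshold: float = 0.8) -> list[str]:
    """Drop any sentence whose TF-IDF cosine similarity to an earlier kept one reaches the threshold."""
    docs = [Counter(_tokens(s)) for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences each term appears.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF (an assumption; other weightings are possible).
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = [{term: tf * idf[term] for term, tf in doc.items()} for doc in docs]

    kept: list[int] = []
    for i in range(n):
        if all(_cosine(vectors[i], vectors[j]) < threshold for j in kept):
            kept.append(i)
    return [sentences[i] for i in kept]


sentences = [
    "Always validate user input.",
    "Always validate the user input.",
    "Log every failed request.",
]
print(dedup_sentences(sentences, threshold=0.8))
```

In this sketch, raising the threshold keeps more sentences and lowering it removes more, so a stricter value is the safer choice when near-identical sentences may carry different details.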

Lower-level Compressor class

For direct control without metrics tracking:

from tokenoptim import Compressor

c = Compressor(model="gpt-4")
result = c.compress("your prompt text here")

print(f"Original:   {result.original_tokens} tokens")
print(f"Compressed: {result.compressed_tokens} tokens")
print(f"Saved:      {result.savings_pct}%")
print(f"Cost saved: ${result.cost_saved_usd:.6f}")
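
The fields printed above are related by simple arithmetic. The sketch below shows that relationship under stated assumptions: `savings_summary` is an illustrative helper, not part of TokenOptim, and the price of $30 per million tokens is a placeholder rather than a value taken from the library's pricing table.

```python
def savings_summary(original_tokens: int, compressed_tokens: int,
                    usd_per_million_tokens: float) -> dict:
    """Derive savings figures from two token counts and a per-token price."""
    tokens_saved = original_tokens - compressed_tokens
    # Guard against division by zero for an empty prompt.
    savings_pct = 100.0 * tokens_saved / original_tokens if original_tokens else 0.0
    cost_saved_usd = tokens_saved * usd_per_million_tokens / 1_000_000
    return {
        "tokens_saved": tokens_saved,
        "savings_pct": round(savings_pct, 1),
        "cost_saved_usd": cost_saved_usd,
    }


# Placeholder price; substitute the real input-token price for your model.
summary = savings_summary(38, 29, usd_per_million_tokens=30.0)
print(summary["tokens_saved"], summary["savings_pct"])
```

Because the saving per call is small in absolute terms, the figure that matters in practice is the per-call saving multiplied by call volume.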

Compression Examples

System prompt with fillers and contractions

Before (38 tokens):

You are a helpful coding assistant. Please note that you should provide
concise and accurate code. It is important to mention that you should
not make up APIs. You should always include error handling.

After (29 tokens):

You're a helpful coding assistant. You should provide concise and accurate
code. You shouldn't make up APIs. You should always include error handling.

Savings: 24% — filler removal (please note that, it is important to mention that) and contractions (You are → You're, should not → shouldn't).
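
The two transformations in this example can be approximated with regular expressions. This is a rough sketch, not the library's code: the phrase lists are small illustrative subsets, and the helper names and the re-capitalization step are assumptions.

```python
import re

# Illustrative subsets; the library's real phrase lists are not shown in this README.
FILLERS = ["please note that", "it is important to mention that"]
CONTRACTIONS = {"you are": "you're", "should not": "shouldn't"}


def _capitalize_sentence_starts(text: str) -> str:
    # Re-capitalize a lowercase letter that now starts a sentence.
    return re.sub(r"(^|[.!?]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), text)


def remove_fillers(text: str) -> str:
    for phrase in FILLERS:
        text = re.sub(rf"\b{re.escape(phrase)}\s*", "", text, flags=re.IGNORECASE)
    return _capitalize_sentence_starts(text)


def apply_contractions(text: str) -> str:
    def repl(match: re.Match) -> str:
        short = CONTRACTIONS[match.group(0).lower()]
        # Preserve a leading capital ("You are" -> "You're").
        return short.capitalize() if match.group(0)[0].isupper() else short

    pattern = "|".join(re.escape(key) for key in CONTRACTIONS)
    return re.sub(rf"\b(?:{pattern})\b", repl, text, flags=re.IGNORECASE)


before = ("You are a helpful coding assistant. Please note that you should provide "
          "concise and accurate code. It is important to mention that you should "
          "not make up APIs. You should always include error handling.")
after = apply_contractions(remove_fillers(before))
print(after)
```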

Filler-heavy requirements prompt

Before (38 tokens):

It is important to note that I need a REST API. Please note that it should
handle authentication. It should be noted that rate limiting is required.
As previously mentioned we are using PostgreSQL.

After (21 tokens):

I need a REST API. It should handle authentication. Rate limiting is
required. We're using PostgreSQL.

Savings: 45% — four filler phrases stripped, plus contractions.

Code-adjacent prompt

Before (24 tokens):

Write a Python function that does not raise an exception. It is important
to note that the function should return a list.

After (18 tokens):

Write a Python function that doesn't raise an exception. The function
should return a list.

Savings: 25% — contractions and filler removal apply; code structure is preserved.

Unicode and whitespace cleanup

Before (35 tokens):

The model\u2019s predictions are   very   accurate.    We have not tested
the   edge cases yet,  but  we  should  not  skip  them.

After (24 tokens):

The model's predictions are very accurate. We've not tested the edge
cases yet, but we shouldn't skip them.

Savings: 31% — smart quotes normalized, extra whitespace collapsed, contractions applied.
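
The cleanup steps in this example (without the contractions) can be sketched with the standard library. The character mapping below is a small assumed subset, and `normalize_text` is a hypothetical name, not a TokenOptim function.

```python
import re
import unicodedata

# Assumed subset of "smart" punctuation mapped to ASCII equivalents.
_ASCII_PUNCT = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # single curly quotes
    "\u201c": '"', "\u201d": '"',   # double curly quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
})


def normalize_text(text: str) -> str:
    text = unicodedata.normalize("NFC", text)   # canonical composition
    text = text.translate(_ASCII_PUNCT)         # smart punctuation to ASCII
    text = re.sub(r"[ \t]+", " ", text)         # collapse runs of spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)      # collapse runs of blank lines
    # Strip trailing whitespace on every line.
    return "\n".join(line.rstrip() for line in text.split("\n"))


print(normalize_text("The model\u2019s predictions are   very   accurate.    We have not tested"))
```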

Chat messages (multi-message compression)

Before (61 tokens):

messages = [
    {"role": "system", "content": "You are an expert Python developer. Please note that you should write clean code. You should not use global variables. You should always add type hints."},
    {"role": "user", "content": "It is important to note that I need a function to parse JSON. The function does not need to handle errors. It is worth noting that performance matters."},
]

After (47 tokens):

messages = [
    {"role": "system", "content": "You're an expert Python developer. You should write clean code. You shouldn't use global variables. You should always add type hints."},
    {"role": "user", "content": "I need a function to parse JSON. The function doesn't need to handle errors. Performance matters."},
]

Savings: 23% — each message is compressed independently; fillers and contractions stack up across the conversation.
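
Independent per-message compression amounts to mapping a text function over each message's content while leaving roles untouched. A minimal sketch, assuming messages are plain dicts with string content; `compress_messages` and `collapse_spaces` are hypothetical names, and `collapse_spaces` merely stands in for a real compressor.

```python
from typing import Callable


def compress_messages(messages: list[dict],
                      compress: Callable[[str], str]) -> list[dict]:
    """Return new message dicts with each string content compressed on its own."""
    result = []
    for message in messages:
        content = message.get("content")
        if isinstance(content, str):
            result.append({**message, "content": compress(content)})
        else:
            # Leave non-string content (for example, structured parts) unchanged.
            result.append(dict(message))
    return result


def collapse_spaces(text: str) -> str:
    # Stand-in for a real compression function.
    return " ".join(text.split())


original = [
    {"role": "system", "content": "You   are   concise."},
    {"role": "user", "content": "Hello    there"},
]
compressed = compress_messages(original, collapse_spaces)
print(compressed)
```

Building new dicts rather than editing in place keeps the caller's original messages available for logging or comparison.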

How It Works

TokenOptim applies 24 compression strategies in order:

1. Line ending normalization: normalize \r\n and \r to \n (always on)
2. Unicode normalization: NFC normalize, replace smart quotes/dashes with ASCII
3. Indentation compaction: reduce 4-space/tab indentation to 2-space (opt-in)
4. Whitespace normalization: collapse multiple spaces, tabs, blank lines
5. JSON minification: minify fenced and inline JSON blocks (opt-in)
6. XML minification: minify fenced and inline XML (opt-in)
7. YAML minification: strip comments, reduce indent, remove blank lines in fenced YAML (opt-in)
8. Redundant punctuation: !!! → !, ???? → ?
9. HTML/XML stripping: remove tags and unescape entities (opt-in)
10. Markdown stripping: remove formatting while preserving code blocks (opt-in)
11. URL shortening: replace full URLs with domain only, strip www. (opt-in)
12. Filler removal: strip phrases like "please note that", "basically", "it is important to mention that"
13. Separator/boilerplate removal: remove lines of ---, ===, etc. and phrases like "please find below"
14. Duplicate line removal: remove consecutive duplicate lines and paragraphs (opt-in)
15. List compaction: convert short bullet/numbered lists to comma-separated (opt-in)
16. Verbose phrase shortening: due to the fact that → because, in order to → to, prior to → before
17. Abbreviations: configuration → config, documentation → docs, database → db (opt-in)
18. Article trimming: remove redundant the/a/an after prepositions and common verbs (opt-in)
19. Contractions: do not → don't, it is → it's (configurable)
20. Numeric normalization: 1,000,000 → 1000000, 3.00 → 3, 007 → 7
21. Code comment stripping: remove # ... and // ... end-of-line comments (opt-in)
22. Semantic deduplication: remove near-duplicate sentences using TF-IDF cosine similarity (opt-in)
23. Trailing whitespace: strip per-line trailing spaces
24. Tokenizer-specific: model-aware optimizations (e.g., \n \n → \n\n saves 2 tokens in tiktoken)

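The default strategies are designed to preserve semantic meaning, and code and structured data pass through with minimal changes. Opt-in strategies such as URL shortening, code comment stripping, and semantic deduplication discard information, so enable them only where that loss is acceptable.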
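
An ordered pipeline of toggleable strategies can be sketched as a list of named functions applied in sequence. This is an illustration of the idea only: the `Strategy` tuple, `run_pipeline`, and the three simplified strategies below are assumptions, and their regular expressions cover fewer cases than the list above describes (for example, leading zeros are not handled).

```python
import re
from typing import Callable, NamedTuple


class Strategy(NamedTuple):
    name: str
    enabled: bool
    apply: Callable[[str], str]


def squeeze_punctuation(text: str) -> str:
    # "!!!" -> "!", "???" -> "?"
    return re.sub(r"([!?])\1+", r"\1", text)


def normalize_numbers(text: str) -> str:
    # Drop thousands separators: "1,000,000" -> "1000000".
    text = re.sub(r"(?<=\d),(?=\d{3}\b)", "", text)
    # Drop an all-zero fraction: "3.00" -> "3"; version-like "3.0.1" is left alone.
    return re.sub(r"(?<![\d.])(\d+)\.0+(?!\d|\.\d)", r"\1", text)


def strip_trailing_whitespace(text: str) -> str:
    return "\n".join(line.rstrip() for line in text.split("\n"))


def run_pipeline(text: str, strategies: list[Strategy]) -> str:
    # Apply each enabled strategy in list order; later steps see earlier output.
    for strategy in strategies:
        if strategy.enabled:
            text = strategy.apply(text)
    return text


strategies = [
    Strategy("redundant_punctuation", True, squeeze_punctuation),
    Strategy("numeric_normalization", True, normalize_numbers),
    Strategy("trailing_whitespace", True, strip_trailing_whitespace),
]
sample = "Wow!!! It costs 1,000,000 credits, about 3.00 each???   "
print(run_pipeline(sample, strategies))
```

Keeping the order explicit matters because each step operates on the previous step's output, so structural steps such as whitespace cleanup are best run before phrase-level rewrites.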

Dashboard

Launch the real-time savings dashboard:

tokenoptim dashboard

Open http://localhost:8383 to see:

Total tokens and cost saved
Savings over time charts
Per-model breakdown
Recent requests log
ROI calculator


Terminal stats

tokenoptim stats

Configuration

from tokenoptim import Compressor

# Disable contractions (for formal prompts)
c = Compressor(model="gpt-4", enable_contractions=False)

# Disable filler removal
c = Compressor(model="gpt-4", enable_filler_removal=False)

# Add custom filler phrases
c = Compressor(model="gpt-4", custom_fillers=["in my opinion", "to be honest"])

# Enable HTML stripping (opt-in — useful for web-scraped content)
c = Compressor(model="gpt-4", enable_html_stripping=True)

# Enable code comment stripping (opt-in — useful for code-heavy prompts)
c = Compressor(model="gpt-4", enable_code_comment_stripping=True)

# Disable verbose phrase shortening
c = Compressor(model="gpt-4", enable_phrase_shortening=False)

# Enable JSON minification (opt-in — useful for prompts with JSON data)
c = Compressor(model="gpt-4", enable_json_minification=True)

# Enable markdown stripping (opt-in — useful for web-scraped markdown)
c = Compressor(model="gpt-4", enable_markdown_stripping=True)

# Enable abbreviations (opt-in — replaces common long words)
c = Compressor(model="gpt-4", enable_abbreviations=True)

# Enable semantic deduplication (opt-in — removes near-duplicate sentences)
c = Compressor(model="gpt-4", enable_semantic_dedup=True, semantic_dedup_threshold=0.8)

# Enable indentation compaction (opt-in — reduces 4-space/tab to 2-space)
c = Compressor(model="gpt-4", enable_indentation_compaction=True)

# Enable URL shortening (opt-in — replaces URLs with domain only)
c = Compressor(model="gpt-4", enable_url_shortening=True)

# Enable article trimming (opt-in — removes redundant the/a/an)
c = Compressor(model="gpt-4", enable_article_trimming=True)

# Enable list compaction (opt-in — converts short lists to comma-separated)
c = Compressor(model="gpt-4", enable_list_compaction=True)

# Enable XML minification (opt-in — minifies XML in fenced blocks)
c = Compressor(model="gpt-4", enable_xml_minification=True)

# Enable YAML minification (opt-in — strips comments, reduces indent in YAML blocks)
c = Compressor(model="gpt-4", enable_yaml_minification=True)
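
To illustrate what JSON minification saves, here is a minimal standard-library sketch. It is not the library's implementation: `minify_json` is a hypothetical helper that handles a single JSON string rather than fenced or inline blocks inside a prompt.

```python
import json


def minify_json(raw: str) -> str:
    """Re-serialize JSON without insignificant whitespace; return invalid input unchanged."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return raw
    # Compact separators remove the spaces after "," and ":".
    return json.dumps(data, separators=(",", ":"), ensure_ascii=False)


pretty = '{\n  "name": "demo",\n  "tags": ["a", "b"],\n  "count": 3\n}'
print(minify_json(pretty))
```

A round trip through the parser can also change how some numbers are written, so treat this as suitable for data payloads rather than for JSON whose exact formatting matters.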

DeepSeek

import tokenoptim
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="your-key")
result = tokenoptim.optimize("your prompt here", model="deepseek-chat")

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": result.text}],
)

Advisor Utilities

TokenOptim includes advisory functions that help you optimize LLM costs beyond compression:

Suggest cache-friendly splits

import tokenoptim

result = tokenoptim.suggest_cache_split("""
You are a helpful assistant specialized in Python.
Always provide working code examples.
Answer the user's question about: {topic}
""")

print(f"Static prefix: {result.static_tokens} tokens")
print(f"Dynamic suffix: {result.dynamic_tokens} tokens")
print(f"Cache savings estimate: {result.cache_savings_estimate:.0%}")
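
Providers that offer prompt caching generally cache a shared prefix, so the useful split is between text that never changes and text that varies per request. The sketch below shows one plausible heuristic, cutting at the line containing the first `{placeholder}`. How `suggest_cache_split` actually chooses its split point is not documented here, so the function name and the rule are assumptions.

```python
import re
from typing import NamedTuple


class SplitSketch(NamedTuple):
    static_prefix: str
    dynamic_suffix: str


_PLACEHOLDER = re.compile(r"\{[A-Za-z_][A-Za-z0-9_]*\}")


def split_at_first_placeholder(template: str) -> SplitSketch:
    match = _PLACEHOLDER.search(template)
    if match is None:
        # Nothing varies, so the whole template is a cacheable prefix.
        return SplitSketch(template, "")
    # Cut at the start of the line that contains the first placeholder.
    cut = template.rfind("\n", 0, match.start()) + 1
    return SplitSketch(template[:cut], template[cut:])


template = (
    "You are a helpful assistant specialized in Python.\n"
    "Always provide working code examples.\n"
    "Answer the user's question about: {topic}\n"
)
split = split_at_first_placeholder(template)
print(repr(split.static_prefix))
print(repr(split.dynamic_suffix))
```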

Suggest concise output formats

import tokenoptim

suggestions = tokenoptim.suggest_output_format(
    "Please explain in detail how neural networks work and provide a detailed analysis."
)
for s in suggestions:
    print(f"Pattern: '{s.current_pattern}' → {s.suggestion} (saves ~{s.estimated_savings_pct}%)")

Compare model costs

import tokenoptim

comparisons = tokenoptim.compare_models(
    "Your prompt text here",
    models=["gpt-4", "gpt-4o", "gpt-3.5-turbo", "claude-3-5-sonnet", "deepseek-chat"],
)
for c in comparisons:
    print(f"{c.model:20s} {c.tokens:5d} tokens  ${c.cost_per_call:.6f}/call  ({c.provider})")

Supported Models

Provider	Models	Tokenizer
OpenAI	gpt-5.2, gpt-5.2-pro, gpt-5.1, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4, gpt-4o, gpt-3.5-turbo, o3, o4-mini, o1	tiktoken
Anthropic	claude-3-opus/sonnet/haiku, claude-3.5-*, claude-opus-4, claude-sonnet-4	tiktoken (approx)
DeepSeek	deepseek-chat, deepseek-reasoner, deepseek-v3, deepseek-r1	transformers / tiktoken fallback
Mistral	mistral-large, mistral-small, codestral, mixtral	tiktoken (approx)
Google	gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash	tiktoken (approx)
Meta	llama-3, llama-2	tiktoken (approx)
Qwen	qwen-max, qwen-plus, qwen-turbo	tiktoken (approx)
Local	Any HuggingFace model	transformers / fallback

Installation Options

# Core (includes tiktoken for token counting)
pip install tokenoptim

# With local model support (HuggingFace transformers)
pip install "tokenoptim[local]"

# Development
pip install "tokenoptim[dev]"

Project Structure

tokenoptim/
├── src/tokenoptim/
│   ├── compressor.py          # Core compression engine
│   ├── tokenizers.py          # Tokenizer registry & pricing
│   ├── metrics/               # Usage tracking (SQLite)
│   │   ├── collector.py
│   │   ├── models.py
│   │   └── db.py
│   ├── server/                # FastAPI dashboard backend
│   │   ├── app.py
│   │   └── routes.py
│   ├── api.py                 # Public Python API (optimize, session)
│   └── advisor.py             # Advisory utilities (cache split, model compare)
├── dashboard/                 # React dashboard (Vite + TailwindCSS)
└── tests/                     # pytest suite (198 tests)

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run dashboard dev server (frontend)
cd dashboard && npm install && npm run dev

# Run API server (backend)
tokenoptim dashboard

Limitations

English only: contractions, filler removal, and verbose phrase shortening are designed for English text. Unicode normalization, whitespace cleanup, numeric normalization, and HTML/code comment stripping work with any language.
No semantic compression: TokenOptim applies rule-based transformations only. It does not paraphrase, summarize, or use ML models (the opt-in semantic deduplication relies on TF-IDF similarity, not a learned model).
Tokenizer approximation: for the providers listed as "tiktoken (approx)" under Supported Models (Anthropic, Mistral, Google, Meta, Qwen), token counts are approximated using tiktoken's cl100k_base encoding, so reported token counts and savings can differ from what the provider bills.

Contributing

Found a bug or have an idea? Open an issue or submit a PR. If TokenOptim saved you tokens (and money), a star goes a long way!

Adding a new compression strategy

Want to contribute a new strategy? Here's how — it only takes 3 steps:

1. Add your strategy to src/tokenoptim/compressor.py:

# Module-level data (if needed)
_MY_REPLACEMENTS: dict[str, str] = {
    "long phrase": "short",
}

# In the Compressor class:

# Add a constructor param (opt-in strategies default to False)
def __init__(self, ..., enable_my_strategy: bool = False):
    self.enable_my_strategy = enable_my_strategy

# Add a method
@staticmethod
def _apply_my_strategy(text: str) -> str:
    for old, new in _MY_REPLACEMENTS.items():
        text = text.replace(old, new)
    return text

# Wire it into the compress() pipeline at the right position
if self.enable_my_strategy:
    compressed = self._apply_my_strategy(compressed)

2. Propagate the param through src/tokenoptim/api.py:

Add enable_my_strategy: bool = False to optimize(), optimize_messages(), the Session dataclass, and session() — then pass it to the Compressor constructor in each.

3. Add tests in tests/test_compressor.py:

from tokenoptim import Compressor


class TestMyStrategy:
    def test_basic(self):
        c = Compressor(model="gpt-4", enable_my_strategy=True)
        result = c.compress("some long phrase here")
        assert "short" in result.text

    # `compressor` is assumed to be a pytest fixture providing a default Compressor
    def test_disabled_by_default(self, compressor):
        result = compressor.compress("some long phrase here")
        assert "long phrase" in result.text

Run PYTHONPATH=src python3 -m pytest tests/ -v and make sure everything passes.
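
For trying the pattern outside the repository, the three pieces from step 1 can be assembled into a self-contained toy. `ToyCompressor` is a hypothetical stand-in that returns a plain string; the real `Compressor` takes more parameters and returns a result object.

```python
# Module-level replacement table, as in step 1.
_MY_REPLACEMENTS: dict[str, str] = {
    "long phrase": "short",
}


class ToyCompressor:
    """Hypothetical stand-in for Compressor with a single opt-in strategy."""

    def __init__(self, enable_my_strategy: bool = False):
        # Opt-in strategies default to False.
        self.enable_my_strategy = enable_my_strategy

    @staticmethod
    def _apply_my_strategy(text: str) -> str:
        for old, new in _MY_REPLACEMENTS.items():
            text = text.replace(old, new)
        return text

    def compress(self, text: str) -> str:
        compressed = text
        # The strategy runs only when explicitly enabled.
        if self.enable_my_strategy:
            compressed = self._apply_my_strategy(compressed)
        return compressed


print(ToyCompressor(enable_my_strategy=True).compress("some long phrase here"))
print(ToyCompressor().compress("some long phrase here"))
```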

License

MIT

Project details

Version 0.1.0, released Feb 10, 2026.
