
TokenCrush

LLM token optimization CLI - compress prompts, save costs.

License: MIT · Requires Python 3.10+

TokenCrush uses LLMLingua-2 to compress your prompts before sending them to LLMs, reducing token usage by 50-80% while preserving response quality.

Features

  • Prompt Compression: Reduce tokens by 2-5x using Microsoft's LLMLingua-2
  • Multi-Provider Support: Works with OpenAI, Anthropic (Claude), and Google (Gemini)
  • Simple CLI: One command to compress and chat
  • Cost Savings: Significantly reduce API costs
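To make the cost claim concrete, here is a back-of-the-envelope calculation. The per-token price below is a hypothetical placeholder, not any provider's actual pricing:

```python
def estimate_savings(original_tokens: int, compressed_tokens: int,
                     price_per_1k_tokens: float) -> dict:
    """Estimate input-token cost before and after compression."""
    cost_before = original_tokens / 1000 * price_per_1k_tokens
    cost_after = compressed_tokens / 1000 * price_per_1k_tokens
    return {
        "cost_before": cost_before,
        "cost_after": cost_after,
        "saved": cost_before - cost_after,
        "reduction_pct": 100 * (1 - compressed_tokens / original_tokens),
    }

# A 10,000-token prompt compressed 4x at a hypothetical $0.01 / 1K input tokens
savings = estimate_savings(10_000, 2_500, 0.01)
print(f"${savings['saved']:.3f} saved "
      f"({savings['reduction_pct']:.0f}% fewer input tokens)")
# → $0.075 saved (75% fewer input tokens)
```

At scale the same percentage applies to every request, so the savings grow linearly with usage.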

Installation

pip install tokencrush

Or with uv:

uv add tokencrush

Quick Start

1. Configure API Keys

# Set API keys (stored in ~/.config/tokencrush/config.toml)
tokencrush config set openai sk-your-openai-key
tokencrush config set anthropic sk-ant-your-anthropic-key
tokencrush config set google your-google-api-key

# Or use environment variables
export OPENAI_API_KEY=sk-your-key

2. Compress a Prompt

tokencrush compress "Your very long prompt text here..."

Output:

┌─────────────────────────────────────┐
│        Compression Result           │
├──────────────────┬──────────────────┤
│ Original Tokens  │ 500              │
│ Compressed Tokens│ 150              │
│ Compression Ratio│ 30.0%            │
│ Tokens Saved     │ 350              │
└──────────────────┴──────────────────┘
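Reading the example above, the reported ratio appears to be the fraction of tokens kept; the numbers in the table can be checked directly:

```python
original, compressed = 500, 150

ratio_pct = 100 * compressed / original  # fraction of tokens kept, as a percentage
saved = original - compressed            # tokens removed by compression

print(ratio_pct, saved)  # → 30.0 350
```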

3. Chat with Compression

# Compress and send to GPT-4
tokencrush chat "Explain quantum computing in detail" --model gpt-4

# Use Claude
tokencrush chat "Write a poem" --model claude-3-sonnet

# Use Gemini
tokencrush chat "Summarize this" --model gemini-1.5-pro

CLI Reference

tokencrush compress

Compress text to reduce tokens.

tokencrush compress <text> [OPTIONS]

Options:
  -r, --rate FLOAT    Compression rate (0.0-1.0). Default: 0.5
  -g, --gpu           Use GPU for faster compression

tokencrush chat

Compress and send to LLM.

tokencrush chat <prompt> [OPTIONS]

Options:
  -m, --model TEXT    Model to use. Default: gpt-4
  -r, --rate FLOAT    Compression rate. Default: 0.5
  -g, --gpu           Use GPU for compression
  --no-compress       Skip compression

tokencrush config

Manage API keys.

tokencrush config set <provider> <key>   # Save API key
tokencrush config show                    # Show configured keys
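After `config set`, the stored file at `~/.config/tokencrush/config.toml` is plausibly shaped like the sketch below. The table and key names are assumptions for illustration; run `tokencrush config show` to see what is actually stored:

```toml
# ~/.config/tokencrush/config.toml (illustrative -- section and key names are assumptions)
[api_keys]
openai = "sk-your-openai-key"
anthropic = "sk-ant-your-anthropic-key"
google = "your-google-api-key"
```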

Supported Models

Provider    Models
OpenAI      gpt-4, gpt-4-turbo, gpt-3.5-turbo
Anthropic   claude-3-opus, claude-3-sonnet, claude-3-haiku
Google      gemini-1.5-pro, gemini-1.5-flash

How It Works

  1. Compression: TokenCrush uses LLMLingua-2's trained model to identify and remove redundant tokens while preserving meaning.
  2. API Call: The compressed prompt is sent to your chosen LLM provider.
  3. Response: You get a comparable-quality response with fewer input tokens, and therefore a lower cost.
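Step 1 can be sketched in miniature. The real LLMLingua-2 scores each token with a trained classifier; the word-length scorer below is a trivial stand-in used only to make the keep-the-most-informative-tokens idea runnable:

```python
def compress(words: list[str], rate: float = 0.5) -> list[str]:
    """Toy token-dropping sketch: keep the `rate` fraction of words judged
    most informative, preserving their original order. Word length stands in
    for LLMLingua-2's learned importance score -- illustrative only."""
    keep = max(1, round(len(words) * rate))
    # Rank indices by the stand-in importance score (longer word = "more important"),
    # take the top `keep`, then restore the original word order.
    ranked = sorted(range(len(words)), key=lambda i: -len(words[i]))
    kept = sorted(ranked[:keep])
    return [words[i] for i in kept]

prompt = "please could you kindly explain the fundamentals of quantum computing".split()
print(" ".join(compress(prompt, rate=0.5)))
# → please explain fundamentals quantum computing
```

Even this crude heuristic tends to drop filler words first; the trained model does the same far more reliably, which is why meaning is preserved at 2-5x compression.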

Development

# Clone the repository
git clone https://github.com/xodn348/tokencrush.git
cd tokencrush

# Install with dev dependencies
uv sync --all-extras

# Run tests
uv run pytest -v

# Build package
uv build

License

MIT License - see LICENSE for details.

Acknowledgments

Built on Microsoft's LLMLingua-2 for prompt compression.
