Blazing-fast LLM token compression engine, built in Rust. Reduce prompt token costs 2-5× with near-zero quality loss.
TokenPress
Blazing-fast LLM token compression engine, built in Rust.
2–5× fewer tokens. Same or better output quality. Any model, any language.
Why TokenPress?
LLM API calls are expensive. Most prompt tokens are redundant — filler words, predictable syntax, boilerplate. TokenPress uses information theory to score every token by its surprise value, keeps only what matters, and sends a compressed prompt to the LLM.
- 18.6 M tok/s Rust core (PyO3 + rayon)
- 3.3× compression with near-zero quality loss
- Drop-in wrappers for OpenAI & Anthropic: wrap the client once and leave the rest of your code unchanged
- 6 selection strategies: ratio, top-k, threshold, percentile, IQR, token merging
- Language & model agnostic — works on any text, any LLM
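To make the selection strategies listed above more concrete, here is a minimal sketch of four of them (ratio, top-k, threshold, percentile) operating on a list of per-token scores. This is an illustration only, not TokenPress's implementation: the function names and the nearest-rank percentile rule are assumptions, and IQR selection and token merging are left out because this page does not define them.

```python
import math


def select_top_k(scores, k):
    """Return indices of the k highest-scoring tokens, in original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def select_ratio(scores, ratio):
    """Keep roughly `ratio` of the tokens (at least one)."""
    k = max(1, round(len(scores) * ratio))
    return select_top_k(scores, k)


def select_threshold(scores, threshold):
    """Keep every token whose score is at least `threshold`."""
    return [i for i, s in enumerate(scores) if s >= threshold]


def select_percentile(scores, pct):
    """Keep tokens at or above the pct-th percentile (nearest-rank rule)."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return select_threshold(scores, ordered[rank - 1])


# Hypothetical per-token surprise scores, in bits.
scores = [0.5, 3.2, 0.1, 4.0, 1.5, 2.8, 0.3, 5.1]
kept_by_ratio = select_ratio(scores, 0.5)
kept_by_top_k = select_top_k(scores, 2)
kept_by_threshold = select_threshold(scores, 2.0)
kept_by_percentile = select_percentile(scores, 75)
```

All four functions return token indices in their original order, so the surviving tokens can be joined back into a readable prompt.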
Install
pip install tokenpress # core
pip install "tokenpress[models]" # + scoring model (distilgpt2)
pip install "tokenpress[all]" # + OpenAI & Anthropic wrappers
Quick Start
import tokenpress
result = tokenpress.compress("Your very long prompt …", ratio=0.3)
print(result.compressed_text) # compressed prompt
print(f"{result.compression_ratio:.1f}× smaller, {result.savings_percentage:.0f}% tokens saved")
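The two numbers printed above are related by simple arithmetic. The sketch below assumes that the compression ratio is original tokens divided by compressed tokens and that `ratio=0.3` means about 30% of tokens are kept; neither definition is confirmed on this page, and `compression_stats` is a hypothetical helper, not part of the package.

```python
def compression_stats(original_tokens, compressed_tokens):
    """Return (compression ratio, percentage of tokens saved)."""
    ratio = original_tokens / compressed_tokens
    savings_pct = 100 * (1 - compressed_tokens / original_tokens)
    return ratio, savings_pct


# Assumed example: a 1000-token prompt reduced to 300 tokens.
ratio, savings = compression_stats(1000, 300)
print(f"{ratio:.1f}x smaller, {savings:.0f}% tokens saved")
```

Under these assumptions, keeping 30% of the tokens corresponds to roughly 3.3× compression, which matches the figure quoted in the feature list.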
Drop-in OpenAI wrapper:
import openai, tokenpress
client = tokenpress.wrap(openai.OpenAI(), ratio=0.3)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": very_long_prompt}],
)
# same API, roughly 3× fewer prompt tokens billed
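One way to picture a drop-in wrapper like this is as a thin proxy that compresses each message's text and then forwards the call unchanged. The sketch below is a guess at the general pattern, not the actual `tokenpress.wrap` implementation: `CompressingClient`, `drop_fillers`, and `fake_create` are hypothetical stand-ins so the example runs without any API key.

```python
class CompressingClient:
    """Hypothetical proxy: compress message text, then delegate the call."""

    def __init__(self, create_fn, compress_fn):
        self._create = create_fn
        self._compress = compress_fn

    def create(self, *, messages, **kwargs):
        compressed = [
            {**m, "content": self._compress(m["content"])}
            if isinstance(m.get("content"), str)
            else m  # leave non-text content untouched
            for m in messages
        ]
        return self._create(messages=compressed, **kwargs)


def drop_fillers(text):
    """Toy compressor (assumption): remove a few filler words."""
    fillers = {"a", "the", "very", "really"}
    return " ".join(w for w in text.split() if w.lower() not in fillers)


def fake_create(**kwargs):
    """Stand-in for a real API call: echo back what would be sent."""
    return kwargs


client = CompressingClient(fake_create, drop_fillers)
sent = client.create(
    model="demo",
    messages=[{"role": "user", "content": "a very long prompt"}],
)
```

Because the proxy keeps the same keyword arguments as the call it wraps, the calling code does not need to change beyond constructing the wrapper.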
CLI:
tokenpress compress document.txt -r 0.3 -o compressed.txt
tokenpress bench document.txt
How It Works
Each token is scored by self-information: $I(x_i) = -\log_2 P(x_i \mid x_{<i})$
High surprise → keep. Low surprise → remove. Pure math, no heuristics.
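As a rough illustration of the scoring formula above, the sketch below computes self-information with a toy add-one-smoothed bigram model, so the context $x_{<i}$ is approximated by just the previous token. This is a stand-in chosen to keep the example dependency-free; the install notes indicate the package itself scores with a language model (distilgpt2). The function name and the `<s>` start marker are assumptions.

```python
import math
from collections import Counter


def bigram_self_information(tokens, corpus):
    """Return -log2 P(token | previous token) for each token.

    Probabilities come from add-one smoothed bigram counts over `corpus`.
    """
    vocab = set(corpus) | set(tokens)
    contexts = Counter(corpus[:-1])              # times each token precedes another
    bigrams = Counter(zip(corpus, corpus[1:]))   # adjacent pair counts
    scores = []
    prev = "<s>"  # start marker; never seen in the corpus, so counts are zero
    for tok in tokens:
        p = (bigrams[(prev, tok)] + 1) / (contexts[prev] + len(vocab))
        scores.append(-math.log2(p))
        prev = tok
    return scores


corpus = "the cat sat on the mat the cat ate the fish".split()
tokens = ["the", "cat", "ate", "zebra"]
scores = bigram_self_information(tokens, corpus)
for tok, bits in zip(tokens, scores):
    print(f"{tok:>6}: {bits:.2f} bits")
```

In this toy corpus "cat" often follows "the", so it scores low, while the unseen "zebra" scores high and would be among the first tokens kept.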
Built on research from Selective Context (EMNLP '23), TRIM (COLING '25), Token Merging (ICLR '23), H2O (NeurIPS '23), and LLMLingua-2 (ACL '24).
Links
- GitHub — full docs, benchmarks, architecture
- Benchmarks — 7 reproducible benchmark suites
- License — Apache-2.0
Download files
File details
Details for the file tokenpress-0.1.0.tar.gz.
File metadata
- Download URL: tokenpress-0.1.0.tar.gz
- Upload date:
- Size: 130.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3d56a371f03b194ed4642aaad8cc425d731a4cd5b9dd277d32513eb828ee75f1 |
| MD5 | e097d38e0adfb2eaa2c5cfa721d6f5d8 |
| BLAKE2b-256 | d52c10ab0882f1fb37a0aa23c0547de0c494d6e37e38038dd2d198ae41191b33 |
File details
Details for the file tokenpress-0.1.0-cp311-cp311-win_amd64.whl.
File metadata
- Download URL: tokenpress-0.1.0-cp311-cp311-win_amd64.whl
- Upload date:
- Size: 163.5 kB
- Tags: CPython 3.11, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 67e03991d755b2e11e73741c1881eb47e2e118863d144af5a8f57d662a13b686 |
| MD5 | 539d3d586a34cb69fe723763074797ab |
| BLAKE2b-256 | f4b0efeae5be489c907396c60577295d54f75539714518b6e60d4fc183801062 |
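If you download one of these files manually, you may want to check it against the published SHA256 digest. The sketch below uses only Python's standard `hashlib`; the helper name `sha256_of_file` is an assumption for illustration, not something shipped with the package.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read until fh.read() returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compare the returned string with the SHA256 value listed for the file you downloaded; any mismatch suggests a corrupted or altered download.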