Blazing-fast LLM token compression engine, built in Rust. Cuts prompt tokens 2–5× with near-zero quality loss.

TokenPress

Blazing-fast LLM token compression engine, built in Rust.
2–5× fewer tokens. Same or better output quality. Any model, any language.


Why TokenPress?

LLM API calls are expensive. Most prompt tokens are redundant — filler words, predictable syntax, boilerplate. TokenPress uses information theory to score every token by its surprise value, keeps only what matters, and sends a compressed prompt to the LLM.

  • 18.6 M tok/s Rust core (PyO3 + rayon)
  • 3.3× compression with near-zero quality loss
  • Drop-in wrappers for OpenAI & Anthropic: wrap the client once, keep the rest of your code unchanged
  • 6 selection strategies: ratio, top-k, threshold, percentile, IQR, token merging
  • Language & model agnostic — works on any text, any LLM

Install

pip install tokenpress              # core
pip install "tokenpress[models]"    # + scoring model (distilgpt2)
pip install "tokenpress[all]"       # + OpenAI & Anthropic wrappers

Quick Start

import tokenpress

result = tokenpress.compress("Your very long prompt …", ratio=0.3)
print(result.compressed_text)       # compressed prompt
print(f"{result.compression_ratio:.1f}× smaller, {result.savings_percentage:.0f}% tokens saved")
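The two fields printed above can be related to raw token counts. The following is a minimal sketch of that arithmetic, assuming `ratio` is the fraction of tokens kept and that the fields are defined the way the helper computes them; `compression_stats` is a hypothetical name used only for illustration and is not part of TokenPress.

```python
def compression_stats(original_tokens: int, compressed_tokens: int) -> tuple[float, float]:
    """Return (compression ratio, percent of tokens saved).

    Assumed definitions, not confirmed by the TokenPress docs:
      compression ratio = original / compressed
      savings percent   = 100 * (original - compressed) / original
    """
    if original_tokens <= 0 or compressed_tokens <= 0:
        raise ValueError("token counts must be positive")
    ratio = original_tokens / compressed_tokens
    savings = 100 * (original_tokens - compressed_tokens) / original_tokens
    return ratio, savings


# Keeping 30% of a 1,000-token prompt leaves 300 tokens.
ratio, savings = compression_stats(1000, 300)
print(f"{ratio:.1f}× smaller, {savings:.0f}% tokens saved")
```

Under these assumptions, keeping 30% of the tokens corresponds to about 3.3× compression and 70% savings, which lines up with the 3.3× figure quoted earlier.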

Drop-in OpenAI wrapper:

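import openai
import tokenpress

very_long_prompt = "Your very long prompt …"

# Wrap the client once; the prompt is compressed before it is sent.
client = tokenpress.wrap(openai.OpenAI(), ratio=0.3)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": very_long_prompt}],
)
# Same API; at ratio=0.3 the prompt is roughly 3× smaller.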

CLI:

tokenpress compress document.txt -r 0.3 -o compressed.txt
tokenpress bench document.txt

How It Works

Each token is scored by self-information: $I(x_i) = -\log_2 P(x_i \mid x_{<i})$
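To make the formula concrete, here is a minimal sketch that scores tokens with a toy add-alpha smoothed bigram model estimated from the input itself. This is a stand-in for illustration only: TokenPress scores with a neural language model (distilgpt2, per the install notes), and the function name `self_information` and the smoothing scheme are assumptions, not the library's API.

```python
import math
from collections import Counter


def self_information(tokens: list[str], alpha: float = 1.0) -> list[float]:
    """Return -log2 P(token | previous token) for every token.

    Probabilities come from a toy bigram model with add-alpha smoothing,
    estimated from `tokens` themselves. The first token has no context,
    so it falls back to a smoothed unigram probability.
    """
    vocab_size = len(set(tokens))
    unigram_counts = Counter(tokens)
    context_counts = Counter(tokens[:-1])          # how often each token has a successor
    bigram_counts = Counter(zip(tokens, tokens[1:]))

    scores = []
    for i, tok in enumerate(tokens):
        if i == 0:
            p = (unigram_counts[tok] + alpha) / (len(tokens) + alpha * vocab_size)
        else:
            prev = tokens[i - 1]
            p = (bigram_counts[(prev, tok)] + alpha) / (context_counts[prev] + alpha * vocab_size)
        scores.append(-math.log2(p))
    return scores


tokens = "the cat sat on the mat the cat ran".split()
for tok, bits in zip(tokens, self_information(tokens)):
    print(f"{tok:>4}  {bits:.2f} bits")
```

In this toy run, "cat" after "the" scores lower than "mat" after "the", because the first pair occurs more often and is therefore less surprising.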

High surprise → keep. Low surprise → remove. Pure math, no heuristics.
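The keep-or-remove step can be sketched for four of the listed strategies (ratio, top-k, threshold, percentile). The function names, the reading of `ratio` as the fraction of tokens kept, the nearest-rank percentile rule, and the example scores are all assumptions made for illustration; IQR selection and token merging are left out because the text does not say how they are defined.

```python
import math


def keep_top_k(scores: list[float], k: int) -> list[int]:
    """Indices of the k highest-scoring tokens, returned in original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def keep_by_ratio(scores: list[float], ratio: float) -> list[int]:
    """Keep the highest-scoring fraction `ratio` of tokens (at least one)."""
    k = max(1, math.ceil(len(scores) * ratio))
    return keep_top_k(scores, k)


def keep_by_threshold(scores: list[float], min_bits: float) -> list[int]:
    """Keep every token whose score is at least `min_bits`."""
    return [i for i, s in enumerate(scores) if s >= min_bits]


def keep_by_percentile(scores: list[float], pct: float) -> list[int]:
    """Keep tokens at or above the pct-th percentile (nearest-rank method)."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return keep_by_threshold(scores, ordered[rank - 1])


tokens = ["please", "kindly", "summarize", "the", "quarterly", "report"]
scores = [1.2, 0.8, 6.5, 0.3, 7.1, 5.9]  # made-up self-information values, in bits

kept = keep_by_ratio(scores, 0.5)
print(" ".join(tokens[i] for i in kept))  # summarize quarterly report
```

Returning indices in original order keeps the surviving tokens readable as a prompt rather than as a list sorted by score.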

Built on research from Selective Context (EMNLP '23), TRIM (COLING '25), Token Merging (ICLR '23), H2O (NeurIPS '23), and LLMLingua-2 (ACL '24).

Links

  • GitHub — full docs, benchmarks, architecture
  • Benchmarks — 7 reproducible benchmark suites
  • License — Apache-2.0

Download files

Source Distribution

tokenpress-0.1.0.tar.gz (130.9 kB)

Uploaded Source

Built Distribution

tokenpress-0.1.0-cp311-cp311-win_amd64.whl (163.5 kB)

Uploaded CPython 3.11, Windows x86-64

File details

Details for the file tokenpress-0.1.0.tar.gz.

File metadata

  • Download URL: tokenpress-0.1.0.tar.gz
  • Size: 130.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.12.6

File hashes

Hashes for tokenpress-0.1.0.tar.gz
Algorithm Hash digest
SHA256 3d56a371f03b194ed4642aaad8cc425d731a4cd5b9dd277d32513eb828ee75f1
MD5 e097d38e0adfb2eaa2c5cfa721d6f5d8
BLAKE2b-256 d52c10ab0882f1fb37a0aa23c0547de0c494d6e37e38038dd2d198ae41191b33

File details

Details for the file tokenpress-0.1.0-cp311-cp311-win_amd64.whl.

File hashes

Hashes for tokenpress-0.1.0-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 67e03991d755b2e11e73741c1881eb47e2e118863d144af5a8f57d662a13b686
MD5 539d3d586a34cb69fe723763074797ab
BLAKE2b-256 f4b0efeae5be489c907396c60577295d54f75539714518b6e60d4fc183801062
