Blazing-fast LLM token compression engine, built in Rust. Cuts prompt tokens 2-5× with near-zero quality loss.

TokenPress

Blazing-fast LLM token compression engine, built in Rust.
2–5× fewer tokens. Same or better output quality. Any model, any language.

Why TokenPress?

LLM API calls are expensive. Most prompt tokens are redundant — filler words, predictable syntax, boilerplate. TokenPress uses information theory to score every token by its surprise value, keeps only what matters, and sends a compressed prompt to the LLM.

  • 18.6 M tok/s Rust core (PyO3 + rayon)
  • 3.3× compression with near-zero quality loss
  • Drop-in wrappers for OpenAI & Anthropic: wrap the client once, leave the rest of your code unchanged
  • 6 selection strategies: ratio, top-k, threshold, percentile, IQR, token merging
  • Language & model agnostic — works on any text, any LLM
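To make the selection strategies concrete, here is a minimal pure-Python sketch of four of them (ratio, top-k, threshold, percentile) applied to a list of per-token scores. The function names, the nearest-rank percentile rule, and the example scores are assumptions made for illustration; they are not TokenPress's actual API or internals. IQR selection and token merging are left out because the description above does not specify how they work.

```python
import math


def keep_top_k(scores, k):
    """Return indices of the k highest-scoring tokens, in original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def keep_by_ratio(scores, ratio):
    """Keep the highest-scoring fraction `ratio` of tokens (at least one)."""
    k = max(1, math.ceil(len(scores) * ratio))
    return keep_top_k(scores, k)


def keep_by_threshold(scores, min_score):
    """Keep every token whose score is at least `min_score`."""
    return [i for i, s in enumerate(scores) if s >= min_score]


def keep_by_percentile(scores, pct):
    """Keep tokens at or above the pct-th percentile (nearest-rank method)."""
    if not scores:
        return []
    ordered = sorted(scores)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return keep_by_threshold(scores, ordered[rank - 1])


# Hypothetical example: made-up surprise values in bits, one per token.
tokens = ["Please", "could", "you", "summarize", "the", "attached", "report"]
scores = [2.1, 1.0, 0.8, 9.5, 0.3, 6.2, 7.4]

kept = keep_by_ratio(scores, 0.3)
print(" ".join(tokens[i] for i in kept))  # summarize attached report
```

All four helpers return indices rather than tokens, so the surviving tokens keep their original order when the prompt is reassembled.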

Install

pip install tokenpress              # core
pip install "tokenpress[models]"    # + scoring model (distilgpt2)
pip install "tokenpress[all]"       # + OpenAI & Anthropic wrappers

Quick Start

import tokenpress

result = tokenpress.compress("Your very long prompt …", ratio=0.3)
print(result.compressed_text)       # compressed prompt
print(f"{result.compression_ratio:.1f}× smaller, {result.savings_percentage:.0f}% tokens saved")

Drop-in OpenAI wrapper:

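import openai
import tokenpress

very_long_prompt = "Your very long prompt …"  # placeholder text, as in Quick Start

client = tokenpress.wrap(openai.OpenAI(), ratio=0.3)  # wrapper compresses prompts before sending
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": very_long_prompt}],
)
# same API, roughly 3× fewer prompt tokens at ratio=0.3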
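How much a request actually saves depends on the split between prompt and completion tokens, since only the prompt is compressed. The sketch below works through that arithmetic. It assumes `ratio` is the fraction of prompt tokens kept, and the token counts and per-million-token prices are hypothetical placeholders, not real pricing for any provider. `request_cost` is an illustrative helper, not part of TokenPress.

```python
def request_cost(prompt_tokens, completion_tokens,
                 input_price, output_price, keep_ratio=1.0):
    """Estimated request cost in dollars; prices are per million tokens."""
    billed_prompt = prompt_tokens * keep_ratio  # only the prompt shrinks
    total = billed_prompt * input_price + completion_tokens * output_price
    return total / 1_000_000


# Hypothetical workload and prices, chosen only to illustrate the arithmetic.
baseline = request_cost(20_000, 500, input_price=2.50, output_price=10.00)
compressed = request_cost(20_000, 500, input_price=2.50, output_price=10.00,
                          keep_ratio=0.3)

print(f"baseline:   ${baseline:.4f}")
print(f"compressed: ${compressed:.4f}")
print(f"{baseline / compressed:.2f}x cheaper overall")  # 2.75x cheaper overall
```

With these numbers the prompt shrinks about 3.3×, but the total bill drops 2.75× because completion tokens are billed as before. Prompt-heavy workloads get closer to the full prompt compression factor.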

CLI:

tokenpress compress document.txt -r 0.3 -o compressed.txt
tokenpress bench document.txt

How It Works

Each token $x_i$ is scored by its self-information given the preceding tokens $x_{<i}$: $I(x_i) = -\log_2 P(x_i \mid x_{<i})$

High surprise → keep. Low surprise → remove. Pure math, no heuristics.
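To illustrate the scoring formula, here is a toy sketch that computes self-information with a tiny bigram model in pure Python. This is a stand-in, not TokenPress's scorer: a real scoring model (such as the distilgpt2 model from the `models` extra) conditions on the whole prefix $x_{<i}$, while this sketch conditions only on the previous token and uses add-one smoothing. The names `train_bigram` and `self_information` and the corpus are hypothetical.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Collect the counts needed for a smoothed bigram model."""
    return {
        "unigrams": Counter(corpus),
        "contexts": Counter(corpus[:-1]),  # tokens that have a successor
        "bigrams": Counter(zip(corpus, corpus[1:])),
    }


def self_information(tokens, model):
    """Return -log2 P(token | previous token) in bits for each token."""
    vocab_size = len(model["unigrams"]) + 1  # +1 slot for unseen tokens
    total = sum(model["unigrams"].values())
    scores = []
    prev = None
    for tok in tokens:
        if prev is None:
            # No context for the first token: use a smoothed unigram estimate.
            p = (model["unigrams"][tok] + 1) / (total + vocab_size)
        else:
            p = (model["bigrams"][(prev, tok)] + 1) / (
                model["contexts"][prev] + vocab_size
            )
        scores.append(-math.log2(p))
        prev = tok
    return scores


# Toy corpus for illustration only.
corpus = "the cat sat on the mat the cat ate the fish".split()
model = train_bigram(corpus)

prompt = "the cat ate the quantum".split()
for tok, bits in zip(prompt, self_information(prompt, model)):
    print(f"{tok:>8}  {bits:.2f} bits")
```

In this toy run the unseen word "quantum" gets the highest score, while "cat" after "the" scores lower because that pair is common in the corpus. That is the behaviour the keep/remove rule relies on.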

Built on research from Selective Context (EMNLP '23), TRIM (COLING '25), Token Merging (ICLR '23), H2O (NeurIPS '23), and LLMLingua-2 (ACL '24).

Links

  • GitHub — full docs, benchmarks, architecture
  • Benchmarks — 7 reproducible benchmark suites
  • License — Apache-2.0
