UNTOKEN

Token compression for LLM prompts. Reduces prompt length by ~70% while preserving semantic content.

UNTOKEN uses a learned token selector: given a sequence of N tokens, it returns a subsequence of ~0.3N tokens ranked by contextual importance. The model is a fine-tuned DistilBERT encoder with a lightweight importance head trained via an adversarial autoencoder objective.
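As a rough illustration of the selection step (not the library's internals — the function name and scores here are made up), top-k selection by importance with original order preserved looks like:

```python
import math

def select_tokens(tokens, scores, ratio=0.3):
    """Keep the top-`ratio` fraction of tokens by importance score,
    preserving the original token order."""
    k = max(1, math.ceil(ratio * len(tokens)))
    # indices of the k highest-scoring tokens
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(top)]

tokens = ["the", "report", "showed", "a", "strong", "increase", "in", "revenue"]
scores = [0.1, 0.9, 0.6, 0.05, 0.8, 0.85, 0.1, 0.95]
print(select_tokens(tokens, scores, ratio=0.5))
# ['report', 'strong', 'increase', 'revenue']
```

In the real model the scores come from the importance head rather than being hand-assigned, but the top-k-then-reorder step is the same idea.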

Install

pip install untoken

Requires Python 3.10+ and PyTorch 2.1+. Works on CPU and GPU.

Quick Start

from untoken import Untoken

# Load from HuggingFace Hub
ut = Untoken("pacifio/untoken-v1")

text = """
The quarterly earnings report showed a significant increase in revenue,
driven primarily by strong performance in the cloud computing division.
Operating margins improved by 3.2 percentage points year-over-year,
reflecting continued efficiency gains and disciplined cost management
across all business segments. The board approved a share buyback program
worth $2 billion, signaling confidence in the company's long-term outlook.
"""

compressed = ut.compress(text)
print(compressed)
# quarterly earnings report showed significant increase revenue driven
# strong performance cloud computing division operating margins improved
# 3.2 percentage points year-over-year efficiency gains disciplined cost
# management business segments board approved share buyback $2 billion
# confidence company long-term outlook

Return Stats

compressed, stats = ut.compress(text, ratio=0.3, return_stats=True)

print(compressed)
print(stats)
# {
#   "original_tokens": 128,
#   "compressed_tokens": 39,
#   "ratio": 0.305,
#   "savings_pct": 69.5
# }
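The stats fields follow directly from the two token counts; a minimal re-derivation of the example above (helper name is mine, not part of the library):

```python
def compression_stats(original_tokens, compressed_tokens):
    """Reproduce the stats dict from raw token counts."""
    ratio = compressed_tokens / original_tokens
    return {
        "original_tokens": original_tokens,
        "compressed_tokens": compressed_tokens,
        "ratio": round(ratio, 3),
        "savings_pct": round(100 * (1 - ratio), 1),
    }

print(compression_stats(128, 39))
# {'original_tokens': 128, 'compressed_tokens': 39, 'ratio': 0.305, 'savings_pct': 69.5}
```

Note the achieved ratio (0.305) can deviate slightly from the requested one (0.3) because the number of kept tokens must be an integer.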

Adjustable Compression Ratio

The ratio parameter controls the fraction of tokens retained. Lower values compress more aggressively.

# Keep 50% of tokens (lighter compression)
compressed = ut.compress(text, ratio=0.5)

# Keep 20% of tokens (aggressive compression)
compressed = ut.compress(text, ratio=0.2)

No retraining required — the ratio is applied at inference time via top-k selection.

CLI

# Compress a file
untoken --model pacifio/untoken-v1 --input prompt.txt --ratio 0.3

# Output includes compression stats
# [512 → 154 tokens, 69.9% savings]

Long Documents

Documents exceeding 480 tokens are automatically chunked at sentence boundaries and compressed independently. No truncation occurs.
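The chunking behavior can be sketched as greedy sentence packing. This is a simplification: the actual implementation presumably counts model tokens rather than whitespace words, and `chunk_sentences` is a hypothetical name, not a public API:

```python
import re

def chunk_sentences(text, max_tokens=480):
    """Greedily pack whole sentences into chunks of at most `max_tokens`
    whitespace tokens (a rough stand-in for the model tokenizer)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sent in sentences:
        n = len(sent.split())
        if current and count + n > max_tokens:
            # sentence would overflow the chunk: flush and start a new one
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sent)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

text = "First sentence here. Second one follows. Third wraps up."
print(chunk_sentences(text, max_tokens=6))
# ['First sentence here. Second one follows.', 'Third wraps up.']
```

Because sentences are never split, each chunk stays within the encoder's context window and the concatenated output covers the full document.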

with open("long_document.txt") as f:
    text = f.read()

# Works on arbitrarily long inputs
compressed = ut.compress(text, ratio=0.3)

Batch Usage

texts = [doc1, doc2, doc3, ...]

compressed_texts = [ut.compress(t, ratio=0.3) for t in texts]
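To report savings across a whole batch, each call can be made with `return_stats=True` and the per-document counts aggregated afterwards. The aggregation helper below is mine, not part of the library:

```python
def total_savings_pct(stats_list):
    """Aggregate token savings across per-document stats dicts of the
    shape returned by compress(..., return_stats=True)."""
    orig = sum(s["original_tokens"] for s in stats_list)
    comp = sum(s["compressed_tokens"] for s in stats_list)
    return round(100 * (1 - comp / orig), 1)

stats_list = [
    {"original_tokens": 128, "compressed_tokens": 39},
    {"original_tokens": 512, "compressed_tokens": 154},
]
print(total_savings_pct(stats_list))  # 69.8
```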

Evaluation (CNN/DailyMail, n=200, ratio=0.3)

Method            Cosine Sim   ROUGE-L   Compression Ratio
UNTOKEN           0.878        0.459     0.304
Random drop       0.723        0.429     0.303
Stopword removal  0.933        0.824     0.761

UNTOKEN achieves +15.5pp cosine similarity over random token dropping at an equivalent compression ratio. Stopword removal retains 76% of tokens and is not a comparable operating point.
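For reference, "cosine similarity" here is the standard embedding-space measure between the original and compressed text (the embedding model used for the evaluation is not stated). A dependency-free version:

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(round(cosine_sim([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]), 3))  # 0.714
```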

Model

The shipped artifact is a single ~300MB model: a DistilBERT encoder (66M parameters) with a 2-layer MLP importance head. The reconstructor and discriminator used during training are discarded at inference.
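Assuming standard DistilBERT dimensions (hidden size 768), a plausible shape for the 2-layer MLP importance head, written as a PyTorch sketch — the layer widths, activation, and class name are guesses, not the shipped architecture:

```python
import torch
import torch.nn as nn

class ImportanceHead(nn.Module):
    """Hypothetical 2-layer MLP head: maps each encoder hidden state
    to a scalar importance score."""
    def __init__(self, hidden_dim=768, inner_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, inner_dim),
            nn.GELU(),
            nn.Linear(inner_dim, 1),
        )

    def forward(self, hidden_states):
        # hidden_states: (batch, seq, hidden) -> scores: (batch, seq)
        return self.mlp(hidden_states).squeeze(-1)

head = ImportanceHead()
scores = head(torch.randn(2, 16, 768))
print(scores.shape)  # torch.Size([2, 16])
```

At inference only the encoder and this head run; the per-token scores feed the top-k selection, which is why the reconstructor and discriminator can be dropped from the shipped artifact.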

Model on HuggingFace: pacifio/untoken-v1

License

Apache 2.0

