Token compression for LLM prompts
Project description
UNTOKEN
UNTOKEN compresses LLM prompts, reducing prompt length by ~70% while preserving semantic content.
UNTOKEN uses a learned token selector: given a sequence of N tokens, it returns a subsequence of ~0.3N tokens ranked by contextual importance. The model is a fine-tuned DistilBERT encoder with a lightweight importance head trained via an adversarial autoencoder objective.
Install
pip install untoken
Requires Python 3.10+ and PyTorch 2.1+. Works on CPU and GPU.
Quick Start
from untoken import Untoken
# Load from HuggingFace Hub
ut = Untoken("pacifio/untoken-v1")
text = """
The quarterly earnings report showed a significant increase in revenue,
driven primarily by strong performance in the cloud computing division.
Operating margins improved by 3.2 percentage points year-over-year,
reflecting continued efficiency gains and disciplined cost management
across all business segments. The board approved a share buyback program
worth $2 billion, signaling confidence in the company's long-term outlook.
"""
compressed = ut.compress(text)
print(compressed)
# quarterly earnings report showed significant increase revenue driven
# strong performance cloud computing division operating margins improved
# 3.2 percentage points year-over-year efficiency gains disciplined cost
# management business segments board approved share buyback $2 billion
# confidence company long-term outlook
Return Stats
compressed, stats = ut.compress(text, ratio=0.3, return_stats=True)
print(compressed)
print(stats)
# {
# "original_tokens": 128,
# "compressed_tokens": 39,
# "ratio": 0.305,
# "savings_pct": 69.5
# }
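The stats fields follow directly from the token counts. A quick recomputation of the values shown above (illustrative arithmetic, not the library's internals):

original_tokens = 128
compressed_tokens = 39
ratio = round(compressed_tokens / original_tokens, 3)                    # 0.305
savings_pct = round((1 - compressed_tokens / original_tokens) * 100, 1)  # 69.5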
Adjustable Compression Ratio
The ratio parameter controls the fraction of tokens retained. Lower values compress more aggressively.
# Keep 50% of tokens (lighter compression)
compressed = ut.compress(text, ratio=0.5)
# Keep 20% of tokens (aggressive compression)
compressed = ut.compress(text, ratio=0.2)
No retraining required — the ratio is applied at inference time via top-k selection.
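For intuition, here is a minimal sketch of inference-time top-k selection over per-token importance scores. The function and variable names are illustrative only and are not part of the package's API:

import torch

def select_top_k(tokens, importance_scores, ratio):
    # Illustrative: keep the round(ratio * N) highest-scoring tokens,
    # then restore their original order so the output stays readable.
    k = max(1, int(round(ratio * len(tokens))))
    scores = torch.tensor(importance_scores)
    keep = torch.topk(scores, k).indices.sort().values
    return [tokens[i] for i in keep]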
CLI
# Compress a file
untoken --model pacifio/untoken-v1 --input prompt.txt --ratio 0.3
# Output includes compression stats
# [512 → 154 tokens, 69.9% savings]
Long Documents
Documents exceeding 480 tokens are automatically chunked at sentence boundaries and compressed independently. No truncation occurs.
with open("long_document.txt") as f:
text = f.read()
# Works on arbitrarily long inputs
compressed = ut.compress(text, ratio=0.3)
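A rough sketch of the sentence-boundary chunking described above, assuming a 480-token budget per chunk and a HuggingFace-style tokenizer with an encode method. This is an illustration of the idea, not the package's actual chunker:

import re

def chunk_by_sentence(text, tokenizer, max_tokens=480):
    # Illustrative: split on sentence boundaries, then greedily pack
    # sentences into chunks that stay under the token budget.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sent in sentences:
        candidate = " ".join(current + [sent])
        if current and len(tokenizer.encode(candidate)) > max_tokens:
            chunks.append(" ".join(current))
            current = [sent]
        else:
            current.append(sent)
    if current:
        chunks.append(" ".join(current))
    return chunks

Each chunk is then compressed independently and the results are concatenated.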
Batch Usage
texts = [doc1, doc2, doc3, ...]
compressed_texts = [ut.compress(t, ratio=0.3) for t in texts]
Evaluation (CNN/DailyMail, n=200, ratio=0.3)
| Method | Cosine Sim | ROUGE-L | Compression Ratio |
|---|---|---|---|
| UNTOKEN | 0.878 | 0.459 | 0.304 |
| Random drop | 0.723 | 0.429 | 0.303 |
| Stopword removal | 0.933 | 0.824 | 0.761 |
UNTOKEN achieves +15.5pp cosine similarity over random token dropping at an equivalent compression ratio. Stopword removal retains 76% of tokens and is not a comparable operating point.
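For context, the cosine-similarity metric can be reproduced along these lines, assuming a sentence-transformers embedding model (the embedding model used for the reported numbers is not specified here):

from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
orig_emb = embedder.encode(text, convert_to_tensor=True)
comp_emb = embedder.encode(compressed, convert_to_tensor=True)
cosine_sim = util.cos_sim(orig_emb, comp_emb).item()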
Model
The shipped artifact is a single ~300MB model: a DistilBERT encoder (66M parameters) with a 2-layer MLP importance head. The reconstructor and discriminator used during training are discarded at inference.
Model on HuggingFace: pacifio/untoken-v1
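A schematic of the encoder-plus-head architecture described above, in PyTorch. This is a sketch based on the description; actual layer sizes and names may differ:

import torch.nn as nn
from transformers import AutoModel

class ImportanceScorer(nn.Module):
    # Sketch: DistilBERT encoder with a 2-layer MLP that maps each
    # token's hidden state to a scalar importance score.
    def __init__(self, encoder_name="distilbert-base-uncased", hidden=768):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.head = nn.Sequential(
            nn.Linear(hidden, hidden // 2),
            nn.GELU(),
            nn.Linear(hidden // 2, 1),
        )

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        return self.head(hidden).squeeze(-1)  # (batch, seq_len) importance scores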
License
Apache 2.0
Download files
Source Distribution: untoken-0.1.0.tar.gz (11.0 kB)
Built Distribution: untoken-0.1.0-py3-none-any.whl (7.0 kB)
File details
Details for the file untoken-0.1.0.tar.gz.
File metadata
- Download URL: untoken-0.1.0.tar.gz
- Upload date:
- Size: 11.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b0e1bbe99918a5aad5d02f9fdabf7f2c75a9c8968b25caa8207cbd46e843c9cc |
| MD5 | 8ebc1e5b048082b801c5a3a533c4a671 |
| BLAKE2b-256 | f3bf9fb95f30227e396232e00e8708aacad39f1073d9da17a0aeb31ea110886a |
File details
Details for the file untoken-0.1.0-py3-none-any.whl.
File metadata
- Download URL: untoken-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5bf3ae1ce22eff12dd6ccb07f2c0d2c11757f027d82d34debc5342c300b94305 |
| MD5 | 6aa57cb5feb6904aba10089f23a4d6da |
| BLAKE2b-256 | d89a5fe8842417696891d696cc05948b225a4a3e26f84a69d91590d031ddc8ae |