UNTOKEN

Token compression for LLM prompts. Reduces prompt length by ~70% while preserving semantic content.

UNTOKEN uses a learned token selector: given a sequence of N tokens, it returns a subsequence of ~0.3N tokens ranked by contextual importance. The model is a fine-tuned DistilBERT encoder with a lightweight importance head trained via an adversarial autoencoder objective.
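As a rough illustration of the selection step (not the library's internals — the function name and scores here are made up), top-k selection by importance with original order preserved looks like:

```python
import math

def select_tokens(tokens, scores, ratio=0.3):
    """Keep the top-`ratio` fraction of tokens by importance score,
    preserving the original token order."""
    k = max(1, math.ceil(ratio * len(tokens)))
    # indices of the k highest-scoring tokens
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(top)]

tokens = ["the", "report", "showed", "a", "strong", "increase", "in", "revenue"]
scores = [0.1, 0.9, 0.6, 0.05, 0.8, 0.85, 0.1, 0.95]
print(select_tokens(tokens, scores, ratio=0.5))
# ['report', 'strong', 'increase', 'revenue']
```

In the real model the scores come from the importance head rather than being hand-assigned, but the top-k-then-reorder step is the same idea.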

Install

pip install untoken

Requires Python 3.10+ and PyTorch 2.1+. Works on CPU and GPU.

Quick Start

from untoken import Untoken

# Load from HuggingFace Hub
ut = Untoken("pacifio/untoken-v1")

text = """
The quarterly earnings report showed a significant increase in revenue,
driven primarily by strong performance in the cloud computing division.
Operating margins improved by 3.2 percentage points year-over-year,
reflecting continued efficiency gains and disciplined cost management
across all business segments. The board approved a share buyback program
worth $2 billion, signaling confidence in the company's long-term outlook.
"""

compressed = ut.compress(text)
print(compressed)
# quarterly earnings report showed significant increase revenue driven
# strong performance cloud computing division operating margins improved
# 3.2 percentage points year-over-year efficiency gains disciplined cost
# management business segments board approved share buyback $2 billion
# confidence company long-term outlook

Return Stats

compressed, stats = ut.compress(text, ratio=0.3, return_stats=True)

print(compressed)
print(stats)
# {
#   "original_tokens": 128,
#   "compressed_tokens": 39,
#   "ratio": 0.305,
#   "savings_pct": 69.5
# }
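The stats fields follow directly from the two token counts; a minimal re-derivation of the example above (helper name is mine, not part of the library):

```python
def compression_stats(original_tokens, compressed_tokens):
    """Reproduce the stats dict from raw token counts."""
    ratio = compressed_tokens / original_tokens
    return {
        "original_tokens": original_tokens,
        "compressed_tokens": compressed_tokens,
        "ratio": round(ratio, 3),
        "savings_pct": round(100 * (1 - ratio), 1),
    }

print(compression_stats(128, 39))
# {'original_tokens': 128, 'compressed_tokens': 39, 'ratio': 0.305, 'savings_pct': 69.5}
```

Note the achieved ratio (0.305) can deviate slightly from the requested one (0.3) because the number of kept tokens must be an integer.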

Adjustable Compression Ratio

The ratio parameter controls the fraction of tokens retained. Lower values compress more aggressively.

# Keep 50% of tokens (lighter compression)
compressed = ut.compress(text, ratio=0.5)

# Keep 20% of tokens (aggressive compression)
compressed = ut.compress(text, ratio=0.2)

No retraining required — the ratio is applied at inference time via top-k selection.

CLI

# Compress a file
untoken --model pacifio/untoken-v1 --input prompt.txt --ratio 0.3

# Output includes compression stats
# [512 → 154 tokens, 69.9% savings]

Long Documents

Documents exceeding 480 tokens are automatically chunked at sentence boundaries and compressed independently. No truncation occurs.
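The chunking behavior can be sketched as greedy sentence packing. This is a simplification: the actual implementation presumably counts model tokens rather than whitespace words, and `chunk_sentences` is a hypothetical name, not a public API:

```python
import re

def chunk_sentences(text, max_tokens=480):
    """Greedily pack whole sentences into chunks of at most `max_tokens`
    whitespace tokens (a rough stand-in for the model tokenizer)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sent in sentences:
        n = len(sent.split())
        if current and count + n > max_tokens:
            # sentence would overflow the chunk: flush and start a new one
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sent)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

text = "First sentence here. Second one follows. Third wraps up."
print(chunk_sentences(text, max_tokens=6))
# ['First sentence here. Second one follows.', 'Third wraps up.']
```

Because sentences are never split, each chunk stays within the encoder's context window and the concatenated output covers the full document.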

with open("long_document.txt") as f:
    text = f.read()

# Works on arbitrarily long inputs
compressed = ut.compress(text, ratio=0.3)

Batch Usage

texts = [doc1, doc2, doc3, ...]

compressed_texts = [ut.compress(t, ratio=0.3) for t in texts]
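To report savings across a whole batch, each call can be made with `return_stats=True` and the per-document counts aggregated afterwards. The aggregation helper below is mine, not part of the library:

```python
def total_savings_pct(stats_list):
    """Aggregate token savings across per-document stats dicts of the
    shape returned by compress(..., return_stats=True)."""
    orig = sum(s["original_tokens"] for s in stats_list)
    comp = sum(s["compressed_tokens"] for s in stats_list)
    return round(100 * (1 - comp / orig), 1)

stats_list = [
    {"original_tokens": 128, "compressed_tokens": 39},
    {"original_tokens": 512, "compressed_tokens": 154},
]
print(total_savings_pct(stats_list))  # 69.8
```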

Evaluation (CNN/DailyMail, n=200, ratio=0.3)

Method            Cosine Sim   ROUGE-L   Compression Ratio
UNTOKEN           0.878        0.459     0.304
Random drop       0.723        0.429     0.303
Stopword removal  0.933        0.824     0.761

UNTOKEN achieves +15.5pp cosine similarity over random token dropping at an equivalent compression ratio. Stopword removal retains 76% of tokens and is not a comparable operating point.
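For reference, "cosine similarity" here is the standard embedding-space measure between the original and compressed text (the embedding model used for the evaluation is not stated). A dependency-free version:

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(round(cosine_sim([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]), 3))  # 0.714
```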

Model

The shipped artifact is a single ~300MB model: a DistilBERT encoder (66M parameters) with a 2-layer MLP importance head. The reconstructor and discriminator used during training are discarded at inference.
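Assuming standard DistilBERT dimensions (hidden size 768), a plausible shape for the 2-layer MLP importance head, written as a PyTorch sketch — the layer widths, activation, and class name are guesses, not the shipped architecture:

```python
import torch
import torch.nn as nn

class ImportanceHead(nn.Module):
    """Hypothetical 2-layer MLP head: maps each encoder hidden state
    to a scalar importance score."""
    def __init__(self, hidden_dim=768, inner_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, inner_dim),
            nn.GELU(),
            nn.Linear(inner_dim, 1),
        )

    def forward(self, hidden_states):
        # hidden_states: (batch, seq, hidden) -> scores: (batch, seq)
        return self.mlp(hidden_states).squeeze(-1)

head = ImportanceHead()
scores = head(torch.randn(2, 16, 768))
print(scores.shape)  # torch.Size([2, 16])
```

At inference only the encoder and this head run; the per-token scores feed the top-k selection, which is why the reconstructor and discriminator can be dropped from the shipped artifact.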

Model on HuggingFace: pacifio/untoken-v1

License

Apache 2.0

