
tokenpruner

Slash LLM input tokens by 70–80% without losing meaning.

tokenpruner compresses prompts, code context, and multi-turn conversations before they are sent to Claude, GPT-4, or any LLM — reducing cost and latency with zero external model dependencies.

from tokenpruner import TextPruner, PruningConfig, PruningStrategy

pruner = TextPruner(PruningConfig(strategy=PruningStrategy.COMPOSITE))
result = pruner.prune(my_12000_token_prompt)
print(result)
# PruningResult(original=12400tok, pruned=2700tok, saved=78%)

Why tokenpruner?

| Pain point | tokenpruner solution |
|---|---|
| Long codebase context sent to Claude | Code minification strips comments + whitespace (40–60% reduction) |
| Repeated boilerplate in system prompts | Template stripping removes redundant instructions |
| Near-duplicate RAG chunks | Jaccard-based deduplication before embedding |
| Long conversation history | Smart sliding-window + semantic compression of older turns |
| Uncontrolled token spend | Rate limiter + circuit breaker protect every API call |
| Batch processing at scale | Async batch with bounded concurrency and per-item error isolation |

Installation

pip install tokenpruner

# Optional: exact token counts via tiktoken
pip install "tokenpruner[tiktoken]"

Quick start

Compress a single prompt

from tokenpruner import TextPruner, PruningConfig, PruningStrategy

# COMPOSITE runs: template_strip → code_minify → dedup → semantic → sliding_window
pruner = TextPruner(PruningConfig(
    strategy=PruningStrategy.COMPOSITE,
    target_ratio=0.25,   # keep 25% of tokens = 75% reduction
))

result = pruner.prune(long_prompt)
print(f"Saved {result.reduction_ratio:.0%}{result.tokens_saved} tokens")

Prune a full conversation

from tokenpruner import ConversationPruner, Message, PruningConfig

msgs = [Message(role=m["role"], content=m["content"]) for m in conversation]

pruner = ConversationPruner(
    PruningConfig(max_tokens=8000),
    keep_recent_turns=3,  # last 3 exchanges verbatim
)
result = pruner.prune(msgs)
pruned_dicts = [{"role": m.role, "content": m.content} for m in result.pruned_messages]

Drop-in Claude adapter

import anthropic
from tokenpruner import PruningConfig, PruningStrategy
from tokenpruner.adapters.claude import ClaudeAdapter

client = anthropic.Anthropic()
adapter = ClaudeAdapter(client, config=PruningConfig(max_tokens=8000))

response, meta = adapter.messages_create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": giant_codebase_dump}],
    system="You are a code reviewer.",
)
print(f"Tokens saved: {meta['messages_reduction']:.0%}")

Drop-in OpenAI adapter

import openai
from tokenpruner import PruningConfig
from tokenpruner.adapters.openai import OpenAIAdapter

client = openai.OpenAI()
adapter = OpenAIAdapter(client, config=PruningConfig(max_tokens=6000))

response, meta = adapter.chat_completions_create(
    model="gpt-4o",
    messages=[{"role": "user", "content": long_context}],
)

Strategies

| Strategy | Best for | Typical reduction |
|---|---|---|
| COMPOSITE | General prompts, mixed content | 60–80% |
| CODE_MINIFY | Code files, diffs | 40–60% |
| SEMANTIC | Long documents, RAG chunks | 40–70% |
| DEDUP | Repeated passages, few-shot examples | 30–70% |
| TEMPLATE_STRIP | System prompts with boilerplate | 20–40% |
| SLIDING_WINDOW | Long conversation history | Configurable |
| TRUNCATE | Hard budget enforcement | Configurable |
| SUMMARY | Semantic + dedup combined | 50–75% |
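
The strategy is just a field on PruningConfig, so different content types can get different pruners. A minimal sketch using the API shown above (the input variables are illustrative placeholders):

from tokenpruner import TextPruner, PruningConfig, PruningStrategy

# Minify code context so identifiers survive intact
code_pruner = TextPruner(PruningConfig(strategy=PruningStrategy.CODE_MINIFY))

# Score prose semantically so high-value sentences are kept
doc_pruner = TextPruner(PruningConfig(strategy=PruningStrategy.SEMANTIC))

code_result = code_pruner.prune(source_file_text)   # e.g. a large diff
doc_result = doc_pruner.prune(rag_chunk_text)       # e.g. a retrieved document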

Advanced features

Smart Cache (LRU + TTL)

from tokenpruner import SmartCache, TextPruner

cache: SmartCache = SmartCache(maxsize=256, ttl=300)
pruner = TextPruner()

@cache.memoize
def cached_prune(text: str):
    return pruner.prune(text)

cached_prune(prompt)   # computes
cached_prune(prompt)   # cache hit — free
print(cache.stats())   # {'hits': 1, 'misses': 1, 'hit_rate': 0.5, ...}

Pipeline with audit log and retry

from tokenpruner import PruningPipeline, PruningConfig, PruningStrategy, TextPruner

def make_pruner(strategy):
    # Each pipeline step maps text in -> pruned text out for one strategy
    return lambda text: TextPruner(PruningConfig(strategy=strategy)).prune(text).pruned_text

pipeline = (
    PruningPipeline()
    .add_step("dedup", make_pruner(PruningStrategy.DEDUP))
    .add_step("semantic", make_pruner(PruningStrategy.SEMANTIC))
    .with_retry(n=2, backoff=0.5)
)

result, audit = pipeline.run(long_text)
print(audit)  # per-step token counts, duration, errors

# Async
result, audit = await pipeline.arun(long_text)

Declarative validator

from tokenpruner import PruningValidator

validator = (
    PruningValidator()
    .require_min_length(10)
    .require_max_tokens(100_000)
    .require_no_secrets()
    .add_rule("no_pii", lambda t: "@" not in t, "Email address detected")
)

errors = validator.validate(prompt)   # {} = valid
await validator.avalidate(prompt)     # async version
validator.validate_or_raise(prompt)   # raises ValidationError if invalid

Async batch processing

from tokenpruner import async_batch, batch, PruningConfig

# Sync
results = batch(list_of_1000_prompts, concurrency=16)

# Async
results = await async_batch(list_of_1000_prompts, concurrency=32)

# Per-item errors are isolated — one bad item doesn't abort the batch
for r in results:
    if isinstance(r, Exception):
        print("Item failed:", r)
    else:
        print(f"Saved {r.reduction_ratio:.0%}")

Rate limiter

from tokenpruner import RateLimiter, get_rate_limiter

# Global limiter
limiter = RateLimiter(rate=10, capacity=10)
with limiter:
    result = pruner.prune(text)

# In async code:
async with limiter:
    result = pruner.prune(text)

# Per-key limiting (e.g. per user/API key)
limiter = get_rate_limiter("user-abc", rate=5, capacity=5)
print(limiter.stats())
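
The rate and capacity parameters read like token-bucket semantics: capacity bounds the burst, rate sets the steady refill. As an assumption about the model (not tokenpruner's internals), here is a minimal sketch of that idea:

import time

class TokenBucket:
    # Illustrative only: `rate` tokens/sec refill, `capacity` caps the burst
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def acquire(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False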

Streaming for large documents

from tokenpruner import stream_prune, async_stream_prune

# Sync streaming
for chunk_result in stream_prune(huge_document, chunk_size=2000):
    send_to_llm(chunk_result.pruned_text)

# Async streaming (cancellation-safe)
async for chunk_result in async_stream_prune(huge_document, chunk_size=2000):
    await send_to_llm(chunk_result.pruned_text)

Diff engine

from tokenpruner import diff_prune

diff = diff_prune(original_prompt)
print(diff.summary())
# Pruning Summary
#   Original : 12,400 tokens, 312 lines
#   Pruned   : 2,730 tokens, 68 lines
#   Removed  : 9,670 tokens (78.0%)
#   ...

data = diff.to_json()   # machine-readable dict

Circuit breaker

from tokenpruner import CircuitBreaker

cb = CircuitBreaker(failure_threshold=5, reset_timeout=30)

@cb.protect
def call_llm_api(prompt: str) -> str:
    ...  # your API call

# Or inline
result = cb.call(pruner.prune, long_text)

print(cb.stats())
# {'state': 'closed', 'failures': 0, 'total_calls': 42, 'rejected_calls': 0}

Compression techniques

tokenpruner applies these evidence-based techniques:

  1. Template stripping — removes boilerplate like "You are a helpful assistant" and empty XML tags (20–40%)
  2. Code minification — strips comments, normalises whitespace (40–60% on code)
  3. Jaccard deduplication — removes near-duplicate sentences (30–70%; sketched below)
  4. Heuristic semantic scoring — keeps high-value sentences (keyword density, position, structure)
  5. Sliding window — retains a prefix anchor + most-recent suffix
  6. Hard truncation — deterministic budget enforcement
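
For intuition, Jaccard similarity over word sets is simple to compute. A minimal sketch of the dedup idea under that definition (not the library's implementation):

def jaccard(a: str, b: str) -> float:
    # |A ∩ B| / |A ∪ B| over lowercased word sets
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def dedup_sentences(sentences: list[str], threshold: float = 0.8) -> list[str]:
    # Keep a sentence only if it is not a near-duplicate of one already kept
    kept: list[str] = []
    for s in sentences:
        if all(jaccard(s, k) < threshold for k in kept):
            kept.append(s)
    return kept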

Optional: exact token counting

from tokenpruner.utils.tokenizers import count_tokens_exact

# Uses tiktoken if installed, falls back to fast estimate
n = count_tokens_exact("hello world", model="cl100k_base")
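
The shape of the fallback is not documented here; a common heuristic is roughly four characters per token for English text. An illustrative version of that pattern (an assumption, not tokenpruner's code):

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for typical English text
    return max(1, len(text) // 4)

try:
    import tiktoken
    n = len(tiktoken.get_encoding("cl100k_base").encode("hello world"))
except ImportError:
    n = estimate_tokens("hello world")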

License

MIT
