Reduce LLM token usage with token counting, prompt compression, caching, and context filtering.
Project description
contpress
A practical Python toolkit for making every LLM token count.
contpress combines:
- Token counting and trimming with model-aware encodings.
- Token budget enforcement for input, output reserve, system prompts, tools, RAG context, and history.
- Compact prompt building for consistent, low-waste prompt blocks.
- Dependency-free extractive compression for safe first-pass prompt reduction.
- RAG context filtering with keyword and sentence relevance modes.
- Compact JSON, CSV, and table formatting for reducing structured-data tokens.
- Conversation memory pruning that keeps system prompts, recent messages, decisions, constraints, and relevant context.
- Output contract generation for concise response schemas.
- Prompt cache-aware formatting to keep stable prompt blocks grouped.
- Prompt and response caching surfaces including exact cache and optional semantic cache support.
- Usage reports with original tokens, optimized tokens, saved tokens, ratios, and methods.
- Optional dependencies so the base package stays lightweight.
Why Preflight Optimization for LLM Prompts?
LLM prompts often grow through repeated instructions, irrelevant retrieved chunks, verbose JSON, oversized chat history, and unbounded output requests:
Task, instructions, context, tools, and history
-> Count tokens against the target model
-> Reserve output budget
-> Format prompt blocks compactly
-> Filter retrieved context
-> Compress or trim only when needed
-> Return optimized text
-> Report savings and methods
-> Feed the result to any LLM client
contpress is designed to reduce token usage before a request is sent:
- Oversized prompts from untrimmed documents, code, logs, and retrieved chunks.
- Messy repeated instructions that waste tokens and reduce prompt clarity.
- Verbose structured data where compact JSON or tables are enough.
- RAG context bloat from chunks that are only loosely related to the query.
- Long conversation histories with filler, confirmations, and stale context.
- Unclear output budgets where responses are allowed to grow without a contract.
Architecture
Prompt inputs
- task
- instructions
- context
- conversation history
- output contract
|
v
Preflight optimization
- token counting
- budget enforcement
- compact prompt layout
- extractive compression
- RAG context filtering
- compact JSON / CSV / table formatting
- memory pruning
|
v
OptimizedPrompt
- text
- report dict
- original token count
- optimized token count
- saved tokens
- methods used
Install
pip install contpress
For LLMLingua prompt compression:
pip install "contpress[compress]"
For semantic cache support:
pip install "contpress[semantic]"
For RAG ecosystem integrations:
pip install "contpress[rag]"
For all optional integrations:
pip install "contpress[all]"
For development:
pip install -e ".[dev,all]"
pytest -q
python -m build
Quick Start
Optimize a Prompt
from contpress import ContextPress
cp = ContextPress(
model="gpt-4o-mini",
max_input_tokens=4000,
max_output_tokens=500,
)
optimized = cp.optimize(
task="Answer the user's question using the provided context.",
context=long_context,
instructions=[
"Be concise.",
"Use only relevant facts.",
"Return risks if uncertain.",
],
)
print(optimized.text)
print(optimized.report)
Token Counting
from contpress import TokenCounter
counter = TokenCounter(model="gpt-4o-mini")
print(counter.count("hello world"))
print(counter.fits("long text", budget=8000))
print(counter.trim("long text", max_tokens=1000))
Usage Report
from contpress import UsageReport
report = UsageReport(
model="gpt-4o-mini",
input_tokens_before=10200,
input_tokens_after=3400,
output_tokens_limit=500,
methods=["sentence_filter", "compact_json", "trim"],
)
print(report.summary())
CLI
Count tokens in a file:
contpress count README.md --model gpt-4o-mini
Trim a file to a maximum token count:
contpress trim prompt.txt --max-tokens 2000
Compress a prompt:
contpress compress prompt.txt --target-tokens 1000
Compact JSON:
contpress compact data.json
Generate a budget report:
contpress report prompt.txt --budget 8000
Main Features
1. Token Counting
Count, fit-check, and trim text using the target model encoding:
from contpress import TokenCounter
counter = TokenCounter(model="gpt-4o-mini")
tokens = counter.count(prompt)
2. Budget Enforcement
Reserve output tokens and account for system prompt or tool schema overhead:
from contpress import TokenBudget
budget = TokenBudget(
model="gpt-4o-mini",
max_input_tokens=8000,
reserve_output_tokens=1000,
system_prompt="You are concise.",
)
print(budget.input_budget)
3. Compact Prompt Builder
Build repeatable prompt blocks without verbose formatting:
from contpress import PromptBuilder
prompt = (
PromptBuilder()
.role("senior Python engineer")
.task("Refactor this code")
.constraints(["Preserve behaviour", "No new dependencies", "Keep diff small"])
.context(code)
.output(["patch", "risk notes", "test plan"])
.build()
)
4. Compact Structured Data
Reduce JSON and tabular context before sending it to an LLM:
from contpress import compact_json, compact_table, drop_nulls, shorten_keys
payload = drop_nulls(data)
payload = shorten_keys(payload, {"description": "d", "priority": "p"})
text = compact_json(payload)
5. Extractive Compression
Dependency-free compression keeps query-relevant sentences and preserves useful signals such as numbers, URLs, headings, code identifiers, and requirements:
from contpress import ExtractiveCompressor
short = ExtractiveCompressor().compress(
text=long_context,
query="How do I reduce LLM token usage?",
max_tokens=1200,
)
6. LLMLingua Compression
Use Microsoft LLMLingua when you install the compression extra:
from contpress.compressors import LLMLinguaCompressor
compressed = LLMLinguaCompressor().compress(
prompt=long_prompt,
instruction="Preserve code, numbers, entities, requirements, and constraints.",
target_tokens=1000,
)
Prompt compression can harm exact reasoning, code, legal wording, medical text, or maths. It is not always a free speedup; preprocessing overhead can outweigh gains for shorter prompts or mismatched model and hardware conditions.
7. RAG Context Filtering
Filter retrieved chunks before building the final prompt:
from contpress import ContextFilter
filtered = ContextFilter(model="gpt-4o-mini").filter(
query=user_question,
chunks=retrieved_chunks,
max_tokens=2500,
)
8. Conversation Memory Pruning
Keep system prompts, recent messages, relevant history, constraints, decisions, preferences, and file names:
from contpress import ConversationPruner
messages = ConversationPruner().prune(
messages=chat_history,
current_query="What changed in the latest code?",
max_tokens=3000,
)
9. Output Contracts
Generate compact response contracts:
from contpress import OutputContract
contract = OutputContract(
fields={"summary": "one sentence", "risks": "short list"},
).prompt()
10. Prompt Cache Layout
Group stable and volatile blocks to improve prompt-cache friendliness:
from contpress import PromptCacheLayout
prompt = (
PromptCacheLayout()
.stable("System", "You are a concise assistant.")
.stable("Rules", "Use only provided context.")
.volatile("User", user_question)
.build()
)
11. Tool and Agent Trace Compaction
Compact tool schemas and agent traces before placing them in context:
from contpress import AgentTraceCompactor, ToolSchemaCompactor
compact_schema = ToolSchemaCompactor(drop_descriptions=True).compact(tool_schema)
compact_trace = AgentTraceCompactor().compact(events)
Configuration
Tune prompt budgets with TokenBudget:
from contpress import TokenBudget
budget = TokenBudget(
model="gpt-4o-mini",
max_input_tokens=8000,
reserve_output_tokens=1000,
system_prompt="You are concise.",
tool_schema=compact_schema,
rag_context_ratio=0.6,
history_ratio=0.3,
)
Tune optimization with ContextPress:
from contpress import ContextPress
cp = ContextPress(
model="gpt-4o-mini",
max_input_tokens=6000,
reserve_output_tokens=800,
compression="extractive",
)
Examples
from contpress import ContextPress
cp = ContextPress(
model="gpt-4o-mini",
max_input_tokens=6000,
reserve_output_tokens=800,
)
optimized = cp.optimize(
task="Summarise the key issues in this codebase.",
context=repo_summary,
instructions=[
"Focus on bugs, security, maintainability, and performance.",
"Do not repeat obvious file names.",
"Return concise bullet points.",
],
)
print(optimized.report)
contpress count README.md
contpress report prompt.txt --budget 8000
Project Structure
src/contextpress/
__init__.py # Public API
core.py # ContextPress and OptimizedPrompt
tokenizer.py # TokenCounter
budgets.py # TokenBudget
builder.py # PromptBuilder
formatters.py # Compact JSON, CSV, and table helpers
reports.py # UsageReport
contracts.py # OutputContract
prompt_cache.py # PromptCacheLayout
tools.py # ToolSchemaCompactor and AgentTraceCompactor
cli.py # Command-line interface
py.typed # Typing marker
compressors/ # Extractive, sentence, LLMLingua, reports, diffs
rag/ # Chunking, reranking, context filtering
cache/ # Exact cache, semantic cache surface, stores
memory/ # Conversation pruning and summarization
tests/
test_*.py # Unit tests
.github/
workflows/
ci.yml # Tests and package build
publish.yml # PyPI publishing workflow
pyproject.toml # Project metadata and dependencies
contpress.png # Project logo
Development
# Install with dev extras
pip install -e ".[dev,all]"
# Run tests
pytest -q
# Build package
python -m build
Publishing
GitHub Actions includes:
CI: runs tests and builds the package on pushes and pull requests.Publish to PyPI: builds, checks, and publishes distributions when av*tag is pushed.
The publish workflow uses PyPI trusted publishing. Configure the PyPI project
with this GitHub repository, the pypi environment, and the
.github/workflows/publish.yml workflow before pushing a version tag.
Trusted publishing settings on PyPI must match:
- PyPI project name:
contpress - GitHub owner:
Arkay92 - GitHub repository:
ContextPress - Workflow name:
publish.yml - Environment name:
pypi
If publishing fails with 403 Invalid API Token: OIDC scoped token is not valid for project 'contpress', the workflow ran correctly but PyPI did not accept
the trusted publisher for that project. Delete and recreate the trusted publisher
on PyPI with the exact values above, including the pypi environment, then push
a new version tag such as v0.1.5.
License
MIT
Contributing
Contributions are welcome. Open an issue with the model, prompt shape, expected budget, and the optimization behavior you expected.
Citation
If you use contpress in research, please cite:
@software{contextpress2026,
title={contpress: A Practical Python Toolkit for Making Every LLM Token Count},
author={Arkay92},
url={https://github.com/Arkay92/ContextPress},
year={2026},
version={v0.1.5},
}
Acknowledgments
- tiktoken for fast model-aware tokenization.
- LLMLingua for optional prompt compression.
- LangChain and LlamaIndex for RAG compression patterns.
- FAISS and sentence-transformers for semantic cache building blocks.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file contpress-0.1.5.tar.gz.
File metadata
- Download URL: contpress-0.1.5.tar.gz
- Upload date:
- Size: 768.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fbbef65b98314de036661cfe038fbe9e7074d25e52be6635b7c7b03999a6f613
|
|
| MD5 |
cc1b4cb15573644b09462336a5614e86
|
|
| BLAKE2b-256 |
b1fe518e0a7446a639aaebd699446f9497bfcaa855a211e7eec3e4c30e58ed7c
|
Provenance
The following attestation bundles were made for contpress-0.1.5.tar.gz:
Publisher:
publish.yml on Arkay92/ContextPress
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
contpress-0.1.5.tar.gz -
Subject digest:
fbbef65b98314de036661cfe038fbe9e7074d25e52be6635b7c7b03999a6f613 - Sigstore transparency entry: 1814337098
- Sigstore integration time:
-
Permalink:
Arkay92/ContextPress@b74ce9373f937e299b423975f797891616fb3768 -
Branch / Tag:
refs/tags/v0.1.5 - Owner: https://github.com/Arkay92
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@b74ce9373f937e299b423975f797891616fb3768 -
Trigger Event:
push
-
Statement type:
File details
Details for the file contpress-0.1.5-py3-none-any.whl.
File metadata
- Download URL: contpress-0.1.5-py3-none-any.whl
- Upload date:
- Size: 24.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
144fbc3859a65ea1f89b595289d1a5ed237c61ad9e14bb41c689474c6e2adc5e
|
|
| MD5 |
d0a8d3c8b0eb4e76312f0607be026da7
|
|
| BLAKE2b-256 |
dfa84ee9f9d441156c95ca96e444283197a4b65f577cc2ba92067b58dfdc180b
|
Provenance
The following attestation bundles were made for contpress-0.1.5-py3-none-any.whl:
Publisher:
publish.yml on Arkay92/ContextPress
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
contpress-0.1.5-py3-none-any.whl -
Subject digest:
144fbc3859a65ea1f89b595289d1a5ed237c61ad9e14bb41c689474c6e2adc5e - Sigstore transparency entry: 1814337348
- Sigstore integration time:
-
Permalink:
Arkay92/ContextPress@b74ce9373f937e299b423975f797891616fb3768 -
Branch / Tag:
refs/tags/v0.1.5 - Owner: https://github.com/Arkay92
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@b74ce9373f937e299b423975f797891616fb3768 -
Trigger Event:
push
-
Statement type: