Python SDK for Compresr - Intelligent prompt compression service
Compresr Python SDK
Query-aware LLM context compression — reduce LLM API costs by 30-70%.
Install
pip install compresr
Get an API key at compresr.ai → Dashboard → API Keys.
Quick start
from compresr import CompressionClient
client = CompressionClient(api_key="cmp_your_api_key")
result = client.compress(
    context="Long passage to compress...",
    query="What is the main conclusion?",
    target_compression_ratio=0.5,
)
print(f"Original: {result.data.original_tokens} tokens")
print(f"Compressed: {result.data.compressed_tokens} tokens")
print(f"Saved: {result.data.tokens_saved} tokens")
print(result.data.compressed_context)
The default model is latte_v1 (query-aware). To use another model your
account has access to, pass compression_model_name="..."; the backend
validates the name.
Batch
Compress up to 100 contexts in one call. Pass a single query (applied to all contexts) or a list with one query per context:
batch = client.compress_batch(
    contexts=["Doc 1...", "Doc 2...", "Doc 3..."],
    queries="What is self-attention?",
    target_compression_ratio=0.5,
)
print(f"Total saved: {batch.data.total_tokens_saved} tokens")
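The 100-context limit above means a longer list has to be split on the client side. A minimal sketch of that split follows, assuming the limit applies per call and that a per-context query list must be sliced in lockstep with the contexts. `iter_batches` and `MAX_BATCH` are hypothetical names for illustration, not part of the SDK.

```python
from typing import Iterator, List, Sequence, Tuple, Union

MAX_BATCH = 100  # limit stated in the text above; assumed to apply per call


def iter_batches(
    contexts: Sequence[str],
    queries: Union[str, Sequence[str]],
    size: int = MAX_BATCH,
) -> Iterator[Tuple[List[str], Union[str, List[str]]]]:
    """Yield (contexts, queries) pairs with at most `size` contexts each.

    A single string query is reused for every batch; a list of queries
    must have one entry per context and is sliced alongside the contexts.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    per_context = not isinstance(queries, str)
    if per_context and len(queries) != len(contexts):
        raise ValueError("need exactly one query per context")
    for start in range(0, len(contexts), size):
        stop = start + size
        chunk = list(contexts[start:stop])
        yield chunk, (list(queries[start:stop]) if per_context else queries)
```

Each yielded pair can then be passed as contexts= and queries= to client.compress_batch(...).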
Async + streaming
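# Inside a coroutine (or a notebook cell that allows top-level await):
result = await client.compress_async(context="...", query="...")

# Streaming yields chunks as they arrive:
for chunk in client.compress_stream(context="...", query="..."):
    print(chunk.content, end="")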
LLM-agnostic agent client
One CompressionClient, three provider-shape facades, one engine. Construct
the client with llm= and you get an agent surface where every tool output
is compressed automatically before the LLM sees it.
import os
from compresr import CompressionClient, WebSearchTool
client = CompressionClient(
    api_key=os.environ["COMPRESR_API_KEY"],
    llm="anthropic",  # or "openai", "google_genai"
    llm_api_key=os.environ["ANTHROPIC_API_KEY"],
    compression={"target_compression_ratio": 0.5, "min_tokens": 300},
)
Use llm="anthropic:claude-haiku-4-5" if you want a default — but the
call-site model= always wins.
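The precedence rule just described can be sketched as follows. This assumes the llm= value is either "provider" or "provider:model"; `split_llm_spec` and `resolve_model` are hypothetical helpers that only illustrate the rule, not the SDK's own parsing.

```python
from typing import Optional, Tuple


def split_llm_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split "provider" or "provider:model" into (provider, default_model)."""
    provider, sep, model = spec.partition(":")
    return provider, (model if sep and model else None)


def resolve_model(spec: str, call_site_model: Optional[str] = None) -> str:
    """Return the model to use: the call-site value wins over the default."""
    _, default_model = split_llm_spec(spec)
    model = call_site_model or default_model
    if model is None:
        raise ValueError("no model at the call site and no default in llm=")
    return model
```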
Three equivalent surfaces sit on the same client — the model lives at the call site, just like Anthropic's and OpenAI's own SDKs:
# Anthropic shape
client.messages.create(model="claude-haiku-4-5", max_tokens=512,
    messages=[...], tools=[...])
# OpenAI shape
client.chat.completions.create(model="gpt-4o-mini", messages=[...], tools=[...])
# Native — returns a NormalizedResult
client.run(prompt="...", model="claude-haiku-4-5", tools=[...], max_tokens=512)
Behind all three sits LangChain 1.0's create_agent + CompresrToolMiddleware.
Tool outputs above min_tokens flow through client.compress(...) first.
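The min_tokens gate described above can be pictured with a small sketch. It assumes a crude whitespace token count and takes the compressor as a plain callable, so it runs without the SDK. `compress_if_large` and `approx_tokens` are hypothetical names, and the real middleware may count tokens differently.

```python
from typing import Callable


def approx_tokens(text: str) -> int:
    """Crude whitespace token count; a real tokenizer would differ."""
    return len(text.split())


def compress_if_large(
    tool_output: str,
    min_tokens: int,
    compress: Callable[[str], str],
    count_tokens: Callable[[str], int] = approx_tokens,
) -> str:
    """Return the output unchanged at or below the threshold, else compressed."""
    if count_tokens(tool_output) <= min_tokens:
        return tool_output  # small outputs skip the compression round trip
    return compress(tool_output)
```

Skipping short outputs avoids paying a compression call for text that is already cheap to send to the LLM.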
Built-in web search
search = WebSearchTool.tavily(
    api_key=os.environ["TAVILY_API_KEY"],
    max_results=5,
    allowed_domains=["nytimes.com", "reuters.com"],  # optional
)
# Brave: WebSearchTool.brave(api_key=..., max_results=5)
Bring your own tool
Any @tool-decorated function works — its string output is compressed for you:
from langchain_core.tools import tool

# INTERNAL_KB is assumed to be a dict mapping topic -> policy text, defined elsewhere.
@tool
def kb_lookup(topic: str) -> str:
    """Look up the internal policy on the given topic."""
    return INTERNAL_KB.get(topic, "Not found.")

client.messages.create(model="claude-haiku-4-5", max_tokens=256,
    messages=[{"role": "user", "content": "Refund policy?"}],
    tools=[kb_lookup])
Switch providers with one line: llm="openai" instead of
llm="anthropic" (then pass the model at the call site). Tools and
code are unchanged.
Per-call LLM knobs
Pass temperature, top_p, max_tokens, stop_sequences,
presence_penalty, frequency_penalty, seed, etc. to any facade — they're
forwarded to the underlying chat model via .bind(...) per call, so the
cached chat model is never mutated:
client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    temperature=0.2,
    top_p=0.9,
    messages=[...],
)
Gemini's max_output_tokens is aliased automatically when targeting
llm="google_genai:...".
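To make the per-call forwarding and the Gemini alias concrete, here is a minimal sketch that builds a fresh keyword dict for each call instead of touching shared state. The alias table contains only the one rename mentioned above; `per_call_kwargs` and `PARAM_ALIASES` are hypothetical names, and dropping None values is an assumption about how unset knobs might be handled.

```python
from typing import Any, Dict

# Assumed alias table: only the Gemini rename is mentioned in the text above.
PARAM_ALIASES = {"google_genai": {"max_tokens": "max_output_tokens"}}


def per_call_kwargs(provider: str, **knobs: Any) -> Dict[str, Any]:
    """Build a new kwargs dict for one call, renaming aliased keys.

    None values are dropped so unset knobs fall back to provider defaults.
    """
    aliases = PARAM_ALIASES.get(provider, {})
    return {aliases.get(k, k): v for k, v in knobs.items() if v is not None}
```

Because a new dict is produced on every call, nothing cached is modified. A dict like this is the kind of argument set that could be handed to a chat model's .bind(...), which in LangChain returns a new runnable and leaves the original as it was.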
Why not provider-native server search? Anthropic's web_search_20250305,
OpenAI's web_search_preview, and Gemini's google_search run server-side
and return encrypted/opaque content that Compresr cannot read or compress.
Use Tavily or Brave so the result is plaintext we can compress.
Compression options
| Param | Purpose |
|---|---|
| query | Question the LLM is trying to answer; drives latte_v1 compression |
| target_compression_ratio | 0-1 strength (e.g. 0.5 = remove 50%) or >1 for Nx factor (4 = 4x). Backend max: 200 |
| coarse | True for paragraph-level (default, faster), False for token-level (fine-grained) |
| heuristic_chunking | Structure-preserving chunking |
| disable_placeholders | Disable placeholder tokens in output |
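The two readings of target_compression_ratio in the table can be worked through with a short sketch. It assumes values below 1 are the fraction of tokens removed, values above 1 are an Nx reduction factor, and the backend maximum of 200 refers to that factor. `expected_tokens` is a hypothetical helper for estimating the target size, not an SDK function, and exactly 1 is rejected here because the table does not say how it is interpreted.

```python
BACKEND_MAX_FACTOR = 200  # "Backend max: 200" in the table; assumed to cap the Nx factor


def expected_tokens(original_tokens: int, target_compression_ratio: float) -> int:
    """Estimate the compressed size implied by a target ratio.

    Assumed reading: values in [0, 1) are the fraction removed,
    values above 1 are an Nx reduction factor.
    """
    r = target_compression_ratio
    if 0 <= r < 1:
        kept = 1 - r          # 0.5 removes half, so half is kept
    elif 1 < r <= BACKEND_MAX_FACTOR:
        kept = 1 / r          # 4 means 4x smaller, so a quarter is kept
    else:
        raise ValueError("ratio must be in [0, 1) or (1, 200]")
    return round(original_tokens * kept)
```

Under these assumptions a 1,000-token context maps to about 500 tokens at 0.5 and about 250 tokens at 4. The actual output size is decided by the backend and may differ from the target.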
Error handling
from compresr.exceptions import (
    CompresrError,
    AuthenticationError,
    RateLimitError,
    ValidationError,
)

try:
    result = client.compress(context="...", query="...")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limit exceeded")
except ValidationError as e:
    print(f"Invalid request: {e}")
except CompresrError as e:
    print(f"API error: {e}")
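Rate-limit errors are usually worth retrying rather than just printing. A minimal sketch of exponential backoff follows; it takes the exception types as a parameter so it runs without the SDK installed. `call_with_backoff` is a hypothetical helper, and whether the SDK already retries internally is not stated above.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")
```

With the SDK this could be used as call_with_backoff(lambda: client.compress(context="...", query="..."), retry_on=(RateLimitError,)), leaving AuthenticationError and ValidationError to fail immediately since retrying them would not help.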
Framework integrations
The agents layer ships in the base install — pip install compresr is enough to get CompressionClient, all three provider chat models (Anthropic / OpenAI / Gemini), and both web search tools (Tavily + Brave).
Genuinely optional integrations beyond the agents layer:
| Extra | Pulls in |
|---|---|
| compresr[langgraph] | langgraph (LangGraph checkpoint serializer, store, handoff tool) |
| compresr[llamaindex] | llama-index-core (node postprocessor, memory block, tool wrapper) |
| compresr[litellm] | litellm[proxy] (LiteLLM proxy guardrail) |
| compresr[all] | all three above |
pip install "compresr[langgraph]"
Old compresr[agents] / compresr[agents-anthropic] / compresr[agents-all] / compresr[langchain] install commands still resolve (no-op extras kept for back-compat) — everything they used to pull in is now in the base install.
LangChain — middleware + tool wrapper + retriever
import os

from langchain.agents import create_agent
from compresr.integrations.langchain import (
    CompresrToolMiddleware,
    wrap_tool_with_compression,
    CompresrExtractor,
)

# model and web_search are assumed to be defined elsewhere (a chat model and a tool).
agent = create_agent(
    model=model,
    tools=[web_search],
    middleware=[CompresrToolMiddleware(
        api_key=os.environ["COMPRESR_API_KEY"],
        query_arg="query",
    )],
)
LangGraph — compression as a graph node
from compresr.integrations.langgraph import make_compresr_node
graph.add_node("compress", make_compresr_node(
    api_key=os.environ["COMPRESR_API_KEY"],
    context_key="retrieved_text",
    query_key="user_question",
))
LlamaIndex — node postprocessor for RAG
from compresr.integrations.llamaindex import CompresrNodePostprocessor
query_engine = index.as_query_engine(
    node_postprocessors=[CompresrNodePostprocessor(
        api_key=os.environ["COMPRESR_API_KEY"],
    )],
)
Unified query API
Every integration that accepts a query exposes the same three knobs:
| Param | Purpose |
|---|---|
| query | Static query, the same for every call |
| query_extractor | Callable that derives the query from the call context |
| query_arg / query_key | Name of the tool arg / state key to use as the query |
Priority: query > query_extractor > query_arg/query_key > smart-pick
from common arg keys (query, question, search_query, ...) > last human
message in history.
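The priority chain above can be sketched as a single resolver. This assumes tool arguments arrive as a mapping, history is a list of {"role", "content"} dicts with role "human" for user turns, and an empty result at one level falls through to the next. `resolve_query` is a hypothetical function, and the common-key tuple lists only the three names given above (the source list is open-ended).

```python
from typing import Any, Callable, Mapping, Optional, Sequence

# Only the keys named in the text; the source list continues with "...".
COMMON_QUERY_KEYS = ("query", "question", "search_query")


def resolve_query(
    args: Mapping[str, Any],
    history: Sequence[Mapping[str, str]] = (),
    query: Optional[str] = None,
    query_extractor: Optional[Callable[[Mapping[str, Any]], Optional[str]]] = None,
    query_arg: Optional[str] = None,
) -> Optional[str]:
    """Pick a query following the documented priority order."""
    if query:                                   # 1. static query
        return query
    if query_extractor is not None:             # 2. user-supplied extractor
        extracted = query_extractor(args)
        if extracted:
            return extracted
    if query_arg and args.get(query_arg):       # 3. named arg / state key
        return str(args[query_arg])
    for key in COMMON_QUERY_KEYS:               # 4. smart-pick from common keys
        if args.get(key):
            return str(args[key])
    for message in reversed(history):           # 5. last human message
        if message.get("role") == "human":
            return message.get("content")
    return None
```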
Tutorials
Runnable Jupyter notebooks under tutorial/:
- 01_quickstart.ipynb: core CompressionClient.
- 02_langchain.ipynb: middleware + tool wrapper + retriever.
- 03_langgraph.ipynb: compression node in a 3-node graph.
- 04_llamaindex.ipynb: node postprocessor + tool wrapper.
- 05_compresr_agents.ipynb: agent client (Anthropic/OpenAI/native shapes) with auto-compressed tool output.
Requirements
- Python 3.9+
- httpx >= 0.27.0
- pydantic >= 2.10.0
- Optional: langchain>=1.0, langgraph>=0.2, llama-index-core>=0.11 (install the matching extra)
License
Apache 2.0 — see LICENSE.
Support
- Docs: compresr.ai/docs
- Issues: GitHub
- Email: support@compresr.ai