Python SDK for Compresr - Intelligent prompt compression service

Compresr Python SDK

Query-aware LLM context compression — reduce LLM API costs by 30-70%.

Install

pip install compresr

Get an API key at compresr.ai → Dashboard → API Keys.

Quick start

from compresr import CompressionClient

client = CompressionClient(api_key="cmp_your_api_key")

result = client.compress(
    context="Long passage to compress...",
    query="What is the main conclusion?",
    target_compression_ratio=0.5,
)

print(f"Original:   {result.data.original_tokens} tokens")
print(f"Compressed: {result.data.compressed_tokens} tokens")
print(f"Saved:      {result.data.tokens_saved} tokens")
print(result.data.compressed_context)

The default model is latte_v1 (query-aware). To use another model your account has access to, pass compression_model_name="..."; the backend validates the name.

Batch

Compress up to 100 contexts in one call. Pass either a single query (applied to every context) or a list with one query per context:

batch = client.compress_batch(
    contexts=["Doc 1...", "Doc 2...", "Doc 3..."],
    queries="What is self-attention?",
    target_compression_ratio=0.5,
)

print(f"Total saved: {batch.data.total_tokens_saved} tokens")
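
If you have more than 100 contexts, one option is to split them client-side before calling compress_batch. The sketch below is a minimal illustration, not part of the SDK: split_batches and MAX_BATCH are hypothetical names, and it assumes that a list of queries must line up one-to-one with the contexts.

```python
from typing import Iterator, List, Sequence, Tuple, Union

MAX_BATCH = 100  # per-call limit stated above


def split_batches(
    contexts: Sequence[str],
    queries: Union[str, Sequence[str]],
    limit: int = MAX_BATCH,
) -> Iterator[Tuple[List[str], Union[str, List[str]]]]:
    """Yield (contexts, queries) pairs with at most `limit` contexts each."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    per_context = not isinstance(queries, str)
    if per_context and len(queries) != len(contexts):
        raise ValueError("need exactly one query per context")
    for start in range(0, len(contexts), limit):
        stop = start + limit
        # A single query string is reused as-is; a list is sliced in step.
        chunk_queries = list(queries[start:stop]) if per_context else queries
        yield list(contexts[start:stop]), chunk_queries


docs = [f"Doc {i}..." for i in range(250)]
batches = list(split_batches(docs, "What is self-attention?"))
sizes = [len(chunk) for chunk, _ in batches]
print(sizes)  # [100, 100, 50]
```

Each yielded pair could then be passed as contexts= and queries= to client.compress_batch(...).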

Async + streaming

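# Async: call from inside a coroutine (for example, one started with asyncio.run)
result = await client.compress_async(context="...", query="...")

# Streaming: print each chunk as it arrives
for chunk in client.compress_stream(context="...", query="..."):
    print(chunk.content, end="")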
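
When compressing many contexts with the async call, it can help to cap how many requests are in flight at once. The following is a sketch using only asyncio; gather_bounded and fake_compress are hypothetical names, and fake_compress is a local stand-in so the example runs without an API key.

```python
import asyncio
from typing import Awaitable, Callable, List


async def gather_bounded(
    compress: Callable[[str], Awaitable[str]],
    contexts: List[str],
    max_in_flight: int = 5,
) -> List[str]:
    """Run `compress` over all contexts, at most `max_in_flight` at a time."""
    gate = asyncio.Semaphore(max_in_flight)

    async def one(context: str) -> str:
        async with gate:
            return await compress(context)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(one(c) for c in contexts))


async def fake_compress(context: str) -> str:
    """Stand-in for the real call: keeps the first half of the words."""
    await asyncio.sleep(0)
    words = context.split()
    return " ".join(words[: max(1, len(words) // 2)])


results = asyncio.run(gather_bounded(fake_compress, ["a b c d", "e f", "g"]))
print(results)  # ['a b', 'e', 'g']
```

With the real client, compress could be something like lambda c: client.compress_async(context=c, query="..."), assuming compress_async behaves as shown above.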

LLM-agnostic agent client

One CompressionClient, three provider-shape facades, one engine. Construct the client with llm= and you get an agent surface where every tool output is compressed automatically before the LLM sees it.

import os
from compresr import CompressionClient, WebSearchTool

client = CompressionClient(
    api_key=os.environ["COMPRESR_API_KEY"],
    llm="anthropic",                        # or "openai", "google_genai"
    llm_api_key=os.environ["ANTHROPIC_API_KEY"],
    compression={"target_compression_ratio": 0.5, "min_tokens": 300},
)

To set a default model, use llm="anthropic:claude-haiku-4-5"; a model= passed at the call site always overrides it.
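
The "provider:model" form and the call-site override can be pictured with a few lines of plain Python. This is only an illustration of the precedence rule described here; split_llm_spec and effective_model are hypothetical helpers, not SDK functions, and the SDK's own parsing may differ.

```python
from typing import Optional, Tuple


def split_llm_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split "provider" or "provider:model" into (provider, default_model)."""
    provider, _, model = spec.partition(":")
    return provider, (model or None)


def effective_model(spec: str, call_site_model: Optional[str] = None) -> Optional[str]:
    """The call-site model wins; otherwise fall back to the spec's default."""
    _, default_model = split_llm_spec(spec)
    return call_site_model or default_model


print(split_llm_spec("anthropic:claude-haiku-4-5"))  # ('anthropic', 'claude-haiku-4-5')
print(effective_model("anthropic:claude-haiku-4-5", "claude-sonnet-4-6"))  # claude-sonnet-4-6
```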

Three equivalent surfaces sit on the same client — the model lives at the call site, just like Anthropic's and OpenAI's own SDKs:

# Anthropic shape
client.messages.create(model="claude-haiku-4-5", max_tokens=512,
                       messages=[...], tools=[...])

# OpenAI shape
client.chat.completions.create(model="gpt-4o-mini", messages=[...], tools=[...])

# Native — returns a NormalizedResult
client.run(prompt="...", model="claude-haiku-4-5", tools=[...], max_tokens=512)

Behind all three sits LangChain 1.0's create_agent + CompresrToolMiddleware. Tool outputs above min_tokens flow through client.compress(...) first.
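
The min_tokens threshold can be thought of as a simple gate in front of the compressor. The sketch below shows that idea in isolation. It is not the middleware's actual code: rough_token_count is a crude whitespace estimate (a real tokenizer counts differently), and keep_first_half is a stand-in for client.compress(...).

```python
from typing import Callable


def rough_token_count(text: str) -> int:
    """Crude whitespace estimate; a real tokenizer will count differently."""
    return len(text.split())


def maybe_compress(
    tool_output: str,
    compress: Callable[[str], str],
    min_tokens: int = 300,
) -> str:
    """Compress only outputs longer than `min_tokens`; pass short ones through."""
    if rough_token_count(tool_output) <= min_tokens:
        return tool_output
    return compress(tool_output)


def keep_first_half(text: str) -> str:
    """Stand-in compressor for the demo."""
    words = text.split()
    return " ".join(words[: len(words) // 2])


short_output = "refunds within 30 days"
long_output = " ".join(["word"] * 400)

print(maybe_compress(short_output, keep_first_half) == short_output)    # True
print(rough_token_count(maybe_compress(long_output, keep_first_half)))  # 200
```

Skipping short outputs avoids a network round trip where compression would save little.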

Built-in web search

search = WebSearchTool.tavily(
    api_key=os.environ["TAVILY_API_KEY"],
    max_results=5,
    allowed_domains=["nytimes.com", "reuters.com"],   # optional
)
# Brave: WebSearchTool.brave(api_key=..., max_results=5)

Bring your own tool

Any @tool-decorated function works — its string output is compressed for you:

from langchain_core.tools import tool

# Placeholder knowledge base: fill with your own topic -> policy text
INTERNAL_KB: dict[str, str] = {}

@tool
def kb_lookup(topic: str) -> str:
    """Look up the internal policy on the given topic."""
    return INTERNAL_KB.get(topic, "Not found.")

client.messages.create(model="claude-haiku-4-5", max_tokens=256,
                       messages=[{"role": "user", "content": "Refund policy?"}],
                       tools=[kb_lookup])

Switch providers with one line: llm="openai" instead of llm="anthropic" (then pass the model at the call site). Tools and code are unchanged.

Per-call LLM knobs

Pass temperature, top_p, max_tokens, stop_sequences, presence_penalty, frequency_penalty, seed, etc. to any facade — they're forwarded to the underlying chat model via .bind(...) per call, so the cached chat model is never mutated:

client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    temperature=0.2,
    top_p=0.9,
    messages=[...],
)

Gemini's max_output_tokens is aliased automatically when targeting llm="google_genai:...".
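
The "bind per call, never mutate the cached model" pattern can be illustrated with a toy class. This is a sketch of the general idea only: ChatModelSketch is hypothetical and is not the LangChain chat model or the SDK's internals.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict


@dataclass(frozen=True)
class ChatModelSketch:
    """Toy stand-in for a cached chat model; not the LangChain class."""
    name: str
    params: Dict[str, Any] = field(default_factory=dict)

    def bind(self, **overrides: Any) -> "ChatModelSketch":
        """Return a new object carrying the merged params; self is untouched."""
        return replace(self, params={**self.params, **overrides})


cached = ChatModelSketch("claude-sonnet-4-6")
per_call = cached.bind(temperature=0.2, top_p=0.9)

print(cached.params)    # {}
print(per_call.params)  # {'temperature': 0.2, 'top_p': 0.9}
```

Because each call works on its own bound copy, two calls with different settings cannot affect each other through the shared cached object.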

Why not provider-native server search? Anthropic's web_search_20250305, OpenAI's web_search_preview, and Gemini's google_search run server-side and return encrypted/opaque content that Compresr cannot read or compress. Use Tavily or Brave instead, so that results arrive as plaintext that Compresr can compress.

Compression options

Param                       Purpose
query                       Question the LLM is trying to answer; drives latte_v1 compression
target_compression_ratio    0-1 strength (e.g. 0.5 = remove 50%) or >1 for Nx factor (4 = 4x). Backend max: 200
coarse                      True for paragraph-level (default, faster), False for token-level (fine-grained)
heuristic_chunking          Structure-preserving chunking
disable_placeholders        Disable placeholder tokens in output
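
To make the two forms of target_compression_ratio concrete, here is a small local estimator. It is a sketch under assumptions: estimate_target_tokens is a hypothetical helper, the arithmetic follows the examples in the table (0.5 removes 50%, 4 means 4x), and the value exactly 1 is rejected because the table does not say which form it belongs to. The backend's actual output size may differ from the target.

```python
def estimate_target_tokens(original_tokens: int, ratio: float) -> int:
    """Estimate the token count a given target_compression_ratio aims for."""
    if 0 <= ratio < 1:
        # Fraction form: 0.5 means "remove 50%"
        return round(original_tokens * (1 - ratio))
    if 1 < ratio <= 200:
        # Factor form: 4 means "4x smaller"; 200 is the stated backend max
        return round(original_tokens / ratio)
    raise ValueError(f"unsupported ratio: {ratio!r}")


print(estimate_target_tokens(1000, 0.5))  # 500
print(estimate_target_tokens(1000, 4))    # 250
```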

Error handling

from compresr.exceptions import (
    CompresrError,
    AuthenticationError,
    RateLimitError,
    ValidationError,
)

try:
    result = client.compress(context="...", query="...")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limit exceeded")
except ValidationError as e:
    print(f"Invalid request: {e}")
except CompresrError as e:
    print(f"API error: {e}")
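
For rate limits, a common follow-up is to retry with exponential backoff instead of failing at once. The sketch below is generic and does not depend on the SDK: retry_on is a hypothetical helper, DemoRateLimit is a local stand-in exception, and the delay schedule is an arbitrary choice rather than anything the service prescribes.

```python
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def retry_on(
    exc_type: Type[BaseException],
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying on `exc_type` with delays of 1x, 2x, 4x, ... base_delay."""
    for attempt in range(attempts):
        try:
            return call()
        except exc_type:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


class DemoRateLimit(Exception):
    """Local stand-in so the demo runs without the SDK."""


calls = {"n": 0}
delays = []


def flaky() -> str:
    """Fail twice, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise DemoRateLimit()
    return "ok"


# Record delays in a list instead of actually sleeping
print(retry_on(DemoRateLimit, flaky, sleep=delays.append))  # ok
print(delays)  # [1.0, 2.0]
```

With the SDK, the call would look something like retry_on(RateLimitError, lambda: client.compress(context="...", query="...")), using the RateLimitError imported above.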

Framework integrations

The agents layer ships in the base install — pip install compresr is enough to get CompressionClient, all three provider chat models (Anthropic / OpenAI / Gemini), and both web search tools (Tavily + Brave).

Genuinely optional integrations beyond the agents layer:

Extra                   Pulls in
compresr[langgraph]     langgraph (LangGraph checkpoint serializer, store, handoff tool)
compresr[llamaindex]    llama-index-core (node postprocessor, memory block, tool wrapper)
compresr[litellm]       litellm[proxy] (LiteLLM proxy guardrail)
compresr[all]           all three above

pip install "compresr[langgraph]"

Old compresr[agents] / compresr[agents-anthropic] / compresr[agents-all] / compresr[langchain] install commands still resolve (no-op extras kept for back-compat) — everything they used to pull in is now in the base install.

LangChain — middleware + tool wrapper + retriever

import os

from langchain.agents import create_agent
from compresr.integrations.langchain import (
    CompresrToolMiddleware,
    wrap_tool_with_compression,
    CompresrExtractor,
)

# `model` and `web_search` are assumed to be defined elsewhere
agent = create_agent(
    model=model,
    tools=[web_search],
    middleware=[CompresrToolMiddleware(
        api_key=os.environ["COMPRESR_API_KEY"],
        query_arg="query",  # use the tool's "query" argument as the compression query
    )],
)

LangGraph — compression as a graph node

from compresr.integrations.langgraph import make_compresr_node

graph.add_node("compress", make_compresr_node(
    api_key=os.environ["COMPRESR_API_KEY"],
    context_key="retrieved_text",
    query_key="user_question",
))

LlamaIndex — node postprocessor for RAG

from compresr.integrations.llamaindex import CompresrNodePostprocessor

query_engine = index.as_query_engine(
    node_postprocessors=[CompresrNodePostprocessor(
        api_key=os.environ["COMPRESR_API_KEY"],
    )],
)

Unified query API

Every integration that accepts a query exposes the same three knobs:

Param                    Purpose
query                    Static query, the same for every call
query_extractor          Callable that derives the query from the call context
query_arg / query_key    Name of the tool arg / state key to use as the query

Priority: query > query_extractor > query_arg/query_key > smart-pick from common arg keys (query, question, search_query, ...) > last human message in history.
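
The priority chain can be read as a first-match lookup. The function below is a sketch of that order, not the SDK's implementation. resolve_query, the (role, text) history format, and the extractor's signature are all assumptions made for illustration. COMMON_QUERY_KEYS lists only the three keys named above, since the source list is elided. Falling through when the extractor returns nothing is also an assumption.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Only the keys named in the text; the source list continues ("...")
COMMON_QUERY_KEYS = ("query", "question", "search_query")


def resolve_query(
    tool_args: Dict[str, Any],
    history: List[Tuple[str, str]],
    query: Optional[str] = None,
    query_extractor: Optional[Callable[[Dict[str, Any]], Optional[str]]] = None,
    query_arg: Optional[str] = None,
) -> Optional[str]:
    """Return the first query found, walking the documented priority order."""
    if query:  # 1. static query
        return query
    if query_extractor is not None:  # 2. user-supplied callable
        extracted = query_extractor(tool_args)
        if extracted:
            return extracted
    if query_arg and tool_args.get(query_arg):  # 3. named tool argument
        return str(tool_args[query_arg])
    for key in COMMON_QUERY_KEYS:  # 4. smart-pick from common keys
        if tool_args.get(key):
            return str(tool_args[key])
    for role, text in reversed(history):  # 5. last human message
        if role == "human":
            return text
    return None


args = {"topic": "refunds", "question": "What is the refund window?"}
history = [("human", "Hi"), ("ai", "Hello"), ("human", "Refund policy?")]

print(resolve_query(args, history, query_arg="topic"))  # refunds
print(resolve_query(args, history))                     # What is the refund window?
print(resolve_query({}, history))                       # Refund policy?
```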

Tutorials

Runnable Jupyter notebooks under tutorial/:

  • 01_quickstart.ipynb — core CompressionClient.
  • 02_langchain.ipynb — middleware + tool wrapper + retriever.
  • 03_langgraph.ipynb — compression node in a 3-node graph.
  • 04_llamaindex.ipynb — node postprocessor + tool wrapper.
  • 05_compresr_agents.ipynb — agent client (Anthropic/OpenAI/native shapes) with auto-compressed tool output.

Requirements

  • Python 3.9+
  • httpx >= 0.27.0
  • pydantic >= 2.10.0
  • Optional: langchain>=1.0, langgraph>=0.2, llama-index-core>=0.11 (install the matching extra)

License

Apache 2.0 — see LICENSE.
