Python SDK for Compresr - Intelligent prompt compression service
Project description
Compresr Python SDK
Intelligent context compression service to optimize LLM costs and performance. Reduce your LLM API costs by 30-70% through intelligent context compression.
Installation
pip install compresr
Quick Start
API Key Setup
Get your API key from compresr.ai:
- Create an account at compresr.ai
- Navigate to Dashboard → API Keys
- Click "Create New Key" and copy it (shown only once!)
Two Client Types
1. CompressionClient — Token-Level Compression
Intelligently removes less important tokens while preserving meaning. Best for compressing raw, unstructured text like system prompts, documents, or retrieved passages before sending to an LLM. Supports a tunable compression_ratio to control how aggressively to compress.
from compresr import CompressionClient
client = CompressionClient(api_key="cmp_your_api_key")
# Agnostic compression (espresso_v1, default) — no query needed
result = client.compress(
context="Your very long context that needs compression...",
)
print(f"Original: {result.data.original_tokens} tokens")
print(f"Compressed: {result.data.compressed_tokens} tokens")
print(f"Saved: {result.data.tokens_saved} tokens")
# Query-specific compression (latte_v1) — query REQUIRED
result = client.compress(
context="Python was created in 1991. JavaScript in 1995. Java in 1995.",
query="Who created Python?",
compression_model_name="latte_v1",
)
print(f"Compressed (query-relevant): {result.data.compressed_context}")
2. FilterClient — Chunk-Level Filtering
Keeps or drops entire chunks based on query relevance — no partial rewriting. Best for RAG pipelines where your retriever returns many chunks and you want to discard the irrelevant ones before stuffing them into a prompt. Fast and binary: a chunk is either in or out.
from compresr import FilterClient
client = FilterClient(api_key="cmp_your_api_key")
# Query is always required
result = client.filter(
chunks=[
"Python is a programming language created by Guido van Rossum.",
"JavaScript was created by Brendan Eich for web browsers.",
"Java was developed by James Gosling at Sun Microsystems.",
],
query="Who created Python?",
)
print(f"Filtered chunks: {result.data.compressed_context}")
print(f"Saved: {result.data.tokens_saved} tokens")
Integration with OpenAI
Agnostic compression:
from compresr import CompressionClient
from openai import OpenAI
compresr = CompressionClient(api_key="cmp_xxx")
openai_client = OpenAI(api_key="sk-xxx")
compressed = compresr.compress(
context="Your long system prompt or document...",
)
response = openai_client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": compressed.data.compressed_context},
{"role": "user", "content": "Analyze this data..."}
]
)
print(f"Saved {compressed.data.tokens_saved} tokens!")
Query-specific compression (RAG/QA):
from compresr import CompressionClient
from openai import OpenAI
compresr = CompressionClient(api_key="cmp_xxx")
openai_client = OpenAI(api_key="sk-xxx")
user_question = "What is machine learning?"
compressed = compresr.compress(
context="Retrieved documents with lots of information...",
query=user_question,
compression_model_name="latte_v1",
)
response = openai_client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": compressed.data.compressed_context},
{"role": "user", "content": user_question}
]
)
Chunk filtering for RAG pipelines:
from compresr import FilterClient
from openai import OpenAI
filter_client = FilterClient(api_key="cmp_xxx")
openai_client = OpenAI(api_key="sk-xxx")
user_question = "What is Python?"
retrieved_chunks = ["Chunk about Python...", "Chunk about Java...", "Chunk about ML..."]
filtered = filter_client.filter(
chunks=retrieved_chunks,
query=user_question,
)
response = openai_client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": "\n".join(filtered.data.compressed_context)},
{"role": "user", "content": user_question}
]
)
Streaming Support
Both clients support real-time streaming:
from compresr import CompressionClient, FilterClient
# Compression streaming
client = CompressionClient(api_key="cmp_your_api_key")
for chunk in client.compress_stream(
context="Your long context...",
):
print(chunk.content, end="", flush=True)
# Query-specific compression streaming
for chunk in client.compress_stream(
context="Your long context...",
query="What is important here?",
compression_model_name="latte_v1",
):
print(chunk.content, end="", flush=True)
# Filter streaming
filter_client = FilterClient(api_key="cmp_your_api_key")
for chunk in filter_client.filter_stream(
chunks=["Chunk 1...", "Chunk 2..."],
query="What is relevant?",
):
print(chunk.content, end="", flush=True)
Async Support
Both clients support async/await:
import asyncio
from compresr import CompressionClient, FilterClient
async def main():
# Compression async
client = CompressionClient(api_key="cmp_your_api_key")
result = await client.compress_async(
context="Your context...",
)
# Query-specific compression async
result = await client.compress_async(
context="Your context...",
query="What matters here?",
compression_model_name="latte_v1",
)
# Filter async
filter_client = FilterClient(api_key="cmp_your_api_key")
filtered = await filter_client.filter_async(
chunks=["Chunk 1...", "Chunk 2..."],
query="What is relevant?",
)
await client.close()
await filter_client.close()
asyncio.run(main())
API Reference
Client Initialization
from compresr import CompressionClient, FilterClient
# Token-level compression
client = CompressionClient(
api_key="cmp_your_api_key", # Required
timeout=30 # Optional: request timeout in seconds
)
# Chunk-level filtering
filter_client = FilterClient(
api_key="cmp_your_api_key", # Required
timeout=30 # Optional: request timeout in seconds
)
CompressionClient Methods
| Method | Description |
|---|---|
compress() |
Compress context (latte_v1 requires query param) |
compress_async() |
Async compress |
compress_stream() |
Stream compression |
FilterClient Methods
| Method | Description |
|---|---|
filter() |
Filter chunks by query relevance (query REQUIRED) |
filter_async() |
Async filter |
filter_stream() |
Stream filtered chunks |
Response Structure
# CompressResult (same for both clients)
result.data.original_context # Original input
result.data.compressed_context # Compressed/filtered output
result.data.original_tokens # Token count before
result.data.compressed_tokens # Token count after
result.data.actual_compression_ratio # Achieved ratio (0-1)
result.data.tokens_saved # Tokens saved
result.data.duration_ms # Processing time
Available Models
Compression Models (CompressionClient)
| Model | Query | Compression Ratio | Best For |
|---|---|---|---|
espresso_v1 |
Not needed | Supported | System prompts, documents, general context — when there is no specific question |
latte_v1 |
Required | Supported | RAG / QA — compress retrieved text while preserving answer-relevant tokens |
Filter Models (FilterClient)
| Model | Query | Compression Ratio | Best For |
|---|---|---|---|
coldbrew_v1 |
Required | Not supported | RAG chunk selection — quickly drop irrelevant chunks from a retrieval set |
Error Handling
from compresr import CompressionClient, FilterClient
from compresr.exceptions import (
CompresrError,
AuthenticationError,
RateLimitError,
ValidationError,
)
client = CompressionClient(api_key="cmp_your_api_key")
try:
result = client.compress(context="Your context...")
except AuthenticationError:
print("Invalid API key")
except RateLimitError:
print("Rate limit exceeded")
except ValidationError as e:
print(f"Invalid request: {e}")
except CompresrError as e:
print(f"API error: {e}")
Requirements
- Python 3.9+
httpx >= 0.27.0pydantic >= 2.10.0
License
Proprietary License
Support
- Documentation: compresr.ai/docs/overview
- Support: support@compresr.ai
- Issues: GitHub Issues
- Website: compresr.ai
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file compresr-2.0.2.tar.gz.
File metadata
- Download URL: compresr-2.0.2.tar.gz
- Upload date:
- Size: 17.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
54ac68cbe8a53eadd99514ee22248588defcc7ee21d076b7f77a8d883489a4e8
|
|
| MD5 |
9d74b034783595a2e7519018e91f1860
|
|
| BLAKE2b-256 |
5678019f2fd393385faf76d1b88d2479e76e8aa7d57cdc8e03b045585f4ca4e6
|
File details
Details for the file compresr-2.0.2-py3-none-any.whl.
File metadata
- Download URL: compresr-2.0.2-py3-none-any.whl
- Upload date:
- Size: 18.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
df4516b7e8391f0063158e25dc2812dc8239f410ac3da2d2f3cc78ed4046ce33
|
|
| MD5 |
81a4b17b9a8108aef3e003597d355b6d
|
|
| BLAKE2b-256 |
726022a0638d7e01ee9fdc4223e2ec4e9f9bbd1117dfb1953a7a201bd539e57f
|