Official Python SDK for Concise - Token compression for LLMs
Concise Python SDK
Official Python client for Concise - Token compression for LLMs.
Reduce your LLM costs by 30-50% with zero context loss using GPU-accelerated compression.
Installation
pip install concise-sdk
Quick Start
Direct Compression API
from concise import Concise
client = Concise(api_key="your-api-key")
result = client.compress(
    "Your long prompt here...",
    level="auto"
)
print(f"Original: {result.original_tokens} tokens")
print(f"Compressed: {result.compressed_tokens} tokens")
print(f"Saved: {result.tokens_saved} tokens ({(1-result.compression_ratio)*100:.1f}%)")
print(f"Compressed text: {result.compressed_text}")
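To turn the token counts printed above into a rough cost figure, a minimal sketch follows. It does not call the Concise API: the helper `estimate_savings` and the price of 0.03 per 1,000 tokens are hypothetical and only illustrate the arithmetic, so substitute your provider's actual rate.

```python
def estimate_savings(original_tokens: int, compressed_tokens: int,
                     price_per_1k_tokens: float) -> dict:
    """Estimate input-token savings for a single request.

    price_per_1k_tokens is a hypothetical rate, not a Concise or OpenAI price.
    """
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    tokens_saved = original_tokens - compressed_tokens
    return {
        "tokens_saved": tokens_saved,
        "percent_saved": 100.0 * tokens_saved / original_tokens,
        "cost_saved": tokens_saved / 1000.0 * price_per_1k_tokens,
    }


# In practice the two counts would come from result.original_tokens and
# result.compressed_tokens; fixed numbers keep the sketch self-contained.
summary = estimate_savings(original_tokens=1000, compressed_tokens=650,
                           price_per_1k_tokens=0.03)
print(f"Saved {summary['tokens_saved']} tokens "
      f"({summary['percent_saved']:.1f}%), about ${summary['cost_saved']:.4f}")
```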
OpenAI Drop-in Replacement
Replace your OpenAI import with Concise for automatic compression:
# Before:
# from openai import OpenAI
# After:
from concise import OpenAI
client = OpenAI(api_key="your-concise-key")
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
    compression_enabled=True,  # Automatic token compression
    compression_level="balanced"
)
print(response.choices[0].message.content)
Features
- Direct Compression API - Compress any text before sending to LLMs
- OpenAI Drop-in - Replace `from openai import OpenAI` with `from concise import OpenAI`
- Automatic Strategy Selection - Detects Python code vs natural language
- GPU-Accelerated - 285ms compression time (or instant with caching)
- Zero Context Loss - Preserves semantic meaning
- Type Hints - Full type annotations for better IDE support
Compression Levels
| Level | Reduction | Use Case |
|---|---|---|
| auto | 30-50% | Automatic strategy (recommended) |
| aggressive | 50% | Maximum compression, natural language |
| balanced | 30% | Good trade-off |
| conservative | 20% | Light compression, preserve structure |
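The nominal reductions in the table above can be used to budget prompt sizes before sending anything. The sketch below is only an estimate under the assumption that the listed percentages hold for your input; `NOMINAL_REDUCTION` and `estimate_compressed_tokens` are hypothetical names and are not part of the SDK.

```python
from typing import Tuple

# Nominal reductions copied from the Compression Levels table.
# "auto" is listed as a 30-50% range, so each entry is stored as (low, high).
NOMINAL_REDUCTION = {
    "auto": (0.30, 0.50),
    "aggressive": (0.50, 0.50),
    "balanced": (0.30, 0.30),
    "conservative": (0.20, 0.20),
}


def estimate_compressed_tokens(original_tokens: int, level: str) -> Tuple[int, int]:
    """Return (best_case, worst_case) token counts after compression."""
    if level not in NOMINAL_REDUCTION:
        raise ValueError(f"unknown level: {level!r}")
    low, high = NOMINAL_REDUCTION[level]
    best = round(original_tokens * (1 - high))   # largest listed reduction
    worst = round(original_tokens * (1 - low))   # smallest listed reduction
    return best, worst


for level in NOMINAL_REDUCTION:
    print(level, estimate_compressed_tokens(2000, level))
```

Actual results depend on the content, so treat these numbers as a planning aid rather than a guarantee.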
Examples
Python Code Compression
from concise import Concise
client = Concise(api_key="your-api-key")
code = """
def fibonacci(n):
    '''Calculate fibonacci number'''
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
"""
result = client.compress(code, level="auto")
# Strategy: token_compression_code
# Reduction: 39%
# Time: 27ms
Natural Language Compression
result = client.compress(
    "FastAPI is a modern, fast web framework for building APIs with Python 3.8+",
    level="aggressive"
)
# Strategy: token_compression_text
# Reduction: 50%
# Time: 285ms (or 0ms if cached)
Using with OpenAI
from concise import OpenAI
client = OpenAI(api_key="your-concise-key")
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": "You are a Python expert. Help users write clean, efficient code."
        },
        {
            "role": "user",
            "content": "Write a function to validate email addresses using regex"
        }
    ],
    compression_enabled=True,
    compression_level="balanced"
)
print(response.choices[0].message.content)
Context Manager
from concise import Concise
with Concise(api_key="your-api-key") as client:
    result = client.compress("Long text here...")
    print(f"Saved {result.tokens_saved} tokens")
Environment Variable
Set the CONCISE_API_KEY environment variable in your shell, then create the client without passing a key:
export CONCISE_API_KEY=your-api-key
from concise import Concise
# API key loaded from environment
client = Concise()
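If you want to fail early with a clear message when the variable is missing, a small lookup helper can run before the client is created. This is a sketch, not the SDK's own logic: `resolve_api_key` is a hypothetical helper, and the rule that an explicit key takes precedence over the environment is an assumption.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None,
                    env_var: str = "CONCISE_API_KEY") -> str:
    """Return the explicit key if given, else the value of env_var.

    The precedence order is an assumption made for this sketch.
    """
    if explicit_key:
        return explicit_key
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"No API key found: pass one explicitly or set {env_var}"
        )
    return key


# Usage (requires the concise-sdk package and a real key):
#   from concise import Concise
#   client = Concise(api_key=resolve_api_key())
```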
Error Handling
from concise import Concise, AuthenticationError, APIError, RateLimitError
client = Concise(api_key="your-api-key")
try:
    result = client.compress("text")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limit exceeded")
except APIError as e:
    print(f"API error: {e} (status: {e.status_code})")
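Rather than only printing a message on a rate limit, a caller may want to retry. The sketch below shows one common pattern, exponential backoff. It is not something the SDK is documented to provide: `with_retries` is a hypothetical helper, and the local `RateLimitError` class is a stand-in so the example runs without the package installed.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for concise.RateLimitError so this sketch runs on its own."""


def with_retries(call: Callable[[], T],
                 max_attempts: int = 4,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying on RateLimitError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise ValueError("max_attempts must be at least 1")


# With the real SDK, import RateLimitError from concise instead and wrap
# the request:
#   result = with_retries(lambda: client.compress("text"))
```

Authentication errors are deliberately not retried here, since a bad key will not start working on a second attempt.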
Performance
| Type | Strategy | Reduction | Time |
|---|---|---|---|
| Python code | python-minifier | 39% | 27ms |
| Natural language | LLMLingua-2 GPU | 50% | 285ms |
| Cached requests | Cache hit | 50% | 0ms |
Caching
Concise automatically caches compression results:
- First request: GPU compression (285ms)
- Repeated requests: Instant (0ms)
- 240,000x speedup for cached requests
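The general idea behind result caching can be sketched on the client side as a lookup keyed by a hash of the input and level. This is an illustration only, not a description of how the Concise service implements its cache: `CompressionCache` and the whitespace-collapsing `fake_compress` function are hypothetical.

```python
import hashlib
from typing import Callable, Dict


class CompressionCache:
    """Memoize a compress function by (level, text)."""

    def __init__(self, compress_fn: Callable[[str, str], str]) -> None:
        self._compress_fn = compress_fn
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(text: str, level: str) -> str:
        # A NUL separator keeps ("ab", "c") distinct from ("a", "bc").
        payload = f"{level}\x00{text}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def compress(self, text: str, level: str = "auto") -> str:
        key = self._key(text, level)
        if key in self._store:
            self.hits += 1          # repeated request: no recomputation
            return self._store[key]
        self.misses += 1            # first request: do the real work
        result = self._compress_fn(text, level)
        self._store[key] = result
        return result


def fake_compress(text: str, level: str) -> str:
    """Hypothetical stand-in compressor: collapses runs of whitespace."""
    return " ".join(text.split())


cache = CompressionCache(fake_compress)
cache.compress("hello    world")
cache.compress("hello    world")
print(f"hits={cache.hits} misses={cache.misses}")
```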
API Reference
Concise
Main client for direct compression API.
__init__(api_key, base_url, timeout)
Initialize client.
Parameters:
- api_key (str, optional): Your Concise API key
- base_url (str, optional): API base URL (default: https://api.concise.dev/v1)
- timeout (int, optional): Request timeout in seconds (default: 30)
compress(text, level)
Compress text to reduce token count.
Parameters:
- text (str): Text to compress
- level (str): Compression level ("auto", "aggressive", "balanced", "conservative")
Returns:
CompressionResult: Object with compression metrics
health()
Check API health status.
Returns:
dict: Status and version info
OpenAI
OpenAI-compatible client with automatic compression.
chat.completions.create()
Create chat completion with compression.
Additional Parameters:
- compression_enabled (bool): Enable compression (default: True)
- compression_level (str): Compression level (default: "auto")
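One way a drop-in wrapper could handle these extra parameters is to separate them from the standard chat-completion arguments before forwarding the rest. The sketch below is an assumption about the general approach, not the SDK's actual code: `split_compression_kwargs` is a hypothetical helper, and only the two defaults (True and "auto") come from the parameter list above.

```python
from typing import Any, Dict, Tuple


def split_compression_kwargs(
    kwargs: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split call arguments into (compression options, remaining arguments)."""
    remaining = dict(kwargs)  # copy so the caller's dict is not mutated
    options = {
        "compression_enabled": remaining.pop("compression_enabled", True),
        "compression_level": remaining.pop("compression_level", "auto"),
    }
    return options, remaining


options, openai_kwargs = split_compression_kwargs({
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello"}],
    "compression_level": "balanced",
})
print(options)
print(sorted(openai_kwargs))
```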
Types
CompressionResult
from dataclasses import dataclass
from typing import Optional

@dataclass
class CompressionResult:
    original_text: str
    compressed_text: str
    original_tokens: int
    compressed_tokens: int
    tokens_saved: int
    compression_ratio: float
    strategy: str
    compression_time_ms: float
    cache_hit: Optional[bool]
Requirements
- Python 3.8+
- httpx>=0.25.0
Getting Your API Key
- Sign up at concise.dev
- Create an API key in the dashboard
- Use the key with this SDK
Support
- Documentation: docs.concise.dev
- Issues: github.com/concise/python-sdk/issues
- Email: support@concise.dev
License
MIT License - see LICENSE file for details
File details
Details for the file concise_sdk-1.0.0.tar.gz.
File metadata
- Download URL: concise_sdk-1.0.0.tar.gz
- Upload date:
- Size: 8.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 43f310c5ce166d6c23c8ec118fa749061a96ea2470ab9506fc41e85a4e54be8e |
| MD5 | 21a62d6b2022939008e5a458c9bb0edb |
| BLAKE2b-256 | cc35906a911110d653d4ff3467db9db6ae895f650a5f37bcbca331c372cdc842 |
File details
Details for the file concise_sdk-1.0.0-py3-none-any.whl.
File metadata
- Download URL: concise_sdk-1.0.0-py3-none-any.whl
- Upload date:
- Size: 9.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cc0f946f921bfc854da5ef48b57ad5badced19dca25e4cee36e6f5e5f4052316 |
| MD5 | 9c52c895457e5874ecd29f4c85d90903 |
| BLAKE2b-256 | 4cdf1c8224af02b86987b2dc84fae26fa22d65f704bc11bea864faffa38e4b5c |