Reduce LLM API costs by 40-70% with one line of code — drop-in Anthropic/OpenAI client via the Prune proxy
Prune SDK
Reduce your LLM API costs by 40–70% (blended over repeat & similar traffic) with a one-line import change — no prompt edits, same responses on cache miss.
# Before
from anthropic import Anthropic
# After
from prune import Anthropic
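A blended figure depends on how much of your traffic repeats. The following back-of-the-envelope sketch is not Prune's savings model (that is described in docs/SAVINGS_MODEL.md). It assumes, purely for illustration, that every request costs the same on a cache miss and that a cache hit costs some fixed fraction of that, zero by default. The function name and the example hit rates are hypothetical.

```python
def blended_savings(hit_rate: float, hit_cost_fraction: float = 0.0) -> float:
    """Fraction of provider spend avoided for a given cache hit rate.

    hit_rate: share of requests served from cache (0.0 to 1.0).
    hit_cost_fraction: what a hit still costs relative to a miss
        (0.0 means a hit is free at the provider; this is an assumption).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    if not 0.0 <= hit_cost_fraction <= 1.0:
        raise ValueError("hit_cost_fraction must be between 0 and 1")
    # Misses cost the full price; hits cost only hit_cost_fraction of it.
    return hit_rate * (1.0 - hit_cost_fraction)


if __name__ == "__main__":
    for rate in (0.1, 0.4, 0.7):
        print(f"hit rate {rate:.0%} -> savings {blended_savings(rate):.0%}")
```

Under these assumptions alone, savings track the hit rate directly, so a workload with little repeated or similar traffic would see correspondingly smaller savings from the cache.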
Supported providers
| Provider | Status |
|---|---|
| Anthropic (Claude Opus, Sonnet, Haiku) | Live via proxy |
| OpenAI (GPT-4o, GPT-4o-mini, o-series) | Live via proxy |
| Google Gemini | Planned |
Installation
pip install prune-sdk
For local backend development:
export PRUNE_BASE_URL="http://127.0.0.1:8000"
Install from source (contributors):
pip install -e "./prune-sdk[dev]"
Quick start
Anthropic (Claude)
from prune import Anthropic
client = Anthropic(
    api_key="sk-ant-your-key",
    prune_api_key="prune_your_key",
)
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude!"}],
)
print(message.content[0].text)
print(client.last_prune_metadata)  # cache hit, tokens saved, etc.
OpenAI (GPT)
from prune import OpenAI
client = OpenAI(
    api_key="sk-your-openai-key",
    prune_api_key="prune_your_key",
)
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
Async
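import asyncio
from prune import AsyncAnthropic
client = AsyncAnthropic(
    api_key="sk-ant-...",
    prune_api_key="prune_...",
)
# `await` is only valid inside a coroutine, so wrap the call in one.
async def main() -> None:
    message = await client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=100,
        messages=[{"role": "user", "content": "Hi"}],
    )
    print(message.content[0].text)
asyncio.run(main())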
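Async clients pay off when several requests run concurrently. The sketch below shows the fan-out pattern with `asyncio.gather`, using a semaphore to cap how many requests are in flight. To stay runnable without API keys it uses a stand-in coroutine, `fake_request`, which is hypothetical; in real code you would await `client.messages.create(...)` in its place.

```python
import asyncio


async def fake_request(prompt: str) -> str:
    """Stand-in for an awaited client call (hypothetical helper)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"echo: {prompt}"


async def run_all(prompts: list[str], limit: int = 3) -> list[str]:
    """Run one request per prompt, at most `limit` in flight at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(prompt: str) -> str:
        async with semaphore:
            return await fake_request(prompt)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(p) for p in prompts))


if __name__ == "__main__":
    print(asyncio.run(run_all(["a", "b", "c", "d"])))
```

Capping concurrency is a design choice rather than a requirement; it helps keep a burst of requests within provider rate limits.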
Configuration
Environment variables:
export PRUNE_API_KEY="prune_your_key"
export PRUNE_BASE_URL="https://api.prune.so" # or http://127.0.0.1:8000 for local backend
export PRUNE_FALLBACK="true" # fall back to the direct API if the proxy fails
Programmatic:
import prune
prune.configure(api_key="prune_your_key", base_url="https://api.prune.so")
client = prune.Anthropic(api_key="sk-ant-...")
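The environment variables listed above could be resolved along these lines. This is a sketch, not the SDK's code: the `Settings` class, the `load_settings` helper, the default base URL, the default for `PRUNE_FALLBACK`, and the set of strings treated as true are all assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_key: Optional[str]
    base_url: str
    fallback: bool


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Read Prune settings from an environment mapping (hypothetical helper)."""
    raw_fallback = environ.get("PRUNE_FALLBACK", "true")  # assumed default
    return Settings(
        api_key=environ.get("PRUNE_API_KEY"),
        # Strip a trailing slash so route paths can be appended safely.
        base_url=environ.get("PRUNE_BASE_URL", "https://api.prune.so").rstrip("/"),
        # Assumed parsing rule: a few common spellings of "true".
        fallback=raw_fallback.strip().lower() in {"1", "true", "yes", "on"},
    )


if __name__ == "__main__":
    print(load_settings({"PRUNE_BASE_URL": "http://127.0.0.1:8000/"}))
```

Passing the mapping as a parameter keeps the helper easy to exercise in tests without touching the real process environment.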
Behavior
| Feature | Details |
|---|---|
| Proxy routing | Anthropic → /v1/proxy/anthropic/messages · OpenAI → /v1/proxy/openai/chat/completions |
| Quality | On a cache miss, the provider receives the same payload it would without Prune. On a cache hit, a previously stored response is returned verbatim. |
| Savings | Exact + semantic cache; Claude system prompt caching. See docs/SAVINGS_MODEL.md. |
| Response type | Real anthropic.types.Message / ChatCompletion objects |
| Streaming | Bypasses Prune; uses official SDK directly |
| Fallback | On proxy outage (5xx / network), calls Anthropic/OpenAI directly |
| Disable Prune | Anthropic(..., enable_prune=False) |
| Prompt Pass | HTTP header X-Prune-Optimize: light or compact (optional; default off) |
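The Fallback row can be pictured as a try-the-proxy-first wrapper. The sketch below illustrates the general pattern and is not the SDK's implementation: `ProxyUnavailable`, `call_with_fallback`, and the two callables are hypothetical names, and a real client would need to map 5xx responses and network errors onto the exception.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ProxyUnavailable(Exception):
    """Signals a proxy outage: a 5xx response or a network failure."""


def call_with_fallback(
    via_proxy: Callable[[], T],
    direct: Callable[[], T],
    fallback_enabled: bool = True,
) -> T:
    """Try the proxy first; on an outage, call the provider directly."""
    try:
        return via_proxy()
    except ProxyUnavailable:
        if not fallback_enabled:
            raise  # surface the outage when fallback is switched off
        return direct()


if __name__ == "__main__":
    def broken_proxy() -> str:
        raise ProxyUnavailable("503 from proxy")

    print(call_with_fallback(broken_proxy, lambda: "direct response"))
```

In this sketch, errors other than an outage (for example a 4xx caused by a bad request) propagate unchanged, since sending the same request directly would not fix them.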
Direct HTTP (no SDK)
If you only need to test the proxy, skip the SDK and POST to the backend:
curl -X POST http://127.0.0.1:8000/v1/proxy/anthropic/messages \
-H "X-Prune-Key: prune_your_key" \
-H "Content-Type: application/json" \
-d '{"model":"claude-sonnet-4-20250514","max_tokens":64,"messages":[{"role":"user","content":"Hello"}],"user_api_key":"sk-ant-..."}'
The same request from Python, using httpx:
import httpx
resp = httpx.post(
    "http://127.0.0.1:8000/v1/proxy/anthropic/messages",
    headers={"X-Prune-Key": "prune_your_key"},
    json={
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 64,
        "messages": [{"role": "user", "content": "Hello"}],
        "user_api_key": "sk-ant-...",
    },
    timeout=60,
)
print(resp.json())
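When scripting against both providers, the request can be assembled in one place. The sketch below only builds the URL, headers, and body; it uses the routes and header names listed under Behavior. The `build_proxy_request` helper is hypothetical, and sending `user_api_key` in the body for OpenAI is an assumption carried over from the Anthropic example above.

```python
from typing import Any, Optional

# Routes as listed in the Behavior table.
PROXY_PATHS = {
    "anthropic": "/v1/proxy/anthropic/messages",
    "openai": "/v1/proxy/openai/chat/completions",
}


def build_proxy_request(
    provider: str,
    payload: dict[str, Any],
    prune_key: str,
    provider_key: str,
    base_url: str = "http://127.0.0.1:8000",
    optimize: Optional[str] = None,
) -> tuple[str, dict[str, str], dict[str, Any]]:
    """Return (url, headers, json_body) for a direct proxy call."""
    if provider not in PROXY_PATHS:
        raise ValueError(f"unsupported provider: {provider!r}")
    if optimize not in (None, "light", "compact"):
        raise ValueError("optimize must be 'light', 'compact', or None")

    headers = {"X-Prune-Key": prune_key, "Content-Type": "application/json"}
    if optimize is not None:
        headers["X-Prune-Optimize"] = optimize  # Prompt Pass, off by default

    # Copy the payload so the caller's dict is not mutated.
    body = {**payload, "user_api_key": provider_key}
    return base_url.rstrip("/") + PROXY_PATHS[provider], headers, body


if __name__ == "__main__":
    url, headers, body = build_proxy_request(
        "openai",
        {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello"}]},
        prune_key="prune_your_key",
        provider_key="sk-your-openai-key",
    )
    print(url)
```

The returned tuple can be passed straight to `httpx.post(url, headers=headers, json=body, timeout=60)`.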
Development
cd prune-sdk
pip install -e ".[dev]"
pytest tests/ -q
pytest tests/ -m integration # needs ANTHROPIC_API_KEY + PRUNE_API_KEY
License
MIT
File details
Details for the file prune_sdk-0.1.0.tar.gz.
File metadata
- Download URL: prune_sdk-0.1.0.tar.gz
- Upload date:
- Size: 10.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cfbfec5861d79ff652e2348843afb9a5331299cc8a6923140e3271c989769fbe |
| MD5 | 36e9824d8e0e825d0e8b4ffa5f95714e |
| BLAKE2b-256 | 9d5826cf2d2138ab1c2627ffdde5dc131ed16c41455e1d42ca068b3900209260 |
File details
Details for the file prune_sdk-0.1.0-py3-none-any.whl.
File metadata
- Download URL: prune_sdk-0.1.0-py3-none-any.whl
- Upload date:
- Size: 9.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 788cd39ed846f684a27812e22194569b13255c0e8dc9bcd222ed75c45ba5491e |
| MD5 | 17b8ac649f082dbe210287ab5d0ce0fe |
| BLAKE2b-256 | a3e54076e8e0c4f0fd61f8858dbd6b5cda68e1dacda45b3cabde77067e9d084c |