Reduce LLM API costs by 40-70% with one line of code — drop-in Anthropic/OpenAI client via the Prune proxy

Prune SDK

Reduce your LLM API costs by 40–70% (blended over repeated and similar traffic) with a one-line import change. No prompt edits are needed, and on a cache miss the request reaches the provider unchanged, so the response is the same as without Prune.

# Before
from anthropic import Anthropic

# After
from prune import Anthropic

Supported providers

| Provider | Status |
| --- | --- |
| Anthropic (Claude Opus, Sonnet, Haiku) | Live via proxy |
| OpenAI (GPT-4o, GPT-4o-mini, o-series) | Live via proxy |
| Google Gemini | Planned |

Installation

pip install prune-sdk

For local backend development:

export PRUNE_BASE_URL="http://127.0.0.1:8000"

Install from source (contributors):

pip install -e "./prune-sdk[dev]"

Quick start

Anthropic (Claude)

from prune import Anthropic

client = Anthropic(
    api_key="sk-ant-your-key",
    prune_api_key="prune_your_key",
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude!"}],
)

print(message.content[0].text)
print(client.last_prune_metadata)  # cache hit, tokens saved, etc.

OpenAI (GPT)

from prune import OpenAI

client = OpenAI(
    api_key="sk-your-openai-key",
    prune_api_key="prune_your_key",
)

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello!"}],
)

print(completion.choices[0].message.content)

Async

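import asyncio

from prune import AsyncAnthropic


async def main() -> None:
    client = AsyncAnthropic(
        api_key="sk-ant-...",
        prune_api_key="prune_...",
    )

    # `await` is only valid inside a coroutine, so the call lives in main().
    message = await client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=100,
        messages=[{"role": "user", "content": "Hi"}],
    )
    print(message.content[0].text)


asyncio.run(main())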

Configuration

Environment variables:

export PRUNE_API_KEY="prune_your_key"
export PRUNE_BASE_URL="https://api.prune.so"   # or http://127.0.0.1:8000 for local backend
export PRUNE_FALLBACK="true"                     # fall back to the direct provider API if the proxy fails
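
As a rough illustration of how these three variables could be interpreted, the sketch below parses them from a mapping. It is not the SDK's internal logic. The `PruneSettings` dataclass, the `load_settings` helper, the default base URL, the default value for fallback, and the set of accepted truthy strings are all assumptions made for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: used as the default only for this example.
DEFAULT_BASE_URL = "https://api.prune.so"
# Assumption: which strings count as "true" is not documented.
TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class PruneSettings:
    """Hypothetical container for the PRUNE_* environment settings."""
    api_key: Optional[str]
    base_url: str
    fallback: bool


def load_settings(env: Mapping[str, str] = os.environ) -> PruneSettings:
    """Read PRUNE_API_KEY, PRUNE_BASE_URL and PRUNE_FALLBACK from `env`."""
    # Drop a trailing slash so paths can be appended without doubling it.
    base_url = env.get("PRUNE_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    # Assumption: fallback is treated as enabled when the variable is unset.
    fallback = env.get("PRUNE_FALLBACK", "true").strip().lower() in TRUTHY
    return PruneSettings(
        api_key=env.get("PRUNE_API_KEY"),
        base_url=base_url,
        fallback=fallback,
    )


if __name__ == "__main__":
    local = {
        "PRUNE_API_KEY": "prune_your_key",
        "PRUNE_BASE_URL": "http://127.0.0.1:8000/",
    }
    print(load_settings(local))
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise in tests without touching the real environment.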

Programmatic:

import prune

prune.configure(api_key="prune_your_key", base_url="https://api.prune.so")

client = prune.Anthropic(api_key="sk-ant-...")

Behavior

| Feature | Details |
| --- | --- |
| Proxy routing | Anthropic → /v1/proxy/anthropic/messages · OpenAI → /v1/proxy/openai/chat/completions |
| Quality | Cache miss = same payload to the provider as without Prune. Cache hit = identical prior response. |
| Savings | Exact + semantic cache; Claude system prompt caching. See docs/SAVINGS_MODEL.md. |
| Response type | Real anthropic.types.Message / ChatCompletion objects |
| Streaming | Bypasses Prune; uses official SDK directly |
| Fallback | On proxy outage (5xx / network), calls Anthropic/OpenAI directly |
| Disable Prune | Anthropic(..., enable_prune=False) |
| Prompt | Pass HTTP header X-Prune-Optimize: light or compact (optional; default off) |
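
The Fallback row can be pictured with a small sketch: try the proxy first, and call the provider directly only when the proxy looks unavailable. This is an illustration of the pattern, not the SDK's implementation. `ProxyUnavailable`, `is_outage`, and `call_with_fallback` are hypothetical names, and the two callables stand in for real HTTP requests.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ProxyUnavailable(Exception):
    """Hypothetical error for a proxy outage (a 5xx status from the proxy)."""


def is_outage(status_code: int) -> bool:
    # Only server-side errors count as an outage. A 4xx response means the
    # request itself was rejected, so retrying it elsewhere would not help.
    return 500 <= status_code <= 599


def call_with_fallback(
    via_proxy: Callable[[], T],
    direct: Callable[[], T],
    fallback: bool = True,
) -> T:
    """Return via_proxy(); on an outage, return direct() if fallback is on."""
    try:
        return via_proxy()
    except (ProxyUnavailable, ConnectionError, TimeoutError):
        if not fallback:
            raise  # Surface the outage when fallback is disabled.
        return direct()


if __name__ == "__main__":
    def broken_proxy() -> str:
        raise ProxyUnavailable("503 from proxy")

    print(call_with_fallback(broken_proxy, lambda: "direct response"))
    # prints "direct response"
```

Keeping the outage check narrow matters here: falling back on every error would hide client mistakes such as an invalid key behind a second, equally failing request.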

Direct HTTP (no SDK)

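If you only need to test the proxy, skip the SDK and POST to the backend directly. The curl command uses ^ for line continuation (Windows cmd); in a POSIX shell, replace each ^ with \.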

curl -X POST http://127.0.0.1:8000/v1/proxy/anthropic/messages ^
  -H "X-Prune-Key: prune_your_key" ^
  -H "Content-Type: application/json" ^
  -d "{\"model\":\"claude-sonnet-4-20250514\",\"max_tokens\":64,\"messages\":[{\"role\":\"user\",\"content\":\"Hello\"}],\"user_api_key\":\"sk-ant-...\"}"

The same request from Python with httpx:

import httpx

resp = httpx.post(
    "http://127.0.0.1:8000/v1/proxy/anthropic/messages",
    headers={"X-Prune-Key": "prune_your_key"},
    json={
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 64,
        "messages": [{"role": "user", "content": "Hello"}],
        "user_api_key": "sk-ant-...",
    },
    timeout=60,
)
print(resp.json())
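
To make the request shape easier to reuse, the sketch below assembles the URL, headers, and JSON body without sending anything. The paths, the X-Prune-Key and X-Prune-Optimize headers, and the user_api_key field come from this README. The `build_proxy_request` helper is hypothetical, and the assumption that the OpenAI route accepts the same body fields as the Anthropic one is not confirmed here.

```python
from typing import Any, Dict, List, Optional, Tuple

# Paths as listed under "Proxy routing" in the Behavior section.
PROXY_PATHS = {
    "anthropic": "/v1/proxy/anthropic/messages",
    "openai": "/v1/proxy/openai/chat/completions",
}


def build_proxy_request(
    provider: str,
    base_url: str,
    prune_key: str,
    provider_key: str,
    model: str,
    messages: List[Dict[str, str]],
    max_tokens: int,
    optimize: Optional[str] = None,
) -> Tuple[str, Dict[str, str], Dict[str, Any]]:
    """Return (url, headers, body) for a proxy call (hypothetical helper)."""
    if provider not in PROXY_PATHS:
        raise ValueError(f"unsupported provider: {provider!r}")
    if optimize not in (None, "light", "compact"):
        raise ValueError("optimize must be 'light', 'compact', or None")

    url = base_url.rstrip("/") + PROXY_PATHS[provider]
    headers = {"X-Prune-Key": prune_key, "Content-Type": "application/json"}
    if optimize is not None:
        # Optional header; it is off unless explicitly requested.
        headers["X-Prune-Optimize"] = optimize
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": messages,
        # Assumption: both routes take the provider key in this field.
        "user_api_key": provider_key,
    }
    return url, headers, body


if __name__ == "__main__":
    url, headers, body = build_proxy_request(
        "anthropic",
        "http://127.0.0.1:8000",
        "prune_your_key",
        "sk-ant-...",
        "claude-sonnet-4-20250514",
        [{"role": "user", "content": "Hello"}],
        64,
    )
    print(url)
```

The returned values can be passed straight to `httpx.post(url, headers=headers, json=body, timeout=60)`.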

Development

cd prune-sdk
pip install -e ".[dev]"
pytest tests/ -q
pytest tests/ -m integration  # needs ANTHROPIC_API_KEY + PRUNE_API_KEY

License

MIT
