# Pellet AI Python SDK

The official Python client for the Pellet AI API.

Pellet is an intelligent LLM routing platform: send a prompt and Pellet automatically picks the best model for cost, speed, and quality. One API key, 11+ models across Groq, Together, and Fireworks.
## Installation

```bash
pip install pellet-ai
```

Requires Python 3.9+.
## Quick Start

```python
from pellet import Pellet

client = Pellet(api_key="pk_live_...")  # or set the PELLET_API_KEY env var

# Pellet auto-routes to the best model
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "What is machine learning?"}],
)

print(response.choices[0].message.content)
print(f"Model used: {response.model}")
print(f"Task type: {response.pellet_metadata.task_type}")
print(f"Cost: ${response.pellet_metadata.cost_usd:.6f}")
```
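Because every response reports its cost in `pellet_metadata.cost_usd`, you can keep a running total across calls. A minimal sketch of that pattern; the `CostTracker` helper below is ours, not part of the SDK:

```python
class CostTracker:
    """Accumulates per-request cost reported in pellet_metadata.cost_usd."""

    def __init__(self):
        self.total_usd = 0.0
        self.requests = 0

    def record(self, cost_usd):
        self.total_usd += cost_usd
        self.requests += 1

    def summary(self):
        return f"{self.requests} requests, ${self.total_usd:.6f} total"

# With the real client you would call:
#   tracker.record(response.pellet_metadata.cost_usd)
tracker = CostTracker()
tracker.record(0.000021)
tracker.record(0.000013)
print(tracker.summary())  # 2 requests, $0.000034 total
```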
## Streaming

```python
with client.chat.completions.stream(
    messages=[{"role": "user", "content": "Tell me a story"}],
) as stream:
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
```
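If you need the full text after streaming, accumulate the deltas as they arrive. A sketch of the pattern, using a plain list of delta strings to stand in for the chunk objects above:

```python
def accumulate(deltas):
    """Join streamed content deltas into the complete response text."""
    parts = []
    for delta in deltas:
        if delta:  # skip empty/None deltas, as in the chunk check above
            parts.append(delta)
    return "".join(parts)

# With the real stream, each delta would be chunk.choices[0].delta.content.
print(accumulate(["Once ", "upon ", None, "a time"]))  # Once upon a time
```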
## Async

```python
import asyncio

from pellet import AsyncPellet

async def main():
    async with AsyncPellet(api_key="pk_live_...") as client:
        response = await client.chat.completions.create(
            messages=[{"role": "user", "content": "Hello!"}],
        )
        print(response.choices[0].message.content)

asyncio.run(main())
```
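To issue several prompts concurrently, fan the calls out with `asyncio.gather`. Sketched here against any async callable so it runs standalone; with the real SDK, `ask` would wrap `client.chat.completions.create` and return the message content:

```python
import asyncio

async def fan_out(ask, prompts):
    """Run ask(prompt) for every prompt concurrently; results keep prompt order."""
    return await asyncio.gather(*(ask(p) for p in prompts))

# Stub in place of a real API call:
async def ask(prompt):
    await asyncio.sleep(0)  # yield control, as a network call would
    return prompt.upper()

results = asyncio.run(fan_out(ask, ["hi", "there"]))
print(results)  # ['HI', 'THERE']
```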
## Routing Control

Pellet auto-routes by default. Use `pellet_config` to control routing behavior:

```python
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Write a sorting algorithm"}],
    pellet_config={
        "routing_mode": "quality",        # "auto", "fastest", "cheapest", "quality"
        "max_latency_ms": 5000,           # reject models slower than this
        "provider_preference": ["groq"],  # prefer specific providers
    },
)
```
Or bypass routing entirely by specifying a model:

```python
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello!"}],
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
)
```
Preview routing decisions without running inference:

```python
decision = client.routing.explain(
    messages=[{"role": "user", "content": "What is ML?"}],
)

print(f"Would route to: {decision.model}")
print(f"Task type: {decision.task_type}")
print(f"Confidence: {decision.confidence:.2f}")
print(f"Alternatives: {[a.model for a in decision.alternatives]}")
```
## Audio Transcription

```python
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        file=audio_file,
        model="whisper-large-v3-turbo",  # or "whisper-large-v3", "distil-whisper-large-v3-en"
    )

print(transcript.text)
print(f"Latency: {transcript.pellet_metadata.latency_ms}ms")
```
## Models & Health

```python
# List all available models
models = client.models.list()
for m in models.data:
    print(f"{m.id} ({m.tier}, {m.params}) — {m.providers}")

# Check platform health
health = client.health.check()
print(f"Status: {health.status}")  # "ok" or "degraded"
```
## Configuration

```python
client = Pellet(
    api_key="pk_live_...",   # or set the PELLET_API_KEY env var
    base_url="https://...",  # or set the PELLET_BASE_URL env var (default: https://getpellet.io/v1)
    timeout=30.0,            # read timeout in seconds (default: 60)
    max_retries=3,           # retries on 429/5xx (default: 2)
)

# Per-request timeout override
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Quick question"}],
    timeout=5.0,
)

# Explicit cleanup (or use the client as a context manager)
client.close()
```
## Error Handling

```python
import pellet

try:
    response = client.chat.completions.create(
        messages=[{"role": "user", "content": "Hello"}],
    )
except pellet.AuthenticationError:
    print("Invalid API key")
except pellet.RateLimitError:
    print("Too many requests — will auto-retry")
except pellet.InsufficientCreditsError:
    print("Add credits at https://getpellet.io/dashboard/wallet")
except pellet.UpstreamError:
    print("Upstream provider unavailable")
except pellet.APIStatusError as e:
    print(f"HTTP {e.status_code}: {e.message}")
except pellet.APIConnectionError:
    print("Network error")
```
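Beyond the client's built-in `max_retries`, you can wrap a call in your own retry loop, e.g. to ride out longer upstream outages. A generic sketch (not part of the SDK) that retries any callable on a given exception type with exponential backoff; with the real client you might pass `retriable=pellet.UpstreamError`:

```python
import time

def with_retries(fn, retriable, attempts=3, base_delay=1.0):
    """Call fn(), retrying on `retriable` exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retriable:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(base_delay * (2 ** attempt))

# Demo with a stub that fails twice, then succeeds:
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(with_retries(flaky, ConnectionError, base_delay=0.01))  # ok
```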
## Migrating from OpenAI SDK

```python
# Before (OpenAI SDK)
from openai import OpenAI

client = OpenAI(api_key="pk_live_...", base_url="https://getpellet.io/v1")
response = client.chat.completions.create(
    model="",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"pellet_config": {"routing_mode": "fastest"}},
)
metadata = response.model_extra.get("pellet_metadata", {})

# After (Pellet SDK)
from pellet import Pellet

client = Pellet(api_key="pk_live_...")
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello"}],
    pellet_config={"routing_mode": "fastest"},
)
metadata = response.pellet_metadata  # typed!
```
## License

MIT — see LICENSE for details.
## Download files
- Source distribution: `pellet_ai-0.1.0.tar.gz`
- Built distribution: `pellet_ai-0.1.0-py3-none-any.whl`
### File details: pellet_ai-0.1.0.tar.gz

- Download URL: pellet_ai-0.1.0.tar.gz
- Upload date:
- Size: 16.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `6db61a03e0f0b723c1103d6f3f763d167be3137df5b4d377c68a11f0a7559b33` |
| MD5 | `8175f555268aa3220be7630eef2a429e` |
| BLAKE2b-256 | `663b8a403e0bc4c0bb4dfbbedc52a6acfdc4f3ea79383f4045a738f86bdf5968` |
### File details: pellet_ai-0.1.0-py3-none-any.whl

- Download URL: pellet_ai-0.1.0-py3-none-any.whl
- Upload date:
- Size: 17.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `15221271a56050d398123fff950a2dd8836e0b27d1e2e2b200ccb53ce9489ba3` |
| MD5 | `3f5cd70fd39c77d704437e2b116174f1` |
| BLAKE2b-256 | `b35e4fe00390a3d2fec65e148d764b59936ab7b89a9edd143aede734e3137965` |