
TokenRouter Python SDK

Official Python SDK for TokenRouter - an intelligent LLM routing service that automatically selects the most cost-effective model for your AI requests.

Features

  • 🚀 OpenAI-Compatible Interface: Drop-in replacement for the OpenAI SDK
  • 🎯 Intelligent Routing: Automatically routes to the best model based on your prompt
  • 💰 Cost Optimization: Save up to 70% on LLM costs
  • 🔄 Multiple Providers: Unified interface for OpenAI, Anthropic, Mistral, and Together AI
  • ⚡ Streaming Support: Real-time streaming responses
  • 🔒 Built-in Authentication: Secure API key management
  • 🔁 Automatic Retries: Resilient error handling
  • 📊 Analytics: Track usage, costs, and performance
  • 🔀 Async Support: Both synchronous and asynchronous clients

Installation

pip install tokenrouter

For development/local testing:

cd TokenRouterSDK/python
pip install -e .

Quick Start

Basic Usage

from tokenrouter import Client

# Initialize client
client = Client(
    api_key="tr_your-api-key-here",
    base_url="https://api.tokenrouter.io"  # or http://localhost:8000 for local
)

# Simple completion
response = client.chat.create(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    model="auto",  # Let TokenRouter choose the best model
    temperature=0.7
)

print(response.content)
print(f"Cost: ${response.cost_usd:.6f}")
print(f"Model used: {response.model}")

Async Usage

import asyncio
from tokenrouter import AsyncClient

async def main():
    # Initialize async client
    client = AsyncClient(
        api_key="tr_your-api-key-here",
        base_url="https://api.tokenrouter.io"
    )
    
    # Async completion
    response = await client.chat.create(
        messages=[
            {"role": "user", "content": "Explain quantum computing"}
        ],
        model="auto",
        max_tokens=500
    )
    
    print(response.content)
    
    # Don't forget to close the client
    await client.close()

asyncio.run(main())

Authentication

Get your API key from TokenRouter:

  1. Sign up at tokenrouter.io
  2. Navigate to API Keys section
  3. Create a new API key
  4. Add to your environment:
export TOKENROUTER_API_KEY=tr_your-api-key-here

Then in your code:

import os
from tokenrouter import Client

client = Client(api_key=os.environ["TOKENROUTER_API_KEY"])

Core Features

Automatic Model Selection

Let TokenRouter choose the best model for your use case:

# TokenRouter analyzes your prompt and selects the optimal model
response = client.chat.create(
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
    model="auto"  # Automatic selection
)

Model Preferences

Specify preferred models while still benefiting from fallback:

response = client.chat.create(
    messages=[{"role": "user", "content": "Explain relativity"}],
    model_preferences=["gpt-4", "claude-3-opus"],  # Preference order
    max_tokens=1000
)

Streaming Responses

Stream responses for real-time output:

# Synchronous streaming
stream = client.chat.create(
    messages=[{"role": "user", "content": "Tell me a story"}],
    model="auto",
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

# Async streaming (async_client is an AsyncClient instance; see Async Usage above)
async def stream_response():
    stream = await async_client.chat.create(
        messages=[{"role": "user", "content": "Count to 10"}],
        stream=True
    )
    
    async for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")
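
If you need the full text after the stream finishes, you can accumulate the chunks as they arrive. A minimal sketch, assuming the same chunk shape as in the examples above:

# Collect streamed chunks into a single string
def stream_to_text(client, prompt: str) -> str:
    stream = client.chat.create(
        messages=[{"role": "user", "content": prompt}],
        model="auto",
        stream=True
    )
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="")  # live output
            parts.append(delta)   # keep for the final string
    return "".join(parts)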

Function Calling / Tools

Use OpenAI-compatible function calling:

response = client.chat.create(
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    model="auto",
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }],
    tool_choice="auto"
)

# Handle tool calls in response
if response.tool_calls:
    for tool_call in response.tool_calls:
        print(f"Function: {tool_call.function.name}")
        print(f"Arguments: {tool_call.function.arguments}")

Direct Completions

Simple completion interface:

# Quick completion
response = client.completions("Translate 'Hello' to French")
print(response.content)

# With parameters
response = client.completions(
    "Write a Python function for binary search",
    model="auto",
    temperature=0.2,
    max_tokens=500
)

Advanced Usage

Custom Headers

Add custom headers to requests:

client = Client(
    api_key="tr_your-api-key",
    headers={
        "X-Custom-Header": "value"
    }
)

Timeout Configuration

Set custom timeout values:

client = Client(
    api_key="tr_your-api-key",
    timeout=30.0  # 30 seconds
)

Error Handling

Comprehensive error handling:

from tokenrouter import (
    Client,
    AuthenticationError,
    RateLimitError,
    InvalidRequestError,
    APIConnectionError
)

try:
    response = client.chat.create(
        messages=[{"role": "user", "content": "Hello"}],
        model="auto"
    )
except AuthenticationError as e:
    print(f"Invalid API key: {e}")
except RateLimitError as e:
    print(f"Rate limit exceeded, retry after: {e.retry_after}")
except InvalidRequestError as e:
    print(f"Invalid request: {e}")
except APIConnectionError as e:
    print(f"Connection error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Analytics & Monitoring

Track usage and costs:

# Get usage analytics
analytics = client.get_analytics()
print(f"Total requests: {analytics.total_requests}")
print(f"Total cost: ${analytics.total_cost_usd:.4f}")
print(f"Average latency: {analytics.average_latency_ms}ms")

# List available models
models = client.list_models()
for model in models:
    print(f"{model.id}: {model.provider}")

# Get model costs
costs = client.get_costs()
for model, pricing in costs.items():
    print(f"{model}: ${pricing['cost_per_1k_input']}/1k input tokens")

API Methods

Chat Completions

client.chat.create(
    messages,           # List of message dicts (required)
    model="auto",       # Model name or 'auto'
    model_preferences=None,  # List of preferred models
    temperature=0.7,    # Sampling temperature (0-2)
    max_tokens=None,    # Maximum tokens to generate
    top_p=1.0,         # Nucleus sampling parameter
    frequency_penalty=0.0,  # Frequency penalty (-2 to 2)
    presence_penalty=0.0,   # Presence penalty (-2 to 2)
    stop=None,         # Stop sequences
    stream=False,      # Enable streaming
    tools=None,        # Function/tool definitions
    tool_choice=None,  # Tool selection strategy
    response_format=None,  # Response format spec
    seed=None,         # Seed for deterministic output
    user=None          # User identifier for tracking
)

Direct Completions

client.completions(
    prompt,            # Prompt string (required)
    model="auto",      # Model name or 'auto'
    **kwargs           # Additional parameters
)

Utility Methods

# List available models
models = client.list_models()

# Get model pricing
costs = client.get_costs()

# Get usage analytics
analytics = client.get_analytics()

# Health check
health = client.health_check()

Environment Variables

# Required
TOKENROUTER_API_KEY=tr_your-api-key

# Optional
TOKENROUTER_BASE_URL=https://api.tokenrouter.io
TOKENROUTER_TIMEOUT=30
TOKENROUTER_MAX_RETRIES=3
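
If you prefer explicit configuration over relying on automatic environment lookup, you can pass these values yourself. A minimal sketch using only os.environ:

import os
from tokenrouter import Client

client = Client(
    api_key=os.environ["TOKENROUTER_API_KEY"],  # required
    base_url=os.environ.get("TOKENROUTER_BASE_URL", "https://api.tokenrouter.io"),
    timeout=float(os.environ.get("TOKENROUTER_TIMEOUT", "30"))
)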

Migration from OpenAI

TokenRouter SDK is designed as a drop-in replacement for the OpenAI SDK:

# Before (OpenAI, legacy pre-1.0 SDK interface)
import openai
openai.api_key = "sk-..."
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}]
)

# After (TokenRouter)
from tokenrouter import Client
client = Client(api_key="tr_...")
response = client.chat.create(
    model="auto",  # or keep "gpt-3.5-turbo"
    messages=[{"role": "user", "content": "Hello"}]
)

Best Practices

  1. Use 'auto' model: Let TokenRouter optimize model selection
  2. Implement retries: Use exponential backoff for transient errors (see the sketch after this list)
  3. Cache responses: Store frequently requested completions
  4. Batch similar requests: Group related prompts when possible
  5. Monitor costs: Regularly check analytics to track spending
  6. Use async client: For concurrent requests, use AsyncClient
  7. Close connections: Always close async clients when done
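
For item 2, a minimal retry sketch using the SDK's documented exceptions; the backoff constants are illustrative, not library defaults:

import time
from tokenrouter import RateLimitError, APIConnectionError

def create_with_retry(client, messages, max_attempts=3, base_delay=1.0):
    for attempt in range(max_attempts):
        try:
            return client.chat.create(messages=messages, model="auto")
        except (RateLimitError, APIConnectionError) as e:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            # Prefer the server's retry hint when the error carries one
            delay = getattr(e, "retry_after", None) or base_delay * (2 ** attempt)
            time.sleep(delay)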

Examples

Example: Content Generation

def generate_blog_post(topic: str) -> str:
    response = client.chat.create(
        messages=[
            {
                "role": "system",
                "content": "You are a professional blog writer"
            },
            {
                "role": "user",
                "content": f"Write a 500-word blog post about {topic}"
            }
        ],
        model="auto",
        temperature=0.8,
        max_tokens=1000
    )
    return response.content

Example: Code Generation

def generate_code(description: str) -> str:
    response = client.chat.create(
        messages=[
            {
                "role": "system",
                "content": "You are an expert programmer. Return only code without explanations."
            },
            {
                "role": "user",
                "content": description
            }
        ],
        model="auto",
        temperature=0.2,  # Lower temperature for code
        max_tokens=2000
    )
    return response.content

Example: Batch Processing

async def process_batch(prompts: list) -> list:
    async with AsyncClient(api_key="tr_...") as client:
        tasks = []
        for prompt in prompts:
            task = client.chat.create(
                messages=[{"role": "user", "content": prompt}],
                model="auto"
            )
            tasks.append(task)
        
        responses = await asyncio.gather(*tasks)
        return [r.content for r in responses]

# Usage
prompts = ["Question 1", "Question 2", "Question 3"]
results = asyncio.run(process_batch(prompts))

Example: Conversation with Context

class Conversation:
    def __init__(self, client, system_prompt=None):
        self.client = client
        self.messages = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})
    
    def add_user_message(self, content):
        self.messages.append({"role": "user", "content": content})
    
    def get_response(self, **kwargs):
        response = self.client.chat.create(
            messages=self.messages,
            model="auto",
            **kwargs
        )
        self.messages.append({"role": "assistant", "content": response.content})
        return response.content

# Usage
conv = Conversation(client, "You are a helpful tutor")
conv.add_user_message("What is calculus?")
response1 = conv.get_response()
conv.add_user_message("Can you give an example?")
response2 = conv.get_response()  # Has context of previous messages

Error Reference

Exception            Description                  Resolution
AuthenticationError  Invalid or missing API key   Check API key validity
RateLimitError       Rate limit exceeded          Wait and retry with backoff
InvalidRequestError  Malformed request            Check request parameters
APIConnectionError   Network error                Check network connection
InternalServerError  Server error                 Retry with exponential backoff

Performance Tips

  1. Use connection pooling: The client reuses connections automatically
  2. Batch requests: Process multiple prompts concurrently with AsyncClient
  3. Set appropriate timeouts: Balance between reliability and speed
  4. Cache frequently used responses: Implement local caching for common queries (see the sketch after this list)
  5. Monitor token usage: Track token consumption to optimize prompts
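
For item 4, a minimal in-memory cache sketch; a production setup would likely add an expiry policy or an external store:

# Naive in-memory cache keyed on the prompt text
_cache: dict = {}

def cached_completion(client, prompt: str) -> str:
    if prompt not in _cache:
        response = client.chat.create(
            messages=[{"role": "user", "content": prompt}],
            model="auto"
        )
        _cache[prompt] = response.content
    return _cache[prompt]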

Support

License

MIT License - see LICENSE file for details.

