Official Python API for Cache AI - Semantic Caching for Large Language Models
Cache AI Python API
Features
- OpenAI Compatible: Drop-in replacement for the OpenAI Python SDK
- Semantic Caching: Automatic caching of semantically similar queries
- Multiple Baseline LLMs: Support for OpenAI, Anthropic, Google AI, and more
- Streaming Support: Full support for streaming responses
- Type-Safe: Complete type hints for better IDE support
- Easy Integration: Minimal code changes required
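To make the semantic caching feature more concrete, here is a minimal sketch of the general idea: a lookup that returns a stored answer when a new query is close enough to an earlier one. This is not Cache AI's implementation. The `ToySemanticCache` class, the bag-of-words `embed` function, and the 0.8 threshold are all assumptions chosen for illustration; a production system would typically use learned embeddings.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToySemanticCache:
    """Hypothetical cache: returns a stored answer for sufficiently similar queries."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # assumed similarity cutoff
        self.entries: List[Tuple[Counter, str]] = []

    def get(self, query: str) -> Optional[str]:
        """Return the answer of the most similar stored query, or None on a miss."""
        q = embed(query)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, query: str, answer: str) -> None:
        """Store a query and its answer for later lookups."""
        self.entries.append((embed(query), answer))


cache = ToySemanticCache()
cache.put("What is Python?", "Python is a programming language.")
hit = cache.get("what is python")    # same words, different casing and punctuation
miss = cache.get("Tell me a story")  # unrelated query, falls below the threshold
```

In this sketch a hit skips the call to the baseline model entirely, and a miss would be forwarded to it and then stored with `put`.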
Installation
pip install cacheai
Quick Start
Basic Usage
from cacheai import Client
# Initialize client
client = Client(
    api_key="your-cacheai-api-key",
    base_url="https://api.cacheai.tech/v1",  # Optional, this is the default
)
# Create a chat completion
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
)
print(response.choices[0].message.content)
With Environment Variables
import os
from cacheai import Client
# Set environment variables
os.environ["CACHEAI_API_KEY"] = "your-cacheai-api-key"
os.environ["CACHEAI_BASELINE_MODEL_PROVIDER"] = "openai"
os.environ["CACHEAI_BASELINE_MODEL_API_KEY"] = "your-baseline-model-api-key"
# Initialize client (reads from environment)
client = Client()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is Python?"}],
)
print(response.choices[0].message.content)
Streaming
from cacheai import Client
client = Client(api_key="your-cacheai-api-key")
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
# Print each text fragment as it arrives
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
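When the full text is needed after streaming (for logging or post-processing), the fragments can be joined as they arrive. The sketch below relies only on the `chunk.choices[0].delta.content` shape used in the streaming example; `collect_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks stand in for a real stream so the snippet runs without network access.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(stream: Iterable) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in stream:
        if not chunk.choices:  # defensively skip chunks that carry no choices
            continue
        text = chunk.choices[0].delta.content
        if text:  # content may be None or empty on some chunks
            parts.append(text)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in for a streamed chunk, used only for this demo."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_stream = [fake_chunk("Once "), fake_chunk(None), fake_chunk("upon a time")]
story = collect_stream(demo_stream)
print(story)
```

With a real client, the same helper would presumably be called as `collect_stream(client.chat.completions.create(..., stream=True))`.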
Configuration
Baseline LLM Configuration
Cache AI acts as a caching layer in front of your preferred LLM provider. Configure the baseline model:
from cacheai import Client
client = Client(
    api_key="your-cacheai-api-key",
    baseline_model_provider="openai",          # "openai", "anthropic", "google", etc.
    baseline_model_api_key="your-openai-key",  # Baseline LLM API key
)
# Or use environment variables:
# CACHEAI_BASELINE_MODEL_PROVIDER=openai
# CACHEAI_BASELINE_MODEL_API_KEY=sk-...
Cache Control
from cacheai import Client
# Disable caching (for debugging/testing)
client = Client(
    api_key="your-cacheai-api-key",
    enable_cache=False,
)
# Or via environment variable:
# CACHEAI_ENABLE_CACHE=false
Advanced Usage
Context Manager
from cacheai import Client
with Client(api_key="your-api-key") as client:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)
# Connection is automatically closed
Custom Timeout and Retries
from cacheai import Client
client = Client(
    api_key="your-api-key",
    timeout=30.0,   # Request timeout in seconds
    max_retries=3,  # Maximum retry attempts
)
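This README does not say how retries are scheduled, so the following is only a sketch of a common pattern that a `max_retries` setting often implies: retrying a failed call with exponential backoff. The `call_with_retries` helper, its `base_delay` parameter, and the doubling schedule are assumptions for illustration, not the library's documented behavior.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying up to max_retries times with exponential backoff."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:  # out of retries: surface the last error
                raise
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise RuntimeError("unreachable")


# Demo: a function that fails twice, then succeeds.
attempts = {"count": 0}
delays: List[float] = []


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


# Record the delays instead of actually sleeping, so the demo runs instantly.
result = call_with_retries(flaky, max_retries=3, base_delay=0.5, sleep=delays.append)
```

Injecting `sleep` keeps the helper easy to test; real code would leave the default `time.sleep` in place.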
Error Handling
from cacheai import Client, CacheAIError, AuthenticationError, RateLimitError
client = Client(api_key="your-api-key")
try:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except AuthenticationError as e:
    print(f"Invalid API key: {e}")
except RateLimitError as e:
    print(f"Rate limit exceeded: {e}")
except CacheAIError as e:
    print(f"API error: {e}")
API Reference
Client
Client(
    api_key: Optional[str] = None,                  # Cache AI API key
    base_url: Optional[str] = None,                 # API base URL
    timeout: float = 60.0,                          # Request timeout in seconds
    max_retries: int = 2,                           # Max retry attempts
    enable_cache: bool = True,                      # Enable semantic caching
    baseline_model_provider: Optional[str] = None,  # Baseline LLM provider
    baseline_model_api_key: Optional[str] = None,   # Baseline LLM API key
    baseline_model_base_url: Optional[str] = None   # Custom baseline model URL
)
Chat Completions
client.chat.completions.create(
    model: str,                                    # Model ID
    messages: List[Dict[str, str]],                # Conversation messages
    temperature: Optional[float] = None,           # Sampling temperature (0-2)
    max_tokens: Optional[int] = None,              # Max tokens to generate
    top_p: Optional[float] = None,                 # Nucleus sampling
    frequency_penalty: Optional[float] = None,     # Frequency penalty
    presence_penalty: Optional[float] = None,      # Presence penalty
    stop: Optional[Union[str, List[str]]] = None,  # Stop sequences
    stream: bool = False                           # Enable streaming
) -> ChatCompletion
Environment Variables
| Variable | Description | Default |
|---|---|---|
| CACHEAI_API_KEY | Cache AI API key | (required) |
| CACHEAI_BASE_URL | API base URL | https://api.cacheai.tech/v1 |
| CACHEAI_ENABLE_CACHE | Enable semantic caching | true |
| CACHEAI_BASELINE_MODEL_PROVIDER | Baseline model provider | (optional) |
| CACHEAI_BASELINE_MODEL_API_KEY | Baseline model API key | (optional) |
| CACHEAI_BASELINE_MODEL_BASE_URL | Custom baseline model URL | (optional) |
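As a rough illustration of how these variables map onto the `Client` constructor arguments listed in the API reference, the sketch below reads them into a dictionary of keyword arguments. The `settings_from_env` helper is hypothetical, and the boolean parsing rule (treating `false`, `0`, and `no` as disabled) is an assumption; the README only shows `true` and `false` as values.

```python
import os
from typing import Any, Dict, Mapping, Optional

DEFAULT_BASE_URL = "https://api.cacheai.tech/v1"  # default listed in the table above


def settings_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build Client-style keyword arguments from CACHEAI_* environment variables."""
    env = os.environ if env is None else env
    api_key = env.get("CACHEAI_API_KEY")
    if not api_key:
        raise KeyError("CACHEAI_API_KEY is required")
    # Assumed parsing rule: only these spellings disable the cache.
    cache_flag = env.get("CACHEAI_ENABLE_CACHE", "true").strip().lower()
    return {
        "api_key": api_key,
        "base_url": env.get("CACHEAI_BASE_URL", DEFAULT_BASE_URL),
        "enable_cache": cache_flag not in {"false", "0", "no"},
        "baseline_model_provider": env.get("CACHEAI_BASELINE_MODEL_PROVIDER"),
        "baseline_model_api_key": env.get("CACHEAI_BASELINE_MODEL_API_KEY"),
        "baseline_model_base_url": env.get("CACHEAI_BASELINE_MODEL_BASE_URL"),
    }


# Demo with an explicit mapping instead of the real process environment.
settings = settings_from_env(
    {"CACHEAI_API_KEY": "your-cacheai-api-key", "CACHEAI_ENABLE_CACHE": "false"}
)
```

Since the keys match the documented constructor parameters, `Client(**settings)` should be equivalent to passing them by hand, though `Client()` already reads the environment on its own.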
Migration from OpenAI
Cache AI is designed to be a drop-in replacement for OpenAI:
# Before (OpenAI)
from openai import OpenAI
client = OpenAI(api_key="sk-...")
response = client.chat.completions.create(...)
# After (Cache AI)
from cacheai import Client
client = Client(api_key="ca-...", baseline_model_provider="openai", baseline_model_api_key="sk-...")
response = client.chat.completions.create(...)
Examples
See the examples directory for more usage examples.
Development
Install Development Dependencies
pip install -e ".[dev]"
Run Tests
pytest
Type Checking
mypy cacheai
Code Formatting
black cacheai
ruff check cacheai
Support
- Documentation: https://docs.cacheai.tech
- Issues: GitHub Issues
License
MIT License - see LICENSE file for details.