AI Proxy Core
Minimal, reusable AI service handlers for Gemini and other LLMs
A unified Python package providing a single interface for AI completions across multiple providers (OpenAI, Gemini, Ollama), plus image generation with DALL-E 2/3 and GPT-Image-1 (v0.4.0+). Features intelligent model management, automatic provider routing, and zero-config setup.
💡 Why not LangChain? Read our philosophy and architectural rationale for choosing simplicity over complexity.
🎯 What's Next? See our wrapper layer roadmap for planned features and what belongs in a clean LLM wrapper.
Installation
Basic (Google Gemini only):
pip install ai-proxy-core
With specific providers (optional dependencies):
pip install ai-proxy-core[openai] # OpenAI support (includes image generation)
pip install ai-proxy-core[anthropic] # Anthropic support (coming soon)
pip install ai-proxy-core[telemetry] # OpenTelemetry support
pip install ai-proxy-core[all] # Everything
Or install from source:
git clone https://github.com/ebowwa/ai-proxy-core.git
cd ai-proxy-core
pip install -e .
# With all extras: pip install -e ".[all]"
Quick Start
🤖 AI Integration Help:
- Using the library? Copy our user agent prompt to any LLM for instant integration guidance and code examples
- Developing the library? Use our developer agent prompt for architecture details and contribution help
Unified Interface (Recommended)
from ai_proxy_core import CompletionClient
# Single client for all providers
client = CompletionClient()
# Works with any model - auto-detects provider
response = await client.create_completion(
messages=[{"role": "user", "content": "Hello!"}],
model="gpt-4" # Auto-routes to OpenAI
)
response = await client.create_completion(
messages=[{"role": "user", "content": "Hello!"}],
model="gemini-1.5-flash" # Auto-routes to Gemini
)
response = await client.create_completion(
messages=[{"role": "user", "content": "Hello!"}],
model="llama2" # Auto-routes to Ollama
)
# All return the same standardized format
print(response["choices"][0]["message"]["content"])
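As a quick end-to-end sketch of that point (assuming the relevant provider API keys are configured in the environment and, for the local model, that an Ollama server is running with llama3.2 pulled):
# Same call, same response shape, different providers
for model in ["gpt-4", "gemini-1.5-flash", "llama3.2"]:
    response = await client.create_completion(
        messages=[{"role": "user", "content": "Say hi in five words."}],
        model=model,
    )
    # Every provider returns the same OpenAI-style dict
    print(model, "->", response["choices"][0]["message"]["content"])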
Intelligent Model Selection
# Find the best model for your needs
best_model = await client.find_best_model({
"multimodal": True,
"min_context_limit": 32000,
"local_preferred": False
})
response = await client.create_completion(
messages=[{"role": "user", "content": "Describe this image"}],
model=best_model["id"]
)
Model Discovery
# List all available models across providers
models = await client.list_models()
for model in models:
print(f"{model['id']} ({model['provider']}) - {model['context_limit']:,} tokens")
# List models from specific provider
openai_models = await client.list_models(provider="openai")
Ollama Integration
Prerequisites
# Install Ollama from https://ollama.ai
# Start Ollama service
ollama serve
# Pull a model
ollama pull llama3.2
Using Ollama with CompletionClient
from ai_proxy_core import CompletionClient, ModelManager
# Option 1: Auto-detection (Ollama will be detected if running)
client = CompletionClient()
# Option 2: With custom ModelManager
manager = ModelManager()
client = CompletionClient(model_manager=manager)
# List Ollama models
models = await client.list_models(provider="ollama")
print(f"Available Ollama models: {[m['id'] for m in models]}")
# Create completion
response = await client.create_completion(
messages=[{"role": "user", "content": "Hello!"}],
model="llama3.2",
provider="ollama", # Optional, auto-detected from model name
temperature=0.7
)
Direct Ollama Usage
from ai_proxy_core import OllamaCompletions
ollama = OllamaCompletions()
# List available models
models = ollama.list_models()
print(f"Available models: {models}")
# Create completion
response = await ollama.create_completion(
messages=[{"role": "user", "content": "Explain quantum computing"}],
model="llama3.2",
temperature=0.7,
max_tokens=500
)
See examples/ollama_complete_guide.py for comprehensive examples including error handling, streaming, and advanced features.
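As a minimal error-handling sketch (the exact exception types raised by OllamaCompletions are not documented here, so this catches broadly; see the example file above for the full treatment):
from ai_proxy_core import OllamaCompletions

ollama = OllamaCompletions()
try:
    response = await ollama.create_completion(
        messages=[{"role": "user", "content": "Hello!"}],
        model="llama3.2",
    )
    print(response["choices"][0]["message"]["content"])
except Exception as exc:
    # Typical failure modes: the Ollama service is not running on
    # localhost:11434, or the requested model has not been pulled yet.
    print(f"Ollama request failed: {exc}")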
Image Generation (v0.4.1+)
Generate Images with Explicit Model Selection
from ai_proxy_core import OpenAIImageProvider, ImageModel, ImageSize, ImageQuality, ImageStyle
# Initialize the provider
provider = OpenAIImageProvider(api_key="your-openai-api-key")
# Generate with DALL-E 3 (explicitly specify model)
response = provider.generate(
model=ImageModel.DALLE_3, # Required: specify which model
prompt="A modern app icon with turquoise background and camera symbol",
size=ImageSize.SQUARE, # 1024x1024
quality=ImageQuality.HD, # HD or STANDARD
style=ImageStyle.VIVID # VIVID or NATURAL (DALL-E 3 only)
)
# Generate with GPT-Image-1 (better instruction following)
response = provider.generate(
model=ImageModel.GPT_IMAGE_1, # Explicitly use GPT-Image-1
prompt="Create a detailed app icon following these specifications...",
size="4096x4096", # Supports up to 4K resolution
quality="high" # low, medium, high, or auto
)
# Access the generated image
with open("icon.png", "wb") as f:
f.write(response["images"]["data"])
# Token usage for GPT-Image-1
if response.get("usage"):
print(f"Tokens used: {response['usage']['total_tokens']}")
Available Models
# List available models and their capabilities
for model in provider.list_models():
print(f"Model: {model['id']}")
print(f" Sizes: {model['capabilities']['sizes']}")
print(f" Features: {model['capabilities']['features']}")
# Models:
# - dall-e-2: Multiple images, editing, 256x256 to 1024x1024
# - dall-e-3: Styles, HD quality, up to 1792x1024
# - gpt-image-1: Token pricing, 4K resolution, better instructions
Edit Images (DALL-E 2 Only)
# Image editing is only available with DALL-E 2
response = provider.edit(
image=original_image_bytes,
prompt="Add a sunset background",
model=ImageModel.DALLE_2, # Only DALL-E 2 supports editing
mask=mask_bytes, # Optional mask for inpainting
n=2 # Generate 2 variations
)
Model-Specific Features
# DALL-E 2: Generate multiple variations
response = provider.generate(
model=ImageModel.DALLE_2,
prompt="App icon variations",
n=5, # Generate 5 variations
size="512x512"
)
# DALL-E 3: Use styles for different aesthetics
response = provider.generate(
model=ImageModel.DALLE_3,
prompt="Photorealistic app icon",
style=ImageStyle.NATURAL, # or VIVID
quality=ImageQuality.HD
)
# GPT-Image-1: High resolution with compression
response = provider.generate(
model=ImageModel.GPT_IMAGE_1,
prompt="Ultra-detailed 4K app icon",
size="4096x4096",
quality="high",
output_compression=95 # Optional compression
)
Advanced Usage
Provider-Specific Completions
If you need provider-specific features, you can still use the individual clients:
from ai_proxy_core import GoogleCompletions, OpenAICompletions, OllamaCompletions
# Google Gemini with safety settings
google = GoogleCompletions(api_key="your-gemini-api-key")
response = await google.create_completion(
messages=[{"role": "user", "content": "Hello!"}],
model="gemini-1.5-flash",
safety_settings=[{"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"}]
)
# OpenAI with tool calling
openai = OpenAICompletions(api_key="your-openai-key")
response = await openai.create_completion(
messages=[{"role": "user", "content": "What's the weather?"}],
model="gpt-4",
tools=[{"type": "function", "function": {"name": "get_weather"}}]
)
# Ollama for local models
ollama = OllamaCompletions() # Auto-detects localhost:11434
response = await ollama.create_completion(
messages=[{"role": "user", "content": "Hello!"}],
model="llama3.2",
temperature=0.7
)
OpenAI-Compatible Endpoints
# Works with any OpenAI-compatible API (Groq, Anyscale, Together, etc.)
groq = OpenAICompletions(
api_key="your-groq-key",
base_url="https://api.groq.com/openai/v1"
)
response = await groq.create_completion(
messages=[{"role": "user", "content": "Hello!"}],
model="mixtral-8x7b-32768"
)
Gemini Live Session
from ai_proxy_core import GeminiLiveSession
# Example 1: Basic session (no system prompt)
session = GeminiLiveSession(api_key="your-gemini-api-key")
# Example 2: Session with system prompt (simple string format)
session = GeminiLiveSession(
api_key="your-gemini-api-key",
system_instruction="You are a helpful voice assistant. Be concise and friendly."
)
# Example 3: Session with built-in tools enabled
session = GeminiLiveSession(
api_key="your-gemini-api-key",
enable_code_execution=True, # Enable Python code execution
enable_google_search=True, # Enable web search
system_instruction="You are a helpful assistant with access to code execution and web search."
)
# Example 4: Session with custom function declarations
from google.genai import types
def get_weather(location: str) -> dict:
# Your custom function implementation
return {"location": location, "temp": 72, "condition": "sunny"}
weather_function = types.FunctionDeclaration(
name="get_weather",
description="Get current weather for a location",
parameters=types.Schema(
type="OBJECT",
properties={
"location": types.Schema(type="STRING", description="City name")
},
required=["location"]
)
)
session = GeminiLiveSession(
api_key="your-gemini-api-key",
custom_tools=[types.Tool(function_declarations=[weather_function])],
system_instruction="You can help with weather information."
)
# Set up callbacks
session.on_audio = lambda data: print(f"Received audio: {len(data)} bytes")
session.on_text = lambda text: print(f"Received text: {text}")
session.on_function_call = lambda call: handle_function_call(call)
async def handle_function_call(call):
if call["name"] == "get_weather":
result = get_weather(**call["args"])
await session.send_function_result(result)
# Start session
await session.start()
# Send audio/text
await session.send_audio(audio_data)
await session.send_text("What's the weather in Boston?")
# Stop when done
await session.stop()
Integration with FastAPI
Chat Completions API
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from ai_proxy_core import CompletionClient
app = FastAPI()
client = CompletionClient()
class CompletionRequest(BaseModel):
messages: list
model: str = "gemini-1.5-flash"
temperature: float = 0.7
@app.post("/api/chat/completions")
async def create_completion(request: CompletionRequest):
try:
response = await client.create_completion(
messages=request.messages,
model=request.model,
temperature=request.temperature
)
return response
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
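Once the server is running, the endpoint can be exercised with any HTTP client. A sketch using httpx (the host and port are assumptions for a default uvicorn setup):
import httpx

payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "model": "gemini-1.5-flash",
    "temperature": 0.7,
}
# Assumes the FastAPI app above is served at localhost:8000
resp = httpx.post("http://localhost:8000/api/chat/completions", json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])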
Audio in Chat Completions (Gemini)
The completions endpoint supports audio inputs for Google/Gemini via multimodal content:
- data URL audio_url entries (e.g., data:audio/mp3;base64,...)
- OpenAI-style input_audio objects with base64 data and format
Example request messages:
```json
[
{
"role": "user",
"content": [
{"type": "text", "text": "Transcribe and summarize this audio:"},
{
"type": "audio_url",
"audio_url": {
"url": "data:audio/mp3;base64,AAA..."
}
}
]
}
]
```
OpenAI-style input_audio:
```json
[
{
"role": "user",
"content": [
{"type": "text", "text": "Please analyze this clip."},
{
"type": "input_audio",
"input_audio": {
"data": "AAA...",
"format": "mp3"
}
}
]
}
]
```
Supported formats include MP3, WAV, AAC, OGG, FLAC. For WebSocket (Live), only PCM is supported at this time; non-PCM will be rejected with a clear message.
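A sketch of sending the same audio content through the unified client directly (this assumes CompletionClient accepts the multimodal message shapes shown above and routes them to Gemini):
```python
import base64
from ai_proxy_core import CompletionClient

client = CompletionClient()

with open("clip.mp3", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

response = await client.create_completion(
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe and summarize this audio:"},
            {"type": "input_audio", "input_audio": {"data": audio_b64, "format": "mp3"}},
        ],
    }],
    model="gemini-1.5-flash",
)
print(response["choices"][0]["message"]["content"])
```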
WebSocket for Gemini Live (Fixed in v0.3.3)
```python
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from google import genai
from google.genai import types
import asyncio
app = FastAPI()
@app.websocket("/api/gemini/ws")
async def gemini_websocket(websocket: WebSocket):
await websocket.accept()
# Create Gemini client
client = genai.Client(
http_options={"api_version": "v1beta"},
api_key="your-gemini-api-key"
)
# Configure for text (audio requires PCM format)
config = types.LiveConnectConfig(
response_modalities=["TEXT"],
generation_config=types.GenerationConfig(
temperature=0.7,
max_output_tokens=1000
)
)
# Connect using async context manager
async with client.aio.live.connect(
model="gemini-2.0-flash-exp",
config=config
) as session:
# Handle bidirectional communication
async def receive_from_client():
async for message in websocket.iter_json():
if message["type"] in ["text", "message"]:
text = message.get("data", {}).get("text", "")
if text:
await session.send(input=text, end_of_turn=True)
async def receive_from_gemini():
while True:
turn = session.receive()
async for response in turn:
if hasattr(response, 'server_content'):
content = response.server_content
if hasattr(content, 'model_turn'):
for part in content.model_turn.parts:
if hasattr(part, 'text') and part.text:
await websocket.send_json({
"type": "response",
"text": part.text
})
# Run both tasks concurrently
task1 = asyncio.create_task(receive_from_client())
task2 = asyncio.create_task(receive_from_gemini())
# Wait for either to complete
done, pending = await asyncio.wait(
[task1, task2],
return_when=asyncio.FIRST_COMPLETED
)
# Clean up
for task in pending:
task.cancel()
```
Try the HTML Demo:
# Start the FastAPI server
uv run main.py
# Open the HTML demo in your browser
open examples/gemini_live_demo.html
The demo provides a full-featured chat interface with WebSocket connection to Gemini Live.
Note on audio (WebSocket):
- Audio input is supported for PCM only (16-bit PCM). Send base64-encoded PCM and it will be forwarded to Gemini Live.
- Non-PCM inputs (e.g., WebM/Opus) are rejected with: "Audio requires PCM format - WebM conversion not yet implemented".
- Example client payloads:
- Raw base64 string: {"type": "audio", "data": "<base64_pcm>"}
- Object form: {"type": "audio", "data": {"base64": "<base64_pcm>", "mime_type": "audio/pcm"}}
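A minimal client-side sketch using the third-party websockets package (the ws://localhost:8000 URL assumes the FastAPI example above is running locally; note that the text-only WebSocket example earlier does not handle audio, so this assumes a server that forwards the audio payloads described here):
import asyncio
import base64
import json
import websockets

async def send_pcm(path: str):
    async with websockets.connect("ws://localhost:8000/api/gemini/ws") as ws:
        with open(path, "rb") as f:
            pcm_b64 = base64.b64encode(f.read()).decode()
        # Object form with an explicit PCM mime type, as described above
        await ws.send(json.dumps({
            "type": "audio",
            "data": {"base64": pcm_b64, "mime_type": "audio/pcm"},
        }))
        # Print whatever the server sends back (shape depends on your handler)
        print(json.loads(await ws.recv()))

asyncio.run(send_pcm("speech_16bit.pcm"))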
Features
🚀 Unified Interface
- Single client for all providers - No more provider-specific code
- Automatic provider routing - Detects provider from model name
- Intelligent model selection - Find best model based on requirements
- Zero-config setup - Auto-detects available providers from environment
🧠 Model Management
- Cross-provider model discovery - List models from OpenAI, Gemini, Ollama
- Rich model metadata - Context limits, capabilities, multimodal support
- Automatic model provisioning - Downloads Ollama models as needed
- Model compatibility checking - Ensures models support requested features
🔧 Developer Experience
- No framework dependencies - Use with FastAPI, Flask, or any Python app
- Async/await support - Modern async Python
- Type hints - Full type annotations
- Easy testing - Mock the unified client in your tests
- Backward compatible - All existing provider-specific code continues to work
🎯 Advanced Features
- WebSocket support - Real-time audio/text streaming with Gemini Live
- Built-in tools - Code execution and Google Search with simple flags
- Custom functions - Add your own function declarations
- Optional telemetry - OpenTelemetry integration for production monitoring
- Provider-specific optimizations - Access advanced features when needed
Telemetry
Basic observability with OpenTelemetry (optional):
# Install with: pip install "ai-proxy-core[telemetry]"
# Enable telemetry via environment variables
export OTEL_ENABLED=true
export OTEL_EXPORTER_TYPE=console # or "otlp" for production
export OTEL_ENDPOINT=localhost:4317 # for OTLP exporter
# Automatic telemetry for:
# - Request counts by model/status
# - Request latency tracking
# - Session duration for WebSockets
# - Error tracking with types
The telemetry is completely optional and has zero overhead when disabled.
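The same switches can be set from Python instead of the shell; a sketch using only the environment variables listed above (assuming they are read when the client initializes telemetry):
import os

# Enable console telemetry for local debugging
os.environ["OTEL_ENABLED"] = "true"
os.environ["OTEL_EXPORTER_TYPE"] = "console"

from ai_proxy_core import CompletionClient

client = CompletionClient()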
Project Structure
📝 Note: Full documentation of the project structure is being tracked in Issue #12
This project serves dual purposes:
- Python Library (/ai_proxy_core): Installable via pip for use in Python applications
- Web Service (/api): FastAPI endpoints for REST API access
Development
Releasing New Versions
We provide an automated release script that handles version bumping, building, and publishing:
# Make the script executable (first time only)
chmod +x release.sh
# Release a new version
./release.sh 0.1.9
The script will:
- Show current version and validate the new version format
- Prompt for a release description (for CHANGELOG)
- Update version in all necessary files (pyproject.toml, setup.py, __init__.py)
- Update CHANGELOG.md with your description
- Build the package
- Upload to PyPI
- Commit changes and create a git tag
- Push to GitHub with the new tag
Manual Build Process
If you prefer to build manually:
uv run python setup.py sdist bdist_wheel
twine upload dist/*
License
MIT