
AI Proxy Core

Minimal, reusable AI service handlers for Gemini and other LLMs

A unified Python package providing a single interface for AI completions across multiple providers (OpenAI, Gemini, Ollama). Features intelligent model management, automatic provider routing, and zero-config setup.

💡 Why not LangChain? Read our philosophy and architectural rationale for choosing simplicity over complexity.

🎯 What's Next? See our wrapper layer roadmap for planned features and what belongs in a clean LLM wrapper.

Installation

Basic (Google Gemini only):

pip install ai-proxy-core

With specific providers (optional dependencies):

pip install ai-proxy-core[openai]     # OpenAI support
pip install ai-proxy-core[anthropic]  # Anthropic support (coming soon)
pip install ai-proxy-core[telemetry]  # OpenTelemetry support
pip install ai-proxy-core[all]        # Everything

Or install from source:

git clone https://github.com/ebowwa/ai-proxy-core.git
cd ai-proxy-core
pip install -e .
# With all extras: pip install -e ".[all]"

Quick Start

🤖 AI Integration Help: Copy our expert agent prompt to any LLM (ChatGPT, Claude, etc.) for instant integration guidance and code examples tailored to your project.

Unified Interface (Recommended)

from ai_proxy_core import CompletionClient

# Single client for all providers
client = CompletionClient()

# Works with any model - auto-detects provider
response = await client.create_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    model="gpt-4"  # Auto-routes to OpenAI
)

response = await client.create_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    model="gemini-1.5-flash"  # Auto-routes to Gemini
)

response = await client.create_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    model="llama2"  # Auto-routes to Ollama
)

# All return the same standardized format
print(response["choices"][0]["message"]["content"])

Intelligent Model Selection

# Find the best model for your needs
best_model = await client.find_best_model({
    "multimodal": True,
    "min_context_limit": 32000,
    "local_preferred": False
})

response = await client.create_completion(
    messages=[{"role": "user", "content": "Describe this image"}],
    model=best_model["id"]
)

Model Discovery

# List all available models across providers
models = await client.list_models()
for model in models:
    print(f"{model['id']} ({model['provider']}) - {model['context_limit']:,} tokens")

# List models from specific provider
openai_models = await client.list_models(provider="openai")

Ollama Integration

Prerequisites

# Install Ollama from https://ollama.ai
# Start Ollama service
ollama serve

# Pull a model
ollama pull llama3.2

Using Ollama with CompletionClient

from ai_proxy_core import CompletionClient, ModelManager

# Option 1: Auto-detection (Ollama will be detected if running)
client = CompletionClient()

# Option 2: With custom ModelManager
manager = ModelManager()
client = CompletionClient(model_manager=manager)

# List Ollama models
models = await client.list_models(provider="ollama")
print(f"Available Ollama models: {[m['id'] for m in models]}")

# Create completion
response = await client.create_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    model="llama3.2",
    provider="ollama",  # Optional, auto-detected from model name
    temperature=0.7
)

Direct Ollama Usage

from ai_proxy_core import OllamaCompletions

ollama = OllamaCompletions()

# List available models
models = ollama.list_models()
print(f"Available models: {models}")

# Create completion
response = await ollama.create_completion(
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    model="llama3.2",
    temperature=0.7,
    max_tokens=500
)

See examples/ollama_complete_guide.py for comprehensive examples including error handling, streaming, and advanced features.
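
The guide also covers error handling; a minimal sketch of the idea is below (the broad except clause is deliberate, since this README does not document which exception types OllamaCompletions raises):

from ai_proxy_core import OllamaCompletions

ollama = OllamaCompletions()

async def safe_completion(prompt: str):
    try:
        return await ollama.create_completion(
            messages=[{"role": "user", "content": prompt}],
            model="llama3.2"
        )
    except Exception as e:
        # Could be Ollama not running, the model not pulled, or a network error
        print(f"Completion failed: {e}")
        return None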

Advanced Usage

Provider-Specific Completions

If you need provider-specific features, you can still use the individual clients:

from ai_proxy_core import GoogleCompletions, OpenAICompletions, OllamaCompletions

# Google Gemini with safety settings
google = GoogleCompletions(api_key="your-gemini-api-key")
response = await google.create_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    model="gemini-1.5-flash",
    safety_settings=[{"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"}]
)

# OpenAI with tool calling
openai = OpenAICompletions(api_key="your-openai-key")
response = await openai.create_completion(
    messages=[{"role": "user", "content": "What's the weather?"}],
    model="gpt-4",
    tools=[{"type": "function", "function": {"name": "get_weather"}}]
)

# Ollama for local models
ollama = OllamaCompletions()  # Auto-detects localhost:11434
response = await ollama.create_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    model="llama3.2",
    temperature=0.7
)

OpenAI-Compatible Endpoints

# Works with any OpenAI-compatible API (Groq, Anyscale, Together, etc.)
groq = OpenAICompletions(
    api_key="your-groq-key",
    base_url="https://api.groq.com/openai/v1"
)

response = await groq.create_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    model="mixtral-8x7b-32768"
)

Gemini Live Session

from ai_proxy_core import GeminiLiveSession

# Example 1: Basic session (no system prompt)
session = GeminiLiveSession(api_key="your-gemini-api-key")

# Example 2: Session with system prompt (simple string format)
session = GeminiLiveSession(
    api_key="your-gemini-api-key",
    system_instruction="You are a helpful voice assistant. Be concise and friendly."
)

# Example 3: Session with built-in tools enabled
session = GeminiLiveSession(
    api_key="your-gemini-api-key",
    enable_code_execution=True,      # Enable Python code execution
    enable_google_search=True,       # Enable web search
    system_instruction="You are a helpful assistant with access to code execution and web search."
)

# Example 4: Session with custom function declarations
from google.genai import types

def get_weather(location: str) -> dict:
    # Your custom function implementation
    return {"location": location, "temp": 72, "condition": "sunny"}

weather_function = types.FunctionDeclaration(
    name="get_weather",
    description="Get current weather for a location",
    parameters=types.Schema(
        type="OBJECT",
        properties={
            "location": types.Schema(type="STRING", description="City name")
        },
        required=["location"]
    )
)

session = GeminiLiveSession(
    api_key="your-gemini-api-key",
    custom_tools=[types.Tool(function_declarations=[weather_function])],
    system_instruction="You can help with weather information."
)

# Set up callbacks
async def handle_function_call(call):
    if call["name"] == "get_weather":
        result = get_weather(**call["args"])
        await session.send_function_result(result)

session.on_audio = lambda data: print(f"Received audio: {len(data)} bytes")
session.on_text = lambda text: print(f"Received text: {text}")
session.on_function_call = handle_function_call  # assign the async handler directly

# Start session
await session.start()

# Send audio/text
await session.send_audio(audio_data)
await session.send_text("What's the weather in Boston?")

# Stop when done
await session.stop()

Integration with FastAPI

Chat Completions API

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from ai_proxy_core import CompletionClient

app = FastAPI()
client = CompletionClient()

class CompletionRequest(BaseModel):
    messages: list
    model: str = "gemini-1.5-flash"
    temperature: float = 0.7

@app.post("/api/chat/completions")
async def create_completion(request: CompletionRequest):
    try:
        response = await client.create_completion(
            messages=request.messages,
            model=request.model,
            temperature=request.temperature
        )
        return response
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

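Once the server is running, the endpoint can be called from any HTTP client. A quick sketch using httpx (the localhost:8000 address assumes the default uvicorn port; the payload fields mirror CompletionRequest above):

import asyncio
import httpx

async def main():
    async with httpx.AsyncClient() as http:
        resp = await http.post(
            "http://localhost:8000/api/chat/completions",
            json={
                "messages": [{"role": "user", "content": "Hello!"}],
                "model": "gemini-1.5-flash",
                "temperature": 0.7
            }
        )
        # Same standardized response format the library returns
        print(resp.json()["choices"][0]["message"]["content"])

asyncio.run(main())
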
WebSocket for Gemini Live (Fixed in v0.3.3)

from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from google import genai
from google.genai import types
import asyncio

app = FastAPI()

@app.websocket("/api/gemini/ws")
async def gemini_websocket(websocket: WebSocket):
    await websocket.accept()
    
    # Create Gemini client
    client = genai.Client(
        http_options={"api_version": "v1beta"},
        api_key="your-gemini-api-key"
    )
    
    # Configure for text (audio requires PCM format)
    config = types.LiveConnectConfig(
        response_modalities=["TEXT"],
        generation_config=types.GenerationConfig(
            temperature=0.7,
            max_output_tokens=1000
        )
    )
    
    # Connect using async context manager
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp",
        config=config
    ) as session:
        
        # Handle bidirectional communication
        async def receive_from_client():
            async for message in websocket.iter_json():
                if message["type"] in ["text", "message"]:
                    text = message.get("data", {}).get("text", "")
                    if text:
                        await session.send(input=text, end_of_turn=True)
        
        async def receive_from_gemini():
            while True:
                turn = session.receive()
                async for response in turn:
                    if hasattr(response, 'server_content'):
                        content = response.server_content
                        if hasattr(content, 'model_turn'):
                            for part in content.model_turn.parts:
                                if hasattr(part, 'text') and part.text:
                                    await websocket.send_json({
                                        "type": "response",
                                        "text": part.text
                                    })
        
        # Run both tasks concurrently
        task1 = asyncio.create_task(receive_from_client())
        task2 = asyncio.create_task(receive_from_gemini())
        
        # Wait for either to complete
        done, pending = await asyncio.wait(
            [task1, task2],
            return_when=asyncio.FIRST_COMPLETED
        )
        
        # Clean up
        for task in pending:
            task.cancel()

Try the HTML Demo:

# Start the FastAPI server
python main.py

# Open the HTML demo in your browser
open examples/gemini_live_demo.html

The demo provides a full-featured chat interface with WebSocket connection to Gemini Live. Note: Audio input requires PCM format conversion (not yet implemented).
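
As a rough illustration of that missing conversion step, the sketch below extracts raw 16-bit PCM frames from a mono WAV file with Python's wave module and passes them to the GeminiLiveSession.send_audio call shown earlier. Treating 16 kHz, mono, 16-bit as the required input format is an assumption based on Gemini Live's documented audio requirements:

import wave

from ai_proxy_core import GeminiLiveSession

async def send_wav(session: GeminiLiveSession, path: str):
    # Assumes the file is already 16 kHz, mono, 16-bit PCM;
    # resampling other formats is out of scope for this sketch.
    with wave.open(path, "rb") as wav:
        assert wav.getframerate() == 16000 and wav.getnchannels() == 1
        pcm_frames = wav.readframes(wav.getnframes())
    await session.send_audio(pcm_frames)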

Features

🚀 Unified Interface

  • Single client for all providers - No more provider-specific code
  • Automatic provider routing - Detects provider from model name (see the illustrative sketch after this list)
  • Intelligent model selection - Find best model based on requirements
  • Zero-config setup - Auto-detects available providers from environment
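
As a rough picture of what name-based routing looks like, here is an illustrative sketch (not ai-proxy-core's actual implementation):

# Illustrative only - not the library's real routing logic
def guess_provider(model: str) -> str:
    if model.startswith("gpt-"):
        return "openai"
    if model.startswith("gemini"):
        return "gemini"
    # Anything else is treated as a local Ollama model in this sketch
    return "ollama"

assert guess_provider("gpt-4") == "openai"
assert guess_provider("gemini-1.5-flash") == "gemini"
assert guess_provider("llama3.2") == "ollama"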

🧠 Model Management

  • Cross-provider model discovery - List models from OpenAI, Gemini, Ollama
  • Rich model metadata - Context limits, capabilities, multimodal support
  • Automatic model provisioning - Downloads Ollama models as needed
  • Model compatibility checking - Ensures models support requested features

🔧 Developer Experience

  • No framework dependencies - Use with FastAPI, Flask, or any Python app
  • Async/await support - Modern async Python
  • Type hints - Full type annotations
  • Easy testing - Mock the unified client in your tests (see the sketch after this list)
  • Backward compatible - All existing provider-specific code continues to work
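
For example, the unified client can be replaced with a standard-library mock in tests. A minimal sketch with unittest.mock, where the canned response mirrors the standardized format shown above:

from unittest.mock import AsyncMock

from ai_proxy_core import CompletionClient

async def test_chat_handler():
    # Stand-in for the real client, so no provider keys are needed in tests
    client = AsyncMock(spec=CompletionClient)
    client.create_completion.return_value = {
        "choices": [{"message": {"role": "assistant", "content": "mocked"}}]
    }

    response = await client.create_completion(
        messages=[{"role": "user", "content": "Hello!"}],
        model="gpt-4"
    )
    assert response["choices"][0]["message"]["content"] == "mocked"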

🎯 Advanced Features

  • WebSocket support - Real-time audio/text streaming with Gemini Live
  • Built-in tools - Code execution and Google Search with simple flags
  • Custom functions - Add your own function declarations
  • Optional telemetry - OpenTelemetry integration for production monitoring
  • Provider-specific optimizations - Access advanced features when needed

Telemetry

Basic observability with OpenTelemetry (optional):

# Install with: pip install "ai-proxy-core[telemetry]"

# Enable telemetry via environment variables
export OTEL_ENABLED=true
export OTEL_EXPORTER_TYPE=console  # or "otlp" for production
export OTEL_ENDPOINT=localhost:4317  # for OTLP exporter

# Automatic telemetry for:
# - Request counts by model/status
# - Request latency tracking
# - Session duration for WebSockets
# - Error tracking with types

The telemetry is completely optional and has zero overhead when disabled.
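
The same switches can also be set from Python, for example in a small launcher script. This sketch only sets the environment variables listed above; the assumption is that they are read when telemetry initializes, so set them before creating any clients or sessions:

import os

os.environ["OTEL_ENABLED"] = "true"
os.environ["OTEL_EXPORTER_TYPE"] = "console"

from ai_proxy_core import CompletionClient

client = CompletionClient()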

Project Structure

📝 Note: Full documentation of the project structure is being tracked in Issue #12

This project serves dual purposes:

  • Python Library (/ai_proxy_core): Installable via pip for use in Python applications
  • Web Service (/api): FastAPI endpoints for REST API access

Development

Releasing New Versions

We provide an automated release script that handles version bumping, building, and publishing:

# Make the script executable (first time only)
chmod +x release.sh

# Release a new version
./release.sh 0.1.9

The script will:

  1. Show current version and validate the new version format
  2. Prompt for a release description (for CHANGELOG)
  3. Update version in all necessary files (pyproject.toml, setup.py, __init__.py)
  4. Update CHANGELOG.md with your description
  5. Build the package
  6. Upload to PyPI
  7. Commit changes and create a git tag
  8. Push to GitHub with the new tag

Manual Build Process

If you prefer to build manually:

python setup.py sdist bdist_wheel
twine upload dist/*

License

MIT
