
Multi-LLM Provider Library


llm_async

An async-first Python library for interacting with Large Language Model (LLM) providers.


Features

  • Async-first: Built with asyncio for high-performance, non-blocking operations.
  • Provider Support: Supports OpenAI, Anthropic Claude, Google Gemini, and OpenRouter for chat completions.
  • Tool Calling: Automatic tool execution with unified tool definitions across providers.
  • Structured Outputs: Enforce JSON schema validation on responses (OpenAI, Google, OpenRouter).
  • Extensible: Easy to add new providers by inheriting from BaseProvider.
  • Tested: Comprehensive test suite with high coverage.
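The extension point works by subclassing BaseProvider and implementing its async completion method. The sketch below is hypothetical: the stand-in BaseProvider and the exact abstract signature are assumptions, not llm_async's real base class; only the acomplete method name is taken from the documented providers.

```python
import asyncio
from abc import ABC, abstractmethod

# Hypothetical stand-in for llm_async's BaseProvider; the real abstract
# interface may differ.
class BaseProvider(ABC):
    @abstractmethod
    async def acomplete(self, model: str, messages: list, **kwargs): ...

class EchoProvider(BaseProvider):
    """Toy provider that echoes the last user message back."""
    async def acomplete(self, model: str, messages: list, **kwargs):
        return messages[-1]["content"]

print(asyncio.run(EchoProvider().acomplete("demo-model", [{"role": "user", "content": "hi"}])))
```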

Installation

Using Poetry (Recommended)

poetry add llm_async

Using pip

pip install git+https://github.com/sonic182/llm_async.git

Usage

Basic Chat Completion

OpenAI

import asyncio
from llm_async import OpenAIProvider

async def main():
    # Initialize the provider with your API key
    provider = OpenAIProvider(api_key="your-openai-api-key")

    # Perform a chat completion
    response = await provider.acomplete(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello, how are you?"}
        ]
    )

    print(response.main_response.content)  # Output: The assistant's response

# Run the async function
asyncio.run(main())

OpenRouter

import asyncio
import os
from llm_async import OpenRouterProvider

async def main():
    # Initialize the provider with your API key
    provider = OpenRouterProvider(api_key=os.getenv("OPENROUTER_API_KEY"))

    # Perform a chat completion
    response = await provider.acomplete(
        model="openrouter/auto",  # Let OpenRouter choose the best model
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello, how are you?"}
        ],
        http_referer="https://github.com/your-username/your-app",  # Optional
        x_title="My AI App"  # Optional
    )

    print(response.main_response.content)  # Output: The assistant's response

# Run the async function
asyncio.run(main())

Google Gemini

import asyncio
from llm_async.providers.google import GoogleProvider

async def main():
    # Initialize the provider with your API key
    provider = GoogleProvider(api_key="your-google-gemini-api-key")

    # Perform a chat completion
    response = await provider.acomplete(
        model="gemini-2.5-flash",
        messages=[
            {"role": "user", "content": "Hello, how are you?"}
        ]
    )

    print(response.main_response.content)  # Output: The assistant's response

# Run the async function
asyncio.run(main())

Custom Base URL

provider = OpenAIProvider(
    api_key="your-api-key",
    base_url="https://custom-openai-endpoint.com/v1"
)

Tool Usage

import asyncio
from llm_async.models import Tool
from llm_async.providers import OpenAIProvider, ClaudeProvider

# Define a calculator tool that works with both providers
# Define a calculator tool that works with both providers. The same JSON
# schema is supplied twice: `parameters` matches OpenAI's function-calling
# format, while `input_schema` matches Anthropic's tool format.
calculator_tool = Tool(
    name="calculator",
    description="Perform basic arithmetic operations",
    # OpenAI-style function schema
    parameters={
        "type": "object",
        "properties": {
            "operation": {
                "type": "string",
                "enum": ["add", "subtract", "multiply", "divide"]
            },
            "a": {"type": "number"},
            "b": {"type": "number"}
        },
        "required": ["operation", "a", "b"]
    },
    # Anthropic-style schema (same shape, different field name)
    input_schema={
        "type": "object",
        "properties": {
            "operation": {
                "type": "string",
                "enum": ["add", "subtract", "multiply", "divide"]
            },
            "a": {"type": "number"},
            "b": {"type": "number"}
        },
        "required": ["operation", "a", "b"]
    }
)

def calculator(operation: str, a: float, b: float) -> float:
    """Calculator function that can be called by the LLM."""
    if operation == "add":
        return a + b
    elif operation == "subtract":
        return a - b
    elif operation == "multiply":
        return a * b
    elif operation == "divide":
        return a / b
    raise ValueError(f"Unknown operation: {operation}")

async def main():
    # Initialize providers
    openai = OpenAIProvider(api_key="your-openai-key")
    claude = ClaudeProvider(api_key="your-anthropic-key")
    
    # Tool executor
    tool_executor = {"calculator": calculator}
    
    # Use the same tool with OpenAI
    response = await openai.acomplete(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What is 15 + 27?"}],
        tools=[calculator_tool],
        auto_execute_tools=True,
        tool_executor=tool_executor
    )
    print(f"OpenAI: {response}")
    
    # Use the same tool with Claude
    response = await claude.acomplete(
        model="claude-3-haiku-20240307",
        messages=[{"role": "user", "content": "What is 15 + 27?"}],
        tools=[calculator_tool],
        auto_execute_tools=True,
        tool_executor=tool_executor
    )
    print(f"Claude: {response}")

asyncio.run(main())
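With auto_execute_tools=True, each tool call the model emits is resolved by looking up its name in the tool_executor mapping and invoking the matching callable with the model-supplied arguments. The library's internal implementation may differ; this minimal sketch just illustrates the dispatch pattern:

```python
# Illustrative dispatch: look up the tool by name, call it with the
# model-provided arguments, and return the result as a string for the
# follow-up message to the model.
def dispatch(tool_call: dict, tool_executor: dict) -> str:
    func = tool_executor[tool_call["name"]]
    return str(func(**tool_call["args"]))

def calculator(operation: str, a: float, b: float) -> float:
    ops = {"add": a + b, "subtract": a - b, "multiply": a * b}
    return ops[operation]

result = dispatch(
    {"name": "calculator", "args": {"operation": "add", "a": 15, "b": 27}},
    {"calculator": calculator},
)
print(result)  # "42"
```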

Pub/Sub Events for Tool Execution

llm_async supports real-time event emission during tool execution via a pub/sub system. This allows you to monitor tool progress, handle errors, and build interactive UIs for agentic workflows.

Events are emitted for each tool call with topics like tools.{provider}.{tool_name}.{status} where status is start, complete, or error.
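The four-part topic convention can be unpacked with a plain string split. The helper below is hypothetical (not part of llm_async) and assumes tool names themselves contain no dots:

```python
# Hypothetical helper for the tools.{provider}.{tool_name}.{status} convention.
def parse_tool_topic(topic: str) -> dict:
    prefix, provider, tool_name, status = topic.split(".")
    if prefix != "tools" or status not in {"start", "complete", "error"}:
        raise ValueError(f"Unexpected topic: {topic}")
    return {"provider": provider, "tool_name": tool_name, "status": status}

print(parse_tool_topic("tools.openai.calculator.start"))
```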

Basic Usage

import asyncio
from llm_async import OpenAIProvider
from llm_async.pubsub import LocalQueueBackend, PubSub
from llm_async.models import Tool

# Define your tool (same as above)
calculator_tool = Tool(...)  # See Tool Usage example

def calculator(operation: str, a: float, b: float) -> float:
    # Implementation
    pass

async def event_monitor(pubsub: PubSub):
    """Monitor tool execution events."""
    print("📡 Monitoring tool events...")
    async for event in pubsub.subscribe("tools.*"):
        topic = event.topic
        payload = event.payload
        
        if "start" in topic:
            print(f"⏱️  STARTED: {payload.get('tool_name')} with args {payload.get('args')}")
        elif "complete" in topic:
            print(f"✅ COMPLETED: {payload.get('tool_name')} -> {payload.get('result')}")
        elif "error" in topic:
            print(f"❌ ERROR: {payload.get('tool_name')} - {payload.get('error')}")

async def main():
    # Setup pub/sub
    backend = LocalQueueBackend()
    pubsub = PubSub(backend)
    
    # Start monitoring in background
    monitor_task = asyncio.create_task(event_monitor(pubsub))
    
    try:
        provider = OpenAIProvider(api_key="your-openai-key")
        
        response = await provider.acomplete(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Calculate 15 + 27"}],
            tools=[calculator_tool],
            auto_execute_tools=True,
            tool_executor={"calculator": calculator},
            pubsub=pubsub  # Enable event emission
        )
        
        print(f"\n🤖 Final Response: {response.main_response.content}")
    finally:
        await asyncio.sleep(0.2)  # Allow final events
        await pubsub.close()
        monitor_task.cancel()

asyncio.run(main())

Event Payloads

  • Start: {"call_id": str, "tool_name": str, "args": dict}
  • Complete: {"call_id": str, "tool_name": str, "result": str}
  • Error: {"call_id": str, "tool_name": str, "error": str}

Backend Options

  • LocalQueueBackend: In-memory asyncio queues (default, for single-process)
  • Future backends: Redis, RabbitMQ (extensible via PubSubBackend)

Structured Outputs

Enforce JSON schema validation on model responses for consistent, type-safe outputs.

import asyncio
import json
from llm_async import OpenAIProvider
from llm_async.providers.google import GoogleProvider

# Define response schema
response_schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number"}
    },
    "required": ["answer", "confidence"],
    "additionalProperties": False
}

async def main():
    # OpenAI example
    openai_provider = OpenAIProvider(api_key="your-openai-key")
    response = await openai_provider.acomplete(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What is the capital of France?"}],
        response_schema=response_schema
    )
    result = json.loads(response.main_response.content)
    print(f"OpenAI: {result}")

    # Google Gemini example
    google_provider = GoogleProvider(api_key="your-google-key")
    response = await google_provider.acomplete(
        model="gemini-2.5-flash",
        messages=[{"role": "user", "content": "What is the capital of France?"}],
        response_schema=response_schema
    )
    result = json.loads(response.main_response.content)
    print(f"Gemini: {result}")

asyncio.run(main())

Supported Providers: OpenAI, Google Gemini, OpenRouter. Claude does not support structured outputs.
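Even with provider-side schema enforcement, a defensive check on the parsed payload can be worthwhile before using it downstream. This check is not part of llm_async; it simply restates the example schema's constraints in plain Python:

```python
import json

# Validate that the decoded response matches the example schema:
# exactly the keys "answer" (string) and "confidence" (number).
def check_answer(raw: str) -> dict:
    data = json.loads(raw)
    assert set(data) == {"answer", "confidence"}, "unexpected keys"
    assert isinstance(data["answer"], str)
    assert isinstance(data["confidence"], (int, float))
    return data

print(check_answer('{"answer": "Paris", "confidence": 0.98}'))
```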

API Reference

OpenAIProvider

  • __init__(api_key: str, base_url: str = "https://api.openai.com/v1")

  • acomplete(model: str, messages: list[dict], stream: bool = False, **kwargs) -> Response | AsyncIterator[StreamChunk]

    Performs a chat completion. When stream=True the method returns an async iterator that yields StreamChunk objects as they arrive from the provider.

OpenRouterProvider

  • __init__(api_key: str, base_url: str = "https://openrouter.ai/api/v1")

  • acomplete(model: str, messages: list[dict], stream: bool = False, **kwargs) -> Response | AsyncIterator[StreamChunk]

    Performs a chat completion using OpenRouter's unified API. Supports the same OpenAI-compatible interface with additional optional headers:

    • http_referer: Your application's URL (recommended)
    • x_title: Your application's name (recommended)

    OpenRouter provides access to hundreds of AI models from various providers through a single API.

GoogleProvider

  • __init__(api_key: str, base_url: str = "https://generativelanguage.googleapis.com/v1beta/models/")

  • acomplete(model: str, messages: list[dict], stream: bool = False, **kwargs) -> Response | AsyncIterator[StreamChunk]

    Performs a chat completion using Google's Gemini API. Supports structured outputs; request payload field names use camelCase (e.g., generationConfig).

Streaming

  • Usage: iterate with async for chunk in await provider.acomplete(..., stream=True) and print or process each chunk as it arrives.
  • Notes: automatic tool execution (auto_execute_tools=True) is not supported while streaming.
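The consumption loop looks like this. To keep the sketch runnable offline, a stub async generator stands in for the provider call; with the real library each item would be a StreamChunk (whose attribute names are not shown here and would need checking against the API):

```python
import asyncio

# Stub standing in for `await provider.acomplete(..., stream=True)`.
async def fake_stream():
    for piece in ["Hello", ", ", "world"]:
        yield piece

async def collect() -> str:
    parts = []
    # With llm_async: async for chunk in await provider.acomplete(..., stream=True)
    async for chunk in fake_stream():
        parts.append(chunk)  # process each chunk as it arrives
    return "".join(parts)

print(asyncio.run(collect()))  # Hello, world
```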

Example output

--- OpenAI streaming response ---
1. Peel and slice potatoes.
2. Par-cook potatoes briefly.
3. Whisk eggs with salt and pepper.
4. Sauté onions until translucent (optional).
5. Combine potatoes and eggs in a pan and cook until set.
6. Fold and serve.
--- Claude streaming response ---
1. Prepare potatoes by peeling and slicing.
2. Fry or boil until tender.
3. Beat eggs and season.
4. Mix potatoes with eggs and cook gently.
5. Serve warm.

Development

Setup

git clone https://github.com/sonic182/llm_async.git
cd llm_async
poetry install

Running Tests

poetry run pytest

Building

poetry build

Roadmap

  • Support for additional providers (e.g., Grok, Anthropic direct API)
  • More advanced tool features
  • Response caching and retry mechanisms

Contributing

Contributions are welcome! Please open an issue or submit a pull request on GitHub.

License

MIT License - see the LICENSE file for details.

Authors

  • sonic182
