Multi-LLM Provider Library

llm_async — Async multi‑provider LLM client for Python

High-performance, async-first LLM client for OpenAI, Claude, Google Gemini, and OpenRouter. Built on top of aiosonic for fast, low-latency HTTP and true asyncio streaming across providers.

Features

Supported Providers & Features

Feature             OpenAI  Claude  Google Gemini  OpenRouter
Chat Completions    Yes     Yes     Yes            Yes
Tool Calling        Yes     Yes     Yes            Yes
Streaming           Yes     Yes     Yes            Yes
Structured Outputs  Yes     No      Yes            Yes

Notes:

  • Structured Outputs: Supported by OpenAI, Google Gemini, and OpenRouter; not supported by Claude.

  • See Examples for tool-call round-trips and streaming demos.

  • Async-first: Built with asyncio for high-performance, non-blocking operations.

  • Provider Support: Chat completions with OpenAI, Anthropic Claude, Google Gemini, and OpenRouter.

  • Tool Calling: Tool execution with unified tool definitions across providers.

  • Structured Outputs: Enforce JSON schema validation on responses (OpenAI, Google, OpenRouter).

  • Extensible: Easy to add new providers by inheriting from BaseProvider.

  • Tested: Comprehensive test suite with high coverage.

Performance

  • Built on top of aiosonic for fast, low-overhead async HTTP requests and streaming.
  • True asyncio end-to-end: concurrent requests across providers with minimal overhead.
  • Designed for fast tool-call round-trips and low-latency streaming.
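
The concurrency claim above can be illustrated with plain asyncio. This is a minimal sketch, not code from the library: fake_completion is a hypothetical stand-in that sleeps instead of calling a provider, so the timing only demonstrates that awaiting several coroutines through asyncio.gather takes roughly as long as the slowest one rather than the sum.

```python
import asyncio
import time


async def fake_completion(provider_name: str, delay: float) -> str:
    """Hypothetical stand-in for a provider call; sleeps instead of doing I/O."""
    await asyncio.sleep(delay)
    return f"{provider_name}: done"


async def run_concurrently() -> tuple[list[str], float]:
    start = time.perf_counter()
    # gather() runs the coroutines concurrently and keeps the argument order
    results = await asyncio.gather(
        fake_completion("openai", 0.2),
        fake_completion("claude", 0.2),
        fake_completion("gemini", 0.2),
    )
    elapsed = time.perf_counter() - start
    return list(results), elapsed


if __name__ == "__main__":
    results, elapsed = asyncio.run(run_concurrently())
    print(results, f"{elapsed:.2f}s")
```

With real providers, each fake_completion call would be replaced by an awaited acomplete call on the corresponding provider object.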

Installation

Using Poetry (Recommended)

poetry add llm_async

Using pip

pip install llm-async

Quickstart

A minimal async example that streams a response through the OpenAI-compatible interface:

import asyncio
from llm_async import OpenAIProvider

async def main():
    provider = OpenAIProvider(api_key="YOUR_OPENAI_API_KEY")
    # Stream tokens as they arrive
    async for chunk in await provider.acomplete(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Give me 3 ideas for a CLI tool."}],
        stream=True,
    ):
        print(chunk.delta, end="", flush=True)

asyncio.run(main())
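
If the full text is needed after streaming, the deltas can be accumulated while iterating. The sketch below runs without network access by using FakeProvider and FakeChunk, which are hypothetical stand-ins that only mimic the "await, then async for" shape and the delta attribute used in the Quickstart. It assumes every chunk carries a string delta, which may not hold for every real chunk.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a stream chunk; only `delta` is modeled."""
    delta: str


class FakeProvider:
    """Hypothetical provider whose acomplete() resolves to an async iterator."""

    async def acomplete(self, **kwargs) -> AsyncIterator[FakeChunk]:
        async def chunks() -> AsyncIterator[FakeChunk]:
            for piece in ["Hello", ", ", "world"]:
                await asyncio.sleep(0)  # hand control back to the event loop
                yield FakeChunk(delta=piece)
        return chunks()


async def collect_text(provider: FakeProvider) -> str:
    parts: list[str] = []
    # Same iteration pattern as the Quickstart: await first, then async for
    async for chunk in await provider.acomplete(stream=True):
        parts.append(chunk.delta)
    return "".join(parts)


if __name__ == "__main__":
    print(asyncio.run(collect_text(FakeProvider())))
```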

Usage

Basic Chat Completion

OpenAI

import asyncio
from llm_async import OpenAIProvider

async def main():
    # Initialize the provider with your API key
    provider = OpenAIProvider(api_key="your-openai-api-key")

    # Perform a chat completion
    response = await provider.acomplete(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello, how are you?"}
        ]
    )

    print(response.main_response.content)  # Output: The assistant's response

# Run the async function
asyncio.run(main())

OpenRouter

import asyncio
import os
from llm_async import OpenRouterProvider

async def main():
    # Initialize the provider with your API key
    provider = OpenRouterProvider(api_key=os.getenv("OPENROUTER_API_KEY"))

    # Perform a chat completion
    response = await provider.acomplete(
        model="openrouter/auto",  # Let OpenRouter choose the best model
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello, how are you?"}
        ],
        http_referer="https://github.com/your-username/your-app",  # Optional
        x_title="My AI App"  # Optional
    )

    print(response.main_response.content)  # Output: The assistant's response

# Run the async function
asyncio.run(main())

Google Gemini

import asyncio
from llm_async.providers.google import GoogleProvider

async def main():
    # Initialize the provider with your API key
    provider = GoogleProvider(api_key="your-google-gemini-api-key")

    # Perform a chat completion
    response = await provider.acomplete(
        model="gemini-2.5-flash",
        messages=[
            {"role": "user", "content": "Hello, how are you?"}
        ]
    )

    print(response.main_response.content)  # Output: The assistant's response

# Run the async function
asyncio.run(main())

Custom Base URL

provider = OpenAIProvider(
    api_key="your-api-key",
    base_url="https://custom-openai-endpoint.com/v1"
)

Tool Usage

import asyncio
import os
from llm_async.models import Tool
from llm_async.providers import OpenAIProvider

# Define a calculator tool
calculator_tool = Tool(
    name="calculator",
    description="Perform basic arithmetic operations",
    parameters={
        "type": "object",
        "properties": {
            "operation": {
                "type": "string",
                "enum": ["add", "subtract", "multiply", "divide"]
            },
            "a": {"type": "number"},
            "b": {"type": "number"}
        },
        "required": ["operation", "a", "b"]
    },
    input_schema={
        "type": "object",
        "properties": {
            "operation": {
                "type": "string",
                "enum": ["add", "subtract", "multiply", "divide"]
            },
            "a": {"type": "number"},
            "b": {"type": "number"}
        },
        "required": ["operation", "a", "b"]
    }
)

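def calculator(operation: str, a: float, b: float) -> float:
    """Calculator function that can be called by the LLM."""
    if operation == "add":
        return a + b
    elif operation == "subtract":
        return a - b
    elif operation == "multiply":
        return a * b
    elif operation == "divide":
        return a / b  # Raises ZeroDivisionError when b == 0
    # Fail loudly instead of returning a misleading 0 for unknown operations
    raise ValueError(f"Unsupported operation: {operation}")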

async def main():
    # Initialize provider
    provider = OpenAIProvider(api_key=os.getenv("OPENAI_API_KEY"))
    
    # Tool executor mapping
    tools_map = {"calculator": calculator}
    
    # Initial user message
    messages = [{"role": "user", "content": "What is 15 + 27?"}]
    
    # First turn: Ask the LLM to perform a calculation
    response = await provider.acomplete(
        model="gpt-4o-mini",
        messages=messages,
        tools=[calculator_tool]
    )
    
    # Execute the tool call
    tool_call = response.main_response.tool_calls[0]
    tool_result = await provider.execute_tool(tool_call, tools_map)
    
    # Second turn: Send the tool result back to the LLM
    messages_with_tool = messages + [response.main_response.original] + [tool_result]
    
    final_response = await provider.acomplete(
        model="gpt-4o-mini",
        messages=messages_with_tool
    )
    
    print(final_response.main_response.content)  # Output: The final answer

asyncio.run(main())
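
Conceptually, a tool executor looks up the requested tool by name in a mapping such as tools_map and calls it with the decoded arguments. The sketch below shows that idea in isolation. It is not the implementation of provider.execute_tool: dispatch_tool is a hypothetical helper, and the assumption that arguments arrive as a JSON string and that results go back as JSON is made only for illustration.

```python
import json
from typing import Any, Callable


def dispatch_tool(
    name: str,
    arguments_json: str,
    tools_map: dict[str, Callable[..., Any]],
) -> str:
    """Look up a tool by name, call it with decoded arguments, return JSON."""
    if name not in tools_map:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments_json)
        result = tools_map[name](**kwargs)
    except (ValueError, TypeError, ZeroDivisionError) as exc:
        # Report the failure as data so a caller can pass it back to the model
        return json.dumps({"error": str(exc)})
    return json.dumps({"result": result})


def add(a: float, b: float) -> float:
    return a + b


if __name__ == "__main__":
    print(dispatch_tool("add", '{"a": 15, "b": 27}', {"add": add}))
```

Returning errors as data rather than raising is one possible design choice; it lets a multi-turn loop continue and gives the model a chance to correct its arguments.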

Recipes

  • Streaming across providers: see examples/stream_all_providers.py
  • Tool-call round-trip (calculator): see examples/tool_call_all_providers.py
  • Structured outputs (JSON schema): see the Structured Outputs section below and the examples directory

Examples

The examples directory contains runnable scripts for local testing against all supported providers:

  • examples/tool_call_all_providers.py shows how to execute the same calculator tool call round-trip with OpenAI, OpenRouter, Claude, and Google using shared message/tool definitions.
  • examples/stream_all_providers.py streams completions from the same provider list so you can compare chunking formats and latency.

Both scripts expect a .env file with OPENAI_API_KEY, OPENROUTER_API_KEY, CLAUDE_API_KEY, and GEMINI_API_KEY (plus optional per-provider model overrides). Run them via Poetry, e.g. poetry run python examples/tool_call_all_providers.py.
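
Before running the scripts it can help to confirm that all four keys are present. The following is a small sketch, assuming the .env values have already been loaded into the process environment (how the example scripts load them is not shown here). missing_keys is a hypothetical helper and not part of the library.

```python
import os
from typing import Mapping

REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "OPENROUTER_API_KEY",
    "CLAUDE_API_KEY",
    "GEMINI_API_KEY",
)


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All provider keys are set.")
```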

Structured Outputs

Enforce JSON schema validation on model responses for consistent, type-safe outputs.

import asyncio
import json
from llm_async import OpenAIProvider
from llm_async.providers.google import GoogleProvider

# Define response schema
response_schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number"}
    },
    "required": ["answer", "confidence"],
    "additionalProperties": False
}

async def main():
    # OpenAI example
    openai_provider = OpenAIProvider(api_key="your-openai-key")
    response = await openai_provider.acomplete(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What is the capital of France?"}],
        response_schema=response_schema
    )
    result = json.loads(response.main_response.content)
    print(f"OpenAI: {result}")

    # Google Gemini example
    google_provider = GoogleProvider(api_key="your-google-key")
    response = await google_provider.acomplete(
        model="gemini-2.5-flash",
        messages=[{"role": "user", "content": "What is the capital of France?"}],
        response_schema=response_schema
    )
    result = json.loads(response.main_response.content)
    print(f"Gemini: {result}")

asyncio.run(main())

Supported Providers: OpenAI, Google Gemini, OpenRouter. Claude does not support structured outputs.
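
Even when the provider enforces the schema, a defensive check on the client side can catch malformed or truncated output. The sketch below is a deliberately small, hand-rolled check covering only required keys, extra keys, and the "string" and "number" types used in the schema above. It is not a full JSON Schema validator, and check_against_schema is a hypothetical helper rather than a library function.

```python
import json

TYPE_MAP = {"string": str, "number": (int, float)}

RESPONSE_SCHEMA = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["answer", "confidence"],
    "additionalProperties": False,
}


def check_against_schema(raw: str, schema: dict) -> dict:
    """Parse `raw` and check required keys, extra keys, and primitive types."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    props = schema.get("properties", {})
    missing = [k for k in schema.get("required", []) if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if schema.get("additionalProperties") is False:
        extra = [k for k in data if k not in props]
        if extra:
            raise ValueError(f"unexpected keys: {extra}")
    for key, spec in props.items():
        if key in data:
            expected = TYPE_MAP[spec["type"]]
            # bool is a subclass of int in Python, so reject it explicitly
            if isinstance(data[key], bool) or not isinstance(data[key], expected):
                raise ValueError(f"{key!r} should be of type {spec['type']}")
    return data


if __name__ == "__main__":
    print(check_against_schema('{"answer": "Paris", "confidence": 0.9}', RESPONSE_SCHEMA))
```

In the example above, the string passed to json.loads (response.main_response.content) could be passed to check_against_schema instead.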

Why llm_async?

  • Async-first performance (aiosonic-based) vs. sync or heavier HTTP stacks.
  • Unified provider interface: same message/tool/streaming patterns across OpenAI, Claude, Gemini, OpenRouter.
  • Structured outputs (OpenAI, Google, OpenRouter) with JSON schema validation.
  • Tool-call round-trip helpers for consistent multi-turn execution.
  • Minimal surface area: easy to extend with new providers via BaseProvider.

API Reference

OpenAIProvider

  • __init__(api_key: str, base_url: str = "https://api.openai.com/v1")

  • acomplete(model: str, messages: list[dict], stream: bool = False, **kwargs) -> Response | AsyncIterator[StreamChunk]

    Performs a chat completion. When stream=True the method returns an async iterator that yields StreamChunk objects as they arrive from the provider.

OpenRouterProvider

  • __init__(api_key: str, base_url: str = "https://openrouter.ai/api/v1")

  • acomplete(model: str, messages: list[dict], stream: bool = False, **kwargs) -> Response | AsyncIterator[StreamChunk]

    Performs a chat completion using OpenRouter's unified API. Supports the same OpenAI-compatible interface with additional optional headers:

    • http_referer: Your application's URL (recommended)
    • x_title: Your application's name (recommended)

    OpenRouter provides access to hundreds of AI models from various providers through a single API.
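
    For orientation, OpenRouter's HTTP API documents two optional attribution headers, HTTP-Referer and X-Title. The sketch below shows one way the http_referer and x_title arguments could translate into those headers. The mapping is an assumption about what happens under the hood, and openrouter_headers is a hypothetical helper, not a function exported by llm_async.

```python
from typing import Optional


def openrouter_headers(
    api_key: str,
    http_referer: Optional[str] = None,
    x_title: Optional[str] = None,
) -> dict[str, str]:
    """Build request headers, adding attribution headers only when provided."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if http_referer:
        headers["HTTP-Referer"] = http_referer  # assumed mapping of http_referer
    if x_title:
        headers["X-Title"] = x_title  # assumed mapping of x_title
    return headers


if __name__ == "__main__":
    print(openrouter_headers("sk-example", x_title="My AI App"))
```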

GoogleProvider

  • __init__(api_key: str, base_url: str = "https://generativelanguage.googleapis.com/v1beta/models/")

  • acomplete(model: str, messages: list[dict], stream: bool = False, **kwargs) -> Response | AsyncIterator[StreamChunk]

    Performs a chat completion using Google's Gemini API. Supports structured outputs. The Gemini API uses camelCase for request field names (e.g., generationConfig).

Streaming

  • Usage: iterate with async for chunk in await provider.acomplete(..., stream=True) and print or process each chunk as it arrives.

Example output

--- OpenAI streaming response ---
1. Peel and slice potatoes.
2. Par-cook potatoes briefly.
3. Whisk eggs with salt and pepper.
4. Sauté onions until translucent (optional).
5. Combine potatoes and eggs in a pan and cook until set.
6. Fold and serve.
--- Claude streaming response ---
1. Prepare potatoes by peeling and slicing.
2. Fry or boil until tender.
3. Beat eggs and season.
4. Mix potatoes with eggs and cook gently.
5. Serve warm.

Development

Setup

git clone https://github.com/sonic182/llm-async.git
cd llm-async
poetry install

Running Tests

poetry run pytest

Building

poetry build

Roadmap

  • Support for additional providers (e.g., Grok, Anthropic direct API)
  • More advanced tool features
  • Response caching and retry mechanisms

Contributing

Contributions are welcome! Please open an issue or submit a pull request on GitHub.

License

MIT License - see the LICENSE file for details.

Authors

  • sonic182
