Project description

any-llm-client

A unified and lightweight asynchronous Python API for communicating with LLMs. It supports multiple providers, including OpenAI Chat Completions API (and any OpenAI-compatible API, such as Ollama and vLLM) and YandexGPT API.

How To Use

Before you start using any-llm-client, make sure it is installed:

uv add any-llm-client
poetry add any-llm-client
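
any-llm-client is published on PyPI, so installing with plain pip works as well:

pip install any-llm-client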

Response API

Here's a full example that uses Ollama and Qwen2.5-Coder:

import asyncio

import httpx
import pydantic

import any_llm_client


config = any_llm_client.OpenAIConfig(
    url=pydantic.HttpUrl("http://127.0.0.1:11434/v1/chat/completions"),
    model_name="qwen2.5-coder:1.5b",
)


async def main() -> None:
    async with httpx.AsyncClient() as httpx_client:
        response = await any_llm_client.get_client(config, httpx_client=httpx_client).request_llm_response(
            messages=[
                any_llm_client.Message(role="system", text="Ты — опытный ассистент"),
                any_llm_client.Message(role="user", text="Привет!"),
            ],
            temperature=0.1,
        )
        print(response)  # type(response) is str


asyncio.run(main())

To use YandexGPT, replace the config:

import os

config = any_llm_client.YandexGPTConfig(
    auth_header=os.environ["YANDEX_AUTH_HEADER"],
    folder_id=os.environ["YANDEX_FOLDER_ID"],
    model_name="yandexgpt",
)

Streaming API

LLMs often take a long time to respond fully. Here's an example of streaming API usage:

import asyncio
import sys

import httpx
import pydantic

import any_llm_client


config = any_llm_client.OpenAIConfig(
    url=pydantic.HttpUrl("http://127.0.0.1:11434/v1/chat/completions"),
    model_name="qwen2.5-coder:1.5b",
)


async def main() -> None:
    async with (
        httpx.AsyncClient() as httpx_client,
        any_llm_client.get_client(config, httpx_client=httpx_client).stream_llm_partial_responses(
            messages=[
                any_llm_client.Message(role="system", text="Ты — опытный ассистент"),
                any_llm_client.Message(role="user", text="Привет!"),
            ],
            temperature=0.1,
        ) as partial_messages,
    ):
        async for one_message in partial_messages:  # type(one_message) is str
            sys.stdout.write(f"\r{one_message}")
            sys.stdout.flush()


asyncio.run(main())

Note that this yields the partial, growing message rather than individual chunks, for example: "Hi", "Hi there!", "Hi there! How can I help you?".
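
If you need chunk-style deltas instead, you can derive them from the growing messages yourself. Here's a minimal sketch; the as_deltas helper below is not part of the library, just plain Python layered on the streaming example above:

import typing


async def as_deltas(partial_messages: typing.AsyncIterable[str]) -> typing.AsyncIterator[str]:
    # Turns growing partial messages ("Hi", "Hi there!") into deltas ("Hi", " there!").
    previous = ""
    async for one_message in partial_messages:
        yield one_message[len(previous):]
        previous = one_message


# In the streaming example above, the loop would become:
#     async for delta in as_deltas(partial_messages):
#         sys.stdout.write(delta)
#         sys.stdout.flush()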

Other

Mock client

You can use a mock client for testing:

config = any_llm_client.MockLLMConfig(
    response_message=...,
    stream_messages=["Hi!"],
)
llm_client = any_llm_client.get_client(config, ...)
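
For example, a test can exercise the streaming code path against the mock client exactly like the real one. A sketch, assuming the mock client is obtained via get_client with the usual httpx_client keyword and that response_message takes the plain response text:

import asyncio

import httpx

import any_llm_client


config = any_llm_client.MockLLMConfig(
    response_message="Hello from the mock!",  # assumed: plain response text, as returned by request_llm_response()
    stream_messages=["Hi", "Hi there!"],
)


async def main() -> None:
    async with (
        httpx.AsyncClient() as httpx_client,
        any_llm_client.get_client(config, httpx_client=httpx_client).stream_llm_partial_responses(
            messages=[any_llm_client.Message(role="user", text="Hello!")],
            temperature=0.1,
        ) as partial_messages,
    ):
        async for one_message in partial_messages:
            print(one_message)  # "Hi", then "Hi there!"


asyncio.run(main())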

Using dynamic LLM config from environment with pydantic-settings

import os

import pydantic_settings

import any_llm_client


class Settings(pydantic_settings.BaseSettings):
    llm_model: any_llm_client.AnyLLMConfig


os.environ["LLM_MODEL"] = """{
    "api_type": "openai",
    "url": "http://127.0.0.1:11434/v1/chat/completions",
    "model_name": "qwen2.5-coder:1.5b"
}"""
settings = Settings()
client = any_llm_client.get_client(settings.llm_model, ...)

Using clients directly

The recommended way to get an LLM client is to call any_llm_client.get_client(), which makes it easy to swap LLM models. If you prefer, you can use any_llm_client.OpenAIClient or any_llm_client.YandexGPTClient directly:

import os

import pydantic

import any_llm_client

config = any_llm_client.OpenAIConfig(
    url=pydantic.HttpUrl("https://api.openai.com/v1/chat/completions"),
    auth_token=os.environ["OPENAI_API_KEY"],
    model_name="gpt-4o-mini",
)
llm_client = any_llm_client.OpenAIClient(config, ...)

Errors

any_llm_client.LLMClient.request_llm_response() and any_llm_client.LLMClient.stream_llm_partial_responses() will raise any_llm_client.LLMError or any_llm_client.OutOfTokensOrSymbolsError when the LLM API responds with a failed HTTP status.
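
For example, a minimal sketch of handling both error types (the ask helper is hypothetical; llm_client is built as in the examples above):

import any_llm_client


async def ask(llm_client: any_llm_client.LLMClient, user_text: str) -> str | None:
    try:
        return await llm_client.request_llm_response(
            messages=[any_llm_client.Message(role="user", text=user_text)],
            temperature=0.1,
        )
    except any_llm_client.OutOfTokensOrSymbolsError:
        return None  # the prompt or completion exceeded the model's token/character limits
    except any_llm_client.LLMError as error:
        print(f"LLM API request failed: {error!r}")
        return None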

Retries

By default, requests are retried 3 times on HTTP status errors. You can change the retry behaviour by supplying the request_retry parameter:

llm_client = any_llm_client.get_client(..., request_retry=any_llm_client.RequestRetryConfig(attempts=5, ...))

Timeouts and proxy

Configure timeouts or a proxy directly on the httpx.AsyncClient:

import httpx

import any_llm_client


async with httpx.AsyncClient(
    mounts={
        "https://api.openai.com": httpx.AsyncHTTPTransport(proxy="http://localhost:8030"),
    },
    timeout=httpx.Timeout(None, connect=5.0),
) as httpx_client:
    llm_client = any_llm_client.get_client(..., httpx_client=httpx_client)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

any_llm_client-0.3.0.tar.gz (11.1 kB)

Built Distribution

any_llm_client-0.3.0-py3-none-any.whl (10.9 kB)

File details

Details for the file any_llm_client-0.3.0.tar.gz.

File metadata

  • Download URL: any_llm_client-0.3.0.tar.gz
  • Size: 11.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.5.4

File hashes

Hashes for any_llm_client-0.3.0.tar.gz:

  • SHA256: 0232bb90ca3260b32153a7d0f8a6654a9f1d9fb21fb999cd9286c9c9b5f0ea2f
  • MD5: 60e729cfbc227ab6e9da20976d7eefca
  • BLAKE2b-256: f3a213c378229c520956c8486f7a57443e89f2c3751951d07ded7af330d052e8


File details

Details for the file any_llm_client-0.3.0-py3-none-any.whl.

File metadata

File hashes

Hashes for any_llm_client-0.3.0-py3-none-any.whl:

  • SHA256: bd2a6486dfdb90780598b41cf516e73ededb6ea2bebec68c8d61429a2c50b59c
  • MD5: 827c38b4b622c4adb5c5771f582be867
  • BLAKE2b-256: e8f329edd23903c5e742407c9ba2f5c8e2512f30678a5e80f5eb94e9cb59eb63

