Skip to main content

Add your description here

Project description

any-llm-client

A unified and lightweight asynchronous Python API for communicating with LLMs.

Supports multiple providers, including OpenAI Chat Completions API (and any OpenAI-compatible API, such as Ollama and vLLM) and YandexGPT API.

How To Use

Before starting using any-llm-client, make sure you have it installed:

uv add any-llm-client
poetry add any-llm-client

Response API

Here's a full example that uses Ollama and Qwen2.5-Coder:

import asyncio

import any_llm_client


config = any_llm_client.OpenAIConfig(url="http://127.0.0.1:11434/v1/chat/completions", model_name="qwen2.5-coder:1.5b")


async def main() -> None:
    async with any_llm_client.get_client(config) as client:
        print(await client.request_llm_message("Кек, чо как вообще на нарах?"))


asyncio.run(main())

To use YandexGPT, replace the config:

config = any_llm_client.YandexGPTConfig(
    auth_header=os.environ["YANDEX_AUTH_HEADER"], folder_id=os.environ["YANDEX_FOLDER_ID"], model_name="yandexgpt"
)

Streaming API

LLMs often take long time to respond fully. Here's an example of streaming API usage:

import asyncio

import any_llm_client


config = any_llm_client.OpenAIConfig(url="http://127.0.0.1:11434/v1/chat/completions", model_name="qwen2.5-coder:1.5b")


async def main() -> None:
    async with (
        any_llm_client.get_client(config) as client,
        client.stream_llm_partial_messages("Кек, чо как вообще на нарах?") as partial_messages,
    ):
        async for message in partial_messages:
            print("\033[2J")  # clear screen
            print(message)


asyncio.run(main())

Note that this will yield partial growing message, not message chunks, for example: "Hi", "Hi there!", "Hi there! How can I help you?".

Passing chat history and temperature

You can pass list[any_llm_client.Message] instead of str as the first argument, and set temperature:

async with (
    any_llm_client.get_client(config) as client,
    client.stream_llm_partial_messages(
        messages=[
            any_llm_client.Message(role="system", text="Ты — опытный ассистент"),
            any_llm_client.Message(role="user", text="Кек, чо как вообще на нарах?"),
        ],
        temperature=1.0,
    ) as partial_messages,
):
    ...

Other

Mock client

You can use a mock client for testing:

config = any_llm_client.MockLLMConfig(
    response_message=...,
    stream_messages=["Hi!"],
)
client = any_llm_client.get_client(config, ...)

Using dynamic LLM config from environment with pydantic-settings

import os

import pydantic_settings

import any_llm_client


class Settings(pydantic_settings.BaseSettings):
    llm_model: any_llm_client.AnyLLMConfig


os.environ["LLM_MODEL"] = """{
    "api_type": "openai",
    "url": "http://127.0.0.1:11434/v1/chat/completions",
    "model_name": "qwen2.5-coder:1.5b"
}"""
settings = Settings()
client = any_llm_client.get_client(settings.llm_model, ...)

Using clients directly

The recommended way to get LLM client is to call any_llm_client.get_client(). This way you can easily swap LLM models. If you prefer, you can use any_llm_client.OpenAIClient or any_llm_client.YandexGPTClient directly:

config = any_llm_client.OpenAIConfig(
    url=pydantic.HttpUrl("https://api.openai.com/v1/chat/completions"),
    auth_token=os.environ["OPENAI_API_KEY"],
    model_name="gpt-4o-mini",
)
client = any_llm_client.OpenAIClient(config, ...)

Errors

any_llm_client.LLMClient.request_llm_message() and any_llm_client.LLMClient.stream_llm_partial_messages() will raise any_llm_client.LLMError or any_llm_client.OutOfTokensOrSymbolsError when the LLM API responds with a failed HTTP status.

Timeouts, proxy & other HTTP settings

Pass custom HTTPX client:

import httpx

import any_llm_client


async with any_llm_client.get_client(
    ...,
    httpx_client=httpx.AsyncClient(
        mounts={"https://api.openai.com": httpx.AsyncHTTPTransport(proxy="http://localhost:8030")},
        timeout=httpx.Timeout(None, connect=5.0),
    ),
) as client:
    ...

Retries

By default, requests are retried 3 times on HTTP status errors. You can change the retry behaviour by supplying request_retry parameter:

client = any_llm_client.get_client(..., request_retry=any_llm_client.RequestRetryConfig(attempts=5, ...))

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

any_llm_client-1.0.2.tar.gz (12.2 kB view details)

Uploaded Source

Built Distribution

any_llm_client-1.0.2-py3-none-any.whl (11.8 kB view details)

Uploaded Python 3

File details

Details for the file any_llm_client-1.0.2.tar.gz.

File metadata

  • Download URL: any_llm_client-1.0.2.tar.gz
  • Upload date:
  • Size: 12.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.5.4

File hashes

Hashes for any_llm_client-1.0.2.tar.gz
Algorithm Hash digest
SHA256 9a9eaadf979339f0e560a5135fbd87d6c09ede31910609d2ba8e125f56aa3adc
MD5 924568f536e2e62498683efe840f6d99
BLAKE2b-256 3b6e1a34d989b24df9420dff806afe7a16bd76f7b58ada32e4a483de0a9ce110

See more details on using hashes here.

File details

Details for the file any_llm_client-1.0.2-py3-none-any.whl.

File metadata

File hashes

Hashes for any_llm_client-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 7df9e4e26792df899247d75f9b88a8cd485ea03c80baa6ece1a293f6ac0f508f
MD5 98be57229f41fbe99735b52cd63c811b
BLAKE2b-256 2618abd319fddf0801f439fb73f90e459e9ed2e140a9a43ccd3e2f3b29520eba

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page