Python client for the LiveLLM Server

LiveLLM Python Client

Python 3.10+ License: MIT

Python client library for the LiveLLM Server - a unified proxy for AI agent, audio, and transcription services.

Features

  • 🚀 Async-first - Built on httpx and websockets for high-performance operations
  • 🔒 Type-safe - Full type hints and Pydantic validation
  • 🎯 Multi-provider - OpenAI, Google, Anthropic, Groq, ElevenLabs
  • 🔄 Streaming - Real-time streaming for agent and audio
  • 🛠️ Flexible API - Use request objects or keyword arguments
  • 🎙️ Audio services - Text-to-speech and transcription
  • 🎤 Real-Time Transcription - WebSocket-based live audio transcription with bidirectional streaming
  • Fallback strategies - Sequential and parallel handling
  • 🧹 Auto cleanup - Context managers and garbage collection

Installation

pip install livellm

Or with development dependencies:

pip install "livellm[testing]"

Quick Start

import asyncio
from livellm import LivellmClient
from livellm.models import Settings, ProviderKind, TextMessage, MessageRole

async def main():
    # Initialize with automatic provider setup
    async with LivellmClient(
        base_url="http://localhost:8000",
        configs=[
            Settings(
                uid="openai",
                provider=ProviderKind.OPENAI,
                api_key="your-api-key"
            )
        ]
    ) as client:
        # Simple keyword arguments style (gen_config as kwargs)
        response = await client.agent_run(
            provider_uid="openai",
            model="gpt-4",
            messages=[TextMessage(role="user", content="Hello!")],
            temperature=0.7
        )
        print(response.output)

asyncio.run(main())

Configuration

Client Initialization

from livellm import LivellmClient
from livellm.models import Settings, ProviderKind

# Basic
client = LivellmClient(base_url="http://localhost:8000")

# With timeout and pre-configured providers
client = LivellmClient(
    base_url="http://localhost:8000",
    timeout=30.0,
    configs=[
        Settings(
            uid="openai",
            provider=ProviderKind.OPENAI,
            api_key="sk-...",
            base_url="https://api.openai.com/v1"  # Optional
        ),
        Settings(
            uid="anthropic",
            provider=ProviderKind.ANTHROPIC,
            api_key="sk-ant-...",
            blacklist_models=["claude-instant-1"]  # Optional
        )
    ]
)

Supported Providers

OPENAI, GOOGLE, ANTHROPIC, GROQ, ELEVENLABS

# Add provider dynamically
await client.update_config(Settings(
    uid="my-provider",
    provider=ProviderKind.OPENAI,
    api_key="your-api-key"
))

# List and delete
configs = await client.get_configs()
await client.delete_config("my-provider")

Usage Examples

Agent Services

Two Ways to Call Methods

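The agent and audio methods (agent_run, agent_run_stream, speak, speak_stream, transcribe) support two calling styles:

Style 1: Keyword arguments (extra kwargs such as temperature become gen_config)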

response = await client.agent_run(
    provider_uid="openai",
    model="gpt-4",
    messages=[TextMessage(role="user", content="Hello!")],
    temperature=0.7,
    max_tokens=500
)

Style 2: Request objects

from livellm.models import AgentRequest

response = await client.agent_run(
    AgentRequest(
        provider_uid="openai",
        model="gpt-4",
        messages=[TextMessage(role="user", content="Hello!")],
        gen_config={"temperature": 0.7, "max_tokens": 500}
    )
)
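
To make the relationship between the two styles concrete, here is a minimal sketch of how loose keyword arguments could be folded into a gen_config dictionary. The helper split_gen_config and the NAMED_FIELDS set are hypothetical illustrations and not part of livellm; the real client may handle this differently.

```python
from typing import Any

# Field names taken from the AgentRequest examples in this README. Treating
# every other keyword as a generation setting is an assumption for illustration.
NAMED_FIELDS = {"provider_uid", "model", "messages", "tools"}


def split_gen_config(**kwargs: Any) -> dict[str, Any]:
    """Return the named request fields, with leftover kwargs under gen_config."""
    request = {k: v for k, v in kwargs.items() if k in NAMED_FIELDS}
    request["gen_config"] = {
        k: v for k, v in kwargs.items() if k not in NAMED_FIELDS
    }
    return request


payload = split_gen_config(
    provider_uid="openai",
    model="gpt-4",
    messages=["Hello!"],
    temperature=0.7,
    max_tokens=500,
)
print(payload["gen_config"])
```

Under this assumption, the Style 1 call and the Style 2 call describe the same request.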

Basic Agent Run

from livellm.models import TextMessage

# Using kwargs (recommended for simplicity)
response = await client.agent_run(
    provider_uid="openai",
    model="gpt-4",
    messages=[
        TextMessage(role="system", content="You are helpful."),
        TextMessage(role="user", content="Explain quantum computing")
    ],
    temperature=0.7,
    max_tokens=500
)
print(f"Output: {response.output}")
print(f"Tokens: {response.usage.input_tokens} in, {response.usage.output_tokens} out")

Streaming Agent Response

# Streaming also supports both styles
stream = client.agent_run_stream(
    provider_uid="openai",
    model="gpt-4",
    messages=[TextMessage(role="user", content="Tell me a story")],
    temperature=0.8
)

async for chunk in stream:
    print(chunk.output, end="", flush=True)
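
If the full text is needed after streaming finishes, the chunks can be collected as they arrive. The sketch below uses a stand-in stream (FakeChunk and fake_stream are hypothetical) and assumes each chunk's output holds only the newly generated piece, as the print call above suggests.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class FakeChunk:
    """Stand-in for a streamed chunk; only the `output` attribute is modeled."""

    output: str


async def fake_stream() -> AsyncIterator[FakeChunk]:
    # Hypothetical stream used in place of client.agent_run_stream(...).
    for piece in ["Once ", "upon ", "a time"]:
        await asyncio.sleep(0)  # yield control, as a network stream would
        yield FakeChunk(output=piece)


async def collect(stream: AsyncIterator[FakeChunk]) -> str:
    """Join streamed pieces into one string (assumes chunks are deltas)."""
    parts: list[str] = []
    async for chunk in stream:
        parts.append(chunk.output)
    return "".join(parts)


full_text = asyncio.run(collect(fake_stream()))
print(full_text)
```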

Agent with Vision (Binary Messages)

import base64
from livellm.models import BinaryMessage

with open("image.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = await client.agent_run(
    provider_uid="openai",
    model="gpt-4-vision",
    messages=[
        BinaryMessage(
            role="user",
            content=image_data,
            mime_type="image/jpeg",
            caption="What's in this image?"
        )
    ]
)
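
The content and mime_type values can be derived together from a file name and its bytes. The following is a small sketch using only the standard library; encode_attachment is a hypothetical helper, not a livellm function.

```python
import base64
import mimetypes


def encode_attachment(filename: str, data: bytes) -> tuple[str, str]:
    """Return (base64_text, mime_type) for raw file bytes."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        raise ValueError(f"cannot infer MIME type for {filename!r}")
    return base64.b64encode(data).decode("utf-8"), mime_type


# Two bytes of a JPEG header stand in for a real image file here.
content, mime_type = encode_attachment("image.jpg", b"\xff\xd8")
print(mime_type, content)
```

The two return values map onto the content and mime_type arguments of BinaryMessage shown above.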

Agent with Tools

from livellm.models import WebSearchInput, MCPStreamableServerInput, ToolKind

# Web search tool
response = await client.agent_run(
    provider_uid="openai",
    model="gpt-4",
    messages=[TextMessage(role="user", content="Latest AI news?")],
    tools=[WebSearchInput(
        kind=ToolKind.WEB_SEARCH,
        search_context_size="high"  # low, medium, or high
    )]
)

# MCP server tool
response = await client.agent_run(
    provider_uid="openai",
    model="gpt-4",
    messages=[TextMessage(role="user", content="Run custom tool")],
    tools=[MCPStreamableServerInput(
        kind=ToolKind.MCP_STREAMABLE_SERVER,
        url="http://mcp-server:8080",
        prefix="mcp_",
        timeout=15
    )]
)

Audio Services

Text-to-Speech

from livellm.models import SpeakMimeType

# Non-streaming
audio = await client.speak(
    provider_uid="openai",
    model="tts-1",
    text="Hello, world!",
    voice="alloy",
    mime_type=SpeakMimeType.MP3,
    sample_rate=24000,
    speed=1.0  # kwargs become gen_config
)
with open("output.mp3", "wb") as f:
    f.write(audio)

# Streaming
audio = bytes()
async for chunk in client.speak_stream(
    provider_uid="openai",
    model="tts-1",
    text="Hello, world!",
    voice="alloy",
    mime_type=SpeakMimeType.PCM,
    sample_rate=24000
):
    audio += chunk

# Save PCM as WAV
import wave
with wave.open("output.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(24000)
    wf.writeframes(audio)
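
The WAV header values have to match the audio that was requested, otherwise playback speed and pitch will be wrong. Here is a minimal sketch that wraps PCM in a WAV container in memory and computes its duration, assuming 16-bit mono samples as in the example above; pcm_to_wav_bytes and pcm_duration_seconds are hypothetical helper names.

```python
import io
import wave


def pcm_to_wav_bytes(
    pcm: bytes, sample_rate: int = 24000, channels: int = 1, sample_width: int = 2
) -> bytes:
    """Wrap raw PCM in a WAV container without touching the filesystem."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # bytes per sample: 2 means 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buffer.getvalue()


def pcm_duration_seconds(
    pcm: bytes, sample_rate: int = 24000, channels: int = 1, sample_width: int = 2
) -> float:
    """Duration implied by the byte count and the audio parameters."""
    return len(pcm) / (sample_rate * channels * sample_width)


silence = b"\x00\x00" * 24000  # one second of 16-bit mono silence at 24 kHz
wav_bytes = pcm_to_wav_bytes(silence)
print(pcm_duration_seconds(silence), len(wav_bytes))
```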

Transcription

# Method 1: Multipart upload (kwargs style)
with open("audio.wav", "rb") as f:
    audio_bytes = f.read()

transcription = await client.transcribe(
    provider_uid="openai",
    file=("audio.wav", audio_bytes, "audio/wav"),
    model="whisper-1",
    language="en",  # Optional
    temperature=0.0  # kwargs become gen_config
)
print(f"Text: {transcription.text}")
print(f"Language: {transcription.language}")

# Method 2: JSON request object (base64-encoded)
import base64
from livellm.models import TranscribeRequest

audio_b64 = base64.b64encode(audio_bytes).decode("utf-8")
transcription = await client.transcribe(
    TranscribeRequest(
        provider_uid="openai",
        file=("audio.wav", audio_b64, "audio/wav"),
        model="whisper-1"
    )
)

Real-Time Transcription (WebSocket)

The realtime transcription API is available either directly via TranscriptionWsClient or through LivellmClient.realtime.transcription.

Using TranscriptionWsClient directly

import asyncio
from livellm import TranscriptionWsClient
from livellm.models import (
    TranscriptionInitWsRequest,
    TranscriptionAudioChunkWsRequest,
    SpeakMimeType,
)

async def transcribe_live_direct():
    base_url = "ws://localhost:8000"  # WebSocket base URL

    async with TranscriptionWsClient(base_url, timeout=30) as client:
        # Define audio source (file, microphone, stream, etc.)
        async def audio_source():
            with open("audio.pcm", "rb") as f:
                while chunk := f.read(4096):
                    yield TranscriptionAudioChunkWsRequest(audio=chunk)
                    await asyncio.sleep(0.1)  # Simulate real-time

        # Initialize transcription session
        init_request = TranscriptionInitWsRequest(
            provider_uid="openai",
            model="gpt-4o-mini-transcribe",
            language="en",  # or "auto" for detection
            input_sample_rate=24000,
            input_audio_format=SpeakMimeType.PCM,
            gen_config={},
        )

        # Stream audio and receive transcriptions
        async for response in client.start_session(init_request, audio_source()):
            print(f"Transcription: {response.transcription}")
            if response.is_end:
                print("Transcription complete!")
                break

asyncio.run(transcribe_live_direct())
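
The fixed 0.1 second sleep above only approximates real time: a 4096-byte chunk of 16-bit mono audio at 24 kHz lasts about 85 ms. A sketch of pacing derived from the chunk size follows, assuming 16-bit mono PCM; chunk_duration and paced_chunks are hypothetical helpers that yield raw bytes, which would still need wrapping in TranscriptionAudioChunkWsRequest.

```python
import asyncio
from typing import AsyncIterator


def chunk_duration(
    chunk_bytes: int, sample_rate: int, sample_width: int = 2, channels: int = 1
) -> float:
    """Seconds of audio contained in a chunk of raw PCM."""
    return chunk_bytes / (sample_rate * sample_width * channels)


async def paced_chunks(
    pcm: bytes, chunk_bytes: int = 4096, sample_rate: int = 24000
) -> AsyncIterator[bytes]:
    """Yield PCM chunks, sleeping for each chunk's own playback time."""
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start : start + chunk_bytes]
        yield chunk
        await asyncio.sleep(chunk_duration(len(chunk), sample_rate))


async def _chunk_sizes() -> list[int]:
    return [len(chunk) async for chunk in paced_chunks(b"\x00" * 10000)]


sizes = asyncio.run(_chunk_sizes())
print(chunk_duration(4096, 24000), sizes)
```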

Using LivellmClient.realtime.transcription (and running agents while listening)

import asyncio
from livellm import LivellmClient
from livellm.models import (
    TextMessage,
    TranscriptionInitWsRequest,
    TranscriptionAudioChunkWsRequest,
    SpeakMimeType,
)

async def transcribe_and_chat():
    # Central HTTP client; .realtime and .transcription expose WebSocket APIs
    client = LivellmClient(base_url="http://localhost:8000", timeout=30)

    async with client.realtime as realtime:
        async with realtime.transcription as t_client:
            async def audio_source():
                with open("audio.pcm", "rb") as f:
                    while chunk := f.read(4096):
                        yield TranscriptionAudioChunkWsRequest(audio=chunk)
                        await asyncio.sleep(0.1)

            init_request = TranscriptionInitWsRequest(
                provider_uid="openai",
                model="gpt-4o-mini-transcribe",
                language="en",
                input_sample_rate=24000,
                input_audio_format=SpeakMimeType.PCM,
                gen_config={},
            )

            # Listen for transcriptions and, for each chunk, run an agent request
            async for resp in t_client.start_session(init_request, audio_source()):
                print("User said:", resp.transcription)

                # You can call agent_run (or speak, etc.) while the transcription stream is active
                agent_response = await realtime.agent_run(
                    provider_uid="openai",
                    model="gpt-4",
                    messages=[
                        TextMessage(role="user", content=resp.transcription),
                    ],
                    temperature=0.7,
                )
                print("Agent:", agent_response.output)

                if resp.is_end:
                    print("Transcription session complete")
                    break

asyncio.run(transcribe_and_chat())

Supported Audio Formats:

  • PCM: 16-bit uncompressed (recommended)
  • μ-law: 8-bit telephony format (North America/Japan)
  • A-law: 8-bit telephony format (Europe/rest of world)
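
The two 8-bit formats are companded encodings of 16-bit samples, so they halve the bandwidth of PCM at some cost in fidelity. As an illustration, the sketch below converts 16-bit PCM to μ-law bytes. It assumes the server's μ-law format is standard G.711 and that the PCM is little-endian mono; linear_to_ulaw and pcm16_to_ulaw are hypothetical helpers, not part of livellm.

```python
import struct

BIAS = 0x84   # offset added before the segment search in G.711 mu-law
CLIP = 32635  # largest magnitude that still fits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit sample as an 8-bit mu-law value."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent): position of the highest set bit, minus 7.
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are transmitted with all bits inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm16_to_ulaw(pcm: bytes) -> bytes:
    """Convert little-endian 16-bit PCM to mu-law, one byte per sample."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return bytes(linear_to_ulaw(s) for s in samples)


encoded = pcm16_to_ulaw(struct.pack("<3h", 0, 32767, -32768))
print(encoded.hex())
```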

Use Cases:

  • 🎙️ Voice assistants and chatbots
  • 📝 Live captioning and subtitles
  • 🎤 Meeting transcription
  • 🗣️ Voice commands and control

Fallback Strategies

Handle failures automatically with sequential or parallel fallback:

from livellm.models import AgentRequest, AgentFallbackRequest, FallbackStrategy, TextMessage

messages = [TextMessage(role="user", content="Hello!")]

# Sequential: try each in order until one succeeds
response = await client.agent_run(
    AgentFallbackRequest(
        strategy=FallbackStrategy.SEQUENTIAL,
        requests=[
            AgentRequest(provider_uid="primary", model="gpt-4", messages=messages, tools=[]),
            AgentRequest(provider_uid="backup", model="claude-3", messages=messages, tools=[])
        ],
        timeout_per_request=30
    )
)

# Parallel: try all simultaneously, use first success
response = await client.agent_run(
    AgentFallbackRequest(
        strategy=FallbackStrategy.PARALLEL,
        requests=[
            AgentRequest(provider_uid="p1", model="gpt-4", messages=messages, tools=[]),
            AgentRequest(provider_uid="p2", model="claude-3", messages=messages, tools=[]),
            AgentRequest(provider_uid="p3", model="gemini-pro", messages=messages, tools=[])
        ],
        timeout_per_request=10
    )
)

# Also works for audio
from livellm.models import AudioFallbackRequest, SpeakMimeType, SpeakRequest

audio = await client.speak(
    AudioFallbackRequest(
        strategy=FallbackStrategy.SEQUENTIAL,
        requests=[
            SpeakRequest(provider_uid="elevenlabs", model="turbo", text="Hi", 
                        voice="rachel", mime_type=SpeakMimeType.MP3, sample_rate=44100),
            SpeakRequest(provider_uid="openai", model="tts-1", text="Hi",
                        voice="alloy", mime_type=SpeakMimeType.MP3, sample_rate=44100)
        ]
    )
)
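
The difference between the two strategies can be sketched with plain asyncio. This is an illustration of the semantics described above (sequential tries each in order, parallel takes the first success), not the server's implementation; sequential_fallback, parallel_fallback, and the demo coroutines are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

Attempt = Callable[[], Awaitable[str]]


async def sequential_fallback(attempts: Sequence[Attempt], timeout: float) -> str:
    """Try each attempt in order and return the first result that succeeds."""
    last_error: Exception | None = None
    for attempt in attempts:
        try:
            return await asyncio.wait_for(attempt(), timeout)
        except Exception as exc:  # includes timeouts
            last_error = exc
    raise RuntimeError("all attempts failed") from last_error


async def parallel_fallback(attempts: Sequence[Attempt], timeout: float) -> str:
    """Start every attempt at once and return the first one that succeeds."""
    tasks = [asyncio.create_task(asyncio.wait_for(a(), timeout)) for a in attempts]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                return await finished
            except Exception:
                continue  # this attempt failed; wait for the next to finish
        raise RuntimeError("all attempts failed")
    finally:
        for task in tasks:
            task.cancel()  # stop attempts that are still running
        await asyncio.gather(*tasks, return_exceptions=True)


async def _demo() -> tuple[str, str]:
    async def failing() -> str:
        raise ConnectionError("primary down")

    async def slow() -> str:
        await asyncio.sleep(0.2)
        return "slow"

    async def fast() -> str:
        await asyncio.sleep(0.01)
        return "fast"

    seq = await sequential_fallback([failing, slow], timeout=1.0)
    par = await parallel_fallback([failing, slow, fast], timeout=1.0)
    return seq, par


sequential_result, parallel_result = asyncio.run(_demo())
print(sequential_result, parallel_result)
```

Sequential fallback keeps provider usage low but adds the latency of every failed attempt, while parallel fallback trades extra provider calls for the fastest available answer.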

Resource Management

Recommended: Use context managers for automatic cleanup.

# ✅ Best: Context manager (auto cleanup)
async with LivellmClient(base_url="http://localhost:8000") as client:
    response = await client.ping()
# Configs deleted, connection closed automatically

# ✅ Good: Manual cleanup
client = LivellmClient(base_url="http://localhost:8000")
try:
    response = await client.ping()
finally:
    await client.cleanup()

# ⚠️ OK: Garbage collection (shows warning if configs exist)
client = LivellmClient(base_url="http://localhost:8000")
response = await client.ping()
# Cleaned up when object is destroyed
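
The context manager form is preferred because cleanup runs even when the body raises. The sketch below demonstrates that guarantee with a stand-in class; ManagedClient is hypothetical and only mirrors the cleanup() and async with pattern described above, not the internals of LivellmClient.

```python
import asyncio


class ManagedClient:
    """Hypothetical stand-in showing the cleanup pattern."""

    def __init__(self) -> None:
        self.closed = False

    async def cleanup(self) -> None:
        self.closed = True

    async def __aenter__(self) -> "ManagedClient":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Runs on normal exit and when the body raises; returning None
        # lets the original exception propagate to the caller.
        await self.cleanup()


async def _demo() -> bool:
    client = ManagedClient()
    try:
        async with client:
            raise RuntimeError("request failed")
    except RuntimeError:
        pass
    return client.closed


closed_after_error = asyncio.run(_demo())
print(closed_after_error)
```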

API Reference

Client Methods

Configuration

  • ping() - Health check
  • update_config(config) / update_configs(configs) - Add/update providers
  • get_configs() - List all configurations
  • delete_config(uid) - Remove provider

Agent

  • agent_run(request | **kwargs) - Run agent (non-streaming; returns the full response)
  • agent_run_stream(request | **kwargs) - Run agent (streaming)

Audio

  • speak(request | **kwargs) - Text-to-speech (non-streaming; returns the full audio)
  • speak_stream(request | **kwargs) - Text-to-speech (streaming)
  • transcribe(request | **kwargs) - Speech-to-text

Real-Time Transcription (TranscriptionWsClient)

  • connect() - Establish WebSocket connection
  • disconnect() - Close WebSocket connection
  • start_session(init_request, audio_source) - Start bidirectional streaming transcription
  • async with client: - Auto connection management (recommended)

Cleanup

  • cleanup() - Release resources
  • async with client: - Auto cleanup (recommended)

Key Models

Core

  • Settings(uid, provider, api_key, base_url?, blacklist_models?) - Provider config
  • ProviderKind - OPENAI | GOOGLE | ANTHROPIC | GROQ | ELEVENLABS

Messages

  • TextMessage(role, content) - Text message
  • BinaryMessage(role, content, mime_type, caption?) - Image/audio message
  • MessageRole - USER | MODEL | SYSTEM (or use strings: "user", "model", "system")

Requests

  • AgentRequest(provider_uid, model, messages, tools?, gen_config?)
  • SpeakRequest(provider_uid, model, text, voice, mime_type, sample_rate, gen_config?)
  • TranscribeRequest(provider_uid, file, model, language?, gen_config?)
  • TranscriptionInitWsRequest(provider_uid, model, language?, input_sample_rate?, input_audio_format?, gen_config?)
  • TranscriptionAudioChunkWsRequest(audio) - Audio chunk for streaming

Tools

  • WebSearchInput(kind=ToolKind.WEB_SEARCH, search_context_size)
  • MCPStreamableServerInput(kind=ToolKind.MCP_STREAMABLE_SERVER, url, prefix?, timeout?)

Fallback

  • AgentFallbackRequest(strategy, requests, timeout_per_request?)
  • AudioFallbackRequest(strategy, requests, timeout_per_request?)
  • FallbackStrategy - SEQUENTIAL | PARALLEL

Responses

  • AgentResponse(output, usage{input_tokens, output_tokens}, ...)
  • TranscribeResponse(text, language)
  • TranscriptionWsResponse(transcription, is_end) - Real-time transcription result

Error Handling

import httpx
from livellm.models import TextMessage

try:
    response = await client.agent_run(
        provider_uid="openai",
        model="gpt-4",
        messages=[TextMessage(role="user", content="Hi")]
    )
except httpx.HTTPStatusError as e:
    print(f"HTTP {e.response.status_code}: {e.response.text}")
except httpx.RequestError as e:
    print(f"Request failed: {e}")
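
Transient failures such as dropped connections are often worth retrying before giving up. Here is a minimal sketch of retrying an async call with exponential backoff and jitter; retry_async is a hypothetical helper, not part of livellm, and the demo uses ConnectionError in place of a real network error.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_async(
    call: Callable[[], Awaitable[T]],
    *,
    retry_on: tuple[type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Await call(), retrying on the given exceptions with growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            # Jitter spreads out retries from many clients.
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")


attempt_log: list[str] = []


async def flaky() -> str:
    """Fail twice, then succeed, to exercise the retry loop."""
    attempt_log.append("try")
    if len(attempt_log) < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = asyncio.run(
    retry_async(flaky, retry_on=(ConnectionError,), base_delay=0.01)
)
print(result, len(attempt_log))
```

With the client, one option is to pass a lambda wrapping client.agent_run(...) as call and retry_on=(httpx.RequestError,). Whether an httpx.HTTPStatusError should be retried depends on the status code, so it is left out here.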

Development

# Install with dev dependencies
pip install -e ".[testing]"

# Run tests
pytest tests/

# Type checking
mypy livellm

Requirements

  • Python 3.10+
  • httpx >= 0.27.0
  • pydantic >= 2.0.0
  • websockets >= 15.0.1

License

MIT License - see LICENSE file for details.
