A lightweight CLI tool and OpenAI-compatible server for querying multiple Large Language Model (LLM) providers


llms.py

Lightweight CLI, API and ChatGPT-like alternative to Open WebUI for accessing multiple LLMs, entirely offline, with all data kept private in browser storage.

Configure additional providers and models in llms.json

Mix and match local models with models from different API providers
Requests are automatically routed to available providers that support the requested model (in defined order)
Define free/cheapest/local providers first to save on costs
Any failures are automatically retried on the next available provider

Features

Lightweight: Single llms.py Python file with single aiohttp dependency (Pillow optional)
Multi-Provider Support: OpenRouter, Ollama, Anthropic, Google, OpenAI, Grok, Groq, Qwen, Z.ai, Mistral
OpenAI-Compatible API: Works with any client that supports OpenAI's chat completion API
Built-in Analytics: Built-in analytics UI to visualize costs, requests, and token usage
GitHub OAuth: Optionally secure your web UI and restrict access to specified GitHub users
Configuration Management: Easy provider enable/disable and configuration management
CLI Interface: Simple command-line interface for quick interactions
Server Mode: Run an OpenAI-compatible HTTP server at http://localhost:{PORT}/v1/chat/completions
Image Support: Process images through vision-capable models
- Automatically resizes and converts images to WebP if they exceed configured limits
Audio Support: Process audio through audio-capable models
Custom Chat Templates: Configurable chat completion request templates for different modalities
Auto-Discovery: Automatically discover available Ollama models
Unified Models: Define custom model names that map to different provider-specific names
Multi-Model Support: Support for 160+ different LLMs

llms.py UI

Access all your local and remote LLMs with a single ChatGPT-like UI:

Dark Mode Support

Monthly Costs Analysis

Monthly Token Usage (Dark Mode)

Monthly Activity Log

More Features and Screenshots.

Check Provider Reliability and Response Times

Check the status of configured providers to test whether they're configured correctly and reachable, and what their response time is for the simplest 1+1= request:

# Check all models for a provider:
llms --check groq

# Check specific models for a provider:
llms --check groq kimi-k2 llama4:400b gpt-oss:120b

Because these checks are a good indicator of the reliability and speed you can expect from different providers, we've created a test-providers.yml GitHub Action that tests the response times for all configured providers and models. The results are frequently published to /checks/latest.txt

Change Log

v2.0.30 (2025-11-01)

Improved Responsive Layout with collapsible Sidebar
Watching config files for changes and auto-reloading
Add cancel button to cancel pending request
Return focus to textarea after request completes
Clicking outside model or system prompt selector will collapse it
Clicking on selected item no longer deselects it
Support VERBOSE=1 for enabling --verbose mode (useful in Docker)

v2.0.28 (2025-10-31)

Dark Mode
Drag n' Drop files in Message prompt
Copy & Paste files in Message prompt
Support for GitHub OAuth, optionally restricting access to specified users
Support for Docker and Docker Compose

llms.py Releases

Installation

Using pip

pip install llms-py

Quick Start

1. Set API Keys

Set environment variables for the providers you want to use:

export OPENROUTER_API_KEY="..."

Provider	Variable	Description	Example
openrouter_free	`OPENROUTER_API_KEY`	OpenRouter FREE models API key	`sk-or-...`
groq	`GROQ_API_KEY`	Groq API key	`gsk_...`
google_free	`GOOGLE_FREE_API_KEY`	Google FREE API key	`AIza...`
codestral	`CODESTRAL_API_KEY`	Codestral API key	`...`
ollama	N/A	No API key required
openrouter	`OPENROUTER_API_KEY`	OpenRouter API key	`sk-or-...`
google	`GOOGLE_API_KEY`	Google API key	`AIza...`
anthropic	`ANTHROPIC_API_KEY`	Anthropic API key	`sk-ant-...`
openai	`OPENAI_API_KEY`	OpenAI API key	`sk-...`
grok	`GROK_API_KEY`	Grok (X.AI) API key	`xai-...`
qwen	`DASHSCOPE_API_KEY`	Qwen (Alibaba) API key	`sk-...`
z.ai	`ZAI_API_KEY`	Z.ai API key	`sk-...`
mistral	`MISTRAL_API_KEY`	Mistral API key	`...`

2. Run Server

Start the UI and an OpenAI compatible API on port 8000:

llms --serve 8000

Launches UI at http://localhost:8000 and OpenAI Endpoint at http://localhost:8000/v1/chat/completions.

To see detailed request/response logging, add --verbose:

llms --serve 8000 --verbose

Use llms.py CLI

llms "What is the capital of France?"

Enable Providers

Any providers that are enabled in llms.json and have their API keys set are automatically made available.

Providers can be enabled or disabled in the UI at runtime next to the model selector, or on the command line:

# Disable free providers with free models and free tiers
llms --disable openrouter_free codestral google_free groq

# Enable paid providers
llms --enable openrouter anthropic google openai grok z.ai qwen mistral

Using Docker

a) Simple - Run in a Docker container:

Run the server on port 8000:

docker run -p 8000:8000 -e GROQ_API_KEY=$GROQ_API_KEY ghcr.io/servicestack/llms:latest

Get the latest version:

docker pull ghcr.io/servicestack/llms:latest

Use custom llms.json and ui.json config files outside of the container (auto created if they don't exist):

docker run -p 8000:8000 -e GROQ_API_KEY=$GROQ_API_KEY \
  -v ~/.llms:/home/llms/.llms \
  ghcr.io/servicestack/llms:latest

b) Recommended - Use Docker Compose:

Download and use docker-compose.yml:

curl -O https://raw.githubusercontent.com/ServiceStack/llms/refs/heads/main/docker-compose.yml

Update API Keys in docker-compose.yml then start the server:

docker-compose up -d

c) Build and run local Docker image from source:

git clone https://github.com/ServiceStack/llms

docker-compose -f docker-compose.local.yml up -d --build

After the container starts, you can access the UI and API at http://localhost:8000.

See DOCKER.md for detailed instructions on customizing configuration files.

GitHub OAuth Authentication

llms.py supports optional GitHub OAuth authentication to secure your web UI and API endpoints. When enabled, users must sign in with their GitHub account before accessing the application.

{
    "auth": {
        "enabled": true,
        "github": {
            "client_id": "$GITHUB_CLIENT_ID",
            "client_secret": "$GITHUB_CLIENT_SECRET",
            "redirect_uri": "http://localhost:8000/auth/github/callback",
            "restrict_to": "$GITHUB_USERS"
        }
    }
}

GITHUB_USERS is optional; if set, only the specified users are allowed access.

See GITHUB_OAUTH_SETUP.md for detailed setup instructions.

Configuration

The configuration file llms.json is saved to ~/.llms/llms.json and defines available providers, models, and default settings. If it doesn't exist, llms.json is auto created with the latest configuration, so you can re-create it by deleting your local config (e.g. rm -rf ~/.llms).

Key sections:

Defaults

headers: Common HTTP headers for all requests
text: Default chat completion request template for text prompts
image: Default chat completion request template for image prompts
audio: Default chat completion request template for audio prompts
file: Default chat completion request template for file prompts
check: Check request template for testing provider connectivity
limits: Override Request size limits
convert: Max image size and length limits and auto conversion settings

Providers

Each provider configuration includes:

enabled: Whether the provider is active
type: Provider class (OpenAiProvider, GoogleProvider, etc.)
api_key: API key (supports environment variables with $VAR_NAME)
base_url: API endpoint URL
models: Model name mappings (local name → provider name)
pricing: Pricing per token (input/output) for each model
default_pricing: Default pricing if not specified in pricing
check: Check request template for testing provider connectivity
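
As a rough illustration of how these provider fields fit together, the sketch below takes a config shaped like the entries above and reports which providers would be usable. The helper names (`resolve_api_key`, `available_providers`) are hypothetical, and the rule that a provider with no `api_key` field (such as a local Ollama) needs no key is an assumption rather than llms.py's actual implementation.

```python
import os


def resolve_api_key(value, env=None):
    """Expand a "$VAR_NAME" reference from the environment; return literals as-is."""
    env = os.environ if env is None else env
    if isinstance(value, str) and value.startswith("$"):
        return env.get(value[1:])  # None when the variable is unset
    return value


def available_providers(config, env=None):
    """Return names of providers that are enabled and have a usable API key.

    Assumption: a provider without an "api_key" field (e.g. local Ollama)
    does not need a key.
    """
    names = []
    for name, provider in config.get("providers", {}).items():
        if not provider.get("enabled"):
            continue
        if "api_key" in provider and not resolve_api_key(provider["api_key"], env):
            continue
        names.append(name)
    return names


config = {
    "providers": {
        "groq": {"enabled": True, "type": "OpenAiProvider", "api_key": "$GROQ_API_KEY"},
        "openai": {"enabled": False, "type": "OpenAiProvider", "api_key": "$OPENAI_API_KEY"},
        "ollama": {"enabled": True, "type": "OllamaProvider"},
    }
}

# Pass a fake environment so the example does not depend on real keys.
print(available_providers(config, env={"GROQ_API_KEY": "gsk_example"}))
```

Passing `env` explicitly keeps the example independent of your real environment; omit it to read from `os.environ`.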

Command Line Usage

Basic Chat

# Simple question
llms "Explain quantum computing"

# With specific model
llms -m gemini-2.5-pro "Write a Python function to sort a list"
llms -m grok-4 "Explain this code with humor"
llms -m qwen3-max "Translate this to Chinese"

# With system prompt
llms -s "You are a helpful coding assistant" "How do I reverse a string in Python?"

# With image (vision models)
llms --image image.jpg "What's in this image?"
llms --image https://example.com/photo.png "Describe this photo"

# Display full JSON Response
llms "Explain quantum computing" --raw

Using a Chat Template

By default llms uses the defaults/text chat completion request defined in llms.json.

You can instead use a custom chat completion request with --chat, e.g.:

# Load chat completion request from JSON file
llms --chat request.json

# Override user message
llms --chat request.json "New user message"

# Override model
llms -m kimi-k2 --chat request.json

Example request.json:

{
  "model": "kimi-k2",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user",   "content": ""}
  ],
  "temperature": 0.7,
  "max_tokens": 150
}
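
To make the override behaviour above more concrete, here is a minimal sketch of how a template's model and user message could be replaced before sending. The `apply_overrides` helper is hypothetical, and replacing the last user message is an assumption about how such an override might work, not a description of llms.py internals.

```python
import copy
import json


def apply_overrides(template, user_message=None, model=None):
    """Return a copy of a chat request with the model and/or last user message replaced."""
    request = copy.deepcopy(template)  # leave the loaded template untouched
    if model:
        request["model"] = model
    if user_message:
        # Walk backwards to find the most recent user message.
        for message in reversed(request["messages"]):
            if message["role"] == "user":
                message["content"] = user_message
                break
    return request


template = json.loads("""
{
  "model": "kimi-k2",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user",   "content": ""}
  ],
  "temperature": 0.7,
  "max_tokens": 150
}
""")

request = apply_overrides(template, user_message="New user message")
print(json.dumps(request, indent=2))
```

Other fields in the template, such as `temperature` and `max_tokens`, pass through unchanged.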

Image Requests

Send images to vision-capable models using the --image option:

# Use defaults/image Chat Template (Describe the key features of the input image)
llms --image ./screenshot.png

# Local image file
llms --image ./screenshot.png "What's in this image?"

# Remote image URL
llms --image https://example.org/photo.jpg "Describe this photo"

# Data URI
llms --image "data:image/png;base64,$(base64 -w 0 image.png)" "Describe this image"

# With a specific vision model
llms -m gemini-2.5-flash --image chart.png "Analyze this chart"
llms -m qwen2.5vl --image document.jpg "Extract text from this document"

# Combined with system prompt
llms -s "You are a data analyst" --image graph.png "What trends do you see?"

# With custom chat template
llms --chat image-request.json --image photo.jpg

Example of image-request.json:

{
    "model": "qwen2.5vl",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": ""
                    }
                },
                {
                    "type": "text",
                    "text": "Caption this image"
                }
            ]
        }
    ]
}

Supported image formats: PNG, WEBP, JPG, JPEG, GIF, BMP, TIFF, ICO

Image sources:

Local files: Absolute paths (/path/to/image.jpg) or relative paths (./image.png, ../image.jpg)
Remote URLs: HTTP/HTTPS URLs are automatically downloaded
Data URIs: Base64-encoded images (data:image/png;base64,...)

Images are automatically processed and converted to base64 data URIs before being sent to the model.
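
The conversion to a base64 data URI can be sketched with the Python standard library. This is only an illustration: `to_data_uri` and `image_part` are hypothetical helper names, and the resizing and WebP conversion mentioned under Features are not shown.

```python
import base64
import mimetypes


def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw bytes as a data URI, guessing the MIME type from the file name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        mime_type = "application/octet-stream"  # fallback for unknown extensions
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_part(data: bytes, filename: str) -> dict:
    """Build an image_url content part like the one in image-request.json."""
    return {"type": "image_url", "image_url": {"url": to_data_uri(data, filename)}}


# The 8-byte PNG signature stands in for a real image file here.
png_signature = b"\x89PNG\r\n\x1a\n"
print(to_data_uri(png_signature, "screenshot.png"))
```

For a real file you would pass `pathlib.Path("screenshot.png").read_bytes()` as `data`.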

Vision-Capable Models

Popular models that support image analysis:

OpenAI: GPT-4o, GPT-4o-mini, GPT-4.1
Anthropic: Claude Sonnet 4.0, Claude Opus 4.1
Google: Gemini 2.5 Pro, Gemini Flash
Qwen: Qwen2.5-VL, Qwen3-VL, QVQ-max
Ollama: qwen2.5vl, llava

Images are automatically downloaded and converted to base64 data URIs.

Audio Requests

Send audio files to audio-capable models using the --audio option:

# Use defaults/audio Chat Template (Transcribe the audio)
llms --audio ./recording.mp3

# Local audio file
llms --audio ./meeting.wav "Summarize this meeting recording"

# Remote audio URL
llms --audio https://example.org/podcast.mp3 "What are the key points discussed?"

# With a specific audio model
llms -m gpt-4o-audio-preview --audio interview.mp3 "Extract the main topics"
llms -m gemini-2.5-flash --audio interview.mp3 "Extract the main topics"

# Combined with system prompt
llms -s "You're a transcription specialist" --audio talk.mp3 "Provide a detailed transcript"

# With custom chat template
llms --chat audio-request.json --audio speech.wav

Example of audio-request.json:

{
    "model": "gpt-4o-audio-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "",
                        "format": "mp3"
                    }
                },
                {
                    "type": "text",
                    "text": "Please transcribe this audio"
                }
            ]
        }
    ]
}

Supported audio formats: MP3, WAV

Audio sources:

Local files: Absolute paths (/path/to/audio.mp3) or relative paths (./audio.wav, ../recording.mp3)
Remote URLs: HTTP/HTTPS URLs are automatically downloaded
Base64 Data: Base64-encoded audio

Audio files are automatically processed and converted to base64 data before being sent to the model.
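
Unlike images, the audio template carries raw base64 data plus a separate `format` field. A minimal sketch of building that content part follows; the `audio_part` helper is hypothetical, and deriving the format from the file extension is an assumption about how format detection might be done.

```python
import base64
from pathlib import PurePath

# Formats listed as supported above.
SUPPORTED_AUDIO_FORMATS = {"mp3", "wav"}


def audio_part(data: bytes, filename: str) -> dict:
    """Build an input_audio content part like the one in audio-request.json."""
    fmt = PurePath(filename).suffix.lstrip(".").lower()
    if fmt not in SUPPORTED_AUDIO_FORMATS:
        raise ValueError(f"Unsupported audio format: {fmt or 'unknown'}")
    return {
        "type": "input_audio",
        "input_audio": {
            "data": base64.b64encode(data).decode("ascii"),
            "format": fmt,
        },
    }


# b"RIFF" is the first four bytes of a WAV file, used here as placeholder data.
part = audio_part(b"RIFF", "meeting.wav")
print(part["input_audio"]["format"])
```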

Audio-Capable Models

Popular models that support audio processing:

OpenAI: gpt-4o-audio-preview
Google: gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite

Audio files are automatically downloaded and converted to base64 data URIs with appropriate format detection.

File Requests

Send documents (e.g. PDFs) to file-capable models using the --file option:

# Use defaults/file Chat Template (Summarize the document)
llms --file ./docs/handbook.pdf

# Local PDF file
llms --file ./docs/policy.pdf "Summarize the key changes"

# Remote PDF URL
llms --file https://example.org/whitepaper.pdf "What are the main findings?"

# With specific file-capable models
llms -m gpt-5               --file ./policy.pdf   "Summarize the key changes"
llms -m gemini-flash-latest --file ./report.pdf   "Extract action items"
llms -m qwen2.5vl           --file ./manual.pdf   "List key sections and their purpose"

# Combined with system prompt
llms -s "You're a compliance analyst" --file ./policy.pdf "Identify compliance risks"

# With custom chat template
llms --chat file-request.json --file ./docs/handbook.pdf

Example of file-request.json:

{
  "model": "gpt-5",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "file",
          "file": {
            "filename": "",
            "file_data": ""
          }
        },
        {
          "type": "text",
          "text": "Please summarize this document"
        }
      ]
    }
  ]
}

Supported file formats: PDF

Other document types may work depending on the model/provider.

File sources:

Local files: Absolute paths (/path/to/file.pdf) or relative paths (./file.pdf, ../file.pdf)
Remote URLs: HTTP/HTTPS URLs are automatically downloaded
Base64/Data URIs: Inline data:application/pdf;base64,... is supported

Files are automatically downloaded (for URLs) and converted to base64 data URIs before being sent to the model.

File-Capable Models

Popular multi-modal models that support file (PDF) inputs:

OpenAI: gpt-5, gpt-5-mini, gpt-4o, gpt-4o-mini
Google: gemini-flash-latest, gemini-2.5-flash-lite
Grok: grok-4-fast (OpenRouter)
Qwen: qwen2.5vl, qwen3-max, qwen3-vl:235b, qwen3-coder, qwen3-coder-flash (OpenRouter)
Others: kimi-k2, glm-4.5-air, deepseek-v3.1:671b, llama4:400b, llama3.3:70b, mai-ds-r1, nemotron-nano:9b

Server Mode

Run as an OpenAI-compatible HTTP server:

# Start server on port 8000
llms --serve 8000

The server exposes a single endpoint:

POST /v1/chat/completions - OpenAI-compatible chat completions

Example client usage:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
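
The same request can be made from Python using only the standard library. This is a sketch that assumes the server was started with `llms --serve 8000` and returns the standard OpenAI chat completion response shape; `build_chat_request` and `chat` are illustrative names.

```python
import json
import urllib.request


def build_chat_request(prompt, model="kimi-k2", base_url="http://localhost:8000"):
    """Build a POST request equivalent to the curl example above."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as response:
        body = json.load(response)
    # Assumes an OpenAI-style response: choices[0].message.content
    return body["choices"][0]["message"]["content"]


request = build_chat_request("Hello!")
print(request.get_method(), request.full_url)
```

With the server running, `print(chat("Hello!"))` sends the request and prints the reply.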

Configuration Management

# List enabled providers and models
llms --list
llms ls

# List specific providers
llms ls ollama
llms ls google anthropic

# Enable providers
llms --enable openrouter
llms --enable anthropic google_free groq

# Disable providers
llms --disable ollama
llms --disable openai anthropic

# Set default model
llms --default grok-4

Update

pip install llms-py --upgrade

Advanced Options

# Use custom config file
llms --config /path/to/config.json "Hello"

# Get raw JSON response
llms --raw "What is 2+2?"

# Enable verbose logging
llms --verbose "Tell me a joke"

# Custom log prefix
llms --verbose --logprefix "[DEBUG] " "Hello world"

# Set default model (updates config file)
llms --default grok-4

# Pass custom parameters to chat request (URL-encoded)
llms --args "temperature=0.7&seed=111" "What is 2+2?"

# Multiple parameters with different types
llms --args "temperature=0.5&max_completion_tokens=50" "Tell me a joke"

# URL-encoded special characters (stop sequences)
llms --args "stop=Two,Words" "Count to 5"

# Combine with other options
llms --system "You are helpful" --args "temperature=0.3" --raw "Hello"

Custom Parameters with `--args`

The --args option allows you to pass URL-encoded parameters to customize the chat request sent to LLM providers:

Parameter Types:

Floats: temperature=0.7, frequency_penalty=0.2
Integers: max_completion_tokens=100
Booleans: store=true, verbose=false, logprobs=true
Strings: stop=one
Lists: stop=two,words

Common Parameters:

temperature: Controls randomness (0.0 to 2.0)
max_completion_tokens: Maximum tokens in response
seed: For reproducible outputs
top_p: Nucleus sampling parameter
stop: Stop sequences (URL-encode special chars)
store: Whether or not to store the output
frequency_penalty: Penalize new tokens based on frequency
presence_penalty: Penalize new tokens based on presence
logprobs: Include log probabilities in response
parallel_tool_calls: Enable parallel tool calls
prompt_cache_key: Cache key for prompt
reasoning_effort: Reasoning effort (low, medium, high, *minimal, *none, *default)
safety_identifier: A string that uniquely identifies each user
service_tier: Service tier (free, standard, premium, *default)
top_logprobs: Number of top logprobs to return
verbosity: Verbosity level (0, 1, 2, 3, *default)
enable_thinking: Enable thinking mode (Qwen)
stream: Enable streaming responses
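
The parameter types above suggest a simple conversion from a URL-encoded string to typed values. The sketch below shows one way this could be done with `urllib.parse.parse_qsl`; the `parse_args` and `parse_value` helpers are hypothetical, and the actual coercion rules in llms.py may differ.

```python
from urllib.parse import parse_qsl


def parse_value(text):
    """Coerce a string to bool, int or float where possible, else keep it as a string."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def parse_args(params):
    """Turn "a=1&b=x,y" into a dict of typed values; comma-separated values become lists."""
    result = {}
    for key, raw in parse_qsl(params, keep_blank_values=True):
        if "," in raw:
            result[key] = [parse_value(item) for item in raw.split(",")]
        else:
            result[key] = parse_value(raw)
    return result


print(parse_args("temperature=0.7&seed=111&store=true&stop=two,words"))
```

One limitation of this sketch is that values are split on commas after URL decoding, so an encoded comma (`%2C`) is also treated as a list separator.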

Default Model Configuration

The --default MODEL option allows you to set the default model used for all chat completions. This updates the defaults.text.model field in your configuration file:

# Set default model to gpt-oss
llms --default gpt-oss:120b

# Set default model to Claude Sonnet
llms --default claude-sonnet-4-0

# The model must be available in your enabled providers
llms --default gemini-2.5-pro

When you set a default model:

The configuration file (~/.llms/llms.json) is automatically updated
The specified model becomes the default for all future chat requests
The model must exist in your currently enabled providers
You can still override the default using -m MODEL for individual requests
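
The effect on the configuration file can be sketched as follows. This is an illustration only: `set_default_model` is a hypothetical helper, and checking availability against each enabled provider's `models` mapping is an assumption (it ignores auto-discovered Ollama models, for example).

```python
import json
import tempfile
from pathlib import Path


def set_default_model(config_path, model):
    """Set defaults.text.model in a config file if an enabled provider offers the model."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    known = {
        name
        for provider in config.get("providers", {}).values()
        if provider.get("enabled")
        for name in provider.get("models", {})
    }
    if model not in known:
        raise ValueError(f"Model {model!r} is not available in any enabled provider")
    config.setdefault("defaults", {}).setdefault("text", {})["model"] = model
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Demonstrate against a throwaway file rather than ~/.llms/llms.json.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "llms.json"
    demo_config = {
        "defaults": {"text": {"model": "kimi-k2"}},
        "providers": {
            "groq": {
                "enabled": True,
                "models": {
                    "kimi-k2": "moonshotai/kimi-k2-instruct-0905",
                    "gpt-oss:120b": "openai/gpt-oss-120b",
                },
            }
        },
    }
    demo_path.write_text(json.dumps(demo_config), encoding="utf-8")
    updated = set_default_model(demo_path, "gpt-oss:120b")
    print(updated["defaults"]["text"]["model"])
```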

Beautifully rendered Markdown

Pipe Markdown output to glow to beautifully render it in the terminal:

llms "Explain quantum computing" | glow

Supported Providers

Any OpenAI-compatible provider and its models can be added by configuring them in llms.json. By default, only AI providers with free tiers are enabled, and they only become "available" once their API key is set.

You can list the available providers, their models and which are enabled or disabled with:

llms ls

They can be enabled/disabled in your llms.json file or with:

llms --enable <provider>
llms --disable <provider>

For a provider to be available, its API key must also be configured, either in your environment variables or directly in your llms.json.

Environment Variables

Provider	Variable	Description	Example
openrouter_free	`OPENROUTER_API_KEY`	OpenRouter FREE models API key	`sk-or-...`
groq	`GROQ_API_KEY`	Groq API key	`gsk_...`
google_free	`GOOGLE_FREE_API_KEY`	Google FREE API key	`AIza...`
codestral	`CODESTRAL_API_KEY`	Codestral API key	`...`
ollama	N/A	No API key required
openrouter	`OPENROUTER_API_KEY`	OpenRouter API key	`sk-or-...`
google	`GOOGLE_API_KEY`	Google API key	`AIza...`
anthropic	`ANTHROPIC_API_KEY`	Anthropic API key	`sk-ant-...`
openai	`OPENAI_API_KEY`	OpenAI API key	`sk-...`
grok	`GROK_API_KEY`	Grok (X.AI) API key	`xai-...`
qwen	`DASHSCOPE_API_KEY`	Qwen (Alibaba) API key	`sk-...`
z.ai	`ZAI_API_KEY`	Z.ai API key	`sk-...`
mistral	`MISTRAL_API_KEY`	Mistral API key	`...`

OpenAI

Type: OpenAiProvider
Models: GPT-5, GPT-5 Codex, GPT-4o, GPT-4o-mini, o3, etc.
Features: Text, images, function calling

export OPENAI_API_KEY="your-key"
llms --enable openai

Anthropic (Claude)

Type: OpenAiProvider
Models: Claude Opus 4.1, Sonnet 4.0, Haiku 3.5, etc.
Features: Text, images, large context windows

export ANTHROPIC_API_KEY="your-key"
llms --enable anthropic

Google Gemini

Type: GoogleProvider
Models: Gemini 2.5 Pro, Flash, Flash-Lite
Features: Text, images, safety settings

export GOOGLE_API_KEY="your-key"
llms --enable google

OpenRouter

Type: OpenAiProvider
Models: 100+ models from various providers
Features: Access to latest models, free tier available

export OPENROUTER_API_KEY="your-key"
llms --enable openrouter

Grok (X.AI)

Type: OpenAiProvider
Models: Grok-4, Grok-3, Grok-3-mini, Grok-code-fast-1, etc.
Features: Real-time information, humor, uncensored responses

export GROK_API_KEY="your-key"
llms --enable grok

Groq

Type: OpenAiProvider
Models: Llama 3.3, Gemma 2, Kimi K2, etc.
Features: Fast inference, competitive pricing

export GROQ_API_KEY="your-key"
llms --enable groq

Ollama (Local)

Type: OllamaProvider
Models: Auto-discovered from local Ollama installation
Features: Local inference, privacy, no API costs

# Ollama must be running locally
llms --enable ollama

Qwen (Alibaba Cloud)

Type: OpenAiProvider
Models: Qwen3-max, Qwen-max, Qwen-plus, Qwen2.5-VL, QwQ-plus, etc.
Features: Multilingual, vision models, coding, reasoning, audio processing

export DASHSCOPE_API_KEY="your-key"
llms --enable qwen

Z.ai

Type: OpenAiProvider
Models: GLM-4.6, GLM-4.5, GLM-4.5-air, GLM-4.5-x, GLM-4.5-airx, GLM-4.5-flash, GLM-4:32b
Features: Advanced language models with strong reasoning capabilities

export ZAI_API_KEY="your-key"
llms --enable z.ai

Mistral

Type: OpenAiProvider
Models: Mistral Large, Codestral, Pixtral, etc.
Features: Code generation, multilingual

export MISTRAL_API_KEY="your-key"
llms --enable mistral

Codestral

Type: OpenAiProvider
Models: Codestral
Features: Code generation

export CODESTRAL_API_KEY="your-key"
llms --enable codestral

Model Routing

The tool automatically routes requests to the first available provider that supports the requested model. If a provider fails, it tries the next available provider with that model.

Example: If OpenRouter (free), Groq and OpenRouter (paid) all support kimi-k2 and are defined in that order, the request first tries OpenRouter (free), then falls back to Groq, then to OpenRouter (paid) if the earlier requests fail.
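
The routing and fallback behaviour described here can be sketched in a few lines. This is a simplified illustration, not the tool's actual code: `route` and `fake_send` are hypothetical names, the sending function is supplied by the caller, and the `moonshotai/kimi-k2:free` model ID is made up for the example.

```python
def route(model, providers, send):
    """Try each enabled provider that maps `model`, in defined order.

    `providers` mirrors the "providers" section of llms.json.
    `send(provider_name, provider_model)` is a caller-supplied function that
    performs the request and raises an exception on failure.
    """
    errors = []
    for name, provider in providers.items():  # dicts preserve insertion order
        provider_model = provider.get("models", {}).get(model)
        if not provider.get("enabled") or provider_model is None:
            continue
        try:
            return name, send(name, provider_model)
        except Exception as error:  # any failure moves on to the next provider
            errors.append(f"{name}: {error}")
    raise RuntimeError(f"No provider could serve {model!r}: {errors}")


providers = {
    "openrouter_free": {"enabled": True, "models": {"kimi-k2": "moonshotai/kimi-k2:free"}},
    "groq": {"enabled": True, "models": {"kimi-k2": "moonshotai/kimi-k2-instruct-0905"}},
    "openrouter": {"enabled": True, "models": {"kimi-k2": "moonshotai/kimi-k2"}},
}


def fake_send(name, provider_model):
    """Stand-in for a real HTTP call: the first provider fails, the rest succeed."""
    if name == "openrouter_free":
        raise ConnectionError("rate limited")
    return f"reply from {provider_model}"


print(route("kimi-k2", providers, fake_send))
```

Because the first provider fails in this simulation, the request is served by the second one in the defined order.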

Configuration Examples

Minimal Configuration

{
  "defaults": {
    "headers": {"Content-Type": "application/json"},
    "text": {
      "model": "kimi-k2",
      "messages": [{"role": "user", "content": ""}]
    }
  },
  "providers": {
    "groq": {
      "enabled": true,
      "type": "OpenAiProvider",
      "base_url": "https://api.groq.com/openai",
      "api_key": "$GROQ_API_KEY",
      "models": {
        "llama3.3:70b": "llama-3.3-70b-versatile",
        "llama4:109b": "meta-llama/llama-4-scout-17b-16e-instruct",
        "llama4:400b": "meta-llama/llama-4-maverick-17b-128e-instruct",
        "kimi-k2": "moonshotai/kimi-k2-instruct-0905",
        "gpt-oss:120b": "openai/gpt-oss-120b",
        "gpt-oss:20b": "openai/gpt-oss-20b",
        "qwen3:32b": "qwen/qwen3-32b"
      }
    }
  }
}

Multi-Provider Setup

{
  "providers": {
    "openrouter": {
      "enabled": false,
      "type": "OpenAiProvider",
      "base_url": "https://openrouter.ai/api",
      "api_key": "$OPENROUTER_API_KEY",
      "models": {
        "grok-4": "x-ai/grok-4",
        "glm-4.5-air": "z-ai/glm-4.5-air",
        "kimi-k2": "moonshotai/kimi-k2",
        "deepseek-v3.1:671b": "deepseek/deepseek-chat",
        "llama4:400b": "meta-llama/llama-4-maverick"
      }
    },
    "anthropic": {
      "enabled": false,
      "type": "OpenAiProvider",
      "base_url": "https://api.anthropic.com",
      "api_key": "$ANTHROPIC_API_KEY",
      "models": {
        "claude-sonnet-4-0": "claude-sonnet-4-0"
      }
    },
    "ollama": {
      "enabled": false,
      "type": "OllamaProvider",
      "base_url": "http://localhost:11434",
      "models": {},
      "all_models": true
    }
  }
}

Usage

usage: llms [-h] [--config FILE] [-m MODEL] [--chat REQUEST] [-s PROMPT] [--image IMAGE] [--audio AUDIO] [--file FILE]
            [--args PARAMS] [--raw] [--list] [--check PROVIDER] [--serve PORT] [--enable PROVIDER] [--disable PROVIDER]
            [--default MODEL] [--init] [--root PATH] [--logprefix PREFIX] [--verbose]

llms v2.0.24

options:
  -h, --help            show this help message and exit
  --config FILE         Path to config file
  -m, --model MODEL     Model to use
  --chat REQUEST        OpenAI Chat Completion Request to send
  -s, --system PROMPT   System prompt to use for chat completion
  --image IMAGE         Image input to use in chat completion
  --audio AUDIO         Audio input to use in chat completion
  --file FILE           File input to use in chat completion
  --args PARAMS         URL-encoded parameters to add to chat request (e.g. "temperature=0.7&seed=111")
  --raw                 Return raw AI JSON response
  --list                Show list of enabled providers and their models (alias ls provider?)
  --check PROVIDER      Check validity of models for a provider
  --serve PORT          Port to start an OpenAI Chat compatible server on
  --enable PROVIDER     Enable a provider
  --disable PROVIDER    Disable a provider
  --default MODEL       Configure the default model to use
  --init                Create a default llms.json
  --root PATH           Change root directory for UI files
  --logprefix PREFIX    Prefix used in log messages
  --verbose             Verbose output

Docker Deployment

Quick Start with Docker

The easiest way to run llms-py is using Docker:

# Using docker-compose (recommended)
docker-compose up -d

# Or pull and run directly
docker run -p 8000:8000 \
  -e OPENROUTER_API_KEY="your-key" \
  ghcr.io/servicestack/llms:latest

Docker Images

Pre-built Docker images are automatically published to GitHub Container Registry:

Latest stable: ghcr.io/servicestack/llms:latest
Specific version: ghcr.io/servicestack/llms:v2.0.24
Main branch: ghcr.io/servicestack/llms:main

Environment Variables

Pass API keys as environment variables:

docker run -p 8000:8000 \
  -e OPENROUTER_API_KEY="sk-or-..." \
  -e GROQ_API_KEY="gsk_..." \
  -e GOOGLE_FREE_API_KEY="AIza..." \
  -e ANTHROPIC_API_KEY="sk-ant-..." \
  -e OPENAI_API_KEY="sk-..." \
  ghcr.io/servicestack/llms:latest

Using docker-compose

Create a docker-compose.yml file (or use the one in the repository):

version: '3.8'

services:
  llms:
    image: ghcr.io/servicestack/llms:latest
    ports:
      - "8000:8000"
    environment:
      - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}
      - GROQ_API_KEY=${GROQ_API_KEY}
      - GOOGLE_FREE_API_KEY=${GOOGLE_FREE_API_KEY}
    volumes:
      - llms-data:/home/llms/.llms
    restart: unless-stopped

volumes:
  llms-data:

Create a .env file with your API keys:

OPENROUTER_API_KEY=sk-or-...
GROQ_API_KEY=gsk_...
GOOGLE_FREE_API_KEY=AIza...

Start the service:

docker-compose up -d

Building Locally

Build the Docker image from source:

# Using the build script
./docker-build.sh

# Or manually
docker build -t llms-py:latest .

# Run your local build
docker run -p 8000:8000 \
  -e OPENROUTER_API_KEY="your-key" \
  llms-py:latest

Volume Mounting

To persist configuration and analytics data between container restarts:

# Using a named volume (recommended)
docker run -p 8000:8000 \
  -v llms-data:/home/llms/.llms \
  -e OPENROUTER_API_KEY="your-key" \
  ghcr.io/servicestack/llms:latest

# Or mount a local directory
docker run -p 8000:8000 \
  -v $(pwd)/llms-config:/home/llms/.llms \
  -e OPENROUTER_API_KEY="your-key" \
  ghcr.io/servicestack/llms:latest

Custom Configuration Files

Customize llms-py behavior by providing your own llms.json and ui.json files:

Option 1: Mount a directory with custom configs

# Create config directory with your custom files
mkdir -p config
# Add your custom llms.json and ui.json to config/

# Mount the directory
docker run -p 8000:8000 \
  -v $(pwd)/config:/home/llms/.llms \
  -e OPENROUTER_API_KEY="your-key" \
  ghcr.io/servicestack/llms:latest

Option 2: Mount individual config files

docker run -p 8000:8000 \
  -v $(pwd)/my-llms.json:/home/llms/.llms/llms.json:ro \
  -v $(pwd)/my-ui.json:/home/llms/.llms/ui.json:ro \
  -e OPENROUTER_API_KEY="your-key" \
  ghcr.io/servicestack/llms:latest

With docker-compose:

volumes:
  # Use local directory
  - ./config:/home/llms/.llms

  # Or mount individual files
  # - ./my-llms.json:/home/llms/.llms/llms.json:ro
  # - ./my-ui.json:/home/llms/.llms/ui.json:ro

The container will auto-create default config files on first run if they don't exist. You can customize these to:

Enable/disable specific providers
Add or remove models
Configure API endpoints
Set custom pricing
Customize chat templates
Configure UI settings

See DOCKER.md for detailed configuration examples.

Custom Port

Change the port mapping to run on a different port:

# Run on port 3000 instead of 8000
docker run -p 3000:8000 \
  -e OPENROUTER_API_KEY="your-key" \
  ghcr.io/servicestack/llms:latest

Docker CLI Usage

You can also use the Docker container for CLI commands:

# Run a single query
docker run --rm \
  -e OPENROUTER_API_KEY="your-key" \
  ghcr.io/servicestack/llms:latest \
  llms "What is the capital of France?"

# List available models
docker run --rm \
  -e OPENROUTER_API_KEY="your-key" \
  ghcr.io/servicestack/llms:latest \
  llms --list

# Check provider status
docker run --rm \
  -e GROQ_API_KEY="your-key" \
  ghcr.io/servicestack/llms:latest \
  llms --check groq

Health Checks

The Docker image includes a health check that verifies the server is responding:

# Check container health
docker ps

# View health check logs
docker inspect --format='{{json .State.Health}}' llms-server

Multi-Architecture Support

The Docker images support multiple architectures:

linux/amd64 (x86_64)
linux/arm64 (ARM64/Apple Silicon)

Docker will automatically pull the correct image for your platform.

Troubleshooting

Common Issues

Config file not found

# Initialize default config
llms --init

# Or specify custom path
llms --config ./my-config.json

No providers enabled

# Check status
llms --list

# Enable providers
llms --enable google anthropic

API key issues

# Check environment variables
echo $ANTHROPIC_API_KEY

# Enable verbose logging
llms --verbose "test"

Model not found

# List available models
llms --list

# Check provider configuration
llms ls openrouter

Debug Mode

Enable verbose logging to see detailed request/response information:

llms --verbose --logprefix "[DEBUG] " "Hello"

This shows:

Enabled providers
Model routing decisions
HTTP request details
Error messages with stack traces

Development

Project Structure

llms/main.py - Main script with CLI and server functionality
llms/llms.json - Default configuration file
llms/ui.json - UI configuration file
requirements.txt - Python dependencies, required: aiohttp, optional: Pillow

Provider Classes

OpenAiProvider - Generic OpenAI-compatible provider
OllamaProvider - Ollama-specific provider with model auto-discovery
GoogleProvider - Google Gemini with native API format
GoogleOpenAiProvider - Google Gemini via OpenAI-compatible endpoint

Adding New Providers

Create a provider class inheriting from OpenAiProvider
Implement provider-specific authentication and formatting
Add provider configuration to llms.json
Update initialization logic in init_llms()

Contributing

Contributions are welcome! Please submit a PR to add support for any missing OpenAI-compatible providers.

Release history

3.0.42 - Mar 22, 2026
3.0.41 - Mar 3, 2026
3.0.40 - Mar 2, 2026
3.0.39 - Mar 1, 2026
3.0.38 - Feb 27, 2026
3.0.37 - Feb 27, 2026
3.0.36 - Feb 27, 2026
3.0.35 - Feb 25, 2026
3.0.34 - Feb 18, 2026
3.0.33 - Feb 15, 2026
3.0.32 - Feb 9, 2026
3.0.31 - Feb 9, 2026
3.0.30 - Feb 8, 2026
3.0.29 - Feb 5, 2026
3.0.28 - Feb 5, 2026
3.0.27 - Feb 3, 2026
3.0.26 - Feb 3, 2026
3.0.25 - Jan 30, 2026
3.0.24 - Jan 30, 2026
3.0.23 - Jan 30, 2026
3.0.22 - Jan 29, 2026
3.0.21 - Jan 28, 2026
3.0.20 - Jan 28, 2026
3.0.19 - Jan 27, 2026
3.0.18 - Jan 27, 2026
3.0.17 - Jan 27, 2026
3.0.16 - Jan 26, 2026
3.0.15 - Jan 25, 2026
3.0.14 - Jan 24, 2026
3.0.13 - Jan 21, 2026
3.0.12 - Jan 21, 2026
3.0.11 - Jan 20, 2026
3.0.10 - Jan 19, 2026
3.0.9 - Jan 19, 2026
3.0.8 - Jan 18, 2026
3.0.7 - Jan 15, 2026
3.0.6 - Jan 14, 2026
3.0.5 - Jan 13, 2026
3.0.4 - Jan 13, 2026
3.0.3 - Jan 12, 2026
3.0.2 - Jan 10, 2026
3.0.1 - Jan 7, 2026
3.0.0 - Jan 7, 2026
3.0.0b10 (pre-release) - Jan 7, 2026
3.0.0b9 (pre-release) - Jan 5, 2026
3.0.0b8 (pre-release) - Jan 5, 2026
3.0.0b7 (pre-release) - Dec 31, 2025
3.0.0b6 (pre-release) - Dec 23, 2025
3.0.0b5 (pre-release) - Dec 23, 2025
3.0.0b4 (pre-release) - Dec 22, 2025
3.0.0b3 (pre-release) - Dec 22, 2025
3.0.0b2 (pre-release) - Dec 17, 2025
3.0.0b1 (pre-release) - Dec 15, 2025
2.0.35 - Nov 19, 2025
2.0.34 - Nov 16, 2025
2.0.33 (this version) - Nov 4, 2025
2.0.32 - Nov 3, 2025
2.0.31 - Nov 3, 2025
2.0.30 - Nov 1, 2025
2.0.29 - Nov 1, 2025
2.0.28 - Oct 31, 2025
2.0.27 - Oct 31, 2025
2.0.26 - Oct 31, 2025
2.0.25 - Oct 29, 2025
2.0.24 - Oct 28, 2025
2.0.21 - Oct 27, 2025
2.0.15 - Oct 15, 2025
2.0.10 - Oct 7, 2025
2.0.9 - Oct 4, 2025
2.0.8 - Oct 1, 2025
2.0.7 - Sep 29, 2025
2.0.6 - Sep 29, 2025
2.0.0 - Sep 28, 2025
1.0.3 - Sep 27, 2025

Download files

Source Distribution

llms_py-2.0.33.tar.gz (575.2 kB) - uploaded Nov 4, 2025, Source

Built Distribution

llms_py-2.0.33-py3-none-any.whl (570.5 kB) - uploaded Nov 4, 2025, Python 3

File details

Details for the file llms_py-2.0.33.tar.gz.

File metadata

Download URL: llms_py-2.0.33.tar.gz
Upload date: Nov 4, 2025
Size: 575.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llms_py-2.0.33.tar.gz
Algorithm	Hash digest
SHA256	`31f6b9b0a62ace77f3d668c4f12d91b2ecb4c293ea0afa0997e5b1202fb681fa`
MD5	`3dd69208a9365f9c9e5144d15a2488e1`
BLAKE2b-256	`5d11ce31722da1cf360ea209a17aa25e8e588e69a3b32b79f915c17473427fe9`
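A downloaded archive can be checked against the SHA256 digest in the table above before installing it. The sketch below uses Python's standard hashlib module; the function names sha256_of and matches are illustrative, and the file is assumed to be in the current directory under the name shown above.

```python
import hashlib

# SHA256 digest listed above for llms_py-2.0.33.tar.gz.
EXPECTED_SHA256 = "31f6b9b0a62ace77f3d668c4f12d91b2ecb4c293ea0afa0997e5b1202fb681fa"


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals the expected one."""
    return sha256_of(path) == expected.lower()
```

For example, matches("llms_py-2.0.33.tar.gz") should return True for an intact download; a False result means the file is corrupt or is not the published release.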

Provenance

The following attestation bundles were made for llms_py-2.0.33.tar.gz:

Publisher: python-publish.yml on ServiceStack/llms

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llms_py-2.0.33.tar.gz
- Subject digest: 31f6b9b0a62ace77f3d668c4f12d91b2ecb4c293ea0afa0997e5b1202fb681fa
- Sigstore transparency entry: 665150062
- Sigstore integration time: Nov 4, 2025
Source repository:
- Permalink: ServiceStack/llms@7e31e3c71927727b322f4c9ce8418a3ec07740fb
- Branch / Tag: refs/tags/v2.0.33
- Owner: https://github.com/ServiceStack
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@7e31e3c71927727b322f4c9ce8418a3ec07740fb
- Trigger Event: release

File details

Details for the file llms_py-2.0.33-py3-none-any.whl.

File metadata

Download URL: llms_py-2.0.33-py3-none-any.whl
Upload date: Nov 4, 2025
Size: 570.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llms_py-2.0.33-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8e36f483b5251ca2ba8c7004691b2c216670b46fc5a36c5a0eebe0e05bca308f`
MD5	`27a83839e4870c0615cb9d97886f0b33`
BLAKE2b-256	`2fc30082ccf118016fda420ee09d73191f6c2a74030372a630b1c17816cd9b10`

Provenance

The following attestation bundles were made for llms_py-2.0.33-py3-none-any.whl:

Publisher: python-publish.yml on ServiceStack/llms

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llms_py-2.0.33-py3-none-any.whl
- Subject digest: 8e36f483b5251ca2ba8c7004691b2c216670b46fc5a36c5a0eebe0e05bca308f
- Sigstore transparency entry: 665150071
- Sigstore integration time: Nov 4, 2025
Source repository:
- Permalink: ServiceStack/llms@7e31e3c71927727b322f4c9ce8418a3ec07740fb
- Branch / Tag: refs/tags/v2.0.33
- Owner: https://github.com/ServiceStack
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@7e31e3c71927727b322f4c9ce8418a3ec07740fb
- Trigger Event: release
