A thin, unified LLM abstraction layer. Call any LLM with a single API.
anyllm
Chat with any LLM — local-first, cloud-optional
Local-first: uses Ollama or llama.cpp by default; cloud APIs are optional. When a local model is available, anyllm prefers it over cloud providers, so your application can keep working offline.
Simpler than litellm -- focused on the essentials. No bloat, no complex abstractions. Just anyllm.chat().
Provider Support
| Provider | API Key Env Var | Streaming | Local |
|---|---|---|---|
| OpenAI (+ compatible APIs) | OPENAI_API_KEY | Yes | No |
| Anthropic Claude | ANTHROPIC_API_KEY | Yes | No |
| Ollama | -- | Yes | Yes |
| llama.cpp server | -- | Yes | Yes |
Installation
pip install anyllm
With optional provider SDKs:
pip install anyllm[openai] # OpenAI SDK
pip install anyllm[anthropic] # Anthropic SDK
pip install anyllm[all] # All optional SDKs
Note: anyllm works without any provider SDKs installed -- it uses httpx for all HTTP calls by default.
Quick Start
Simple Chat
import anyllm
# Auto-detects available provider: checks Ollama (localhost:11434) first,
# then env vars (OPENAI_API_KEY, ANTHROPIC_API_KEY)
# Sends HTTP request via httpx — no SDK dependencies required
# Returns Response(content, model, usage) with token counts
response = anyllm.chat("What is the meaning of life?")
print(response.content)
OpenAI
export OPENAI_API_KEY="sk-..."
response = anyllm.chat("Hello!", model="openai/gpt-4")
print(response.content)
print(f"Tokens used: {response.usage.total_tokens}")
Anthropic Claude
export ANTHROPIC_API_KEY="sk-ant-..."
response = anyllm.chat("Hello!", model="anthropic/claude-sonnet-4-20250514")
print(response.content)
Ollama (Local)
ollama pull llama3
response = anyllm.chat("Hello!", model="ollama/llama3")
print(response.content)
llama.cpp Server
./llama-server -m model.gguf
response = anyllm.chat("Hello!", model="llamacpp/default")
print(response.content)
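Every example above addresses a model as `provider/model`. The sketch below shows one way such a string could be split. `split_model` is a hypothetical helper written for illustration, not part of anyllm's documented API, and the library's own parsing may differ.

```python
def split_model(model: str) -> tuple[str, str]:
    """Split 'provider/model' into (provider, model_name).

    Hypothetical helper for illustration only.
    """
    # partition() splits on the first slash, so any later slashes
    # stay inside the model name.
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, name


print(split_model("openai/gpt-4"))
print(split_model("ollama/llama3"))
```

Splitting on the first slash only is a design choice here: it keeps the provider prefix unambiguous even if a model name itself contains a slash.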
Streaming
for chunk in anyllm.chat("Tell me a story", model="openai/gpt-4", stream=True):
    print(chunk, end="", flush=True)
print()
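When streaming, you often want both the live output and the complete text afterwards. The following is a minimal sketch, assuming each streamed chunk is a plain string as the loop above suggests. `tee_stream` and `fake_stream` are hypothetical names; `fake_stream` stands in for `anyllm.chat(..., stream=True)` so the example runs without a provider.

```python
from typing import Iterable, Iterator


def tee_stream(chunks: Iterable[str], sink: list[str]) -> Iterator[str]:
    """Yield each chunk unchanged while also appending it to `sink`."""
    for chunk in chunks:
        sink.append(chunk)
        yield chunk


def fake_stream() -> Iterator[str]:
    # Stand-in for anyllm.chat(..., stream=True); assumes chunks are strings.
    yield from ["Once ", "upon ", "a ", "time."]


parts: list[str] = []
for chunk in tee_stream(fake_stream(), parts):
    print(chunk, end="", flush=True)
print()

# The full reply is available once the stream is exhausted.
full_text = "".join(parts)
```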
Tool / Function Calling
# Tool/function calling — LLM decides when to call your functions
# anyllm forwards tool schemas to the provider, parses tool_call responses,
# and surfaces them in response.tool_calls for you to execute
response = anyllm.chat("What's the weather?", tools=[...])
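The README does not show the tool schema or the shape of `response.tool_calls`, so the sketch below rests on two loudly stated assumptions: tools are described with an OpenAI-style function schema, and each tool call exposes a `name` plus JSON-encoded `arguments`. `get_weather`, `WEATHER_TOOL`, `REGISTRY`, and `run_tool_calls` are hypothetical names; check anyllm's actual types before relying on this.

```python
import json
from typing import Any, Callable


def get_weather(city: str) -> str:
    # Placeholder implementation; a real tool would query a weather service.
    return f"Sunny in {city}"


# ASSUMPTION: OpenAI-style function schema. anyllm may expect another format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Maps tool names the model may request to local Python callables.
REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(tool_calls: list[dict[str, Any]]) -> list[Any]:
    """Execute each requested call.

    ASSUMPTION: each call looks like {"name": str, "arguments": JSON string}.
    """
    results = []
    for call in tool_calls:
        func = REGISTRY[call["name"]]
        kwargs = json.loads(call["arguments"])
        results.append(func(**kwargs))
    return results


# Simulated tool call, used here in place of response.tool_calls.
print(run_tool_calls([{"name": "get_weather", "arguments": '{"city": "Oslo"}'}]))
```

Keeping an explicit registry means the model can only trigger functions you have deliberately exposed.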
Full Message Format
response = anyllm.chat(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Python?"},
    ],
    model="openai/gpt-4",
    temperature=0.7,
    max_tokens=500,
)
Or use the system parameter shorthand:
response = anyllm.chat(
    "What is Python?",
    model="openai/gpt-4",
    system="You are a helpful assistant.",
)
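The two call styles above should describe the same conversation. As a sketch of that equivalence, the hypothetical helper `build_messages` expands a prompt and an optional system string into the full message list. It illustrates the mapping only and is not anyllm's internal code.

```python
from typing import Optional


def build_messages(prompt: str, system: Optional[str] = None) -> list[dict[str, str]]:
    """Expand the shorthand (prompt + system) into a full message list."""
    messages: list[dict[str, str]] = []
    if system is not None:
        # The system message goes first so it frames the user turn.
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    return messages


print(build_messages("What is Python?", system="You are a helpful assistant."))
```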
Configuration
Environment Variables
| Variable | Description |
|---|---|
| OPENAI_API_KEY | OpenAI API key |
| OPENAI_BASE_URL | Custom OpenAI-compatible API base URL |
| ANTHROPIC_API_KEY | Anthropic API key |
| OLLAMA_HOST | Ollama server URL (default: http://localhost:11434) |
| ANYLLM_DEFAULT_MODEL | Default model to use |
Config File
Create ~/.anyllm/config.json:
{
  "default_model": "openai/gpt-4",
  "openai_api_key": "sk-...",
  "ollama_base_url": "http://localhost:11434"
}
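If you prefer to generate the config file from a script, the sketch below writes it with the standard library. `write_config` is a hypothetical helper, and the demo writes to a temporary directory instead of your real home directory. Restricting the file mode is a precaution added here because the file may hold an API key; it is not something the README prescribes.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def write_config(settings: dict, path: Optional[Path] = None) -> Path:
    """Write settings as JSON to `path` (default: ~/.anyllm/config.json)."""
    if path is None:
        path = Path.home() / ".anyllm" / "config.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    # Owner-only permissions, since the file may contain an API key.
    path.chmod(0o600)
    return path


settings = {
    "default_model": "ollama/llama3",
    "ollama_base_url": "http://localhost:11434",
}

# Demo against a temporary directory so nothing in $HOME is touched.
with tempfile.TemporaryDirectory() as tmp:
    target = write_config(settings, Path(tmp) / ".anyllm" / "config.json")
    loaded = json.loads(target.read_text(encoding="utf-8"))
    print(loaded["default_model"])
```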
Programmatic Configuration
import anyllm
# Set default model
anyllm.set_default("ollama/llama3")
# Check available providers
print(anyllm.available_providers()) # ['openai', 'ollama']
# List models
print(anyllm.list_models())
# {'openai': ['gpt-4', 'gpt-3.5-turbo', ...], 'ollama': ['llama3', ...]}
# Direct config access
config = anyllm.get_config()
config.set("openai_base_url", "http://localhost:1234/v1")
Response Object
response = anyllm.chat("Hello!", model="openai/gpt-4")
response.content # "Hello! How can I help you?"
response.model # "gpt-4"
response.usage # Usage(prompt_tokens=10, completion_tokens=8, total_tokens=18)
response.raw_response # Raw API response dict
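Because every response carries token counts, you can track consumption across a session. The sketch below defines a stand-in `Usage` dataclass that mirrors the three fields shown above; it is a hypothetical copy for illustration, not anyllm's own class, and `total_usage` is likewise an assumed helper.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    # Stand-in mirroring the fields of response.usage shown above.
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def total_usage(usages: list[Usage]) -> Usage:
    """Sum token counts field by field across several responses."""
    return Usage(
        prompt_tokens=sum(u.prompt_tokens for u in usages),
        completion_tokens=sum(u.completion_tokens for u in usages),
        total_tokens=sum(u.total_tokens for u in usages),
    )


session = [Usage(10, 8, 18), Usage(25, 40, 65)]
print(total_usage(session))
# Usage(prompt_tokens=35, completion_tokens=48, total_tokens=83)
```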
Local-First / Edge AI
anyllm is designed with a local-first philosophy. When auto-detecting providers, it checks local options (Ollama, llama.cpp) before cloud APIs. This means your applications work offline by default when local models are available.
import anyllm
# See what local models are available
print(anyllm.list_local_models())
# {'ollama': ['llama3', 'mistral', ...]}
# Auto-detection prefers local -- this uses Ollama if running
response = anyllm.chat("Hello!")
# Explicitly use a local model
response = anyllm.chat("Hello!", model="ollama/llama3")
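The detection order described in this README (Ollama first, then `OPENAI_API_KEY`, then `ANTHROPIC_API_KEY`) can be sketched as follows. `port_open` and `detect_provider` are hypothetical names, the TCP probe is an assumed way to test reachability, and llama.cpp detection is left out because the README does not say how it is probed. anyllm's real logic may differ.

```python
import os
import socket
from typing import Callable, Mapping, Optional
from urllib.parse import urlparse


def port_open(url: str, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    port = parsed.port or 11434  # Ollama's default port per this README
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def detect_provider(
    env: Mapping[str, str] = os.environ,
    probe: Callable[[str], bool] = port_open,
) -> Optional[str]:
    """Pick a provider name, preferring local over cloud."""
    if probe(env.get("OLLAMA_HOST", "http://localhost:11434")):
        return "ollama"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    return None


# Injected probes keep the demo independent of what is running locally.
print(detect_provider({"OPENAI_API_KEY": "sk-test"}, probe=lambda url: True))
print(detect_provider({"OPENAI_API_KEY": "sk-test"}, probe=lambda url: False))
```

Passing the environment and the probe as parameters makes the ordering easy to unit-test without a network.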
Error Handling
anyllm includes automatic retry with exponential backoff for transient errors:
# Retries up to 3 times by default
response = anyllm.chat("Hello!", model="openai/gpt-4", max_retries=3)
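For readers unfamiliar with the pattern, here is a generic sketch of retry with exponential backoff. It is not anyllm's implementation: the `retry` helper, the 0.5 second base delay, the use of `ConnectionError` as the transient error, and counting `max_retries` as attempts after the first call are all assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    func: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on ConnectionError with doubling delays."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    attempt = 0
    while True:
        try:
            return func()
        except ConnectionError:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
            attempt += 1


calls = {"n": 0}


def flaky() -> str:
    # Fails twice, then succeeds, to simulate a transient outage.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays: list[float] = []
result = retry(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)
# ok [0.5, 1.0]
```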
License
MIT License. See LICENSE for details.