A thin, unified LLM abstraction layer. Call any LLM with a single API.
Project description
anyllm
Local-first LLM abstraction — one API for Ollama, llama.cpp, OpenAI, Anthropic, and HuggingFace.
anyllm is a lightweight abstraction layer over the most popular LLM providers. Unlike heavier alternatives, it is local-first: if Ollama is running on your machine, anyllm.chat("hello") just works — no API keys, no cloud. It also supports llama.cpp, OpenAI, Anthropic, and HuggingFace Transformers behind the same tiny API, with first-class support for tool/function calling, streaming, structured JSON outputs, multi-modal inputs, embeddings, and conversation memory.
Built by Viet-Anh Nguyen at NRL.ai.
Why anyllm?
- One-liner API —
anyllm.chat("Hello")auto-detects your best local provider - Plugin architecture — Add custom providers via
@register_provider - Local-first — Defaults to Ollama if available, no API key required
- Minimal core deps — Only
httpxandpydantic; every provider is optional - Production-ready — Streaming, async, tool-calling, retries, structured outputs
Installation
pip install anyllm
For optional providers:
pip install anyllm[openai] # OpenAI GPT-4, GPT-3.5
pip install anyllm[anthropic] # Claude 3.5 Sonnet / Opus / Haiku
pip install anyllm[llamacpp] # llama.cpp local quantized models
pip install anyllm[transformers] # HuggingFace Transformers (local)
pip install anyllm[all] # everything
Ollama needs no Python package — just have it running at http://localhost:11434.
Python 3.8+ supported (tested on 3.8, 3.9, 3.10, 3.11, 3.12, 3.13)
Quick Start
import anyllm
# 1. Simple chat (auto-selects Ollama if running, else first configured provider)
reply = anyllm.chat("Explain RAG in one sentence.")
print(reply)
# 2. Specify a provider + model explicitly
reply = anyllm.chat(
"What is the capital of France?",
provider="ollama",
model="llama3.1:8b",
)
# 3. Streaming (yields tokens as they are generated)
for chunk in anyllm.stream("Write a haiku about Python"):
print(chunk, end="", flush=True)
# 4. Structured output (JSON mode — validates against a Pydantic model)
from pydantic import BaseModel
class Recipe(BaseModel):
name: str
ingredients: list[str]
steps: list[str]
recipe = anyllm.chat("Give me a pasta recipe", response_model=Recipe)
print(recipe.name, recipe.ingredients)
Models & Methods
Providers (local-first priority)
| Priority | Provider | How it works | Install |
|---|---|---|---|
| 1 | Ollama | HTTP client to http://localhost:11434 (default if reachable) |
built-in |
| 2 | llama.cpp | Loads GGUF models via llama-cpp-python |
anyllm[llamacpp] |
| 3 | OpenAI | REST API (gpt-4o, gpt-4o-mini, gpt-3.5-turbo) |
anyllm[openai] |
| 4 | Anthropic | REST API (claude-3-5-sonnet, claude-3-5-haiku, claude-3-opus) |
anyllm[anthropic] |
| 5 | HuggingFace Transformers | Loads any HF causal-LM model locally | anyllm[transformers] |
Provider priority can be overridden via anyllm.set_priority([...]) or per-call with provider="...".
Features
- Tool / function calling — Pass Python functions; parameter schemas are auto-extracted from type hints and docstrings. Dispatches to Ollama tools, OpenAI tools, or Anthropic tool use automatically.
- Streaming — Unified token streaming for every provider (yields strings).
- Async —
anyllm.achat(...),anyllm.astream(...). - Structured outputs —
response_model=MyPydanticModeluses native JSON mode on OpenAI/Anthropic/Ollama, falls back to regex extraction + retries elsewhere. - Multi-modal — Pass images via
anyllm.chat([..., {"image": "cat.jpg"}], model="gpt-4o"). - Embeddings —
anyllm.embed("text", model="nomic-embed-text")with Ollama / OpenAI / sentence-transformers. - Conversation memory —
Conversation()with sliding-window history and optional disk persistence. - Retries + timeouts — Configurable exponential backoff on transient errors.
API Reference
| Function | Purpose |
|---|---|
anyllm.chat(messages, **opts) |
Chat completion -> str or Pydantic model |
anyllm.stream(messages, **opts) |
Generator yielding token chunks |
anyllm.achat / astream |
Async variants |
anyllm.embed(text, model=...) |
Returns list[float] embedding |
anyllm.tools(fns, prompt) |
Tool-calling loop with auto-dispatch |
anyllm.Conversation(system=...) |
Multi-turn memory |
anyllm.list_models(provider=...) |
Enumerate available models |
anyllm.register_provider(name, cls) |
Add a custom provider |
CLI Usage
anyllm chat "Summarize this file" --file notes.txt
anyllm chat "Hi" --provider ollama --model llama3.1:8b
anyllm stream "Write a poem"
anyllm embed "hello world" --model nomic-embed-text
anyllm list-models --provider ollama
Examples
Tool calling with auto-extracted schemas
import anyllm
def get_weather(city: str, units: str = "celsius") -> dict:
"""Get the current weather for a city."""
# ... call a weather API ...
return {"city": city, "temp": 22, "units": units}
# anyllm inspects the signature + docstring, builds the JSON schema,
# runs the LLM, dispatches the tool call, and returns the final reply.
reply = anyllm.tools([get_weather], "What's the weather in Hanoi?")
print(reply)
Multi-turn conversation with memory
from anyllm import Conversation
conv = Conversation(system="You are a helpful Python tutor.", model="llama3.1:8b")
conv.send("What is a decorator?")
conv.send("Show me an example") # remembers previous context
conv.save("chat.json") # persist to disk
Vision input with a multi-modal model
import anyllm
reply = anyllm.chat(
[{"text": "What's in this image?"}, {"image": "cat.jpg"}],
provider="openai",
model="gpt-4o",
)
License
MIT (c) Viet-Anh Nguyen
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file anyllm-0.2.4.tar.gz.
File metadata
- Download URL: anyllm-0.2.4.tar.gz
- Upload date:
- Size: 41.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a14f7af802a6f3075ad431dc56d46c4dc2fdf51c9581e723ea0316c1e6b32578
|
|
| MD5 |
5d44de909dc3e6e3d5bae5edee30dbc0
|
|
| BLAKE2b-256 |
a9d6d4c8c4e1bd11486b709886271e015cfd548c827c7a82d85c9845ed26903a
|
File details
Details for the file anyllm-0.2.4-py3-none-any.whl.
File metadata
- Download URL: anyllm-0.2.4-py3-none-any.whl
- Upload date:
- Size: 34.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8b2b3da267f04064885867144a80dcb1123ba52e1680318d68aee7069179c25a
|
|
| MD5 |
c5b1e96ebf7bec1a4e9c2269d70fc17c
|
|
| BLAKE2b-256 |
651fb5761fddd111f1454bfa1a0ac126f95f7affe311b6e1d2ec8326949f503f
|