Skip to main content

A thin, unified LLM abstraction layer. Call any LLM with a single API.

Project description

anyllm

Local-first LLM abstraction — one API for Ollama, llama.cpp, OpenAI, Anthropic, and HuggingFace.

PyPI Python License

anyllm is a lightweight abstraction layer over the most popular LLM providers. Unlike heavier alternatives, it is local-first: if Ollama is running on your machine, anyllm.chat("hello") just works — no API keys, no cloud. It also supports llama.cpp, OpenAI, Anthropic, and HuggingFace Transformers behind the same tiny API, with first-class support for tool/function calling, streaming, structured JSON outputs, multi-modal inputs, embeddings, and conversation memory.

Built by Viet-Anh Nguyen at NRL.ai.

Why anyllm?

  • One-liner APIanyllm.chat("Hello") auto-detects your best local provider
  • Plugin architecture — Add custom providers via @register_provider
  • Local-first — Defaults to Ollama if available, no API key required
  • Minimal core deps — Only httpx and pydantic; every provider is optional
  • Production-ready — Streaming, async, tool-calling, retries, structured outputs

Installation

pip install anyllm

For optional providers:

pip install anyllm[openai]          # OpenAI GPT-4, GPT-3.5
pip install anyllm[anthropic]       # Claude 3.5 Sonnet / Opus / Haiku
pip install anyllm[llamacpp]        # llama.cpp local quantized models
pip install anyllm[transformers]    # HuggingFace Transformers (local)
pip install anyllm[all]             # everything

Ollama needs no Python package — just have it running at http://localhost:11434.

Python 3.8+ supported (tested on 3.8, 3.9, 3.10, 3.11, 3.12, 3.13)

Quick Start

import anyllm

# 1. Simple chat (auto-selects Ollama if running, else first configured provider)
reply = anyllm.chat("Explain RAG in one sentence.")
print(reply)

# 2. Specify a provider + model explicitly
reply = anyllm.chat(
    "What is the capital of France?",
    provider="ollama",
    model="llama3.1:8b",
)

# 3. Streaming (yields tokens as they are generated)
for chunk in anyllm.stream("Write a haiku about Python"):
    print(chunk, end="", flush=True)

# 4. Structured output (JSON mode — validates against a Pydantic model)
from pydantic import BaseModel
class Recipe(BaseModel):
    name: str
    ingredients: list[str]
    steps: list[str]

recipe = anyllm.chat("Give me a pasta recipe", response_model=Recipe)
print(recipe.name, recipe.ingredients)

Models & Methods

Providers (local-first priority)

Priority Provider How it works Install
1 Ollama HTTP client to http://localhost:11434 (default if reachable) built-in
2 llama.cpp Loads GGUF models via llama-cpp-python anyllm[llamacpp]
3 OpenAI REST API (gpt-4o, gpt-4o-mini, gpt-3.5-turbo) anyllm[openai]
4 Anthropic REST API (claude-3-5-sonnet, claude-3-5-haiku, claude-3-opus) anyllm[anthropic]
5 HuggingFace Transformers Loads any HF causal-LM model locally anyllm[transformers]

Provider priority can be overridden via anyllm.set_priority([...]) or per-call with provider="...".

Features

  • Tool / function calling — Pass Python functions; parameter schemas are auto-extracted from type hints and docstrings. Dispatches to Ollama tools, OpenAI tools, or Anthropic tool use automatically.
  • Streaming — Unified token streaming for every provider (yields strings).
  • Asyncanyllm.achat(...), anyllm.astream(...).
  • Structured outputsresponse_model=MyPydanticModel uses native JSON mode on OpenAI/Anthropic/Ollama, falls back to regex extraction + retries elsewhere.
  • Multi-modal — Pass images via anyllm.chat([..., {"image": "cat.jpg"}], model="gpt-4o").
  • Embeddingsanyllm.embed("text", model="nomic-embed-text") with Ollama / OpenAI / sentence-transformers.
  • Conversation memoryConversation() with sliding-window history and optional disk persistence.
  • Retries + timeouts — Configurable exponential backoff on transient errors.

API Reference

Function Purpose
anyllm.chat(messages, **opts) Chat completion -> str or Pydantic model
anyllm.stream(messages, **opts) Generator yielding token chunks
anyllm.achat / astream Async variants
anyllm.embed(text, model=...) Returns list[float] embedding
anyllm.tools(fns, prompt) Tool-calling loop with auto-dispatch
anyllm.Conversation(system=...) Multi-turn memory
anyllm.list_models(provider=...) Enumerate available models
anyllm.register_provider(name, cls) Add a custom provider

CLI Usage

anyllm chat "Summarize this file" --file notes.txt
anyllm chat "Hi" --provider ollama --model llama3.1:8b
anyllm stream "Write a poem"
anyllm embed "hello world" --model nomic-embed-text
anyllm list-models --provider ollama

Examples

Tool calling with auto-extracted schemas

import anyllm

def get_weather(city: str, units: str = "celsius") -> dict:
    """Get the current weather for a city."""
    # ... call a weather API ...
    return {"city": city, "temp": 22, "units": units}

# anyllm inspects the signature + docstring, builds the JSON schema,
# runs the LLM, dispatches the tool call, and returns the final reply.
reply = anyllm.tools([get_weather], "What's the weather in Hanoi?")
print(reply)

Multi-turn conversation with memory

from anyllm import Conversation

conv = Conversation(system="You are a helpful Python tutor.", model="llama3.1:8b")
conv.send("What is a decorator?")
conv.send("Show me an example")          # remembers previous context
conv.save("chat.json")                   # persist to disk

Vision input with a multi-modal model

import anyllm

reply = anyllm.chat(
    [{"text": "What's in this image?"}, {"image": "cat.jpg"}],
    provider="openai",
    model="gpt-4o",
)

License

MIT (c) Viet-Anh Nguyen

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

anyllm-0.2.4.tar.gz (41.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

anyllm-0.2.4-py3-none-any.whl (34.7 kB view details)

Uploaded Python 3

File details

Details for the file anyllm-0.2.4.tar.gz.

File metadata

  • Download URL: anyllm-0.2.4.tar.gz
  • Upload date:
  • Size: 41.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for anyllm-0.2.4.tar.gz
Algorithm Hash digest
SHA256 a14f7af802a6f3075ad431dc56d46c4dc2fdf51c9581e723ea0316c1e6b32578
MD5 5d44de909dc3e6e3d5bae5edee30dbc0
BLAKE2b-256 a9d6d4c8c4e1bd11486b709886271e015cfd548c827c7a82d85c9845ed26903a

See more details on using hashes here.

File details

Details for the file anyllm-0.2.4-py3-none-any.whl.

File metadata

  • Download URL: anyllm-0.2.4-py3-none-any.whl
  • Upload date:
  • Size: 34.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for anyllm-0.2.4-py3-none-any.whl
Algorithm Hash digest
SHA256 8b2b3da267f04064885867144a80dcb1123ba52e1680318d68aee7069179c25a
MD5 c5b1e96ebf7bec1a4e9c2269d70fc17c
BLAKE2b-256 651fb5761fddd111f1454bfa1a0ac126f95f7affe311b6e1d2ec8326949f503f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page