Skip to main content

A thin, unified LLM abstraction layer. Call any LLM with a single API.

Project description

anyllm

Local-first LLM abstraction — one API for Ollama, llama.cpp, OpenAI, Anthropic, and HuggingFace.

PyPI Python License

anyllm is a lightweight abstraction layer over the most popular LLM providers. Unlike heavier alternatives, it is local-first: if Ollama is running on your machine, anyllm.chat("hello") just works — no API keys, no cloud. It also supports llama.cpp, OpenAI, Anthropic, and HuggingFace Transformers behind the same tiny API, with first-class support for tool/function calling, streaming, structured JSON outputs, multi-modal inputs, embeddings, and conversation memory.

Built by Viet-Anh Nguyen at NRL.ai.

Why anyllm?

  • One-liner APIanyllm.chat("Hello") auto-detects your best local provider
  • Plugin architecture — Add custom providers via @register_provider
  • Local-first — Defaults to Ollama if available, no API key required
  • Minimal core deps — Only httpx and pydantic; every provider is optional
  • Production-ready — Streaming, async, tool-calling, retries, structured outputs

Installation

pip install anyllm

For optional providers:

pip install anyllm[openai]          # OpenAI GPT-4, GPT-3.5
pip install anyllm[anthropic]       # Claude 3.5 Sonnet / Opus / Haiku
pip install anyllm[llamacpp]        # llama.cpp local quantized models
pip install anyllm[transformers]    # HuggingFace Transformers (local)
pip install anyllm[all]             # everything

Ollama needs no Python package — just have it running at http://localhost:11434.

Python 3.8+ supported (tested on 3.8, 3.9, 3.10, 3.11, 3.12, 3.13)

Quick Start

import anyllm

# 1. Simple chat (auto-selects Ollama if running, else first configured provider)
reply = anyllm.chat("Explain RAG in one sentence.")
print(reply)

# 2. Specify a provider + model explicitly
reply = anyllm.chat(
    "What is the capital of France?",
    provider="ollama",
    model="llama3.1:8b",
)

# 3. Streaming (yields tokens as they are generated)
for chunk in anyllm.stream("Write a haiku about Python"):
    print(chunk, end="", flush=True)

# 4. Structured output (JSON mode — validates against a Pydantic model)
from pydantic import BaseModel
class Recipe(BaseModel):
    name: str
    ingredients: list[str]
    steps: list[str]

recipe = anyllm.chat("Give me a pasta recipe", response_model=Recipe)
print(recipe.name, recipe.ingredients)

Models & Methods

Providers (local-first priority)

Priority Provider How it works Install
1 Ollama HTTP client to http://localhost:11434 (default if reachable) built-in
2 llama.cpp Loads GGUF models via llama-cpp-python anyllm[llamacpp]
3 OpenAI REST API (gpt-4o, gpt-4o-mini, gpt-3.5-turbo) anyllm[openai]
4 Anthropic REST API (claude-3-5-sonnet, claude-3-5-haiku, claude-3-opus) anyllm[anthropic]
5 HuggingFace Transformers Loads any HF causal-LM model locally anyllm[transformers]

Provider priority can be overridden via anyllm.set_priority([...]) or per-call with provider="...".

Features

  • Tool / function calling — Pass Python functions; parameter schemas are auto-extracted from type hints and docstrings. Dispatches to Ollama tools, OpenAI tools, or Anthropic tool use automatically.
  • Streaming — Unified token streaming for every provider (yields strings).
  • Asyncanyllm.achat(...), anyllm.astream(...).
  • Structured outputsresponse_model=MyPydanticModel uses native JSON mode on OpenAI/Anthropic/Ollama, falls back to regex extraction + retries elsewhere.
  • Multi-modal — Pass images via anyllm.chat([..., {"image": "cat.jpg"}], model="gpt-4o").
  • Embeddingsanyllm.embed("text", model="nomic-embed-text") with Ollama / OpenAI / sentence-transformers.
  • Conversation memoryConversation() with sliding-window history and optional disk persistence.
  • Retries + timeouts — Configurable exponential backoff on transient errors.

API Reference

Function Purpose
anyllm.chat(messages, **opts) Chat completion -> str or Pydantic model
anyllm.stream(messages, **opts) Generator yielding token chunks
anyllm.achat / astream Async variants
anyllm.embed(text, model=...) Returns list[float] embedding
anyllm.tools(fns, prompt) Tool-calling loop with auto-dispatch
anyllm.Conversation(system=...) Multi-turn memory
anyllm.list_models(provider=...) Enumerate available models
anyllm.register_provider(name, cls) Add a custom provider

CLI Usage

anyllm chat "Summarize this file" --file notes.txt
anyllm chat "Hi" --provider ollama --model llama3.1:8b
anyllm stream "Write a poem"
anyllm embed "hello world" --model nomic-embed-text
anyllm list-models --provider ollama

Examples

Tool calling with auto-extracted schemas

import anyllm

def get_weather(city: str, units: str = "celsius") -> dict:
    """Get the current weather for a city."""
    # ... call a weather API ...
    return {"city": city, "temp": 22, "units": units}

# anyllm inspects the signature + docstring, builds the JSON schema,
# runs the LLM, dispatches the tool call, and returns the final reply.
reply = anyllm.tools([get_weather], "What's the weather in Hanoi?")
print(reply)

Multi-turn conversation with memory

from anyllm import Conversation

conv = Conversation(system="You are a helpful Python tutor.", model="llama3.1:8b")
conv.send("What is a decorator?")
conv.send("Show me an example")          # remembers previous context
conv.save("chat.json")                   # persist to disk

Vision input with a multi-modal model

import anyllm

reply = anyllm.chat(
    [{"text": "What's in this image?"}, {"image": "cat.jpg"}],
    provider="openai",
    model="gpt-4o",
)

License

MIT (c) Viet-Anh Nguyen

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

anyllm-0.2.3.tar.gz (40.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

anyllm-0.2.3-py3-none-any.whl (33.7 kB view details)

Uploaded Python 3

File details

Details for the file anyllm-0.2.3.tar.gz.

File metadata

  • Download URL: anyllm-0.2.3.tar.gz
  • Upload date:
  • Size: 40.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for anyllm-0.2.3.tar.gz
Algorithm Hash digest
SHA256 b104ef9a0a5dced264a72d4db7186d231e44b108aab5b4469dcd69318be8d3db
MD5 d4226541076558e74825cc98a006f773
BLAKE2b-256 f105d7d7f2a1099ae339e6f96704f69691063c8953691f570219a6b12287168c

See more details on using hashes here.

File details

Details for the file anyllm-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: anyllm-0.2.3-py3-none-any.whl
  • Upload date:
  • Size: 33.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for anyllm-0.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 8b8d49ecc994730328c32ad29db9a46576b27bef6c93c324c7d0a230d7923d81
MD5 8b04c0e626b918bdf50eda025aed446d
BLAKE2b-256 23d785f6392f38399dec1ffe71f6b1d56e6988454eff3a9d445927ca40a858c0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page