Production LLM calls. Just the three lines. Reliability, native caching, and reversible context compression on by default.
justllm
Production LLM calls. Just the three lines.
from justllm import LLM
llm = LLM("anthropic/claude-opus-4-8")
reply = llm("Summarize this contract.")
Cross-provider fallback, native prompt caching, and reversible context compression are on by default. No config. The surface stays tiny on purpose — the moment you need a dozen knobs, that is what LiteLLM is for.
Why
The ecosystem split in two: feature-complete but heavy (LiteLLM, LangChain),
or simple but feature-thin (aisuite, any-llm). Nobody ships the production
layer behind a three-line front door. justllm is that middle.
The number that makes it worth a switch: compressing the dynamic junk that
bloats agent calls (tool outputs, logs, RAG dumps) cuts the input-token bill
without touching your code. Measured here (gpt-4o token basis): 53% saved on a
JSON API tool result, 97% on repetitive logs, with a safe no-op when
compression wouldn't help. The engine is Headroom (PyPI: headroom-ai,
content-aware and reversible); justllm applies it only to tool/retrieved
content, never to your prompts. See benchmarks/.
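To make "reversible" and "safe no-op" concrete, here is a minimal sketch of the idea. It is not Headroom's algorithm: it run-length encodes repeated lines, uses character counts as a stand-in for tokens, and every name in it is hypothetical.

```python
from __future__ import annotations

from itertools import groupby

Runs = list[tuple[str, int]]


def encode(text: str) -> Runs:
    """Run-length encode consecutive identical lines."""
    return [(line, len(list(run))) for line, run in groupby(text.split("\n"))]


def decode(runs: Runs) -> str:
    """Exact inverse of encode(): expand every run back into its lines."""
    return "\n".join(line for line, count in runs for _ in range(count))


def encoded_size(runs: Runs) -> int:
    """Rough size proxy: each run costs its line plus a counter and a separator."""
    return sum(len(line) + len(str(count)) + 1 for line, count in runs)


def maybe_compress(text: str) -> Runs | None:
    """Return the encoded form only when it is smaller; None means 'leave as-is'."""
    runs = encode(text)
    return runs if encoded_size(runs) < len(text) else None


# Repetitive logs shrink and round-trip exactly.
logs = "\n".join(["GET /health 200"] * 50 + ["POST /login 500"])
runs = maybe_compress(logs)
assert runs is not None and decode(runs) == logs

# Text with no repetition is left untouched (the no-op path).
assert maybe_compress("alpha\nbeta\ngamma") is None
```

The size check before committing to the compressed form is what keeps the step from ever making a call more expensive.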
Install
pip install 'justllm[all]' # transport + structured output + compression
Or take only what you need: justllm[litellm] (real calls), justllm[structured]
(extract()), justllm[compression] (Headroom). The bare pip install justllm
gives you the API and the reliability layer; calls raise a clear error until a
transport is installed.
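A "clear error until a transport is installed" is usually a guarded import. The sketch below shows one common way to do it with the standard library; the function and exception names are hypothetical and are not justllm's actual internals.

```python
import importlib
import importlib.util


class MissingTransportError(RuntimeError):
    """Raised when an optional dependency is needed but not installed."""


def require(module_name: str, extra: str):
    """Import an optional module, or fail with an actionable install hint."""
    if importlib.util.find_spec(module_name) is None:
        raise MissingTransportError(
            f"{module_name!r} is not installed; run: pip install 'justllm[{extra}]'"
        )
    return importlib.import_module(module_name)
```

Deferring the check to call time is what lets the bare install import cleanly and only complain when a real call is attempted.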
Usage
# fallback chain + explicit knobs
llm = LLM(
    chain=["anthropic/claude-opus-4-8", "openai/gpt-5", "groq/llama-3.1-70b"],
    compress=True,   # reversible, dynamic-context only
    cache="prompt",  # "cache" never silently means semantic
)
# structured output — a validated Pydantic instance
from pydantic import BaseModel
class Invoice(BaseModel):
    vendor: str
    total: float
inv = llm.extract(Invoice, "Parse: Acme Corp billed $4,200")
# a minimal tool-calling agent (tool outputs are auto-compressed)
agent = llm.agent(system="You are a travel assistant.", max_steps=8)
@agent.tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return weather_api(city)
agent.run("What should I pack for Boston this weekend?")
# streaming
for chunk in llm.stream("Tell me a short story."):
    print(chunk, end="")
# async (acall / aextract); await must run inside an async function
reply = await llm.acall("Summarize this.")
# opt-in routing: short prompts -> cheap model, long -> strong (no extra call)
from justllm import Router
routed = LLM(router=Router(small="groq/llama-3.1-8b-instant", large="openai/gpt-4o"))
# cheap-first cascade: escalate to the strong model only when needed
from justllm import Cascade
smart = LLM(router=Cascade(small="groq/llama-3.1-8b-instant", large="openai/gpt-4o"))
# load prompts from files (no registry); only your {vars} are substituted
from justllm import prompts
prompt = prompts.load("summary", document=text) # reads prompts/summary.txt
# ...or back the same seam with a registry (Langfuse, etc.)
prompts.set_loader(prompts.langfuse_loader(label="production"))
prompt = prompts.load("summary", document=text) # now fetched from Langfuse
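"Only your {vars} are substituted" matters when a prompt file contains other braces, such as a JSON example. The sketch below shows one way to get that behavior; it is an illustration under assumptions, not justllm's loader, and `render`, `load_prompt`, and `prompts_dir` are hypothetical names.

```python
import re
from pathlib import Path


def render(template: str, **variables: object) -> str:
    """Replace {name} only for the names passed in; leave all other braces alone."""
    if not variables:
        return template
    names = "|".join(re.escape(name) for name in variables)
    pattern = re.compile(r"\{(" + names + r")\}")
    return pattern.sub(lambda match: str(variables[match.group(1)]), template)


def load_prompt(name: str, /, *, prompts_dir: str = "prompts", **variables: object) -> str:
    """Read <prompts_dir>/<name>.txt and substitute the given variables."""
    text = Path(prompts_dir, f"{name}.txt").read_text(encoding="utf-8")
    return render(text, **variables)


template = 'Summarize {document}. Reply as {"ok": true}.'
assert render(template, document="the contract") == (
    'Summarize the contract. Reply as {"ok": true}.'
)
```

Unlike `str.format`, this does not raise on the literal JSON braces and does not require doubling them in the prompt file.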
Optional OpenTelemetry tracing (pip install 'justllm[otel]') emits gen_ai.*
spans with a per-call gen_ai.usage.cost — the dollar figure the OTel spec
leaves out. No-op until you configure a collector.
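As a rough illustration of what a per-call cost attribute involves, the sketch below derives a dollar figure from token usage and returns it alongside the token counts. The price table is made up for the example, the helper name is hypothetical, and this is not how justllm necessarily computes or attaches the value.

```python
# Hypothetical prices in USD per million tokens: (input, output).
# These are placeholders, not real provider pricing.
PRICES_PER_MILLION = {"example/model": (3.00, 15.00)}


def usage_attributes(model: str, input_tokens: int, output_tokens: int) -> dict:
    """Build span attributes for one call, including a computed cost in USD."""
    input_price, output_price = PRICES_PER_MILLION[model]
    cost = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return {
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        "gen_ai.usage.cost": round(cost, 6),
    }


attrs = usage_attributes("example/model", input_tokens=1000, output_tokens=500)
```

With a tracer configured, a dict like this could be passed to a span's attribute setter; without one, nothing needs to be computed at all.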
Status
Alpha (0.3.0). Wiring is unit-tested with mocked providers (no network in CI),
and the call paths are validated live against Ollama and Groq:
- Calls: sync llm("..."), async llm.acall(...), and llm.stream(...), all through LiteLLM and wrapped in cross-provider fallback.
- Structured output: llm.extract(Model, ...) / await llm.aextract(...) return a validated Pydantic instance (via instructor).
- Reliability: with_fallback / awith_fallback + RetryPolicy (retry-with-jitter on retryable errors only, one retry layer).
- Caching: Headroom's per-provider cache optimizer (Anthropic breakpoints; OpenAI/Google handled) plus an opt-in exact-match cache.
- Compression: compress over Headroom (tunable via CompressConfig); agent tool outputs are compressed automatically.
- Routing: opt-in Router (length-based) and Cascade (cheap-first, escalate only when needed); deterministic, no extra judge call.
- Prompts: prompts.load(name, **vars) file loader with a pluggable seam (swap in Langfuse etc.); no registry built in.
- Observability: optional OpenTelemetry GenAI spans with the per-call gen_ai.usage.cost the spec omits; no-op unless [otel] is installed.
- Agent: a minimal tool-calling loop with a hard step cap.
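The reliability item combines two ideas: retry with jitter on retryable errors, then fall through to the next provider. A minimal sketch of that shape follows. It is not the library's with_fallback; the names are hypothetical, and letting non-retryable errors propagate immediately is an assumption of this sketch.

```python
import random
import time
from typing import Callable, Sequence


class RetryableError(Exception):
    """Stand-in for transient failures such as rate limits or timeouts."""


def call_with_fallback(
    providers: Sequence[Callable[[str], str]],
    prompt: str,
    retries: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each provider in order, retrying transient errors with jittered backoff."""
    last_error = None
    for call in providers:
        for attempt in range(retries + 1):
            try:
                return call(prompt)
            except RetryableError as exc:
                last_error = exc
                if attempt < retries:
                    # Full jitter: wait a random time up to the exponential cap.
                    sleep(random.uniform(0, base_delay * 2**attempt))
            # Any other exception is treated as non-retryable and propagates.
    raise RuntimeError("all providers failed") from last_error
```

Keeping retries in this one loop, rather than also enabling them inside each provider client, is what "one retry layer" buys: the worst-case attempt count stays `len(providers) * (retries + 1)`.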
Live behavior is still alpha — exercise it with your own keys (or local Ollama)
via benchmarks/bench_e2e.py.
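For the routing item above, the difference between length-based routing and a cheap-first cascade can be sketched in a few lines. This is an illustration only: the threshold, the acceptance check, and the function names are assumptions, not justllm's Router or Cascade.

```python
from typing import Callable

Model = Callable[[str], str]


def route_by_length(prompt: str, small: str, large: str, threshold_chars: int = 2000) -> str:
    """Pick a model name from prompt length alone; no model call is needed to decide."""
    return small if len(prompt) < threshold_chars else large


def cascade(prompt: str, cheap: Model, strong: Model, accept: Callable[[str], bool]) -> str:
    """Call the cheap model first; escalate only if a deterministic check rejects it."""
    draft = cheap(prompt)
    return draft if accept(draft) else strong(prompt)


def looks_complete(text: str) -> bool:
    """Hypothetical acceptance rule: non-empty and ends like a finished sentence."""
    stripped = text.strip()
    return bool(stripped) and stripped[-1] in ".!?"
```

Routing decides before any call is made, while a cascade pays for the cheap call first and a second call only on rejection. Both stay deterministic because the decision is a plain function rather than another model call.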
Benchmarks
pip install -e '.[benchmarks]'
python -m benchmarks.run
Measures token/cost savings from compression and the overhead the layer adds, and checks that fallback actually recovers from provider failures. The suite runs even without the optional deps (using fallbacks), so it is never a hard blocker.
Contributing
Contributions are very welcome — the goal is to stay SOTA and easy to use at
the same time. Start with CONTRIBUTING.md (especially the
design principles that keep the surface small), see where things are headed in
ROADMAP.md, and look for good first issue labels.
License