Production LLM calls. Just the three lines. Reliability, native caching, and reversible context compression on by default.
justllm
Production LLM calls. Just the three lines.
from justllm import LLM
llm = LLM("anthropic/claude-opus-4-8")
reply = llm("Summarize this contract.")
Cross-provider fallback, native prompt caching, and reversible context compression are on by default. No config. The surface stays tiny on purpose — the moment you need a dozen knobs, that is what LiteLLM is for.
Why
The ecosystem split in two: feature-complete but heavy (LiteLLM, LangChain),
or simple but feature-thin (aisuite, any-llm). Nobody ships the production
layer behind a three-line front door. justllm is that middle.
The one number that makes it worth a switch: compressing the dynamic junk that
bloats agent calls — tool outputs, logs, RAG dumps — cuts the input-token bill
without touching your code. Measured here (gpt-4o token basis): 53% saved on a
JSON API tool result, 97% on repetitive logs, with a safe no-op when
compression wouldn't help. The engine is
Headroom (PyPI: headroom-ai,
content-aware and reversible); justllm applies it only to tool/retrieved
content, never to your prompts. See benchmarks/.
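To make "reversible" and "safe no-op" concrete, here is a toy sketch. It is not Headroom's algorithm: it only collapses runs of identical lines, which is one reason repetitive logs shrink so much. All function names are hypothetical, and character counts stand in for tokens.

```python
from itertools import groupby
from typing import List, Optional, Tuple

Pairs = List[Tuple[str, int]]


def collapse_runs(text: str) -> Pairs:
    """Collapse consecutive identical lines into (line, count) pairs."""
    return [(line, sum(1 for _ in run)) for line, run in groupby(text.split("\n"))]


def expand_runs(pairs: Pairs) -> str:
    """Inverse of collapse_runs: rebuild the original text exactly."""
    return "\n".join(line for line, count in pairs for _ in range(count))


def compress_or_passthrough(text: str) -> Tuple[str, Optional[Pairs]]:
    """Return (text_to_send, pairs); pairs is None when compression did not help."""
    pairs = collapse_runs(text)
    encoded = "\n".join(f"{count}x {line}" for line, count in pairs)
    if len(encoded) >= len(text):
        return text, None  # safe no-op: send the original unchanged
    return encoded, pairs


def fraction_saved(original: str, sent: str) -> float:
    """Share of the input removed, in characters (a stand-in for tokens)."""
    return 1 - len(sent) / len(original)


# Illustrative input only: a health-check line repeated 50 times, then one error.
logs = "\n".join(["GET /health 200"] * 50 + ["GET /users 500"])
sent, pairs = compress_or_passthrough(logs)
assert pairs is not None and expand_runs(pairs) == logs  # nothing was lost
print(f"{fraction_saved(logs, sent):.0%} of characters saved")
```

Input with no repeated lines comes back untouched, which is the behavior you want from any compressor that sits in the request path by default.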
Install
pip install 'justllm[all]' # transport + structured output + compression
Or take only what you need: justllm[litellm] (real calls), justllm[structured]
(extract()), justllm[compression] (Headroom). The bare pip install justllm
gives you the API and the reliability layer; calls raise a clear error until a
transport is installed.
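One common way to get "imports fine, fails clearly at call time" is to probe for the optional dependency only when it is needed. The sketch below shows that pattern with the standard library; `MissingTransportError` and `require_module` are hypothetical names, and justllm's actual check may look different.

```python
from importlib.util import find_spec


class MissingTransportError(RuntimeError):
    """Hypothetical error type for a missing optional dependency."""


def require_module(module: str, extra: str) -> None:
    """Raise an actionable error if an optional dependency is not importable."""
    if find_spec(module) is None:
        raise MissingTransportError(
            f"'{module}' is not installed. "
            f"Install it with: pip install 'justllm[{extra}]'"
        )


# Checked at call time rather than import time, so the bare package always imports.
try:
    require_module("litellm", extra="litellm")
    print("transport available")
except MissingTransportError as err:
    print(err)
```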
Usage
# fallback chain + explicit knobs
llm = LLM(
    chain=["anthropic/claude-opus-4-8", "openai/gpt-5", "groq/llama-3.1-70b"],
    compress=True,   # reversible, dynamic-context only
    cache="prompt",  # "cache" never silently means semantic
)
# structured output — a validated Pydantic instance
from pydantic import BaseModel
class Invoice(BaseModel):
    vendor: str
    total: float
inv = llm.extract(Invoice, "Parse: Acme Corp billed $4,200")
# a minimal tool-calling agent (tool outputs are auto-compressed)
agent = llm.agent(system="You are a travel assistant.", max_steps=8)
@agent.tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return weather_api(city)  # weather_api: placeholder for your own weather client
agent.run("What should I pack for Boston this weekend?")
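For readers who want to see what a tool-calling loop with a hard step cap amounts to, here is a minimal standalone sketch. It does not use justllm: `run_agent`, the `Action` dict shape, and `scripted_model` (a stand-in for a real model call) are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Action = Dict[str, Any]  # hypothetical shape: {"type": "tool" | "final", ...}
History = List[Tuple[str, str]]


def run_agent(
    decide: Callable[[History], Action],
    tools: Dict[str, Callable[..., str]],
    task: str,
    max_steps: int = 8,
) -> str:
    """Alternate between model decisions and tool calls, stopping at max_steps."""
    history: History = [("user", task)]
    for _ in range(max_steps):
        action = decide(history)
        if action["type"] == "final":
            return action["text"]
        # Run the requested tool and feed its output back to the model.
        result = tools[action["tool"]](**action["args"])
        history.append(("tool", result))
    # The cap is hard: a model that never finishes cannot loop forever.
    raise RuntimeError(f"no final answer after {max_steps} steps")


def scripted_model(history: History) -> Action:
    """Stand-in for an LLM: ask for the weather once, then answer."""
    if history[-1][0] == "user":
        return {"type": "tool", "tool": "get_weather", "args": {"city": "Boston"}}
    return {"type": "final", "text": f"Pack for: {history[-1][1]}"}


tools = {"get_weather": lambda city: f"rain in {city}"}
print(run_agent(scripted_model, tools, "What should I pack for Boston this weekend?"))
```

Raising on the cap, rather than returning a partial answer, keeps a runaway loop visible to the caller instead of silently burning tokens.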
Status
Alpha (0.1.0). All wiring is unit-tested — transport, caching, fallback,
structured output, and the agent loop are verified with mocked providers, so
there's no network in CI:
- Calls: llm("...") and llm.extract(Model, ...) make real calls through LiteLLM, wrapped in cross-provider fallback.
- Reliability: with_fallback + RetryPolicy, retry-with-jitter on retryable errors only, one retry layer.
- Caching: native prompt caching (Anthropic breakpoint / OpenAI automatic) plus an opt-in exact-match cache.
- Compression: compress over Headroom; agent tool outputs are compressed automatically.
- Agent: a minimal tool-calling loop with a hard step cap.
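The reliability item combines two ideas: retry a provider with jittered backoff when the error is transient, and move to the next provider otherwise. A rough standalone sketch of that combination follows. It is not the code behind with_fallback or RetryPolicy; `RetryableError`, `call_with_retry`, and `call_with_fallback` are hypothetical names, and treating every non-retryable error as a reason to fall back is an assumption of this sketch.

```python
import random
import time
from typing import Callable, Optional, Sequence


class RetryableError(Exception):
    """Hypothetical marker for transient failures (timeouts, rate limits)."""


def call_with_retry(
    fn: Callable[[], str],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry fn on RetryableError only, with full-jitter exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except RetryableError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller fall back
            # Full jitter: a random delay in [0, base_delay * 2**attempt].
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise ValueError("attempts must be at least 1")


def call_with_fallback(
    providers: Sequence[Callable[[], str]],
    attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each provider in order; the retry loop above is the only retry layer."""
    last_error: Optional[Exception] = None
    for provider in providers:
        try:
            return call_with_retry(provider, attempts=attempts, sleep=sleep)
        except Exception as exc:  # non-retryable, or retries exhausted
            last_error = exc
    raise RuntimeError("all providers failed") from last_error


def primary() -> str:
    raise ValueError("bad request")  # not retryable: skip straight to the backup


print(call_with_fallback([primary, lambda: "answer from backup"]))
```

Keeping retries in one function matters because nested retry layers multiply: three attempts wrapped in three attempts is nine calls against a provider that is already struggling.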
Not yet validated against live provider APIs end-to-end (that needs keys — see
benchmarks/bench_e2e.py). Treat live behavior as alpha.
Benchmarks
pip install -e '.[benchmarks]'
python -m benchmarks.run
Measures token/cost savings from compression, the overhead the layer adds, and that fallback actually recovers provider failures. The suite runs even without the optional deps (using fallbacks), so it is never a hard blocker.
License
MIT © Robert Walmsley