Production LLM calls. Just the three lines. Reliability, native caching, and reversible context compression on by default.
justllm
from justllm import LLM
llm = LLM("anthropic/claude-opus-4-8")
llm("Summarize this contract.")
That call already does the work you'd normally wire up yourself, on by default:
- Context compression. Headroom shrinks tool output by 50–95% before it reaches the model.
- Prompt-cache optimization. Cache breakpoints go where each provider wants them (Anthropic, OpenAI, Google).
- Reliability. Calls retry with backoff, then fail over to the next provider.
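The retry-then-fail-over behaviour in the last bullet can be pictured with a small standalone sketch. This is not justllm's implementation: the function name `call_with_failover`, its parameters, and the idea of passing providers as plain callables are all assumptions made for illustration.

```python
import random
import time
from typing import Callable, Optional, Sequence


def call_with_failover(
    providers: Sequence[Callable[[str], str]],  # hypothetical: one callable per provider
    prompt: str,
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests can skip waiting
) -> str:
    """Try each provider in order, retrying each with exponential backoff first."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(max_retries):
            try:
                return provider(prompt)
            except Exception as exc:  # a real client would catch only transient errors
                last_error = exc
                if attempt < max_retries - 1:
                    # Exponential backoff plus a little jitter: ~0.5s, ~1s, ~2s, ...
                    jitter = random.uniform(0, base_delay / 10)
                    sleep(base_delay * 2**attempt + jitter)
        # Retries are exhausted for this provider, so move on to the next one.
    raise RuntimeError("all providers failed") from last_error
```

The first provider that returns wins. A provider is abandoned only after `max_retries` failed attempts, so a brief outage costs a few retries rather than an immediate switch.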
pip install 'justllm[all]'
A little more
Same three lines. Each of these is one call or one kwarg:
llm.extract(Invoice, text) # structured output (validated Pydantic)
llm.stream("...") # token streaming
await llm.acall("...") # async
llm.map(prompts, concurrency=8) # many prompts at once, in order
llm.embed(texts) # embeddings
chat = llm.chat(); chat.send("..."); chat.send("...") # multi-turn, remembers history
llm.agent(system="...").run("...") # tool-calling loop
llm.judge(output, criteria="...") # LLM-as-judge score
llm.evaluate(cases) # run + grade a test set
LLM(router=Cascade(small=cheap, large=big)) # cheap first, escalate when needed
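For a sense of what "many prompts at once, in order" involves, here is a minimal sketch of a bounded-concurrency map that preserves input order. It is a generic asyncio pattern, not justllm's code: `map_ordered` and `fake_model` are hypothetical names, and `fake_model` stands in for a real network call.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def map_ordered(
    call: Callable[[str], Awaitable[str]],
    prompts: Sequence[str],
    concurrency: int = 8,
) -> List[str]:
    """Run `call` over prompts with at most `concurrency` in flight, keeping input order."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(prompt: str) -> str:
        # The semaphore caps how many calls run at the same time.
        async with semaphore:
            return await call(prompt)

    # gather() returns results in the order the awaitables were passed in,
    # regardless of which call finishes first.
    return list(await asyncio.gather(*(guarded(p) for p in prompts)))


async def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a model call; longer prompts finish sooner here,
    # so completion order differs from input order.
    await asyncio.sleep(0.01 / len(prompt))
    return prompt.upper()
```

Running `asyncio.run(map_ordered(fake_model, ["a", "bbb", "cc"], concurrency=2))` returns the results in the same order as the prompts, even though the calls complete out of order.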
A few more things sit behind opt-in extras: OpenTelemetry traces that include the per-call dollar cost (most setups leave that out), Langfuse-backed prompts, semantic cascade escalation, and exact-match caching. The hard parts are already wired; you just call them.
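Exact-match caching, mentioned above, can be sketched in a few lines: hash the model, prompt, and parameters, and reuse the stored response when the same key shows up again. The class below is an illustrative assumption, not the extra's actual interface. `ExactMatchCache`, `get_or_call`, and the in-memory dict are all hypothetical.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class ExactMatchCache:
    """In-memory cache keyed on the exact model, prompt, and parameters."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.hits = 0

    @staticmethod
    def key(model: str, prompt: str, **params: Any) -> str:
        # sort_keys makes the key independent of keyword-argument order.
        payload = json.dumps(
            {"model": model, "prompt": prompt, "params": params}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(
        self, call: Callable[..., str], model: str, prompt: str, **params: Any
    ) -> str:
        k = self.key(model, prompt, **params)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        # Cache miss: make the call once and remember the result.
        result = call(model, prompt, **params)
        self._store[k] = result
        return result
```

Only byte-identical requests hit the cache. A prompt that differs by a single character produces a different key and a fresh call.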
Runnable recipes: cookbook
Why
The ecosystem splits two ways. You can have powerful but heavy (LiteLLM, LangChain), or simple but thin (aisuite, any-llm). justllm sits in the middle: every optimization is on, and the surface stays at three lines. Keeping it that small was most of the work.
| | justllm | LiteLLM | aisuite |
|---|---|---|---|
| three-line call | yes | yes | yes |
| cross-provider fallback | on by default | config | no |
| context compression | on by default (Headroom) | manual trim | no |
| prompt-cache optimization | on by default | passthrough | no |
| structured output | yes (instructor) | passthrough | no |
| tool-calling agent | yes (minimal) | no | no |
| surface area | tiny | large | tiny |
It runs on LiteLLM underneath, so think of it as the opinionated layer on top rather than a replacement.
Status: alpha. The wiring is tested on CI (Python 3.10–3.13) and the call paths are checked against live models.
Cookbook · Roadmap · Changelog · Contributing · MIT
File details
Details for the file justllm-0.7.0.tar.gz.
File metadata
- Download URL: justllm-0.7.0.tar.gz
- Upload date:
- Size: 98.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 687466d51027cb03a2f31ea02b98828abddd84dc204b64117279c8bdb418911e |
| MD5 | 4043ca740cf6281bd32b8bf8024022cf |
| BLAKE2b-256 | 2b755ed4a27095967ac5d54fd1af4c9887cdf9bce38c2358d9521ab59a0769a5 |
File details
Details for the file justllm-0.7.0-py3-none-any.whl.
File metadata
- Download URL: justllm-0.7.0-py3-none-any.whl
- Upload date:
- Size: 23.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e21b198e07acec1dd5587d7a94f02abaddbc2b77af3a6b1b4e6d48bbcd8fe33f |
| MD5 | 421012119b47d842be580f3cb2e6b5d5 |
| BLAKE2b-256 | e22446a68fdf5efc82ec78c7533b551d0f4854e935b832ee27950b065bf2af5c |