An all-you-can-eat buffet of free-tier LLM APIs behind one OpenAI-compatible endpoint, with automatic failover and quota tracking.

🍽️ llmbuffet — one free LLM API gateway for every free tier

A free, OpenAI-compatible LLM gateway that pools 15 free-tier providers (Groq, Cerebras, NVIDIA NIM, Gemini, OpenRouter, GitHub Models, Cloudflare & more) behind one endpoint — with automatic failover and quota tracking. Works out of the box with zero API keys.

Stop juggling a dozen free LLM SDKs and rate limits. Point your OpenAI client at llmbuffet and never pay for a hobby project's inference again.

Groq, Cerebras, Google Gemini, OpenRouter, GitHub Models, Cloudflare Workers AI, Mistral, Cohere, SambaNova — each hands out a generous free tier, but each has its own SDK, its own rate limits, and its own daily cap. llmbuffet puts all of them into one pool:

🔌 One OpenAI-compatible endpoint. Point any existing OpenAI SDK / tool at llmbuffet and it just works — no code changes.
🔁 Automatic failover. Hit a rate limit or a 5xx on one provider? llmbuffet transparently moves to the next.
📊 Quota-aware routing. Spreads load least-used-first and respects each provider's free daily limit, so you squeeze the most out of every tier.
🧩 One catalog, your keys. Drop in the keys you have; llmbuffet skips the rest. No key is ever stored in the repo.
🪶 Tiny. Pure-Python, one dependency (httpx). The proxy runs on the standard library.
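
The failover bullet above can be pictured with a short sketch. This is not llmbuffet's actual code: `ProviderError`, `is_retryable`, `call_with_failover`, and the two fake providers are hypothetical names, and treating HTTP 429 and 5xx as the only retryable cases is an assumption drawn from the bullet's wording.

```python
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Hypothetical error carrying the HTTP status a provider returned."""

    def __init__(self, status: int) -> None:
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


def is_retryable(status: int) -> bool:
    # Assumption: rate limits (429) and server errors (5xx) trigger failover.
    return status == 429 or 500 <= status <= 599


def call_with_failover(
    providers: Sequence[tuple[str, Callable[[str], str]]], prompt: str
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (name, reply) from the first success."""
    last_error: Optional[Exception] = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as err:
            if not is_retryable(err.status):
                raise  # e.g. a 400 is a bad request, not a provider outage
            last_error = err
    raise RuntimeError("all providers failed") from last_error


def rate_limited(prompt: str) -> str:
    """Fake provider that is always rate-limited."""
    raise ProviderError(429)


def healthy(prompt: str) -> str:
    """Fake provider that always answers."""
    return f"echo: {prompt}"


served_by, reply = call_with_failover([("a", rate_limited), ("b", healthy)], "hi")
print(served_by, reply)
```

Re-raising non-retryable errors keeps a malformed request from being replayed against every provider in the pool.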

Why it exists: stitching together a dozen free LLM tiers by hand is fiddly and breaks constantly. llmbuffet makes "never pay for a hobby project's LLM calls again" a one-command setup.

Install

pip install llmbuffet      # or: pipx install llmbuffet

Zero-config: it works with no keys at all

Two providers in the catalog need no signup (OVHcloud is keyless; LLM7's key is optional), so this works the moment you install:

pip install llmbuffet
llmbuffet ask "Explain the CAP theorem in one sentence."

Add provider keys (below) to unlock more models, higher limits, and better failover.

60-second quickstart (with keys)

Grab one or more API keys. All of them are free and need no credit card, and a single key is enough to start (Groq and Cerebras are the fastest to sign up for). 👉 docs/ACCOUNTS.md has 1-minute, click-by-click steps for every provider.

| Provider | Get a key |
| --- | --- |
| Groq | https://console.groq.com/keys |
| Cerebras | https://cloud.cerebras.ai |
| OpenRouter | https://openrouter.ai/keys |
| Google Gemini | https://aistudio.google.com/apikey |
| GitHub Models | any GitHub PAT |

Export the ones you have (see .env.example for all of them):

export GROQ_API_KEY=gsk_...
export CEREBRAS_API_KEY=csk-...

Ask something:

llmbuffet ask "Explain the CAP theorem in one sentence."

or pipe context in:

cat error.log | llmbuffet ask "What's the root cause here?"

Check what's wired up:

llmbuffet providers

llmbuffet catalog: 15 providers, 53 models

  ✓ ovh          OVHcloud AI Endpoints (keyless)  5 models   [configured]
  ✓ llm7         LLM7 (key optional)           1 models   [configured]
  · groq         Groq                          6 models   [set GROQ_API_KEY]
  · cerebras     Cerebras                      4 models   [set CEREBRAS_API_KEY]
  · nvidia       NVIDIA NIM                    5 models   [set NVIDIA_API_KEY]
  ...

Choosing a model or provider

By default llmbuffet auto-picks the least-used provider you have. To pin a choice:

llmbuffet models                       # list every provider/model id
llmbuffet ask -m groq/llama-3.3-70b-versatile "hi"   # exact provider + model
llmbuffet ask -m llama-3.3-70b-versatile "hi"        # that model on any provider
llmbuffet ask -p cerebras,groq "hi"                  # restrict to these providers

Same idea through the proxy via the OpenAI model field: "auto", "groq", or "groq/llama-3.3-70b-versatile".

Providers in the box

| Provider | Key env | Notes |
| --- | --- | --- |
| OVHcloud AI Endpoints | (none) | keyless, works out of the box |
| LLM7 | `LLM7_API_KEY` | key optional |
| Groq | `GROQ_API_KEY` | very fast |
| Cerebras | `CEREBRAS_API_KEY` | very fast, large daily cap |
| NVIDIA NIM | `NVIDIA_API_KEY` | big model catalog (build.nvidia.com) |
| OpenRouter | `OPENROUTER_API_KEY` | many `:free` models |
| Google Gemini | `GEMINI_API_KEY` | generous free tier |
| GitHub Models | `GITHUB_TOKEN` | any PAT works |
| Cloudflare Workers AI | `CLOUDFLARE_API_TOKEN` + `CLOUDFLARE_ACCOUNT_ID` | |
| Mistral | `MISTRAL_API_KEY` | |
| Cohere | `COHERE_API_KEY` | |
| SambaNova | `SAMBANOVA_API_KEY` | |
| Z.ai / Zhipu GLM | `ZHIPU_API_KEY` | |
| Ollama Cloud | `OLLAMA_API_KEY` | |
| LongCat (Meituan) | `LONGCAT_API_KEY` | |

Full signup steps for each: docs/ACCOUNTS.md.

The killer feature: a drop-in OpenAI proxy

Run the gateway:

llmbuffet proxy --port 8080

Now point any OpenAI-compatible app or SDK at it — no other changes:

export OPENAI_BASE_URL=http://localhost:8080/v1
export OPENAI_API_KEY=anything        # llmbuffet ignores it

from openai import OpenAI

client = OpenAI()  # picks up OPENAI_BASE_URL
resp = client.chat.completions.create(
    model="auto",                      # or "groq", or "groq/llama-3.3-70b-versatile"
    messages=[{"role": "user", "content": "Say hi in French."}],
)
print(resp.choices[0].message.content)

The model field controls routing:

| `model` value | Routes to |
| --- | --- |
| `auto` (or omitted) | any configured provider, least-used first |
| `groq` | any model on Groq |
| `groq/llama-3.3-70b-versatile` | that exact model |
| `llama-3.3-70b-versatile` | that model on any provider that has it |
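
As an illustration of the four cases in the table, here is a minimal sketch of how a `model` string could be split into a provider filter and a model filter. `parse_model` is a hypothetical helper, not llmbuffet's API, and the `known` set is a stand-in for whatever provider ids the catalog actually defines.

```python
from typing import Optional


def parse_model(
    value: Optional[str], provider_ids: set[str]
) -> tuple[Optional[str], Optional[str]]:
    """Return (provider, model); None in either slot means "any"."""
    if not value or value == "auto":
        return None, None  # omitted or "auto": any provider, any model
    if value in provider_ids:
        return value, None  # bare provider id: any model on that provider
    provider, sep, model = value.partition("/")
    if sep and provider in provider_ids:
        return provider, model  # "provider/model": exact pair
    return None, value  # anything else: a model name on any provider


# Stand-in provider ids for the demonstration.
known = {"groq", "cerebras"}
for value in (None, "auto", "groq", "groq/llama-3.3-70b-versatile", "llama-3.3-70b-versatile"):
    print(repr(value), "->", parse_model(value, known))
```

Checking the prefix against the known provider ids before splitting means a model name that itself contains a slash is treated as a plain model name rather than as an unknown provider.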

Use it as the free LLM backend for your AI agent

Coding agents and agent frameworks (aider, Continue, Cline, the OpenAI Agents SDK, LangChain, ...) almost all speak the OpenAI API — so they can run on pooled free inference through llmbuffet, with failover when one provider rate-limits you mid-run (exactly when long agent loops tend to die):

llmbuffet proxy --port 8080        # terminal 1: leave the gateway running

# terminal 2: point the tool at the gateway
export OPENAI_BASE_URL=http://localhost:8080/v1 OPENAI_API_KEY=anything
aider --model openai/auto          # or point any OpenAI-compatible tool here

The proxy supports stream: true (Server-Sent Events), so streaming chat UIs and agent loops work too. Full integration snippets (aider, LangChain, Continue/Cline, OpenAI Agents SDK) are in docs/AGENTS.md.
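
For readers who consume the stream without an SDK, the sketch below shows how OpenAI-style Server-Sent Events are usually decoded. It assumes the proxy emits the same chunk shape as the OpenAI chat completions API (`data:` lines holding JSON with `choices[].delta.content`, ending with `data: [DONE]`); the README only states that `stream: true` is supported. `iter_stream_text` and the sample lines are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style chat-completion SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample of what a streamed reply could look like on the wire.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Bon"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "jour"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```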

Use it as a library

from llmbuffet import Buffet

buffet = Buffet.from_default_config()
reply = buffet.ask("Summarize the plot of Hamlet in 20 words.")
print(reply.text)
print(f"served by {reply.provider_id}/{reply.model}")

How routing works

For each request, llmbuffet builds the list of (provider, model) candidates that are configured (keyless providers, plus every provider you have a key for) and orders them least-used-today first; providers already over their free daily limit hint sink to the bottom. It then tries the candidates in order until one returns a non-empty completion. Every success is recorded in a small per-day counter at ~/.config/llmbuffet/quota.json, which resets at UTC midnight. See docs/ARCHITECTURE.md for the full picture.
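
The ordering rule described above can be sketched in a few lines. Everything here is illustrative: `utc_day`, `order_candidates`, and `record_success` are hypothetical names, the in-memory `quota` dict only stands in for quota.json (whose real layout is not documented here), the hint values and the `example-model-*` ids are made up, and counting a provider as "over" once usage reaches the hint is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def utc_day(now: Optional[datetime] = None) -> str:
    """Key for today's counters; the key changes at UTC midnight."""
    now = now or datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y-%m-%d")


def order_candidates(
    candidates: list[tuple[str, str]],
    used_today: dict[str, int],
    daily_hint: dict[str, int],
) -> list[tuple[str, str]]:
    """Least-used providers first; providers at or over their hint go last."""

    def sort_key(candidate: tuple[str, str]) -> tuple[bool, int]:
        provider, _model = candidate
        used = used_today.get(provider, 0)
        hint = daily_hint.get(provider)
        over = hint is not None and used >= hint  # assumption: "over" means >=
        return (over, used)  # False sorts before True, then by usage

    return sorted(candidates, key=sort_key)


def record_success(quota: dict[str, dict[str, int]], provider: str, day: str) -> None:
    """Bump the provider's counter for the given day; a new day starts from zero."""
    counts = quota.setdefault(day, {})
    counts[provider] = counts.get(provider, 0) + 1


quota: dict[str, dict[str, int]] = {}
today = utc_day()
for _ in range(3):
    record_success(quota, "groq", today)
record_success(quota, "cerebras", today)

candidates = [
    ("groq", "llama-3.3-70b-versatile"),
    ("cerebras", "example-model-a"),
    ("ovh", "example-model-b"),
]
ordered = order_candidates(candidates, quota[today], daily_hint={"cerebras": 1})
print([provider for provider, _ in ordered])
```

Keying the counters by UTC date gives the midnight reset for free: the first success after midnight simply starts a new entry.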

Adding or overriding providers

The built-in catalog lives in src/llmbuffet/providers.toml. To add a provider or override a model list without forking, drop a providers.toml at ~/.config/llmbuffet/providers.toml (or point LLMBUFFET_CONFIG at one). Same-id entries override the built-ins; new ids are appended. See CONTRIBUTING.md for the (small) anatomy of a provider.
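
The "same id overrides, new id appends" rule can be sketched as a merge over two lists of provider entries. The entry shape used here (a dict with `id` and `models`) and the `merge_catalogs` helper are hypothetical; the real providers.toml schema is described in CONTRIBUTING.md, and whether an override replaces the whole entry or only some fields is an assumption in this sketch.

```python
def merge_catalogs(builtin: list[dict], user: list[dict]) -> list[dict]:
    """Same-id user entries replace built-ins in place; new ids are appended."""
    merged = {entry["id"]: entry for entry in builtin}  # dicts keep insertion order
    for entry in user:
        merged[entry["id"]] = entry  # existing key keeps its position, new key goes last
    return list(merged.values())


# Made-up entries for the demonstration.
builtin = [
    {"id": "groq", "models": ["llama-3.3-70b-versatile"]},
    {"id": "ovh", "models": ["example-model"]},
]
user = [
    {"id": "groq", "models": ["my-pinned-model"]},    # overrides the built-in entry
    {"id": "myprovider", "models": ["custom-model"]},  # new id, appended
]

catalog = merge_catalogs(builtin, user)
print([entry["id"] for entry in catalog])
print(catalog[0]["models"])
```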

Comparison

| | llmbuffet | Calling each SDK by hand | A paid gateway |
| --- | --- | --- | --- |
| Free tiers pooled | ✅ 15 providers | ⚠️ you wire each one | ❌ |
| Automatic failover | ✅ | ❌ | ✅ |
| Quota tracking | ✅ per-day | ❌ | varies |
| Drop-in OpenAI proxy | ✅ | ❌ | ✅ |
| Cost | $0 | $0 | 💸 |
| Dependencies | 1 (`httpx`) | many | a service |

Status

llmbuffet is an early 0.x release and moving fast. Provider endpoints and free-tier limits drift, so if something breaks, please open an issue or send a one-line PR to providers.toml. Contributions of new free providers are especially welcome.

Found this useful?

⭐ Star the repo — it's the single biggest thing that helps others discover llmbuffet, and it keeps the free-provider catalog maintained. New free providers and one-line limit fixes are always welcome (CONTRIBUTING.md).

License

MIT — see LICENSE.

This version

0.2.0

Jun 3, 2026

Download files

Source Distribution

llmbuffet-0.2.0.tar.gz (33.6 kB)

Uploaded Jun 3, 2026 Source

Built Distribution

llmbuffet-0.2.0-py3-none-any.whl (25.8 kB)

Uploaded Jun 3, 2026 Python 3

File details

Details for the file llmbuffet-0.2.0.tar.gz.

File metadata

Download URL: llmbuffet-0.2.0.tar.gz
Upload date: Jun 3, 2026
Size: 33.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for llmbuffet-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`38ea974831fd731e1313ad3973ac0a1a8ab8b549a0e5cb3559978e1e70eb3008`
MD5	`349e5cdf6ea6af2fd2b62b2409436bbc`
BLAKE2b-256	`c65f28b420b937c9a0734b7f681858d7d484fb2cb39aed81cd53ceba4bb4afe5`

File details

Details for the file llmbuffet-0.2.0-py3-none-any.whl.

File metadata

Download URL: llmbuffet-0.2.0-py3-none-any.whl
Upload date: Jun 3, 2026
Size: 25.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for llmbuffet-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`96f21277678ce5cd427597a8250f5c8d06ef9a747901af1a32c892d69d0927d0`
MD5	`ba1eab3d0a58452a439f600010531a54`
BLAKE2b-256	`2ce6fe22ebcf0a66e014945ab1e07a5f97cc9607415e15ee4bc821758bbb8f7b`
