Skip to main content

An all-you-can-eat buffet of free-tier LLM APIs behind one OpenAI-compatible endpoint, with automatic failover and quota tracking.

Project description

🍽️ llmbuffet — one free LLM API gateway for every free tier

A free, OpenAI-compatible LLM gateway that pools 15 free-tier providers (Groq, Cerebras, NVIDIA NIM, Gemini, OpenRouter, GitHub Models, Cloudflare & more) behind one endpoint — with automatic failover and quota tracking. Works out of the box with zero API keys.

CI License: MIT Python 3.11+

Stop juggling a dozen free LLM SDKs and rate limits. Point your OpenAI client at llmbuffet and never pay for a hobby project's inference again.

Groq, Cerebras, Google Gemini, OpenRouter, GitHub Models, Cloudflare Workers AI, Mistral, Cohere, SambaNova — each hands out a generous free tier, but each has its own SDK, its own rate limits, and its own daily cap. llmbuffet puts all of them into one pool:

  • 🔌 One OpenAI-compatible endpoint. Point any existing OpenAI SDK / tool at llmbuffet and it just works — no code changes.
  • 🔁 Automatic failover. Hit a rate limit or a 5xx on one provider? llmbuffet transparently moves to the next.
  • 📊 Quota-aware routing. Spreads load least-used-first and respects each provider's free daily limit, so you squeeze the most out of every tier.
  • 🧩 One catalog, your keys. Drop in the keys you have; llmbuffet skips the rest. No key is ever stored in the repo.
  • 🪶 Tiny. Pure-Python, one dependency (httpx). The proxy runs on the standard library.

Why it exists: stitching together a dozen free LLM tiers by hand is fiddly and breaks constantly. llmbuffet makes "never pay for a hobby project's LLM calls again" a one-command setup.


Install

pip install llmbuffet      # or: pipx install llmbuffet

Zero-config: it works with no keys at all

Two providers in the catalog need no signup (OVHcloud is keyless; LLM7's key is optional), so this works the moment you install:

pip install llmbuffet
llmbuffet ask "Explain the CAP theorem in one sentence."

Add provider keys (below) to unlock more models, higher limits, and better failover.

60-second quickstart (with keys)

  1. Grab one or more free API keys — all free, no credit card. You only need one to start (Groq and Cerebras are the fastest to sign up for). 👉 docs/ACCOUNTS.md has 1-minute, click-by-click steps for every provider.

    Provider Get a key
    Groq https://console.groq.com/keys
    Cerebras https://cloud.cerebras.ai
    OpenRouter https://openrouter.ai/keys
    Google Gemini https://aistudio.google.com/apikey
    GitHub Models any GitHub PAT
  2. Export the ones you have (see .env.example for all of them):

    export GROQ_API_KEY=gsk_...
    export CEREBRAS_API_KEY=csk-...
    
  3. Ask something:

    llmbuffet ask "Explain the CAP theorem in one sentence."
    

    or pipe context in:

    cat error.log | llmbuffet ask "What's the root cause here?"
    

Check what's wired up:

llmbuffet providers
llmbuffet catalog: 15 providers, 53 models

  ✓ ovh          OVHcloud AI Endpoints (keyless)  5 models   [configured]
  ✓ llm7         LLM7 (key optional)           1 models   [configured]
  · groq         Groq                          6 models   [set GROQ_API_KEY]
  · cerebras     Cerebras                      4 models   [set CEREBRAS_API_KEY]
  · nvidia       NVIDIA NIM                    5 models   [set NVIDIA_API_KEY]
  ...

Choosing a model or provider

By default llmbuffet auto-picks the least-used provider you have. To pin a choice:

llmbuffet models                       # list every provider/model id
llmbuffet ask -m groq/llama-3.3-70b-versatile "hi"   # exact provider + model
llmbuffet ask -m llama-3.3-70b-versatile "hi"        # that model on any provider
llmbuffet ask -p cerebras,groq "hi"                  # restrict to these providers

Same idea through the proxy via the OpenAI model field: "auto", "groq", or "groq/llama-3.3-70b-versatile".

Providers in the box

Provider Key env Notes
OVHcloud AI Endpoints keyless, works out of the box
LLM7 LLM7_API_KEY key optional
Groq GROQ_API_KEY very fast
Cerebras CEREBRAS_API_KEY very fast, large daily cap
NVIDIA NIM NVIDIA_API_KEY big model catalog (build.nvidia.com)
OpenRouter OPENROUTER_API_KEY many :free models
Google Gemini GEMINI_API_KEY generous free tier
GitHub Models GITHUB_TOKEN any PAT works
Cloudflare Workers AI CLOUDFLARE_API_TOKEN + CLOUDFLARE_ACCOUNT_ID
Mistral MISTRAL_API_KEY
Cohere COHERE_API_KEY
SambaNova SAMBANOVA_API_KEY
Z.ai / Zhipu GLM ZHIPU_API_KEY
Ollama Cloud OLLAMA_API_KEY
LongCat (Meituan) LONGCAT_API_KEY

Full signup steps for each: docs/ACCOUNTS.md.

The killer feature: a drop-in OpenAI proxy

Run the gateway:

llmbuffet proxy --port 8080

Now point any OpenAI-compatible app or SDK at it — no other changes:

export OPENAI_BASE_URL=http://localhost:8080/v1
export OPENAI_API_KEY=anything        # llmbuffet ignores it
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_BASE_URL
resp = client.chat.completions.create(
    model="auto",                      # or "groq", or "groq/llama-3.3-70b-versatile"
    messages=[{"role": "user", "content": "Say hi in French."}],
)
print(resp.choices[0].message.content)

The model field controls routing:

model value Routes to
auto (or omitted) any configured provider, least-used first
groq any model on Groq
groq/llama-3.3-70b-versatile that exact model
llama-3.3-70b-versatile that model on any provider that has it

Use it as the free LLM backend for your AI agent

Coding agents and agent frameworks (aider, Continue, Cline, the OpenAI Agents SDK, LangChain, ...) almost all speak the OpenAI API — so they can run on pooled free inference through llmbuffet, with failover when one provider rate-limits you mid-run (exactly when long agent loops tend to die):

llmbuffet proxy --port 8080
export OPENAI_BASE_URL=http://localhost:8080/v1 OPENAI_API_KEY=anything
aider --model openai/auto          # or point any OpenAI-compatible tool here

The proxy supports stream: true (Server-Sent Events), so streaming chat UIs and agent loops work too. Full integration snippets (aider, LangChain, Continue/Cline, OpenAI Agents SDK) are in docs/AGENTS.md.

Use it as a library

from llmbuffet import Buffet

buffet = Buffet.from_default_config()
reply = buffet.ask("Summarize the plot of Hamlet in 20 words.")
print(reply.text)
print(f"served by {reply.provider_id}/{reply.model}")

How routing works

For each request llmbuffet builds the list of (provider, model) candidates you have keys for, orders them least-used-today first (providers already over their free daily hint sink to the bottom), then tries them in order until one returns a non-empty completion. Every success is recorded to a small per-day counter at ~/.config/llmbuffet/quota.json (reset at UTC midnight). See docs/ARCHITECTURE.md for the full picture.

Adding or overriding providers

The built-in catalog lives in src/llmbuffet/providers.toml. To add a provider or override a model list without forking, drop a providers.toml at ~/.config/llmbuffet/providers.toml (or point LLMBUFFET_CONFIG at one). Same-id entries override the built-ins; new ids are appended. See CONTRIBUTING.md for the (small) anatomy of a provider.

Comparison

llmbuffet Calling each SDK by hand A paid gateway
Free tiers pooled ✅ 15 providers ⚠️ you wire each one
Automatic failover
Quota tracking ✅ per-day varies
Drop-in OpenAI proxy
Cost $0 $0 💸
Dependencies 1 (httpx) many a service

Status

llmbuffet is 0.1 and moving fast. Provider endpoints and free-tier limits drift — if something breaks, please open an issue or send a one-line PR to providers.toml. Contributions of new free providers are especially welcome.

Found this useful?

Star the repo — it's the single biggest thing that helps others discover llmbuffet, and it keeps the free-provider catalog maintained. New free providers and one-line limit fixes are always welcome (CONTRIBUTING.md).

License

MIT — see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llmbuffet-0.2.0.tar.gz (33.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

llmbuffet-0.2.0-py3-none-any.whl (25.8 kB view details)

Uploaded Python 3

File details

Details for the file llmbuffet-0.2.0.tar.gz.

File metadata

  • Download URL: llmbuffet-0.2.0.tar.gz
  • Upload date:
  • Size: 33.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for llmbuffet-0.2.0.tar.gz
Algorithm Hash digest
SHA256 38ea974831fd731e1313ad3973ac0a1a8ab8b549a0e5cb3559978e1e70eb3008
MD5 349e5cdf6ea6af2fd2b62b2409436bbc
BLAKE2b-256 c65f28b420b937c9a0734b7f681858d7d484fb2cb39aed81cd53ceba4bb4afe5

See more details on using hashes here.

File details

Details for the file llmbuffet-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: llmbuffet-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 25.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for llmbuffet-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 96f21277678ce5cd427597a8250f5c8d06ef9a747901af1a32c892d69d0927d0
MD5 ba1eab3d0a58452a439f600010531a54
BLAKE2b-256 2ce6fe22ebcf0a66e014945ab1e07a5f97cc9607415e15ee4bc821758bbb8f7b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page