Pool the free tiers of 15+ LLM providers behind one OpenAI-compatible endpoint. Free, zero-config, with automatic failover and quota tracking.
freellmpool — pool every free LLM API into one endpoint
A free, OpenAI-compatible LLM gateway that pools the free tiers of 16 providers (Groq, Cerebras, NVIDIA NIM, Gemini, OpenRouter, GitHub Models, Cloudflare & more) behind one /v1 endpoint — with automatic failover and quota tracking. Works out of the box with zero API keys.
One free tier is a toy. Sixteen, stacked, are tens of thousands of free requests a day. Point your OpenAI client at `freellmpool` and stop paying for a hobby project's inference.
Groq, Cerebras, NVIDIA NIM, Google Gemini, OpenRouter, GitHub Models, Cloudflare Workers AI, Mistral, Cohere, and more each hand out a generous free tier — but each has its own SDK, rate limits, and daily cap. freellmpool puts all of them into one pool:
- 🔌 One OpenAI-compatible endpoint. Point any OpenAI SDK / tool at `freellmpool` and it just works: `/v1/chat/completions`, `/v1/models`, and a `/v1/responses` shim for Codex CLI & agents.
- 🟢 Zero config. Works with no API keys at all; keyless providers are built in. `pip install` → `ask` → done.
- 🔁 Automatic failover. Rate-limited or 5xx on one provider? `freellmpool` transparently rolls to the next, with a cooldown so it stops hammering a throttled pool.
- 📊 Quota-aware routing. Spreads load least-used-first and respects each free daily limit, so you squeeze the most out of every tier.
- 🤖 Built for agents. Streaming (SSE), a Codex/Responses shim, and mid-run failover, exactly where long agent loops usually die.
- 🪶 Tiny. Pure-Python, one dependency (`httpx`). The proxy runs on the standard library. No keys are ever stored in the repo.
Install
pip install freellmpool # or: pipx install freellmpool
Zero-config: it works with no keys at all
Three providers in the catalog need no signup (Pollinations and OVHcloud are keyless; LLM7's key is optional), so this works the moment you install:
pip install freellmpool
freellmpool ask "Explain the CAP theorem in one sentence."
Add provider keys (below) to unlock more models, higher limits, and better failover.
60-second quickstart (with keys)
- Grab one or more free API keys (all free, no credit card). You only need one to start (Groq and Cerebras are the fastest to sign up for). 👉 docs/ACCOUNTS.md has 1-minute, click-by-click steps for every provider.

  | Provider | Get a key |
  |---|---|
  | Groq | https://console.groq.com/keys |
  | Cerebras | https://cloud.cerebras.ai |
  | OpenRouter | https://openrouter.ai/keys |
  | Google Gemini | https://aistudio.google.com/apikey |
  | GitHub Models | any GitHub PAT |

- Export the ones you have (see `.env.example` for all of them):

  export GROQ_API_KEY=gsk_...
  export CEREBRAS_API_KEY=csk-...

- Ask something:

  freellmpool ask "Explain the CAP theorem in one sentence."
or pipe context in:
cat error.log | freellmpool ask "What's the root cause here?"
Check what's wired up:
freellmpool providers
freellmpool catalog: 16 providers, 56 models
✓ ovh OVHcloud AI Endpoints (keyless) 5 models [configured]
✓ llm7 LLM7 (key optional) 1 models [configured]
· groq Groq 6 models [set GROQ_API_KEY]
· cerebras Cerebras 4 models [set CEREBRAS_API_KEY]
· nvidia NVIDIA NIM 5 models [set NVIDIA_API_KEY]
...
Choosing a model or provider
By default freellmpool auto-picks the least-used provider you have. To pin a choice:
freellmpool models # list every provider/model id
freellmpool ask -m groq/llama-3.3-70b-versatile "hi" # exact provider + model
freellmpool ask -m llama-3.3-70b-versatile "hi" # that model on any provider
freellmpool ask -p cerebras,groq "hi" # restrict to these providers
Same idea through the proxy via the OpenAI model field: "auto", "groq", or "groq/llama-3.3-70b-versatile".
Providers in the box
| Provider | Key env | Notes |
|---|---|---|
| Pollinations | none | keyless, works out of the box |
| OVHcloud AI Endpoints | none | keyless, works out of the box |
| LLM7 | `LLM7_API_KEY` | key optional |
| Groq | `GROQ_API_KEY` | very fast |
| Cerebras | `CEREBRAS_API_KEY` | very fast, large daily cap |
| NVIDIA NIM | `NVIDIA_API_KEY` | big model catalog (build.nvidia.com) |
| OpenRouter | `OPENROUTER_API_KEY` | many :free models |
| Google Gemini | `GEMINI_API_KEY` | generous free tier |
| GitHub Models | `GITHUB_TOKEN` | any PAT works |
| Cloudflare Workers AI | `CLOUDFLARE_API_TOKEN` + `CLOUDFLARE_ACCOUNT_ID` | |
| Mistral | `MISTRAL_API_KEY` | |
| Cohere | `COHERE_API_KEY` | |
| SambaNova | `SAMBANOVA_API_KEY` | |
| Z.ai / Zhipu GLM | `ZHIPU_API_KEY` | |
| Ollama Cloud | `OLLAMA_API_KEY` | |
| LongCat (Meituan) | `LONGCAT_API_KEY` | |

Full signup steps for each: docs/ACCOUNTS.md.
The killer feature: a drop-in OpenAI proxy
Run the gateway:
freellmpool proxy --port 8080
Now point any OpenAI-compatible app or SDK at it — no other changes:
export OPENAI_BASE_URL=http://localhost:8080/v1
export OPENAI_API_KEY=anything # freellmpool ignores it
from openai import OpenAI
client = OpenAI() # picks up OPENAI_BASE_URL
resp = client.chat.completions.create(
model="auto", # or "groq", or "groq/llama-3.3-70b-versatile"
messages=[{"role": "user", "content": "Say hi in French."}],
)
print(resp.choices[0].message.content)
The model field controls routing:
| `model` value | Routes to |
|---|---|
| `auto` (or omitted) | any configured provider, least-used first |
| `groq` | any model on Groq |
| `groq/llama-3.3-70b-versatile` | that exact model |
| `llama-3.3-70b-versatile` | that model on any provider that has it |
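
The routing table above can be read as a small parsing rule. The sketch below is one way such a rule might be written; it is not taken from the freellmpool source. `Route`, `parse_model_field`, and the `known_providers` set are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Optional, Set


class Route(NamedTuple):
    provider: Optional[str]  # None means "any configured provider"
    model: Optional[str]     # None means "any model"


def parse_model_field(value: Optional[str], known_providers: Set[str]) -> Route:
    """Interpret an OpenAI `model` value according to the routing table.

    Hypothetical helper: the real proxy may resolve these cases differently.
    """
    if not value or value == "auto":
        return Route(None, None)
    if "/" in value:
        provider, model = value.split("/", 1)
        # Only treat the prefix as a provider if we actually know it, because
        # some model ids contain a slash of their own.
        if provider in known_providers:
            return Route(provider, model)
    if value in known_providers:
        return Route(value, None)
    return Route(None, value)


providers = {"groq", "cerebras"}
print(parse_model_field("groq/llama-3.3-70b-versatile", providers))
print(parse_model_field("llama-3.3-70b-versatile", providers))
```

Checking the prefix against the known provider ids keeps a bare model id that happens to contain a slash from being mistaken for a `provider/model` pair.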
Use it as the free LLM backend for your AI agent
Coding agents and agent frameworks (aider, Continue, Cline, the OpenAI Agents SDK, LangChain, ...) almost all speak the OpenAI API — so they can run on pooled free inference through freellmpool, with failover when one provider rate-limits you mid-run (exactly when long agent loops tend to die):
freellmpool proxy --port 8080
export OPENAI_BASE_URL=http://localhost:8080/v1 OPENAI_API_KEY=anything
aider --model openai/auto # or point any OpenAI-compatible tool here
The proxy supports stream: true (Server-Sent Events), so streaming chat UIs and agent loops work too. Full integration snippets (aider, LangChain, Continue/Cline, OpenAI Agents SDK) are in docs/AGENTS.md.
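
For readers wiring up their own client, here is a rough sketch of what consuming a streamed response involves. It assumes the proxy emits the same chunk shape as the OpenAI chat completions stream (`data: {json}` lines ending with `data: [DONE]`); the text above only states that `stream: true` is supported, so treat the exact payload as an assumption. `iter_stream_text` is a hypothetical helper, and the sample lines are hand-written, not captured output.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style Server-Sent Events lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample in the assumed wire format.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Bon"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "jour"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # prints: Bonjour
```

In practice the OpenAI SDK does this parsing for you when you pass `stream=True`; the sketch is only meant to show what travels over the connection.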
Use it as a library
from freellmpool import Pool
pool = Pool.from_default_config()
reply = pool.ask("Summarize the plot of Hamlet in 20 words.")
print(reply.text)
print(f"served by {reply.provider_id}/{reply.model}")
How routing works
For each request freellmpool builds the list of (provider, model) candidates you have keys for, orders them least-used-today first (providers already over their free daily hint sink to the bottom), then tries them in order until one returns a non-empty completion. Every success is recorded to a small per-day counter at ~/.config/freellmpool/quota.json (reset at UTC midnight). See docs/ARCHITECTURE.md for the full picture.
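
The ordering and failover described above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's implementation: `Candidate`, `daily_hint`, `order_candidates`, and `ask_with_failover` are hypothetical names, usage is counted per provider, and "over the hint" is taken to mean "at or above it".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    provider: str
    model: str
    daily_hint: Optional[int] = None  # hypothetical: free requests/day, None if unknown


def order_candidates(candidates: List[Candidate], used_today: Dict[str, int]) -> List[Candidate]:
    """Least-used-today first; providers at or over their daily hint sink to the bottom."""
    def sort_key(c: Candidate) -> Tuple[bool, int]:
        used = used_today.get(c.provider, 0)
        exhausted = c.daily_hint is not None and used >= c.daily_hint
        return (exhausted, used)  # False sorts before True
    return sorted(candidates, key=sort_key)


def ask_with_failover(
    candidates: List[Candidate],
    used_today: Dict[str, int],
    call: Callable[[Candidate], str],
) -> Tuple[Candidate, str]:
    """Try candidates in order until one returns a non-empty completion."""
    for c in order_candidates(candidates, used_today):
        try:
            text = call(c)
        except Exception:
            continue  # rate limit, 5xx, timeout: fall through to the next candidate
        if text:
            # Record the success in today's counter.
            used_today[c.provider] = used_today.get(c.provider, 0) + 1
            return c, text
    raise RuntimeError("all candidates failed or returned empty completions")


pool = [Candidate("groq", "m1", daily_hint=10), Candidate("cerebras", "m2", daily_hint=100)]
usage = {"groq": 10, "cerebras": 5}
print([c.provider for c in order_candidates(pool, usage)])  # prints: ['cerebras', 'groq']
```

Sorting on the `(exhausted, used)` pair keeps exhausted providers available as a last resort instead of dropping them, which matches the "sink to the bottom" wording.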
Adding or overriding providers
The built-in catalog lives in src/freellmpool/providers.toml. To add a provider or override a model list without forking, drop a providers.toml at ~/.config/freellmpool/providers.toml (or point FREELLMPOOL_CONFIG at one). Same-id entries override the built-ins; new ids are appended. See CONTRIBUTING.md for the (small) anatomy of a provider.
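
The override rule ("same-id entries override the built-ins; new ids are appended") can be pictured as a keyed merge. The sketch below assumes each provider entry is a dict with an `id` key and that an override replaces the whole entry; both are assumptions, since the actual `providers.toml` schema is documented in CONTRIBUTING.md rather than here. `merge_catalogs` and the `models` field are hypothetical.

```python
from typing import Dict, List


def merge_catalogs(builtin: List[dict], user: List[dict]) -> List[dict]:
    """Same-id user entries replace built-ins in place; new ids are appended."""
    merged: Dict[str, dict] = {entry["id"]: entry for entry in builtin}
    for entry in user:
        # Assigning to an existing key keeps its original position in the dict;
        # a new key is added at the end.
        merged[entry["id"]] = entry
    return list(merged.values())


builtin = [
    {"id": "groq", "models": ["llama-3.3-70b-versatile"]},
    {"id": "cerebras", "models": ["example-model"]},
]
user = [
    {"id": "groq", "models": ["my-preferred-model"]},      # overrides the built-in
    {"id": "my-provider", "models": ["my-local-model"]},   # appended
]
print([entry["id"] for entry in merge_catalogs(builtin, user)])
# prints: ['groq', 'cerebras', 'my-provider']
```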
Comparison
| | freellmpool | Calling each SDK by hand | A paid gateway |
|---|---|---|---|
| Free tiers pooled | ✅ 16 providers | ⚠️ you wire each one | ❌ |
| Automatic failover | ✅ | ❌ | ✅ |
| Quota tracking | ✅ per-day | ❌ | varies |
| Drop-in OpenAI proxy | ✅ | ❌ | ✅ |
| Cost | $0 | $0 | 💸 |
| Dependencies | 1 (`httpx`) | many | a service |
Limitations (read this)
freellmpool is honest about what it is — a way to pool free tiers, not a frontier-model service:
- No GPT-5 / Claude-Opus-class reasoning. Free tiers are smaller/faster models — great for triage, drafting, classification, tool-routing, and everyday coding; reach for a frontier model for the hardest reasoning.
- Quality and capacity vary through the day as high-cap pools exhaust; daily limits reset at UTC midnight.
- Free tiers change without notice. Endpoints, model ids, and limits drift; that's what the one-line `providers.toml` PRs are for.
- Local-first, single-user. The proxy defaults to `127.0.0.1`; if you bind it to a network interface, set a proxy key (`--api-key`). Not meant as a multi-tenant production gateway.
- Respect the providers. This pools free tiers for personal projects and experimentation. Don't abuse them, or we all lose them.
Status
freellmpool is 0.3 and moving fast. Provider endpoints and free-tier limits drift — if something breaks, please open an issue or send a one-line PR to providers.toml. Contributions of new free providers are especially welcome.
Found this useful?
⭐ Star the repo — it's the single biggest thing that helps others discover freellmpool, and it keeps the free-provider catalog maintained. New free providers and one-line limit fixes are always welcome (CONTRIBUTING.md).
License
MIT — see LICENSE.