Pool the free tiers of 15+ LLM providers behind one OpenAI-compatible endpoint. Free, zero-config, with automatic failover and quota tracking.

These details have not been verified by PyPI

Project description

freellmpool — pool every free LLM API into one endpoint

A free, OpenAI-compatible LLM gateway that pools the free tiers of 16 providers (Groq, Cerebras, NVIDIA NIM, Gemini, OpenRouter, GitHub Models, Cloudflare & more) behind one /v1 endpoint — with automatic failover and quota tracking. Works out of the box with zero API keys.

One free tier is a toy. Sixteen, stacked, add up to tens of thousands of free requests a day. And unlike a self-hosted gateway, freellmpool is just a pip install: a CLI, a Python library, and a proxy that work with no keys, no Docker, and no setup.

pip install freellmpool
freellmpool ask "Explain the CAP theorem in one sentence."   # ← real answer, zero keys

Groq, Cerebras, NVIDIA NIM, Google Gemini, OpenRouter, GitHub Models, Cloudflare Workers AI, Mistral, Cohere, and more each hand out a generous free tier — but each has its own SDK, rate limits, and daily cap. freellmpool puts all of them into one pool:

🔌 True drop-in. Point any OpenAI SDK / tool at freellmpool and it just works — /v1/chat/completions, /v1/models, tool/function-calling, and a /v1/responses shim for Codex CLI & agents. Common model names (gpt-4o-mini, claude-3-5-sonnet, …) are auto-aliased to free models, so existing code runs unchanged.
🟢 Zero config. Works with no API keys at all — keyless providers are built in. pip install → ask → done.
🔁 Automatic failover. Rate-limited or 5xx on one provider? freellmpool transparently rolls to the next, with a cooldown so it stops hammering a throttled pool.
📊 Quota-aware routing. Spreads load least-used-first and respects each free daily limit, so you squeeze the most out of every tier.
🤖 Built for agents. Streaming (SSE), a Codex/Responses shim, and mid-run failover — exactly where long agent loops usually die.
🧠 Chat + embeddings. Pooled free /v1/embeddings too (pool.embed(...)) — free RAG, not just chat.
🪶 Tiny. Pure-Python, one dependency (httpx). The proxy runs on the standard library. No keys are ever stored in the repo.

Use it five ways


Mode	How
CLI	`freellmpool ask "..."` (pipe stdin in, `--json` out)
Library	`from freellmpool import Pool`, then `pool.ask(...)` and `pool.embed(...)`
Proxy	`freellmpool proxy`, a drop-in `OPENAI_BASE_URL` for any tool
`llm` plugin	`llm install llm-freellmpool` → `llm -m freellmpool "..."`
MCP server	`freellmpool mcp` lets Claude Desktop / Code / Cursor offload to free models (docs)

It's not a server you have to host with keys you have to manage — it's a client that just works.

Install

pip install freellmpool      # or: pipx install freellmpool

Zero-config: it works with no keys at all

Three providers in the catalog need no signup (Pollinations and OVHcloud are keyless; LLM7's key is optional), so this works the moment you install:

pip install freellmpool
freellmpool ask "Explain the CAP theorem in one sentence."

Add provider keys (below) to unlock more models, higher limits, and better failover.

60-second quickstart (with keys)

Grab one or more free API keys — all free, no credit card. You only need one to start (Groq and Cerebras are the fastest to sign up for). 👉 docs/ACCOUNTS.md has 1-minute, click-by-click steps for every provider.

Provider	Get a key
Groq	https://console.groq.com/keys
Cerebras	https://cloud.cerebras.ai
OpenRouter	https://openrouter.ai/keys
Google Gemini	https://aistudio.google.com/apikey
GitHub Models	any GitHub PAT

Export the ones you have (see .env.example for all of them):

export GROQ_API_KEY=gsk_...
export CEREBRAS_API_KEY=csk-...

Ask something:

freellmpool ask "Explain the CAP theorem in one sentence."

or pipe context in:

cat error.log | freellmpool ask "What's the root cause here?"

Check what's wired up:

freellmpool providers

freellmpool catalog: 16 providers, 56 models

  ✓ ovh          OVHcloud AI Endpoints (keyless)  5 models   [configured]
  ✓ llm7         LLM7 (key optional)           1 models   [configured]
  · groq         Groq                          6 models   [set GROQ_API_KEY]
  · cerebras     Cerebras                      4 models   [set CEREBRAS_API_KEY]
  · nvidia       NVIDIA NIM                    5 models   [set NVIDIA_API_KEY]
  ...

Choosing a model or provider

By default freellmpool auto-picks the least-used provider you have. To pin a choice:

freellmpool models                       # list every provider/model id
freellmpool ask -m groq/llama-3.3-70b-versatile "hi"   # exact provider + model
freellmpool ask -m llama-3.3-70b-versatile "hi"        # that model on any provider
freellmpool ask -p cerebras,groq "hi"                  # restrict to these providers

The same selectors work through the proxy via the OpenAI model field: "auto", "groq", "groq/llama-3.3-70b-versatile", or a bare model id such as "llama-3.3-70b-versatile".

Providers in the box

Provider	Key env	Notes
Pollinations	—	keyless, works out of the box
OVHcloud AI Endpoints	—	keyless, works out of the box
LLM7	`LLM7_API_KEY`	key optional
Groq	`GROQ_API_KEY`	very fast
Cerebras	`CEREBRAS_API_KEY`	very fast, large daily cap
NVIDIA NIM	`NVIDIA_API_KEY`	big model catalog (build.nvidia.com)
OpenRouter	`OPENROUTER_API_KEY`	many `:free` models
Google Gemini	`GEMINI_API_KEY`	generous free tier
GitHub Models	`GITHUB_TOKEN`	any PAT works
Cloudflare Workers AI	`CLOUDFLARE_API_TOKEN` + `CLOUDFLARE_ACCOUNT_ID`
Mistral	`MISTRAL_API_KEY`
Cohere	`COHERE_API_KEY`
SambaNova	`SAMBANOVA_API_KEY`
Z.ai / Zhipu GLM	`ZHIPU_API_KEY`
Ollama Cloud	`OLLAMA_API_KEY`
LongCat (Meituan)	`LONGCAT_API_KEY`

Full signup steps for each: docs/ACCOUNTS.md.

The killer feature: a drop-in OpenAI proxy

Run the gateway:

freellmpool proxy --port 8080

Now point any OpenAI-compatible app or SDK at it — no other changes:

export OPENAI_BASE_URL=http://localhost:8080/v1
export OPENAI_API_KEY=anything        # freellmpool ignores it

from openai import OpenAI

client = OpenAI()  # picks up OPENAI_BASE_URL
resp = client.chat.completions.create(
    model="auto",                      # or "groq", or "groq/llama-3.3-70b-versatile"
    messages=[{"role": "user", "content": "Say hi in French."}],
)
print(resp.choices[0].message.content)
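
If you would rather not install the OpenAI SDK, the same request can be made with the standard library alone. This is a minimal sketch that assumes the proxy above is listening on localhost:8080 and answers with the usual OpenAI chat-completion JSON shape. The helper names `build_chat_request` and `ask` are illustrative, not part of freellmpool.

```python
import json
import urllib.request

# Assumes `freellmpool proxy --port 8080` is running locally.
PROXY_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(prompt: str, model: str = "auto") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request aimed at the local proxy."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Placeholder value: per the README, the proxy ignores the key.
            "Authorization": "Bearer anything",
        },
        method="POST",
    )


def ask(prompt: str, model: str = "auto") -> str:
    """Send the request and return the first choice's text (needs a running proxy)."""
    with urllib.request.urlopen(build_chat_request(prompt, model), timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

Calling `ask("Say hi in French.")` should then behave like the SDK example, provided the proxy is up.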

The model field controls routing:

`model` value	Routes to
`auto` (or omitted)	any configured provider, least-used first
`groq`	any model on Groq
`groq/llama-3.3-70b-versatile`	that exact model
`llama-3.3-70b-versatile`	that model on any provider that has it
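
The table above can be read as a small parsing rule. The sketch below is one way such a rule could look, not freellmpool's actual code: `KNOWN_PROVIDERS`, `Route`, and `parse_model_field` are hypothetical names, and the provider set is a stand-in for the real catalog.

```python
from typing import NamedTuple, Optional

# Hypothetical provider ids; the real catalog lives in providers.toml.
KNOWN_PROVIDERS = {"groq", "cerebras", "nvidia"}


class Route(NamedTuple):
    provider: Optional[str]  # None means "any provider"
    model: Optional[str]     # None means "any model"


def parse_model_field(value: Optional[str]) -> Route:
    """Map an OpenAI `model` value onto a (provider, model) constraint."""
    if not value or value == "auto":
        return Route(None, None)
    if "/" in value:
        provider, model = value.split("/", 1)
        if provider in KNOWN_PROVIDERS:
            return Route(provider, model)
    if value in KNOWN_PROVIDERS:
        return Route(value, None)
    # Anything else is treated as a bare model id, on any provider that has it.
    return Route(None, value)


print(parse_model_field("groq/llama-3.3-70b-versatile"))
```

Checking the prefix against the known provider ids, rather than splitting on every slash, keeps model ids that themselves contain a slash from being mistaken for a provider pin.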

Use it as the free LLM backend for your AI agent

Coding agents and agent frameworks (aider, Continue, Cline, the OpenAI Agents SDK, LangChain, ...) almost all speak the OpenAI API — so they can run on pooled free inference through freellmpool, with failover when one provider rate-limits you mid-run (exactly when long agent loops tend to die):

freellmpool proxy --port 8080 &    # run in the background, or use a second terminal
export OPENAI_BASE_URL=http://localhost:8080/v1 OPENAI_API_KEY=anything
aider --model openai/auto          # or point any OpenAI-compatible tool here

The proxy supports stream: true (SSE) and tool/function-calling, so streaming chat UIs and tool-using agent loops work too.
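
For clients that read the stream by hand, the sketch below shows how OpenAI-style SSE lines can be turned back into text. It assumes the proxy emits the standard `data: {...}` chunks ending with `data: [DONE]`; `iter_content_deltas` is an illustrative helper, and the sample lines are made up for the demonstration.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines ("data: {...}")."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel in the OpenAI streaming format
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Made-up sample stream, shaped like chat completion chunks.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Bon"}}]}',
    'data: {"choices": [{"delta": {"content": "jour"}}]}',
    "data: [DONE]",
]
print("".join(iter_content_deltas(sample)))  # prints "Bonjour"
```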

Works with your tools

Anything that accepts a custom OpenAI base URL drops straight in — copy-paste configs in docs/INTEGRATIONS.md:

opencode · aider · Continue · Cline / Roo · Cursor / Windsurf · Codex CLI · Open WebUI · LibreChat · LangChain · LlamaIndex · Vercel AI SDK · llm CLI · shell-gpt · n8n

Use it as a library

from freellmpool import Pool

pool = Pool.from_default_config()
reply = pool.ask("Summarize the plot of Hamlet in 20 words.")
print(reply.text)
print(f"served by {reply.provider_id}/{reply.model}")

# Pooled free embeddings too — free RAG in a couple lines:
vecs = pool.embed(["first document", "second document"]).vectors
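
Once you have vectors, retrieval is a similarity ranking. The sketch below assumes `.vectors` is a list of equal-length float lists and uses toy 2-d vectors in place of real embeddings; `cosine` and `rank` are illustrative helpers, not part of freellmpool.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]) -> List[int]:
    """Return document indices, most similar to the query first."""
    scores = [cosine(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors stand in for pool.embed([...]).vectors here.
docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(rank([0.9, 0.1], docs))  # prints [0, 2, 1]
```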

Correct by design

freellmpool aims to be a faithful OpenAI drop-in, so agents and SDKs don't trip over edge cases:

Fails over on errors (incl. provider tool-call errors) instead of returning a hard 400.
Accepts assistant messages with empty/null content + tool_calls (doesn't reject them).
Respects each provider's own per-day free limit in its quota tracking, not a single global guess.
Skips a rate-limited provider's other models for that request, with a cooldown so it stops hammering a throttled pool.
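
The cooldown idea in the last point can be sketched as a small tracker. This is an illustration under assumptions, not freellmpool's implementation: the `Cooldowns` class, its method names, and the 60-second default are all hypothetical.

```python
import time
from typing import Callable, Dict


class Cooldowns:
    """Remember which providers were recently throttled."""

    def __init__(self, seconds: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.seconds = seconds  # hypothetical default, not freellmpool's value
        self.clock = clock      # injectable so the behavior can be tested
        self._until: Dict[str, float] = {}

    def mark(self, provider: str) -> None:
        """Record a rate-limit or 5xx response from this provider."""
        self._until[provider] = self.clock() + self.seconds

    def is_cooling(self, provider: str) -> bool:
        """True while the provider should be skipped, including its other models."""
        return self.clock() < self._until.get(provider, float("-inf"))
```

Keying the cooldown by provider rather than by model matches the behavior described above: one throttled model takes the provider's other models out of the running until the cooldown expires.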

How routing works

For each request freellmpool builds the list of (provider, model) candidates you have keys for, orders them least-used-today first (providers already over their free daily hint sink to the bottom), then tries them in order until one returns a non-empty completion. Every success is recorded to a small per-day counter at ~/.config/freellmpool/quota.json (reset at UTC midnight). See docs/ARCHITECTURE.md for the full picture.
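
The ordering and failover loop described above can be sketched in a few lines. This is an approximation under stated assumptions: usage is counted per provider, "over the hint" is treated as at or above it, and `order_candidates`, `route`, `utc_day`, and the `call` callback are hypothetical names rather than freellmpool's API.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional, Tuple

Candidate = Tuple[str, str]  # (provider, model)


def utc_day() -> str:
    """Key for the per-day counter; a new key appears at UTC midnight."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%d")


def order_candidates(cands: List[Candidate], used: Counter,
                     daily_hint: Dict[str, int]) -> List[Candidate]:
    """Least-used first; providers at or over their daily hint sink to the bottom."""
    def key(c: Candidate):
        provider = c[0]
        over = used[provider] >= daily_hint.get(provider, float("inf"))
        return (over, used[provider])
    return sorted(cands, key=key)


def route(cands: List[Candidate], used: Counter, daily_hint: Dict[str, int],
          call: Callable[[str, str], str]) -> Optional[Tuple[Candidate, str]]:
    """Try candidates in order until one returns a non-empty completion."""
    for provider, model in order_candidates(cands, used, daily_hint):
        try:
            text = call(provider, model)
        except Exception:
            continue  # rate limit, 5xx, timeout: fall through to the next candidate
        if text:
            used[provider] += 1  # only successes are counted
            return (provider, model), text
    return None
```

In a persistent version, `used` would be the counter stored under today's `utc_day()` key, so yesterday's counts simply stop being consulted after midnight UTC.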

Adding or overriding providers

The built-in catalog lives in src/freellmpool/providers.toml. To add a provider or override a model list without forking, drop a providers.toml at ~/.config/freellmpool/providers.toml (or point FREELLMPOOL_CONFIG at one). Same-id entries override the built-ins; new ids are appended. See CONTRIBUTING.md for the (small) anatomy of a provider.
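
The override rule (same id replaces, new id appends) can be pictured as a dictionary merge. The sketch below is only an illustration: the `id` and `models` field names are hypothetical, and it assumes a same-id entry replaces the built-in entry wholesale, which the text above does not spell out.

```python
from typing import Dict, List

Provider = Dict[str, object]


def merge_catalogs(builtin: List[Provider], user: List[Provider]) -> List[Provider]:
    """Same-id entries replace built-ins in place; new ids are appended."""
    merged = {p["id"]: p for p in builtin}  # dicts preserve insertion order
    for p in user:
        merged[p["id"]] = p  # existing key keeps its position, new key goes last
    return list(merged.values())


builtin = [{"id": "groq", "models": ["a"]}, {"id": "ovh", "models": ["b"]}]
user = [{"id": "groq", "models": ["x", "y"]}, {"id": "acme", "models": ["z"]}]
print([p["id"] for p in merge_catalogs(builtin, user)])  # prints ['groq', 'ovh', 'acme']
```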

Comparison

	freellmpool	Calling each SDK by hand	A paid gateway
Free tiers pooled	✅ 16 providers	⚠️ you wire each one	❌
Automatic failover	✅	❌	✅
Quota tracking	✅ per-day	❌	varies
Drop-in OpenAI proxy	✅	❌	✅
Cost	$0	$0	💸
Dependencies	1 (`httpx`)	many	a service

Limitations (read this)

freellmpool is honest about what it is — a way to pool free tiers, not a frontier-model service:

No GPT-5 / Claude-Opus-class reasoning. Free tiers are smaller/faster models — great for triage, drafting, classification, tool-routing, and everyday coding; reach for a frontier model for the hardest reasoning.
Quality and capacity vary through the day as high-cap pools exhaust; daily limits reset at UTC midnight.
Free tiers change without notice. Endpoints, model ids, and limits drift — that's what the one-line providers.toml PRs are for.
Local-first, single-user. The proxy defaults to 127.0.0.1; if you bind it to a network interface, set a proxy key (--api-key). Not meant as a multi-tenant production gateway.
Respect the providers. This pools free tiers for personal projects and experimentation — don't abuse them, or we all lose them.

Status

freellmpool is still pre-1.0 and moving fast. Provider endpoints and free-tier limits drift, so if something breaks, please open an issue or send a one-line PR to providers.toml. Contributions of new free providers are especially welcome.

Found this useful?

⭐ Star the repo — it's the single biggest thing that helps others discover freellmpool, and it keeps the free-provider catalog maintained. New free providers and one-line limit fixes are always welcome (CONTRIBUTING.md).

License

MIT — see LICENSE.

Release history

Version	Released
0.10.1	Jun 3, 2026
0.9.3	Jun 3, 2026
0.9.2	Jun 3, 2026
0.9.1	Jun 3, 2026
0.9.0	Jun 3, 2026
0.8.1	Jun 3, 2026
0.8.0	Jun 3, 2026
0.7.0	Jun 3, 2026
0.6.0 (this version)	Jun 3, 2026
0.5.0	Jun 3, 2026
0.4.0	Jun 3, 2026
0.3.0	Jun 3, 2026

Download files

File	Type	Size	Uploaded	Tags
freellmpool-0.6.0.tar.gz	Source distribution	53.7 kB	Jun 3, 2026	Source
freellmpool-0.6.0-py3-none-any.whl	Built distribution	34.9 kB	Jun 3, 2026	Python 3

Both files were uploaded via twine/6.2.0 CPython/3.12.3, without Trusted Publishing.

File hashes

Hashes for freellmpool-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`e06ae75729c527de4f1c6792f04f5fa65c7ee0e6603726eeb512eb4aa4570ee4`
MD5	`07cbc8e5dac32389dd6383b2fd35db29`
BLAKE2b-256	`7ca2cf364723beeaaf6c0d169ddcd97a071033ce929b14cd036530ba70435dd3`

Hashes for freellmpool-0.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`832bd15bcabee295a81f7ba1e9f711b20b7aaab76409a955245f9f78ac6ed9f4`
MD5	`6e10115939cf81ead69d60f9878f16e7`
BLAKE2b-256	`de7fd981c0a4810cad91be226e7eb0c150c1d7f46ec964c903c0574a5f8d2ad3`
