Pool the free tiers of 15+ LLM providers behind one OpenAI-compatible endpoint. Free, zero-config, with automatic failover and quota tracking.

freellmpool — pool every free LLM API into one endpoint

A free, OpenAI-compatible LLM gateway that pools the free tiers of 16 providers (Groq, Cerebras, NVIDIA NIM, Gemini, OpenRouter, GitHub Models, Cloudflare & more) behind one /v1 endpoint — with automatic failover and quota tracking. Works out of the box with zero API keys.

PyPI CI License: MIT Python 3.11+

One free tier is a toy. Sixteen, stacked, are tens of thousands of free requests a day. Point your OpenAI client at freellmpool and stop paying for a hobby project's inference.

Groq, Cerebras, NVIDIA NIM, Google Gemini, OpenRouter, GitHub Models, Cloudflare Workers AI, Mistral, Cohere, and more each hand out a generous free tier — but each has its own SDK, rate limits, and daily cap. freellmpool puts all of them into one pool:

  • 🔌 One OpenAI-compatible endpoint. Point any OpenAI SDK / tool at freellmpool and it just works — /v1/chat/completions, /v1/models, and a /v1/responses shim for Codex CLI & agents.
  • 🟢 Zero config. Works with no API keys at all: keyless providers are built in. pip install → ask → done.
  • 🔁 Automatic failover. Rate-limited or 5xx on one provider? freellmpool transparently rolls to the next, with a cooldown so it stops hammering a throttled pool.
  • 📊 Quota-aware routing. Spreads load least-used-first and respects each free daily limit, so you squeeze the most out of every tier.
  • 🤖 Built for agents. Streaming (SSE), a Codex/Responses shim, and mid-run failover — exactly where long agent loops usually die.
  • 🪶 Tiny. Pure-Python, one dependency (httpx). The proxy runs on the standard library. No keys are ever stored in the repo.
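
The failover-with-cooldown behaviour in the list above can be pictured roughly as follows. This is a minimal sketch, not freellmpool's actual code: `ProviderError`, `CooldownTracker`, `call_with_failover`, and the 60-second default are hypothetical names and values chosen for illustration.

```python
import time


class ProviderError(Exception):
    """Hypothetical error standing in for a rate-limit (429) or 5xx response."""


class CooldownTracker:
    """Remembers which providers failed recently so they can be skipped."""

    def __init__(self, cooldown_seconds=60.0, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds  # assumed value, not from the project
        self.clock = clock
        self._blocked_until = {}

    def mark_failed(self, provider_id):
        self._blocked_until[provider_id] = self.clock() + self.cooldown_seconds

    def is_available(self, provider_id):
        return self.clock() >= self._blocked_until.get(provider_id, float("-inf"))


def call_with_failover(providers, send, tracker):
    """Try each provider id in order, skipping any that are cooling down.

    `send(provider_id)` returns a reply string or raises ProviderError.
    """
    for provider_id in providers:
        if not tracker.is_available(provider_id):
            continue  # still throttled: do not hammer it again
        try:
            return provider_id, send(provider_id)
        except ProviderError:
            tracker.mark_failed(provider_id)
    raise ProviderError("all providers failed or are cooling down")
```

Injecting the clock keeps the sketch easy to test; a real implementation would likely also distinguish retryable errors from permanent ones.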

Install

pip install freellmpool      # or: pipx install freellmpool

Zero-config: it works with no keys at all

Three providers in the catalog need no signup (Pollinations and OVHcloud are keyless; LLM7's key is optional), so this works the moment you install:

pip install freellmpool
freellmpool ask "Explain the CAP theorem in one sentence."

Add provider keys (below) to unlock more models, higher limits, and better failover.

60-second quickstart (with keys)

  1. Grab one or more free API keys — all free, no credit card. You only need one to start (Groq and Cerebras are the fastest to sign up for). 👉 docs/ACCOUNTS.md has 1-minute, click-by-click steps for every provider.

    Provider        Get a key
    Groq            https://console.groq.com/keys
    Cerebras        https://cloud.cerebras.ai
    OpenRouter      https://openrouter.ai/keys
    Google Gemini   https://aistudio.google.com/apikey
    GitHub Models   any GitHub PAT
  2. Export the ones you have (see .env.example for all of them):

    export GROQ_API_KEY=gsk_...
    export CEREBRAS_API_KEY=csk-...
    
  3. Ask something:

    freellmpool ask "Explain the CAP theorem in one sentence."
    

    or pipe context in:

    cat error.log | freellmpool ask "What's the root cause here?"
    

Check what's wired up:

freellmpool providers
freellmpool catalog: 16 providers, 56 models

  ✓ ovh          OVHcloud AI Endpoints (keyless)  5 models   [configured]
  ✓ llm7         LLM7 (key optional)           1 models   [configured]
  · groq         Groq                          6 models   [set GROQ_API_KEY]
  · cerebras     Cerebras                      4 models   [set CEREBRAS_API_KEY]
  · nvidia       NVIDIA NIM                    5 models   [set NVIDIA_API_KEY]
  ...

Choosing a model or provider

By default freellmpool auto-picks the least-used provider you have. To pin a choice:

freellmpool models                       # list every provider/model id
freellmpool ask -m groq/llama-3.3-70b-versatile "hi"   # exact provider + model
freellmpool ask -m llama-3.3-70b-versatile "hi"        # that model on any provider
freellmpool ask -p cerebras,groq "hi"                  # restrict to these providers

Same idea through the proxy via the OpenAI model field: "auto", "groq", or "groq/llama-3.3-70b-versatile".

Providers in the box

Provider                Key env                                       Notes
Pollinations                                                          keyless, works out of the box
OVHcloud AI Endpoints                                                 keyless, works out of the box
LLM7                    LLM7_API_KEY                                  key optional
Groq                    GROQ_API_KEY                                  very fast
Cerebras                CEREBRAS_API_KEY                              very fast, large daily cap
NVIDIA NIM              NVIDIA_API_KEY                                big model catalog (build.nvidia.com)
OpenRouter              OPENROUTER_API_KEY                            many :free models
Google Gemini           GEMINI_API_KEY                                generous free tier
GitHub Models           GITHUB_TOKEN                                  any PAT works
Cloudflare Workers AI   CLOUDFLARE_API_TOKEN + CLOUDFLARE_ACCOUNT_ID
Mistral                 MISTRAL_API_KEY
Cohere                  COHERE_API_KEY
SambaNova               SAMBANOVA_API_KEY
Z.ai / Zhipu GLM        ZHIPU_API_KEY
Ollama Cloud            OLLAMA_API_KEY
LongCat (Meituan)       LONGCAT_API_KEY

Full signup steps for each: docs/ACCOUNTS.md.

The killer feature: a drop-in OpenAI proxy

Run the gateway:

freellmpool proxy --port 8080

Now point any OpenAI-compatible app or SDK at it — no other changes:

export OPENAI_BASE_URL=http://localhost:8080/v1
export OPENAI_API_KEY=anything        # freellmpool ignores it

from openai import OpenAI

client = OpenAI()  # picks up OPENAI_BASE_URL
resp = client.chat.completions.create(
    model="auto",                      # or "groq", or "groq/llama-3.3-70b-versatile"
    messages=[{"role": "user", "content": "Say hi in French."}],
)
print(resp.choices[0].message.content)

The model field controls routing:

model value                    Routes to
auto (or omitted)              any configured provider, least-used first
groq                           any model on Groq
groq/llama-3.3-70b-versatile   that exact model
llama-3.3-70b-versatile        that model on any provider that has it
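
One way to read the table above is as a small parser that turns the model string into a (provider, model) filter. The function below is an illustrative sketch only; `parse_model_field` and `known_providers` are hypothetical names, and the project's real resolution logic may differ.

```python
def parse_model_field(model, known_providers):
    """Turn an OpenAI `model` value into a (provider, model) filter.

    None in either slot means "no restriction on this part".
    """
    if not model or model == "auto":
        return (None, None)
    if "/" in model:
        provider, _, name = model.partition("/")
        if provider in known_providers:
            return (provider, name)
        # Prefix is not a provider id, so treat the whole string as a model id.
        return (None, model)
    if model in known_providers:
        return (model, None)
    return (None, model)


providers = {"groq", "cerebras"}
print(parse_model_field("groq/llama-3.3-70b-versatile", providers))
```

Checking the prefix against the known provider ids matters because some model ids contain a slash themselves.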

Use it as the free LLM backend for your AI agent

Coding agents and agent frameworks (aider, Continue, Cline, the OpenAI Agents SDK, LangChain, ...) almost all speak the OpenAI API — so they can run on pooled free inference through freellmpool, with failover when one provider rate-limits you mid-run (exactly when long agent loops tend to die):

freellmpool proxy --port 8080
export OPENAI_BASE_URL=http://localhost:8080/v1 OPENAI_API_KEY=anything
aider --model openai/auto          # or point any OpenAI-compatible tool here

The proxy supports stream: true (Server-Sent Events), so streaming chat UIs and agent loops work too. Full integration snippets (aider, LangChain, Continue/Cline, OpenAI Agents SDK) are in docs/AGENTS.md.
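
For clients that read the stream without an SDK, the sketch below shows how OpenAI-style Server-Sent Events are usually decoded: each event is a `data:` line carrying a JSON chunk, and the stream ends with `data: [DONE]`. It assumes the proxy emits the standard chat-completions chunk shape; `iter_stream_text` is a hypothetical helper, not part of freellmpool.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Bon"}}]}',
    'data: {"choices":[{"delta":{"content":"jour"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```

In practice `lines` would come from an HTTP response iterated line by line; the official OpenAI SDK does this decoding for you when `stream=True` is passed.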

Use it as a library

from freellmpool import Pool

pool = Pool.from_default_config()
reply = pool.ask("Summarize the plot of Hamlet in 20 words.")
print(reply.text)
print(f"served by {reply.provider_id}/{reply.model}")

How routing works

For each request freellmpool builds the list of (provider, model) candidates you have keys for, orders them least-used-today first (providers already over their free daily hint sink to the bottom), then tries them in order until one returns a non-empty completion. Every success is recorded to a small per-day counter at ~/.config/freellmpool/quota.json (reset at UTC midnight). See docs/ARCHITECTURE.md for the full picture.
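
The routing steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the project's implementation: `utc_day`, `order_candidates`, `route`, and the `complete` callback are hypothetical names, usage is counted per provider, and a provider that has reached its daily hint is treated as exhausted.

```python
from datetime import datetime, timezone


def utc_day(now=None):
    """Key for today's counter; it changes at UTC midnight."""
    now = now or datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y-%m-%d")


def order_candidates(candidates, used_today, daily_hint):
    """Least-used first; providers at their daily hint sink to the bottom."""
    def sort_key(candidate):
        provider, _model = candidate
        used = used_today.get(provider, 0)
        limit = daily_hint.get(provider)  # None means no known limit
        exhausted = limit is not None and used >= limit  # assumption: >= counts as over
        return (exhausted, used)
    return sorted(candidates, key=sort_key)


def route(candidates, used_today, daily_hint, complete):
    """Try candidates in order until one returns a non-empty completion."""
    for provider, model in order_candidates(candidates, used_today, daily_hint):
        try:
            text = complete(provider, model)
        except Exception:
            continue  # rate limit, 5xx, or network error: move to the next one
        if text:
            used_today[provider] = used_today.get(provider, 0) + 1  # record success
            return provider, model, text
    raise RuntimeError("no provider returned a completion")
```

A persistent version would store `used_today` under the `utc_day()` key in the quota file, so that a new UTC day starts from empty counters.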

Adding or overriding providers

The built-in catalog lives in src/freellmpool/providers.toml. To add a provider or override a model list without forking, drop a providers.toml at ~/.config/freellmpool/providers.toml (or point FREELLMPOOL_CONFIG at one). Same-id entries override the built-ins; new ids are appended. See CONTRIBUTING.md for the (small) anatomy of a provider.
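
The override rule (same id replaces, new id is appended) behaves like the merge below. This is a sketch over plain dictionaries; `merge_catalogs` and the `id` / `models` keys are assumptions for illustration and may not match the real providers.toml schema, and it assumes an override replaces the whole entry rather than merging field by field.

```python
def merge_catalogs(builtin, user):
    """Same-id entries from `user` replace built-ins; new ids are appended."""
    merged = {entry["id"]: entry for entry in builtin}
    for entry in user:
        # Reassigning an existing key keeps its position; a new key goes last.
        merged[entry["id"]] = entry
    return list(merged.values())


builtin = [{"id": "groq", "models": ["a"]}, {"id": "cerebras", "models": ["b"]}]
user = [{"id": "groq", "models": ["a", "c"]}, {"id": "local", "models": ["x"]}]
print([entry["id"] for entry in merge_catalogs(builtin, user)])
```

The sketch relies on Python dictionaries preserving insertion order, which keeps the built-in ordering stable when an entry is overridden.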

Comparison

freellmpool Calling each SDK by hand A paid gateway
Free tiers pooled ✅ 16 providers ⚠️ you wire each one
Automatic failover
Quota tracking ✅ per-day varies
Drop-in OpenAI proxy
Cost $0 $0 💸
Dependencies 1 (httpx) many a service

Limitations (read this)

freellmpool is honest about what it is — a way to pool free tiers, not a frontier-model service:

  • No GPT-5 / Claude-Opus-class reasoning. Free tiers are smaller/faster models — great for triage, drafting, classification, tool-routing, and everyday coding; reach for a frontier model for the hardest reasoning.
  • Quality and capacity vary through the day as high-cap pools exhaust; daily limits reset at UTC midnight.
  • Free tiers change without notice. Endpoints, model ids, and limits drift — that's what the one-line providers.toml PRs are for.
  • Local-first, single-user. The proxy defaults to 127.0.0.1; if you bind it to a network interface, set a proxy key (--api-key). Not meant as a multi-tenant production gateway.
  • Respect the providers. This pools free tiers for personal projects and experimentation — don't abuse them, or we all lose them.

Status

freellmpool is at version 0.3 and moving fast. Provider endpoints and free-tier limits drift, so if something breaks, please open an issue or send a one-line PR to providers.toml. Contributions of new free providers are especially welcome.

Found this useful?

Star the repo — it's the single biggest thing that helps others discover freellmpool, and it keeps the free-provider catalog maintained. New free providers and one-line limit fixes are always welcome (CONTRIBUTING.md).

License

MIT — see LICENSE.
