Pool the free tiers of 16 LLM providers (300+ models) behind one OpenAI-compatible endpoint. Free, zero-config, with automatic failover and quota tracking.

freellmpool

Pool the free tiers of 16 LLM providers (200+ live-validated models) behind one OpenAI-compatible endpoint — as a CLI, a Python library, or a local proxy. Works with no API keys.

Groq, Cerebras, NVIDIA NIM, Google Gemini, OpenRouter, GitHub Models, Cloudflare, Mistral, Cohere and others each give away a free tier — but each has its own SDK, rate limits, and daily cap. freellmpool puts them in one pool: it sends each request to a provider you have access to, fails over to the next when one is rate limited or down, and tracks per-day usage so you get the most out of every tier.

Two providers (Pollinations and OVHcloud) need no API key, so a fresh install answers immediately:

$ pip install freellmpool
$ freellmpool ask "Explain the CAP theorem in one sentence."
A distributed system can guarantee at most two of consistency, availability, and
partition tolerance at the same time.

Add keys for the other providers to unlock more models and higher limits.

Run a coding agent on free models

freellmpool's proxy speaks both the OpenAI and the Anthropic API, so coding agents run against pooled free tiers with no code changes — just point them at the proxy:

freellmpool proxy                       # starts http://localhost:8080
freellmpool code claude                 # prints the one-line setup for Claude Code
# (also: codex, aider, cline, continue, cursor, opencode)

Your existing OpenAI/Anthropic apps work the same way — set OPENAI_BASE_URL (or ANTHROPIC_BASE_URL) to the proxy and keep your code unchanged.

New in 0.11: capacity tools — freellmpool capacity status shows which free tiers are usable right now, freellmpool providers health live-probes them, and freellmpool keys add walks you through configuring more (see Capacity & provider health and docs/CAPACITY.md).

New in 0.10: an async API (AsyncPool), an MCP server (freellmpool mcp), latency-aware routing with freellmpool benchmark, observability hooks, and a plugin system for custom providers. See the changelog.

Install

pip install freellmpool      # or: pipx install freellmpool

The only dependency is httpx. Requires Python 3.11+.

Command line

freellmpool ask "Write a haiku about sqlite"
git diff | freellmpool ask "Write a commit message for this"
freellmpool providers        # which providers are configured
freellmpool models           # every provider/model id

Pin a provider or model; common OpenAI/Anthropic model names are mapped to a free equivalent so existing scripts keep working:

freellmpool ask -m groq/llama-3.3-70b-versatile "hi"
freellmpool ask -p cerebras,groq "hi"
freellmpool ask -m gpt-4o-mini "hi"      # routed to a free model

As a proxy

Run a local server that speaks the OpenAI API, then point any OpenAI-compatible tool at it:

freellmpool proxy
export OPENAI_BASE_URL=http://localhost:8080/v1
export OPENAI_API_KEY=unused

from openai import OpenAI
client = OpenAI()
print(client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "hi"}],
).choices[0].message.content)

The proxy also implements the OpenAI Responses API (for the Codex CLI) and the Anthropic Messages API (for Claude Code), so coding agents can run on free models too. freellmpool code <agent> prints the exact setup:

freellmpool code aider       # also: claude, codex, cline, continue, cursor, opencode

Endpoints: /v1/chat/completions (token streaming, tool calling), /v1/embeddings, /v1/responses, /v1/messages, /v1/models, and a /dashboard page showing usage. Setup snippets for specific tools are in docs/INTEGRATIONS.md and docs/AGENTS.md.
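Streaming from /v1/chat/completions follows the OpenAI server-sent-events format. Each event is a data: line carrying a JSON chunk, and the chunk's choices[].delta.content holds the next piece of text. A final data: [DONE] line ends the stream. Below is a minimal standard-library client. It assumes the proxy is running on the default http://localhost:8080 with no --api-key set, and the function names are illustrative.

```python
import json
import urllib.request
from collections.abc import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style server-sent-event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, keep-alives
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:  # role-only and tool-call chunks carry no content
                yield text


def stream_chat(prompt: str, base_url: str = "http://localhost:8080/v1") -> Iterator[str]:
    """POST a streaming chat request to the proxy and yield text as it arrives."""
    body = json.dumps({
        "model": "auto",
        "stream": True,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        # The response object iterates line by line as bytes
        yield from iter_stream_text(line.decode("utf-8") for line in response)
```

Usage: for piece in stream_chat("hi"): print(piece, end="", flush=True).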

As a library

from freellmpool import Pool

pool = Pool.from_default_config()
reply = pool.ask("Summarize the plot of Hamlet in 20 words.")
print(reply.text, "—", reply.provider_id)

vectors = pool.embed(["first document", "second document"]).vectors

Async is the same API with await:

from freellmpool import AsyncPool

async with AsyncPool.from_default_config() as pool:
    reply = await pool.aask("Summarize the plot of Hamlet in 20 words.")

Pass on_event=... to either pool to receive structured routing events (attempt/success/error/cooldown/exhausted) for logging or tracing. Add your own endpoint with register_provider(...), or a new request shape with register_adapter(name, fn).

Benchmark your providers

freellmpool benchmark times one call per configured provider and prints latency and success, so you can see which of your free tiers are fastest right now. The router learns the same latency/success signal from real traffic as it runs; set FREELLMPOOL_ROUTING=fast to prefer the lowest-latency provider instead of the default least-used-first.

$ freellmpool benchmark
  provider/model            status   latency  note
  cerebras/llama-3.3-70b    ok        180 ms  6 tok
  groq/llama-3.3-70b        ok        240 ms  6 tok
  ovh/Meta-Llama-3_3-70B    FAIL           -  HTTP 429

Capacity & provider health

Free tiers drift through the day — keys expire, providers go down, daily caps fill. These commands tell you what's usable right now and what to set up next:

freellmpool capacity status --target 5   # who's healthy / near quota / missing a key
freellmpool providers health             # send one tiny request to each, time it
freellmpool keys checklist --target 5    # which keys to add to reach N healthy providers
freellmpool keys add groq                # configure a key (and record metadata)

capacity status is local-first: it reads your catalog, environment, and per-day quota counters and labels each provider healthy, low_quota, exhausted, invalid_key, or missing. It also syncs an advisory external catalog (mnfst/awesome-free-llm-apis) to suggest free providers you could add — advisory only; your providers.toml stays the source of truth for routing. keys add <name> can even import a suggested provider from that catalog or create an OpenAI-compatible stub and autodiscover its models. The proxy /dashboard shows the same capacity at a glance. Full reference: docs/CAPACITY.md.

As an MCP server

freellmpool mcp runs a Model Context Protocol server over stdio, so Claude Desktop, Claude Code, or Cursor can hand subtasks to free models. See docs/MCP.md. A server.json is included for the MCP registry.

In Simon Willison's `llm` CLI

There's a plugin: install it with llm install llm-freellmpool, then run llm -m freellmpool "..." with no API key. Source: 0xzr/llm-freellmpool.

Provider keys

freellmpool reads keys from the environment and uses whatever is set. None are required. Step-by-step signup links for each (all free, no card) are in docs/ACCOUNTS.md.

Provider	Env var	Notes
Pollinations	—	no key needed
OVHcloud	—	no key needed (anonymous tier)
LLM7	`LLM7_API_KEY`	optional
Groq	`GROQ_API_KEY`	fast
Cerebras	`CEREBRAS_API_KEY`	fast, large daily cap
NVIDIA NIM	`NVIDIA_API_KEY`
OpenRouter	`OPENROUTER_API_KEY`	free models
Google Gemini	`GEMINI_API_KEY`
GitHub Models	`GITHUB_TOKEN`	any PAT
Cloudflare	`CLOUDFLARE_API_TOKEN` + `CLOUDFLARE_ACCOUNT_ID`
Mistral, Cohere, SambaNova, Z.ai, Ollama Cloud, LongCat	see `.env.example`

A config.toml (see config.toml.example) can hold keys, model aliases, and settings instead of env vars.
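Because keys come from the environment, deciding which providers are usable reduces to checking that every variable a provider needs is set. This sketch encodes part of the provider-keys table. The lowercase provider ids are illustrative. LLM7 (optional key) and the providers listed only in .env.example are left out, and the sketch ignores config.toml as a second source of keys.

```python
import os
from collections.abc import Mapping

# Part of the provider-keys table; the lowercase ids are illustrative.
REQUIRED_ENV: dict[str, tuple[str, ...]] = {
    "pollinations": (),  # no key needed
    "ovh": (),           # anonymous tier
    "groq": ("GROQ_API_KEY",),
    "cerebras": ("CEREBRAS_API_KEY",),
    "nvidia": ("NVIDIA_API_KEY",),
    "openrouter": ("OPENROUTER_API_KEY",),
    "gemini": ("GEMINI_API_KEY",),
    "github": ("GITHUB_TOKEN",),
    "cloudflare": ("CLOUDFLARE_API_TOKEN", "CLOUDFLARE_ACCOUNT_ID"),  # needs both
}


def configured_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Providers whose required variables are all set to non-empty values."""
    return [
        name for name, needed in REQUIRED_ENV.items()
        if all(env.get(var) for var in needed)
    ]
```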

How routing works

For each request, freellmpool builds the list of (provider, model) pairs you have access to, orders them least-used-first (so load spreads across tiers), and tries them in order until one returns a non-empty result. A provider that returns a 429 is set aside for a cooldown window. Daily counts are kept in ~/.config/freellmpool/quota.json and reset at UTC midnight.
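The routing loop described above can be sketched in memory as follows. It covers least-used-first ordering, a cooldown after a 429, failover until a non-empty reply, and daily counts keyed by the UTC date. RateLimited, the 60-second cooldown, and the Router class are hypothetical. The real counters are persisted to ~/.config/freellmpool/quota.json rather than held in a dict.

```python
import time
from collections.abc import Callable
from datetime import datetime, timezone

COOLDOWN_SECONDS = 60.0  # hypothetical cooldown window


class RateLimited(Exception):
    """Stand-in for a provider answering HTTP 429."""


class AllProvidersFailed(Exception):
    """Every candidate failed, was cooling down, or returned an empty reply."""


def utc_day() -> str:
    return datetime.now(timezone.utc).date().isoformat()


class Router:
    def __init__(
        self,
        calls: dict[tuple[str, str], Callable[[str], str]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.calls = calls  # (provider, model) -> function(prompt) -> reply text
        self.clock = clock
        self.day = utc_day()
        self.used: dict[str, int] = {}  # requests sent per provider today
        self.cooldown_until: dict[str, float] = {}

    def ask(self, prompt: str) -> tuple[str, str]:
        today = utc_day()
        if today != self.day:  # counters reset at UTC midnight
            self.day, self.used = today, {}
        now = self.clock()
        candidates = [
            pair for pair in self.calls
            if self.cooldown_until.get(pair[0], 0.0) <= now  # skip cooling-down providers
        ]
        # Least-used first; sort is stable, so ties keep their original order
        candidates.sort(key=lambda pair: self.used.get(pair[0], 0))
        for provider, model in candidates:
            self.used[provider] = self.used.get(provider, 0) + 1  # count the attempt
            try:
                text = self.calls[(provider, model)](prompt)
            except RateLimited:
                self.cooldown_until[provider] = now + COOLDOWN_SECONDS
                continue
            except Exception:
                continue  # provider down or erroring: fail over to the next
            if text.strip():
                return text, f"{provider}/{model}"
        raise AllProvidersFailed(prompt)
```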

Every call records latency and success per provider. A provider that is currently failing sinks to the back automatically; with FREELLMPOOL_ROUTING=fast the fastest measured provider goes first instead. freellmpool benchmark warms these metrics on demand.

Context windows. Free models often have small context windows. freellmpool never truncates your input; instead, when a model rejects a request as too long, it learns that model's limit and stops routing oversized requests there, escalating only to larger-window models. If nothing fits it raises a clear ContextWindowExceeded (with the estimated input size) instead of a generic failure — over the proxy that's a 413. You can declare a model's window with context = N in providers.toml to skip it proactively.

Architecture notes: docs/ARCHITECTURE.md.

Limitations

Free-tier models are smaller than frontier models. They're good for drafting, summarizing, classification, triage, and everyday coding — not a replacement for GPT-class reasoning on hard problems.
Quality and capacity vary through the day as high-cap tiers exhaust; limits reset at UTC midnight.
Free tiers change without notice. When a model id or limit goes stale, a one-line PR to providers.toml fixes it for everyone.
The proxy is meant for local/single-user use. It binds to 127.0.0.1 by default; if you expose it, set a key (--api-key).
The Claude Code / Anthropic path is experimental (text and tool use; no vision).
These are free tiers shared by everyone — don't abuse them.

How it compares

Tool	What it is	Install	Keyless start	CLI / library / proxy / MCP
freellmpool	Pools many providers' free tiers	`pip install`	Yes (2 providers)	All four
OpenRouter	Hosted paid aggregator (some free models)	API key	No	API only
LiteLLM	Multi-provider SDK/proxy (bring your own keys)	`pip install`	No	Library + proxy
Self-hosted free-API servers	A server you deploy	Docker + config	No	Server only

freellmpool's niche is the keyless, pip-installable client for squeezing the hosted free tiers — not a server you deploy, and not a paid aggregator.

FAQ

Is there a free, OpenAI-compatible LLM API gateway? Yes. freellmpool is a free, MIT-licensed gateway that exposes one OpenAI-compatible endpoint backed by the free tiers of 16 providers: run pip install freellmpool, start freellmpool proxy, and point any OpenAI client at the local proxy.

How do I use multiple free LLM APIs at once? freellmpool pools them: each request goes to a provider you have access to, fails over to the next when one is rate-limited or down, and tracks per-day usage so load spreads across tiers.

Can I run Claude Code or Codex on free models? Yes — the proxy speaks both the OpenAI and Anthropic APIs. Set OPENAI_BASE_URL or ANTHROPIC_BASE_URL to the proxy and run Codex, Claude Code, aider, Cline, Continue, or Cursor unchanged. See freellmpool code <agent>. (Claude Code path is experimental: text + tools, no vision.)

Do I need an API key? No — Pollinations and OVHcloud work with no key, so a fresh install answers immediately. Add free keys for the other providers for more models and higher limits.

Is it free and open source? Yes, MIT-licensed. More at the project page.

Contributing

New providers and fixes to stale limits are the most useful contributions, and both are usually a small change to providers.toml. See CONTRIBUTING.md. Tests run with no network access:

pip install -e ".[dev]" && pytest && ruff check src tests

License

MIT

Project details


Release history

0.11.2 (Jun 11, 2026)
0.11.1 (Jun 11, 2026)
0.11.0 (Jun 6, 2026), this version
0.10.1 (Jun 3, 2026)
0.9.3 (Jun 3, 2026)
0.9.2 (Jun 3, 2026)
0.9.1 (Jun 3, 2026)
0.9.0 (Jun 3, 2026)
0.8.1 (Jun 3, 2026)
0.8.0 (Jun 3, 2026)
0.7.0 (Jun 3, 2026)
0.6.0 (Jun 3, 2026)
0.5.0 (Jun 3, 2026)
0.4.0 (Jun 3, 2026)
0.3.0 (Jun 3, 2026)

Download files


Source Distribution

freellmpool-0.11.0.tar.gz (152.1 kB)

Uploaded Jun 6, 2026 Source

Built Distribution


freellmpool-0.11.0-py3-none-any.whl (84.4 kB)

Uploaded Jun 6, 2026 Python 3

File details

Details for the file freellmpool-0.11.0.tar.gz.

File metadata

Download URL: freellmpool-0.11.0.tar.gz
Upload date: Jun 6, 2026
Size: 152.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for freellmpool-0.11.0.tar.gz
Algorithm	Hash digest
SHA256	`27c191fb4219f84fd3991cc18101060de99b547d3907eacdcbcf6c5eaed30225`
MD5	`e3d163b2699c876d45e4ab4d1e729705`
BLAKE2b-256	`2a9f87cc6c632e86603b355f7b9077c9829cae7acf8dab28142305fd18a0352b`


File details

Details for the file freellmpool-0.11.0-py3-none-any.whl.

File metadata

Download URL: freellmpool-0.11.0-py3-none-any.whl
Upload date: Jun 6, 2026
Size: 84.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for freellmpool-0.11.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`253ef27305a42173fe01b19f7368b7b2a938e60e089ef9b977f7778eae2c3d80`
MD5	`a1efafcfabe4f490b70081ba6277c88a`
BLAKE2b-256	`378f50583894814ea8d635588c1ff042a46d6d50e33ed76c63d572d0dd27e422`

