One always-up LLM client over free-tier providers (OpenRouter, Google AI Studio, NVIDIA NIM) with auto key-rotation, failover, circuit breaking and quota-aware routing.
freelm
One always-up LLM client over free-tier providers. Drop in your OpenRouter, Google AI Studio, and/or NVIDIA NIM keys, and freelm gives you a single chat call that auto-rotates keys, fails over across providers, paces itself to each tier's limits, and trips circuit breakers on dead keys — so your app keeps talking to an LLM even when one source rate-limits or dies.
Python first. JS/TS and Go ports planned (the core is spec-driven for portability).
Why
LLMs show up in nearly every project, and they cost money — but there's a lot of free capacity scattered across providers:
- OpenRouter: free models (:free), ~50 req/day under $10 credit, ~1000/day at ≥$10.
- Google AI Studio (Gemini): generous free tier; Tier 1 (billing on) raises the limits substantially.
- NVIDIA NIM (build.nvidia.com): many models free against build credits.
freelm pools them behind one fault-tolerant client.
Install
pip install freelm
Quick start
import freelm
llm = freelm.FreeLLM.from_env() # reads keys from environment
print(llm.text("Explain black holes in one sentence."))
Explicit config:
from freelm import FreeLLM, OpenRouter, GoogleAIStudio, NIM
llm = FreeLLM(
    providers=[
        OpenRouter("sk-or-...", tier="free"),      # or tier="credit" if ≥ $10
        GoogleAIStudio("AIza...", tier="free"),    # or tier="tier1"
        NIM("nvapi-..."),
    ],
    strategy="quota_aware",  # priority | round_robin | quota_aware | latency
)
resp = llm.chat(
    [{"role": "user", "content": "Write a haiku about failover."}],
    model="chat:fast",  # virtual model, see below
)
print(resp.text, "via", resp.provider)
Async is symmetric:
import asyncio
from freelm import AsyncFreeLLM

async def main():
    # async with must run inside a coroutine
    async with AsyncFreeLLM.from_env() as llm:
        print(await llm.text("hi"))

asyncio.run(main())
Drop-in OpenAI shim
# from openai import OpenAI
from freelm.compat import OpenAI
client = OpenAI() # backed by FreeLLM.from_env()
r = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "hi"}],
)
print(r.choices[0].message.content)
Environment variables
| Provider | Key vars (first match wins) | Tier var |
|---|---|---|
| OpenRouter | OPENROUTER_API_KEY / FREELM_OPENROUTER_KEYS | FREELM_OPENROUTER_TIER (free or credit) |
| Google AI Studio | GEMINI_API_KEY / GOOGLE_API_KEY / GOOGLE_AI_STUDIO_KEY / FREELM_GOOGLE_KEYS | FREELM_GOOGLE_TIER (free or tier1) |
| NVIDIA NIM | NVIDIA_API_KEY / NIM_API_KEY / FREELM_NIM_KEYS | FREELM_NIM_TIER (free) |
Multiple keys per provider: comma-separate them.
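As a rough illustration of the "first match wins" rule combined with comma-separated keys, the lookup could be sketched as below. This is not freelm's actual loader: the `KEY_VARS` mapping, the short provider labels, and the `load_keys` helper are hypothetical names used only for this example; the variable names themselves come from the table above.

```python
import os

# Candidate variables per provider, in lookup order (copied from the table above).
# The dictionary keys ("openrouter", "google", "nim") are hypothetical labels.
KEY_VARS = {
    "openrouter": ["OPENROUTER_API_KEY", "FREELM_OPENROUTER_KEYS"],
    "google": ["GEMINI_API_KEY", "GOOGLE_API_KEY",
               "GOOGLE_AI_STUDIO_KEY", "FREELM_GOOGLE_KEYS"],
    "nim": ["NVIDIA_API_KEY", "NIM_API_KEY", "FREELM_NIM_KEYS"],
}


def load_keys(provider, env=None):
    """Return the keys from the first variable that is set and non-empty."""
    env = os.environ if env is None else env
    for name in KEY_VARS[provider]:
        raw = env.get(name, "")
        # Split on commas and drop blanks so "a, b ," yields ["a", "b"].
        keys = [k.strip() for k in raw.split(",") if k.strip()]
        if keys:
            return keys
    return []
```

Passing a plain dict as `env` keeps the helper easy to test without touching the real environment.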
Virtual models
Names differ per provider, so ask by intent and freelm maps to a concrete model:
| Alias | Meaning |
|---|---|
| auto / chat | any available chat model (registry order) |
| chat:large / large | a larger/stronger model |
| chat:fast / fast | a fast/cheap model |
| chat:small / small | smallest model |
| vendor/model-id | passthrough: use exactly this model |
Override the table per provider with models=[ModelSpec(...)].
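To make the alias table concrete, here is a minimal sketch of how intent-based names might resolve against a per-provider registry. It is an assumption about the mechanism, not freelm's code: `Model` is a hypothetical stand-in for `ModelSpec`, and `ALIASES` and `resolve` are names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical stand-in for a registry entry (not freelm's ModelSpec)."""
    id: str
    tags: set = field(default_factory=set)


# Alias -> tag, following the table above.
ALIASES = {
    "chat:large": "large", "large": "large",
    "chat:fast": "fast", "fast": "fast",
    "chat:small": "small", "small": "small",
}


def resolve(name, registry):
    """Map a virtual model name to a concrete model id, or None if no match."""
    if name in ("auto", "chat"):
        # Any available chat model, in registry order.
        return registry[0].id if registry else None
    if name in ALIASES:
        tag = ALIASES[name]
        for model in registry:
            if tag in model.tags:
                return model.id
        return None
    # Anything else is treated as a concrete id and passed through unchanged.
    return name
```

Returning `None` when a provider has no model with the requested tag lets a caller move on to the next provider instead of failing outright.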
Dynamic model discovery
Free model IDs churn constantly, so freelm doesn't trust its hardcoded list. For OpenRouter (on by default), it queries GET /models on first use, derives tags (large/fast/small, plus tools/vision/reasoning from supported_parameters), and caches the list to disk.
Resolution order: live API → disk cache → hardcoded fallback (so it still works offline / key-less).
from freelm import list_free_models
for m in list_free_models()[:5]: # live OpenRouter free models, cached
    print(m.id, m.tags, m.ctx)
Control it:
OpenRouter("sk-or-...", discover=True, discover_free_only=True, cache_ttl=3600)
GoogleAIStudio("AIza...", discover=True) # opt-in for other providers' /models
llm.refresh_models() # force re-fetch on next call
| Env var | Default | Meaning |
|---|---|---|
| FREELM_CACHE_DIR | ~/.cache/freelm | where the model cache lives (file is 0600) |
| FREELM_CACHE_TTL | 3600 | cache lifetime in seconds |
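The resolution order described above (live API, then disk cache, then hardcoded fallback) could look roughly like the sketch below. The `load_models` function, its parameters, and the JSON cache layout are assumptions made for illustration. The sketch also assumes that a cache younger than the TTL is served without a network call, which the README implies but does not state.

```python
import json
import os
import time
from pathlib import Path


def load_models(fetch_live, cache_file, fallback, ttl=3600):
    """Return (models, source), trying a fresh cache, live fetch, stale cache, fallback."""
    cache_file = Path(cache_file)
    cached_models, saved_at = None, 0.0
    if cache_file.exists():
        try:
            data = json.loads(cache_file.read_text())
            cached_models, saved_at = data["models"], data["saved_at"]
        except (OSError, ValueError, KeyError, TypeError):
            cached_models = None  # unreadable or malformed cache: ignore it

    # Assumption: a cache younger than `ttl` seconds is used as-is.
    if cached_models is not None and time.time() - saved_at < ttl:
        return cached_models, "cache"

    try:
        models = fetch_live()  # e.g. a GET /models call supplied by the caller
    except Exception:
        if cached_models is not None:
            return cached_models, "stale-cache"
        return list(fallback), "fallback"

    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps({"saved_at": time.time(), "models": models}))
    os.chmod(cache_file, 0o600)  # owner-only, matching the 0600 note in the table
    return models, "live"
```

Serving a stale cache when the live call fails is what keeps the client usable offline; the hardcoded list is only reached when there is no cache at all.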
How "always-up" works
- Key pool per provider, round-robined to spread load.
- Failover chain: key → next key → next provider until one succeeds.
- Circuit breaker per key: opens after repeated failures, half-opens after a cooldown — no hammering a dead key.
- Retry classification: 429 → cool the key & rotate; 5xx/timeout → breaker + backoff; 401/403 → disable the key; 4xx model errors → try another model/provider; other 4xx → surfaced as a caller bug.
- Quota guard: per-key requests/minute (token bucket) + requests/day counter, so a key predicted to be exhausted is skipped before you waste a call.
- wait=True (optional): briefly sleep until a key frees up instead of failing, bounded by max_wait.
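Two of the mechanisms in the list above, retry classification and the per-key circuit breaker, can be sketched in a few lines. This is an illustrative model only: the `classify` function, its action strings, the `CircuitBreaker` class, and the default threshold and cooldown are hypothetical, not freelm's internals.

```python
import time


def classify(status, model_error=False):
    """Map an HTTP status (None means timeout) to an action from the list above."""
    if status is None or status >= 500:
        return "breaker_backoff"
    if status == 429:
        return "cool_and_rotate"
    if status in (401, 403):
        return "disable_key"
    if 400 <= status < 500:
        return "try_other_model" if model_error else "raise_to_caller"
    return "ok"


class CircuitBreaker:
    """Opens after `threshold` consecutive failures, half-opens after `cooldown` seconds."""

    def __init__(self, threshold=3, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown:
            return "half-open"  # cooldown elapsed: allow one probe request
        return "open"

    def allow(self):
        return self.state != "open"

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        # A failed probe re-opens immediately; otherwise open at the threshold.
        if self.state == "half-open" or self.failures >= self.threshold:
            self.opened_at = self.clock()
```

The injectable `clock` is a design choice for testability: a fake clock lets the cooldown be exercised without sleeping.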
Inspect live state any time:
for row in llm.health():
    print(row)  # provider, key (masked), ready, breaker, rpd_used, last_error, latency
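The quota guard mentioned earlier (a per-minute token bucket plus a per-day counter) might be modelled as follows. This is a sketch under stated assumptions: the `QuotaGuard` class and its methods are hypothetical, the `rpm`/`rpd` names simply mirror the fields the README says can be overridden, and the daily reset is omitted for brevity.

```python
import time


class QuotaGuard:
    """Per-key pacing: token bucket for requests/minute plus a requests/day cap."""

    def __init__(self, rpm, rpd, clock=time.monotonic):
        self.rpm = rpm
        self.rpd = rpd
        self.clock = clock
        self.tokens = float(rpm)  # start with a full bucket
        self.last = clock()
        self.used_today = 0       # a real tracker would reset this daily

    def _refill(self):
        now = self.clock()
        # Tokens accrue at rpm/60 per second, capped at the bucket size.
        self.tokens = min(self.rpm, self.tokens + (now - self.last) * self.rpm / 60.0)
        self.last = now

    def ready(self):
        """True if a request can be sent now without exceeding either limit."""
        self._refill()
        return self.tokens >= 1 and self.used_today < self.rpd

    def consume(self):
        """Reserve one request; returns False (and changes nothing) if not ready."""
        if not self.ready():
            return False
        self.tokens -= 1
        self.used_today += 1
        return True
```

A router could call `ready()` on each key before dispatching and skip keys that return `False`, which is the "skipped before you waste a call" behaviour described in the list.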
Roadmap
- v1.1 — streaming (SSE normalization across providers)
- v1.2 — persistent quota tracking (sqlite/json) + tighter tier pacing
- v1.3 — tool / function-calling normalization
- v2 — embeddings, vision; JS/TS and Go ports
License
MIT © Shahriar Labs
Free-tier model lists change often, so freelm discovers OpenRouter models live and caches them; you rarely touch the hardcoded list. Tier rate-limit numbers are still heuristic defaults; override rpm/rpd/tier as providers evolve.