One always-up LLM client over free-tier providers (OpenRouter, Google AI Studio, NVIDIA NIM) with auto key-rotation, failover, circuit breaking and quota-aware routing.

freelm

One always-up LLM client over free-tier providers. Drop in your OpenRouter, Google AI Studio, and/or NVIDIA NIM keys, and freelm gives you a single chat call that auto-rotates keys, fails over across providers, paces itself to each tier's limits, and trips circuit breakers on dead keys — so your app keeps talking to an LLM even when one source rate-limits or dies.

Python first. JS/TS and Go ports planned (the core is spec-driven for portability).

Why

LLMs show up in nearly every project, and they cost money — but there's a lot of free capacity scattered across providers:

OpenRouter: free models (the :free variants), about 50 requests/day with less than $10 in purchased credits, about 1000/day with $10 or more.
Google AI Studio (Gemini): generous free tier; Tier 1 (billing enabled) raises the rate limits substantially.
NVIDIA NIM (build.nvidia.com): many models are free to call against build credits.

freelm pools them behind one fault-tolerant client.

Install

pip install freelm

Quick start

import freelm

llm = freelm.FreeLLM.from_env()          # reads keys from environment
print(llm.text("Explain black holes in one sentence."))

Explicit config:

from freelm import FreeLLM, OpenRouter, GoogleAIStudio, NIM

llm = FreeLLM(
    providers=[
        OpenRouter("sk-or-...", tier="free"),       # or tier="credit" if ≥ $10
        GoogleAIStudio("AIza...", tier="free"),      # or tier="tier1"
        NIM("nvapi-..."),
    ],
    strategy="quota_aware",   # priority | round_robin | quota_aware | latency
)

resp = llm.chat(
    [{"role": "user", "content": "Write a haiku about failover."}],
    model="chat:fast",        # virtual model, see below
)
print(resp.text, "via", resp.provider)

Async is symmetric:

import asyncio
from freelm import AsyncFreeLLM

async def main():
    async with AsyncFreeLLM.from_env() as llm:   # must run inside a coroutine
        print(await llm.text("hi"))

asyncio.run(main())

Drop-in OpenAI shim

# from openai import OpenAI
from freelm.compat import OpenAI

client = OpenAI()                          # backed by FreeLLM.from_env()
r = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "hi"}],
)
print(r.choices[0].message.content)

Environment variables

Provider	Key vars (first match wins)	Tier var
OpenRouter	`OPENROUTER_API_KEY` / `FREELM_OPENROUTER_KEYS`	`FREELM_OPENROUTER_TIER` (`free` or `credit`)
Google AI Studio	`GEMINI_API_KEY` / `GOOGLE_API_KEY` / `GOOGLE_AI_STUDIO_KEY` / `FREELM_GOOGLE_KEYS`	`FREELM_GOOGLE_TIER` (`free` or `tier1`)
NVIDIA NIM	`NVIDIA_API_KEY` / `NIM_API_KEY` / `FREELM_NIM_KEYS`	`FREELM_NIM_TIER` (`free`)

Multiple keys per provider: comma-separate them.
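
As an illustration of the table above, here is a minimal sketch of how "first match wins" and comma-separated keys could be combined. It is an assumption about the described behavior, not freelm's actual code; `resolve_keys` and `OPENROUTER_VARS` are hypothetical names, and the keys are placeholders.

```python
import os
from typing import Mapping, Sequence

# Variable names copied from the table above, in the order listed there.
OPENROUTER_VARS = ("OPENROUTER_API_KEY", "FREELM_OPENROUTER_KEYS")

def resolve_keys(names: Sequence[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the keys from the first variable that is set and non-empty (hypothetical helper)."""
    for name in names:
        raw = env.get(name, "")
        # Split on commas and drop surrounding whitespace and empty entries.
        keys = [k.strip() for k in raw.split(",") if k.strip()]
        if keys:
            return keys
    return []

demo_env = {"FREELM_OPENROUTER_KEYS": "sk-or-aaa, sk-or-bbb"}
print(resolve_keys(OPENROUTER_VARS, demo_env))   # ['sk-or-aaa', 'sk-or-bbb']
```

Passing the environment as a mapping keeps the helper easy to test without touching the real process environment.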

Virtual models

Names differ per provider, so ask by intent and freelm maps to a concrete model:

Alias	Meaning
`auto` / `chat`	any available chat model (registry order)
`chat:large` / `large`	a larger/stronger model
`chat:fast` / `fast`	a fast/cheap model
`chat:small` / `small`	smallest model
`vendor/model-id`	passthrough — use exactly this model

Override the table per provider with models=[ModelSpec(...)].
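
To make the alias table concrete, the following sketch shows one way a virtual name could be mapped to a concrete model through tags. Everything here is illustrative: `Spec` is a hypothetical stand-in (it is not freelm's `ModelSpec`, whose fields are not documented above), and the model IDs are placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Spec:                      # hypothetical stand-in, not freelm's ModelSpec
    id: str
    tags: frozenset

# Hypothetical registry; the IDs are placeholders, not real models.
REGISTRY = [
    Spec("vendor-a/big-model", frozenset({"large"})),
    Spec("vendor-b/quick-model", frozenset({"fast"})),
    Spec("vendor-c/tiny-model", frozenset({"small", "fast"})),
]

def resolve(alias: str, registry=REGISTRY) -> str:
    """Map a virtual model name to a concrete model ID."""
    if "/" in alias:                     # vendor/model-id: passthrough
        return alias
    if alias in ("auto", "chat"):        # any chat model, in registry order
        return registry[0].id
    tag = alias.split(":", 1)[-1]        # "chat:fast" and "fast" both yield "fast"
    for spec in registry:
        if tag in spec.tags:
            return spec.id
    raise LookupError(f"no model tagged {tag!r}")

print(resolve("chat:fast"))              # vendor-b/quick-model
```

Checking for a slash first means a full ID that also contains a colon is still passed through untouched.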

Dynamic model discovery

Free model IDs churn constantly, so freelm doesn't trust its hardcoded list. For OpenRouter (on by default), it queries GET /models on first use, derives tags (large/fast/small, plus tools/vision/reasoning from supported_parameters), and caches the list to disk.

Resolution order: live API → disk cache → hardcoded fallback (so it still works offline / key-less).

from freelm import list_free_models

for m in list_free_models()[:5]:        # live OpenRouter free models, cached
    print(m.id, m.tags, m.ctx)

Control it:

OpenRouter("sk-or-...", discover=True, discover_free_only=True, cache_ttl=3600)
GoogleAIStudio("AIza...", discover=True)   # opt-in for other providers' /models

llm.refresh_models()                        # force re-fetch on next call

Env var	Default	Meaning
`FREELM_CACHE_DIR`	`~/.cache/freelm`	where the model cache lives (file is `0600`)
`FREELM_CACHE_TTL`	`3600`	cache lifetime in seconds
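
The resolution order described in this section (live API, then disk cache, then hardcoded fallback) can be sketched roughly as follows. This is not freelm's implementation: `load_models`, `_read_cache`, the cache file layout, and the fallback IDs are all hypothetical. The sketch also assumes that a cache younger than the TTL is served without a network call, which the text above implies but does not state.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional

FALLBACK = ["hardcoded/model-a", "hardcoded/model-b"]   # placeholder IDs

def _read_cache(path: Path) -> Optional[dict]:
    """Return the cached payload, or None if the file is missing or unreadable."""
    try:
        return json.loads(path.read_text())
    except (OSError, ValueError):
        return None

def load_models(fetch: Callable[[], list], cache_file: Path, ttl: float = 3600) -> list:
    """Fresh cache if available, else live fetch, else stale cache, else fallback."""
    cached = _read_cache(cache_file)
    if cached is not None and time.time() - cached["at"] < ttl:
        return cached["models"]               # assumed: fresh cache skips the network
    try:
        models = fetch()                      # e.g. a GET /models request
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(json.dumps({"at": time.time(), "models": models}))
        cache_file.chmod(0o600)               # owner-only, matching the 0600 note above
        return models
    except Exception:
        # Live fetch failed: a stale cache still beats the hardcoded list.
        return cached["models"] if cached is not None else FALLBACK
```

Injecting `fetch` as a callable keeps the sketch independent of any HTTP library and makes the offline path easy to exercise.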

How "always-up" works

Key pool per provider, round-robined to spread load.
Failover chain: key → next key → next provider until one succeeds.
Circuit breaker per key: opens after repeated failures, half-opens after a cooldown — no hammering a dead key.
Retry classification: 429 → cool the key & rotate; 5xx/timeout → breaker + backoff; 401/403 → disable the key; 4xx model errors → try another model/provider; other 4xx → surfaced as a caller bug.
Quota guard: per-key requests/minute (token bucket) + requests/day counter, so a key predicted to be exhausted is skipped before you waste a call.
wait=True (optional): briefly sleep until a key frees up instead of failing, bounded by max_wait.
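
Two of the mechanisms in the list above, retry classification and the per-key circuit breaker, can be sketched as below. This is a simplified illustration, not freelm's internals: the action names, the `model_error` flag, and the threshold and cooldown values are assumptions, and a timeout (which has no status code) would be treated like a 5xx.

```python
import time

def classify(status: int, model_error: bool = False) -> str:
    """Map an HTTP status to an action, following the retry rules listed above."""
    if status == 429:
        return "cool_key_and_rotate"
    if status >= 500:
        return "breaker_and_backoff"
    if status in (401, 403):
        return "disable_key"
    if 400 <= status < 500:
        return "try_other_model" if model_error else "raise_to_caller"
    return "ok"

class CircuitBreaker:
    """Closed, then open after `threshold` consecutive failures, then half-open after `cooldown` seconds."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0, clock=time.monotonic):
        self.threshold, self.cooldown, self.clock = threshold, cooldown, clock
        self.failures = 0
        self.opened_at = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown:
            return "half_open"
        return "open"

    def allow(self) -> bool:
        return self.state != "open"      # half-open lets one probe request through

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold or self.state == "half_open":
                self.opened_at = self.clock()   # (re)open and restart the cooldown
```

A caller would check `allow()` before using a key and call `record()` with the outcome; the injectable `clock` exists only so the timing can be tested without sleeping.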

Inspect live state any time:

for row in llm.health():
    print(row)   # provider, key (masked), ready, breaker, rpd_used, last_error, latency
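
The quota guard mentioned in the list above (a requests/minute token bucket plus a requests/day counter) could look roughly like this. `QuotaGuard` and its method names are hypothetical, the limits are placeholders rather than real tier numbers, and the daily reset is left out for brevity.

```python
import time

class QuotaGuard:
    """Per-key requests/minute token bucket plus a requests/day counter."""

    def __init__(self, rpm: int, rpd: int, clock=time.monotonic):
        self.rpm, self.rpd, self.clock = rpm, rpd, clock
        self.tokens = float(rpm)          # start with a full bucket
        self.last = clock()
        self.used_today = 0               # a real client would reset this daily

    def _refill(self) -> None:
        # Add tokens in proportion to elapsed time, capped at the bucket size.
        now = self.clock()
        self.tokens = min(self.rpm, self.tokens + (now - self.last) * self.rpm / 60.0)
        self.last = now

    def try_acquire(self) -> bool:
        """Consume one request if both limits allow it; otherwise consume nothing."""
        self._refill()
        if self.used_today >= self.rpd or self.tokens < 1.0:
            return False
        self.tokens -= 1.0
        self.used_today += 1
        return True

guard = QuotaGuard(rpm=2, rpd=3)          # placeholder limits
print(guard.try_acquire(), guard.try_acquire())
```

A router would skip any key whose guard returns `False` and move on to the next key or provider, which is how a call to an exhausted key is avoided.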

Roadmap

v1.1 — streaming (SSE normalization across providers)
v1.2 — persistent quota tracking (sqlite/json) + tighter tier pacing
v1.3 — tool / function-calling normalization
v2 — embeddings, vision; JS/TS and Go ports

License

MIT © Shahriar Labs

Free-tier model lists change often — freelm discovers OpenRouter models live and caches them, so you rarely touch the hardcoded list. Tier rate-limit numbers are still heuristic defaults; override rpm/rpd/tier as providers evolve.

Release history

0.2.2 (Jun 7, 2026)
0.2.1 (Jun 7, 2026)
0.2.0 (Jun 7, 2026)
0.1.1 (Jun 7, 2026)
0.1.0 (Jun 7, 2026), this version

