Skip to main content

Ollama for free cloud inference. Local OpenAI-compatible gateway routing across OpenRouter, Groq, NVIDIA NIM, Cloudflare Workers AI, HuggingFace, Cerebras, and your own Ollama with automatic failover.

Project description

FreeRide

The OpenAI-compatible gateway for every free-tier provider.

freeride-now-supports-claudecode

One local endpoint that fans out across OpenRouter, Groq, NVIDIA NIM, HuggingFace, Cerebras, Cloudflare Workers AI, and your own Ollama. Hit a rate limit, fail over to the next provider. Your agent never knows.

Also wraps Claude Code, OpenAI Codex, and Google Gemini CLI — run all three without paying any of their vendors.

102M+ tokens served in 35 days. $0 spent. Routed through community free-tier keys via this gateway. Daily traffic: free-ride.xyz/models

curl -sSL https://api.free-ride.xyz/install.sh | sh
freeride run claude    # or: freeride run codex / freeride run gemini

That's it. No accounts, no subscriptions, no FreeRide cloud. Local-first, BYO keys, your machine talks to providers directly.


Install

macOS / Linux:

curl -sSL https://api.free-ride.xyz/install.sh | sh

Windows (PowerShell):

powershell -ExecutionPolicy ByPass -c "irm https://api.free-ride.xyz/install.ps1 | iex"

The installer picks up uv, then pipx, then plain pip — whichever is on your system. Or install from source.

freeride init           # interactive — collects keys, writes ~/.freeride/.env
freeride serve          # gateway listens on localhost:11343

Get keys (any one is enough; more = better failover):

Provider Free tier Get a key
OpenRouter rotating free models openrouter.ai/keys
Groq daily token cap console.groq.com/keys
NVIDIA NIM credits per account build.nvidia.com
HuggingFace $0.10/mo Free, $2/mo PRO huggingface.co/settings/tokens
Cerebras RPM / TPM caps cloud.cerebras.ai
Cloudflare Workers AI 10K neurons/day dash.cloudflare.com
Ollama (local) no quota install from ollama.com

Run a coding agent — free

Three of the major coding CLIs ship a freeride run wrapper that works with no per-vendor key and no login. The gateway translates between each CLI's native wire protocol and our routing layer; you get the polished agent UX of each CLI, paid for entirely by free-tier providers.

Claude Code

freeride run claude

Inside the session, switch routing per request via /model:

You type What happens
/model claude-opus-4-7 Your Pro/Max subscription answers (passthrough to api.anthropic.com) — only if claude login has run
/model freeride/free Free providers answer; smart-router picks the model
/model freeride/fast Free; prefers Groq (low TTFT)
/model freeride/quality Free; prefers OpenRouter (widest catalog)
/model freeride/coding Free; pinned to a code-tuned model that reliably emits tool_use blocks

Full guide: docs/agents/claude-code.md.

OpenAI Codex

freeride run codex

Whatever model the CLI picks (gpt-5-codex, gpt-5, etc.) is routed to a free upstream provider. The gateway translates the Responses-API wire format (with full SSE event protocol — response.output_item.addedoutput_text.deltaoutput_item.doneresponse.completed) so the CLI parses everything natively.

Note: codex uses bubblewrap for shell-tool sandboxing; on systems without it, file/shell tool calls fail (the model still works). Full guide: docs/agents/codex.md.

Google Gemini CLI

freeride run gemini

Any gemini-* model name routes to a free upstream provider. Translator handles Google's {contents, tools, generationConfig} shape both directions. Full guide: docs/agents/gemini.md.

Any other agent / SDK

# Aider / Continue.dev / hermes / your-own-tool — anything that speaks OpenAI:
freeride bind aider
freeride bind continue
# or just point it at the gateway directly:
OPENAI_API_BASE=http://localhost:11343/v1
OPENAI_API_KEY=any-string-here

How failover works

Per-request the chain is (provider, key), sorted by recent health:

  1. Try the head pair.
  2. RATE_LIMIT or AUTH error → mark the key as cooling, try the next key on the same provider.
  3. MODEL_NOT_FOUND or QUOTA_EXHAUSTED → skip to the next provider.
  4. 5xx / TIMEOUT → next pair.
  5. First successful response — stamp X-FreeRide-Provider + X-FreeRide-Request-Id headers and ship.

If every pair fails, you get a structured 503 with a per-provider breakdown so debugging is one log line, not five round-trips. Mid-stream errors after the first chunk shipped are logged but don't break the client (we can't un-ship bytes).

Smart routing for model: "auto": the resolver scores every free model in the catalog by health × popularity (from the public models leaderboard) and picks the best one. Run freeride audit-models once after install to cache health probes locally so the first real request isn't a cold start.

Deeper: docs/architecture/failover.md.


Providers

Provider Surface Notes
OpenRouter chat, streaming, tools, vision, structured outputs, embeddings full surface — the most-used provider in our routing
NVIDIA NIM chat + embeddings curated free-model allowlist; NVIDIA_NIM_FREE_MODELS_OVERRIDE to expand
Groq chat Llama 3.x, Gemma 2, Mixtral, DeepSeek-R1-distill; daily token cap
Cloudflare Workers AI chat cheap-per-neuron models; needs CLOUDFLARE_ACCOUNT_ID
HuggingFace Inference chat + embeddings full HF router catalog; budget governs access
Cerebras chat fastest Llama / Qwen inference; no embeddings
Ollama (local) chat local-only; can mix with remote in the same failover chain

Adding a new provider: implement freeride.core.provider.Provider in freeride/providers/<name>.py, register it in the conformance suite. See CONTRIBUTING.md.


Multi-key rotation

Provide more than one key per provider with a numbered suffix:

OPENROUTER_API_KEY=sk-or-v1-aaa     # primary
OPENROUTER_API_KEY_2=sk-or-v1-bbb
OPENROUTER_API_KEY_3=sk-or-v1-ccc

The router tries them in health order. A 429 on one key cools it for the next 60s and rotates to the sibling key — no provider switch needed. On startup freeride keys shows which keys are available vs cooling.


See what the gateway is doing

freeride doctor                # static checks: keys, ports, /etc/hosts, common gotchas
freeride doctor --claude-code  # the same + Claude-Code-specific probes
freeride audit-models          # probe every free model on every key; cache the results
freeride bench                 # measure p50/p95/tok-s per provider

Tail live events:

tail -f ~/.freeride/events.jsonl

Each line is a JSON event: routing decisions, provider attempts, response statuses, mid-stream errors. Same schema the marketing site reads to render the live token counter and provider leaderboard.


Telemetry

A small beacon ships hourly with counts only: tokens served, request count, active providers, uptime hours, OS, version, and a per-install UUID. Never sent: prompts, completions, model IDs, API keys, hostname, IP.

freeride telemetry        # audit what the next beacon would post
freeride telemetry off    # opt out

The aggregate is what powers free-ride.xyz/models. Default on; explicit disclosure banner prints on first run.


Commands

freeride init           interactive setup wizard — prompts for keys, writes ~/.freeride/.env
freeride serve          start the gateway on :11343
freeride run <cli>      wrap a CLI (claude / codex / gemini / anything) — points it at the gateway
freeride bind <agent>   write the agent's config so it uses the gateway permanently
freeride doctor         pre-flight checks: keys, ports, hosts file, common gotchas
freeride keys           which provider keys are available vs cooling
freeride audit-models   probe every free model; cache health locally
freeride bench          measure p50/p95/tok-s per provider
freeride list           list available free models
freeride telemetry      manage the hourly aggregate beacon

Docs


License

MIT.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

freeride_gateway-0.4.0a20.tar.gz (361.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

freeride_gateway-0.4.0a20-py3-none-any.whl (206.6 kB view details)

Uploaded Python 3

File details

Details for the file freeride_gateway-0.4.0a20.tar.gz.

File metadata

  • Download URL: freeride_gateway-0.4.0a20.tar.gz
  • Upload date:
  • Size: 361.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for freeride_gateway-0.4.0a20.tar.gz
Algorithm Hash digest
SHA256 9bc6bf00a4a0f51c1cd4e69c522f984420cbbca15dcaa7e55ec18c5be4a6207c
MD5 37118de47a6167ed8e79dfea1f6edbaa
BLAKE2b-256 908679a9963a0b9e911a9eecce774135b911b6bd8131ba5f745c5c8f3e740f42

See more details on using hashes here.

Provenance

The following attestation bundles were made for freeride_gateway-0.4.0a20.tar.gz:

Publisher: release.yml on Shaivpidadi/FreeRideV3

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file freeride_gateway-0.4.0a20-py3-none-any.whl.

File metadata

File hashes

Hashes for freeride_gateway-0.4.0a20-py3-none-any.whl
Algorithm Hash digest
SHA256 69f5fbcaf0536908be3ed378617aa1f127a0652b51e9f05b8fe08d14536d8a86
MD5 64cc31496a337d8cda3c867fba7d3454
BLAKE2b-256 83bf9921c32c73d481239e594a26f6118d0ca2ce5c2344628467a08d5730c02a

See more details on using hashes here.

Provenance

The following attestation bundles were made for freeride_gateway-0.4.0a20-py3-none-any.whl:

Publisher: release.yml on Shaivpidadi/FreeRideV3

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page