SubLLM

Route standard LLM API calls through subscription-authenticated coding agents instead of API keys.

Use your Claude Pro/Max, ChatGPT Plus/Pro, or Google AI Pro/Ultra subscription as the backend for programmatic LLM calls. SubLLM provides an OpenAI-compatible unified interface that abstracts Claude Code, Codex, and Gemini CLIs behind a standard completion() API.

Why?

You already pay for a subscription — Claude Pro/Max, ChatGPT Plus/Pro, Google AI Pro/Ultra — but the moment you need to run a script, you're forced to pay again for API access to the same models. Subscription plans offer flat-rate access but lock you into chat UIs with no programmatic interface. API keys unlock programmatic access but bill per-token on top of what you're already paying.

SubLLM eliminates the double-pay problem. It routes standard completion() calls through CLI tools that authenticate via your existing subscriptions — turning flat-rate chat plans into programmable LLM backends.

  • Flat-rate pricing — pay your subscription, not per-token. Heavy usage costs the same as light usage.
  • Zero API key management — CLIs authenticate through your subscription. No keys to provision, rotate, or secure.
  • OpenAI-compatible interface — standard completion() API with OpenAI ChatCompletion response format. Swap SubLLM in/out of existing code with minimal changes.
  • Cross-provider — same API surface for Claude, GPT, and Gemini. Switch models by changing a string.
| Approach | Cost (heavy usage) | Overhead | Flexibility |
|---|---|---|---|
| Direct API (per-token) | $50-500+/mo | ~0s | Full control |
| SubLLM (subscription) | $0-200/mo flat | ~270ms | Good for batch/dev |
| Direct API + LiteLLM | $50-500+/mo | ~0s | Multi-provider |

Best for: development, prototyping, batch jobs, CI/CD, personal automation, cost-sensitive pipelines.

Not for: real-time chat UIs, latency-sensitive production services, multi-tenant SaaS (ToS constraints), or workloads that need tool use, function calling, prompt caching, logprobs, or stop sequences (these features execute inside the CLI sandboxes and are not exposed through SubLLM).

Quick Start

1. Install

```bash
uv add subllm                # Core (CLI subprocess mode)
uv add "subllm[server]"      # + OpenAI-compatible proxy server
uv add "subllm[sdk]"         # + Claude Agent SDK integration
```

2. Authenticate Your CLIs

Claude Code (subscription auth):

```bash
curl -fsSL https://claude.ai/install.sh | bash
claude login
unset ANTHROPIC_API_KEY          # Force subscription auth
```

For headless/CI:

```bash
claude setup-token
export CLAUDE_CODE_OAUTH_TOKEN="your-token-here"
```

Codex (subscription auth):

```bash
npm install -g @openai/codex
codex login
```

Gemini CLI (free tier or subscription):

```bash
npm install -g @google/gemini-cli
gemini                           # Complete Google login
# Or use API key:
export GEMINI_API_KEY="your-key"
```

3. Use

Python API:

```python
import asyncio
import subllm

async def main():
    # Non-streaming
    response = await subllm.completion(
        model="claude-code/sonnet-4-5",
        messages=[{"role": "user", "content": "Explain monads in one sentence"}],
    )
    print(response.choices[0].message.content)

    # Streaming
    stream = await subllm.completion(
        model="gemini/gemini-3-flash-preview",
        messages=[{"role": "user", "content": "Write a haiku about Rust"}],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content, end="")

asyncio.run(main())
```

CLI:

```bash
subllm auth                                        # Check all providers
subllm models                                      # List available models
subllm complete "What is 2+2?" -m claude-code/sonnet-4-5
subllm complete "Write a haiku" -m gemini/gemini-3-flash-preview --stream
```

OpenAI-compatible proxy:

```bash
subllm serve --port 8080
```

Then use any OpenAI-compatible client:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")
response = client.chat.completions.create(
    model="claude-code/sonnet-4-5",
    messages=[{"role": "user", "content": "hello"}],
)
print(response.choices[0].message.content)
```

Available Models

| Model ID | Backend | Auth |
|---|---|---|
| claude-code/opus-4-6 | Claude Opus 4.6 | Claude Max ($200) |
| claude-code/sonnet-4-5 | Claude Sonnet 4.5 | Claude Pro ($20) / Max ($100-200) |
| claude-code/haiku-4-5 | Claude Haiku 4.5 | Claude Pro ($20) / Max ($100-200) |
| codex/gpt-5.2 | GPT-5.2 | ChatGPT Plus ($20) / Pro ($200) |
| codex/gpt-5.2-codex | GPT-5.2-Codex | ChatGPT Plus ($20) / Pro ($200) |
| codex/gpt-4.1 | GPT-4.1 | ChatGPT Plus ($20) / Pro ($200) |
| codex/gpt-5-mini | GPT-5 Mini | ChatGPT Plus ($20) / Pro ($200) |
| gemini/gemini-3-pro-preview | Gemini 3 Pro Preview | API key / AI Pro / AI Ultra |
| gemini/gemini-3-flash-preview | Gemini 3 Flash Preview | API key / AI Pro / AI Ultra |

Architecture

```text
User Code ──→ subllm.completion() ──→ Router
                                       ├── ClaudeCodeProvider
                                       │     └── claude --print (subprocess)
                                       │         or claude-agent-sdk (async)
                                       ├── CodexProvider
                                       │     └── codex exec (subprocess)
                                       └── GeminiCLIProvider
                                             └── gemini -p (subprocess)
```

All providers delegate auth entirely to the underlying CLIs. SubLLM never stores or manages tokens directly. Multi-turn conversations use stateless message replay — the full conversation history is sent each turn.
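Conceptually, each provider is a thin async wrapper that spawns its CLI binary and captures stdout. The sketch below is illustrative only, not SubLLM's actual provider code — real providers also pass flags like `claude --print` or `codex exec`, handle timeouts, and parse structured output. It is shown here with `cat` as a stand-in binary so it runs anywhere:

```python
import asyncio

async def run_cli(binary: str, *args: str, prompt: str) -> str:
    """Spawn a CLI tool, feed the prompt on stdin, and return its stdout.

    Illustrative sketch of subprocess-mode delegation -- not SubLLM's
    actual implementation.
    """
    proc = await asyncio.create_subprocess_exec(
        binary, *args,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate(prompt.encode())
    if proc.returncode != 0:
        raise RuntimeError(f"{binary} failed: {stderr.decode().strip()}")
    return stdout.decode()

# `cat` simply echoes stdin back, standing in for a real CLI:
print(asyncio.run(run_cli("cat", prompt="hello")))  # prints "hello"
```

Because auth lives entirely in the CLI's own credential store, a wrapper like this never sees a token.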

Batch Processing

```python
results = await subllm.batch([
    {"model": "claude-code/sonnet-4-5", "messages": [...]},
    {"model": "gemini/gemini-3-flash-preview", "messages": [...]},
    {"model": "codex/gpt-5.2", "messages": [...]},
], concurrency=5)
```

Runs completions in parallel with a concurrency semaphore. Each provider's CLI handles its own rate limiting internally.
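The concurrency cap can be pictured as an `asyncio.Semaphore` gating in-flight calls. A minimal sketch with a stubbed completion (the stub and the `peak` counter are illustrative, not part of SubLLM's API):

```python
import asyncio

async def batch(requests, concurrency=5, completion=None):
    """Run completions in parallel, at most `concurrency` in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def worker(req):
        async with sem:  # blocks while `concurrency` calls are in flight
            return await completion(**req)

    return await asyncio.gather(*(worker(r) for r in requests))

# Stub that records peak concurrency instead of spawning a CLI.
peak = active = 0

async def fake_completion(model, messages):
    global peak, active
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0.01)  # simulate inference latency
    active -= 1
    return f"{model}: ok"

reqs = [{"model": f"m{i}", "messages": []} for i in range(12)]
results = asyncio.run(batch(reqs, concurrency=5, completion=fake_completion))
print(peak, len(results))  # peak never exceeds 5; all 12 results returned in order
```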

Benchmarks

Measured on macOS. Timings include full CLI subprocess overhead (spawn, auth, inference, response parsing). Single run — expect variance across sessions.

Auth Check

| Provider | Method | Latency |
|---|---|---|
| claude-code | `claude auth status` | ~266ms |
| codex | subscription check | ~97ms |
| gemini | OAuth credential file | ~2ms |
| all (parallel) | `asyncio.gather` | ~270ms |

Auth is bounded by the slowest provider. The previous sequential approach, which verified each provider with an inference round-trip, took ~30s total.

Completion

| Provider | Model | Non-streaming | Streaming |
|---|---|---|---|
| claude-code | sonnet-4-5 | ~13-17s | ~9-15s |
| codex | gpt-5.2 | ~11-15s | ~11-12s |
| gemini | gemini-3-flash-preview | ~8-9s | ~8-9s |

Multi-turn

| Provider | Model | Turn 1 | Turn 2 |
|---|---|---|---|
| claude-code | sonnet-4-5 | ~16s | ~14s |
| codex | gpt-5.2 | ~15s | ~12s |
| gemini | gemini-3-flash-preview | ~13s | ~14s |

Full conversation history replayed each turn (stateless). Turn 2 carries Turn 1 context.
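Stateless replay means the client owns the transcript: each turn appends to a plain messages list and resends the whole thing. A sketch of that loop — the helper names are hypothetical and the actual `completion()` call is elided; managing `history` is the point:

```python
history = []

def next_payload(user_text, model="claude-code/sonnet-4-5"):
    """Append the user turn and build a full-replay request body."""
    history.append({"role": "user", "content": user_text})
    return {"model": model, "messages": list(history)}

def record_reply(text):
    """Store the assistant reply so the next turn carries full context."""
    history.append({"role": "assistant", "content": text})

# Turn 1
payload = next_payload("My name is Ada.")
record_reply("Nice to meet you, Ada!")     # reply returned by completion()
# Turn 2 -- the request now replays all prior messages
payload = next_payload("What is my name?")
print(len(payload["messages"]))  # 3: user, assistant, user
```

Cross-provider handoff works the same way: keep the same `history` and change only the `model` string between turns.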

Cross-provider Handoff

Message history replayed across different providers within a single conversation:

| Turn | Provider | Latency |
|---|---|---|
| 1 (remember) | claude-code/sonnet-4-5 | ~16s |
| 2 (recall) | codex/gpt-5.2 | ~11s |
| 3 (verify) | gemini/gemini-3-flash-preview | ~14s |

Batch (3 parallel completions)

| Scope | Latency |
|---|---|
| claude-code | ~17s |
| codex | ~14s |
| gemini | ~9s |
| cross-provider | ~14s |

Parallel execution bounded by the slowest request.
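That bound falls out of `asyncio.gather`: wall time tracks the maximum, not the sum, of the individual latencies. A scaled-down demonstration (the 0.17s/0.14s/0.09s sleeps stand in for the ~17s/~14s/~9s batch timings above):

```python
import asyncio
import time

async def fake_request(latency):
    await asyncio.sleep(latency)  # stand-in for one CLI completion
    return latency

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_request(t) for t in (0.17, 0.14, 0.09)))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(round(elapsed, 2))  # ~0.17s: the slowest request, not 0.40s (the sum)
```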

Terms of Service & Disclaimer

SubLLM routes completion calls through CLI tools that authenticate via your subscription. It does not circumvent authentication, store credentials, or proxy third-party access. Users are responsible for compliance with each provider's terms of service.

  • Anthropic — Anthropic prohibits third-party developers from offering claude.ai login for their products. The "user brings their own authenticated CLI" pattern is established by Cline, Zed, and Repo Prompt. Safe for personal and team use. Do not ship as a SaaS product.
  • OpenAI — Codex CLI explicitly supports ChatGPT subscription auth and programmatic exec mode. Officially supported pattern.
  • Google — Gemini CLI supports Google OAuth and API key auth. Standard usage pattern.

Review each provider's current ToS before use. Terms change.

License

MIT
