SubLLM

Route standard LLM API calls through subscription-authenticated coding agents instead of API keys.

Use your Claude Pro/Max, ChatGPT Plus/Pro, or Google AI Pro/Ultra subscription as the backend for programmatic LLM calls. SubLLM provides an OpenAI-compatible unified interface that abstracts Claude Code, Codex, and Gemini CLIs behind a standard completion() API.

Why?

You already pay for a subscription — Claude Pro/Max, ChatGPT Plus/Pro, Google AI Pro/Ultra — but the moment you need to run a script, you're forced to pay again for API access to the same models. Subscription plans offer flat-rate access but lock you into chat UIs with no programmatic interface. API keys unlock programmatic access but bill per-token on top of what you're already paying.

SubLLM eliminates the double-pay problem. It routes standard completion() calls through CLI tools that authenticate via your existing subscriptions — turning flat-rate chat plans into programmable LLM backends.

  • Flat-rate pricing — pay your subscription, not per-token. Heavy usage costs the same as light usage.
  • Zero API key management — CLIs authenticate through your subscription. No keys to provision, rotate, or secure.
  • OpenAI-compatible interface — standard completion() API with OpenAI ChatCompletion response format. Swap SubLLM in/out of existing code with minimal changes.
  • Cross-provider — same API surface for Claude, GPT, and Gemini. Switch models by changing a string.
| Approach | Cost (heavy usage) | Overhead | Flexibility |
| --- | --- | --- | --- |
| Direct API (per-token) | $50-500+/mo | ~0s | Full control |
| SubLLM (subscription) | $0-200/mo flat | ~270ms | Good for batch/dev |
| Direct API + LiteLLM | $50-500+/mo | ~0s | Multi-provider |

Best for: development, prototyping, batch jobs, CI/CD, personal automation, cost-sensitive pipelines.

Not for: real-time chat UIs, latency-sensitive production services, multi-tenant SaaS (ToS constraints), workloads requiring tool use / function calling / prompt caching / logprobs / stop sequences (these features execute inside CLI sandboxes and are not exposed).

Quick Start

1. Install

uv add subllm                # Core (includes Claude Agent SDK)
uv add "subllm[server]"      # + OpenAI-compatible proxy server (quotes keep the shell from globbing the brackets)

2. Authenticate Your CLIs

Claude Code (subscription auth):

curl -fsSL https://claude.ai/install.sh | bash
claude login
unset ANTHROPIC_API_KEY          # Force subscription auth

For headless/CI:

claude setup-token
export CLAUDE_CODE_OAUTH_TOKEN="your-token-here"

Codex (subscription auth):

npm install -g @openai/codex
codex login

Gemini CLI (free tier or subscription):

npm install -g @google/gemini-cli
gemini                           # Complete Google login
# Or use API key:
export GEMINI_API_KEY="your-key"

3. Use

Python API:

import asyncio
import subllm

async def main():
    # Non-streaming
    response = await subllm.completion(
        model="claude-code/sonnet-4-5",
        messages=[{"role": "user", "content": "Explain monads in one sentence"}],
    )
    print(response.choices[0].message.content)

    # Streaming
    stream = await subllm.completion(
        model="gemini/gemini-3-flash-preview",
        messages=[{"role": "user", "content": "Write a haiku about Rust"}],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content, end="")

asyncio.run(main())

CLI:

subllm auth                                        # Check all providers
subllm models                                      # List available models
subllm complete "What is 2+2?" -m claude-code/sonnet-4-5
subllm complete "Write a haiku" -m gemini/gemini-3-flash-preview --stream

OpenAI-compatible proxy:

subllm serve --port 8080

# Then use ANY OpenAI-compatible client
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")
response = client.chat.completions.create(
    model="claude-code/sonnet-4-5",
    messages=[{"role": "user", "content": "hello"}],
)

Available Models

| Model ID | Backend | Auth |
| --- | --- | --- |
| claude-code/opus-4-6 | Claude Opus 4.6 | Claude Max ($200) |
| claude-code/sonnet-4-5 | Claude Sonnet 4.5 | Claude Pro ($20) / Max ($100-200) |
| claude-code/haiku-4-5 | Claude Haiku 4.5 | Claude Pro ($20) / Max ($100-200) |
| codex/gpt-5.2 | GPT-5.2 | ChatGPT Plus ($20) / Pro ($200) |
| codex/gpt-5.2-codex | GPT-5.2-Codex | ChatGPT Plus ($20) / Pro ($200) |
| codex/gpt-4.1 | GPT-4.1 | ChatGPT Plus ($20) / Pro ($200) |
| codex/gpt-5-mini | GPT-5 Mini | ChatGPT Plus ($20) / Pro ($200) |
| gemini/gemini-3-pro-preview | Gemini 3 Pro Preview | API key / AI Pro / AI Ultra |
| gemini/gemini-3-flash-preview | Gemini 3 Flash Preview | API key / AI Pro / AI Ultra |

Architecture

User Code ──→ subllm.completion() ──→ Router
                                       ├── ClaudeCodeProvider
                                       │     └── claude-agent-sdk (persistent client)
                                       ├── CodexProvider
                                       │     └── codex exec (subprocess)
                                       └── GeminiCLIProvider
                                             └── gemini -p (subprocess)

All providers delegate auth entirely to the underlying CLIs. SubLLM never stores or manages tokens directly. Multi-turn conversations use stateless message replay — the full conversation history is sent each turn.
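The replay pattern can be sketched in a few lines. Message dicts follow the OpenAI chat format; the assistant reply here is a stand-in for a real completion response:

```python
# Stateless multi-turn: the client owns the history and resends all of it
# each turn; no session state lives on the provider side.
history = [{"role": "user", "content": "My name is Ada."}]

# Turn 1: append the (stand-in) assistant reply to the history.
history.append({"role": "assistant", "content": "Nice to meet you, Ada."})

# Turn 2: the new question is appended, and the ENTIRE list -- all three
# messages -- becomes the payload of the next completion() call.
history.append({"role": "user", "content": "What is my name?"})

print(len(history))  # 3 messages sent on turn 2, not 1
```

The trade-off is payload growth: each turn resends everything before it, which is why multi-turn latency stays roughly flat rather than dropping on later turns.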

Batch Processing

results = await subllm.batch([
    {"model": "claude-code/sonnet-4-5", "messages": [...]},
    {"model": "gemini/gemini-3-flash-preview", "messages": [...]},
    {"model": "codex/gpt-5.2", "messages": [...]},
], concurrency=5)

Runs completions in parallel with a concurrency semaphore. Each provider's CLI handles its own rate limiting internally.
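A minimal sketch of that concurrency pattern, with the inner call replaced by a placeholder (`run_batch` and `worker` are illustrative names here, not SubLLM API):

```python
import asyncio

async def run_batch(requests, concurrency=5):
    # Cap the number of in-flight completions; excess requests wait
    # on the semaphore until a slot frees up.
    sem = asyncio.Semaphore(concurrency)

    async def worker(req):
        async with sem:
            # Placeholder for: await subllm.completion(**req)
            await asyncio.sleep(0)
            return f"done:{req['model']}"

    # gather() preserves input order regardless of completion order.
    return await asyncio.gather(*(worker(r) for r in requests))

results = asyncio.run(
    run_batch([{"model": f"m{i}"} for i in range(6)], concurrency=3)
)
print(results[0])  # done:m0
```

Because `asyncio.gather` returns results in input order, batch output lines up with the request list even when the underlying CLIs finish out of order.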

Benchmarks

Measured on macOS. Timings include full CLI subprocess overhead (spawn, auth, inference, response parsing). Single run — expect variance across sessions.

Auth Check

| Provider | Method | Latency |
| --- | --- | --- |
| claude-code | `claude auth status` | ~266ms |
| codex | subscription check | ~97ms |
| gemini | OAuth credential file | ~2ms |
| all (parallel) | `asyncio.gather` | ~270ms |

Total auth-check time is bounded by the slowest provider. An earlier sequential implementation, which verified auth with full inference round-trips, took ~30s total.
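The parallel check can be sketched with `asyncio.gather`; probe delays are simulated with `sleep`, and `check_provider` is an illustrative name rather than SubLLM API:

```python
import asyncio
import time

async def check_provider(name, delay):
    # Stand-in for a real auth probe, e.g. spawning `claude auth status`.
    await asyncio.sleep(delay)
    return name, True

async def check_all():
    # All probes start together, so wall time tracks the slowest one.
    return await asyncio.gather(
        check_provider("claude-code", 0.27),
        check_provider("codex", 0.10),
        check_provider("gemini", 0.002),
    )

start = time.monotonic()
statuses = asyncio.run(check_all())
elapsed = time.monotonic() - start
# ~0.27s total (the slowest probe), not ~0.37s (the sum of all three)
print(dict(statuses))
```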

Completion

| Provider | Model | Non-streaming | Streaming |
| --- | --- | --- | --- |
| claude-code | sonnet-4-5 | ~13-17s | ~9-15s |
| codex | gpt-5.2 | ~11-15s | ~11-12s |
| gemini | gemini-3-flash-preview | ~8-9s | ~8-9s |

Multi-turn

| Provider | Model | Turn 1 | Turn 2 |
| --- | --- | --- | --- |
| claude-code | sonnet-4-5 | ~16s | ~14s |
| codex | gpt-5.2 | ~15s | ~12s |
| gemini | gemini-3-flash-preview | ~13s | ~14s |

Full conversation history replayed each turn (stateless). Turn 2 carries Turn 1 context.

Cross-provider Handoff

Message history replayed across different providers within a single conversation:

| Turn | Provider | Latency |
| --- | --- | --- |
| 1 (remember) | claude-code/sonnet-4-5 | ~16s |
| 2 (recall) | codex/gpt-5.2 | ~11s |
| 3 (verify) | gemini/gemini-3-flash-preview | ~14s |
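The handoff itself is just the replay pattern with a different model string per turn. In this sketch the assistant replies are stand-ins for real completion responses:

```python
# One shared history, three backends: only the `model` argument changes
# between turns; the messages list is passed through unchanged.
history = [{"role": "user", "content": "Remember the number 42."}]

schedule = [
    ("claude-code/sonnet-4-5", "Noted: 42."),
    ("codex/gpt-5.2", "You asked me to remember 42."),
    ("gemini/gemini-3-flash-preview", "Confirmed: the number is 42."),
]
for model, stand_in_reply in schedule:
    # A real turn would be:
    #   response = await subllm.completion(model=model, messages=history)
    history.append({"role": "assistant", "content": stand_in_reply})

print(len(history))  # 1 user turn + 3 assistant replies = 4
```

Since no provider holds session state, the second and third backends see the full transcript and can answer about context they never generated.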

Batch (3 parallel completions)

| Scope | Latency |
| --- | --- |
| claude-code | ~17s |
| codex | ~14s |
| gemini | ~9s |
| cross-provider | ~14s |

Parallel execution bounded by the slowest request.

Terms of Service & Disclaimer

SubLLM routes completion calls through CLI tools that authenticate via your subscription. It does not circumvent authentication, store credentials, or proxy third-party access. Users are responsible for compliance with each provider's terms of service.

  • Anthropic — Anthropic prohibits third-party developers from offering claude.ai login for their products. The "user brings their own authenticated CLI" pattern is established by Cline, Zed, and Repo Prompt. Safe for personal and team use. Do not ship as a SaaS product.
  • OpenAI — Codex CLI explicitly supports ChatGPT subscription auth and programmatic exec mode. Officially supported pattern.
  • Google — Gemini CLI supports Google OAuth and API key auth. Standard usage pattern.

Review each provider's current ToS before use. Terms change.

License

MIT

Download files

Source distribution: subllm-0.4.0.tar.gz (654.8 kB)

Built distribution: subllm-0.4.0-py3-none-any.whl (23.8 kB)

File details

subllm-0.4.0.tar.gz

  • Size: 654.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.8

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 6b6af1eb3b1ab026d39c42e7c6f78c013fae216248266e41c975da52e87be480 |
| MD5 | a5c02dd62d490c50f6a73050fe9f6785 |
| BLAKE2b-256 | b746359c683fc49789c9e7804f98bf6b8b09e5356adb1d3ca5d41efe57cc0269 |

File details

subllm-0.4.0-py3-none-any.whl

  • Size: 23.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.8

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | aeb0f279e2e60c76d29297bb749628f1cc1e6fd25463b88f14898d3ec4e28393 |
| MD5 | 08aef1796c3a10b46aaba2fd09403645 |
| BLAKE2b-256 | 52dc597c63c303fd127f910c423c06036afb4f8fa182b0aae9e68e0f9f5e8380 |
