Free multi-LLM backend — decompose prompts into subtasks and route to the best free model for each task
Note: this release was yanked. Reason given: "Not working".
Why use one LLM when you can use them all?
Smart routing to the best model for each task — picks the right model tier automatically.
Free by default. Optimizes paid tokens when available.
Who is this for?
- Developers without a paid subscription who want a powerful AI coding assistant using free LLMs.
- Developers with a paid API budget who want to make it last — SmartSplit routes simple tasks to free models and saves your paid tokens (OpenAI, Anthropic) for complex work. No config needed, it's the default behavior.
- Teams who want to combine multiple LLMs without changing their existing tools.
- Anyone frustrated by a single model that's great at code but bad at everything else.
The problem
You ask your coding assistant to write a function, explain an algorithm, translate a comment, and find the latest docs. It sends everything to the same model — and that model is average at most of these tasks.
Before SmartSplit:
You: "Write a Python CSV parser, explain the edge cases, and translate the docstrings to French"
→ Everything goes to one model
→ Code is okay, explanation is shallow, translation is awkward
After SmartSplit:
Same prompt, same client, same workflow
→ Code subtask → best code model (deep, accurate)
→ Reasoning subtask → best reasoning model (thorough)
→ Translation → language specialist (native quality)
→ Simple boilerplate → fast cheap model (saves your budget)
→ Combined into one coherent response
Same tool. Better answers. No config change.
What makes SmartSplit different
Multiply your free tier. Instead of burning through one provider's quota, SmartSplit spreads requests across all your configured providers — each one contributing its free tier. More providers = more capacity.
Self-healing. A provider goes down or hits its rate limit? You won't even notice. SmartSplit detects failures, disables the provider temporarily, and routes to the next best one — automatically.
Web-aware. When your prompt needs current data ("latest", "news", "2026"...), SmartSplit detects it and searches the web before answering. No plugin needed — it's built in.
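A rough sketch of that detection — the docs only mention triggers like "latest", "news", and bare years, so the pattern list and function below are assumptions, not SmartSplit's internals:

```python
import re

# Hypothetical freshness triggers; the real list lives inside SmartSplit.
FRESHNESS_PATTERNS = [
    r"\blatest\b",
    r"\bnews\b",
    r"\b20\d{2}\b",  # a bare year like "2026"
]

def needs_web_search(prompt: str) -> bool:
    """Return True when the prompt likely needs current data."""
    return any(re.search(p, prompt, re.IGNORECASE) for p in FRESHNESS_PATTERNS)

print(needs_web_search("What are the latest Python features?"))  # True
print(needs_web_search("Write a CSV parser"))                    # False
```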
Stretch your paid tokens. Got an OpenAI or Anthropic API key? Add it, and SmartSplit picks the right model for each task automatically:
Simple task (boilerplate, summary) → Haiku / GPT-4o-mini (cheap)
Complex task (code, reasoning) → Sonnet / GPT-4o (best)
Everything else → Free models first
No config needed — SmartSplit detects task complexity and chooses the best model tier automatically.
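The tier decision above can be sketched in a few lines — the task names and tier labels here are illustrative, not SmartSplit's internal API:

```python
# Illustrative tier selection: complex work gets the best paid model,
# simple work gets the cheap one, everything else stays on free models.
SIMPLE_TASKS = {"boilerplate", "summarize"}
COMPLEX_TASKS = {"code", "reasoning"}

def pick_tier(task: str, has_paid_key: bool) -> str:
    if not has_paid_key:
        return "free"          # free models only
    if task in COMPLEX_TASKS:
        return "paid-best"     # e.g. Sonnet / GPT-4o
    if task in SIMPLE_TASKS:
        return "paid-cheap"    # e.g. Haiku / GPT-4o-mini
    return "free"              # everything else: free models first

print(pick_tier("code", has_paid_key=True))       # paid-best
print(pick_tier("summarize", has_paid_key=True))  # paid-cheap
```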
Your coding assistant (Continue, Cline, Aider, Cursor...)
                        |
            SmartSplit (localhost:8420)
                        |
     ┌──────────┬───────┴──────┬──────────┐
     |          |              |          |
   Code       Search       Translate  Reasoning
     |          |              |          |
   Best       Best           Best       Best
   model      engine         model      model
     |          |              |          |
     └──────────┴───────┬──────┴──────────┘
                        |
               Combined response
Quick Start
1. Install
pip install smartsplit
# or: uv pip install smartsplit
2. Get a free API key (2 minutes)
You need one key to start. Sign up at groq.com and copy your API key.
Add more providers later for better routing. Each new provider = better results, more fallbacks. See Providers.
3. Start SmartSplit
export GROQ_API_KEY="gsk_..."
smartsplit
SmartSplit — Multi-LLM backend
http://127.0.0.1:8420/v1
Mode: balanced
Or use Docker
# Create a .env file with your API keys
echo 'GROQ_API_KEY=gsk_...' > .env
# Run with Docker
docker run -p 8420:8420 --env-file .env ghcr.io/dsteinberger/smartsplit
# Or with Docker Compose
docker compose up -d
4. Connect your coding tool
Continue (VS Code / JetBrains)
Copy examples/.continuerc.json to your project as .continuerc.json, or add to ~/.continue/config.yaml:
models:
  - name: SmartSplit
    provider: openai
    model: smartsplit
    apiBase: http://localhost:8420/v1
    apiKey: free
Cline (VS Code)
In the Cline sidebar, click the gear icon:
- Select OpenAI Compatible as provider
- Base URL: http://localhost:8420/v1
- API Key: free
- Model ID: smartsplit
Aider (Terminal)
Copy examples/.aider.conf.yml to your project as .aider.conf.yml, or run:
aider --model openai/smartsplit --openai-api-base http://localhost:8420/v1 --openai-api-key free
OpenCode (Terminal)
Copy examples/opencode.json to your project root, run opencode providers to add the API key (free), then select the model with /models.
Tabby (Self-hosted autocomplete)
Add to ~/.tabby/config.toml:
[model.chat.http]
kind = "openai/chat"
model_name = "smartsplit"
api_endpoint = "http://localhost:8420/v1"
api_key = "free"
Void (Open-source IDE)
In Void settings:
- In the OpenAI-Compatible section, set Base URL to http://localhost:8420/v1 and API Key to free
- In the Models section, click Add Model, select OpenAI-Compatible, and name it smartsplit
Any OpenAI-compatible client
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8420/v1", api_key="free")
reply = client.chat.completions.create(model="smartsplit", messages=[{"role": "user", "content": "Hello"}])
SmartSplit works with any tool that supports a custom OpenAI endpoint: Continue, Cline, Aider, OpenCode, Tabby, Void, Cursor, Open WebUI, Chatbox, LibreChat, Jan, and more.
That's it. Three steps: install, add one API key, connect your tool. Your assistant now has access to every top free LLM.
How It Works
Every request is automatically classified into one of two modes:
RESPOND — route to the best model
Your prompt is analyzed, split into subtasks if needed, and each one is routed to the best provider:
"Write a Python function to parse CSV and handle errors"
[code] → best code model
[reasoning] → best reasoning model
[synthesis] → combines results
→ One coherent response
ENRICH — search the web first, then route
When the prompt needs current data, SmartSplit searches the web first:
"What are the new features in Python 3.13?"
[web_search] → search engine
[summarize] → best summarization model
→ Response with real, current data
Context-aware
SmartSplit passes your full conversation history to the LLM — system prompts, previous messages, everything. For multi-subtask prompts, a context summary is injected into each subtask so no information is lost.
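One way the per-subtask summary injection might look, assuming OpenAI-style message dicts — the function name and summary wording are hypothetical:

```python
# Illustrative sketch: prepend a context summary so a subtask sent to a
# different provider still sees what the conversation is about.
def build_subtask_messages(history, context_summary, subtask_prompt):
    """Build the message list for one subtask, preserving conversation context."""
    return [
        {"role": "system", "content": f"Conversation context: {context_summary}"},
        *history,
        {"role": "user", "content": subtask_prompt},
    ]

history = [{"role": "user", "content": "Write a Python CSV parser"}]
msgs = build_subtask_messages(
    history,
    "User wants a CSV parser in Python with error handling",
    "Translate the docstrings to French",
)
```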
Built-in reliability
| Feature | What it does |
|---|---|
| Circuit breaker | 3 failures in 5 min → provider auto-disabled for 30 min |
| Quality gates | Detects refusals ("I cannot...") → auto-escalation to next provider |
| Fallback chains | Provider fails → next best one takes over, seamlessly |
| Decompose cache | Repeated prompts skip analysis (LRU, 24h TTL) |
| Context preservation | Full conversation history passed to each LLM |
| Adaptive scoring | Learns which providers work best from real results (MAB/UCB1) |
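The circuit-breaker rule from the table (3 failures within 5 minutes disables a provider for 30 minutes) can be sketched as follows; the class and method names are hypothetical, not SmartSplit's internals:

```python
import time
from collections import defaultdict, deque

# Rule from the table above: 3 failures in a 5-minute window
# disables the provider for 30 minutes.
FAILURE_LIMIT, WINDOW_S, COOLDOWN_S = 3, 300, 1800

class CircuitBreaker:
    def __init__(self):
        self.failures = defaultdict(deque)  # provider -> failure timestamps
        self.disabled_until = {}            # provider -> time it becomes healthy

    def record_failure(self, provider, now=None):
        now = time.time() if now is None else now
        window = self.failures[provider]
        window.append(now)
        while window and now - window[0] > WINDOW_S:
            window.popleft()                # drop failures outside the window
        if len(window) >= FAILURE_LIMIT:
            self.disabled_until[provider] = now + COOLDOWN_S

    def is_available(self, provider, now=None):
        now = time.time() if now is None else now
        return now >= self.disabled_until.get(provider, 0)
```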
Providers
Supported providers
| Provider | Type | Best at |
|---|---|---|
| Cerebras | Free | Reasoning, general (Qwen 3 235B) |
| Groq | Free | Fast inference (LLaMA 3.3 70B) |
| Gemini | Free | Math, reasoning (Gemini 2.5 Flash) |
| OpenRouter | Free | Code (Qwen3 Coder 480B) |
| Mistral | Free | Translation (Mistral Small) |
| HuggingFace | Free backup | Code (Qwen2.5 Coder 32B) |
| Cloudflare | Free backup | General (LLaMA 3.3 70B) |
| DeepSeek | Paid | Code, reasoning |
| Anthropic | Paid | Complex tasks (Claude) |
| OpenAI | Paid | Complex tasks (GPT-4o) |
| Serper | Free | Web search |
| Tavily | Free | Web search |
Add providers by setting environment variables:
export GROQ_API_KEY="gsk_..."
export GEMINI_API_KEY="AIza..."
export DEEPSEEK_API_KEY="sk-..."
export CEREBRAS_API_KEY="csk-..."
export MISTRAL_API_KEY="..."
export OPENROUTER_API_KEY="sk-or-..."
export HF_TOKEN="hf_..."
export CLOUDFLARE_API_KEY="..."
export CLOUDFLARE_ACCOUNT_ID="..."
export SERPER_API_KEY="..."
More providers = better routing, more fallbacks, higher resilience.
Format translation is automatic. Most providers use the OpenAI format natively. Gemini uses Google's own format — SmartSplit translates on the fly. Your client talks OpenAI, SmartSplit handles the rest.
Paid providers (Anthropic, OpenAI) are also supported as optional fallbacks. They're disabled by default.
Routing table
Task          Best free providers (ranked)
─────────────────────────────────────────────────────────────
code          OpenRouter > Cerebras = Gemini > Groq = HuggingFace
reasoning     Cerebras > Gemini = OpenRouter > Groq
summarize     Cerebras > Groq = Gemini = Mistral = OpenRouter
translation   Mistral > Gemini > Groq = Cerebras
web search    Serper or Tavily
boilerplate   Cerebras = Groq > Gemini = Mistral = OpenRouter
math          OpenRouter = Gemini > Cerebras > Groq
general       Cerebras > Gemini = OpenRouter > Groq = Mistral
Backups: HuggingFace, Cloudflare (lower quality, high availability)
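A minimal sketch of how ranked lists like these drive fallback — the rankings below are copied from the table, while the function and the stubbed provider call are illustrative:

```python
# Ranked provider chains (subset of the table above).
ROUTING = {
    "code": ["openrouter", "cerebras", "gemini", "groq", "huggingface"],
    "translation": ["mistral", "gemini", "groq", "cerebras"],
}

def call_with_fallback(task, call_provider):
    """Try each ranked provider in turn; on failure, the next best takes over."""
    for provider in ROUTING.get(task, []):
        try:
            return call_provider(provider)
        except Exception:
            continue  # seamless fallback to the next provider
    raise RuntimeError(f"all providers failed for task {task!r}")
```

The real router also folds in adaptive scoring and circuit-breaker state; this shows only the ordered-fallback skeleton.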
Metrics
curl http://localhost:8420/metrics
{
"requests": { "total": 142, "enrich": 42, "respond": 100 },
"savings": { "tokens_saved": 45000, "cost_saved_usd": 0.135 },
"cache": { "hits": 23, "hit_rate": 16.2 },
"circuit_breaker": { "unhealthy_providers": [] }
}
Also available: GET /health · GET /savings
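As a sanity check on the sample payload, hit_rate is simply cache hits over total requests, as a percentage:

```python
# Derive the hit_rate shown in the sample /metrics response above.
metrics = {"requests": {"total": 142}, "cache": {"hits": 23}}
hit_rate = round(100 * metrics["cache"]["hits"] / metrics["requests"]["total"], 1)
print(hit_rate)  # 16.2
```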
Configuration
CLI options
smartsplit # defaults: port 8420, balanced mode
smartsplit --port 3456 # custom port
smartsplit --mode economy # max free usage
smartsplit --mode quality # prefer quality over speed
smartsplit --log-level DEBUG # verbose logging
Config file (alternative to env vars)
cp smartsplit.example.json smartsplit.json
# Edit with your API keys
You can also tune provider settings and routing:
{
  "mode": "balanced",
  "free_llm_priority": ["cerebras", "groq", "gemini", "openrouter", "mistral", "huggingface", "cloudflare"],
  "providers": {
    "groq": {
      "model": "llama-3.3-70b-versatile",
      "temperature": 0.3,
      "max_tokens": 4096
    },
    "serper": {
      "max_search_results": 5
    }
  }
}
| Option | Default | What it does |
|---|---|---|
| free_llm_priority | cerebras, groq, gemini, openrouter, mistral, huggingface, cloudflare | Fallback order for free LLM calls |
| providers.*.model | per-provider default | Override the default model |
| providers.*.temperature | 0.3 | LLM temperature |
| providers.*.max_tokens | 4096 | Max output tokens |
| providers.*.max_search_results | 5 | Number of web search results |
Docker
# Using the published image
docker run -p 8420:8420 --env-file .env ghcr.io/dsteinberger/smartsplit
# Or build locally
docker build -t smartsplit .
docker run -p 8420:8420 --env-file .env smartsplit
Create a .env file with your API keys:
GROQ_API_KEY=gsk_...
SERPER_API_KEY=...
GEMINI_API_KEY=AIza...
Never commit .env to git — it's already in .gitignore.
Or use Docker Compose:
docker compose up -d
See docker-compose.yml for the full setup.
Development
Prerequisites: Python 3.11+ and uv (recommended) or pip.
git clone https://github.com/dsteinberger/smartsplit.git
cd smartsplit
make install # or: pip install -e ".[dev]"
make check # lint + format check + tests
make test # tests only
make run # start server (requires at least one API key)
make help # all commands
Note: make test runs all tests without any API key — no provider needed for development.
See CONTRIBUTING.md for guidelines.
Architecture
smartsplit/
proxy.py HTTP server + LLM-based triage + CLI
formats.py OpenAI format conversion + SSE streaming
planner.py Prompt decomposition + synthesis + LRU cache
router.py Provider scoring + routing + quality gates
learning.py MAB (UCB1) adaptive scoring — learns from real results
quota.py Usage tracking + savings report
config.py Configuration + env vars
models.py Pydantic models + StrEnum
exceptions.py Custom error hierarchy
providers/ One file per provider (3 lines for OpenAI-compatible)
Adding a new OpenAI-compatible provider takes three lines (the model is set in config):
class NewProvider(OpenAICompatibleProvider):
name = "new"
api_url = "https://api.new.com/v1/chat/completions"
Disclaimer
SmartSplit is a personal development tool. Each user must provide their own API keys and comply with the terms of service of each provider they use. SmartSplit does not store, share, or redistribute API keys or access. The authors are not responsible for any misuse or ToS violations by end users.
MIT License · Contributing · Security · Changelog
Star this repo to follow updates — new providers, streaming, and more coming soon.