
Free multi-LLM backend — decompose prompts into subtasks and route to the best free model for each task

Reason this release was yanked: Not working

Project description

SmartSplit — Why use one LLM when you can use them all?



Smart routing to the best model for each task — picks the right model tier automatically.
Free by default. Optimizes paid tokens when available.

Quick Start · How It Works · Providers · Metrics


Who is this for?

  • Developers without a paid subscription who want a powerful AI coding assistant using free LLMs.
  • Developers with a paid API budget who want to make it last — SmartSplit routes simple tasks to free models and saves your paid tokens (OpenAI, Anthropic) for complex work. No config needed, it's the default behavior.
  • Teams who want to combine multiple LLMs without changing their existing tools.
  • Anyone frustrated by a single model that's great at code but bad at everything else.

The problem

You ask your coding assistant to write a function, explain an algorithm, translate a comment, and find the latest docs. It sends everything to the same model — and that model is average at most of these tasks.

Before SmartSplit:

You: "Write a Python CSV parser, explain the edge cases, and translate the docstrings to French"

→ Everything goes to one model
→ Code is okay, explanation is shallow, translation is awkward

After SmartSplit:

Same prompt, same client, same workflow

→ Code subtask      → best code model (deep, accurate)
→ Reasoning subtask → best reasoning model (thorough)
→ Translation       → language specialist (native quality)
→ Simple boilerplate → fast cheap model (saves your budget)
→ Combined into one coherent response

Same tool. Better answers. No config change.

What makes SmartSplit different

Multiply your free tier. Instead of burning through one provider's quota, SmartSplit spreads requests across all your configured providers — each one contributing its free tier. More providers = more capacity.

Self-healing. A provider goes down or hits its rate limit? You won't even notice. SmartSplit detects failures, disables the provider temporarily, and routes to the next best one — automatically.

Web-aware. When your prompt needs current data ("latest", "news", "2026"...), SmartSplit detects it and searches the web before answering. No plugin needed — it's built in.
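The detection itself is part of SmartSplit's LLM-based triage, but the idea can be sketched as a keyword heuristic — the function name and hint list below are illustrative, not the real implementation:

```python
import re

# Illustrative freshness check. SmartSplit's real triage is LLM-based;
# this keyword heuristic only sketches the idea.
FRESHNESS_HINTS = {"latest", "news", "today", "current"}

def needs_web_search(prompt: str) -> bool:
    """Return True when the prompt likely needs current data."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    if words & FRESHNESS_HINTS:
        return True
    # A bare year ("2026") also suggests time-sensitive content.
    return bool(re.search(r"\b20\d{2}\b", prompt))
```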

Stretch your paid tokens. Got an OpenAI or Anthropic API key? Add it, and SmartSplit picks the right model for each task automatically:

Simple task (boilerplate, summary)  → Haiku / GPT-4o-mini  (cheap)
Complex task (code, reasoning)      → Sonnet / GPT-4o      (best)
Everything else                     → Free models first

No config needed — SmartSplit detects task complexity and chooses the best model tier automatically.
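The tiering above amounts to a complexity check before dispatch. A minimal sketch — the hint lists and tier labels are assumptions for illustration, not SmartSplit's actual classifier:

```python
# Illustrative tier picker; the tier labels mirror the table above,
# the keyword heuristic is a stand-in for SmartSplit's real classifier.
SIMPLE_HINTS = ("boilerplate", "summary", "summarize", "rename")
COMPLEX_HINTS = ("refactor", "debug", "prove", "architecture", "reason")

def pick_tier(task: str) -> str:
    text = task.lower()
    if any(h in text for h in COMPLEX_HINTS):
        return "paid-best"    # e.g. Sonnet / GPT-4o
    if any(h in text for h in SIMPLE_HINTS):
        return "paid-cheap"   # e.g. Haiku / GPT-4o-mini
    return "free-first"       # free models unless they fail
```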

Your coding assistant (Continue, Cline, Aider, Cursor...)
         |
    SmartSplit (localhost:8420)
         |
    ┌────┼──────────────────────────┐
    |    |          |                |
   Code  Search   Translate       Reasoning
    |    |          |                |
  Best   Best     Best            Best
  model  engine   model           model
    |    |          |                |
    └────┼──────────┼────────────────┘
         |
    Combined response

Quick Start

1. Install

pip install smartsplit
# or: uv pip install smartsplit

2. Get a free API key (2 minutes)

You need one key to start. Sign up at groq.com and copy your API key.

Add more providers later for better routing. Each new provider = better results, more fallbacks. See Providers.

3. Start SmartSplit

export GROQ_API_KEY="gsk_..."
smartsplit
  SmartSplit — Multi-LLM backend
  http://127.0.0.1:8420/v1
  Mode: balanced
Or use Docker:

# Create a .env file with your API keys
echo 'GROQ_API_KEY=gsk_...' > .env

# Run with Docker
docker run -p 8420:8420 --env-file .env ghcr.io/dsteinberger/smartsplit

# Or with Docker Compose
docker compose up -d

4. Connect your coding tool

Continue (VS Code / JetBrains)

Copy examples/.continuerc.json to your project as .continuerc.json, or add to ~/.continue/config.yaml:

models:
  - name: SmartSplit
    provider: openai
    model: smartsplit
    apiBase: http://localhost:8420/v1
    apiKey: free

Cline (VS Code)

In the Cline sidebar, click the gear icon:

  1. Select OpenAI Compatible as provider
  2. Base URL: http://localhost:8420/v1
  3. API Key: free
  4. Model ID: smartsplit

Aider (Terminal)

Copy examples/.aider.conf.yml to your project as .aider.conf.yml, or run:

aider --model openai/smartsplit --openai-api-base http://localhost:8420/v1 --openai-api-key free

OpenCode (Terminal)

Copy examples/opencode.json to your project root, run opencode providers to add the API key (free), then select the model with /models.

Tabby (Self-hosted autocomplete)

Add to ~/.tabby/config.toml:

[model.chat.http]
kind = "openai/chat"
model_name = "smartsplit"
api_endpoint = "http://localhost:8420/v1"
api_key = "free"

Void (Open-source IDE)

In Void settings:

  1. Find OpenAI-Compatible section → set Base URL http://localhost:8420/v1, API Key free
  2. In Models section → Add Model, select OpenAI-Compatible, name: smartsplit

Any OpenAI-compatible client
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8420/v1", api_key="free")
reply = client.chat.completions.create(model="smartsplit", messages=[{"role": "user", "content": "hello"}])
print(reply.choices[0].message.content)

SmartSplit works with any tool that supports a custom OpenAI endpoint: Continue, Cline, Aider, OpenCode, Tabby, Void, Cursor, Open WebUI, Chatbox, LibreChat, Jan, and more.

That's it. Three steps: install, add one API key, connect your tool. Your assistant now has access to every top free LLM.


How It Works

Every request is automatically classified into one of two modes:

RESPOND — route to the best model

Your prompt is analyzed, split into subtasks if needed, and each one is routed to the best provider:

"Write a Python function to parse CSV and handle errors"

  [code]       → best code model
  [reasoning]  → best reasoning model
  [synthesis]  → combines results

  → One coherent response

ENRICH — search the web first, then route

When the prompt needs current data, SmartSplit searches the web first:

"What are the new features in Python 3.13?"

  [web_search] → search engine
  [summarize]  → best summarization model

  → Response with real, current data

Context-aware

SmartSplit passes your full conversation history to the LLM — system prompts, previous messages, everything. For multi-subtask prompts, a context summary is injected into each subtask so no information is lost.

Built-in reliability

Feature               What it does
─────────────────────────────────────────────────────────────────────
Circuit breaker       3 failures in 5 min → provider auto-disabled for 30 min
Quality gates         Detects refusals ("I cannot...") → auto-escalates to the next provider
Fallback chains       Provider fails → the next best one takes over, seamlessly
Decompose cache       Repeated prompts skip analysis (LRU, 24 h TTL)
Context preservation  Full conversation history passed to each LLM
Adaptive scoring      Learns which providers work best from real results (MAB/UCB1)

Providers

Supported providers

Provider      Type         Best at
─────────────────────────────────────────────
Cerebras      Free         Reasoning, general (Qwen 3 235B)
Groq          Free         Fast inference (LLaMA 3.3 70B)
Gemini        Free         Math, reasoning (Gemini 2.5 Flash)
OpenRouter    Free         Code (Qwen3 Coder 480B)
Mistral       Free         Translation (Mistral Small)
HuggingFace   Free backup  Code (Qwen2.5 Coder 32B)
Cloudflare    Free backup  General (LLaMA 3.3 70B)
DeepSeek      Paid         Code, reasoning
Anthropic     Paid         Complex tasks (Claude)
OpenAI        Paid         Complex tasks (GPT-4o)
Serper        Free         Web search
Tavily        Free         Web search

Add providers by setting environment variables:

export GROQ_API_KEY="gsk_..."
export GEMINI_API_KEY="AIza..."
export DEEPSEEK_API_KEY="sk-..."
export CEREBRAS_API_KEY="csk-..."
export MISTRAL_API_KEY="..."
export OPENROUTER_API_KEY="sk-or-..."
export HF_TOKEN="hf_..."
export CLOUDFLARE_API_KEY="..."
export CLOUDFLARE_ACCOUNT_ID="..."
export SERPER_API_KEY="..."

More providers = better routing, more fallbacks, higher resilience.

Format translation is automatic. Most providers use the OpenAI format natively. Gemini uses Google's own format — SmartSplit translates on the fly. Your client talks OpenAI, SmartSplit handles the rest.
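For Gemini, that on-the-fly translation is roughly the following shape — a simplified sketch of mapping OpenAI-style messages onto Gemini's `contents`/`system_instruction` request format (streaming, tool calls, and error handling omitted):

```python
def openai_to_gemini(messages: list[dict]) -> dict:
    """Map OpenAI chat messages to Gemini's request shape (simplified)."""
    contents, system_parts = [], []
    for msg in messages:
        if msg["role"] == "system":
            # Gemini takes system prompts separately, not in the turn list.
            system_parts.append({"text": msg["content"]})
            continue
        # Gemini calls the assistant role "model".
        role = "model" if msg["role"] == "assistant" else "user"
        contents.append({"role": role, "parts": [{"text": msg["content"]}]})
    body = {"contents": contents}
    if system_parts:
        body["system_instruction"] = {"parts": system_parts}
    return body
```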

Paid providers (Anthropic, OpenAI) are also supported as optional fallbacks. They're disabled by default.

Routing table
Task          Best free providers (ranked)
─────────────────────────────────────────────
code          OpenRouter > Cerebras = Gemini > Groq = HuggingFace
reasoning     Cerebras > Gemini = OpenRouter > Groq
summarize     Cerebras > Groq = Gemini = Mistral = OpenRouter
translation   Mistral > Gemini > Groq = Cerebras
web search    Serper or Tavily
boilerplate   Cerebras = Groq > Gemini = Mistral = OpenRouter
math          OpenRouter = Gemini > Cerebras > Groq
general       Cerebras > Gemini = OpenRouter > Groq = Mistral

Backups:      HuggingFace, Cloudflare (lower quality, high availability)
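The rankings above are starting priors; the adaptive scoring noted under reliability (MAB/UCB1) then adjusts them from real results. A minimal UCB1 sketch, with hypothetical names and a reward scale assumed to be [0, 1] per call:

```python
import math

def ucb1_pick(stats: dict[str, tuple[int, float]], total_pulls: int) -> str:
    """Pick a provider by UCB1: mean reward plus an exploration bonus.

    stats maps provider -> (times_used, cumulative_reward).
    """
    best, best_score = None, float("-inf")
    for provider, (n, reward) in stats.items():
        if n == 0:
            return provider  # try every provider at least once
        score = reward / n + math.sqrt(2 * math.log(total_pulls) / n)
        if score > best_score:
            best, best_score = provider, score
    return best
```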

Metrics

curl http://localhost:8420/metrics
{
  "requests": { "total": 142, "enrich": 42, "respond": 100 },
  "savings": { "tokens_saved": 45000, "cost_saved_usd": 0.135 },
  "cache": { "hits": 23, "hit_rate": 16.2 },
  "circuit_breaker": { "unhealthy_providers": [] }
}

Also available: GET /health · GET /savings


Configuration

CLI options
smartsplit                          # defaults: port 8420, balanced mode
smartsplit --port 3456              # custom port
smartsplit --mode economy           # max free usage
smartsplit --mode quality           # prefer quality over speed
smartsplit --log-level DEBUG        # verbose logging

Config file (alternative to env vars)
cp smartsplit.example.json smartsplit.json
# Edit with your API keys

You can also tune provider settings and routing:

{
  "mode": "balanced",
  "free_llm_priority": ["cerebras", "groq", "gemini", "openrouter", "mistral", "huggingface", "cloudflare"],
  "providers": {
    "groq": {
      "model": "llama-3.3-70b-versatile",
      "temperature": 0.3,
      "max_tokens": 4096
    },
    "serper": {
      "max_search_results": 5
    }
  }
}

Option                          Default                                 What it does
─────────────────────────────────────────────────────────────────────────────────────
free_llm_priority               cerebras, groq, gemini, openrouter,     Fallback order for free LLM calls
                                mistral, huggingface, cloudflare
providers.*.model               per-provider default                    Override the default model
providers.*.temperature         0.3                                     LLM temperature
providers.*.max_tokens          4096                                    Max output tokens
providers.*.max_search_results  5                                       Number of web search results

Docker
# Using the published image
docker run -p 8420:8420 --env-file .env ghcr.io/dsteinberger/smartsplit

# Or build locally
docker build -t smartsplit .
docker run -p 8420:8420 --env-file .env smartsplit

Create a .env file with your API keys:

GROQ_API_KEY=gsk_...
SERPER_API_KEY=...
GEMINI_API_KEY=AIza...

Never commit .env to git — it's already in .gitignore.

Or use Docker Compose:

docker compose up -d

See docker-compose.yml for the full setup.


Development

Prerequisites: Python 3.11+ and uv (recommended) or pip.

git clone https://github.com/dsteinberger/smartsplit.git
cd smartsplit
make install              # or: pip install -e ".[dev]"

make check                # lint + format check + tests
make test                 # tests only
make run                  # start server (requires at least one API key)
make help                 # all commands

Note: make test runs all tests without any API key — no provider needed for development.

See CONTRIBUTING.md for guidelines.


Architecture

smartsplit/
  proxy.py           HTTP server + LLM-based triage + CLI
  formats.py         OpenAI format conversion + SSE streaming
  planner.py         Prompt decomposition + synthesis + LRU cache
  router.py          Provider scoring + routing + quality gates
  learning.py        MAB (UCB1) adaptive scoring — learns from real results
  quota.py           Usage tracking + savings report
  config.py          Configuration + env vars
  models.py          Pydantic models + StrEnum
  exceptions.py      Custom error hierarchy
  providers/         One file per provider (3 lines for OpenAI-compatible)

Adding a new provider takes three lines (the model is set in config):

class NewProvider(OpenAICompatibleProvider):
    name = "new"
    api_url = "https://api.new.com/v1/chat/completions"
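For context, a plausible shape for the `OpenAICompatibleProvider` base class this pattern relies on — a hedged sketch, not SmartSplit's actual code (the `<NAME>_API_KEY` environment-variable convention and method signature are assumptions):

```python
import json
import os
import urllib.request

class OpenAICompatibleProvider:
    """Sketch of a base class for providers speaking the OpenAI format."""

    name: str     # subclasses set these two attributes
    api_url: str

    def chat(self, messages: list[dict], model: str, **kwargs) -> str:
        # Assumed convention: API key read from <NAME>_API_KEY.
        key = os.environ[f"{self.name.upper()}_API_KEY"]
        body = json.dumps({"model": model, "messages": messages, **kwargs})
        req = urllib.request.Request(
            self.api_url,
            data=body.encode(),
            headers={
                "Authorization": f"Bearer {key}",
                "Content-Type": "application/json",
            },
        )
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        return data["choices"][0]["message"]["content"]
```

With a base like this, each provider file only declares its name and endpoint, as in the snippet above.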

Disclaimer

SmartSplit is a personal development tool. Each user must provide their own API keys and comply with the terms of service of each provider they use. SmartSplit does not store, share, or redistribute API keys or access. The authors are not responsible for any misuse or ToS violations by end users.


MIT License · Contributing · Security · Changelog

Star this repo to follow updates — new providers, streaming, and more coming soon.

Download files

Download the file for your platform.

Source Distribution

smartsplit-0.1.0.tar.gz (4.7 MB)

Uploaded Source

Built Distribution


smartsplit-0.1.0-py3-none-any.whl (50.3 kB)

Uploaded Python 3

File details

Details for the file smartsplit-0.1.0.tar.gz.

File metadata

  • Download URL: smartsplit-0.1.0.tar.gz
  • Size: 4.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for smartsplit-0.1.0.tar.gz
Algorithm Hash digest
SHA256 0ab671a5b730e225b46207a266f7b91263feb271bb677c0069d0d6fd7ed9e422
MD5 52d99bd9f45517914ad2af2e4db9835f
BLAKE2b-256 274e95e7e11afdf43e2915aa30746c291d7685bc851d73d73a008a0b69dd862a


Provenance

The following attestation bundles were made for smartsplit-0.1.0.tar.gz:

Publisher: publish.yml on dsteinberger/smartsplit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file smartsplit-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: smartsplit-0.1.0-py3-none-any.whl
  • Size: 50.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for smartsplit-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 faf20ff1a0ce374e24d2d0aacaf02564fbfbbcdc1e2297f1e152b7a42a924dcc
MD5 b23b965b925b76416fc0a59b94bbd8e7
BLAKE2b-256 3282434228521884acc3ffb8655f3a49eacec9c57163ec2072c1e772c945de2d


Provenance

The following attestation bundles were made for smartsplit-0.1.0-py3-none-any.whl:

Publisher: publish.yml on dsteinberger/smartsplit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
