
Free multi-LLM backend — decompose prompts into subtasks and route to the best free model for each task

Reason this release was yanked: Not working

Project description

SmartSplit — Why use one LLM when you can use them all?



Smart routing to the best model for each task — picks the right model tier automatically.
Free by default. Optimizes paid tokens when available.

Quick Start · How It Works · Providers · Metrics


Who is this for?

  • Developers without a paid subscription who want a powerful AI coding assistant using free LLMs.
  • Developers with a paid API budget who want to make it last — SmartSplit routes simple tasks to free models and saves your paid tokens (OpenAI, Anthropic) for complex work. No config needed, it's the default behavior.
  • Teams who want to combine multiple LLMs without changing their existing tools.
  • Anyone frustrated by a single model that's great at code but bad at everything else.

The problem

You ask your coding assistant to write a function, explain an algorithm, translate a comment, and find the latest docs. It sends everything to the same model — and that model is average at most of these tasks.

Before SmartSplit:

You: "Write a Python CSV parser, explain the edge cases, and translate the docstrings to French"

→ Everything goes to one model
→ Code is okay, explanation is shallow, translation is awkward

After SmartSplit:

Same prompt, same client, same workflow

→ Code subtask      → best code model (deep, accurate)
→ Reasoning subtask → best reasoning model (thorough)
→ Translation       → language specialist (native quality)
→ Simple boilerplate → fast cheap model (saves your budget)
→ Combined into one coherent response

Same tool. Better answers. No config change.

What makes SmartSplit different

Multiply your free tier. Instead of burning through one provider's quota, SmartSplit spreads requests across all your configured providers — each one contributing its free tier. More providers = more capacity.

Self-healing. A provider goes down or hits its rate limit? You won't even notice. SmartSplit detects failures, disables the provider temporarily, and routes to the next best one — automatically.

Web-aware. When your prompt needs current data ("latest", "news", "2026"...), SmartSplit detects it and searches the web before answering. No plugin needed — it's built in.
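The detection itself is part of SmartSplit's LLM-based triage, but the idea can be sketched as a keyword heuristic — the function name and hint list below are illustrative, not the real implementation:

```python
import re

# Illustrative freshness check. SmartSplit's real triage is LLM-based;
# this keyword heuristic only sketches the idea.
FRESHNESS_HINTS = {"latest", "news", "today", "current"}

def needs_web_search(prompt: str) -> bool:
    """Return True when the prompt likely needs current data."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    if words & FRESHNESS_HINTS:
        return True
    # A bare year ("2026") also suggests time-sensitive content.
    return bool(re.search(r"\b20\d{2}\b", prompt))
```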

Stretch your paid tokens. Got an OpenAI or Anthropic API key? Add it, and SmartSplit picks the right model for each task automatically:

Simple task (boilerplate, summary)  → Haiku / GPT-4o-mini  (cheap)
Complex task (code, reasoning)      → Sonnet / GPT-4o      (best)
Everything else                     → Free models first

No config needed — SmartSplit detects task complexity and chooses the best model tier automatically.
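The tiering above amounts to a complexity check before dispatch. A minimal sketch — the hint lists and tier labels are assumptions for illustration, not SmartSplit's actual classifier:

```python
# Illustrative tier picker; the tier labels mirror the table above,
# the keyword heuristic is a stand-in for SmartSplit's real classifier.
SIMPLE_HINTS = ("boilerplate", "summary", "summarize", "rename")
COMPLEX_HINTS = ("refactor", "debug", "prove", "architecture", "reason")

def pick_tier(task: str) -> str:
    text = task.lower()
    if any(h in text for h in COMPLEX_HINTS):
        return "paid-best"    # e.g. Sonnet / GPT-4o
    if any(h in text for h in SIMPLE_HINTS):
        return "paid-cheap"   # e.g. Haiku / GPT-4o-mini
    return "free-first"       # free models unless they fail
```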

Your coding assistant (Continue, Cline, Aider, Cursor...)
         |
    SmartSplit (localhost:8420)
         |
    ┌────┼──────────────────────────┐
    |    |          |                |
   Code  Search   Translate       Reasoning
    |    |          |                |
  Best   Best     Best            Best
  model  engine   model           model
    |    |          |                |
    └────┼──────────┼────────────────┘
         |
    Combined response

Quick Start

1. Install

pip install smartsplit
# or: uv pip install smartsplit

2. Get a free API key (2 minutes)

You need one key to start. Sign up at groq.com and copy your API key.

Add more providers later for better routing. Each new provider = better results, more fallbacks. See Providers.

3. Start SmartSplit

export GROQ_API_KEY="gsk_..."
smartsplit
  SmartSplit — Multi-LLM backend
  http://127.0.0.1:8420/v1
  Mode: balanced
Or use Docker:

# Create a .env file with your API keys
echo 'GROQ_API_KEY=gsk_...' > .env

# Run with Docker
docker run -p 8420:8420 --env-file .env ghcr.io/dsteinberger/smartsplit

# Or with Docker Compose
docker compose up -d

4. Connect your coding tool

Continue (VS Code / JetBrains)

Copy examples/.continuerc.json to your project as .continuerc.json, or add to ~/.continue/config.yaml:

models:
  - name: SmartSplit
    provider: openai
    model: smartsplit
    apiBase: http://localhost:8420/v1
    apiKey: free

Cline (VS Code)

In the Cline sidebar, click the gear icon:

  1. Select OpenAI Compatible as provider
  2. Base URL: http://localhost:8420/v1
  3. API Key: free
  4. Model ID: smartsplit

Aider (Terminal)

Copy examples/.aider.conf.yml to your project as .aider.conf.yml, or run:

aider --model openai/smartsplit --openai-api-base http://localhost:8420/v1 --openai-api-key free

OpenCode (Terminal)

Copy examples/opencode.json to your project root, run opencode providers to add the API key (free), then select the model with /models.

Tabby (Self-hosted autocomplete)

Add to ~/.tabby/config.toml:

[model.chat.http]
kind = "openai/chat"
model_name = "smartsplit"
api_endpoint = "http://localhost:8420/v1"
api_key = "free"

Void (Open-source IDE)

In Void settings:

  1. Find OpenAI-Compatible section → set Base URL http://localhost:8420/v1, API Key free
  2. In Models section → Add Model, select OpenAI-Compatible, name: smartsplit

Any OpenAI-compatible client
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8420/v1", api_key="free")
reply = client.chat.completions.create(model="smartsplit", messages=[{"role": "user", "content": "hello"}])
print(reply.choices[0].message.content)

SmartSplit works with any tool that supports a custom OpenAI endpoint: Continue, Cline, Aider, OpenCode, Tabby, Void, Cursor, Open WebUI, Chatbox, LibreChat, Jan, and more.

That's it. Three steps: install, add one API key, connect your tool. Your assistant now has access to every top free LLM.


How It Works

Every request is automatically classified into one of two modes:

RESPOND — route to the best model

Your prompt is analyzed, split into subtasks if needed, and each one is routed to the best provider:

"Write a Python function to parse CSV and handle errors"

  [code]       → best code model
  [reasoning]  → best reasoning model
  [synthesis]  → combines results

  → One coherent response

ENRICH — search the web first, then route

When the prompt needs current data, SmartSplit searches the web first:

"What are the new features in Python 3.13?"

  [web_search] → search engine
  [summarize]  → best summarization model

  → Response with real, current data

Context-aware

SmartSplit passes your full conversation history to the LLM — system prompts, previous messages, everything. For multi-subtask prompts, a context summary is injected into each subtask so no information is lost.

Built-in reliability

Feature               What it does
─────────────────────────────────────────────────────────────────────
Circuit breaker       3 failures in 5 min → provider auto-disabled for 30 min
Quality gates         Detects refusals ("I cannot...") → auto-escalates to the next provider
Fallback chains       Provider fails → the next best one takes over, seamlessly
Decompose cache       Repeated prompts skip analysis (LRU, 24 h TTL)
Context preservation  Full conversation history passed to each LLM
Adaptive scoring      Learns which providers work best from real results (MAB/UCB1)

Providers

Supported providers

Provider      Type         Best at
─────────────────────────────────────────────
Cerebras      Free         Reasoning, general (Qwen 3 235B)
Groq          Free         Fast inference (LLaMA 3.3 70B)
Gemini        Free         Math, reasoning (Gemini 2.5 Flash)
OpenRouter    Free         Code (Qwen3 Coder 480B)
Mistral       Free         Translation (Mistral Small)
HuggingFace   Free backup  Code (Qwen2.5 Coder 32B)
Cloudflare    Free backup  General (LLaMA 3.3 70B)
DeepSeek      Paid         Code, reasoning
Anthropic     Paid         Complex tasks (Claude)
OpenAI        Paid         Complex tasks (GPT-4o)
Serper        Free         Web search
Tavily        Free         Web search

Add providers by setting environment variables:

export GROQ_API_KEY="gsk_..."
export GEMINI_API_KEY="AIza..."
export DEEPSEEK_API_KEY="sk-..."
export CEREBRAS_API_KEY="csk-..."
export MISTRAL_API_KEY="..."
export OPENROUTER_API_KEY="sk-or-..."
export HF_TOKEN="hf_..."
export CLOUDFLARE_API_KEY="..."
export CLOUDFLARE_ACCOUNT_ID="..."
export SERPER_API_KEY="..."

More providers = better routing, more fallbacks, higher resilience.

Format translation is automatic. Most providers use the OpenAI format natively. Gemini uses Google's own format — SmartSplit translates on the fly. Your client talks OpenAI, SmartSplit handles the rest.
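For Gemini, that on-the-fly translation is roughly the following shape — a simplified sketch of mapping OpenAI-style messages onto Gemini's `contents`/`system_instruction` request format (streaming, tool calls, and error handling omitted):

```python
def openai_to_gemini(messages: list[dict]) -> dict:
    """Map OpenAI chat messages to Gemini's request shape (simplified)."""
    contents, system_parts = [], []
    for msg in messages:
        if msg["role"] == "system":
            # Gemini takes system prompts separately, not in the turn list.
            system_parts.append({"text": msg["content"]})
            continue
        # Gemini calls the assistant role "model".
        role = "model" if msg["role"] == "assistant" else "user"
        contents.append({"role": role, "parts": [{"text": msg["content"]}]})
    body = {"contents": contents}
    if system_parts:
        body["system_instruction"] = {"parts": system_parts}
    return body
```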

Paid providers (Anthropic, OpenAI) are also supported as optional fallbacks. They're disabled by default.

Routing table
Task          Best free providers (ranked)
─────────────────────────────────────────────
code          OpenRouter > Cerebras = Gemini > Groq = HuggingFace
reasoning     Cerebras > Gemini = OpenRouter > Groq
summarize     Cerebras > Groq = Gemini = Mistral = OpenRouter
translation   Mistral > Gemini > Groq = Cerebras
web search    Serper or Tavily
boilerplate   Cerebras = Groq > Gemini = Mistral = OpenRouter
math          OpenRouter = Gemini > Cerebras > Groq
general       Cerebras > Gemini = OpenRouter > Groq = Mistral

Backups:      HuggingFace, Cloudflare (lower quality, high availability)
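The rankings above are starting priors; the adaptive scoring noted under reliability (MAB/UCB1) then adjusts them from real results. A minimal UCB1 sketch, with hypothetical names and a reward scale assumed to be [0, 1] per call:

```python
import math

def ucb1_pick(stats: dict[str, tuple[int, float]], total_pulls: int) -> str:
    """Pick a provider by UCB1: mean reward plus an exploration bonus.

    stats maps provider -> (times_used, cumulative_reward).
    """
    best, best_score = None, float("-inf")
    for provider, (n, reward) in stats.items():
        if n == 0:
            return provider  # try every provider at least once
        score = reward / n + math.sqrt(2 * math.log(total_pulls) / n)
        if score > best_score:
            best, best_score = provider, score
    return best
```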

Metrics

curl http://localhost:8420/metrics
{
  "requests": { "total": 142, "enrich": 42, "respond": 100 },
  "savings": { "tokens_saved": 45000, "cost_saved_usd": 0.135 },
  "cache": { "hits": 23, "hit_rate": 16.2 },
  "circuit_breaker": { "unhealthy_providers": [] }
}

Also available: GET /health · GET /savings


Configuration

CLI options
smartsplit                          # defaults: port 8420, balanced mode
smartsplit --port 3456              # custom port
smartsplit --mode economy           # max free usage
smartsplit --mode quality           # prefer quality over speed
smartsplit --log-level DEBUG        # verbose logging

Config file (alternative to env vars)
cp smartsplit.example.json smartsplit.json
# Edit with your API keys

You can also tune provider settings and routing:

{
  "mode": "balanced",
  "free_llm_priority": ["cerebras", "groq", "gemini", "openrouter", "mistral", "huggingface", "cloudflare"],
  "providers": {
    "groq": {
      "model": "llama-3.3-70b-versatile",
      "temperature": 0.3,
      "max_tokens": 4096
    },
    "serper": {
      "max_search_results": 5
    }
  }
}

Option                          Default                                 What it does
─────────────────────────────────────────────────────────────────────────────────────
free_llm_priority               cerebras, groq, gemini, openrouter,     Fallback order for free LLM calls
                                mistral, huggingface, cloudflare
providers.*.model               per-provider default                    Override the default model
providers.*.temperature         0.3                                     LLM temperature
providers.*.max_tokens          4096                                    Max output tokens
providers.*.max_search_results  5                                       Number of web search results

Docker
# Using the published image
docker run -p 8420:8420 --env-file .env ghcr.io/dsteinberger/smartsplit

# Or build locally
docker build -t smartsplit .
docker run -p 8420:8420 --env-file .env smartsplit

Create a .env file with your API keys:

GROQ_API_KEY=gsk_...
SERPER_API_KEY=...
GEMINI_API_KEY=AIza...

Never commit .env to git — it's already in .gitignore.

Or use Docker Compose:

docker compose up -d

See docker-compose.yml for the full setup.


Development

Prerequisites: Python 3.11+ and uv (recommended) or pip.

git clone https://github.com/dsteinberger/smartsplit.git
cd smartsplit
make install              # or: pip install -e ".[dev]"

make check                # lint + format check + tests
make test                 # tests only
make run                  # start server (requires at least one API key)
make help                 # all commands

Note: make test runs all tests without any API key — no provider needed for development.

See CONTRIBUTING.md for guidelines.


Architecture

smartsplit/
  proxy.py           HTTP server + LLM-based triage + CLI
  formats.py         OpenAI format conversion + SSE streaming
  planner.py         Prompt decomposition + synthesis + LRU cache
  router.py          Provider scoring + routing + quality gates
  learning.py        MAB (UCB1) adaptive scoring — learns from real results
  quota.py           Usage tracking + savings report
  config.py          Configuration + env vars
  models.py          Pydantic models + StrEnum
  exceptions.py      Custom error hierarchy
  providers/         One file per provider (3 lines for OpenAI-compatible)

Adding a new provider takes three lines (the model is set in config):

class NewProvider(OpenAICompatibleProvider):
    name = "new"
    api_url = "https://api.new.com/v1/chat/completions"
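For context, a plausible shape for the `OpenAICompatibleProvider` base class this pattern relies on — a hedged sketch, not SmartSplit's actual code (the `<NAME>_API_KEY` environment-variable convention and method signature are assumptions):

```python
import json
import os
import urllib.request

class OpenAICompatibleProvider:
    """Sketch of a base class for providers speaking the OpenAI format."""

    name: str     # subclasses set these two attributes
    api_url: str

    def chat(self, messages: list[dict], model: str, **kwargs) -> str:
        # Assumed convention: API key read from <NAME>_API_KEY.
        key = os.environ[f"{self.name.upper()}_API_KEY"]
        body = json.dumps({"model": model, "messages": messages, **kwargs})
        req = urllib.request.Request(
            self.api_url,
            data=body.encode(),
            headers={
                "Authorization": f"Bearer {key}",
                "Content-Type": "application/json",
            },
        )
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        return data["choices"][0]["message"]["content"]
```

With a base like this, each provider file only declares its name and endpoint, as in the snippet above.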

Disclaimer

SmartSplit is a personal development tool. Each user must provide their own API keys and comply with the terms of service of each provider they use. SmartSplit does not store, share, or redistribute API keys or access. The authors are not responsible for any misuse or ToS violations by end users.


MIT License · Contributing · Security · Changelog

Star this repo to follow updates — new providers, streaming, and more coming soon.

Download files

Download the file for your platform.

Source Distribution

smartsplit-0.1.0.tar.gz (4.7 MB)

Uploaded Source

Built Distribution


smartsplit-0.1.0-py3-none-any.whl (50.3 kB)

Uploaded Python 3

File details

Details for the file smartsplit-0.1.0.tar.gz.

File metadata

  • Download URL: smartsplit-0.1.0.tar.gz
  • Size: 4.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for smartsplit-0.1.0.tar.gz
Algorithm Hash digest
SHA256 0ab671a5b730e225b46207a266f7b91263feb271bb677c0069d0d6fd7ed9e422
MD5 52d99bd9f45517914ad2af2e4db9835f
BLAKE2b-256 274e95e7e11afdf43e2915aa30746c291d7685bc851d73d73a008a0b69dd862a


Provenance

The following attestation bundles were made for smartsplit-0.1.0.tar.gz:

Publisher: publish.yml on dsteinberger/smartsplit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file smartsplit-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: smartsplit-0.1.0-py3-none-any.whl
  • Size: 50.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for smartsplit-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 faf20ff1a0ce374e24d2d0aacaf02564fbfbbcdc1e2297f1e152b7a42a924dcc
MD5 b23b965b925b76416fc0a59b94bbd8e7
BLAKE2b-256 3282434228521884acc3ffb8655f3a49eacec9c57163ec2072c1e772c945de2d


Provenance

The following attestation bundles were made for smartsplit-0.1.0-py3-none-any.whl:

Publisher: publish.yml on dsteinberger/smartsplit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
