maslul

Smart LLM router — one call, the right model.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

ilia.tankelevich

These details have not been verified by PyPI

Project description

maslul

Smart LLM router — one call, the right model.

Async and fully typed, across Anthropic, Gemini, xAI Grok, and OpenAI — routing each request to the right model tier by difficulty. Stop hardcoding model choices and stop re-writing the tool-use / structured-output / web-search / retry plumbing for every provider.

maslul (Hebrew מסלול, "route / lane") is a small library that does exactly two things: routing (pick a model tier per request, or pin one) and provider normalization (one Request/Response shape for every SDK). No server, no CLI, no heavy ML deps — providers live behind extras, and the core is stdlib-only.

import asyncio
from maslul import Router, Request, Message

router = Router.from_toml("maslul.toml")           # tiers + classifier + providers, from config

async def main() -> None:
    resp = await router.complete(Request(messages=[Message(role="user", content="Hello!")]))
    print(resp.text, "·", resp.level_used, "·", resp.usage.output_tokens, "tokens")

asyncio.run(main())

Install

pip install "maslul[anthropic,gemini,grok]"     # or just the providers you use

Each provider's SDK lives behind an extra, so import maslul pulls in none of them — you only install what you route to. maslul[anthropic] → anthropic; maslul[gemini] → google-genai; maslul[grok] → xai-sdk; maslul[openai] → openai.

How it compares

maslul is a library, not a gateway — you embed the routing brain in your app, you don't run a proxy in front of it.

	maslul	RouteLLM	LiteLLM
Shape	async library you embed (no server)	research framework / trained router	unified SDK + proxy server
Routing	difficulty tiers + swappable strategies (`route_default` / `classify` / `classify_and_answer` / `verify_cascade`) + injectable `bypass` / `classifier` / `verifier` hooks	a trained strong-vs-weak router	manual config / fallback lists, load-balancing
Providers	Anthropic · Gemini · Grok · OpenAI, normalized	model-agnostic (you wire models)	100+ providers
Tools / structured / vision	one normalized loop for all	—	per-provider
Web search	one flag, every provider → `Response.sources`	—	per-provider
Caching	exact + semantic (in-process)	—	exact + semantic (proxy)
Typing / footprint	fully typed, `py.typed`; stdlib core, SDKs behind extras	research code	larger; server to operate

Choose maslul when you want a typed async library you embed — difficulty routing with your own strategy + hooks, and one Request/Response over several providers (tools, structured output, vision, web search, retries, cost cache) — without standing up a gateway. Reach for LiteLLM when you want a provider proxy across 100+ models, or RouteLLM when you specifically want a trained router.

The routing brain

flowchart LR
    R["complete(req)"] --> M{"model= pin?"}
    M -- yes --> RUN["run that model"]
    M -- no --> L{"level= pin?"}
    L -- yes --> RUN
    L -- no --> B{"bypass_predicate?"}
    B -- "tier" --> RUN
    B -- "None" --> H{"hard_signal?<br/>(media · code · long · intent verbs)"}
    H -- "yes" --> HARD["HARD tier"] --> RUN
    H -- "no" --> S["strategy<br/>route_default · classify ·<br/>classify_and_answer · verify_cascade"] --> RUN
    RUN --> X["tool loop · web search ·<br/>retry / fallback · usage breakdown"]

Routing

Difficulty is not readable from surface features — a short prompt can be very hard, a long paste trivial — so maslul never applies a short ⇒ simple rule. You choose how each request is routed, in this precedence order:

from maslul import Level

await router.complete(req, model="anthropic:claude-opus-4-8")  # 0. pin an exact model
await router.complete(req, level=Level.HARD)                   # 1. pin a difficulty tier
await router.complete(req)                                     # 2-4. let the router decide

When you don't pin, the routing brain runs: a deterministic bypass (your fast-path, e.g. greetings → SIMPLE) → a hard-signal detector (intent verbs, code, attachments, long context → HARD, up-only) → the configured strategy for the ambiguous middle:

Strategy	Cost for the middle	What it does
`ROUTE_DEFAULT`	0 calls	Default-to-capable (`default_level`). Best for low volume.
`CLASSIFY`	1 classify + 1 answer	A cheap dedicated classifier model labels the level (cached + budget-guarded), then dispatch.
`CLASSIFY_AND_ANSWER`	1 call	The classifier model answers directly, or emits an escalation sentinel to bump to a stronger tier.
`VERIFY_CASCADE`	1 cheap + verify	Answer cheap, run your verifier, escalate if it rejects — catches silent under-escalation.

All three injection points are yours to supply:

def my_classifier(req):      # your own difficulty call (sync or async); None defers to the strategy
    return Level.SIMPLE if is_trivial(req) else None

def my_verifier(req, resp):  # VERIFY_CASCADE: True keeps the cheap answer, False escalates
    return "I don't know" not in resp.text

router = Router.from_toml("maslul.toml", classifier=my_classifier, verifier=my_verifier)

One shape for every capability

The same Request/Response works across all three providers:

from maslul import Request, Message, ToolDef, ToolCall, MediaPart

# Tools — the router runs a provider-agnostic tool-use loop
async def get_weather(call: ToolCall) -> str:
    return f"18°C in {call.input['city']}"

req = Request(
    messages=[Message(role="user", content="Weather in Paris?")],
    tools=[ToolDef(name="get_weather", description="Current weather for a city.",
                   input_schema={"type": "object", "properties": {"city": {"type": "string"}},
                                 "required": ["city"]})],
    tool_executor=get_weather,
)

# Structured output — response_format → resp.structured (parsed)
req = Request(messages=[Message(role="user", content="Extract name + age")],
              response_format={"type": "object", "properties": {"name": {"type": "string"},
                                                                "age": {"type": "integer"}}})

# Vision — images / PDFs
req = Request(messages=[Message(role="user", content="What's in this image?")],
              media=[MediaPart(mime_type="image/png", data=png_bytes)])

# Web search — one flag, grounded on ANY provider (Anthropic web_search / Gemini Google Search /
# Grok Agent Tools); citations land in resp.sources regardless of which model answers.
req = Request(messages=[Message(role="user", content="Latest news on X?")], web_search=True)

Resilience & observability

def on_usage(resp):                         # per-model token breakdown for monitoring
    for rec in resp.usage_records:
        metrics.incr(f"{rec.provider}:{rec.model}", rec.usage.output_tokens)

router = Router.from_toml("maslul.toml", on_complete=on_usage)

Transient errors (RateLimited, Timeout) retry with exponential backoff; on persistent failure the request falls back to the next-higher tier — which may be a different provider, giving you cross-provider failover for free. AuthError fails fast. Hooks: on_route (the RoutingDecision), on_complete (the final Response with usage_records), on_error (each failed attempt).

Build a router with missing_provider="degrade" and any tier whose provider isn't configured (e.g. a Grok tier with no XAI_API_KEY) falls back to the nearest available tier instead of erroring — so one config runs across deploys that have different keys.

Cost cache

A [maslul.cache] config returns a prior Response instead of calling a model — exact (identical request) or semantic (nearest request above a cosine threshold, using an embedder you inject, since maslul ships no embeddings). A hit comes back with cached=True and zeroed usage, so monitoring sees the saving. Tool-using requests are never cached.

[maslul.cache]
mode = "semantic"          # off | exact | semantic
max_entries = 1000
ttl_seconds = 86400
similarity_threshold = 0.95

router = Router.from_toml("maslul.toml", embed=my_async_embed)   # embed only needed for semantic

Configuration

A TOML file (or a plain dict — Router(config={...})):

[maslul]
strategy = "route_default"        # route_default | classify | classify_and_answer | verify_cascade
default_level = "hard"            # default-to-capable for the ambiguous middle
min_tokens_to_classify = 40       # CLASSIFY budget guard
request_timeout = 60              # per-call seconds (optional)
max_retries = 2
fallback = true                   # escalate to a higher tier on persistent failure

[maslul.tiers.simple]
provider = "gemini"
model = "gemini-2.5-flash-lite"
[maslul.tiers.medium]
model = "anthropic:claude-haiku-4-5"   # or the provider:model shorthand
[maslul.tiers.hard]
model = "anthropic:claude-sonnet-4-6"

[maslul.classifier]               # required for the classify strategies
model = "anthropic:claude-haiku-4-5"

[maslul.providers.anthropic]
api_key_env = "ANTHROPIC_API_KEY"      # secrets by env-var name, never inlined
[maslul.providers.gemini]
vertex_project = "my-gcp-project"      # Vertex AI + Application Default Credentials (no key)
vertex_location = "global"
[maslul.providers.grok]
api_key_env = "XAI_API_KEY"

Pointing a capability at a different model or provider is a one-line config change — no code deploy. Providers can also be injected directly (Router(config, providers={...})) for tests or custom wiring.

Providers

Provider	SDK (extra)	Auth
`anthropic`	`anthropic`	`ANTHROPIC_API_KEY`
`gemini`	`google-genai`	Vertex AI + ADC (`vertex_project`), or a Gemini Developer API key
`grok`	`xai-sdk`	`XAI_API_KEY`
`openai`	`openai`	`OPENAI_API_KEY`

Status

Beta (0.2.x), fully typed (py.typed), async-first. Routing, tool use, structured output, vision, web search across all three providers (web_search=True), the four strategies, and retry/fallback resilience are implemented and exercised against live APIs.

License

MIT © Ilia Tankelevich

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

ilia.tankelevich

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.2.1

Jun 17, 2026

0.2.0

Jun 17, 2026

0.1.0

Jun 17, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

maslul-0.2.1.tar.gz (143.4 kB view details)

Uploaded Jun 17, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

maslul-0.2.1-py3-none-any.whl (38.0 kB view details)

Uploaded Jun 17, 2026 Python 3

File details

Details for the file maslul-0.2.1.tar.gz.

File metadata

Download URL: maslul-0.2.1.tar.gz
Upload date: Jun 17, 2026
Size: 143.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for maslul-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`b46b3f36aed03d40aaf93c525f974c61cd03cf07ee73e49ea07756f6df86af85`
MD5	`cedb3250f83864270d6fbebff58850a4`
BLAKE2b-256	`8a191002469ef7d1843870b54cfb1ffa5f2c0e236e827b30691ad0c3dd1832e5`

See more details on using hashes here.

Provenance

The following attestation bundles were made for maslul-0.2.1.tar.gz:

Publisher: release.yml on iliatankelevich/maslul

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: maslul-0.2.1.tar.gz
- Subject digest: b46b3f36aed03d40aaf93c525f974c61cd03cf07ee73e49ea07756f6df86af85
- Sigstore transparency entry: 1851967840
- Sigstore integration time: Jun 17, 2026
Source repository:
- Permalink: iliatankelevich/maslul@82db99e65d3748226128fd0e9f00c2428f100f1d
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/iliatankelevich
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@82db99e65d3748226128fd0e9f00c2428f100f1d
- Trigger Event: push

File details

Details for the file maslul-0.2.1-py3-none-any.whl.

File metadata

Download URL: maslul-0.2.1-py3-none-any.whl
Upload date: Jun 17, 2026
Size: 38.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for maslul-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ae5b5f7f31f304482500aff9f9959631f3d569769fbae97fe3be1649812e0697`
MD5	`d2ee1f6ab1e1b821db36edd88986bd2a`
BLAKE2b-256	`422f57a173ee08c79d7e129995910187fc18b8b34804d361a498d5fc89def320`

See more details on using hashes here.

Provenance

The following attestation bundles were made for maslul-0.2.1-py3-none-any.whl:

Publisher: release.yml on iliatankelevich/maslul

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: maslul-0.2.1-py3-none-any.whl
- Subject digest: ae5b5f7f31f304482500aff9f9959631f3d569769fbae97fe3be1649812e0697
- Sigstore transparency entry: 1851967970
- Sigstore integration time: Jun 17, 2026
Source repository:
- Permalink: iliatankelevich/maslul@82db99e65d3748226128fd0e9f00c2428f100f1d
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/iliatankelevich
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@82db99e65d3748226128fd0e9f00c2428f100f1d
- Trigger Event: push

maslul 0.2.1

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

maslul

Install

How it compares

The routing brain

Routing

One shape for every capability

Resilience & observability

Cost cache

Configuration

Providers

Status

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance