
Cross-runtime LLM orchestration for Python, TypeScript, and Rust with L2 caching, retries, key rotation, and config-driven runtime control


LLMix

npm version · PyPI · Python 3.14+ · TypeScript 5.0+ · Rust 1.83+ · License: Apache-2.0

Config-driven harness around your LLM SDK. Swap models by editing MDA presets. Keep the SDK you already use.

LLMix sits above your existing LLM client — openai, anthropic, AI SDK, LiteLLM, anything with a callable signature — and wraps it with the production primitives you'd otherwise rebuild from scratch: an MDA-driven config layer, a two-tier response cache, a circuit breaker, key-pool rotation, and singleflight deduplication.

Provider, model, and parameters can live in a .mda preset. Edit the preset, publish or reload config, and get a different model at runtime. No redeploy.


At a Glance

LLMix wraps your existing LLM SDK stack with MDA config, cache, resilience, and key-pool primitives.

Works with: AI SDK v6 · openai (Py/JS) · anthropic · google-genai · LiteLLM · any async callable that returns your model's response.
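The dispatch seam is just an async callable, so any in-house client can sit behind the pipeline. A minimal sketch, assuming the Python contract mirrors the Rust Quick Start below (a context exposing model and messages, a result carrying content, model, and usage); my_sdk and every field name here are illustrative, not documented API:

import my_sdk  # hypothetical in-house client

# Assumed dispatch contract: async callable, context in, provider result out.
async def my_dispatch(ctx):
    raw = await my_sdk.complete(model=ctx.model, messages=ctx.messages)
    return {
        "content": raw.text,
        "model": ctx.model,
        "usage": {
            "input_tokens": raw.input_tokens,
            "output_tokens": raw.output_tokens,
            "total_tokens": raw.input_tokens + raw.output_tokens,
        },
    }

# pipeline = CallPipeline(PipelineConfig(dispatch=my_dispatch, ...))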


Three Things It Does

Config-driven model swap. Provider, model, and params are data, not code. Drop in a new MDA preset and the next call can route to a different provider. Built for agent harnesses that reshape behavior via config, not redeploys.

Production resilience, no extra code. Two-tier cache (L1 memory + L2 Redis), circuit breaker, key-pool rotation with auto-eviction of dead keys, singleflight dedup, adaptive concurrency, retries that honor Retry-After. Composable with whatever SDK you already ship.

Runtime parity. Python, TypeScript, and Rust share byte-identical cache keys and retry semantics. Config authoring now uses MDA Source Mode across all three runtimes. (llmix-rs is currently beta — see rust/llmix-rs/README.md.)
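Byte-identical keys across three runtimes imply hashing a canonical serialization. LLMix's actual key schema isn't spelled out in this README; the sketch below is the standard recipe (sorted-key JSON with fixed separators, hashed with SHA-256), which any runtime can reproduce byte for byte:

import hashlib
import json

# Generic illustration (not LLMix internals): canonical JSON in, hex digest out.
def cache_key(config: dict, messages: list) -> str:
    canonical = json.dumps(
        {"config": config, "messages": messages},
        sort_keys=True,          # same key order in every runtime
        separators=(",", ":"),   # no pretty-printing drift
        ensure_ascii=True,       # identical bytes for non-ASCII content
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()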


Quick Start

Python

pip install sno-llmix

from llmix import (
    CallInput, CallPipeline, KeyPool, PipelineConfig,
    TwoTierCache, openai_dispatch,
)

pipeline = CallPipeline(PipelineConfig(
    dispatch=openai_dispatch(),
    response_cache=TwoTierCache("memory"),
))
pipeline.set_key_pool("openai", KeyPool(["sk-..."]))

response = await pipeline.call(CallInput(
    config={
        "provider": "openai",
        "model": "gpt-4.1-mini",
        "common": {"temperature": 0.7, "max_output_tokens": 1024},
        "caching": {"strategy": "memory"},
    },
    messages=[{"role": "user", "content": "Summarize this article..."}],
))

print(response.content, response.cache_hit)

TypeScript

import { CallPipeline, KeyPool, TwoTierCache, openaiDispatch } from "@snoai/llmix";

const pipeline = new CallPipeline({
  dispatch: openaiDispatch(),
  responseCache: new TwoTierCache("memory"),
});
pipeline.setKeyPool("openai", new KeyPool(["sk-..."]));

const response = await pipeline.call({
  config: {
    provider: "openai",
    model: "gpt-4.1-mini",
    common: { temperature: 0.2, maxOutputTokens: 2048 },
    caching: { strategy: "memory" },
  },
  messages: [{ role: "user", content: "Extract entities from this text." }],
});

console.log(response.content, response.usage);

Rust

use llmix_rs::{
    CallInput, CallPipeline, DispatchContext, KeyPool, LlmUsage,
    PipelineConfig, ProviderResult,
};
use serde_json::json;

#[tokio::main(flavor = "current_thread")]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let pipeline = CallPipeline::new(PipelineConfig::new(|ctx: DispatchContext| async move {
        Ok(ProviderResult {
            content: format!(
                "echo: {}",
                ctx.messages
                    .last()
                    .and_then(|m| m.get("content"))
                    .and_then(|v| v.as_str())
                    .unwrap_or("")
            ),
            model: ctx.model,
            usage: LlmUsage { input_tokens: 1, output_tokens: 2, total_tokens: 3 },
            headers: None,
            tool_calls: None,
        })
    }))?;
    pipeline.set_key_pool("openai", KeyPool::new(vec!["sk-...".into()])?);

    let response = pipeline.call(CallInput {
        config: json!({"provider": "openai", "model": "gpt-4o-mini"}),
        messages: vec![json!({"role": "user", "content": "Extract entities."})],
        singleflight_key: None,
    }).await;

    println!("{}", response.content);
    pipeline.close().await;
    Ok(())
}

MDA Presets

LLMix turns editable MDA presets into immutable registry snapshots that Python, TypeScript, and Rust runtimes can read consistently.

---
name: extraction
description: Entity extraction preset.
metadata:
  snoai-llmix:
    common:
      provider: openai
      model: gpt-4.1-mini
      maxOutputTokens: 2048
      temperature: 0.2
    caching:
      strategy: redis-or-memory
    providerOptions:
      openai:
        reasoningEffort: medium
---
# extraction

Runtime settings plus human-readable operating notes live together.
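For authoring-time tooling, the direct loaders named later in this README (load_mda_config / load_mda_config_preset on the Python side) read presets straight from .mda files. A sketch; the argument shape and returned structure are assumptions for illustration:

from llmix import load_mda_config_preset

# Assumed call shape: path to the authoring file plus the preset name,
# returning the resolved snoai-llmix block from the metadata above.
preset = load_mda_config_preset("config/llm/authoring/extraction.mda", "extraction")
print(preset["common"]["model"])  # gpt-4.1-mini, if the assumptions hold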

Inside Every Call

Every request flows from config and cache lookup through the circuit breaker, singleflight, key-pool rotation, the retry loop, dispatch, and telemetry:

Concern         | What LLMix does
----------------|----------------------------------------------------------------------------
Cache           | L1 memory + optional Redis L2; cross-language byte-identical keys
Concurrency     | AIMD adaptive semaphore with rate-limit feedback
Dedup           | Singleflight collapses identical concurrent calls into one upstream request
Failures        | Retry with jittered exponential backoff; Retry-After honored
Provider health | Circuit breaker scoped to (provider, endpoint)
API keys        | Round-robin pools; dead-key eviction on 401/403; fast rotation on 429
Request shaping | Provider-specific kwargs transforms and capability filtering
Output          | Optional <think> token extraction; normalized response objects
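The Dedup row is the easiest to picture: while a request for a given key is in flight, identical concurrent calls await the same result instead of hitting the provider again. A generic asyncio illustration of the pattern, concept only, not LLMix source:

import asyncio

_inflight: dict[str, asyncio.Task] = {}

async def singleflight(key: str, fn):
    # First caller for a key starts the upstream task; later callers with
    # the same key await that task instead of dialing the provider again.
    task = _inflight.get(key)
    if task is None:
        task = asyncio.ensure_future(fn())
        _inflight[key] = task
        task.add_done_callback(lambda _: _inflight.pop(key, None))
    return await task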

Tested Against Real Providers

Every dispatcher below has a real-HTTP integration suite under tests/integration/ — no mocks, no recorded fixtures.

Provider   | Dispatcher                                | Primary model under test
-----------|-------------------------------------------|--------------------------------
OpenAI     | openai_dispatch / openaiDispatch          | gpt-4o-mini, o4-mini
Anthropic  | anthropic_dispatch / anthropicDispatch    | claude-haiku-4-5-20251001
Gemini     | gemini_dispatch / geminiDispatch          | gemini-2.5-flash
OpenRouter | openrouter_dispatch / openrouterDispatch  | deepseek/deepseek-v4-flash
DeepInfra  | deepinfra_dispatch / deepinfraDispatch    | Qwen/Qwen3-32B
Novita     | novita_dispatch / novitaDispatch          | qwen/qwen3.5-27b
Together   | together_dispatch / togetherDispatch      | Qwen/Qwen2.5-7B-Instruct-Turbo
Sno GPU    | sno_gpu_dispatch / snoGpuDispatch         | qwen3.6-27b-extract

OpenRouter, DeepInfra, Novita, and Together are OpenAI-compatible — their dispatchers reuse the OpenAI client / @ai-sdk/openai with a provider-specific base_url. TypeScript dispatchers use AI SDK v6 where the provider supports it.
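In client terms, that reuse looks roughly like the sketch below. The URL is OpenRouter's well-known OpenAI-compatible endpoint; how llmix's dispatchers accept a base URL is not shown here:

from openai import AsyncOpenAI

# Same OpenAI client, provider-specific base_url; the key comes from the
# OPENROUTER_API_KEY variable listed under Environment Variables below.
client = AsyncOpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",
)
resp = await client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",
    messages=[{"role": "user", "content": "hi"}],
)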

Cross-cutting suites (test_e2e_cache.py, _concurrency, _parity, _redis, _resilience, _security, _thinking) exercise the pipeline itself across every provider.


Production Config: the Registry

Services that need atomic config updates use the LLMix Config Registry — a publishing layer that turns editable .mda presets into immutable, content-addressed snapshots.

config/llm/
  authoring/         ← editable .mda presets
  snapshots/<rev>/   ← immutable, content-addressed
  current.json       ← the only live switch

Runtime services open the manager once at startup; reads come from resolved JSON snapshot files, not mutable authoring MDA.

Python

from llmix import ConfigRegistryManager, ConfigRegistryPublisher, resolve_config_dir

root = resolve_config_dir().config_dir
ConfigRegistryPublisher(root).publish()

manager = ConfigRegistryManager.open(root)
config = manager.get_preset("search", "summary")

TypeScript

import { ConfigRegistryManager, ConfigRegistryPublisher, resolveConfigDir } from "@snoai/llmix";

const { configDir } = resolveConfigDir();
await new ConfigRegistryPublisher(configDir).publish();

const manager = await ConfigRegistryManager.open(configDir);
const config = await manager.getPreset("search", "summary");

Managers expose the active revision and reload-health metadata so service code can surface which revision is live.

TypeScript authoring tools can use loadMdaConfig / loadMdaConfigPreset; Python authoring tools can use load_mda_config / load_mda_config_preset; Rust authoring tools can use load_config / load_config_preset, which now hard-require .mda files. None of these direct loaders are the production hot path.


What This Is Not

  • Not a streaming library. Streaming is your SDK's job. LLMix handles calls, not chunks.
  • Not a provider replacement. It wraps your client, it doesn't replace it.
  • Not a cross-provider router in the LiteLLM sense. One call, one provider — the one your config names.

Environment Variables

Variable                     | Purpose
-----------------------------|------------------------------------------------
OPENAI_API_KEY / OPENAI_KEYS | Single key or comma-separated OpenAI key pool
ANTHROPIC_API_KEY            | Anthropic auth
GEMINI_API_KEY               | Google / Gemini auth
OPENROUTER_API_KEY           | OpenRouter auth
SNO_LLM_API_KEY              | Sno GPU auth
GPU_BASE_URL                 | Sno GPU base URL
REDIS_URL                    | Redis L2 cache
LLMIX_STATE_DIR              | Lock files, batch metadata, kill switch state
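If you build pools by hand rather than relying on automatic pickup (this README doesn't say whether the pipeline reads OPENAI_KEYS itself), splitting the pool variable is a one-liner:

import os
from llmix import KeyPool

# Manual sketch: comma-separated OPENAI_KEYS into a round-robin pool,
# reusing the `pipeline` from the Quick Start above.
keys = [k.strip() for k in os.environ.get("OPENAI_KEYS", "").split(",") if k.strip()]
pipeline.set_key_pool("openai", KeyPool(keys))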

Development

# TypeScript
bun install && bun test
bunx tsc -p tsconfig.check.json

# Python
uv sync && uv run pytest tests/python/
uv run pyright

# Rust
cargo test --manifest-path rust/llmix-rs/Cargo.toml
cargo clippy --manifest-path rust/llmix-rs/Cargo.toml -- -D warnings

License

Apache-2.0
