
preLLM

One function for small LLM preprocessing before large LLM execution. Like litellm.completion() — but with a smart preprocessing layer.

from prellm import preprocess_and_execute

result = await preprocess_and_execute(
    query="Deploy app to production",
    small_llm="ollama/qwen2.5:3b",     # local, fast, cheap
    large_llm="anthropic/claude-sonnet-4-20250514",  # cloud, powerful
)
print(result.content)

Install & Run in 60 Seconds

pip install prellm

# CLI — zero config
prellm query "Zdeployuj apkę na prod" --small ollama/qwen2.5:3b --large gpt-4o-mini

# With strategy
prellm query "Refaktoryzuj kod" --strategy structure --json

# Two-agent pipeline (v0.3)
prellm query "Deploy app" --pipeline dual_agent_full

# Docker
docker run prellm/prellm query "Deploy app" --small ollama/qwen2.5:3b --large gpt-4o-mini

How It Works

User Query
  → Small LLM (≤24B, local)    → classify / structure / enrich    → optimized prompt
    Qwen2.5 / Phi3 / Gemma       PromptPipeline (YAML)
  → Large LLM (cloud)          → execute with full context        → validated response
    GPT-4 / Claude / Llama       ResponseValidator (YAML schema)

Result: 70–80% token savings on the large-model call plus enterprise-quality output; the preprocessing overhead costs only a small-LLM call.
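
Conceptually, this is two chained LLM calls. A hand-rolled sketch with plain litellm (not preLLM internals; the composing prompt and model names are illustrative):

import litellm

async def two_stage(query: str) -> str:
    # Stage 1: a small local model rewrites the query into a tighter, structured prompt.
    plan = await litellm.acompletion(
        model="ollama/qwen2.5:3b",
        messages=[{"role": "user", "content": f"Rewrite as a precise, structured task:\n{query}"}],
    )
    composed = plan.choices[0].message.content
    # Stage 2: the large cloud model executes the composed prompt.
    answer = await litellm.acompletion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": composed}],
    )
    return answer.choices[0].message.content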


Python API

One Function — Two Execution Paths

from prellm import preprocess_and_execute

# PATH A: Strategy-based (v0.2, default)
result = await preprocess_and_execute(
    query="Deploy app to production",
    small_llm="ollama/qwen2.5:3b",
    large_llm="anthropic/claude-sonnet-4-20250514",
    strategy="structure",                 # classify|structure|split|enrich|passthrough
    user_context="gdansk_embedded_python",
)

# PATH B: Pipeline-based two-agent (v0.3)
result = await preprocess_and_execute(
    query="Deploy app to production",
    small_llm="ollama/qwen2.5:3b",
    large_llm="anthropic/claude-sonnet-4-20250514",
    pipeline="dual_agent_full",           # any pipeline from pipelines.yaml
)

print(result.content)              # Large LLM response
print(result.decomposition)        # Small LLM analysis
print(result.model_used)           # Which large model answered
print(result.small_model_used)     # Which small model preprocessed

Sync Version

from prellm import preprocess_and_execute_sync

result = preprocess_and_execute_sync("Deploy app", large_llm="gpt-4o-mini")
# Works exactly the same, just blocking

Zero-Config

# Defaults: small=ollama/qwen2.5:3b, large=claude-sonnet, strategy=classify
result = await preprocess_and_execute("Refaktoryzuj kod")

LLM Provider Examples

preLLM uses LiteLLM under the hood, so any model string supported by LiteLLM works.

Ollama (local, free)

# Start Ollama: ollama serve
# Pull model:   ollama pull qwen2.5:3b

result = await preprocess_and_execute(
    query="Explain Kubernetes pods",
    small_llm="ollama/qwen2.5:3b",       # local small model
    large_llm="ollama/llama3:70b",        # local large model
)
# Cost: $0.00 — both models run locally

Ollama + OpenAI (hybrid)

result = await preprocess_and_execute(
    query="Review my Python code",
    small_llm="ollama/qwen2.5:3b",       # local preprocessing
    large_llm="gpt-4o-mini",             # OpenAI execution
)
# Cost: $0.00 (local) + ~$0.15 (OpenAI) = ~$0.15 total

Ollama + Anthropic (hybrid)

result = await preprocess_and_execute(
    query="Deploy microservices to K8s",
    small_llm="ollama/phi3:mini",         # local preprocessing
    large_llm="anthropic/claude-sonnet-4-20250514",  # Anthropic execution
)

OpenAI only

result = await preprocess_and_execute(
    query="Analyze sales data",
    small_llm="gpt-4o-mini",             # cheap OpenAI preprocessing
    large_llm="gpt-4o",                  # powerful OpenAI execution
)

Anthropic only

result = await preprocess_and_execute(
    query="Write a compliance report",
    small_llm="anthropic/claude-haiku",
    large_llm="anthropic/claude-sonnet-4-20250514",
)

Groq (fast inference)

result = await preprocess_and_execute(
    query="Summarize meeting notes",
    small_llm="groq/llama-3.1-8b-instant",   # fast Groq preprocessing
    large_llm="groq/llama-3.3-70b-versatile", # fast Groq execution
)

Mistral

result = await preprocess_and_execute(
    query="Translate technical docs",
    small_llm="mistral/mistral-small-latest",
    large_llm="mistral/mistral-large-latest",
)

Azure OpenAI

result = await preprocess_and_execute(
    query="Generate quarterly report",
    small_llm="azure/gpt-4o-mini-deployment",
    large_llm="azure/gpt-4o-deployment",
)

AWS Bedrock

result = await preprocess_and_execute(
    query="Optimize Lambda function",
    small_llm="bedrock/anthropic.claude-haiku",
    large_llm="bedrock/anthropic.claude-sonnet",
)

Full provider list: See LiteLLM docs — preLLM supports all 100+ providers.


Integration with Existing LiteLLM Projects

Drop-in Enhancement

If you already use LiteLLM, preLLM adds preprocessing with a one-line change:

# BEFORE — direct litellm call
import litellm
response = await litellm.acompletion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Deploy app to production"}],
)

# AFTER — preLLM preprocessing + same litellm execution
from prellm import preprocess_and_execute
result = await preprocess_and_execute(
    query="Deploy app to production",
    large_llm="gpt-4o",  # same model, now with preprocessing
)
# result.content: same quality as a direct call, now backed by structured decomposition

Use Your Existing .env

preLLM reads the same environment variables as LiteLLM:

# .env — works with both litellm and prellm
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GROQ_API_KEY=gsk_...

# preLLM-specific (optional)
PRELLM_SMALL_MODEL=ollama/qwen2.5:3b
PRELLM_LARGE_MODEL=anthropic/claude-sonnet-4-20250514
PRELLM_STRATEGY=classify
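
A minimal loading sketch (python-dotenv is an assumption here; any .env loader works, since preLLM reads the same process environment as LiteLLM):

from dotenv import load_dotenv
load_dotenv()  # populates OPENAI_API_KEY, PRELLM_*, etc. from .env

from prellm import preprocess_and_execute_sync
result = preprocess_and_execute_sync("Deploy app")  # PRELLM_* vars supply the model defaults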

LiteLLM Proxy Integration

If you run a LiteLLM proxy, point preLLM at it:

import os
os.environ["OPENAI_API_BASE"] = "http://localhost:4000"  # your litellm proxy

result = await preprocess_and_execute(
    query="Deploy app",
    small_llm="openai/small-model",   # routed through litellm proxy
    large_llm="openai/large-model",   # routed through litellm proxy
)

OpenAI SDK-Compatible Server

preLLM ships an OpenAI-compatible proxy — use it from any OpenAI SDK client:

# Start preLLM server
prellm serve --port 8080 --small ollama/qwen2.5:3b --large gpt-4o-mini

# Use from OpenAI Python SDK
import openai
client = openai.OpenAI(base_url="http://localhost:8080/v1", api_key="any")
response = client.chat.completions.create(
    model="prellm:default",
    messages=[{"role": "user", "content": "Deploy app to production"}],
)

# Use from curl
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"prellm:qwen→claude","messages":[{"role":"user","content":"Deploy app"}]}'

# Use v0.3 pipeline via API
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"prellm:default","messages":[{"role":"user","content":"Deploy app"}],"prellm":{"pipeline":"dual_agent_full"}}'

Two-Agent Architecture (v0.3)

The pipeline= parameter activates the new two-agent architecture:

USER QUERY
    │
    ▼
┌─────────────────────────────────────┐
│  PREPROCESSOR AGENT (small LLM)     │
│  PromptRegistry (YAML prompts)      │
│  PromptPipeline (YAML steps)        │
│  → classify → structure → compose   │
│  → IntermediateValidator            │
└──────────────┬──────────────────────┘
               │ structured executor_input
               ▼
┌─────────────────────────────────────┐
│  EXECUTOR AGENT (large LLM)         │
│  → execute with full context        │
│  → ResponseValidator (YAML schema)  │
│  → PreLLMResponse (typed)           │
└─────────────────────────────────────┘
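
The preprocessor's structured output stays inspectable on the result, using the same fields shown in the Python API section above:

result = await preprocess_and_execute(
    query="Deploy app to production",
    pipeline="dual_agent_full",
)
print(result.decomposition)      # what the preprocessor agent produced
print(result.small_model_used)   # which small model ran the pipeline steps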

Custom Pipelines (YAML)

Define your own preprocessing pipeline — no Python code changes needed:

# configs/pipelines.yaml
pipelines:
  my_pipeline:
    description: "Custom 3-step pipeline"
    steps:
      - name: classify
        prompt: classify          # from configs/prompts.yaml
        output: classification
      - name: extract
        prompt: structure
        output: fields
      - name: compose
        prompt: compose
        input: [query, classification, fields]
        output: composed_prompt
Then reference the pipeline by name:

result = await preprocess_and_execute(
    query="Deploy app",
    pipeline="my_pipeline",  # uses your custom YAML pipeline
)

Available Pipelines

Pipeline          Steps                                      Best for
classify          classify                                   Quick intent routing
structure         classify → structure → compose             DevOps, API calls
split             classify → split → compose                 Complex multi-part queries
enrich            classify → enrich                          Incomplete prompts
dual_agent_full   context → decompose → optimize → format    Maximum quality
passthrough       (none)                                     Direct forwarding
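
Any name from the table works as the pipeline= argument, for example:

result = await preprocess_and_execute(
    query="Compare Postgres and MySQL for time-series data, then recommend one",
    pipeline="split",  # classify → split → compose, per the table above
)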

Custom Prompts (YAML)

All system prompts are in configs/prompts.yaml with Jinja2 templating:

# configs/prompts.yaml
prompts:
  classify:
    system: |
      You are a query classifier.
      Intents: {{ intents | default("deploy, query, create, delete") }}
      Respond ONLY with JSON: {"intent": "...", "confidence": 0.0-1.0}
    max_tokens: 256
    temperature: 0.1
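
For intuition, this is standard Jinja2 rendering; a standalone sketch, not preLLM's internal API:

from jinja2 import Template

system = Template(
    'You are a query classifier.\n'
    'Intents: {{ intents | default("deploy, query, create, delete") }}'
)
print(system.render())                            # falls back to the default intent list
print(system.render(intents="deploy, rollback"))  # caller-supplied variables win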

Response Validation (YAML)

Validate LLM outputs with schemas — no code changes:

# configs/response_schemas.yaml
schemas:
  classification:
    required_fields: [intent, confidence]
    types:
      intent: string
      confidence: float
    constraints:
      confidence: {min: 0.0, max: 1.0}
      intent: {enum: [deploy, query, create, delete, other]}
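
Conceptually, the classification schema above boils down to checks like these (illustrative only; preLLM's ResponseValidator may differ in detail):

def validate_classification(resp: dict) -> list[str]:
    errors = []
    for field in ("intent", "confidence"):
        if field not in resp:
            errors.append(f"missing required field: {field}")
    conf = resp.get("confidence")
    if conf is not None and not (isinstance(conf, float) and 0.0 <= conf <= 1.0):
        errors.append("confidence must be a float in [0.0, 1.0]")
    if "intent" in resp and resp["intent"] not in {"deploy", "query", "create", "delete", "other"}:
        errors.append("intent outside the allowed enum")
    return errors

assert validate_classification({"intent": "deploy", "confidence": 0.92}) == []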

5 Decomposition Strategies (v0.2)

Strategy      What it does                      Best for
classify      Classify intent + domain          General queries, routing
structure     Extract action, target, params    DevOps commands, API calls
split         Break into sub-queries            Complex multi-part requests
enrich        Add missing context               Incomplete prompts, safety
passthrough   No preprocessing                  Simple/direct queries

With Domain Rules

result = await preprocess_and_execute(
    query="Usuń bazę danych klientów",
    small_llm="ollama/qwen2.5:3b",
    large_llm="gpt-4o-mini",
    domain_rules=[{
        "name": "destructive_db",
        "keywords": ["delete", "drop", "usuń"],
        "required_fields": ["target_database", "backup_confirmed"],
        "severity": "critical",
    }],
)
print(result.decomposition.missing_fields)  # ["target_database", "backup_confirmed"]

Use Cases

1. Code Refactoring

result = await preprocess_and_execute(
    query="Popraw mój projekt z hardcode'em",
    small_llm="ollama/qwen2.5:3b",
    large_llm="anthropic/claude-sonnet-4-20250514",
    strategy="structure",
    user_context="gdansk_embedded_python",
)
# Small LLM: classify intent, extract structure, compose prompt
# Large LLM: complete refactored code with tests
# Cost: $0.01 + $0.45 = $0.46

2. Kubernetes Diagnostics

result = await preprocess_and_execute(
    query="Zdiagnozuj problem z K8s podami",
    small_llm="ollama/qwen2.5:3b",
    large_llm="gpt-4o-mini",
    pipeline="structure",
    user_context={"cluster": "k8s-prod", "namespace": "backend"},
)
# Preprocessor: parse context, identify missing fields, compose prompt
# Executor: root cause + K8s manifests + Prometheus rules
# Cost: $0.02 + $0.38 = $0.40

3. Business Automation

result = await preprocess_and_execute(
    query="Zautomatyzuj kalkulację leasingu dla camper van",
    small_llm="ollama/qwen2.5:3b",
    large_llm="anthropic/claude-sonnet-4-20250514",
    pipeline="enrich",
    user_context="PL_automotive_leasing",
)
# Preprocessor: domain=automotive, locale=PL, required=[VAT, WIBOR]
# Executor: Python calculator + Excel generator + PDF templates
# Cost: $0.015 + $0.52 = $0.535

Configuration (YAML)

# configs/prellm_config.yaml
small_model:
  model: "ollama/qwen2.5:3b"
  fallback: ["phi3:mini"]
  max_tokens: 512

large_model:
  model: "gpt-4o-mini"
  fallback: ["llama3", "mistral"]
  max_tokens: 2048

default_strategy: classify

domain_rules:
  - name: production_deploy
    keywords: ["deploy", "push", "release"]
    required_fields: ["environment", "version"]
    severity: critical
    strategy: structure

Per-Domain Defaults

Ready-to-use configs in configs/defaults/:

Domain     File                              Covers
DevOps     configs/defaults/devops.yaml      deploy, K8s, monitoring, CI/CD
Coding     configs/defaults/coding.yaml      refactoring, review, debugging
Business   configs/defaults/business.yaml    leasing, invoicing, compliance
Embedded   configs/defaults/embedded.yaml    RPi, ESP32, sensors, IoT
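
A sketch, assuming these defaults load like any other preLLM config file (the PreLLM engine below takes a config path; that it accepts the per-domain files is an assumption):

from prellm import PreLLM

engine = PreLLM("configs/defaults/devops.yaml")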

Process Chains (DevOps Workflows)

from prellm import PreLLM, ProcessChain

engine = PreLLM("configs/prellm_config.yaml")
chain = ProcessChain("configs/deploy.yaml", engine=engine)
result = await chain.execute(env="production", dry_run=True)

for step in result.steps:
    print(f"{step.step_name}: {step.status}")

Architecture

preprocess_and_execute(query, small_llm, large_llm, strategy= | pipeline=)
    │
    ├── [strategy path — v0.2]
    │   ├── ContextEngine (env/git/system)
    │   ├── QueryDecomposer (small LLM)
    │   │   └── classify → structure → split → enrich → compose
    │   └── LLMProvider (large LLM via litellm)
    │
    ├── [pipeline path — v0.3]
    │   ├── PreprocessorAgent
    │   │   ├── PromptRegistry (YAML, Jinja2)
    │   │   ├── PromptPipeline (YAML-configurable steps)
    │   │   │   ├── LLM steps (small LLM calls)
    │   │   │   └── Algorithmic steps (validation, formatting)
    │   │   └── ContextEngine + UserMemory (SQLite)
    │   ├── ExecutorAgent
    │   │   ├── LLMProvider (large LLM via litellm)
    │   │   └── ResponseValidator (YAML schemas)
    │   └── 100+ models via LiteLLM
    │
    └── PreLLMResponse (Pydantic v2 validated)

Development

git clone https://github.com/wronai/prellm
cd prellm
poetry install
poetry run pytest                 # 219+ tests
poetry run pytest --cov           # coverage report
poetry run ruff check prellm/     # linting

Roadmap

See ROADMAP.md for the full plan.

License

Apache License 2.0 - see LICENSE for details.

Author

Created by Tom Sapletta - tom@sapletta.com
