
🧠 preLLM

One function for small LLM preprocessing before large LLM execution. Like litellm.completion() but with decomposition.

from prellm import preprocess_and_execute

result = await preprocess_and_execute(
    query="Deploy app to production",
    small_llm="ollama/qwen2.5:3b",
    large_llm="gpt-4o-mini",
)
print(result.content)

Install & Run in 60 Seconds

pip install prellm

# CLI — zero config
prellm query "Zdeployuj apkę na prod" --small ollama/qwen2.5:3b --large gpt-4o-mini   # "Deploy the app to prod"

# With strategy
prellm query "Refaktoryzuj kod" --strategy structure --json   # "Refactor the code"

# Docker
docker run prellm/prellm query "Deploy app" --small ollama/qwen2.5:3b --large gpt-4o-mini

How It Works

User Query → Small LLM (≤3B, local) → classify/structure/enrich → Large LLM (cloud) → Validated Response
              Qwen2.5 / Phi3 / Gemma      decomposition pipeline     GPT-4 / Claude / Llama

Result: 70-80% fewer tokens sent to the large model, for the added overhead of a single cheap small-LLM call.
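The savings figure can be sanity-checked with back-of-envelope arithmetic; the token counts below are illustrative assumptions, not measured values.

```python
# Illustrative cost sketch for the claimed savings (all numbers are assumptions).
# Without preprocessing, the raw query plus pasted context goes to the large model;
# with preprocessing, the small LLM first compresses it into a focused prompt.

RAW_PROMPT_TOKENS = 4000        # assumed: verbose query plus accumulated context
COMPOSED_PROMPT_TOKENS = 1000   # assumed: structured prompt composed by the small LLM

savings = 1 - COMPOSED_PROMPT_TOKENS / RAW_PROMPT_TOKENS
print(f"Prompt-token savings: {savings:.0%}")  # Prompt-token savings: 75%
```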

Python API

One Function (recommended)

from prellm import preprocess_and_execute

# Zero-config — just query + models
result = await preprocess_and_execute("Refaktoryzuj kod")  # "Refactor the code"

# Full control
result = await preprocess_and_execute(
    query="Deploy app to production",
    small_llm="ollama/qwen2.5:3b",      # local preprocessing
    large_llm="anthropic/claude-sonnet-4-20250514",  # cloud execution
    strategy="structure",                 # classify|structure|split|enrich|passthrough
    user_context="gdansk_embedded_python",
)

print(result.content)              # Large LLM response
print(result.decomposition)        # Small LLM analysis
print(result.model_used)           # "anthropic/claude-sonnet-4-20250514"
print(result.small_model_used)     # "ollama/qwen2.5:3b"

Sync Version

from prellm import preprocess_and_execute_sync

result = preprocess_and_execute_sync("Deploy app", large_llm="gpt-4o-mini")
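The sync variant presumably drives the async pipeline to completion on an event loop. A minimal sketch of that pattern, with a stand-in coroutine instead of prellm's real one:

```python
import asyncio

async def _pipeline(query: str, **kwargs):
    # Stand-in for the real async preprocess_and_execute coroutine.
    return f"handled: {query}"

def preprocess_and_execute_sync(query: str, **kwargs):
    # Drive the coroutine to completion on a fresh event loop.
    return asyncio.run(_pipeline(query, **kwargs))

print(preprocess_and_execute_sync("Deploy app"))  # handled: Deploy app
```

Note that `asyncio.run` cannot be called from inside a running event loop, so a wrapper like this is only usable from plain synchronous code.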

With Domain Rules

result = await preprocess_and_execute(
    query="Usuń bazę danych klientów",  # "Delete the customer database"
    small_llm="ollama/qwen2.5:3b",
    large_llm="gpt-4o-mini",
    domain_rules=[{
        "name": "destructive_db",
        "keywords": ["delete", "drop", "usuń"],
        "required_fields": ["target_database", "backup_confirmed"],
        "severity": "critical",
    }],
)
print(result.decomposition.missing_fields)  # ["target_database", "backup_confirmed"]
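The rule above flags destructive keywords and demands confirmation fields before execution. A standalone sketch of how such keyword matching and missing-field detection could work (hypothetical logic, not prellm's implementation):

```python
# Hypothetical domain-rule check: if any keyword matches the query, report
# which required fields the extracted parameters do not yet provide.

def check_rule(rule: dict, query: str, extracted: dict) -> list[str]:
    q = query.lower()
    if not any(kw in q for kw in rule["keywords"]):
        return []  # rule does not apply to this query
    return [f for f in rule["required_fields"] if f not in extracted]

rule = {
    "name": "destructive_db",
    "keywords": ["delete", "drop", "usuń"],
    "required_fields": ["target_database", "backup_confirmed"],
    "severity": "critical",
}

missing = check_rule(rule, "Usuń bazę danych klientów", extracted={})
print(missing)  # ['target_database', 'backup_confirmed']
```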

With YAML Config

result = await preprocess_and_execute(
    query="Deploy to staging",
    config_path="configs/prellm_config.yaml",
)

Use Cases

1. Code Refactoring

result = await preprocess_and_execute(
    query="Popraw mój projekt z hardcode'em",  # "Fix my project's hardcoded values"
    small_llm="ollama/qwen2.5:3b",
    large_llm="anthropic/claude-sonnet-4-20250514",
    strategy="structure",
    user_context="gdansk_embedded_python",
)
# Small LLM: classify intent, extract structure, compose prompt
# Large LLM: complete refactored code with tests
# Cost: $0.01 + $0.45 = $0.46

2. Kubernetes Diagnostics

result = await preprocess_and_execute(
    query="Zdiagnozuj problem z K8s podami",  # "Diagnose the issue with K8s pods"
    small_llm="ollama/qwen2.5:3b",
    large_llm="gpt-4o-mini",
    strategy="enrich",
    user_context={"cluster": "k8s-prod", "namespace": "backend"},
)
# Small LLM: parse context, identify missing fields, enrich prompt
# Large LLM: root cause + K8s manifests + Prometheus rules
# Cost: $0.02 + $0.38 = $0.40

3. Business Automation

result = await preprocess_and_execute(
    query="Zautomatyzuj kalkulację leasingu dla camper van",  # "Automate the leasing calculation for a camper van"
    small_llm="ollama/qwen2.5:3b",
    large_llm="anthropic/claude-sonnet-4-20250514",
    strategy="enrich",
    user_context="PL_automotive_leasing",
)
# Small LLM: domain=automotive, locale=PL, required=[VAT, WIBOR]
# Large LLM: Python calculator + Excel generator + PDF templates
# Cost: $0.015 + $0.52 = $0.535

5 Decomposition Strategies

Strategy      What it does                      Best for
classify      Classify intent + domain          General queries, routing
structure     Extract action, target, params    DevOps commands, API calls
split         Break into sub-queries            Complex multi-part requests
enrich        Add missing context               Incomplete prompts, safety
passthrough   No preprocessing                  Simple/direct queries
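For intuition, here is a toy heuristic for picking a strategy from the shape of a query. This is hypothetical string matching for illustration only; in prellm the choice is made by the small LLM (or set explicitly via the strategy argument).

```python
def pick_strategy(query: str) -> str:
    # Toy heuristics mirroring the table above; not prellm's actual classifier.
    q = query.lower()
    if " and " in q or ";" in q:
        return "split"        # multi-part request
    if any(verb in q for verb in ("deploy", "delete", "restart")):
        return "structure"    # DevOps-style command
    if len(q.split()) < 4:
        return "enrich"       # short query, probably missing context
    return "classify"

print(pick_strategy("Deploy app to production"))  # structure
```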

Configuration (YAML)

# configs/prellm_config.yaml
small_model:
  model: "ollama/qwen2.5:3b"
  fallback: ["phi3:mini"]
  max_tokens: 512

large_model:
  model: "gpt-4o-mini"
  fallback: ["llama3", "mistral"]
  max_tokens: 2048

default_strategy: classify

domain_rules:
  - name: production_deploy
    keywords: ["deploy", "push", "release"]
    required_fields: ["environment", "version"]
    severity: critical
    strategy: structure

Process Chains (DevOps Workflows)

from prellm import PreLLM, ProcessChain

engine = PreLLM("configs/prellm_config.yaml")
chain = ProcessChain("configs/deploy.yaml", engine=engine)
result = await chain.execute(env="production", dry_run=True)

for step in result.steps:
    print(f"{step.step_name}: {step.status}")
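The chain config's schema is not documented in this README; a hypothetical configs/deploy.yaml consistent with the call above might look like the following. All field names here are assumptions, not prellm's documented format.

```yaml
# Hypothetical configs/deploy.yaml — field names are illustrative assumptions.
name: deploy
steps:
  - name: build
    query: "Build release artifact for {env}"
  - name: migrate
    query: "Run database migrations on {env}"
  - name: release
    query: "Deploy version to {env}"
    strategy: structure
```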

Architecture

preprocess_and_execute(query, small_llm, large_llm)
    │
    ├── ContextEngine (env/git/system)
    ├── QueryDecomposer (small LLM ≤3B)
    │   ├── classify → intent + domain
    │   ├── structure → action + target + params
    │   ├── split → sub-queries
    │   ├── enrich → missing fields + context
    │   └── compose → optimized prompt
    ├── LLMProvider (large LLM via litellm)
    │   ├── retry + fallback chain
    │   └── 100+ models (OpenAI, Anthropic, Ollama, etc.)
    └── PreLLMResponse (Pydantic v2 validated)

Development

git clone https://github.com/wronai/prellm
cd prellm
poetry install
poetry run pytest          # 144+ tests
poetry run pytest --cov    # ~80% coverage

Roadmap

See ROADMAP.md for the full 12-month plan to make preLLM a standard.

License

Apache License 2.0 - see LICENSE for details.

Author

Created by Tom Sapletta - tom@sapletta.com
