A toolkit for batch LLM API calls driven by YAML configuration.
Project description
promptloom
promptloom is a Python toolkit that turns a prompt template and a
YAML config file into a fully managed batch of LLM API calls. You
write a Markdown prompt with {{PLACEHOLDER}} slots, declare your
tasks and models in YAML, and the toolkit takes care of the rest:
prompt assembly, concurrent async execution via
LiteLLM, structured JSON extraction,
schema & custom validation, a multi-turn correction loop for invalid
responses, and detailed YAML reporting — including automatic
re-generation configs for any failed runs.
Features
- YAML-driven configuration — define experiments, tasks, models, parameters, and settings in a single YAML file.
- Arbitrary prompt parameters — use
{{PLACEHOLDER}}syntax in prompt templates; parameter values come from the YAML task definition. - File references — prefix parameter values with
file:to read content from disk (e.g.file:data/law.txt). - System prompts — optional system-level messages, supporting both
literal strings and
file:references with placeholder substitution. - Multi-provider support — any model supported by LiteLLM (OpenAI, Anthropic, Google Gemini, Ollama, etc.).
- Fully asynchronous — all API calls run concurrently via
asynciowith configurable concurrency limits. - Response processing — built-in JSON extraction from raw LLM responses (handles code fences, raw JSON, and heuristic extraction).
- Validation pipeline — chain validators (JSON Schema, custom Python functions) to check structured output for correctness.
- Multi-turn correction loop — when validation fails, automatically send the error report back to the LLM and retry (configurable number of correction turns).
- Three-phase pre-flight checks:
- Model validation — two-tier approach: first checks litellm's built-in registry (instant, local); if unknown, queries the provider's model-list API (e.g. OpenRouter, Ollama) to confirm availability. Also verifies API key availability.
- Placeholder validation — ensures every
{{PLACEHOLDER}}in each template has a matching parameter; warns about unused parameters. - Validation config check — verifies response format, correction prompt existence, schema files, and validator specs.
- Structured YAML reports — timestamped report with per-task, per-model status, error details, timing, token usage, and correction attempt counts.
- Failed-run config — automatically generates a YAML config containing only the failed (task, model) pairs for convenient re-runs.
- Programmatic API — use from Python scripts and Jupyter notebooks
with both sync (
run_experiment) and async (run_experiment_async) entry points.
Installation
pip install promptloom
For development (clone the repo, then install in editable mode):
git clone https://github.com/Nobulax/promptloom.git
cd promptloom
pip install -e ".[dev]"
Quick start
1. Create a prompt template
Write a Markdown file with {{PLACEHOLDER}} syntax for variable parts:
# Task
{{INSTRUCTION}}
# Document
{{DOCUMENT}}
2. Create a YAML config
experiment:
name: "My Experiment"
defaults:
models:
- "openai/gpt-4o"
prompt_template: "prompt.md"
system_prompt: "You are a helpful assistant."
output_dir: "output/"
max_completion_tokens: 4000
timeout: 120
max_concurrency: 5
tasks:
- id: "task-1"
params:
document: "file:data/input.txt"
instruction: "Summarize this document."
- id: "task-2"
params:
document: "Inline text content."
instruction: "Translate this to German."
models:
- "openai/gpt-4o"
- "anthropic/claude-sonnet-4-20250514"
3. Set up API keys
Copy .env.example to .env and fill in your keys:
OPENAI_API_KEY="sk-..."
ANTHROPIC_API_KEY="sk-ant-..."
GEMINI_API_KEY="..."
4. Run
From the command line:
promptloom run config.yaml
promptloom run config.yaml --dry-run # validate only, no API calls
promptloom run config.yaml --skip-preflight # skip pre-flight checks
Or from Python:
from promptloom import run_experiment
results = run_experiment("config.yaml")
Structured output with validation
For tasks that require structured JSON output, the toolkit provides a complete processing pipeline: extract → validate → correct.
Example config
defaults:
models:
- "openai/gpt-4o"
prompt_template: "prompt.md"
system_prompt: "You are a helpful assistant that responds with valid JSON."
response_format: "json"
validators:
- type: json_schema
schema: "schemas/output_schema.json"
correction_prompt: "correction.md"
max_corrections: 3
tasks:
- id: "generate-data"
params:
instruction: "Generate a structured summary."
document: "file:data/input.txt"
Correction prompt template
The correction prompt is a Markdown file with a {{ERROR}} placeholder:
Your previous response was invalid. Here is the error report:
{{ERROR}}
Please correct your output. Return only the valid JSON object.
Custom validators
For domain-specific checks (e.g., data integrity, allowed labels, no duplicate IDs), write a Python function and reference it by its import path:
validators:
- type: json_schema
schema: "schemas/output.json"
- type: custom
callable: "mypackage.validators.check_integrity"
The function must have this signature:
from promptloom.validation import ValidationResult
def check_integrity(data, context):
"""
data: the parsed response (e.g., dict from JSON)
context: {"task_id": ..., "params": {...}, "model": ..., "attempt": ...}
"""
errors = []
ids = [item["id"] for item in data["items"]]
if len(ids) != len(set(ids)):
errors.append("Duplicate IDs found")
if errors:
return ValidationResult.fail("\n".join(errors))
return ValidationResult.ok(data)
The error string from ValidationResult.fail() is substituted into the
{{ERROR}} placeholder of the correction prompt and sent back to the LLM.
How the correction loop works
- LLM responds → response processor runs (e.g., JSON extraction).
- Validators run in order. First failure stops the chain.
- If validation fails and corrections remain:
- The assistant's response is appended to the conversation.
- A correction prompt with
{{ERROR}}filled in is appended. - The LLM is called again with the full conversation history.
- Repeat up to
max_correctionstimes. - If all corrections are exhausted, the task is marked as failed (but the last output is still saved for inspection).
YAML config reference
A fully-commented reference config showing every available field is at
examples/config_full.yaml.
experiment (optional)
| Field | Type | Description |
|---|---|---|
name |
string | Human-readable experiment name. |
description |
string | Longer description. |
defaults (optional)
Global defaults applied to all tasks unless overridden per-task.
| Field | Type | Default | Description |
|---|---|---|---|
models |
list | — | LiteLLM model identifiers. |
prompt_template |
string | — | Path to the prompt template file. |
system_prompt |
string | null |
System message (literal or file: reference). |
output_dir |
string | output |
Base output directory. |
max_completion_tokens |
int | 64000 |
Max tokens in LLM response. |
timeout |
int | null |
Timeout in seconds per API call. |
max_concurrency |
int | 10 |
Max parallel API calls. |
ignore_unused_params |
bool | false |
Auto-continue on unused-param warnings. |
response_format |
string | text |
Response processor: "text" or "json". |
validators |
list | [] |
Ordered list of validator specs. |
correction_prompt |
string | null |
Path to correction prompt template (needs {{ERROR}}). |
max_corrections |
int | 0 |
Max correction turns on validation failure. |
tasks (required)
List of task objects. Each task defines one prompt sent to one or more
models. All defaults fields can be overridden per-task.
| Field | Type | Required | Description |
|---|---|---|---|
id |
string | yes | Unique task identifier. |
params |
dict | no | Key-value pairs substituted into the prompt template. |
models |
list | no | Override default models for this task. |
prompt_template |
string | no | Override default prompt template. |
system_prompt |
string | no | Override default system prompt. |
output_dir |
string | no | Override output directory. |
max_completion_tokens |
int | no | Override max tokens. |
timeout |
int | no | Override timeout. |
response_format |
string | no | Override response processor. |
validators |
list | no | Override validators. |
correction_prompt |
string | no | Override correction prompt template. |
max_corrections |
int | no | Override max corrections. |
Parameter values
Parameter values in params are strings by default. To include file
contents, prefix the value with file::
params:
document: "file:data/input.txt" # reads file content
instruction: "Summarize this." # literal string
File paths are resolved relative to the YAML config file's directory.
Output structure
output/
task-1/
task-1_openai_gpt-4o.txt # text format
task-2/
task-2_openai_gpt-4o.json # json format (pretty-printed)
task-2_anthropic_claude-sonnet-4-20250514.json
config_report_20260319_143000.yaml # timestamped report
config_failed.yaml # only if there were failures
Pre-flight checks
Before dispatching API calls, three checks run:
- Model validation — uses a two-tier approach. First, litellm's
built-in model registry is checked (instant, local). If the model
is not in litellm's static registry (common for aggregator providers
like OpenRouter), a lightweight
GET /v1/modelscall is made to the provider to confirm the model actually exists. This remote check is free (no tokens consumed), fast, and cached per provider. Also verifies that the required API keys / environment variables are set. - Placeholder validation — for each task, verifies that every
{{PLACEHOLDER}}in the prompt template has a matching key in the task'sparams. Missing params are fatal errors. Unused params are warnings (auto-continued ifignore_unused_params: true). - Validation config — checks that
response_formatis valid, correction prompt files exist and contain{{ERROR}}, schema files exist, and validator specs are well-formed.
All checks run to completion before any abort decision, so you see all problems at once.
Programmatic API
from promptloom import load_config, run_experiment, run_experiment_async
from promptloom.validation import ValidationResult
# Synchronous (from scripts)
results = run_experiment("config.yaml")
# Async (from notebooks or async code)
config = load_config("config.yaml")
results = await run_experiment_async(config, skip_preflight=True)
License
MIT
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file promptloom-0.3.0.tar.gz.
File metadata
- Download URL: promptloom-0.3.0.tar.gz
- Upload date:
- Size: 41.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
770d40d7493881f7475d5f07f1a51ecad6add5f237e33355809d664b6f05c1f2
|
|
| MD5 |
0b79f03400e34d560aeb44e9b313eda4
|
|
| BLAKE2b-256 |
dba6ca670f28370166dc2018427834eeb26dea6319eb51a03973d91b35d3609f
|
Provenance
The following attestation bundles were made for promptloom-0.3.0.tar.gz:
Publisher:
publish.yml on Nobulax/promptloom
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
promptloom-0.3.0.tar.gz -
Subject digest:
770d40d7493881f7475d5f07f1a51ecad6add5f237e33355809d664b6f05c1f2 - Sigstore transparency entry: 1309391952
- Sigstore integration time:
-
Permalink:
Nobulax/promptloom@7edbaa29ce5c7527c2d8e1933e5427b4b055774a -
Branch / Tag:
refs/tags/v0.3.0 - Owner: https://github.com/Nobulax
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@7edbaa29ce5c7527c2d8e1933e5427b4b055774a -
Trigger Event:
push
-
Statement type:
File details
Details for the file promptloom-0.3.0-py3-none-any.whl.
File metadata
- Download URL: promptloom-0.3.0-py3-none-any.whl
- Upload date:
- Size: 32.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
59e2dc52d17dcf1943942b63e05635b5d2ce33f4fbb249066ae01f450551a484
|
|
| MD5 |
e104264be7bde3d298bc1de61e1449d7
|
|
| BLAKE2b-256 |
ff4d48bf4a0afd472bf296c8392a1e194fac5d17906b4fd0f3fe23139345e9c1
|
Provenance
The following attestation bundles were made for promptloom-0.3.0-py3-none-any.whl:
Publisher:
publish.yml on Nobulax/promptloom
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
promptloom-0.3.0-py3-none-any.whl -
Subject digest:
59e2dc52d17dcf1943942b63e05635b5d2ce33f4fbb249066ae01f450551a484 - Sigstore transparency entry: 1309392013
- Sigstore integration time:
-
Permalink:
Nobulax/promptloom@7edbaa29ce5c7527c2d8e1933e5427b4b055774a -
Branch / Tag:
refs/tags/v0.3.0 - Owner: https://github.com/Nobulax
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@7edbaa29ce5c7527c2d8e1933e5427b4b055774a -
Trigger Event:
push
-
Statement type: