Skip to main content

Ask LLMs to return structured JSON and run cross-model tests. API-first.

Project description

Prompture

Structured JSON extraction from any LLM. Schema-enforced, Pydantic-native, multi-provider.

PyPI version Python versions License: MIT Downloads GitHub stars


Prompture is a Python library that turns LLM responses into validated, structured data. Define a schema or Pydantic model, point it at any provider, and get typed output back — with token tracking, cost calculation, and automatic JSON repair built in.

from pydantic import BaseModel
from prompture import extract_with_model

class Person(BaseModel):
    name: str
    age: int
    profession: str

person = extract_with_model(Person, "Maria is 32, a developer in NYC.", model_name="openai/gpt-4")
print(person.name)  # Maria

Key Features

  • Structured output — JSON schema enforcement and direct Pydantic model population
  • 20+ providers — OpenAI, Claude, Google, Groq, Grok, Azure, Ollama, LM Studio, OpenRouter, HuggingFace, Moonshot, ModelScope, Z.ai, Vertex AI, AirLLM, CachiBot, Runway, MiniMax/Hailuo, Kling AI, Fal.ai, and generic HTTP
  • Multi-modal — Drivers for embeddings, image generation (DALL-E, Imagen, Grok, Stability, Runway), video generation (Grok Imagine Video, Runway text/image/video → video), text-to-speech (OpenAI, ElevenLabs, Runway), sound effects, voice dubbing / isolation / conversion (Runway), and speech-to-text (Whisper, ElevenLabs)
  • Multi-model fallback — Try a list of models in sequence with per-attempt cost, token, and capability accounting
  • Strategy cascade — Auto-selects between provider-native JSON mode, tool-call extraction, and prompted repair so extraction works on any model
  • TOON input conversion — 45-60% token savings when sending structured data via Token-Oriented Object Notation
  • Stepwise extraction — Per-field prompts with smart type coercion (shorthand numbers, multilingual booleans, dates)
  • Field registry — 50+ predefined extraction fields with template variables and Pydantic integration
  • Conversations — Stateful multi-turn sessions with sync and async support
  • Tool use — Function calling and streaming across supported providers, with automatic prompt-based simulation for models without native tool support
  • Caching — Built-in response cache with memory, SQLite, and Redis backends
  • Plugin system — Register custom drivers via entry points
  • Usage tracking — Token counts and cost calculation on every call
  • Auto-repair — Optional second LLM pass to fix malformed JSON
  • Batch testing — Spec-driven suites to compare models side by side

Built With Prompture

Projects powered by Prompture at their core:

  • CachiBot — AI-powered bot built on Prompture's structured extraction and multi-provider driver system
  • AgentSite — Agent-driven web platform using Prompture for LLM orchestration and structured output

Installation

pip install prompture

Optional extras:

pip install prompture[redis]     # Redis cache backend
pip install prompture[serve]     # FastAPI server mode
pip install prompture[airllm]    # AirLLM local inference

Configuration

Set API keys for the providers you use. Prompture reads from environment variables or a .env file:

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=...
GROQ_API_KEY=...
GROK_API_KEY=...
# optional xAI-compatible alias for Grok APIs
XAI_API_KEY=...
OPENROUTER_API_KEY=...
AZURE_OPENAI_ENDPOINT=...
AZURE_OPENAI_API_KEY=...

Local providers (Ollama, LM Studio) work out of the box with no keys required.

Runtime API Keys (No Environment Variables)

Pass API keys at runtime via ProviderEnvironment — useful for multi-tenant apps, web backends, or anywhere you don't want to set os.environ:

from prompture import AsyncAgent, ProviderEnvironment

env = ProviderEnvironment(
    openai_api_key="sk-...",
    claude_api_key="sk-ant-...",
)

agent = AsyncAgent("openai/gpt-4o", env=env)
result = await agent.run("Hello!")

Works on Agent, AsyncAgent, Conversation, and AsyncConversation.

Providers

Model strings use "provider/model" format. The provider prefix routes to the correct driver automatically.

Provider Example Model Cost
openai openai/gpt-4 Automatic
claude claude/claude-3 Automatic
google google/gemini-1.5-pro Automatic
google_vertexai google_vertexai/gemini-1.5-pro Automatic
groq groq/llama2-70b-4096 Automatic
grok grok/grok-4-fast-reasoning Automatic
azure azure/deployed-name Automatic
openrouter openrouter/anthropic/claude-2 Automatic
moonshot moonshot/kimi-k2 Automatic
modelscope modelscope/Qwen2.5-72B-Instruct Automatic
zai zai/glm-4 Automatic
cachibot cachibot/openai/gpt-4o-mini Automatic
ollama ollama/llama3.1:8b Free (local)
lmstudio lmstudio/local-model Free (local)
huggingface hf/model-name Free (local)
airllm airllm/Qwen2-7B Free (local)
local_http local_http/self-hosted Free
runway runway/gen4.5 (video), runway/gpt_image_2 (image), runway/eleven_multilingual_v2 (TTS) Automatic
minimax minimax/MiniMax-Text-01 (LLM), minimax/MiniMax-Hailuo-2.3 (video) Automatic
kling kling/kling-v2-1 (image + video) Automatic
fal fal/fal-ai/flux/dev (image), fal/fal-ai/kling-video/v2.6/pro/image-to-video (video) Automatic

Aliases (anthropic, gemini, chatgpt, xai, lm_studio, zhipu, hf, dalle, runwayml, hailuo) route to their canonical providers.

Multi-Modal

Beyond text LLMs, Prompture exposes drivers for adjacent modalities under the same provider/model routing:

  • Embeddings — OpenAI (text-embedding-3-*) and Ollama (nomic-embed-text)
  • Image generation — OpenAI DALL-E + GPT image, Google Imagen, Grok, Stability AI, Runway (gen4_image, gen4_image_turbo, gpt_image_2, gemini_image3_pro, gemini_2.5_flash), Kling AI, Fal.ai
  • Video generation — Grok Imagine Video; Runway text/image/video → video (gen4.5, gen4_turbo, gen3a_turbo, gen4_aleph, veo3, veo3.1, veo3.1_fast); MiniMax / Hailuo; Kling AI; Fal.ai
  • Text-to-speech — OpenAI (tts-1), ElevenLabs, Runway (eleven_multilingual_v2)
  • Sound effects — Runway (eleven_text_to_sound_v2)
  • Audio transforms — Runway voice dubbing, voice isolation, speech-to-speech (RunwayAudioTransformDriver)
  • Speech-to-text — OpenAI Whisper and ElevenLabs
from prompture.drivers.img_gen_registry import get_img_gen_driver_for_model

driver = get_img_gen_driver_for_model("openai/dall-e-3")
result = driver.generate_image(
    "a cat on a surfboard at sunset",
    {"size": "1024x1024", "quality": "hd"},
)
print(result["meta"]["cost"], result["meta"]["image_count"])

Video generation uses the same provider/model routing. Set GROK_API_KEY or XAI_API_KEY, then request a Grok video model:

from prompture import get_video_gen_driver_for_model

driver = get_video_gen_driver_for_model("grok/grok-imagine-video")
result = driver.generate_video(
    "wide shot of a crystal-powered rocket launching from red desert dunes",
    {"duration": 8, "aspect_ratio": "16:9", "resolution": "720p"},
)

video = result["videos"][0]
print(video.url)
print(result["meta"]["request_id"], result["meta"]["cost"])

For local smoke tests without waiting on the render, pass {"poll": False} to get the provider request ID. The async factory is available as get_async_video_gen_driver_for_model().

Runnable example: python examples/grok_video_generation_example.py.

Runway

Runway is a single API surface covering image, video, and audio. One key (RUNWAY_API_KEY, or RUNWAYML_API_SECRET) unlocks all of it:

from prompture.drivers.img_gen_registry import get_img_gen_driver_for_model
from prompture.drivers.video_gen_registry import get_video_gen_driver_for_model
from prompture.drivers.audio_registry import get_tts_driver_for_model
from prompture.drivers import RunwayAudioTransformDriver

# Image — text_to_image, optionally with reference images
img = get_img_gen_driver_for_model("runway/gpt_image_2").generate_image(
    "A cinematic wide shot of a neon-lit Tokyo alleyway at night in the rain",
    {"ratio": "1920:1080", "quality": "high"},
)

# Video — one driver, three modes (auto-detected from inputs)
vid = get_video_gen_driver_for_model("runway/gen4.5").generate_video(
    "wide cinematic shot of a rocket launching from desert dunes",
    {"ratio": "1280:720", "duration": 5},          # text_to_video
)
# Pass `image=...` → image_to_video; `video=...` → video_to_video (gen4_aleph).

# Speech and sound effects
tts = get_tts_driver_for_model("runway/eleven_multilingual_v2").synthesize(
    "Hello from Runway via Prompture.", {"voice": "Maya"},
)
sfx = get_tts_driver_for_model("runway/eleven_text_to_sound_v2").synthesize(
    "Heavy tropical rain on a metal roof", {"duration": 5},
)

# Voice transforms (audio in → audio out, not a registered modality)
dub = RunwayAudioTransformDriver().dub("https://.../speech.mp3", target_lang="es")

Inspect any model's capabilities (operations, endpoints, cost) as data — no need to instantiate the driver:

from prompture.drivers import get_runway_model_info, get_runway_models_by_op

get_runway_model_info("gen4.5")
# {'modality': 'video',
#  'operations': ['text_to_video', 'image_to_video'],
#  'endpoints':  ['/v1/text_to_video', '/v1/image_to_video'],
#  'cost': '$0.12 per second'}

get_runway_models_by_op("text_to_video")
# ['gen4.5', 'veo3', 'veo3.1', 'veo3.1_fast']

Runnable examples:

  • python examples/runway_image_generation_example.py
  • python examples/runway_video_generation_example.py
  • python examples/runway_audio_example.py

Usage

One-Shot Pydantic Extraction

Single LLM call, returns a validated Pydantic instance:

from typing import List, Optional
from pydantic import BaseModel
from prompture import extract_with_model

class Person(BaseModel):
    name: str
    age: int
    profession: str
    city: str
    hobbies: List[str]
    education: Optional[str] = None

person = extract_with_model(
    Person,
    "Maria is 32, a software developer in New York. She loves hiking and photography.",
    model_name="openai/gpt-4"
)
print(person.model_dump())

Stepwise Extraction

One LLM call per field. Higher accuracy, per-field error recovery:

from prompture import stepwise_extract_with_model

result = stepwise_extract_with_model(
    Person,
    "Maria is 32, a software developer in New York. She loves hiking and photography.",
    model_name="openai/gpt-4"
)
print(result["model"].model_dump())
print(result["usage"])  # per-field and total token usage
Aspect extract_with_model stepwise_extract_with_model
LLM calls 1 N (one per field)
Speed / cost Faster, cheaper Slower, higher
Accuracy Good global coherence Higher per-field accuracy
Error handling All-or-nothing Per-field recovery

JSON Schema Extraction

For raw JSON output with full control:

from prompture import ask_for_json

schema = {
    "type": "object",
    "required": ["name", "age"],
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
    }
}

result = ask_for_json(
    content_prompt="Extract the person's info from: John is 28 and lives in Miami.",
    json_schema=schema,
    model_name="openai/gpt-4"
)
print(result["json_object"])  # {"name": "John", "age": 28}
print(result["usage"])        # token counts and cost

Strategy Cascade

Prompture picks how to obtain structured JSON based on each model's capabilities. The cascade is provider_native (built-in JSON mode / schema enforcement) → tool_call (encode the schema as a function definition and read it back from the tool call) → prompted_repair (prompt for JSON, repair malformed output via AI cleanup). Pass strategy="auto" (default) to let Prompture select per model, or pin a specific strategy via the StructuredOutputStrategy enum or its string value. The strategy used is recorded in the response so you can see which path each call took.

Multi-Model Fallback

Try a list of models in priority order, with full per-attempt accounting — every model tried (success, failure, or skipped) is recorded with its cost, tokens, duration, capabilities, and strategy. The first success wins; if all fail, an optional fallback Pydantic instance is returned instead of raising.

from prompture import extract_with_models

result = extract_with_models(
    Person,
    "Maria is 32, a software developer in NYC.",
    models=[
        "openai/gpt-4o-mini",        # try first
        "claude/claude-3-5-haiku",   # fallback
        "ollama/llama3.1:8b",        # last resort, free
    ],
    fallback=Person(name="unknown", age=0, profession="unknown"),
)

print(result["selected_model"])     # winning model string
print(result["model"])              # validated Pydantic instance
print(result["total_cost"])         # cumulative cost across all attempts
print(result["total_attempts"])     # number of models actually called

for attempt in result["attempts"]:
    print(
        attempt["model"],
        attempt["status"],          # "success" | "failed" | "skipped"
        attempt["strategy"],        # "single" | "stepwise"
        attempt["cost"],
        attempt["prompt_tokens"],
        attempt["completion_tokens"],
        attempt["duration_ms"],
        attempt["capabilities"],    # {"json_mode": bool, "json_schema": bool}
    )

If every model fails and no fallback is provided, an ExtractionError is raised with the full attempts list, total_cost, and total_tokens attached as attributes.

TOON Input — Token Savings

Analyze structured data with automatic TOON conversion for 45-60% fewer tokens:

from prompture import extract_from_data

products = [
    {"id": 1, "name": "Laptop", "price": 999.99, "rating": 4.5},
    {"id": 2, "name": "Book", "price": 19.99, "rating": 4.2},
    {"id": 3, "name": "Headphones", "price": 149.99, "rating": 4.7},
]

result = extract_from_data(
    data=products,
    question="What is the average price and highest rated product?",
    json_schema={
        "type": "object",
        "properties": {
            "average_price": {"type": "number"},
            "highest_rated": {"type": "string"}
        }
    },
    model_name="openai/gpt-4"
)

print(result["json_object"])
# {"average_price": 389.99, "highest_rated": "Headphones"}

print(f"Token savings: {result['token_savings']['percentage_saved']}%")

Works with Pandas DataFrames via extract_from_pandas().

Field Definitions

Use the built-in field registry for consistent extraction across models:

from pydantic import BaseModel
from prompture import field_from_registry, stepwise_extract_with_model

class Person(BaseModel):
    name: str = field_from_registry("name")
    age: int = field_from_registry("age")
    email: str = field_from_registry("email")
    occupation: str = field_from_registry("occupation")

result = stepwise_extract_with_model(
    Person,
    "John Smith, 25, software engineer at TechCorp, john@example.com",
    model_name="openai/gpt-4"
)

Register custom fields with template variables:

from prompture import register_field

register_field("document_date", {
    "type": "str",
    "description": "Document creation date",
    "instructions": "Use {{current_date}} if not specified",
    "default": "{{current_date}}",
    "nullable": False
})

Conversations

Stateful multi-turn sessions:

from prompture import Conversation

conv = Conversation(model_name="openai/gpt-4")
conv.add_message("system", "You are a helpful assistant.")
response = conv.send("What is the capital of France?")
follow_up = conv.send("What about Germany?")  # retains context

Tool Use

Register Python functions as tools the LLM can call during a conversation:

from prompture import Conversation, ToolRegistry

registry = ToolRegistry()

@registry.tool
def get_weather(city: str, units: str = "celsius") -> str:
    """Get the current weather for a city."""
    return f"Weather in {city}: 22 {units}"

conv = Conversation("openai/gpt-4", tools=registry)
result = conv.ask("What's the weather in London?")

For models without native function calling (Ollama, LM Studio, etc.), Prompture automatically simulates tool use by describing tools in the prompt and parsing structured JSON responses:

# Auto-detect: uses native tool calling if available, simulation otherwise
conv = Conversation("ollama/llama3.1:8b", tools=registry, simulated_tools="auto")

# Force simulation even on capable models
conv = Conversation("openai/gpt-4", tools=registry, simulated_tools=True)

# Disable tool use entirely
conv = Conversation("openai/gpt-4", tools=registry, simulated_tools=False)

The simulation loop describes tools in the system prompt, asks the model to respond with JSON (tool_call or final_answer), executes tools, and feeds results back — all transparent to the caller.

Budget Control

Set cost and token limits with policy-based enforcement:

from prompture import AsyncAgent

agent = AsyncAgent(
    "openai/gpt-4o",
    max_cost=0.50,
    budget_policy="hard_stop",       # accepts strings or BudgetPolicy enum
    fallback_models=["openai/gpt-4o-mini"],
)

Policies: "hard_stop" (raise BudgetExceededError on exceed), "warn_and_continue" (log and proceed), "degrade" (auto-switch to cheaper model at 80% budget).

Provider Utilities

Extract provider info from model strings:

from prompture import provider_for_model, parse_model_string

provider_for_model("claude/claude-sonnet-4-6")                  # "claude"
provider_for_model("claude/claude-sonnet-4-6", canonical=True)  # "anthropic"
parse_model_string("openai/gpt-4o")                             # ("openai", "gpt-4o")

Model Discovery

Auto-detect available models from configured providers:

from prompture import get_available_models

models = get_available_models()
for model in models:
    print(model)  # "openai/gpt-4", "ollama/llama3:latest", ...

For non-LLM modalities, use the matching helper:

from prompture.infra.discovery import (
    get_available_image_gen_models,
    get_available_video_gen_models,
    get_available_audio_models,
)

get_available_image_gen_models()        # ['runway/gpt_image_2', 'openai/dall-e-3', ...]
get_available_video_gen_models()        # ['runway/gen4.5', 'runway/gen4_aleph', ...]
get_available_audio_models(modality="tts")  # ['runway/eleven_multilingual_v2', ...]

Logging and Debugging

import logging
from prompture import configure_logging

configure_logging(logging.DEBUG)

Response Shape

All extraction functions return a consistent structure:

{
    "json_string": str,       # raw JSON text
    "json_object": dict,      # parsed result
    "usage": {
        "prompt_tokens": int,
        "completion_tokens": int,
        "total_tokens": int,
        "cost": float,
        "model_name": str
    }
}

CLI

prompture run <spec-file>

Run spec-driven extraction suites for cross-model comparison.

Integrating Prompture into Your Project

FastAPI + AsyncAgent with Tools

The most common integration pattern — an AI chat endpoint with database-backed tools:

from fastapi import APIRouter, Depends
from prompture import AsyncAgent, ToolRegistry, ProviderEnvironment, BudgetExceededError

router = APIRouter()

def build_tools(db) -> ToolRegistry:
    registry = ToolRegistry()

    @registry.tool
    async def search_records(query: str) -> str:
        """Search the database for matching records."""
        results = await db.execute(...)
        return format_results(results)

    return registry

@router.post("/chat")
async def chat(message: str, db=Depends(get_db)):
    env = ProviderEnvironment(openai_api_key=get_api_key_from_db(db))

    agent = AsyncAgent(
        "openai/gpt-4o",
        env=env,
        tools=build_tools(db),
        system_prompt="You are a helpful assistant with database access.",
        max_cost=0.25,
        budget_policy="hard_stop",
    )

    try:
        result = await agent.run(message)
        return {"reply": result.output_text, "usage": result.usage}
    except BudgetExceededError:
        return {"error": "Cost limit exceeded"}, 429

SSE Streaming Endpoint

Stream responses via Server-Sent Events:

from fastapi.responses import StreamingResponse
from prompture import AsyncAgent, StreamEventType

@router.post("/chat/stream")
async def chat_stream(message: str):
    agent = AsyncAgent("claude/claude-sonnet-4-6", env=env, system_prompt="...")

    async def event_stream():
        async for event in agent.run_stream(message):
            match event.event_type:
                case StreamEventType.text_delta:
                    yield f"data: {json.dumps({'type': 'text', 'content': event.data})}\n\n"
                case StreamEventType.tool_call:
                    yield f"data: {json.dumps({'type': 'tool_call', 'name': event.data['name']})}\n\n"
                case StreamEventType.output:
                    yield f"data: {json.dumps({'type': 'done'})}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")

Structured Extraction in Endpoints

Use AsyncConversation.ask_for_json() for one-shot structured data extraction:

from prompture import AsyncConversation

@router.get("/insights")
async def get_insights():
    conv = AsyncConversation("openai/gpt-4o", system_prompt="You analyze data.")
    result = await conv.ask_for_json(
        f"Analyze this data and produce insights:\n\n{context}",
        {"type": "object", "properties": {
            "insights": {"type": "array", "items": {"type": "object", ...}},
            "summary": {"type": "string"},
        }},
    )
    return result["json_object"]

Error Handling

Key exceptions to catch in production:

from prompture import BudgetExceededError, DriverError, ExtractionError, ValidationError

try:
    result = await agent.run(message)
except BudgetExceededError:
    # Cost or token limit exceeded — return 429
    pass
except DriverError:
    # Provider API error (auth, rate limit, network) — return 502
    pass
except ExtractionError:
    # JSON parsing/validation failed — return 422
    pass
except ValidationError:
    # Schema validation failed — return 422
    pass

Development

# Install with dev dependencies
pip install -e ".[test,dev]"

# Run tests
pytest

# Run integration tests (requires live LLM access)
pytest --run-integration

# Lint and format
ruff check .
ruff format .

Contributing

PRs welcome. Please add tests for new functionality and examples under examples/ for new drivers or patterns.

License

MIT

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

prompture-1.0.57.tar.gz (459.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

prompture-1.0.57-py3-none-any.whl (512.0 kB view details)

Uploaded Python 3

File details

Details for the file prompture-1.0.57.tar.gz.

File metadata

  • Download URL: prompture-1.0.57.tar.gz
  • Upload date:
  • Size: 459.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for prompture-1.0.57.tar.gz
Algorithm Hash digest
SHA256 5043a510c769fcf23a2e918edaec025998dc428f58f5fa2bbfefda468a1c78fc
MD5 c258dd6e844d0ca34e770772b0c1a610
BLAKE2b-256 958f6fd467f32fae48f7b93be5bb135a8675714cdbfbac43ac491da85fea3668

See more details on using hashes here.

File details

Details for the file prompture-1.0.57-py3-none-any.whl.

File metadata

  • Download URL: prompture-1.0.57-py3-none-any.whl
  • Upload date:
  • Size: 512.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for prompture-1.0.57-py3-none-any.whl
Algorithm Hash digest
SHA256 40f85d4191e55a154ba604a96784dfc179119f60ac91e4134a58130d0f30b4db
MD5 41926ac617a027d09404c1a78a8bb0e9
BLAKE2b-256 efce47757f9fe5f688e5f0b9ad910cbd666ff689d8064da826650465d4e912cd

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page