Prompture

Structured JSON extraction from any LLM. Schema-enforced, Pydantic-native, multi-provider.

Prompture is a Python library that turns LLM responses into validated, structured data. Define a schema or Pydantic model, point it at any provider, and get typed output back — with token tracking, cost calculation, and automatic JSON repair built in.

from pydantic import BaseModel
from prompture import extract_with_model

class Person(BaseModel):
    name: str
    age: int
    profession: str

person = extract_with_model(Person, "Maria is 32, a developer in NYC.", model_name="openai/gpt-4")
print(person.name)  # Maria

Key Features

  • Structured output — JSON schema enforcement and direct Pydantic model population
  • 12 providers — OpenAI, Claude, Google, Groq, Grok, Azure, Ollama, LM Studio, OpenRouter, HuggingFace, AirLLM, and generic HTTP
  • TOON input conversion — 45-60% token savings when sending structured data via Token-Oriented Object Notation
  • Stepwise extraction — Per-field prompts with smart type coercion (shorthand numbers, multilingual booleans, dates)
  • Field registry — 50+ predefined extraction fields with template variables and Pydantic integration
  • Conversations — Stateful multi-turn sessions with sync and async support
  • Tool use — Function calling and streaming across supported providers, with automatic prompt-based simulation for models without native tool support
  • Caching — Built-in response cache with memory, SQLite, and Redis backends
  • Plugin system — Register custom drivers via entry points
  • Usage tracking — Token counts and cost calculation on every call
  • Auto-repair — Optional second LLM pass to fix malformed JSON
  • Batch testing — Spec-driven suites to compare models side by side
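
Prompture's auto-repair runs a second LLM pass; the class of problem it targets can be illustrated with a deterministic sketch (for illustration only, not Prompture's implementation) that handles two frequent failure modes: markdown code fences around the JSON and trailing commas.

```python
import json
import re

def repair_json(text: str) -> dict:
    """Best-effort cleanup of common LLM JSON mistakes before parsing."""
    # Strip markdown code fences the model may have wrapped around the JSON
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())
    # Remove trailing commas before a closing brace or bracket
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return json.loads(text)

broken = '```json\n{"name": "Maria", "age": 32,}\n```'
print(repair_json(broken))  # {'name': 'Maria', 'age': 32}
```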

Built With Prompture

Projects powered by Prompture at their core:

  • CachiBot — AI-powered bot built on Prompture's structured extraction and multi-provider driver system
  • AgentSite — Agent-driven web platform using Prompture for LLM orchestration and structured output

Installation

pip install prompture

Optional extras:

pip install prompture[redis]     # Redis cache backend
pip install prompture[serve]     # FastAPI server mode
pip install prompture[airllm]    # AirLLM local inference

Configuration

Set API keys for the providers you use. Prompture reads from environment variables or a .env file:

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=...
GROQ_API_KEY=...
GROK_API_KEY=...
OPENROUTER_API_KEY=...
AZURE_OPENAI_ENDPOINT=...
AZURE_OPENAI_API_KEY=...

Local providers (Ollama, LM Studio) work out of the box with no keys required.

Runtime API Keys (No Environment Variables)

Pass API keys at runtime via ProviderEnvironment — useful for multi-tenant apps, web backends, or anywhere you don't want to set os.environ:

from prompture import AsyncAgent, ProviderEnvironment

env = ProviderEnvironment(
    openai_api_key="sk-...",
    claude_api_key="sk-ant-...",
)

agent = AsyncAgent("openai/gpt-4o", env=env)
result = await agent.run("Hello!")

Works on Agent, AsyncAgent, Conversation, and AsyncConversation.

Providers

Model strings use "provider/model" format. The provider prefix routes to the correct driver automatically.
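The routing rule is simple enough to show as a standalone sketch (a re-implementation for illustration, not Prompture's internal code): split on the first slash only, so model names that themselves contain slashes, as with OpenRouter, stay intact.

```python
def split_model_string(model: str) -> tuple[str, str]:
    """Split "provider/model" on the first slash so model names
    containing slashes (e.g. OpenRouter) stay intact."""
    provider, _, name = model.partition("/")
    if not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, name

print(split_model_string("openai/gpt-4o"))                  # ('openai', 'gpt-4o')
print(split_model_string("openrouter/anthropic/claude-2"))  # ('openrouter', 'anthropic/claude-2')
```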

| Provider    | Example model                 | Cost         |
|-------------|-------------------------------|--------------|
| openai      | openai/gpt-4                  | Automatic    |
| claude      | claude/claude-3               | Automatic    |
| google      | google/gemini-1.5-pro         | Automatic    |
| groq        | groq/llama2-70b-4096          | Automatic    |
| grok        | grok/grok-4-fast-reasoning    | Automatic    |
| azure       | azure/deployed-name           | Automatic    |
| openrouter  | openrouter/anthropic/claude-2 | Automatic    |
| ollama      | ollama/llama3.1:8b            | Free (local) |
| lmstudio    | lmstudio/local-model          | Free (local) |
| huggingface | hf/model-name                 | Free (local) |
| http        | http/self-hosted              | Free         |

Usage

One-Shot Pydantic Extraction

Single LLM call, returns a validated Pydantic instance:

from typing import List, Optional
from pydantic import BaseModel
from prompture import extract_with_model

class Person(BaseModel):
    name: str
    age: int
    profession: str
    city: str
    hobbies: List[str]
    education: Optional[str] = None

person = extract_with_model(
    Person,
    "Maria is 32, a software developer in New York. She loves hiking and photography.",
    model_name="openai/gpt-4"
)
print(person.model_dump())

Stepwise Extraction

One LLM call per field. Higher accuracy, per-field error recovery:

from prompture import stepwise_extract_with_model

result = stepwise_extract_with_model(
    Person,
    "Maria is 32, a software developer in New York. She loves hiking and photography.",
    model_name="openai/gpt-4"
)
print(result["model"].model_dump())
print(result["usage"])  # per-field and total token usage

| Aspect         | extract_with_model     | stepwise_extract_with_model |
|----------------|------------------------|-----------------------------|
| LLM calls      | 1                      | N (one per field)           |
| Speed / cost   | Faster, cheaper        | Slower, higher              |
| Accuracy       | Good global coherence  | Higher per-field accuracy   |
| Error handling | All-or-nothing         | Per-field recovery          |

JSON Schema Extraction

For raw JSON output with full control:

from prompture import ask_for_json

schema = {
    "type": "object",
    "required": ["name", "age"],
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
    }
}

result = ask_for_json(
    content_prompt="Extract the person's info from: John is 28 and lives in Miami.",
    json_schema=schema,
    model_name="openai/gpt-4"
)
print(result["json_object"])  # {"name": "John", "age": 28}
print(result["usage"])        # token counts and cost

TOON Input — Token Savings

Analyze structured data with automatic TOON conversion for 45-60% fewer tokens:

from prompture import extract_from_data

products = [
    {"id": 1, "name": "Laptop", "price": 999.99, "rating": 4.5},
    {"id": 2, "name": "Book", "price": 19.99, "rating": 4.2},
    {"id": 3, "name": "Headphones", "price": 149.99, "rating": 4.7},
]

result = extract_from_data(
    data=products,
    question="What is the average price and highest rated product?",
    json_schema={
        "type": "object",
        "properties": {
            "average_price": {"type": "number"},
            "highest_rated": {"type": "string"}
        }
    },
    model_name="openai/gpt-4"
)

print(result["json_object"])
# {"average_price": 389.99, "highest_rated": "Headphones"}

print(f"Token savings: {result['token_savings']['percentage_saved']}%")

Works with Pandas DataFrames via extract_from_pandas().
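The intuition behind the savings: uniform lists of objects repeat every key on every row in JSON, while a tabular encoding states the keys once. A rough illustration (not the actual TOON format) comparing character counts, which track token counts closely:

```python
import json

def toonish(rows: list[dict]) -> str:
    """Encode uniform dicts as a header row plus value rows,
    a rough approximation of Token-Oriented Object Notation."""
    keys = list(rows[0])
    lines = [",".join(keys)]
    lines += [",".join(str(r[k]) for k in keys) for r in rows]
    return "\n".join(lines)

products = [
    {"id": 1, "name": "Laptop", "price": 999.99},
    {"id": 2, "name": "Book", "price": 19.99},
]

plain = json.dumps(products)
compact = toonish(products)
print(len(compact) / len(plain))  # character ratio; shrinks further as rows grow
```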

Field Definitions

Use the built-in field registry for consistent extraction across models:

from pydantic import BaseModel
from prompture import field_from_registry, stepwise_extract_with_model

class Person(BaseModel):
    name: str = field_from_registry("name")
    age: int = field_from_registry("age")
    email: str = field_from_registry("email")
    occupation: str = field_from_registry("occupation")

result = stepwise_extract_with_model(
    Person,
    "John Smith, 25, software engineer at TechCorp, john@example.com",
    model_name="openai/gpt-4"
)

Register custom fields with template variables:

from prompture import register_field

register_field("document_date", {
    "type": "str",
    "description": "Document creation date",
    "instructions": "Use {{current_date}} if not specified",
    "default": "{{current_date}}",
    "nullable": False
})

Conversations

Stateful multi-turn sessions:

from prompture import Conversation

conv = Conversation(model_name="openai/gpt-4")
conv.add_message("system", "You are a helpful assistant.")
response = conv.send("What is the capital of France?")
follow_up = conv.send("What about Germany?")  # retains context

Tool Use

Register Python functions as tools the LLM can call during a conversation:

from prompture import Conversation, ToolRegistry

registry = ToolRegistry()

@registry.tool
def get_weather(city: str, units: str = "celsius") -> str:
    """Get the current weather for a city."""
    return f"Weather in {city}: 22 {units}"

conv = Conversation("openai/gpt-4", tools=registry)
result = conv.ask("What's the weather in London?")

For models without native function calling (Ollama, LM Studio, etc.), Prompture automatically simulates tool use by describing tools in the prompt and parsing structured JSON responses:

# Auto-detect: uses native tool calling if available, simulation otherwise
conv = Conversation("ollama/llama3.1:8b", tools=registry, simulated_tools="auto")

# Force simulation even on capable models
conv = Conversation("openai/gpt-4", tools=registry, simulated_tools=True)

# Disable tool use entirely
conv = Conversation("openai/gpt-4", tools=registry, simulated_tools=False)

The simulation loop describes tools in the system prompt, asks the model to respond with JSON (tool_call or final_answer), executes tools, and feeds results back — all transparent to the caller.
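A minimal version of that loop, with a stubbed model standing in for the LLM (illustrative only; Prompture's real loop also handles retries and malformed replies):

```python
import json

def simulated_tool_loop(model, tools: dict, question: str) -> str:
    """Prompt-based tool loop: the model replies with JSON that is either
    {"tool_call": {...}} or {"final_answer": ...}; tools run locally and
    their results are fed back until a final answer arrives."""
    transcript = [f"Question: {question}"]
    while True:
        reply = json.loads(model("\n".join(transcript)))
        if "final_answer" in reply:
            return reply["final_answer"]
        call = reply["tool_call"]
        result = tools[call["name"]](**call["arguments"])
        transcript.append(f"Tool {call['name']} returned: {result}")

# Stubbed model: first requests the weather tool, then answers.
replies = iter([
    '{"tool_call": {"name": "get_weather", "arguments": {"city": "London"}}}',
    '{"final_answer": "It is 22C in London."}',
])
tools = {"get_weather": lambda city: f"22C in {city}"}
print(simulated_tool_loop(lambda prompt: next(replies), tools, "Weather in London?"))
# It is 22C in London.
```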

Budget Control

Set cost and token limits with policy-based enforcement:

from prompture import AsyncAgent

agent = AsyncAgent(
    "openai/gpt-4o",
    max_cost=0.50,
    budget_policy="hard_stop",       # accepts strings or BudgetPolicy enum
    fallback_models=["openai/gpt-4o-mini"],
)

Policies: "hard_stop" (raise BudgetExceededError when the limit is exceeded), "warn_and_continue" (log a warning and proceed), "degrade" (automatically switch to a cheaper fallback model at 80% of budget).

Provider Utilities

Extract provider info from model strings:

from prompture import provider_for_model, parse_model_string

provider_for_model("claude/claude-sonnet-4-6")                  # "claude"
provider_for_model("claude/claude-sonnet-4-6", canonical=True)  # "anthropic"
parse_model_string("openai/gpt-4o")                             # ("openai", "gpt-4o")

Model Discovery

Auto-detect available models from configured providers:

from prompture import get_available_models

models = get_available_models()
for model in models:
    print(model)  # "openai/gpt-4", "ollama/llama3:latest", ...

Logging and Debugging

import logging
from prompture import configure_logging

configure_logging(logging.DEBUG)

Response Shape

All extraction functions return a consistent structure:

{
    "json_string": str,       # raw JSON text
    "json_object": dict,      # parsed result
    "usage": {
        "prompt_tokens": int,
        "completion_tokens": int,
        "total_tokens": int,
        "cost": float,
        "model_name": str
    }
}
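Because every call returns the same shape, aggregating spend across calls is straightforward. A small consumer-side helper (your own code, not a Prompture API):

```python
def total_usage(results: list[dict]) -> dict:
    """Sum token counts and cost across several extraction results
    that follow the response shape above."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0, "cost": 0.0}
    for r in results:
        for key in totals:
            totals[key] += r["usage"][key]
    return totals

calls = [
    {"usage": {"prompt_tokens": 120, "completion_tokens": 30, "total_tokens": 150,
               "cost": 0.0021, "model_name": "openai/gpt-4"}},
    {"usage": {"prompt_tokens": 80, "completion_tokens": 20, "total_tokens": 100,
               "cost": 0.0014, "model_name": "openai/gpt-4"}},
]
print(total_usage(calls))
```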

CLI

prompture run <spec-file>

Run spec-driven extraction suites for cross-model comparison.

Integrating Prompture into Your Project

FastAPI + AsyncAgent with Tools

The most common integration pattern — an AI chat endpoint with database-backed tools:

from fastapi import APIRouter, Depends, HTTPException
from prompture import AsyncAgent, ToolRegistry, ProviderEnvironment, BudgetExceededError

router = APIRouter()

def build_tools(db) -> ToolRegistry:
    registry = ToolRegistry()

    @registry.tool
    async def search_records(query: str) -> str:
        """Search the database for matching records."""
        results = await db.execute(...)
        return format_results(results)

    return registry

@router.post("/chat")
async def chat(message: str, db=Depends(get_db)):
    env = ProviderEnvironment(openai_api_key=get_api_key_from_db(db))

    agent = AsyncAgent(
        "openai/gpt-4o",
        env=env,
        tools=build_tools(db),
        system_prompt="You are a helpful assistant with database access.",
        max_cost=0.25,
        budget_policy="hard_stop",
    )

    try:
        result = await agent.run(message)
        return {"reply": result.output_text, "usage": result.usage}
    except BudgetExceededError:
        raise HTTPException(status_code=429, detail="Cost limit exceeded")

SSE Streaming Endpoint

Stream responses via Server-Sent Events:

import json

from fastapi.responses import StreamingResponse
from prompture import AsyncAgent, StreamEventType

@router.post("/chat/stream")
async def chat_stream(message: str):
    agent = AsyncAgent("claude/claude-sonnet-4-6", env=env, system_prompt="...")

    async def event_stream():
        async for event in agent.run_stream(message):
            match event.event_type:
                case StreamEventType.text_delta:
                    yield f"data: {json.dumps({'type': 'text', 'content': event.data})}\n\n"
                case StreamEventType.tool_call:
                    yield f"data: {json.dumps({'type': 'tool_call', 'name': event.data['name']})}\n\n"
                case StreamEventType.output:
                    yield f"data: {json.dumps({'type': 'done'})}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")

Structured Extraction in Endpoints

Use AsyncConversation.ask_for_json() for one-shot structured data extraction:

from prompture import AsyncConversation

@router.get("/insights")
async def get_insights():
    conv = AsyncConversation("openai/gpt-4o", system_prompt="You analyze data.")
    result = await conv.ask_for_json(
        f"Analyze this data and produce insights:\n\n{context}",
        {"type": "object", "properties": {
            "insights": {"type": "array", "items": {"type": "object", ...}},
            "summary": {"type": "string"},
        }},
    )
    return result["json_object"]

Error Handling

Key exceptions to catch in production:

from prompture import BudgetExceededError, DriverError, ExtractionError, ValidationError

try:
    result = await agent.run(message)
except BudgetExceededError:
    # Cost or token limit exceeded — return 429
    pass
except DriverError:
    # Provider API error (auth, rate limit, network) — return 502
    pass
except ExtractionError:
    # JSON parsing/validation failed — return 422
    pass
except ValidationError:
    # Schema validation failed — return 422
    pass

Development

# Install with dev dependencies
pip install -e ".[test,dev]"

# Run tests
pytest

# Run integration tests (requires live LLM access)
pytest --run-integration

# Lint and format
ruff check .
ruff format .

Contributing

PRs welcome. Please add tests for new functionality and examples under examples/ for new drivers or patterns.

License

MIT
