# Prompture

**Structured JSON extraction from any LLM. Schema-enforced, Pydantic-native, multi-provider.**
Prompture is a Python library that turns LLM responses into validated, structured data. Define a schema or Pydantic model, point it at any provider, and get typed output back — with token tracking, cost calculation, and automatic JSON repair built in.
```python
from pydantic import BaseModel
from prompture import extract_with_model

class Person(BaseModel):
    name: str
    age: int
    profession: str

person = extract_with_model(Person, "Maria is 32, a developer in NYC.", model_name="openai/gpt-4")
print(person.name)  # Maria
```
## Key Features
- Structured output — JSON schema enforcement and direct Pydantic model population
- 12 providers — OpenAI, Claude, Google, Groq, Grok, Azure, Ollama, LM Studio, OpenRouter, HuggingFace, AirLLM, and generic HTTP
- TOON input conversion — 45-60% token savings when sending structured data via Token-Oriented Object Notation
- Stepwise extraction — Per-field prompts with smart type coercion (shorthand numbers, multilingual booleans, dates)
- Field registry — 50+ predefined extraction fields with template variables and Pydantic integration
- Conversations — Stateful multi-turn sessions with sync and async support
- Tool use — Function calling and streaming across supported providers, with automatic prompt-based simulation for models without native tool support
- Caching — Built-in response cache with memory, SQLite, and Redis backends
- Plugin system — Register custom drivers via entry points
- Usage tracking — Token counts and cost calculation on every call
- Auto-repair — Optional second LLM pass to fix malformed JSON
- Batch testing — Spec-driven suites to compare models side by side
## Built With Prompture
Projects powered by Prompture at their core:
- CachiBot — AI-powered bot built on Prompture's structured extraction and multi-provider driver system
- AgentSite — Agent-driven web platform using Prompture for LLM orchestration and structured output
## Installation

```bash
pip install prompture
```

Optional extras:

```bash
pip install prompture[redis]    # Redis cache backend
pip install prompture[serve]    # FastAPI server mode
pip install prompture[airllm]   # AirLLM local inference
```
## Configuration

Set API keys for the providers you use. Prompture reads from environment variables or a `.env` file:

```bash
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=...
GROQ_API_KEY=...
GROK_API_KEY=...
OPENROUTER_API_KEY=...
AZURE_OPENAI_ENDPOINT=...
AZURE_OPENAI_API_KEY=...
```
Local providers (Ollama, LM Studio) work out of the box with no keys required.
### Runtime API Keys (No Environment Variables)

Pass API keys at runtime via `ProviderEnvironment` — useful for multi-tenant apps, web backends, or anywhere you don't want to set `os.environ`:

```python
from prompture import AsyncAgent, ProviderEnvironment

env = ProviderEnvironment(
    openai_api_key="sk-...",
    claude_api_key="sk-ant-...",
)

agent = AsyncAgent("openai/gpt-4o", env=env)
result = await agent.run("Hello!")
```
Works on `Agent`, `AsyncAgent`, `Conversation`, and `AsyncConversation`.
## Providers

Model strings use the `"provider/model"` format. The provider prefix routes to the correct driver automatically.

| Provider | Example Model | Cost |
|---|---|---|
| `openai` | `openai/gpt-4` | Automatic |
| `claude` | `claude/claude-3` | Automatic |
| `google` | `google/gemini-1.5-pro` | Automatic |
| `groq` | `groq/llama2-70b-4096` | Automatic |
| `grok` | `grok/grok-4-fast-reasoning` | Automatic |
| `azure` | `azure/deployed-name` | Automatic |
| `openrouter` | `openrouter/anthropic/claude-2` | Automatic |
| `ollama` | `ollama/llama3.1:8b` | Free (local) |
| `lmstudio` | `lmstudio/local-model` | Free (local) |
| `huggingface` | `hf/model-name` | Free (local) |
| `http` | `http/self-hosted` | Free |
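The routing rule can be sketched as a first-slash split, so nested model paths such as `openrouter/anthropic/claude-2` keep their full model portion. This is an illustrative stdlib helper, not the library's own `parse_model_string`:

```python
def split_model_string(model: str) -> tuple[str, str]:
    """Split 'provider/model' on the first slash only, so nested
    model paths keep their full model portion."""
    provider, _, model_name = model.partition("/")
    return provider, model_name

print(split_model_string("openai/gpt-4o"))                  # ('openai', 'gpt-4o')
print(split_model_string("openrouter/anthropic/claude-2"))  # ('openrouter', 'anthropic/claude-2')
```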
## Usage

### One-Shot Pydantic Extraction

Single LLM call, returns a validated Pydantic instance:

```python
from typing import List, Optional
from pydantic import BaseModel
from prompture import extract_with_model

class Person(BaseModel):
    name: str
    age: int
    profession: str
    city: str
    hobbies: List[str]
    education: Optional[str] = None

person = extract_with_model(
    Person,
    "Maria is 32, a software developer in New York. She loves hiking and photography.",
    model_name="openai/gpt-4"
)
print(person.model_dump())
```
### Stepwise Extraction

One LLM call per field. Higher accuracy, per-field error recovery:

```python
from prompture import stepwise_extract_with_model

result = stepwise_extract_with_model(
    Person,
    "Maria is 32, a software developer in New York. She loves hiking and photography.",
    model_name="openai/gpt-4"
)
print(result["model"].model_dump())
print(result["usage"])  # per-field and total token usage
```
| Aspect | `extract_with_model` | `stepwise_extract_with_model` |
|---|---|---|
| LLM calls | 1 | N (one per field) |
| Speed / cost | Faster, cheaper | Slower, higher |
| Accuracy | Good global coherence | Higher per-field accuracy |
| Error handling | All-or-nothing | Per-field recovery |
### JSON Schema Extraction

For raw JSON output with full control:

```python
from prompture import ask_for_json

schema = {
    "type": "object",
    "required": ["name", "age"],
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
    }
}

result = ask_for_json(
    content_prompt="Extract the person's info from: John is 28 and lives in Miami.",
    json_schema=schema,
    model_name="openai/gpt-4"
)
print(result["json_object"])  # {"name": "John", "age": 28}
print(result["usage"])        # token counts and cost
```
### TOON Input — Token Savings

Analyze structured data with automatic TOON conversion for 45-60% fewer tokens:

```python
from prompture import extract_from_data

products = [
    {"id": 1, "name": "Laptop", "price": 999.99, "rating": 4.5},
    {"id": 2, "name": "Book", "price": 19.99, "rating": 4.2},
    {"id": 3, "name": "Headphones", "price": 149.99, "rating": 4.7},
]

result = extract_from_data(
    data=products,
    question="What is the average price and highest rated product?",
    json_schema={
        "type": "object",
        "properties": {
            "average_price": {"type": "number"},
            "highest_rated": {"type": "string"}
        }
    },
    model_name="openai/gpt-4"
)
print(result["json_object"])
# {"average_price": 389.99, "highest_rated": "Headphones"}
print(f"Token savings: {result['token_savings']['percentage_saved']}%")
```

Works with Pandas DataFrames via `extract_from_pandas()`.
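The savings come from stating keys once instead of repeating them per record. A rough, self-contained illustration of that idea — a simplified tabular layout, not Prompture's actual TOON encoder, and character counts rather than real tokens:

```python
import json

# JSON repeats every key on every row; a TOON-like layout states keys once.
products = [
    {"id": 1, "name": "Laptop", "price": 999.99, "rating": 4.5},
    {"id": 2, "name": "Book", "price": 19.99, "rating": 4.2},
    {"id": 3, "name": "Headphones", "price": 149.99, "rating": 4.7},
]

as_json = json.dumps(products)

# Simplified tabular form: one header row, then one value row per record.
keys = list(products[0])
tabular = ",".join(keys) + "\n" + "\n".join(
    ",".join(str(row[k]) for k in keys) for row in products
)

saved = 100 * (1 - len(tabular) / len(as_json))
print(f"JSON: {len(as_json)} chars, tabular: {len(tabular)} chars, saved ~{saved:.0f}%")
```

The gap widens with more rows, since the per-record key overhead is eliminated entirely.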
## Field Definitions

Use the built-in field registry for consistent extraction across models:

```python
from pydantic import BaseModel
from prompture import field_from_registry, stepwise_extract_with_model

class Person(BaseModel):
    name: str = field_from_registry("name")
    age: int = field_from_registry("age")
    email: str = field_from_registry("email")
    occupation: str = field_from_registry("occupation")

result = stepwise_extract_with_model(
    Person,
    "John Smith, 25, software engineer at TechCorp, john@example.com",
    model_name="openai/gpt-4"
)
```

Register custom fields with template variables:

```python
from prompture import register_field

register_field("document_date", {
    "type": "str",
    "description": "Document creation date",
    "instructions": "Use {{current_date}} if not specified",
    "default": "{{current_date}}",
    "nullable": False
})
```
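Template variables such as `{{current_date}}` are substituted at extraction time. A minimal sketch of that substitution idea, assuming a simple `{{name}}` placeholder syntax (hypothetical helper, not the library's internals):

```python
import re
from datetime import date

def resolve_template(text: str, variables: dict[str, str]) -> str:
    """Replace {{name}} placeholders with values from `variables`,
    leaving unknown placeholders untouched."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: variables.get(m.group(1), m.group(0)), text)

variables = {"current_date": date.today().isoformat()}
print(resolve_template("Use {{current_date}} if not specified", variables))
```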
## Conversations

Stateful multi-turn sessions:

```python
from prompture import Conversation

conv = Conversation(model_name="openai/gpt-4")
conv.add_message("system", "You are a helpful assistant.")
response = conv.send("What is the capital of France?")
follow_up = conv.send("What about Germany?")  # retains context
```
## Tool Use

Register Python functions as tools the LLM can call during a conversation:

```python
from prompture import Conversation, ToolRegistry

registry = ToolRegistry()

@registry.tool
def get_weather(city: str, units: str = "celsius") -> str:
    """Get the current weather for a city."""
    return f"Weather in {city}: 22 {units}"

conv = Conversation("openai/gpt-4", tools=registry)
result = conv.ask("What's the weather in London?")
```

For models without native function calling (Ollama, LM Studio, etc.), Prompture automatically simulates tool use by describing tools in the prompt and parsing structured JSON responses:

```python
# Auto-detect: uses native tool calling if available, simulation otherwise
conv = Conversation("ollama/llama3.1:8b", tools=registry, simulated_tools="auto")

# Force simulation even on capable models
conv = Conversation("openai/gpt-4", tools=registry, simulated_tools=True)

# Disable tool use entirely
conv = Conversation("openai/gpt-4", tools=registry, simulated_tools=False)
```
The simulation loop describes tools in the system prompt, asks the model to respond with JSON (`tool_call` or `final_answer`), executes tools, and feeds results back — all transparent to the caller.
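One turn of that loop can be sketched as: parse the model's JSON reply, execute the requested tool, or surface the final answer. The message shape below is an assumption for illustration, not Prompture's exact wire format:

```python
import json

def step_simulated_tools(reply_text: str, tools: dict):
    """One turn of a prompt-based tool loop: parse the model's JSON reply,
    run the requested tool, or return the final answer.
    (Illustrative sketch; the JSON shape is assumed, not Prompture's protocol.)"""
    msg = json.loads(reply_text)
    if "tool_call" in msg:
        call = msg["tool_call"]
        result = tools[call["name"]](**call.get("arguments", {}))
        return ("tool_result", result)  # would be fed back to the model
    return ("final_answer", msg["final_answer"])

tools = {"get_weather": lambda city, units="celsius": f"Weather in {city}: 22 {units}"}
kind, value = step_simulated_tools(
    '{"tool_call": {"name": "get_weather", "arguments": {"city": "London"}}}', tools
)
print(kind, value)  # tool_result Weather in London: 22 celsius
```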
## Budget Control

Set cost and token limits with policy-based enforcement:

```python
from prompture import AsyncAgent

agent = AsyncAgent(
    "openai/gpt-4o",
    max_cost=0.50,
    budget_policy="hard_stop",  # accepts strings or BudgetPolicy enum
    fallback_models=["openai/gpt-4o-mini"],
)
```

Policies:

- `"hard_stop"`: raise `BudgetExceededError` when the budget is exceeded
- `"warn_and_continue"`: log a warning and proceed
- `"degrade"`: auto-switch to a cheaper model at 80% of budget
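The `"degrade"` policy's model choice can be sketched as a simple threshold check (illustrative only, assuming the 80% cutoff described above; not Prompture's internals):

```python
def pick_model(primary: str, fallbacks: list[str], spent: float, max_cost: float) -> str:
    """Return the primary model until spend crosses 80% of the budget,
    then degrade to the first fallback."""
    if max_cost > 0 and spent >= 0.8 * max_cost and fallbacks:
        return fallbacks[0]
    return primary

print(pick_model("openai/gpt-4o", ["openai/gpt-4o-mini"], spent=0.10, max_cost=0.50))  # openai/gpt-4o
print(pick_model("openai/gpt-4o", ["openai/gpt-4o-mini"], spent=0.45, max_cost=0.50))  # openai/gpt-4o-mini
```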
## Provider Utilities

Extract provider info from model strings:

```python
from prompture import provider_for_model, parse_model_string

provider_for_model("claude/claude-sonnet-4-6")                  # "claude"
provider_for_model("claude/claude-sonnet-4-6", canonical=True)  # "anthropic"
parse_model_string("openai/gpt-4o")                             # ("openai", "gpt-4o")
```
## Model Discovery

Auto-detect available models from configured providers:

```python
from prompture import get_available_models

models = get_available_models()
for model in models:
    print(model)  # "openai/gpt-4", "ollama/llama3:latest", ...
```
## Logging and Debugging

```python
import logging
from prompture import configure_logging

configure_logging(logging.DEBUG)
```
## Response Shape

All extraction functions return a consistent structure:

```python
{
    "json_string": str,   # raw JSON text
    "json_object": dict,  # parsed result
    "usage": {
        "prompt_tokens": int,
        "completion_tokens": int,
        "total_tokens": int,
        "cost": float,
        "model_name": str
    }
}
```
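Because every call returns the same `usage` shape, aggregating spend across calls is a simple fold over the documented keys. A small sketch (hypothetical helper, not part of the library):

```python
def total_usage(results: list[dict]) -> dict:
    """Sum token counts and cost across a list of Prompture-style results."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0, "cost": 0.0}
    for r in results:
        usage = r["usage"]
        for key in totals:
            totals[key] += usage[key]
    return totals

# Two example results using the documented response shape
calls = [
    {"usage": {"prompt_tokens": 120, "completion_tokens": 30, "total_tokens": 150,
               "cost": 0.0021, "model_name": "openai/gpt-4"}},
    {"usage": {"prompt_tokens": 80, "completion_tokens": 20, "total_tokens": 100,
               "cost": 0.0014, "model_name": "openai/gpt-4"}},
]
print(total_usage(calls)["total_tokens"])  # 250
```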
CLI
prompture run <spec-file>
Run spec-driven extraction suites for cross-model comparison.
## Integrating Prompture into Your Project

### FastAPI + AsyncAgent with Tools

The most common integration pattern — an AI chat endpoint with database-backed tools:

```python
from fastapi import APIRouter, Depends, HTTPException
from prompture import AsyncAgent, ToolRegistry, ProviderEnvironment, BudgetExceededError

router = APIRouter()

def build_tools(db) -> ToolRegistry:
    registry = ToolRegistry()

    @registry.tool
    async def search_records(query: str) -> str:
        """Search the database for matching records."""
        results = await db.execute(...)
        return format_results(results)

    return registry

@router.post("/chat")
async def chat(message: str, db=Depends(get_db)):
    env = ProviderEnvironment(openai_api_key=get_api_key_from_db(db))
    agent = AsyncAgent(
        "openai/gpt-4o",
        env=env,
        tools=build_tools(db),
        system_prompt="You are a helpful assistant with database access.",
        max_cost=0.25,
        budget_policy="hard_stop",
    )
    try:
        result = await agent.run(message)
        return {"reply": result.output_text, "usage": result.usage}
    except BudgetExceededError:
        # FastAPI ignores a (body, status) tuple; raise to set the status code
        raise HTTPException(status_code=429, detail="Cost limit exceeded")
```
### SSE Streaming Endpoint

Stream responses via Server-Sent Events:

```python
import json

from fastapi.responses import StreamingResponse
from prompture import AsyncAgent, StreamEventType

@router.post("/chat/stream")
async def chat_stream(message: str):
    agent = AsyncAgent("claude/claude-sonnet-4-6", env=env, system_prompt="...")

    async def event_stream():
        async for event in agent.run_stream(message):
            match event.event_type:
                case StreamEventType.text_delta:
                    yield f"data: {json.dumps({'type': 'text', 'content': event.data})}\n\n"
                case StreamEventType.tool_call:
                    yield f"data: {json.dumps({'type': 'tool_call', 'name': event.data['name']})}\n\n"
                case StreamEventType.output:
                    yield f"data: {json.dumps({'type': 'done'})}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```
### Structured Extraction in Endpoints

Use `AsyncConversation.ask_for_json()` for one-shot structured data extraction:

```python
from prompture import AsyncConversation

@router.get("/insights")
async def get_insights():
    conv = AsyncConversation("openai/gpt-4o", system_prompt="You analyze data.")
    result = await conv.ask_for_json(
        f"Analyze this data and produce insights:\n\n{context}",
        {"type": "object", "properties": {
            "insights": {"type": "array", "items": {"type": "object", ...}},
            "summary": {"type": "string"},
        }},
    )
    return result["json_object"]
```
## Error Handling

Key exceptions to catch in production:

```python
from prompture import BudgetExceededError, DriverError, ExtractionError, ValidationError

try:
    result = await agent.run(message)
except BudgetExceededError:
    # Cost or token limit exceeded — return 429
    pass
except DriverError:
    # Provider API error (auth, rate limit, network) — return 502
    pass
except ExtractionError:
    # JSON parsing/validation failed — return 422
    pass
except ValidationError:
    # Schema validation failed — return 422
    pass
```
## Development

```bash
# Install with dev dependencies
pip install -e ".[test,dev]"

# Run tests
pytest

# Run integration tests (requires live LLM access)
pytest --run-integration

# Lint and format
ruff check .
ruff format .
```
## Contributing
PRs welcome. Please add tests for new functionality and examples under examples/ for new drivers or patterns.
## License