Extract structured, validated JSON from any LLM (OpenAI, Anthropic, Gemini), with batch extraction, caching, per-field confidence scoring, schema evolution, multi-schema extraction, output transforms, partial extraction, extraction diff, pipeline extraction, and smart auto-retry.


llm-extractor

Extract structured, validated JSON from any LLM.

Run `pip install llm-extractor`, then stop fighting JSON parsing bugs, provider-specific APIs, and silent semantic failures. One unified interface extracts structured data from OpenAI, Anthropic, and Gemini, with automatic retries, semantic rules, and full observability.

The Problem (2026)

Even with native structured outputs, Python developers still hit:

Pain	Reality
Provider fragmentation	OpenAI, Anthropic, Gemini all use different structured output APIs
Semantic failures	Valid JSON with nonsense values (`price: -999`, `email: "not-an-email"`)
Silent failures	Model returns `{}` or truncated object — no error raised
Dumb retries	Most code retries blindly with the same broken prompt
Zero observability	You know it failed but not why or how often

llm-extractor fixes all five.
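To make the "silent failures" and "semantic failures" rows concrete, here is a minimal standard-library sketch that does not use llm-extractor at all. The function name `classify_response` and its labels are assumptions chosen for illustration, not part of the package.

```python
import json

def classify_response(raw: str, required: list[str]) -> str:
    """Label a raw model reply as 'truncated', 'empty', 'incomplete', or 'ok'."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # A reply cut off mid-object does not parse at all.
        return "truncated"
    if not isinstance(data, dict) or not data:
        # '{}' parses fine, so nothing raises unless we check explicitly.
        return "empty"
    if any(key not in data for key in required):
        return "incomplete"
    return "ok"

print(classify_response('{"name": "John", "age": 3', ["name", "age"]))      # truncated
print(classify_response("{}", ["name", "age"]))                             # empty
print(classify_response('{"name": "John"}', ["name", "age"]))               # incomplete
print(classify_response('{"name": "John", "age": -999}', ["name", "age"]))  # ok
```

The last call is labeled `ok` even though the age is nonsense: structural checks alone cannot catch semantic failures, which is the gap semantic rules are meant to cover.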

Installation

pip install llm-extractor                   # core only
pip install "llm-extractor[openai]"         # + OpenAI
pip install "llm-extractor[anthropic]"      # + Anthropic
pip install "llm-extractor[google]"         # + Gemini
pip install "llm-extractor[all]"            # all providers

Quick Start

from llm_extract import extract, Schema, SemanticRule

# 1. Define your output schema
schema = Schema({
    "name": str,
    "age": int,
    "email": str,
    "score": float,
})

# 2. Add semantic rules
schema.add_rule(SemanticRule("age", min_value=0, max_value=150))
schema.add_rule(SemanticRule("score", min_value=0.0, max_value=100.0))
schema.add_rule(SemanticRule("email", pattern=r"^[^@]+@[^@]+\.[^@]+$"))

# 3. Extract structured output — works across all providers
result = extract(
    prompt="Extract info: John Doe, 34 years old, john@example.com, scored 87.5",
    schema=schema,
    provider="openai",          # or "anthropic", "gemini", "auto"
    model="gpt-4o-mini",
    api_key="sk-...",
    max_retries=3,
)

print(result.data)
# {'name': 'John Doe', 'age': 34, 'email': 'john@example.com', 'score': 87.5}

print(result.attempts)   # 1
print(result.provider)   # 'openai'

Pydantic Models

from pydantic import BaseModel
from llm_extract import extract

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool
    tags: list[str]

result = extract(
    prompt="Extract: Blue Widget, costs $29.99, currently available, tagged as gadget and home",
    schema=Product,
    provider="anthropic",
    model="claude-haiku-4-5-20251001",
    api_key="sk-ant-...",
)

product: Product = result.typed_data(Product)
print(product.price)  # 29.99

Semantic Rules

from llm_extract import SemanticRule, Schema

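# "email" is included so the regex rule below targets a field that exists in the schema
schema = Schema({"status": str, "count": int, "ratio": float, "email": str})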

# Enum constraint
schema.add_rule(SemanticRule("status", allowed_values=["active", "inactive", "pending"]))

# Range constraint
schema.add_rule(SemanticRule("count", min_value=0))
schema.add_rule(SemanticRule("ratio", min_value=0.0, max_value=1.0))

# Regex pattern
schema.add_rule(SemanticRule("email", pattern=r"^[^@]+@[^@]+\.[^@]+$"))

# Custom validator function
schema.add_rule(SemanticRule("count", validator=lambda v: v % 2 == 0, message="count must be even"))
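For readers curious how checks like these might be evaluated, the following is a rough standalone sketch. The `Rule` dataclass and `check` function are hypothetical stand-ins written for this example; they mirror the keyword names shown above but are not the library's implementation.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class Rule:
    """Hypothetical stand-in for a semantic rule; not the library's class."""
    field: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    allowed_values: Optional[list] = None
    pattern: Optional[str] = None
    validator: Optional[Callable[[Any], bool]] = None
    message: str = "custom validator failed"

def check(rule: Rule, data: dict) -> list[str]:
    """Return human-readable failures for one rule against one record."""
    if rule.field not in data:
        return [f"{rule.field}: missing"]
    value, failures = data[rule.field], []
    if rule.min_value is not None and value < rule.min_value:
        failures.append(f"{rule.field}: below min_value {rule.min_value}")
    if rule.max_value is not None and value > rule.max_value:
        failures.append(f"{rule.field}: above max_value {rule.max_value}")
    if rule.allowed_values is not None and value not in rule.allowed_values:
        failures.append(f"{rule.field}: not in {rule.allowed_values}")
    if rule.pattern is not None and not re.search(rule.pattern, str(value)):
        failures.append(f"{rule.field}: does not match pattern")
    if rule.validator is not None and not rule.validator(value):
        failures.append(f"{rule.field}: {rule.message}")
    return failures

rules = [
    Rule("status", allowed_values=["active", "inactive", "pending"]),
    Rule("ratio", min_value=0.0, max_value=1.0),
    Rule("count", validator=lambda v: v % 2 == 0, message="count must be even"),
]
record = {"status": "archived", "ratio": 0.4, "count": 3}
failures = [msg for rule in rules for msg in check(rule, record)]
print(failures)
```

Returning a list of messages rather than raising on the first problem means every failure for a record can be reported in a single pass.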

Observability

from llm_extract import extract, ExtractObserver

observer = ExtractObserver()

result = extract(
    prompt="...",
    schema=schema,
    provider="openai",
    model="gpt-4o-mini",
    api_key="...",
    observer=observer,
)

# Per-call report
report = observer.report()
print(report.total_attempts)       # 2
print(report.validation_failures)  # [ValidationFailure(field='age', reason='below min_value 0')]
print(report.raw_responses)        # ['{"age": -5, ...}', '{"age": 34, ...}']
print(report.latency_ms)           # [342, 289]
print(report.tokens_used)          # {'input': 120, 'output': 45}

Multi-Provider Fallback

result = extract(
    prompt="...",
    schema=schema,
    provider="auto",   # tries providers in priority order
    fallback_chain=[
        {"provider": "openai",    "model": "gpt-4o-mini",               "api_key": "sk-..."},
        {"provider": "anthropic", "model": "claude-haiku-4-5-20251001",  "api_key": "sk-ant-..."},
        {"provider": "gemini",    "model": "gemini-1.5-flash",           "api_key": "AIza..."},
    ],
    max_retries=2,
)
print(result.provider)  # whichever succeeded
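The general idea of a fallback chain can be sketched in a few lines of plain Python. This is an illustration only: `first_success` and the stub callables are hypothetical, and how the library combines `max_retries` with the chain is not specified here.

```python
from typing import Callable

def first_success(
    chain: list[tuple[str, Callable[[str], dict]]], prompt: str
) -> tuple[str, dict]:
    """Try each (name, call) pair in order; return the first result that does not raise."""
    errors = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

# Stub callables standing in for real provider clients.
def flaky(prompt: str) -> dict:
    raise TimeoutError("simulated outage")

def healthy(prompt: str) -> dict:
    return {"echo": prompt}

provider, data = first_success([("openai", flaky), ("anthropic", healthy)], "hello")
print(provider, data)  # anthropic {'echo': 'hello'}
```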

Async Support

import asyncio
from llm_extract import aextract

async def main():
    result = await aextract(
        prompt="...",
        schema=schema,
        provider="openai",
        model="gpt-4o-mini",
        api_key="...",
    )
    print(result.data)

asyncio.run(main())
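An async entry point is most useful when several extractions run concurrently. The sketch below shows one common pattern, using a stub coroutine (`fake_aextract`, hypothetical) in place of a real call so that it runs without network access or API keys.

```python
import asyncio

async def fake_aextract(prompt: str) -> dict:
    """Stub coroutine standing in for a real async extraction call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"prompt": prompt, "length": len(prompt)}

async def extract_many(prompts: list[str], limit: int = 2) -> list[dict]:
    """Run extractions concurrently, with at most `limit` in flight."""
    gate = asyncio.Semaphore(limit)

    async def one(prompt: str) -> dict:
        async with gate:
            return await fake_aextract(prompt)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(one(p) for p in prompts))

results = asyncio.run(extract_many(["a", "bb", "ccc"]))
print([r["length"] for r in results])  # [1, 2, 3]
```

The semaphore caps the number of simultaneous requests, which helps keep a large batch within a provider's rate limits.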

Raise on Failure

from llm_extract import extract, ExtractValidationError

try:
    result = extract(..., raise_on_failure=True)
except ExtractValidationError as e:
    print(e.result.failures)   # list of ValidationFailure
    print(e.result.raw)        # last raw LLM response

JSON Schema Input

from llm_extract import extract, Schema

schema = Schema({
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "year":  {"type": "integer"},
        "rating": {"type": "number"}
    },
    "required": ["title", "year", "rating"]
})

result = extract(prompt="...", schema=schema)  # plus provider, model, api_key as in Quick Start
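The type-dict form from Quick Start and the JSON Schema form above describe the same shape. As an illustration of the correspondence, here is a small converter; `to_json_schema` and `TYPE_NAMES` are hypothetical helpers, and treating every field as required is an assumption.

```python
# Mapping from Python types to JSON Schema type names.
TYPE_NAMES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def to_json_schema(fields: dict[str, type]) -> dict:
    """Convert a {name: python_type} mapping into a JSON Schema object."""
    return {
        "type": "object",
        "properties": {name: {"type": TYPE_NAMES[tp]} for name, tp in fields.items()},
        "required": list(fields),  # assumption: every listed field is required
    }

print(to_json_schema({"title": str, "year": int, "rating": float}))
```

For the three fields shown, the result matches the JSON Schema dictionary in the example above.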

OpenAI-Compatible Endpoints

result = extract(
    prompt="...",
    schema=schema,
    provider="openai",
    model="mistral-7b-instruct",
    api_key="your-key",
    base_url="https://your-openai-compatible-endpoint/v1",
)

Why llm-extractor?

Unified API — one interface for OpenAI, Anthropic, Gemini, and any OpenAI-compatible endpoint
Schema-first — define once with dict, pydantic.BaseModel, or JSON Schema
Semantic rules — enforce business logic, not just types
Smart retries — correction prompts tell the model exactly what went wrong
Full observability — every attempt, failure, token count, and latency recorded
Zero magic — no hidden prompt injection, no global state, fully inspectable
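The "smart retries" point can be illustrated with a self-contained sketch of a correction-prompt loop. Everything here is hypothetical: the prompt wording, the helper names, and the stub model are assumptions made for the example, not the library's internals.

```python
def correction_prompt(original: str, failures: list[str]) -> str:
    """Append the concrete validation failures to the original prompt."""
    bullet_list = "\n".join(f"- {f}" for f in failures)
    return (
        f"{original}\n\nYour previous answer was rejected:\n"
        f"{bullet_list}\nReturn corrected JSON only."
    )

def extract_with_retry(prompt, ask, validate, max_attempts=3):
    """Call `ask`, validate the result, and retry with a correction prompt on failure."""
    current = prompt
    for attempt in range(1, max_attempts + 1):
        data = ask(current)
        failures = validate(data)
        if not failures:
            return data, attempt
        current = correction_prompt(prompt, failures)
    raise ValueError(f"still invalid after {max_attempts} attempts: {failures}")

# Stub model: returns a bad age until it sees a correction prompt.
def fake_model(prompt: str) -> dict:
    return {"age": 34} if "rejected" in prompt else {"age": -5}

def validate_age(data: dict) -> list[str]:
    return [] if data["age"] >= 0 else ["age: below min_value 0"]

data, attempts = extract_with_retry("Extract the age.", fake_model, validate_age)
print(data, attempts)  # {'age': 34} 2
```

The contrast with a blind retry is that the second request carries the specific failure text, so the model has something to correct.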

Changelog

v1.2.0 (2026-04-10)

Added Changelog section to README for release traceability
Added advanced extraction utilities: ExtractionCache, RateLimiter, batch_extract, ConfidenceScorer, SchemaEvolver, ExtractionPipeline

v1.1.0

Added ExtractionCache, RateLimiter, batch_extract, ConfidenceScorer, SchemaEvolver, ExtractionPipeline; SEO updates

v1.0.0

Initial release: structured LLM output extraction, multi-provider, semantic validation, auto-retry, observability

License

MIT

Contributing

Contributions are welcome! Here's how to get started:

Fork the repository on GitHub
Create a feature branch: git checkout -b feature/your-feature
Make your changes and add tests
Run the test suite: pytest tests/ -v
Submit a pull request

Please open an issue first for major changes to discuss the approach.

Author

Mahesh Makvana — GitHub · PyPI



Release history

1.2.6 (May 17, 2026)
1.2.5 (May 16, 2026), this version
1.2.0 (Apr 10, 2026)
1.1.0 (Apr 10, 2026)
1.0.0 (Apr 10, 2026)

Download files

Source Distribution

llm_extractor-1.2.5.tar.gz (31.0 kB)

Uploaded May 16, 2026 Source

Built Distribution

llm_extractor-1.2.5-py3-none-any.whl (26.8 kB)

Uploaded May 16, 2026 Python 3

File details

Details for the file llm_extractor-1.2.5.tar.gz.

File metadata

Download URL: llm_extractor-1.2.5.tar.gz
Upload date: May 16, 2026
Size: 31.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for llm_extractor-1.2.5.tar.gz
Algorithm	Hash digest
SHA256	`4cd237d55122173975013c90607b1c8f260b5e9a3aeb46ab283fedb964a40d3d`
MD5	`01199444d81c18703be47f807fc7dc76`
BLAKE2b-256	`a2c2f68410b9ae0102178a4b70683a7724794b87464af2282d00c00beda59f22`

File details

Details for the file llm_extractor-1.2.5-py3-none-any.whl.

File metadata

Download URL: llm_extractor-1.2.5-py3-none-any.whl
Upload date: May 16, 2026
Size: 26.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for llm_extractor-1.2.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`040237354b91765995a273fc7035fabf13a891ac0e4da79fb7ee7d1298b068cc`
MD5	`48c81c613d583eee3aa934b4b3cc5c0a`
BLAKE2b-256	`28544034aa637b0fd1e9cc563c2971612f8703ecfb971f0b99692c95ddb4c6f1`
