LLMStruct

llmstruct is a Python library for reliably extracting structured JSON from text and validating it with Pydantic models.

Motivation

Many developers prefer to interact directly with an LLM provider's Python module (e.g., anthropic, openai) to maintain flexibility and avoid heavy abstractions. While this approach is powerful, it introduces the recurring challenge of parsing structured data, like JSON, from the model's often unstructured text responses.

LLM outputs can be noisy—containing anything from conversational filler to malformed JSON. Building a resilient extraction and validation pipeline for every project is tedious.

llmstruct addresses this by offering a lightweight, focused solution. It allows you to keep your direct API access while providing a reliable mechanism to extract structured data from text and validate it against Pydantic models. This handles the messy parts of data extraction, letting you focus on your core application logic.

Features

  • Extracts JSON objects or arrays of JSON objects from messy text.
  • Validates extracted JSON against Pydantic models.
  • Resilient to surrounding text and other noise.
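To make the idea of pulling JSON out of noisy text concrete, here is a minimal standard-library sketch. It is not llmstruct's implementation, which this README does not describe; the helper name find_json_values is hypothetical and only illustrates one possible approach, scanning for an opening brace or bracket and attempting to decode from that position.

```python
import json
from typing import Any


def find_json_values(text: str) -> list[Any]:
    """Return every JSON object or array that can be decoded from `text`.

    Hypothetical helper for illustration; not part of llmstruct.
    """
    decoder = json.JSONDecoder()
    found: list[Any] = []
    pos = 0
    while pos < len(text):
        # Jump to the next character that could open an object or array.
        starts = [i for i in (text.find("{", pos), text.find("[", pos)) if i != -1]
        if not starts:
            break
        start = min(starts)
        try:
            value, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            # Not valid JSON from here; step past this character and retry.
            pos = start + 1
            continue
        found.append(value)
        # Continue after the decoded value so nested items are not reported twice.
        pos = end
    return found


noisy = (
    "Sure! Here are the heroes you asked for [see note]:\n"
    '[{"real_name": "Elena Rodriguez", "powers": ["Teleportation"]}]\n'
    "Let me know if you need anything else."
)
values = find_json_values(noisy)
print(values)
```

The bracketed aside in the first line fails to decode and is skipped, while the array on the second line is returned. A sketch like this only finds JSON that is already well formed; it makes no attempt to repair malformed JSON or to validate the result against a model.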

Installation

Install the library from PyPI using pip:

pip install llmstruct

Or add it to your project using uv:

uv add llmstruct

Usage

Here is a quick overview of how to use llmstruct:

from pydantic import BaseModel
from anthropic import Anthropic
from llmstruct import extract_structure_from_text

# Define your data structure using Pydantic
class Superhero(BaseModel):
    real_name: str
    cover_name: str
    origin: str
    interests: tuple[str, ...]
    powers: tuple[str, ...]

# Create a client for your LLM provider
# Note: this requires the ANTHROPIC_API_KEY environment variable to be set
client = Anthropic()

# Create a prompt that asks for JSON data
prompt = f"""
Generate two random superheroes.
Explain how they are similar and different, and how they met.
Then, write the heroes' data as a JSON array,
where each object conforms to this schema:
{Superhero.model_json_schema()}
"""

# Get the response from the LLM
message = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
).content[0].text


# Extract the JSON from the response text
result = extract_structure_from_text(message, Superhero)

# Now you can use the parsed and validated objects
for hero in result.parsed_objects:
    print(hero.model_dump_json(indent=2))

Because the model's response is not deterministic, the exact output varies from run to run. A sample run printed:

{
  "real_name": "Elena Rodriguez",
  "cover_name": "Quantum Shift",
  "origin": "Brilliant physicist exposed to experimental quantum energy",
  "interests": [
    "Theoretical physics",
    "Urban community development",
    "Martial arts"
  ],
  "powers": [
    "Teleportation",
    "Time Control",
    "Telekinesis"
  ]
}
{
  "real_name": "Jack Harper",
  "cover_name": "Stormweaver",
  "origin": "Electrical engineer transformed by freak lightning strike",
  "interests": [
    "Weather science",
    "Rock climbing",
    "Emergency response"
  ],
  "powers": [
    "Flight",
    "Super Speed",
    "Telekinesis"
  ]
}

Note that we did not have to ask the LLM for JSON only. In fact, the prompt asks the model to write explanatory text first, which serves as priming context before it produces the JSON.
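One detail of the prompt in the usage example: in Pydantic v2, model_json_schema() returns a Python dict, so interpolating it directly into an f-string inserts the dict's Python repr (single-quoted keys) rather than strict JSON. Models usually cope with this, but if you prefer to show the schema as real JSON, json.dumps is one option. The sketch below uses a hand-written, simplified stand-in for the schema dict so that it runs without Pydantic installed; the actual schema Pydantic generates contains more fields.

```python
import json

# Hand-written, simplified stand-in for the dict that
# Superhero.model_json_schema() would return (an assumption for illustration).
schema = {
    "title": "Superhero",
    "type": "object",
    "properties": {
        "real_name": {"type": "string"},
        "powers": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["real_name", "powers"],
}

# Interpolating the dict directly yields its Python repr (single quotes).
as_repr = f"{schema}"

# Serializing it first yields strict JSON (double quotes).
as_json = json.dumps(schema, indent=2)

prompt = f"""
Generate two random superheroes.
Then, write the heroes' data as a JSON array,
where each object conforms to this schema:
{as_json}
"""
print(prompt)
```

With a real model class, the same idea would be json.dumps(Superhero.model_json_schema(), indent=2).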

Demos

The demos directory contains more examples of how to use this library. Note that these demos use Anthropic's API, so they need an API key for the client to be instantiated.

License

This project is licensed under the MIT License - see the LICENSE file for details.
