A Python library for reliably extracting structured JSON from text and validating it with Pydantic models.

Project description

LLMStruct

llmstruct is a Python library for reliably extracting structured JSON from text and validating it with Pydantic models.

Motivation

Many developers prefer to interact directly with an LLM provider's Python module (e.g., anthropic, openai) to maintain flexibility and avoid heavy abstractions. While this approach is powerful, it introduces the recurring challenge of parsing structured data, like JSON, from the model's often unstructured text responses.

LLM outputs can be noisy—containing anything from conversational filler to malformed JSON. Building a resilient extraction and validation pipeline for every project is tedious.

llmstruct addresses this by offering a lightweight, focused solution. It allows you to keep your direct API access while providing a reliable mechanism to extract structured data from text and validate it against Pydantic models. This handles the messy parts of data extraction, letting you focus on your core application logic.

Features

Extracts JSON objects or arrays of JSON objects from messy text.
Validates extracted JSON against Pydantic models.
Resilient to surrounding text and other noise.
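
To make the idea behind these features concrete, here is a minimal standard-library sketch of pulling JSON objects and arrays out of noisy text. It is not llmstruct's implementation: the helper name `extract_json_values` is hypothetical, and the scanning strategy (try `json.JSONDecoder.raw_decode` at every `{` or `[`) is only one assumed way such extraction could work.

```python
import json


def extract_json_values(text: str) -> list:
    """Return every top-level JSON object or array found in `text`, in order."""
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    while pos < len(text):
        # Jump to the next character that could open an object or an array.
        starts = [i for i in (text.find("{", pos), text.find("[", pos)) if i != -1]
        if not starts:
            break
        start = min(starts)
        try:
            value, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here; keep scanning
            continue
        values.append(value)
        pos = end  # skip the decoded span so nested values are not reported twice
    return values


noisy = (
    "Sure! Here is the hero you asked for:\n"
    '{"real_name": "Elena Rodriguez", "powers": ["Teleportation"]}\n'
    'And a list, as requested: [{"id": 1}, {"id": 2}] Hope that helps {not json}.'
)
found = extract_json_values(noisy)
print(found)
```

The sketch only locates and decodes JSON. Validating the decoded values against a Pydantic model is a separate step that the library performs for you.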

Installation

Install the library from PyPI using pip:

pip install llmstruct

Or add it to your project using uv:

uv add llmstruct

Usage

Here is a quick overview of how to use llmstruct:

from pydantic import BaseModel
from anthropic import Anthropic
from llmstruct import extract_structure_from_text

# Define your data structure using Pydantic
class Superhero(BaseModel):
    real_name: str
    cover_name: str
    origin: str
    interests: tuple[str, ...]
    powers: tuple[str, ...]

# Create a client for your LLM provider
# Note: this requires the ANTHROPIC_API_KEY environment variable to be set
client = Anthropic()

# Create a prompt that asks for JSON data
prompt = f"""
Generate two random superheroes.
Explain how they are similar and different, and how they met.
Then, write the heroes' data as a JSON array,
where each object conforms to this schema:
{Superhero.model_json_schema()}
"""

# Get the response from the LLM
message = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
).content[0].text


# Extract the JSON from the response text
result = extract_structure_from_text(message, Superhero)

# Now you can use the parsed and validated objects
for hero in result.parsed_objects:
    print(hero.model_dump_json(indent=2))

This will output something like the following (the model's response varies between runs):

{
  "real_name": "Elena Rodriguez",
  "cover_name": "Quantum Shift",
  "origin": "Brilliant physicist exposed to experimental quantum energy",
  "interests": [
    "Theoretical physics",
    "Urban community development",
    "Martial arts"
  ],
  "powers": [
    "Teleportation",
    "Time Control",
    "Telekinesis"
  ]
}
{
  "real_name": "Jack Harper",
  "cover_name": "Stormweaver",
  "origin": "Electrical engineer transformed by freak lightning strike",
  "interests": [
    "Weather science",
    "Rock climbing",
    "Emergency response"
  ],
  "powers": [
    "Flight",
    "Super Speed",
    "Telekinesis"
  ]
}

Note that we did not have to ask the LLM for JSON only. In fact, the prompt first asks for priming context (how the heroes compare and how they met) before the JSON array.
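
To experiment with this prose-then-JSON pattern without an API key, the following standard-library sketch parses a canned response. Everything in it is an illustrative assumption rather than part of llmstruct: the `Hero` dataclass stands in for a Pydantic model, `parse_heroes` is a hypothetical helper, and taking the outermost `[...]` span is a deliberately naive way to locate the array.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Hero:
    real_name: str
    cover_name: str


def parse_heroes(response: str) -> tuple[list[Hero], list[str]]:
    """Split a prose-plus-JSON response into valid heroes and error messages."""
    # Naive location step: assume the outermost [...] span is the JSON array.
    start, end = response.find("["), response.rfind("]")
    if start == -1 or end < start:
        return [], ["no JSON array found"]
    try:
        items = json.loads(response[start : end + 1])
    except json.JSONDecodeError as exc:
        return [], [f"invalid JSON: {exc.msg}"]

    heroes, errors = [], []
    expected = {f.name for f in fields(Hero)}
    for index, item in enumerate(items):
        if not isinstance(item, dict) or set(item) != expected:
            errors.append(f"item {index}: expected keys {sorted(expected)}")
        elif not all(isinstance(v, str) for v in item.values()):
            errors.append(f"item {index}: all fields must be strings")
        else:
            heroes.append(Hero(**item))
    return heroes, errors


response = """Quantum Shift and Stormweaver met during a blackout and teamed up.

[
  {"real_name": "Elena Rodriguez", "cover_name": "Quantum Shift"},
  {"real_name": "Jack Harper", "cover_name": "Stormweaver"},
  {"real_name": "Unknown"}
]

Let me know if you want more heroes!"""

heroes, errors = parse_heroes(response)
for hero in heroes:
    print(hero)
print(errors)
```

The surrounding prose is ignored, the two complete records become typed objects, and the incomplete third record is reported instead of silently accepted.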

Demos

The demos directory contains more examples of how to use this library. Note that these demos use Anthropic's API, so they need an API key for the client to be instantiated.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details

Release history

0.1.1 (this version): Jun 9, 2025
0.1.0: Jun 6, 2025

Download files

Source Distribution

llmstruct-0.1.1.tar.gz (8.1 kB)

Uploaded Jun 9, 2025 Source

Built Distribution

llmstruct-0.1.1-py3-none-any.whl (6.8 kB)

Uploaded Jun 9, 2025 Python 3

File details

Details for the file llmstruct-0.1.1.tar.gz.

File metadata

Download URL: llmstruct-0.1.1.tar.gz
Upload date: Jun 9, 2025
Size: 8.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.4.27

File hashes

Hashes for llmstruct-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`b6a4bf79e70725e00d7a1e33ab4af3a466718f5c2b093b1a4c8e5ad9be5c7af5`
MD5	`dd5574044b80fc7f59d775d2de2c7dce`
BLAKE2b-256	`2bf5d24f2d7348dbc969a46ef4efd53dc5682d21d555b1b8785947ae0e8913fd`

File details

Details for the file llmstruct-0.1.1-py3-none-any.whl.

File metadata

Download URL: llmstruct-0.1.1-py3-none-any.whl
Upload date: Jun 9, 2025
Size: 6.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.4.27

File hashes

Hashes for llmstruct-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3632c8e69157f7fc316c63d0592f46f3b6c6324975ec2071dc0c7e1cb2a7899d`
MD5	`845b4589699833a765989e3c7854a3b0`
BLAKE2b-256	`1fdeea0bccf7dd2c521f84280710b177fcadb76df4f10295441fd0ab1e1de2ac`
