Extract structured data from text using LLMs with ready-to-use templates

These details have not been verified by PyPI

Project description

🧑‍🍳 Structured Output Cookbook

A powerful Python library and CLI tool for extracting structured data from unstructured text using Large Language Models (LLMs). Transform raw text into clean, validated JSON with predefined templates or custom YAML schemas.

✨ Features

🎯 Predefined Templates: Built-in schemas for common use cases (job descriptions, recipes, etc.)
📝 Custom YAML Schemas: Define your own extraction schemas with simple YAML files
🔧 CLI Interface: Easy-to-use command-line tool for batch processing
🐍 Python API: Programmatic access for integration into your applications
📊 Token Tracking: Monitor API usage and costs
🧪 Schema Validation: Ensure your custom schemas are properly structured
📁 Auto-organized Output: Automatic timestamped file organization

🚀 Quick Start

Using uv (Recommended)

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and install
git clone https://github.com/mazzasaverio/structured-output-cookbook.git
cd structured-output-cookbook
uv sync

# Set your OpenAI API key
export OPENAI_API_KEY="your-api-key-here"

# Run your first extraction
uv run structured-output extract recipe --text "Pasta with tomato sauce: boil pasta, add sauce, serve hot"

Using pip

pip install structured-output-cookbook
export OPENAI_API_KEY="your-api-key-here"
structured-output extract recipe --text "Your recipe text here"

Using Docker

# Build the image
docker build -t structured-output-cookbook .

# Run with your API key
docker run --rm \
  -e OPENAI_API_KEY="your-api-key-here" \
  -v "$(pwd)/data:/app/data" \
  -v "$(pwd)/config:/app/config" \
  structured-output-cookbook \
  extract recipe --text "Pasta with tomato sauce: boil pasta, add sauce, serve hot"

📖 Usage

CLI Commands

# List available predefined templates
structured-output list-templates

# List custom YAML schemas
structured-output list-schemas

# Extract using predefined templates
structured-output extract recipe --input-file examples/recipe.txt
structured-output extract job --text "Software Engineer position at Tech Corp..."

# Extract using custom YAML schemas
structured-output extract-custom news_article --input-file examples/news_article.txt

# Options
structured-output extract recipe \
  --input-file examples/recipe.txt \
  --output my_recipe.json \
  --pretty \
  --no-save
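
If you want to drive the CLI from a script, one option is to assemble the command as an argument list in Python. The sketch below is only an illustration: build_extract_command is a hypothetical helper, not part of the package, and it uses only the flags that appear in the examples above (--input-file, --output, --pretty).

```python
import shlex


def build_extract_command(template, input_file, output=None, pretty=False):
    """Assemble an argv list for `structured-output extract` (hypothetical helper).

    Only flags shown in the CLI examples above are used here.
    """
    cmd = ["structured-output", "extract", template, "--input-file", str(input_file)]
    if output is not None:
        cmd += ["--output", str(output)]
    if pretty:
        cmd.append("--pretty")
    return cmd


cmd = build_extract_command(
    "recipe", "examples/recipe.txt", output="my_recipe.json", pretty=True
)
# Print the shell-quoted form so it can be inspected before running.
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems when file names contain spaces.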

Python API

from structured_output_cookbook import StructuredExtractor, RecipeSchema
from structured_output_cookbook.config import Config

# Initialize
config = Config.from_env()
extractor = StructuredExtractor(config)

# Extract with predefined template
text = "Spaghetti Carbonara: Cook pasta, fry pancetta, mix with eggs..."
result = extractor.extract(text, RecipeSchema)

if result.success:
    print(f"Recipe: {result.data['name']}")
    print(f"Servings: {result.data['servings']}")
else:
    print(f"Error: {result.error}")

# Extract with custom YAML schema
from structured_output_cookbook.utils import SchemaLoader

loader = SchemaLoader("config/schemas")
news_schema = loader.load_schema("news_article")
news_text = "Your news article text here"
result = extractor.extract_with_yaml_schema(news_text, news_schema)
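
In larger programs it can be convenient to turn a failed result into an exception instead of checking result.success at every call site. The following is a minimal sketch of that pattern. It assumes only the three attributes used above (success, data, error); ExtractionFailed and unwrap are hypothetical names, and a SimpleNamespace stands in for a real result so the snippet runs without an API key.

```python
import json
from types import SimpleNamespace


class ExtractionFailed(RuntimeError):
    """Raised when a result reports failure (hypothetical helper, not in the library)."""


def unwrap(result):
    """Return result.data on success, otherwise raise ExtractionFailed."""
    if result.success:
        return result.data
    raise ExtractionFailed(result.error)


# Stand-in object exposing the same attributes as the results shown above.
fake = SimpleNamespace(
    success=True, data={"name": "Carbonara", "servings": 2}, error=None
)
print(json.dumps(unwrap(fake), indent=2))
```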

🎨 Creating Custom Schemas

Create YAML files in the config/schemas/ directory:

# config/schemas/product_review.yaml
name: "Product Review"
description: "Extract structured information from product reviews"

system_prompt: |
  Extract structured information from the following product review.
  Focus on identifying the product name, rating, pros, cons, and overall sentiment.

schema:
  type: object
  properties:
    product_name:
      type: string
      description: "Name of the product being reviewed"
    rating:
      type: number
      minimum: 1
      maximum: 5
      description: "Rating from 1 to 5 stars"
    pros:
      type: array
      items:
        type: string
      description: "Positive aspects mentioned"
    cons:
      type: array
      items:
        type: string
      description: "Negative aspects mentioned"
    sentiment:
      type: string
      enum: ["positive", "negative", "neutral"]
      description: "Overall sentiment"
  required: ["product_name", "rating", "sentiment"]
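
To make the constraints in this schema concrete, here is a small hand-written checker in Python. It is a sketch for illustration only: it is not the library's validator, it handles just the keywords used in the YAML above (type, required, minimum, maximum, enum, items), and the names PRODUCT_REVIEW_SCHEMA and check are made up for this example.

```python
# The "schema" section of product_review.yaml, written as a Python dict.
PRODUCT_REVIEW_SCHEMA = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "rating": {"type": "number", "minimum": 1, "maximum": 5},
        "pros": {"type": "array", "items": {"type": "string"}},
        "cons": {"type": "array", "items": {"type": "string"}},
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
    },
    "required": ["product_name", "rating", "sentiment"],
}

_TYPES = {"string": str, "number": (int, float), "array": list, "object": dict}


def check(data, schema):
    """Return a list of problems; an empty list means the data conforms."""
    errors = []
    for key in schema.get("required", []):
        if key not in data:
            errors.append(f"missing required field: {key}")
    for key, rules in schema.get("properties", {}).items():
        if key not in data:
            continue  # optional field that was not provided
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, _TYPES[rules["type"]]):
            errors.append(f"{key}: expected {rules['type']}")
            continue
        if "minimum" in rules and value < rules["minimum"]:
            errors.append(f"{key}: below minimum {rules['minimum']}")
        if "maximum" in rules and value > rules["maximum"]:
            errors.append(f"{key}: above maximum {rules['maximum']}")
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{key}: must be one of {rules['enum']}")
        if rules["type"] == "array":
            item_type = _TYPES[rules["items"]["type"]]
            if not all(isinstance(item, item_type) for item in value):
                errors.append(f"{key}: every item must be {rules['items']['type']}")
    return errors


review = {"product_name": "Kettle", "rating": 7, "sentiment": "great"}
for problem in check(review, PRODUCT_REVIEW_SCHEMA):
    print(problem)
```

For real projects a full JSON Schema validator is a better fit; the point here is only to show what each keyword in the YAML file restricts.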

🐳 Docker Usage

Development with Docker

# Build development image
docker build -t structured-output-cookbook:dev .

# Run interactive shell
docker run -it --rm \
  -e OPENAI_API_KEY="your-api-key" \
  -v "$(pwd):/app" \
  structured-output-cookbook:dev \
  /bin/bash

# Run specific command
docker run --rm \
  -e OPENAI_API_KEY="your-api-key" \
  -v "$(pwd)/data:/app/data" \
  -v "$(pwd)/config:/app/config" \
  structured-output-cookbook:dev \
  list-templates

Production Deployment

# For production, mount only necessary volumes
docker run -d \
  --name structured-output-service \
  -e OPENAI_API_KEY="your-api-key" \
  -v /path/to/data:/app/data \
  -v /path/to/schemas:/app/config/schemas \
  structured-output-cookbook:latest

🔧 Configuration

Environment Variables

# Required
export OPENAI_API_KEY="your-openai-api-key"

# Optional
export OPENAI_MODEL="gpt-4o-mini"  # Default model
export LOG_LEVEL="INFO"            # Logging level
export MAX_TOKENS=4000             # Response token limit
export TEMPERATURE=0.1             # Model temperature

Configuration File

Create a .env file in your project root:

OPENAI_API_KEY=your-api-key-here
OPENAI_MODEL=gpt-4o-mini
LOG_LEVEL=INFO
MAX_TOKENS=4000
TEMPERATURE=0.1
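
As a rough picture of how these variables could be read and type-converted, here is a standard-library sketch. It is not the library's Config class: Settings and load_settings are hypothetical names, and apart from the model (marked as the default above) the fallback values are simply the example values from this section, assumed here to be defaults.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Typed view of the environment variables listed above (hypothetical)."""

    api_key: str
    model: str
    log_level: str
    max_tokens: int
    temperature: float


def load_settings(env=None):
    """Read settings from a mapping, falling back to os.environ."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("OPENAI_API_KEY is required")
    return Settings(
        api_key=api_key,
        model=env.get("OPENAI_MODEL", "gpt-4o-mini"),
        # Assumed fallbacks: the example values shown in this section.
        log_level=env.get("LOG_LEVEL", "INFO"),
        max_tokens=int(env.get("MAX_TOKENS", "4000")),
        temperature=float(env.get("TEMPERATURE", "0.1")),
    )


settings = load_settings({"OPENAI_API_KEY": "your-api-key-here", "MAX_TOKENS": "2000"})
print(settings.model, settings.max_tokens, settings.temperature)
```

Accepting a plain mapping makes the function easy to exercise in tests without touching the real environment.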

📊 Examples

Check out the examples/ directory for sample inputs and usage patterns:

examples/recipe.txt - Recipe extraction example
examples/job_description.txt - Job posting extraction
examples/news_article.txt - News article analysis
examples/example_usage.py - Python API examples
examples/usage_examples.ipynb - Jupyter notebook with detailed examples

🧪 Testing

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=src/structured_output_cookbook

# Run specific test file
uv run pytest tests/unit/test_extractor.py

# Run integration tests
uv run pytest tests/integration/

🛠️ Development

# Install development dependencies
uv sync --all-extras

# Run linting
uv run ruff check .
uv run black --check .
uv run mypy src/

# Format code
uv run black .
uv run ruff check --fix .

# Install pre-commit hooks
uv run pre-commit install

📈 Performance Tips

Batch Processing: Process multiple files in sequence for better efficiency
Model Selection: Use gpt-4o-mini for cost-effective extraction
Schema Optimization: Keep schemas focused and avoid unnecessary fields
Caching: Results are saved automatically with timestamps, so earlier output can be reused instead of re-running an extraction
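
The batch-processing tip can be sketched as a simple loop over a directory. This is an illustration under stated assumptions, not a built-in feature: extract_directory is a hypothetical helper, and extract_fn is any callable that takes text and returns a JSON-serialisable dict. With the Python API shown earlier, something like lambda text: extractor.extract(text, RecipeSchema).data might be passed in; the stub below keeps the example runnable without an API key.

```python
import json
import tempfile
from pathlib import Path


def extract_directory(input_dir, output_dir, extract_fn, pattern="*.txt"):
    """Run extract_fn on each matching file in sorted order and save JSON results.

    Returns the list of files written, one <stem>.json per input file.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(input_dir).glob(pattern)):
        data = extract_fn(path.read_text(encoding="utf-8"))
        target = output_dir / f"{path.stem}.json"
        target.write_text(
            json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8"
        )
        written.append(target)
    return written


def count_words(text):
    """Stub standing in for a real extraction call."""
    return {"words": len(text.split())}


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "in"
    source.mkdir()
    (source / "a.txt").write_text("boil pasta", encoding="utf-8")
    (source / "b.txt").write_text("add sauce, serve hot", encoding="utf-8")
    for result_file in extract_directory(source, Path(tmp) / "out", count_words):
        print(result_file.name, result_file.read_text(encoding="utf-8"))
```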

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Built with uv for fast dependency management
Powered by OpenAI's language models
Inspired by the need for reliable structured data extraction

📚 Related Projects

Instructor - Structured outputs with function calling
Marvin - AI toolkit for building reliable AI-powered software
Outlines - Structured generation for LLMs

Made with ❤️ by Saverio Mazza

Project details

This version

0.1.2

Jun 29, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

structured_output_cookbook-0.1.2.tar.gz (196.6 kB)

Uploaded Jun 29, 2025 Source

Built Distribution

structured_output_cookbook-0.1.2-py3-none-any.whl (28.6 kB)

Uploaded Jun 29, 2025 Python 3

File details

Details for the file structured_output_cookbook-0.1.2.tar.gz.

File metadata

Download URL: structured_output_cookbook-0.1.2.tar.gz
Upload date: Jun 29, 2025
Size: 196.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for structured_output_cookbook-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`ed78889e5b49b33ddec4763ed16538c25e49cafc2e67f7e7aa49b4c269d69df7`
MD5	`73cfda0e15f18fffd80c420951b22342`
BLAKE2b-256	`3338b23a3ce96607df067d89b1c341868c6531ccc020688edc58ab7e917969fa`

Provenance

The following attestation bundles were made for structured_output_cookbook-0.1.2.tar.gz:

Publisher: release.yml on mazzasaverio/structured-output-cookbook

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: structured_output_cookbook-0.1.2.tar.gz
- Subject digest: ed78889e5b49b33ddec4763ed16538c25e49cafc2e67f7e7aa49b4c269d69df7
- Sigstore transparency entry: 255458565
- Sigstore integration time: Jun 29, 2025
Source repository:
- Permalink: mazzasaverio/structured-output-cookbook@a1058ca1de789b33b4b2a6f0c521899205b9d200
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/mazzasaverio
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a1058ca1de789b33b4b2a6f0c521899205b9d200
- Trigger Event: push

File details

Details for the file structured_output_cookbook-0.1.2-py3-none-any.whl.

File metadata

Download URL: structured_output_cookbook-0.1.2-py3-none-any.whl
Upload date: Jun 29, 2025
Size: 28.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for structured_output_cookbook-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2020672aba0579d8421cfad96ed4d2c1a367e38fc0bd0e85f1a0698bff65ef41`
MD5	`3e4565756ce2a0fc37cf8766257a8ac1`
BLAKE2b-256	`7b6da9f2e253e422b16e96cb1f2c34902c33076013e6c39c3a331adb79b8641a`

Provenance

The following attestation bundles were made for structured_output_cookbook-0.1.2-py3-none-any.whl:

Publisher: release.yml on mazzasaverio/structured-output-cookbook

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: structured_output_cookbook-0.1.2-py3-none-any.whl
- Subject digest: 2020672aba0579d8421cfad96ed4d2c1a367e38fc0bd0e85f1a0698bff65ef41
- Sigstore transparency entry: 255458568
- Sigstore integration time: Jun 29, 2025
Source repository:
- Permalink: mazzasaverio/structured-output-cookbook@a1058ca1de789b33b4b2a6f0c521899205b9d200
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/mazzasaverio
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a1058ca1de789b33b4b2a6f0c521899205b9d200
- Trigger Event: push
