OpenAI and Azure OpenAI provider plugin for LangExtract

LangExtract OpenAI Plugin

OpenAI and Azure OpenAI provider plugin for LangExtract that enables structured data extraction using OpenAI's language models.

Features

  • OpenAI Support: Direct integration with OpenAI's API
  • Azure OpenAI Support: Full support for Azure OpenAI deployments
  • Structured Output: JSON and YAML format support
  • Parallel Processing: Efficient batch processing with configurable concurrency
  • Plugin System: Seamless integration with LangExtract's provider system

Structure

langextract-openai/
├── pyproject.toml                    # Package configuration and metadata
├── README.md                         # This file
├── langextract_openai/              # Package directory
│   ├── __init__.py                  # Package initialization and exports
│   └── openai_providers.py         # OpenAI and Azure OpenAI providers
├── examples/                        # Usage examples
│   └── usage_examples.ipynb        # Jupyter notebook with examples
└── LICENSE

Provider Implementations

  • OpenAI (OpenAILanguageModel): Direct OpenAI API integration
  • Azure OpenAI (AzureOpenAILanguageModel): Azure OpenAI service integration (inherits from the OpenAI implementation)

Package Configuration (pyproject.toml)

[project.entry-points."langextract.providers"]
openai = "langextract_openai:OpenAILanguageModel"
azure_openai = "langextract_openai:AzureOpenAILanguageModel"

These entry points register both provider classes under the langextract.providers group, which allows LangExtract to discover them automatically once the package is installed.

Installation

Prerequisites

First install the latest LangExtract from source:

git clone https://github.com/google/langextract.git
cd langextract
pip install -e .

Install Plugin

# Clone this plugin
git clone <this-repo-url>
cd langextract-openai

# Install in development mode
pip install -e .

# Run the example notebook
# (ensure you have Jupyter installed: pip install jupyter)
jupyter notebook examples/usage_examples.ipynb

Quick Start

OpenAI

import langextract as lx

# Extract structured data with OpenAI
result = lx.extract(
    text_or_documents="John Smith is a software engineer at Tech Corp.",
    model_id="gpt-4o-mini",
    api_key="your-openai-api-key",
    prompt_description="Extract person's name, job title, and company",
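    # Few-shot examples are passed as lx.data.ExampleData objects
    examples=[
        lx.data.ExampleData(
            text="Jane Doe works as a data scientist at DataCorp.",
            extractions=[
                lx.data.Extraction(
                    extraction_class="person",
                    extraction_text="Jane Doe",
                    attributes={"job_title": "data scientist", "company": "DataCorp"},
                ),
            ],
        ),
    ],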
)

# Tip: If there are multiple providers matching your model_id in your environment,
# you can disambiguate by explicitly specifying the provider name:
# result = lx.extract(
#     text_or_documents="...",
#     model_id="gpt-4o-mini",
#     api_key="...",
#     provider="OpenAILanguageModel",
#     prompt_description="...",
# )

Azure OpenAI

import langextract as lx

# Extract with Azure OpenAI
result = lx.extract(
    text_or_documents="John Smith is a software engineer at Tech Corp.",
    model_id="azure:your-deployment-name",
    api_key="your-azure-api-key",
    azure_endpoint="https://your-resource.openai.azure.com",
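    prompt_description="Extract person's name, job title, and company",
    # lx.extract requires at least one example
    examples=[lx.data.ExampleData(
        text="Jane Doe works as a data scientist at DataCorp.",
        extractions=[lx.data.Extraction(
            extraction_class="person",
            extraction_text="Jane Doe",
            attributes={"job_title": "data scientist", "company": "DataCorp"},
        )],
    )],
)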

Environment Variables

Set these environment variables for the examples and tests:

  • OPENAI_API_KEY: Your OpenAI API key
  • AZURE_OPENAI_API_KEY: Your Azure OpenAI API key
  • AZURE_OPENAI_ENDPOINT: Your Azure OpenAI endpoint URL
  • AZURE_OPENAI_DEPLOYMENT: Your Azure deployment name (optional, defaults to 'gpt-4o-mini')
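The variables above can be turned into the Azure arguments from the Quick Start with a small helper. This is a sketch under stated assumptions: azure_extract_kwargs is a hypothetical name, and the README does not say whether the plugin reads these variables itself, so the helper passes the values explicitly.

```python
import os


def azure_extract_kwargs(env=None) -> dict:
    """Build the Azure-specific keyword arguments shown in the Quick Start example."""
    env = os.environ if env is None else env
    required = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    # AZURE_OPENAI_DEPLOYMENT is optional; fall back to the documented default.
    deployment = env.get("AZURE_OPENAI_DEPLOYMENT") or "gpt-4o-mini"
    return {
        "model_id": f"azure:{deployment}",
        "api_key": env["AZURE_OPENAI_API_KEY"],
        "azure_endpoint": env["AZURE_OPENAI_ENDPOINT"],
    }
```

The result can be unpacked into the call, for example lx.extract(text_or_documents=..., prompt_description=..., examples=..., **azure_extract_kwargs()). Failing early on a missing key or endpoint gives a clearer error than a rejected request.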

Release

  1. Bump version in pyproject.toml under [project] version.

  2. Build and upload to PyPI:

python -m pip install --upgrade build twine
rm -rf dist build *.egg-info
python -m build
twine upload dist/*

Optional: Upload to TestPyPI first:

twine upload --repository testpypi dist/*

Optional: Tag the release in git:

git tag vX.Y.Z
git push origin vX.Y.Z

Notes:

  • Use a PyPI API token (username: __token__, password: your token), or configure ~/.pypirc.
  • Ensure you have a clean tree and tests/examples pass before publishing.

License

Apache License 2.0
