OpenAI and Azure OpenAI provider plugin for LangExtract
LangExtract OpenAI Plugin
OpenAI and Azure OpenAI provider plugin for LangExtract that enables structured data extraction using OpenAI's language models.
Features
- OpenAI Support: Direct integration with OpenAI's API
- Azure OpenAI Support: Full support for Azure OpenAI deployments
- Structured Output: JSON and YAML format support
- Parallel Processing: Efficient batch processing with configurable concurrency
- Plugin System: Seamless integration with LangExtract's provider system
Structure
langextract-openai/
├── pyproject.toml # Package configuration and metadata
├── README.md # This file
├── langextract_openai/ # Package directory
│ ├── __init__.py # Package initialization and exports
│ └── openai_providers.py # OpenAI and Azure OpenAI providers
├── examples/ # Usage examples
│ └── usage_examples.ipynb # Jupyter notebook with examples
└── LICENSE
Provider Implementations
- OpenAI (OpenAILanguageModel): Direct OpenAI API integration
- Azure OpenAI (AzureOpenAILanguageModel): Azure OpenAI service integration (inherits the OpenAI implementation)
Package Configuration (pyproject.toml)
[project.entry-points."langextract.providers"]
openai = "langextract_openai:OpenAILanguageModel"
azure_openai = "langextract_openai:AzureOpenAILanguageModel"
These entry points allow LangExtract to discover both providers automatically once the package is installed.
Installation
Prerequisites
First install the latest LangExtract from source:
git clone https://github.com/google/langextract.git
cd langextract
pip install -e .
Install Plugin
# Clone this plugin
git clone <this-repo-url>
cd langextract-openai
# Install in development mode
pip install -e .
# Run the example notebook
# (ensure you have Jupyter installed: pip install jupyter)
jupyter notebook examples/usage_examples.ipynb
Quick Start
OpenAI
import langextract as lx

# Extract structured data with OpenAI
result = lx.extract(
    text_or_documents="John Smith is a software engineer at Tech Corp.",
    model_id="gpt-4o-mini",
    api_key="your-openai-api-key",
    prompt_description="Extract person's name, job title, and company",
    examples=[{
        "input": "Jane Doe works as a data scientist at DataCorp.",
        "output": {"name": "Jane Doe", "job_title": "data scientist", "company": "DataCorp"}
    }]
)

# Tip: If there are multiple providers matching your model_id in your environment,
# you can disambiguate by explicitly specifying the provider name:
# result = lx.extract(
#     text_or_documents="...",
#     model_id="gpt-4o-mini",
#     api_key="...",
#     provider="OpenAILanguageModel",
#     prompt_description="...",
# )
Azure OpenAI
import langextract as lx

# Extract with Azure OpenAI
result = lx.extract(
    text_or_documents="John Smith is a software engineer at Tech Corp.",
    model_id="azure:your-deployment-name",
    api_key="your-azure-api-key",
    azure_endpoint="https://your-resource.openai.azure.com",
    prompt_description="Extract person's name, job title, and company",
)
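The Azure example addresses a deployment through an `azure:` prefix on `model_id`. The sketch below shows one way such an ID could be split into a provider kind and a deployment name. It is an illustration under assumptions: `split_model_id` is a hypothetical helper, and the plugin's actual routing logic is not shown in this README.

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split "azure:<deployment>" into ("azure", "<deployment>").

    IDs without the "azure:" prefix are treated as plain OpenAI model names.
    Illustration only; this is not the plugin's routing code.
    """
    prefix, sep, rest = model_id.partition(":")
    if sep and prefix == "azure" and rest:
        return "azure", rest
    # Anything else is passed through unchanged.
    return "openai", model_id
```

For instance, `split_model_id("azure:your-deployment-name")` yields the deployment name on its own, while `split_model_id("gpt-4o-mini")` leaves the model name untouched.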
Environment Variables
Set these environment variables for the examples and tests:
- OPENAI_API_KEY: Your OpenAI API key
- AZURE_OPENAI_API_KEY: Your Azure OpenAI API key
- AZURE_OPENAI_ENDPOINT: Your Azure OpenAI endpoint URL
- AZURE_OPENAI_DEPLOYMENT: Your Azure deployment name (optional, defaults to 'gpt-4o-mini')
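A minimal sketch of how the Azure variables above could be turned into the keyword arguments used in the Quick Start. The helper name `azure_extract_kwargs` is hypothetical and the error handling is an assumption. Only the variable names, the `azure:` prefix, and the 'gpt-4o-mini' default come from this README.

```python
import os
from typing import Mapping


def azure_extract_kwargs(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build the Azure-specific keyword arguments from environment variables."""
    required = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    # The deployment name is optional and falls back to 'gpt-4o-mini'.
    deployment = env.get("AZURE_OPENAI_DEPLOYMENT", "gpt-4o-mini")
    return {
        "model_id": f"azure:{deployment}",
        "api_key": env["AZURE_OPENAI_API_KEY"],
        "azure_endpoint": env["AZURE_OPENAI_ENDPOINT"],
    }
```

The result can then be unpacked into the call, for example `lx.extract(text_or_documents=..., prompt_description=..., **azure_extract_kwargs())`, which keeps keys out of source files.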
Release
- Bump version in pyproject.toml under [project] version.
- Build and upload to PyPI:
python -m pip install --upgrade build twine
rm -rf dist build *.egg-info
python -m build
twine upload dist/*
Optional: Upload to TestPyPI first:
twine upload --repository testpypi dist/*
Optional: Tag the release in git:
git tag vX.Y.Z
git push origin vX.Y.Z
Notes:
- Use a PyPI API token (username: __token__, password: your token), or configure ~/.pypirc.
- Ensure you have a clean tree and tests/examples pass before publishing.
License
Apache License 2.0