Open-source framework that extracts structured data from unstructured data.
OpenXtract
Turn documents into structured data
Open-source toolkit for extracting clean, structured data from text, images, and PDFs.
Installation
pip install open-xtract
# or
uv add open-xtract
Usage
The model string has the form <provider>:<model_string>.
For example: "openai:gpt-5-nano" or "xai:grok-4".
from pydantic import BaseModel
from open_xtract import OpenXtract
class InvoiceData(BaseModel):
    invoice_number: str
    date: str
    total_amount: float
    vendor: str
ox = OpenXtract(model="openai:gpt-5-nano") # or any model
# Extract from text (str)
result = ox.extract("Total: $123.45 on 2025-03-01 from ACME", InvoiceData)
print(result)
# Extract from image (bytes)
with open("/path/to/receipt.png", "rb") as f:
    img_bytes = f.read()
result = ox.extract(img_bytes, InvoiceData)
print(result)
# Extract from PDF (bytes); each page is rendered to an image internally
with open("/path/to/invoice.pdf", "rb") as f:
    pdf_bytes = f.read()
result = ox.extract(pdf_bytes, InvoiceData)
print(result)
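The three calls above pass a `str`, image bytes, and PDF bytes to the same `extract` method. As a rough illustration of how such inputs can be told apart, the sketch below inspects the leading bytes. `sniff_input_kind` is a hypothetical helper written for this example, not OpenXtract's implementation; how the library dispatches internally is not documented here. The only facts it relies on are the standard file signatures: `%PDF-` for PDF, the 8-byte PNG signature, and the JPEG start-of-image marker.

```python
from typing import Union

# Standard file signatures ("magic bytes").
PDF_HEADER = b"%PDF-"
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SOI = b"\xff\xd8\xff"


def sniff_input_kind(data: Union[str, bytes, bytearray]) -> str:
    """Classify an input as "text", "pdf", "image", or "unknown".

    Hypothetical helper for illustration; not an OpenXtract API.
    """
    if isinstance(data, str):
        return "text"
    if isinstance(data, (bytes, bytearray)):
        head = bytes(data[:8])  # the longest signature checked is 8 bytes
        if head.startswith(PDF_HEADER):
            return "pdf"
        if head.startswith(PNG_SIGNATURE) or head.startswith(JPEG_SOI):
            return "image"
        return "unknown"
    raise TypeError(f"unsupported input type: {type(data).__name__}")
```

A check like this only looks at the header, so it says nothing about whether the rest of the file is valid.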
Advanced Features
Model Configuration
# Use any OpenAI-compatible model
ox = OpenXtract(model="openrouter:qwen/qwen3-max")
ox = OpenXtract(model="xai:grok-4")
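To make the `<provider>:<model_string>` convention concrete, here is a minimal sketch of how such a string could be split. `split_model_string` is a hypothetical helper written for illustration; it is not part of OpenXtract, and the library's own parsing may differ. It splits on the first colon only, so model names that contain slashes or further colons stay intact.

```python
def split_model_string(model: str) -> tuple[str, str]:
    """Split "<provider>:<model_string>" into (provider, model_string).

    Hypothetical helper for illustration; not an OpenXtract API.
    """
    # partition() splits at the first ":" and leaves the remainder untouched.
    provider, sep, name = model.partition(":")
    if not sep or not provider or not name:
        raise ValueError(f"expected '<provider>:<model_string>', got {model!r}")
    return provider, name


print(split_model_string("openai:gpt-5-nano"))          # ('openai', 'gpt-5-nano')
print(split_model_string("openrouter:qwen/qwen3-max"))  # ('openrouter', 'qwen/qwen3-max')
```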
Features
- Extract structured data from text, images, and PDFs
- Model-agnostic (works with any OpenAI-compatible API)
- Simple, clean API
Contributing
See CONTRIBUTING.md for contribution guidelines.
License
MIT - see LICENSE.
Built with ❤️ by Mellow AI
File details
Details for the file open_xtract-0.1.2.tar.gz.
File metadata
- Download URL: open_xtract-0.1.2.tar.gz
- Upload date:
- Size: 115.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 933c3457809217bb7b38086b4f2d8c061cd5d34df04543b50f231114122dda88 |
| MD5 | a75f7948377aeb698776e6dabf992623 |
| BLAKE2b-256 | 0397bf27542415625407a6a0cd5f3824a0874c6b29907a2d4f96adf86bedb439 |
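A downloaded file can be checked against the SHA256 digest listed above using only the Python standard library. This is a minimal sketch: `sha256_of_file` and `matches_digest` are hypothetical helper names, and the local path in the usage comment is an assumption about where the archive was saved.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())


# SHA256 digest published for open_xtract-0.1.2.tar.gz
EXPECTED = "933c3457809217bb7b38086b4f2d8c061cd5d34df04543b50f231114122dda88"

# Usage (the path is an assumption about your download location):
# print(matches_digest("open_xtract-0.1.2.tar.gz", EXPECTED))
```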
File details
Details for the file open_xtract-0.1.2-py3-none-any.whl.
File metadata
- Download URL: open_xtract-0.1.2-py3-none-any.whl
- Upload date:
- Size: 5.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f06e257eef77b74f2c345ca75196502f6849d163b388f977d224fd0e96016d5f |
| MD5 | 29208832836e2898d522588441d8460e |
| BLAKE2b-256 | a09d404385817a7e4695b052f2b220b046c217714ce0045e3cfa18d3cc6731ee |