Open Xtract
Open-source framework that extracts structured data from PDFs. Bring your own OCR or LLM and extend to any file type.
Features
- Model-agnostic – simple adapter API works with any OCR engine or large language model.
- PDF-first ingestion – layout-aware parsing produces clean, tokenized text.
- Cited retrieval – vector search with reranked answers and inline citations.
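The README does not document the adapter API, and version 0.1.0 only exposes `main()`. As a rough illustration of what "model-agnostic" could mean in practice, here is a minimal sketch of an adapter interface. Every name in it (`TextExtractor`, `Utf8Extractor`, `run_extraction`) is hypothetical and is not part of `open_xtract`.

```python
from typing import Protocol


class TextExtractor(Protocol):
    """Hypothetical adapter interface: turns raw document bytes into text."""

    def extract_text(self, data: bytes) -> str:
        ...


class Utf8Extractor:
    """Toy adapter that decodes bytes as UTF-8; a stand-in for a real OCR engine."""

    def extract_text(self, data: bytes) -> str:
        return data.decode("utf-8", errors="replace")


def run_extraction(extractor: TextExtractor, data: bytes) -> list[str]:
    """Run any adapter and return whitespace-separated tokens."""
    text = extractor.extract_text(data)
    # str.split() with no arguments collapses runs of whitespace.
    return text.split()


tokens = run_extraction(Utf8Extractor(), b"Invoice  total: 42\n")
```

Because `run_extraction` depends only on the `extract_text` method, a different engine could be swapped in by writing another class with the same method, without changing the calling code.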
Installation
pip install open-xtract
Quick Start
from open_xtract import main
main() # prints a greeting for now
CLI
open-xtract
License
MIT
File details
Details for the file open_xtract-0.1.0.tar.gz.
File metadata
- Download URL: open_xtract-0.1.0.tar.gz
- Upload date:
- Size: 1.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | efebc0862977cf24d0027bcb4cae3a2f4f7a9a07b62fc3ddc0fb9924a2929b6a |
| MD5 | 787858065c51b0b2442224d77c504262 |
| BLAKE2b-256 | 3e4269c02aea8b17ec3129ce1cbc9f47c6224bea41d1175338c1c92c9b824a66 |
File details
Details for the file open_xtract-0.1.0-py3-none-any.whl.
File metadata
- Download URL: open_xtract-0.1.0-py3-none-any.whl
- Upload date:
- Size: 1.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1748cba899c30a835e5e6b90a79ba28cc123f27e15f0c79945e4b840a09f73c3 |
| MD5 | 43c46f35db80171a341e3c10b6e931a4 |
| BLAKE2b-256 | d1777bdaba6d4c5bc4a0ecd329124266714635ff10a0bf666ebef0d571101bc0 |
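The published digests can be used to check a downloaded file before installing it. The sketch below uses only the standard library `hashlib` module. The two constants are copied from the SHA256 rows listed for the source distribution and the wheel; the helper names `sha256_of` and `matches_published_digest` are illustrative, and the local file path is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published for the 0.1.0 release files.
SDIST_SHA256 = "efebc0862977cf24d0027bcb4cae3a2f4f7a9a07b62fc3ddc0fb9924a2929b6a"
WHEEL_SHA256 = "1748cba899c30a835e5e6b90a79ba28cc123f27e15f0c79945e4b840a09f73c3"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected_hex.lower()
```

For example, `matches_published_digest("open_xtract-0.1.0.tar.gz", SDIST_SHA256)` should return `True` for an unmodified copy of the source distribution saved in the current directory.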