

Open Xtract

Open-source framework that extracts structured data from PDFs. Bring your own OCR or LLM and extend to any file type.

Features

  • Model-agnostic – simple adapter API works with any OCR engine or large language model.
  • PDF-first ingestion – layout-aware parsing produces clean, tokenized text.
  • Cited retrieval – vector search with reranked answers and inline citations.
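The "bring your own OCR or LLM" design in the first bullet can be pictured as two small interfaces that user code implements. This is a minimal sketch under assumptions. `OCREngine`, `LLM`, `Extractor`, `extract_text` and `complete` are hypothetical names, not open-xtract's documented API, which this README does not yet describe.

```python
from dataclasses import dataclass
from typing import Protocol


class OCREngine(Protocol):
    """Hypothetical adapter: turns raw PDF bytes into one text string per page."""

    def extract_text(self, pdf_bytes: bytes) -> list[str]: ...


class LLM(Protocol):
    """Hypothetical adapter: sends a prompt to any language model, returns its reply."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class Extractor:
    """Combines whichever OCR engine and LLM the caller plugs in (illustrative only)."""

    ocr: OCREngine
    llm: LLM

    def extract(self, pdf_bytes: bytes, instruction: str) -> str:
        pages = self.ocr.extract_text(pdf_bytes)
        # Tag each page with its number so the model's answer can cite it.
        context = "\n\n".join(
            f"[page {n}] {text}" for n, text in enumerate(pages, start=1)
        )
        return self.llm.complete(f"{instruction}\n\n{context}")


# Stand-in adapters so the sketch runs without any real OCR or LLM backend.
class FakeOCR:
    def extract_text(self, pdf_bytes: bytes) -> list[str]:
        return ["Invoice #123", "Total: $40"]


class EchoLLM:
    def complete(self, prompt: str) -> str:
        return prompt  # echoes the prompt so the assembled input is visible


extractor = Extractor(ocr=FakeOCR(), llm=EchoLLM())
print(extractor.extract(b"%PDF-1.7", "Find the invoice total."))
```

Using `typing.Protocol` keeps adapters structural: any class with matching method signatures plugs in without inheriting from a library base class.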

Installation

pip install open-xtract

Quick Start

from open_xtract import main

main()  # prints a greeting for now

CLI

open-xtract

License

MIT



Download files


Source Distribution

open_xtract-0.1.0.tar.gz (1.2 kB)

Uploaded Source

Built Distribution


open_xtract-0.1.0-py3-none-any.whl (1.8 kB)

Uploaded Python 3

File details

Details for the file open_xtract-0.1.0.tar.gz.

File metadata

  • Download URL: open_xtract-0.1.0.tar.gz
  • Upload date:
  • Size: 1.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for open_xtract-0.1.0.tar.gz
  • SHA256: efebc0862977cf24d0027bcb4cae3a2f4f7a9a07b62fc3ddc0fb9924a2929b6a
  • MD5: 787858065c51b0b2442224d77c504262
  • BLAKE2b-256: 3e4269c02aea8b17ec3129ce1cbc9f47c6224bea41d1175338c1c92c9b824a66
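A downloaded file can be checked against the published SHA256 digest before it is installed. This is a minimal sketch using only the standard library's `hashlib`. The helper names `sha256_of` and `verify` are illustrative, and the expected digest is the source-distribution value listed above.

```python
import hashlib
from pathlib import Path

# Published SHA256 digest for open_xtract-0.1.0.tar.gz.
EXPECTED_SHA256 = "efebc0862977cf24d0027bcb4cae3a2f4f7a9a07b62fc3ddc0fb9924a2929b6a"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, `verify(Path("open_xtract-0.1.0.tar.gz"))` should return `True` for an intact download. pip can enforce the same check during installation through its hash-checking mode, using a requirements line such as `open-xtract==0.1.0 --hash=sha256:<digest>` together with `--require-hashes`.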


File details

Details for the file open_xtract-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: open_xtract-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 1.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for open_xtract-0.1.0-py3-none-any.whl
  • SHA256: 1748cba899c30a835e5e6b90a79ba28cc123f27e15f0c79945e4b840a09f73c3
  • MD5: 43c46f35db80171a341e3c10b6e931a4
  • BLAKE2b-256: d1777bdaba6d4c5bc4a0ecd329124266714635ff10a0bf666ebef0d571101bc0

