llama-index readers nougat_ocr integration

Project description

Nougat OCR loader

pip install llama-index-readers-nougat-ocr

This loader uses Nougat OCR to read academic PDFs, including the equations, symbols, and tables they contain.

Pass it the path of the academic PDF you want to parse. The OCR understands LaTeX math and tables.

Usage

Here's an example of using PDFNougatOCR.

from pathlib import Path

from llama_index.readers.nougat_ocr import PDFNougatOCR

# Create the reader.
reader = PDFNougatOCR()

# Path to the PDF to parse.
pdf_path = Path("/path/to/pdf")

documents = reader.load_data(pdf_path)
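
As a minimal sketch, the call above could be wrapped in a helper that checks the path before handing it to the reader. The helper name load_pdf_with_nougat and its checks are illustrative assumptions, not part of the package. Only PDFNougatOCR and load_data come from the example above.

```python
from pathlib import Path


def load_pdf_with_nougat(pdf_path):
    """Validate the path, then parse the PDF with PDFNougatOCR.

    Hypothetical helper: the name and the checks are illustrative only.
    """
    pdf_path = Path(pdf_path)
    if not pdf_path.is_file():
        raise FileNotFoundError(f"No PDF found at {pdf_path}")
    if pdf_path.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got {pdf_path.name}")

    # Imported here so the path checks above run before the package is needed.
    from llama_index.readers.nougat_ocr import PDFNougatOCR

    return PDFNougatOCR().load_data(pdf_path)
```

Calling load_pdf_with_nougat("/path/to/pdf") should return the same documents as the example above, assuming the package and its OCR dependencies are installed. A missing or non-PDF path fails early with a clear error.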

Miscellaneous

The loader creates an output folder containing a file with the same name as the PDF and a .mmd extension.
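
The following sketch shows one way to locate and read that .mmd file after a run. The directory name "output" and both helper names are assumptions for illustration. The text above only states that the file shares the PDF's name and has a .mmd extension.

```python
from pathlib import Path
from typing import Optional


def expected_mmd_path(pdf_path, output_dir="output"):
    """Return where the .mmd file for pdf_path is expected to be.

    The default directory name "output" is an assumption.
    """
    return Path(output_dir) / f"{Path(pdf_path).stem}.mmd"


def read_mmd(pdf_path, output_dir="output") -> Optional[str]:
    """Return the .mmd text for pdf_path, or None if no file exists yet."""
    mmd_path = expected_mmd_path(pdf_path, output_dir)
    if not mmd_path.is_file():
        return None
    return mmd_path.read_text(encoding="utf-8")
```

If the output lands somewhere else in your setup, pass that directory as output_dir.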

Download files

Source Distribution

llama_index_readers_nougat_ocr-0.2.0.tar.gz (2.5 kB)

File details

Details for the file llama_index_readers_nougat_ocr-0.2.0.tar.gz.

File hashes

Hashes for llama_index_readers_nougat_ocr-0.2.0.tar.gz
Algorithm Hash digest
SHA256 979a097999a5e03c80deeb5afd088af6bf6eeb2e3eed246853c006b832a10c6e
MD5 a36698b408e05059151234293fb37d59
BLAKE2b-256 077cc574f8baa60be6f1858ee66cb35a51bdad80486eb79b9266937326d0a3c0
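
A downloaded file can be checked against a published digest before installing it. The sketch below uses only Python's standard hashlib module. The function names are illustrative and not part of the package.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_published_hash("llama_index_readers_nougat_ocr-0.2.0.tar.gz", "<SHA256 value listed above>") should return True for an unmodified download of the source distribution.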

File details

Details for the file llama_index_readers_nougat_ocr-0.2.0-py3-none-any.whl.

File hashes

Hashes for llama_index_readers_nougat_ocr-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d709f6bd465819dff0138e92f0d75f6069fa1e401cab1c2b0f0799da32be6e26
MD5 690a21806edd10a67e471484bd02f447
BLAKE2b-256 5394876ada55237874b5dd022f2f66382b145082136c9903a5297d7e631e7228
