llama-index readers nougat_ocr integration

Nougat OCR loader

pip install llama-index-readers-nougat-ocr

This loader reads the equations, symbols, and tables included in the PDF.

Pass the path of the academic PDF document you want to parse. Nougat OCR understands LaTeX math and tables.

Usage

Here is an example of using PDFNougatOCR.

from pathlib import Path

from llama_index.readers.nougat_ocr import PDFNougatOCR

reader = PDFNougatOCR()

pdf_path = Path("/path/to/pdf")

documents = reader.load_data(pdf_path)
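
Before handing a path to the reader, it can help to check that the file exists and looks like a PDF. The following is a minimal sketch, not part of the package: `load_pdf_with_nougat` is a hypothetical helper name, and the import is deferred so that the path checks run first.

```python
from pathlib import Path


def load_pdf_with_nougat(pdf_path):
    """Hypothetical helper: validate the path, then call PDFNougatOCR."""
    pdf_path = Path(pdf_path)
    # Reject anything that does not look like a PDF before starting OCR.
    if pdf_path.suffix.lower() != ".pdf":
        raise ValueError(f"expected a .pdf file, got: {pdf_path}")
    if not pdf_path.is_file():
        raise FileNotFoundError(f"no such file: {pdf_path}")
    # Deferred import: only needed once the path has been validated.
    from llama_index.readers.nougat_ocr import PDFNougatOCR

    return PDFNougatOCR().load_data(pdf_path)
```

Calling `load_pdf_with_nougat("/path/to/pdf")` would then fail early with a clear error instead of failing somewhere inside the OCR step.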

Miscellaneous

An output folder is created, containing a file with the same name as the PDF and a .mmd extension.
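
To inspect the raw .mmd text after a run, the file location can be derived from the PDF path. This is only a sketch under assumptions: the folder name `output` (relative to the working directory) and both helper names are hypothetical, so adjust `out_dir` to wherever the file actually appears on your system.

```python
from pathlib import Path


def expected_mmd_path(pdf_path, out_dir="output"):
    """Hypothetical helper: guess where the .mmd file for a PDF is written."""
    # Assumption: the file keeps the PDF's base name and gains a .mmd suffix.
    return Path(out_dir) / f"{Path(pdf_path).stem}.mmd"


def read_mmd_if_present(pdf_path, out_dir="output"):
    """Return the .mmd text, or None if no such file exists."""
    mmd_path = expected_mmd_path(pdf_path, out_dir)
    if not mmd_path.is_file():
        return None
    return mmd_path.read_text(encoding="utf-8")
```

Returning None rather than raising keeps the helper usable as a quick check for whether OCR output is already on disk.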
