Skip to main content

llama-index readers nougat_ocr integration

Project description

Nougat OCR loader

This loader reads the equations, symbols, and tables included in the PDF.

Users can input the path of the academic PDF document file which they want to parse. This OCR understands LaTeX math and tables.

Usage

Here's an example usage of the PDFNougatOCR.

from llama_hub.nougat_ocr import PDFNougatOCR

reader = PDFNougatOCR()

pdf_path = Path("/path/to/pdf")

documents = reader.load_data(pdf_path)

Miscellaneous

An output folder will be created with the same name as the pdf and .mmd extension.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llama_index_readers_nougat_ocr-0.1.3.tar.gz (2.4 kB view hashes)

Uploaded Source

Built Distribution

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page