llama-index readers nougat_ocr integration

Nougat OCR loader

pip install llama-index-readers-nougat-ocr

This loader reads the equations, symbols, and tables included in the PDF.

Provide the path to the academic PDF you want to parse. The OCR understands LaTeX math and tables.

Usage

Here's an example of using PDFNougatOCR.

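from pathlib import Path

from llama_index.readers.nougat_ocr import PDFNougatOCR

reader = PDFNougatOCR()

# Path to the PDF to parse
pdf_path = Path("/path/to/pdf")

# Run the OCR and load the parsed content as documents
documents = reader.load_data(pdf_path)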
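
Before handing a path to the reader, it can help to confirm that it points to an existing PDF file. The following is a minimal sketch only: `validate_pdf_path` is a hypothetical helper written for illustration and is not part of this package.

```python
from pathlib import Path


def validate_pdf_path(path_like) -> Path:
    """Hypothetical helper (not part of the package).

    Returns the input as a Path, or raises if the file does not have a
    .pdf extension or does not exist on disk.
    """
    path = Path(path_like)
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got: {path.name}")
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    return path
```

The validated path can then be passed to `reader.load_data(...)` in the same way as `pdf_path` in the usage example.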

Miscellaneous

An output folder is created, containing a file with the same name as the PDF and a .mmd extension.
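
To inspect the generated Markdown directly, you can work out where the .mmd file should be. This is a sketch under stated assumptions: `expected_mmd_path` is a hypothetical helper, and it assumes the folder is literally named `output` and sits in the current working directory, which may not match your setup.

```python
from pathlib import Path


def expected_mmd_path(pdf_path, output_dir="output") -> Path:
    """Hypothetical helper: guess where the .mmd file for a PDF would land.

    Assumption: the output folder is named "output" and is relative to the
    current working directory. Pass a different output_dir if yours differs.
    """
    return Path(output_dir) / f"{Path(pdf_path).stem}.mmd"


mmd_path = expected_mmd_path("/path/to/paper.pdf")
if mmd_path.is_file():
    # Preview the first 500 characters of the generated Markdown
    print(mmd_path.read_text(encoding="utf-8")[:500])
```

The helper only builds a path; it does not run the OCR or check that the reader wrote the file there.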

