9 projects
pdftext
Extract structured text from PDFs quickly.
lift-pdf
Extract structured data from PDFs and images by passing a schema.
surya-ocr
OCR, layout, reading order, and table recognition in 90+ languages.
datalab-python-sdk
SDK for the Datalab document intelligence API.
chandra-ocr
OCR model that converts documents to markdown, HTML, or JSON.
marker-pdf
Convert documents to markdown with high speed and accuracy.
tabled-pdf
Detect and recognize tables in PDFs and images.
texify
OCR for LaTeX images.
streamlit-drawable-canvas-jsretry
A Streamlit custom component for a free drawing canvas using Fabric.js. A fork that enables retrying for background images.