AI-powered PDF to HTML conversion using mistral-ocr and pixtral-12b
Project description
Mistral OCR Processor
A command-line tool for processing PDF documents with Mistral OCR and generating accessible HTML output. This tool is adapted from a Google Colab notebook to run locally on your machine.
Quick Start
Several scripts are provided to help you get started:
mistral_ocr.py- The main script for processing PDFsexample.py- An interactive example that guides you through the optionstest_local_pdf.py- A test script for processing a local PDF filetest_url_pdf.py- A test script for processing a PDF from a URL
To run the interactive example:
python example.py
To test with a local PDF:
python test_local_pdf.py
To test with a PDF from a URL:
python test_url_pdf.py
Features
- Process local PDF files or download from URLs
- OCR processing with Mistral OCR
- Generate accessible alt text for images using Pixtral 12B
- Convert to WCAG-compliant accessible HTML
- Enhance tables with proper accessibility features
- Save output as HTML file
- Option to open result in browser
Requirements
- Python 3.6+
- Mistral API key
- Required Python packages:
- mistralai
- requests
Installation
-
Clone the repository:
git clone https://github.com/mystique920/ai-powered-pdf2html cd ai-powered-pdf2html -
Ensure you have Python installed
-
Install required packages using pip:
pip install -r requirements.txtOr install packages individually:
pip install mistralai requests -
Make the script executable (optional):
chmod +x mistral_ocr.py
Usage
Basic Usage
Process a local PDF file:
python mistral_ocr.py --file path/to/your/document.pdf
Process a PDF from a URL:
python mistral_ocr.py --url https://example.com/document.pdf
Command-line Options
--file,-f: Path to local PDF file--url,-u: URL to PDF file--api-key,-k: Mistral API key--output,-o: Output HTML file path (default: output.html)--max-images,-m: Maximum number of images to process (default: all)--open-browser,-b: Open the output HTML in browser after processing
Examples
Process a local file with a custom API key and open in browser:
python mistral_ocr.py --file document.pdf --api-key YOUR_API_KEY --open-browser
Process a PDF from URL and save to a custom output file:
python mistral_ocr.py --url https://example.com/document.pdf --output result.html
Process a file but limit image processing to 5 images:
python mistral_ocr.py --file document.pdf --max-images 5
API Key Setup
The script requires a valid Mistral API key to function. There are two ways to provide the API key:
-
Create a
.envfile in the same directory as the script with the following content:MISTRAL_API_KEY=your_api_key_here -
Provide the API key directly using the
--api-keycommand-line argument:python mistral_ocr.py --file document.pdf --api-key your_api_key_here
Notes
- Processing large PDFs may take some time
- Image alt text generation uses the Pixtral 12B model
- The HTML output is designed to be WCAG-compliant for accessibility
- You can limit the number of images processed to save API usage
- The code for this application is based on a public Google Colab notebook
- The original code is from this repository: https://github.com/coldplazma/Accessible-OCR-Mistral-
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pdf2html_ai-0.1.0.tar.gz.
File metadata
- Download URL: pdf2html_ai-0.1.0.tar.gz
- Upload date:
- Size: 13.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2f901077a88d994144bc1a20d0786aee7a021a5f18ece1d6c84d356e4ff89fd3
|
|
| MD5 |
44a07e0b04b81f0812751c4b2413796b
|
|
| BLAKE2b-256 |
b19f7e55d16b226d26b4e62b06d5b443579b146daec4c869647f52abfe80dc54
|
File details
Details for the file pdf2html_ai-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pdf2html_ai-0.1.0-py3-none-any.whl
- Upload date:
- Size: 12.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a38561e428e441cc02c0d9474018273ded7e3192f10c1d7fc1cc21ea25157f61
|
|
| MD5 |
d937f130be10845b044586fb1fb0efb7
|
|
| BLAKE2b-256 |
1be3f89c9422e226c30f63175f34fd0e3086f3af7abcf1f866ccc03399c352f7
|