ollamaocr with Llama vision
Ollama OCR
A Python-based OCR tool leveraging the Llama 3.2-Vision model for highly accurate text recognition from images, preserving original formatting and structure.
Features
- 🚀 High Accuracy: Text recognition powered by the Llama 3.2-Vision model.
- 📝 Preserves Formatting: Maintains the original structure and layout of the recognized text.
- 🖼️ Image Format Support: Works with .jpg, .jpeg, and .png images.
- ⚡️ Customizable Output: Returns results in either Markdown or JSON format.
- 💪 Robust Error Handling: Ensures smooth processing with clear error messages for unsupported formats or invalid configurations.
System Requirements
- Python 3.8 or higher
- Ollama Server running locally
- Llama 3.2-Vision model installed
Prerequisites
- Ensure the Ollama server is running before using the tool.
- Download and configure the Llama 3.2-Vision model for OCR tasks.
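Both prerequisites can be checked from Python before running OCR. The following is a minimal sketch, not part of this package: it assumes the Ollama server listens on its default address `http://localhost:11434`, that its `/api/tags` endpoint lists locally installed models, and that the model is installed under the tag `llama3.2-vision`. The function names `model_installed` and `check_ollama` are hypothetical helpers.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default local address
MODEL_NAME = "llama3.2-vision"         # assumption: tag the model was pulled under


def model_installed(tags_payload, model_name=MODEL_NAME):
    """Return True if model_name appears in an /api/tags response payload."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    # Installed models are listed with a tag suffix such as ":latest".
    return any(n == model_name or n.startswith(model_name + ":") for n in names)


def check_ollama(base_url=OLLAMA_URL, model_name=MODEL_NAME, timeout=5):
    """Return (server_up, model_present) without raising on connection errors."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Server unreachable, or the response was not valid JSON.
        return False, False
    return True, model_installed(payload, model_name)
```

Calling `check_ollama()` returns a pair of booleans, so a script can stop early with a clear message when the server is down or the model is missing.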
Installation
pip install ollamaocr-python
Usage
Basic Usage
from ollamaocr import OllamaOCR
# Initialize the OCR tool
ocr = OllamaOCR()
# Perform OCR in Markdown format
markdown_result = ocr.perform_ocr("path/to/image.jpg", output_format="markdown")
print(markdown_result)
# Perform OCR in JSON format
json_result = ocr.perform_ocr("path/to/image.jpg", output_format="json")
print(json_result)
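The README does not say whether the JSON result is returned as a string or as parsed data, and vision models sometimes wrap JSON in a Markdown code fence. The sketch below is a defensive way to turn such a result into Python data under either assumption. `parse_ocr_json` is a hypothetical helper, not part of `ollamaocr`.

```python
import json
import re

FENCE = "`" * 3  # Markdown code fence marker


def parse_ocr_json(result):
    """Return Python data from an OCR result that may be parsed data or JSON text."""
    if isinstance(result, (dict, list)):
        return result  # already parsed, nothing to do
    text = str(result).strip()
    # Strip a surrounding Markdown code fence if the model added one.
    fenced = re.match(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", text, flags=re.DOTALL)
    if fenced:
        text = fenced.group(1)
    return json.loads(text)  # raises json.JSONDecodeError on invalid JSON
```

With this helper, `data = parse_ocr_json(json_result)` works whichever form the result takes, and invalid output surfaces as a `json.JSONDecodeError` instead of failing later.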
Error Handling
The class provides comprehensive error handling for unsupported formats or invalid configurations:
from ollamaocr import OllamaOCR
ocr = OllamaOCR()
try:
    result = ocr.perform_ocr("invalid_file.bmp", output_format="markdown")
except ValueError as e:
    print(f"Error: {e}")
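To avoid relying on the exception, a file can be checked against the documented formats before it is passed to `perform_ocr`. This is a minimal sketch. `is_supported_image` is a hypothetical helper, and comparing extensions case-insensitively is an assumption, since the library's own check may be stricter.

```python
from pathlib import Path

# Formats this README lists as supported.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def is_supported_image(path):
    """Return True if the file extension is one of the documented formats."""
    # Assumption: extensions are compared case-insensitively (".JPG" == ".jpg").
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

A caller can then skip or convert unsupported files up front, for example by testing `is_supported_image("invalid_file.bmp")` before calling the OCR tool.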
Customizable Prompts
Modify the prompts used for OCR to suit specific requirements:
- Markdown Prompt: Preserves formatting in Markdown structure.
- JSON Prompt: Outputs results in JSON format.
Limitations
Currently supports only .jpg, .jpeg, and .png image formats. Requires the Ollama server to be running locally with the Llama 3.2-Vision model installed.
License
This project is licensed under the MIT License.
File details
Details for the file ollamaocr-python-0.1.1.tar.gz.
File metadata
- Download URL: ollamaocr-python-0.1.1.tar.gz
- Upload date:
- Size: 4.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c83c5686d782a343b78fda0838861b3c1e14af6805c1fc74371686c7edc7bbcf |
| MD5 | ae5e3a4b9641d06e918748de73d9fe25 |
| BLAKE2b-256 | 2f5b4d17e9b3f8921c605d0c38c0d5fa7592ffc9f8f8ecc8863fddeb41f3e207 |
File details
Details for the file ollamaocr_python-0.1.1-py3-none-any.whl.
File metadata
- Download URL: ollamaocr_python-0.1.1-py3-none-any.whl
- Upload date:
- Size: 4.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fd40a1fcd4e3ffdeafdaffa2bcce25b89c20a32e26b098fb5b402e52c61b20b2 |
| MD5 | 3410c30c44caa56ce52e59ef1ec68957 |
| BLAKE2b-256 | 11acb55206e7ef554197e2d6a6a14662c3384911bfc662bdd05c117883868380 |
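A downloaded file can be checked against the SHA256 digests listed above using only the standard library. The sketch below is one way to do it. `sha256_of_file` and `matches_published_hash` are hypothetical helper names, and the local file path is whatever location the file was saved to.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the wheel, this would be called as `matches_published_hash("ollamaocr_python-0.1.1-py3-none-any.whl", "fd40a1fcd4e3ffdeafdaffa2bcce25b89c20a32e26b098fb5b402e52c61b20b2")`. A `False` result means the file differs from the published one and should not be installed.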