ollamaocr with Llama vision

Ollama OCR

A Python-based OCR tool leveraging the Llama 3.2-Vision model for highly accurate text recognition from images, preserving original formatting and structure.

Features

  • 🚀 High Accuracy: Text recognition powered by the Llama 3.2-Vision model.
  • 📝 Preserves Formatting: Maintains the original structure and layout of the recognized text.
  • 🖼️ Wide Format Support: Works with image formats such as .jpg, .jpeg, and .png.
  • ⚡️ Customizable Output: Returns results in either Markdown or JSON format.
  • 💪 Robust Error Handling: Ensures smooth processing with clear error messages for unsupported formats or invalid configurations.

System Requirements

  • Python 3.8 or higher
  • Ollama Server running locally
  • Llama 3.2-Vision model installed

Prerequisites

  1. Ensure the Ollama server is running before using the tool.
  2. Download and configure the Llama 3.2-Vision model for OCR tasks.
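
Both prerequisites can be checked from Python before running OCR. The sketch below is not part of ollamaocr. It assumes the Ollama server listens on its default local address (http://localhost:11434) and that the model is installed under the tag llama3.2-vision; the function names are hypothetical.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default local address
MODEL_NAME = "llama3.2-vision"         # assumption: the model tag as installed in Ollama


def has_model(tags_payload: dict, model_name: str) -> bool:
    """Return True if an /api/tags payload lists model_name, with or without a tag suffix."""
    for entry in tags_payload.get("models", []):
        name = entry.get("name", "")
        # Installed names look like "llama3.2-vision:latest"; compare the part before ":".
        if name == model_name or name.split(":", 1)[0] == model_name:
            return True
    return False


def check_prerequisites(base_url: str = OLLAMA_URL, model_name: str = MODEL_NAME) -> str:
    """Describe whether the server is reachable and the model is installed."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
            payload = json.load(response)
    except (urllib.error.URLError, OSError) as exc:
        return f"Ollama server not reachable at {base_url}: {exc}"
    if not has_model(payload, model_name):
        return f"Server is up, but '{model_name}' is not installed"
    return "Ready"
```

Calling print(check_prerequisites()) before creating an OllamaOCR instance gives a readable message when either prerequisite is missing.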

Installation

pip install ollamaocr-python

Usage

Basic Usage

from ollamaocr import OllamaOCR

# Initialize the OCR tool
ocr = OllamaOCR()

# Perform OCR in Markdown format
markdown_result = ocr.perform_ocr("path/to/image.jpg", output_format="markdown")
print(markdown_result)

# Perform OCR in JSON format
json_result = ocr.perform_ocr("path/to/image.jpg", output_format="json")
print(json_result)
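
This README does not say whether the JSON result is returned as a string or as an already parsed object, and vision models sometimes wrap JSON in Markdown code fences. The helper below is a defensive sketch under those assumptions; parse_ocr_json is a hypothetical name, not part of ollamaocr.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker


def parse_ocr_json(result):
    """Turn an OCR result into a Python object, tolerating Markdown code fences."""
    if isinstance(result, (dict, list)):
        return result  # already parsed, nothing to do
    text = result.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag) and the closing fence.
        lines = text.splitlines()[1:]
        if lines and lines[-1].strip().startswith(FENCE):
            lines = lines[:-1]
        text = "\n".join(lines)
    return json.loads(text)
```

If the text is still not valid JSON, json.loads raises json.JSONDecodeError, which the caller can catch and treat as a failed extraction.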

Error Handling

The class reports unsupported image formats and invalid configurations with clear error messages. In the example below, passing an unsupported .bmp file is expected to raise a ValueError:

from ollamaocr import OllamaOCR

ocr = OllamaOCR()

try:
    result = ocr.perform_ocr("invalid_file.bmp", output_format="markdown")
except ValueError as e:
    print(f"Error: {e}")
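
When processing many files, it can be simpler to filter out unsupported formats up front instead of catching an error per file. The sketch below uses the three extensions listed in this README. Treating the extension check as case-insensitive is an assumption, and both function names are hypothetical.

```python
from pathlib import Path

# Taken from the supported formats listed in this README.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def is_supported_image(path: str) -> bool:
    """Return True if the file extension is one of the supported image formats."""
    # Assumption: extensions are compared case-insensitively.
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(paths):
    """Split paths into (supported, skipped) lists, preserving input order."""
    supported = [p for p in paths if is_supported_image(p)]
    skipped = [p for p in paths if not is_supported_image(p)]
    return supported, skipped
```

Only the supported list would then be passed to perform_ocr, while the skipped list can be logged or converted to a supported format first.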

Customizable Prompts

Modify the prompts used for OCR to suit specific requirements:

  • Markdown Prompt: Preserves formatting in Markdown structure.
  • JSON Prompt: Outputs results in JSON format.
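
This README does not show how the prompts are changed inside ollamaocr, so the sketch below does not use the package's API. It only illustrates how a format-specific prompt and an image could be combined into a request body for Ollama's /api/generate endpoint. The prompt texts and the build_request function are hypothetical.

```python
import base64

# Hypothetical prompt texts for illustration; the package's real prompts are not shown in this README.
PROMPTS = {
    "markdown": "Extract all text from this image and return it as Markdown, "
                "preserving headings, lists, and tables.",
    "json": "Extract all text from this image and return it as a JSON object.",
}


def build_request(image_bytes: bytes, output_format: str, model: str = "llama3.2-vision") -> dict:
    """Build a request body for Ollama's /api/generate endpoint."""
    if output_format not in PROMPTS:
        raise ValueError(f"Unsupported output format: {output_format!r}")
    payload = {
        "model": model,
        "prompt": PROMPTS[output_format],
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }
    if output_format == "json":
        payload["format"] = "json"  # asks Ollama to constrain the reply to valid JSON
    return payload
```

Editing an entry in PROMPTS changes what the model is asked to do without touching the request logic, which is the general idea behind a customizable prompt.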

Limitations

Currently supports only .jpg, .jpeg, and .png image formats. Requires the Ollama server to be running locally with the Llama 3.2-Vision model installed.

License

This project is licensed under the MIT License.

Download files

Source Distribution

ollamaocr-python-0.1.1.tar.gz (4.3 kB)

Uploaded Source

Built Distribution

ollamaocr_python-0.1.1-py3-none-any.whl (4.5 kB)

Uploaded Python 3

File details

Details for the file ollamaocr-python-0.1.1.tar.gz.

File metadata

  • Download URL: ollamaocr-python-0.1.1.tar.gz
  • Upload date:
  • Size: 4.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.3

File hashes

Hashes for ollamaocr-python-0.1.1.tar.gz
Algorithm Hash digest
SHA256 c83c5686d782a343b78fda0838861b3c1e14af6805c1fc74371686c7edc7bbcf
MD5 ae5e3a4b9641d06e918748de73d9fe25
BLAKE2b-256 2f5b4d17e9b3f8921c605d0c38c0d5fa7592ffc9f8f8ecc8863fddeb41f3e207
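
A downloaded file can be checked against its published SHA256 digest with Python's standard hashlib module. This is a minimal sketch; the function names are hypothetical, and the file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, expected_hex would be the SHA256 value listed above; a False result means the file differs from the one that was published.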

File details

Details for the file ollamaocr_python-0.1.1-py3-none-any.whl.

File hashes

Hashes for ollamaocr_python-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 fd40a1fcd4e3ffdeafdaffa2bcce25b89c20a32e26b098fb5b402e52c61b20b2
MD5 3410c30c44caa56ce52e59ef1ec68957
BLAKE2b-256 11acb55206e7ef554197e2d6a6a14662c3384911bfc662bdd05c117883868380
