Skip to main content

ollamaocr with Llama vision

Project description

Ollama OCR

A Python-based OCR tool leveraging the Llama 3.2-Vision model for highly accurate text recognition from images, preserving original formatting and structure.

Features

  • 🚀 High Accuracy: Text recognition powered by the Llama 3.2-Vision model.
  • 📝 Preserves Formatting: Maintains the original structure and layout of the recognized text.
  • 🖼️ Wide Format Support: Works with image formats such as .jpg, .jpeg, and .png.
  • ⚡️ Customizable Output: Returns results in either Markdown or JSON format.
  • 💪 Robust Error Handling: Ensures smooth processing with clear error messages for unsupported formats or invalid configurations.

System Requirements

  • Python 3.8 or higher
  • Ollama Server running locally
  • Llama 3.2-Vision model installed

Prerequisites

  1. Ensure the Ollama server is running before using the tool.
  2. Download and configure the Llama 3.2-Vision model for OCR tasks.

Instalation

pip install ollamaocr-python

Usage

Basic Usage

from ollamaocr import OllamaOCR

# Initialize the OCR tool
ocr = OllamaOCR()

# Perform OCR in Markdown format
markdown_result = ocr.perform_ocr("path/to/image.jpg", output_format="markdown")
print(markdown_result)

# Perform OCR in JSON format
json_result = ocr.perform_ocr("path/to/image.jpg", output_format="json")
print(json_result)

Error Handling

The class provides comprehensive error handling for unsupported formats or invalid configurations:

from ollamaocr import OllamaOCR

ocr = OllamaOCR()

try:
    result = ocr.perform_ocr("invalid_file.bmp", output_format="markdown")
except ValueError as e:
    print(f"Error: {e}")

Customizable Prompts

Modify the prompts used for OCR to suit specific requirements:

  • Markdown Prompt: Preserves formatting in Markdown structure.
  • JSON Prompt: Outputs results in JSON format.

Limitations

Currently supports only .jpg, .jpeg, and .png image formats. Requires the Ollama server to be running locally with the Llama 3.2-Vision model installed.

License

This project is licensed under the MIT License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ollamaocr-python-0.1.0.tar.gz (4.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ollamaocr_python-0.1.0-py3-none-any.whl (4.5 kB view details)

Uploaded Python 3

File details

Details for the file ollamaocr-python-0.1.0.tar.gz.

File metadata

  • Download URL: ollamaocr-python-0.1.0.tar.gz
  • Upload date:
  • Size: 4.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.3

File hashes

Hashes for ollamaocr-python-0.1.0.tar.gz
Algorithm Hash digest
SHA256 2ae018778ec97ede97b5e6d24f047f23fb6eab4330c7e941567b58f5dd566fda
MD5 bd8f5cc8763dad7667aef679bf2754dc
BLAKE2b-256 d2eabbbb172bc4751cc6babf196002ee57a17cae1a0b1c6df86f37649a68ace8

See more details on using hashes here.

File details

Details for the file ollamaocr_python-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for ollamaocr_python-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e72055ec8e4cb7eff18b782cdf18e697af8abee88c3c6894deaa72540bf8365a
MD5 38c85f954fd7489a34b718b991203666
BLAKE2b-256 81eb7b6834e0b2033ee86a8796fbdceaecee412f0ca6aa1827af031e6424e9d7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page