OCR package using Ollama vision language models.

Ollama OCR

An OCR (Optical Character Recognition) package that uses vision language models served through Ollama to extract text from images and PDFs. It is available both as a Python package and as a Streamlit web application.

🌟 Features

Supports PDFs and images (New! 🆕)

Multiple Vision Models Support
- LLaVA 7B: Efficient vision-language model for real-time processing (note: LLaVA can sometimes produce incorrect output)
- Llama 3.2 Vision: Advanced model with high accuracy for complex documents
- Granite3.2-vision: Compact, efficient vision-language model designed for visual document understanding; it extracts content from tables, charts, infographics, plots, diagrams, and more
- Moondream: Small vision-language model designed to run efficiently on edge devices

Multiple Output Formats
- Markdown: Preserves text formatting with headers and lists
- Plain Text: Clean, simple text extraction
- JSON: Structured data format
- Structured: Tables and organized data
- Key-Value Pairs: Extracts labeled information
- Table: Extracts all tabular data

Batch Processing
- Process multiple images in parallel
- Progress tracking for each image
- Image preprocessing (resize, normalize, etc.)

Custom Prompts
- Override the default prompts with custom instructions for text extraction

📦 Package Installation

pip install ollama-ocr

🚀 Quick Start

Prerequisites

1. Install Ollama.
2. Pull the model or models you plan to use:

ollama pull llama3.2-vision:11b
ollama pull granite3.2-vision
ollama pull moondream
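
Before running the package, it can help to confirm that the models above were actually pulled. The following is a minimal sketch, not part of ollama-ocr: it assumes Ollama is listening on its default local address and queries the server's `/api/tags` endpoint, which lists locally available models. The helper names `missing_models` and `fetch_tags` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def missing_models(tags_payload: dict, required: list[str]) -> list[str]:
    """Return the required model names that are absent from an /api/tags payload."""
    installed = {m.get("name", "") for m in tags_payload.get("models", [])}
    missing = []
    for name in required:
        if ":" in name:
            # A fully tagged name must match exactly.
            found = name in installed
        else:
            # A bare name such as "moondream" matches any installed tag of it,
            # for example "moondream:latest".
            found = any(tag.split(":", 1)[0] == name for tag in installed)
        if not found:
            missing.append(name)
    return missing


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 5.0) -> dict:
    """Ask the local Ollama server which models it has pulled."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    wanted = ["llama3.2-vision:11b", "granite3.2-vision", "moondream"]
    try:
        absent = missing_models(fetch_tags(), wanted)
    except (OSError, ValueError) as exc:  # server unreachable or unexpected reply
        print(f"Could not query Ollama: {exc}")
    else:
        print("Missing models:", absent or "none")
```

Keeping the comparison in a pure function separates the network call from the check, so the logic can be exercised without a running server.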

Using the Package

Single File Processing

from ollama_ocr import OCRProcessor

# Initialize the OCR processor
ocr = OCRProcessor(model_name='llama3.2-vision:11b')  # Any vision model available in Ollama can be used
# You can also pass a custom Ollama API endpoint

# Process an image
result = ocr.process_image(
    image_path="path/to/your/image.png",  # or a PDF, e.g. "path/to/your/file.pdf"
    format_type="markdown",  # Options: markdown, text, json, structured, key_value
    custom_prompt="Extract all text, focusing on dates and names.", # Optional custom prompt
    language="English" # Specify the language of the text (New! 🆕)
)
print(result)
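
Printing is enough for a quick check, but extracted text is usually written to disk. The sketch below is one possible way to do that, assuming `process_image` returns a string. The function `save_result` and the suffix chosen for each `format_type` are hypothetical local conventions, not something ollama-ocr defines.

```python
from pathlib import Path

# Assumption: these suffixes are a local convention, not defined by ollama-ocr.
SUFFIX_BY_FORMAT = {
    "markdown": ".md",
    "text": ".txt",
    "json": ".json",
    "structured": ".txt",
    "key_value": ".txt",
}


def save_result(result: str, source_path: str, format_type: str, out_dir: str) -> Path:
    """Write extracted text to out_dir, named after the source file."""
    if format_type not in SUFFIX_BY_FORMAT:
        raise ValueError(f"Unknown format_type: {format_type!r}")
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # "scans/invoice.png" with "markdown" becomes "<out_dir>/invoice.md".
    target = target_dir / (Path(source_path).stem + SUFFIX_BY_FORMAT[format_type])
    target.write_text(result, encoding="utf-8")
    return target
```

With the `result` from the example above, a call such as `save_result(result, "path/to/your/image.png", "markdown", "ocr_output")` would write `ocr_output/image.md`.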

Batch Processing

from ollama_ocr import OCRProcessor

# Initialize OCR processor
ocr = OCRProcessor(model_name='llama3.2-vision:11b', max_workers=4)  # max workers for parallel processing

# Process multiple images with progress tracking
batch_results = ocr.process_batch(
    input_path="path/to/images/folder",  # Directory or list of image paths
    format_type="markdown",
    recursive=True,  # Search subdirectories
    preprocess=True,  # Enable image preprocessing
    custom_prompt="Extract all text, focusing on dates and names.", # Optional custom prompt
    language="English" # Specify the language of the text (New! 🆕)
)
# Access results
for file_path, text in batch_results['results'].items():
    print(f"\nFile: {file_path}")
    print(f"Extracted Text: {text}")

# View statistics
print("\nProcessing Statistics:")
print(f"Total images: {batch_results['statistics']['total']}")
print(f"Successfully processed: {batch_results['statistics']['successful']}")
print(f"Failed: {batch_results['statistics']['failed']}")
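
The returned dictionary can also be condensed into a short report. This is a minimal sketch that relies only on the `results` and `statistics` keys used in the example above; the function name `summarize_batch` and the idea of flagging empty outputs are illustrative assumptions.

```python
def summarize_batch(batch_results: dict) -> dict:
    """Condense a batch result into a success rate and a list of empty outputs."""
    stats = batch_results.get("statistics", {})
    total = stats.get("total", 0)
    successful = stats.get("successful", 0)
    failed = stats.get("failed", 0)

    # Guard against division by zero when the input folder was empty.
    success_rate = successful / total if total else 0.0

    # Files that were processed but produced no visible text may need a retry
    # with a different model or prompt.
    empty_outputs = sorted(
        path
        for path, text in batch_results.get("results", {}).items()
        if not str(text).strip()
    )
    return {
        "total": total,
        "successful": successful,
        "failed": failed,
        "success_rate": success_rate,
        "empty_outputs": empty_outputs,
    }
```

Calling `summarize_batch(batch_results)` after `process_batch` gives a single dictionary that is easy to log or serialize.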

📋 Output Format Details

Markdown Format: A Markdown string that preserves the formatting of the extracted text, such as headers and lists.
Text Format: A plain-text string containing the extracted text.
JSON Format: A JSON object containing the extracted text as structured data.
Structured Format: A structured object that organizes the extracted text, including tables.
Key-Value Format: A dictionary of the labeled information extracted from the image.
Table Format: All tabular data found in the image.
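
When the JSON format is requested, the model's reply may still arrive as a string, sometimes wrapped in a Markdown code fence. The sketch below shows one defensive way to turn such a string into a Python object. It assumes the result is returned as text; `strip_code_fence` and `parse_json_output` are hypothetical helpers, not part of ollama-ocr.

```python
import json

# Built at runtime so this example does not contain a literal fence marker.
FENCE = "`" * 3


def strip_code_fence(text: str) -> str:
    """Remove one surrounding Markdown code fence, if present."""
    lines = text.strip().splitlines()
    if len(lines) >= 2 and lines[0].startswith(FENCE) and lines[-1].strip() == FENCE:
        lines = lines[1:-1]
    return "\n".join(lines).strip()


def parse_json_output(text: str):
    """Return the parsed JSON value, or None if the text is not valid JSON."""
    try:
        return json.loads(strip_code_fence(text))
    except json.JSONDecodeError:
        return None
```

Returning `None` instead of raising lets the caller fall back to the raw text when the model did not produce valid JSON.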

🌐 Streamlit Web Application (supports batch processing)

User-Friendly Interface
- Drag-and-drop file upload
- Real-time processing
- Download extracted text
- Image preview with details
- Responsive design
- Language Selection: Specify the language for better OCR accuracy. (New! 🆕)

Clone the repository:

git clone https://github.com/imanoop7/Ollama-OCR.git
cd Ollama-OCR

Install dependencies:

pip install -r requirements.txt

Go to the directory where app.py is located:

cd src/ollama_ocr

Run the Streamlit app:

streamlit run app.py

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Built with Ollama. Powered by vision models.

Release history

- 0.1.6 (this version): Mar 9, 2025
- 0.1.5: Mar 7, 2025
- 0.1.4: Mar 5, 2025
- 0.1.3: Dec 4, 2024
- 0.1.2: Dec 2, 2024

Download files

Source Distribution

ollama_ocr-0.1.6.tar.gz (13.6 kB)

Uploaded Mar 9, 2025 Source

Built Distribution

ollama_ocr-0.1.6-py3-none-any.whl (12.3 kB)

Uploaded Mar 9, 2025 Python 3

File details

Details for the file ollama_ocr-0.1.6.tar.gz.

File metadata

Download URL: ollama_ocr-0.1.6.tar.gz
Upload date: Mar 9, 2025
Size: 13.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.0

File hashes

Hashes for ollama_ocr-0.1.6.tar.gz
Algorithm	Hash digest
SHA256	`aaacc1626c62f7da90332f19968d2e846f95ca8005d8b5f4a8e6cfc3673b4913`
MD5	`ef6c4f29cd12b472edf701178dade2f8`
BLAKE2b-256	`1cd03b5effc572b473c2487997e9c16d4e2eb2f577b854e3cf0f4cb219f06e35`

File details

Details for the file ollama_ocr-0.1.6-py3-none-any.whl.

File metadata

Download URL: ollama_ocr-0.1.6-py3-none-any.whl
Upload date: Mar 9, 2025
Size: 12.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.0

File hashes

Hashes for ollama_ocr-0.1.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`03a13741b1b52dddfe2490c06a2dcfd98264dad2847f1b32032517261705623e`
MD5	`9514311f20e6ab9f76816be44f7c2b0c`
BLAKE2b-256	`f67e1454188304e97900ac83d47f27b7de726c69708dfc5afbe7573e9c62796e`

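
The published digests can be used to check a downloaded file. This is a minimal sketch using Python's standard `hashlib`. The helper names `sha256_of` and `matches_digest` are hypothetical, and the script assumes the sdist was saved to the current directory under its original file name.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digest listed above for ollama_ocr-0.1.6.tar.gz.
SDIST_SHA256 = "aaacc1626c62f7da90332f19968d2e846f95ca8005d8b5f4a8e6cfc3673b4913"


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__":
    # Assumption: the sdist sits in the current working directory.
    sdist = Path("ollama_ocr-0.1.6.tar.gz")
    if sdist.exists():
        print("Digest OK" if matches_digest(sdist, SDIST_SHA256) else "Digest mismatch")
    else:
        print(f"{sdist} not found; download it first")
```

pip can perform the same check at install time through its hash-checking mode, where each requirement line carries a `--hash=sha256:...` option.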
