A package for document text extraction using Ollama vision models

llama_ocr

A Python package for document text extraction using Ollama Vision models, with a focus on OCR (Optical Character Recognition) capabilities.

Features

  • 🔍 Text extraction from images using Ollama Vision models
  • 🖼️ Advanced image preprocessing for better OCR results
  • 🛠️ Configurable settings and parameters
  • 🔧 Extensible architecture with dependency injection
  • 📝 Comprehensive logging and error handling

Installation

pip install llama_ocr_py

Or install from source:

git clone https://github.com/princexoleo/llama_ocr.git
cd llama_ocr
pip install -e .

Quick Start

from llama_ocr import DocumentExtractor

# Initialize extractor
extractor = DocumentExtractor()

# Extract text from an image
result = extractor.extract_from_image(
    "path/to/image.jpg",
    prompt="Extract all text from this document"
)

# Print extracted text
print(result["text"])
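To run the same call over a folder of scans, a small wrapper may be enough. The sketch below is only an illustration: `extract_folder`, `IMAGE_SUFFIXES`, and their parameters are hypothetical helpers that are not part of llama_ocr. It relies solely on the `extract_from_image(path, prompt=...)` call and the `"text"` key shown in the Quick Start.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable

# Hypothetical default; adjust to the formats your model handles.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png")


def extract_folder(
    extract: Callable[..., dict],
    folder: str,
    prompt: str = "Extract all text from this document",
    suffixes: Iterable[str] = IMAGE_SUFFIXES,
) -> Dict[str, str]:
    """Call `extract` on every image in `folder`; return {file name: text}.

    `extract` is expected to behave like the Quick Start call
    `extractor.extract_from_image(path, prompt=...)` and to return a
    mapping with a "text" key.
    """
    allowed = {suffix.lower() for suffix in suffixes}
    texts: Dict[str, str] = {}
    # Sort so results come back in a stable, predictable order.
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in allowed:
            result = extract(str(path), prompt=prompt)
            texts[path.name] = result["text"]
    return texts
```

With the extractor from the Quick Start, this could be called as `extract_folder(extractor.extract_from_image, "path/to/folder")`.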

Advanced Usage

Custom Configuration

from llama_ocr import DocumentExtractor, OCRConfig

config = OCRConfig(
    model_name="llama3.2-vision",
    preprocess_images=True,
    optimize_for_ocr=True,
    log_level="INFO"
)

extractor = DocumentExtractor(config=config)

Image Processing Options

# Extract text and keep the preprocessed image (reuses the extractor created above)
result = extractor.extract_from_image(
    "path/to/image.jpg",
    save_processed=True  # Save processed image for inspection
)

Architecture

The package follows SOLID principles and uses dependency injection for flexibility:

  • DocumentExtractor: Main class orchestrating the extraction process
  • VisionClient: Abstract base class for vision model interactions
  • ImagePreprocessor: Abstract base class for image processing
  • OCRConfig: Configuration management using dataclasses
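The dependency-injection idea behind these classes can be sketched as follows. This is not the package's actual code: the class names are borrowed from the list above, but the method names (`describe`, `process`), the constructor signature, and the two stub implementations (`EchoClient`, `PassthroughPreprocessor`) are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class VisionClient(ABC):
    """Interface for a vision model backend (method name is assumed)."""

    @abstractmethod
    def describe(self, image_path: str, prompt: str) -> str:
        """Return the model's text output for one image."""


class ImagePreprocessor(ABC):
    """Interface for image preprocessing (method name is assumed)."""

    @abstractmethod
    def process(self, image_path: str) -> str:
        """Return the path of the image to send to the model."""


class PassthroughPreprocessor(ImagePreprocessor):
    """Hypothetical stub: performs no preprocessing."""

    def process(self, image_path: str) -> str:
        return image_path


class EchoClient(VisionClient):
    """Hypothetical stub: echoes its inputs instead of calling a model."""

    def describe(self, image_path: str, prompt: str) -> str:
        return f"[{prompt}] {image_path}"


class DocumentExtractor:
    """Orchestrates the injected collaborators; knows no concrete backend."""

    def __init__(self, client: VisionClient, preprocessor: ImagePreprocessor):
        self._client = client
        self._preprocessor = preprocessor

    def extract_from_image(self, image_path: str, prompt: str = "Extract all text") -> dict:
        processed = self._preprocessor.process(image_path)
        return {"text": self._client.describe(processed, prompt)}
```

Because the extractor depends only on the two abstract interfaces, a test can pass in stubs such as `EchoClient()` without a running Ollama server, and a different backend can be swapped in without touching the orchestration logic.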

Components

  1. Core Module

    • extractor.py: Main document extraction logic
    • vision_client.py: Ollama Vision API integration
    • image_processor.py: Image preprocessing utilities
    • base.py: Abstract base classes and interfaces
  2. Configuration

    • config.py: Configuration management using dataclasses
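As a rough picture of dataclass-based configuration, the sketch below mirrors the four fields used in the Custom Configuration example. It is not the contents of config.py: the class name `SketchOCRConfig`, the default values, the validation, and the environment variable names `OCR_MODEL_NAME` and `OCR_LOG_LEVEL` are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


@dataclass(frozen=True)
class SketchOCRConfig:
    """Illustrative config; field names follow the README example."""

    model_name: str = "llama3.2-vision"  # assumed default
    preprocess_images: bool = True       # assumed default
    optimize_for_ocr: bool = True        # assumed default
    log_level: str = "INFO"              # assumed default

    def __post_init__(self) -> None:
        # Fail early on a typo instead of at the first log call.
        if self.log_level.upper() not in VALID_LOG_LEVELS:
            raise ValueError(f"Unknown log level: {self.log_level!r}")

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "SketchOCRConfig":
        """Build a config, letting hypothetical env variables override defaults."""
        env = os.environ if env is None else env
        defaults = cls()
        return cls(
            model_name=env.get("OCR_MODEL_NAME", defaults.model_name),
            log_level=env.get("OCR_LOG_LEVEL", defaults.log_level),
        )
```

Freezing the dataclass is a design choice in this sketch: a configuration that cannot change after construction is easier to share between the extractor and its collaborators.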

Dependencies

  • ollama>=0.1.27
  • Pillow>=10.1.0
  • python-dotenv>=1.0.0
  • opencv-python>=4.8.1.78
  • numpy>=1.21.0
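To see which of these packages are present in the current environment, the standard library's `importlib.metadata` can report installed versions. The helper below is a hypothetical convenience that is not shipped with the package; it only reports what is installed and does not compare against the minimum versions listed above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names and minimum versions copied from the list above.
REQUIRED = {
    "ollama": "0.1.27",
    "Pillow": "10.1.0",
    "python-dotenv": "1.0.0",
    "opencv-python": "4.8.1.78",
    "numpy": "1.21.0",
}


def installed_versions(packages: Iterable[str] = REQUIRED) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    report: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, found in installed_versions().items():
        print(f"{name}: required >= {REQUIRED[name]}, installed: {found or 'missing'}")
```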

Development

Setting up Development Environment

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
.\venv\Scripts\activate  # Windows

# Install development dependencies
pip install -e ".[dev]"

Running Tests

python -m unittest discover -s tests

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Thanks to the Ollama team for providing the vision models
  • Built with ❤️ by Mazharul Islam Leon
