A package for document extraction using Ollama Vision models
llama_ocr
A Python package for document text extraction using Ollama Vision models, with a focus on OCR (Optical Character Recognition) capabilities.
Features
- 🔍 Text extraction from images using Ollama Vision models
- 🖼️ Advanced image preprocessing for better OCR results
- 🛠️ Configurable settings and parameters
- 🔧 Extensible architecture with dependency injection
- 📝 Comprehensive logging and error handling
Installation
pip install llama_ocr
Or install from source:
git clone https://github.com/princexoleo/llama_ocr.git
cd llama_ocr
pip install -e .
Quick Start
from llama_ocr import DocumentExtractor
# Initialize extractor
extractor = DocumentExtractor()
# Extract text from an image
result = extractor.extract_from_image(
"path/to/image.jpg",
prompt="Extract all text from this document"
)
# Print extracted text
print(result["text"])
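The same call extends naturally to a folder of scans. The sketch below is one possible wrapper, not part of the package: `extract_many`, `SupportsExtraction`, and `IMAGE_SUFFIXES` are hypothetical names, and the only thing assumed about the extractor is the `extract_from_image(path, prompt=...)` call and the `"text"` key shown in the Quick Start.

```python
from pathlib import Path
from typing import Any, Dict, Iterable, Protocol


class SupportsExtraction(Protocol):
    """Anything exposing extract_from_image as used in the Quick Start."""

    def extract_from_image(self, image_path: str, prompt: str = ...) -> Dict[str, Any]:
        ...


# Assumption: which file suffixes count as images; adjust to your inputs.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def extract_many(
    extractor: SupportsExtraction,
    paths: Iterable[Path],
    prompt: str = "Extract all text from this document",
) -> Dict[str, str]:
    """Hypothetical helper: map each image path to its extracted text."""
    texts: Dict[str, str] = {}
    for path in paths:
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip files that are not images
        result = extractor.extract_from_image(str(path), prompt=prompt)
        # The Quick Start reads result["text"]; .get() guards against a missing key.
        texts[str(path)] = result.get("text", "")
    return texts
```

With the real package installed, this would presumably be called as `extract_many(DocumentExtractor(), Path("scans").iterdir())`, where `scans` is a placeholder directory name.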
Advanced Usage
Custom Configuration
from llama_ocr import DocumentExtractor, OCRConfig
config = OCRConfig(
model_name="llama3.2-vision",
preprocess_images=True,
optimize_for_ocr=True,
log_level="INFO"
)
extractor = DocumentExtractor(config=config)
Image Processing Options
# Extract with OCR optimization
result = extractor.extract_from_image(
"path/to/image.jpg",
save_processed=True # Save processed image for inspection
)
Architecture
The package follows SOLID principles and uses dependency injection for flexibility:
- DocumentExtractor: Main class orchestrating the extraction process
- VisionClient: Abstract base class for vision model interactions
- ImagePreprocessor: Abstract base class for image processing
- OCRConfig: Configuration management using dataclasses
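To make the dependency injection idea concrete, here is a minimal sketch of how such components could fit together. It reuses the component names listed above for readability, but the method names `generate` and `process`, the constructor signature, and the `EchoClient` and `NoOpPreprocessor` stand-ins are all assumptions made for illustration; the package's real interfaces may differ.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class VisionClient(ABC):
    """Abstract interface for a vision model backend (signature assumed)."""

    @abstractmethod
    def generate(self, image_path: str, prompt: str) -> str:
        """Return the model's text response for the given image and prompt."""


class ImagePreprocessor(ABC):
    """Abstract interface for image preprocessing (signature assumed)."""

    @abstractmethod
    def process(self, image_path: str) -> str:
        """Return the path of the image that should be sent to the model."""


class DocumentExtractor:
    """Orchestrates preprocessing and the model call via injected collaborators."""

    def __init__(self, client: VisionClient, preprocessor: ImagePreprocessor) -> None:
        self._client = client
        self._preprocessor = preprocessor

    def extract_from_image(self, image_path: str, prompt: str) -> Dict[str, Any]:
        prepared = self._preprocessor.process(image_path)
        text = self._client.generate(prepared, prompt)
        return {"text": text}


class EchoClient(VisionClient):
    """Stand-in backend for illustration; it makes no model call."""

    def generate(self, image_path: str, prompt: str) -> str:
        return f"[{prompt}] {image_path}"


class NoOpPreprocessor(ImagePreprocessor):
    """Preprocessor that leaves the image untouched."""

    def process(self, image_path: str) -> str:
        return image_path


# Swapping either collaborator requires no change to DocumentExtractor itself.
extractor = DocumentExtractor(EchoClient(), NoOpPreprocessor())
print(extractor.extract_from_image("scan.jpg", "Extract text")["text"])
```

Because the extractor depends only on the abstract interfaces, a different model backend or preprocessing pipeline can be substituted by passing another implementation to the constructor.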
Components
- Core Module
  - extractor.py: Main document extraction logic
  - vision_client.py: Ollama Vision API integration
  - image_processor.py: Image preprocessing utilities
  - base.py: Abstract base classes and interfaces
- Configuration
  - config.py: Configuration management using dataclasses
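As a rough illustration of dataclass-based configuration, the sketch below defines a stand-in class with the four fields used in the Custom Configuration example. `SketchConfig`, its default values, and the log-level validation are assumptions for illustration only and do not describe the actual contents of config.py.

```python
from dataclasses import dataclass, replace

VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


@dataclass(frozen=True)
class SketchConfig:
    """Illustrative stand-in for OCRConfig; the defaults are assumptions."""

    model_name: str = "llama3.2-vision"
    preprocess_images: bool = True
    optimize_for_ocr: bool = True
    log_level: str = "INFO"

    def __post_init__(self) -> None:
        # Fail early on a level name the logging module would not recognize.
        if self.log_level.upper() not in VALID_LOG_LEVELS:
            raise ValueError(f"Unknown log level: {self.log_level!r}")


base = SketchConfig()
# replace() builds a modified copy and re-runs the validation above.
debug = replace(base, log_level="DEBUG")
```

Freezing the dataclass keeps a configuration object from changing after an extractor has been built from it; variations are created with `dataclasses.replace` instead.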
Dependencies
- ollama>=0.1.27
- Pillow>=10.1.0
- python-dotenv>=1.0.0
- opencv-python>=4.8.1.78
- numpy>=1.21.0
Development
Setting up Development Environment
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
.\venv\Scripts\activate # Windows
# Install development dependencies
pip install -e ".[dev]"
Running Tests
python -m unittest discover -s tests
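For contributors adding tests, the sketch below shows one way a test could exercise extraction logic without a running Ollama server, by substituting a `unittest.mock.Mock` for the extractor. The function `extract_text` and the test class are hypothetical examples, not files from this repository; saved as something like tests/test_example.py, a file of this shape would be picked up by the discovery command above, which matches test*.py by default.

```python
import unittest
from unittest import mock

PROMPT = "Extract all text from this document"


def extract_text(extractor, image_path: str) -> str:
    """Hypothetical code under test: return stripped text from an extractor."""
    result = extractor.extract_from_image(image_path, prompt=PROMPT)
    return result["text"].strip()


class ExtractTextTest(unittest.TestCase):
    def test_returns_stripped_text(self) -> None:
        # The mock replaces the real extractor, so no model call is made.
        fake = mock.Mock()
        fake.extract_from_image.return_value = {"text": "  Invoice 42\n"}

        self.assertEqual(extract_text(fake, "invoice.jpg"), "Invoice 42")
        fake.extract_from_image.assert_called_once_with("invoice.jpg", prompt=PROMPT)
```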
Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Thanks to the Ollama team for providing the vision models
- Built with ❤️ by Mazharul Islam Leon
File details
Details for the file llama_ocr_py-0.1.0.tar.gz.
File metadata
- Download URL: llama_ocr_py-0.1.0.tar.gz
- Upload date:
- Size: 11.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3f75195157565e7f557f2e0bb293e0b64f21fd310d21c7f4a40e2065a4c1a161 |
| MD5 | 19513278ed41ae63896919adcd53abf5 |
| BLAKE2b-256 | 8bd86127f444ca4c9759bb8ed2bcb44f869cf017f65b7e00614342ff9bae0a09 |
File details
Details for the file llama_ocr_py-0.1.0-py3-none-any.whl.
File metadata
- Download URL: llama_ocr_py-0.1.0-py3-none-any.whl
- Upload date:
- Size: 13.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7fec80f86967950c700330f7483a07463b4b69df2030cee245ce9655c53264e9 |
| MD5 | 6f328abe96af8e6d4013b0438962458e |
| BLAKE2b-256 | 36173b133acb394df9865654bfb026fe4bd85c1d6b9d71c41f006c7bd6c4078a |