Image-related machine learning methods for use by Ruurd Photos.
Ruurd Photos ML
A Python package providing a suite of machine learning tools for image analysis, designed to be the backbone of the Ruurd Photos project, a self-hosted Google Photos alternative. This package is intended to be called from Rust using PyO3.
✨ Features
This library offers a selection of pre-trained models for various image analysis tasks:
Image Captioning
Generate descriptive captions for images and ask questions about their content.
- InstructBLIP: A powerful model for both generating detailed descriptions and answering questions about an image.
- Salesforce BLIP: A robust model for generating high-quality image captions.
😀 Facial Recognition
Detect and analyze faces within images.
- InsightFace: A comprehensive toolkit for face analysis that can:
  - Detect multiple faces in an image.
  - Estimate age and gender.
  - Identify key facial landmarks (eyes, nose, mouth).
  - Generate facial embeddings for clustering and recognition.
🖼️ Object Detection
Identify and locate various objects within an image.
- ResNet: Utilizes a ResNet-based model to detect a wide range of common objects, returning their labels and bounding boxes.
🔤 Optical Character Recognition (OCR)
Detect and extract text from images.
- ResNet & Tesseract: A two-stage process that first uses a ResNet model to determine if an image contains legible text, and then employs Tesseract to extract the text and its bounding boxes.
🚀 Installation
This package is available on PyPI. You can install it using pip:
pip install ruurd-photos-ml
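To confirm that the install succeeded, the standard library can report the installed version. This is a minimal sketch: `installed_version` is a hypothetical helper name, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "ruurd-photos-ml") -> Optional[str]:
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints a version string when the package is installed, otherwise None.
print(installed_version())
```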
💻 Usage
The library is designed to be simple to use. The examples below cover each main feature.
First, you'll need to load an image using Pillow:
from PIL import Image
# Load your image
image = Image.open("path/to/your/image.jpg")
Image Captioning
from ruurd_photos_ml import get_captioner, CaptionerProvider
# Initialize the captioner
captioner = get_captioner(CaptionerProvider.BLIP_INSTRUCT)
# Generate a simple caption
caption = captioner.caption(image)
print(f"Caption: {caption}")
# Ask a question about the image
question = "What color is the main object?"
answer = captioner.caption(image, instruction=question)
print(f"Answer: {answer}")
Facial Recognition
from ruurd_photos_ml import get_facial_recognition, FacialRecognitionProvider
# Initialize the facial recognition model
face_detector = get_facial_recognition(FacialRecognitionProvider.INSIGHT)
# Get faces from the image
faces = face_detector.get_faces(image)
for face in faces:
    print(f"Found a face at position {face.position} with confidence {face.confidence}")
    print(f" - Age: {face.age}")
    print(f" - Gender: {face.sex}")
    print(f" - Embedding: {face.embedding[:5]}...")  # Showing first 5 values
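The embeddings printed above are what make clustering and recognition possible: two faces of the same person should yield vectors that point in a similar direction. A minimal sketch of that comparison follows, assuming each embedding behaves like a sequence of floats. `cosine_similarity`, `same_person`, and the 0.5 threshold are illustrative and not part of the package; a suitable threshold depends on the model and should be tuned on your own photos.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def same_person(a: Sequence[float], b: Sequence[float], threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: similar enough means the same person."""
    return cosine_similarity(a, b) >= threshold
```

With the loop above, a call such as `same_person(faces[0].embedding, faces[1].embedding)` would compare the first two detected faces.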
Object Detection
from ruurd_photos_ml import get_object_detection, ObjectDetectionProvider
# Initialize the object detector
object_detector = get_object_detection(ObjectDetectionProvider.RESNET)
# Detect objects in the image
objects = object_detector.detect_objects(image)
for obj in objects:
    print(f"Detected '{obj.label}' with confidence {obj.confidence}")
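Detections usually need filtering before they are useful as photo tags. The sketch below keeps only confident results and orders them from most to least confident. It relies only on the `label` and `confidence` attributes used above. The `Detection` dataclass is a hypothetical stand-in so the example runs without a model, and the 0.5 cutoff is an arbitrary starting point.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Detection:
    """Hypothetical stand-in exposing the two attributes used above."""
    label: str
    confidence: float


def confident_labels(detections: Iterable[Detection], min_confidence: float = 0.5) -> List[str]:
    """Return labels of detections at or above min_confidence, highest first."""
    kept = [d for d in detections if d.confidence >= min_confidence]
    kept.sort(key=lambda d: d.confidence, reverse=True)
    return [d.label for d in kept]


sample = [Detection("dog", 0.92), Detection("chair", 0.31), Detection("person", 0.77)]
print(confident_labels(sample))  # ['dog', 'person']
```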
Optical Character Recognition (OCR)
from ruurd_photos_ml import get_ocr, OCRProvider
# Initialize the OCR model
ocr = get_ocr(OCRProvider.RESNET_TESSERACT)
# Check for legible text
if ocr.has_legible_text(image):
    # Extract text (specify languages for better accuracy)
    text = ocr.get_text(image, languages=("eng", "nld"))
    print(f"Extracted Text: {text}")

    # Get text with bounding boxes
    boxes = ocr.get_boxes(image, languages=("eng", "nld"))
    for box in boxes:
        print(f"Found text: '{box.text}' at position {box.position}")
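The two-stage design matters most when processing many photos: the legibility check acts as a gate so that text extraction only runs on images likely to contain text. One way to wrap that pattern is sketched below. `TextReader`, `read_text_if_present`, and `FakeReader` are hypothetical names, and the `str` return type of `get_text` is an assumption.

```python
from typing import Optional, Protocol, Sequence


class TextReader(Protocol):
    """The two methods this sketch relies on, as used in the example above."""

    def has_legible_text(self, image: object) -> bool: ...

    def get_text(self, image: object, languages: Sequence[str]) -> str: ...


def read_text_if_present(
    reader: TextReader, image: object, languages: Sequence[str] = ("eng",)
) -> Optional[str]:
    """Run the legibility check first; extract text only if it passes."""
    if not reader.has_legible_text(image):
        return None
    return reader.get_text(image, languages=languages)


class FakeReader:
    """Hypothetical stand-in so the sketch runs without any models."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.extract_calls = 0

    def has_legible_text(self, image: object) -> bool:
        return bool(self.text)

    def get_text(self, image: object, languages: Sequence[str]) -> str:
        self.extract_calls += 1
        return self.text
```

Passing the `ocr` object from the example above in place of `FakeReader` should work as long as it exposes the same two methods.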
🛠️ Development
To contribute to this project, you can set up a local development environment.
1. Clone the repository:
git clone https://github.com/RuurdBijlsma/ruurd-photos-ml.git
cd ruurd-photos-ml
2. Install dependencies using uv:
uv sync --all-extras --dev
3. Run tests:
uv run pytest
4. Run quality checks:
pre-commit run -a
🔗 Project Links
- **Homepage**: https://github.com/RuurdBijlsma/ruurd-photos-ml
- **Repository**: https://github.com/RuurdBijlsma/ruurd-photos-ml
- **Documentation**: https://ruurdbijlsma.github.io/ruurd-photos-ml
📜 License
This project is licensed under the MIT License.