Image-related machine learning methods for use by Ruurd Photos.

Ruurd Photos ML

A Python package providing a suite of machine learning tools for image analysis, designed to be the backbone of the Ruurd Photos project, a self-hosted Google Photos alternative. This package is intended to be called from Rust using PyO3.

✨ Features

This library offers a selection of pre-trained models for various image analysis tasks:

Image Captioning

Generate descriptive captions for images and ask questions about their content.

  • InstructBLIP: A powerful model for both generating detailed descriptions and answering questions about an image.
  • Salesforce BLIP: A robust model for generating high-quality image captions.

😀 Facial Recognition

Detect and analyze faces within images.

  • InsightFace: A comprehensive toolkit for face analysis that can:
    • Detect multiple faces in an image.
    • Estimate age and gender.
    • Identify key facial landmarks (eyes, nose, mouth).
    • Generate facial embeddings for clustering and recognition.
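
Embeddings like these are commonly compared with cosine similarity: vectors pointing in nearly the same direction tend to belong to the same person. The following is a minimal sketch, assuming an embedding can be treated as a plain sequence of floats. The `cosine_similarity` helper and the sample vectors are illustrative and not part of this package.

```python
import math
from collections.abc import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings; real face embeddings are much longer.
same_direction = cosine_similarity([1.0, 0.0, 1.0], [2.0, 0.0, 2.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

A clustering step would then group faces whose pairwise similarity exceeds a threshold chosen by experiment on your own photo library.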

🖼️ Object Detection

Identify and locate various objects within an image.

  • ResNet: Utilizes a ResNet-based model to detect a wide range of common objects, returning their labels and bounding boxes.

🔤 Optical Character Recognition (OCR)

Detect and extract text from images.

  • ResNet & Tesseract: A two-stage process that first uses a ResNet model to determine if an image contains legible text, and then employs Tesseract to extract the text and its bounding boxes.
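
The point of a two-stage design is that the cheap classifier filters out images before the expensive extractor runs. The sketch below illustrates only that gating pattern with stand-in functions. `gated_extract`, `fake_classifier`, and `fake_extractor` are hypothetical names, and nothing here reflects how the package implements its stages internally.

```python
from collections.abc import Callable
from typing import Optional, TypeVar

ImageT = TypeVar("ImageT")


def gated_extract(
    image: ImageT,
    has_text: Callable[[ImageT], bool],
    extract: Callable[[ImageT], str],
) -> Optional[str]:
    """Run the expensive extractor only when the cheap classifier says yes."""
    if not has_text(image):
        return None  # stage 1 rejected the image, so stage 2 never runs
    return extract(image)


# Stand-in stages for illustration: strings play the role of images.
calls: list[str] = []


def fake_classifier(img: str) -> bool:
    return "sign" in img


def fake_extractor(img: str) -> str:
    calls.append(img)  # record which inputs reached stage 2
    return img.upper()


results = [
    gated_extract(img, fake_classifier, fake_extractor)
    for img in ("street sign", "sunset")
]
```

Only the first input reaches the extractor; the second is rejected by the classifier and yields `None`.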

🚀 Installation

This package is available on PyPI. You can install it using pip:

pip install ruurd-photos-ml

💻 Usage

The library is designed to be simple to use. The examples below cover each of its main features.

First, you'll need to load an image using Pillow:

from PIL import Image

# Load your image
image = Image.open("path/to/your/image.jpg")

Image Captioning

from ruurd_photos_ml import get_captioner, CaptionerProvider

# Initialize the captioner
captioner = get_captioner(CaptionerProvider.BLIP_INSTRUCT)

# Generate a simple caption
caption = captioner.caption(image)
print(f"Caption: {caption}")

# Ask a question about the image
question = "What color is the main object?"
answer = captioner.caption(image, instruction=question)
print(f"Answer: {answer}")

Facial Recognition

from ruurd_photos_ml import get_facial_recognition, FacialRecognitionProvider

# Initialize the facial recognition model
face_detector = get_facial_recognition(FacialRecognitionProvider.INSIGHT)

# Get faces from the image
faces = face_detector.get_faces(image)

for face in faces:
    print(f"Found a face at position {face.position} with confidence {face.confidence}")
    print(f"  - Age: {face.age}")
    print(f"  - Gender: {face.sex}")
    print(f"  - Embedding: {face.embedding[:5]}...")  # Showing first 5 values

Object Detection

from ruurd_photos_ml import get_object_detection, ObjectDetectionProvider

# Initialize the object detector
object_detector = get_object_detection(ObjectDetectionProvider.RESNET)

# Detect objects in the image
objects = object_detector.detect_objects(image)

for obj in objects:
    print(f"Detected '{obj.label}' with confidence {obj.confidence}")
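
For tagging photos you often want only the confident detections, deduplicated by label. The following is a rough sketch that relies only on the `label` and `confidence` attributes used above. The `Detection` dataclass is a hypothetical stand-in for the objects the detector returns, and the 0.5 threshold assumes confidence values between 0 and 1, which is not confirmed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical stand-in exposing the two attributes used above."""

    label: str
    confidence: float


def confident_labels(detections, min_confidence=0.5):
    """Return unique labels at or above the threshold, most confident first."""
    best: dict[str, float] = {}
    for det in detections:
        if det.confidence >= min_confidence:
            # Keep the highest confidence seen for each label.
            best[det.label] = max(det.confidence, best.get(det.label, 0.0))
    return sorted(best, key=best.get, reverse=True)


sample = [
    Detection("dog", 0.92),
    Detection("dog", 0.61),
    Detection("ball", 0.40),
    Detection("person", 0.75),
]
tags = confident_labels(sample)
```

With real detector output, `confident_labels(objects)` should work the same way as long as each object exposes those two attributes.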

Optical Character Recognition (OCR)

from ruurd_photos_ml import get_ocr, OCRProvider

# Initialize the OCR model
ocr = get_ocr(OCRProvider.RESNET_TESSERACT)

# Check for legible text
if ocr.has_legible_text(image):
    # Extract text (specify languages for better accuracy)
    text = ocr.get_text(image, languages=("eng", "nld"))
    print(f"Extracted Text: {text}")

    # Get text with bounding boxes
    boxes = ocr.get_boxes(image, languages=("eng", "nld"))
    for box in boxes:
        print(f"Found text: '{box.text}' at position {box.position}")

🛠️ Development

To contribute to this project, you can set up a local development environment.

  1. Clone the repository:

    git clone https://github.com/RuurdBijlsma/ruurd-photos-ml.git
    cd ruurd-photos-ml
    
  2. Install dependencies using uv:

    uv sync --all-extras --dev
    

  3. Run tests:

    uv run pytest

  4. Run quality checks:

    pre-commit run -a

📜 License

This project is licensed under the MIT License.
