Image-related machine learning methods for use by Ruurd Photos.

Ruurd Photos ML

A Python package providing a suite of machine learning tools for image analysis, designed to be the backbone of the Ruurd Photos project, a self-hosted Google Photos alternative. This package is intended to be called from Rust using PyO3.

✨ Features

This library offers a selection of pre-trained models for various image analysis tasks:

Image Captioning

Generate descriptive captions for images and ask questions about their content.

  • InstructBLIP: A powerful model for both generating detailed descriptions and answering questions about an image.
  • Salesforce BLIP: A robust model for generating high-quality image captions.

😀 Facial Recognition

Detect and analyze faces within images.

  • InsightFace: A comprehensive toolkit for face analysis that can:
    • Detect multiple faces in an image.
    • Estimate age and gender.
    • Identify key facial landmarks (eyes, nose, mouth).
    • Generate facial embeddings for clustering and recognition.

🖼️ Object Detection

Identify and locate various objects within an image.

  • ResNet: Utilizes a ResNet-based model to detect a wide range of common objects, returning their labels and bounding boxes.

🔤 Optical Character Recognition (OCR)

Detect and extract text from images.

  • ResNet & Tesseract: A two-stage process that first uses a ResNet model to determine if an image contains legible text, and then employs Tesseract to extract the text and its bounding boxes.

🚀 Installation

This package is available on PyPI and can be installed with pip:

pip install ruurd-photos-ml

💻 Usage

The library is designed to be simple to use. Here are some examples for each of the main functionalities.

First, you'll need to load an image using Pillow:

from PIL import Image

# Load your image
image = Image.open("path/to/your/image.jpg")

Image Captioning

from ruurd_photos_ml import get_captioner, CaptionerProvider

# Initialize the captioner
captioner = get_captioner(CaptionerProvider.BLIP_INSTRUCT)

# Generate a simple caption
caption = captioner.caption(image)
print(f"Caption: {caption}")

# Ask a question about the image
question = "What color is the main object?"
answer = captioner.caption(image, instruction=question)
print(f"Answer: {answer}")
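
When several questions apply to the same image, the `instruction` call shown above can be wrapped in a small helper. The following is a minimal sketch and not part of the package: `ask_all` and `EchoCaptioner` are hypothetical names, and the only thing assumed about a real captioner is the `caption(image, instruction=...)` call from the example above.

```python
from typing import Any, Dict, Iterable


def ask_all(captioner: Any, image: Any, questions: Iterable[str]) -> Dict[str, str]:
    """Ask each question about the same image and collect the answers."""
    answers: Dict[str, str] = {}
    for question in questions:
        # Same call shape as the usage example: caption(image, instruction=...)
        answers[question] = captioner.caption(image, instruction=question)
    return answers


class EchoCaptioner:
    """Hypothetical stand-in so the sketch runs without loading a model."""

    def caption(self, image: Any, instruction: str = "") -> str:
        return f"answer to: {instruction}"


if __name__ == "__main__":
    # With the real package, pass the object returned by get_captioner(...) instead.
    for question, answer in ask_all(EchoCaptioner(), None, ["What color?"]).items():
        print(f"{question} -> {answer}")
```

Passing the captioner in as an argument keeps the helper independent of which provider was chosen.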

Facial Recognition

from ruurd_photos_ml import get_facial_recognition, FacialRecognitionProvider

# Initialize the facial recognition model
face_detector = get_facial_recognition(FacialRecognitionProvider.INSIGHT)

# Get faces from the image
faces = face_detector.get_faces(image)

for face in faces:
    print(f"Found a face at position {face.position} with confidence {face.confidence}")
    print(f"  - Age: {face.age}")
    print(f"  - Gender: {face.sex}")
    print(f"  - Embedding: {face.embedding[:5]}...")  # Showing first 5 values
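
Embeddings become useful for clustering and recognition once two of them can be compared. The sketch below shows one common approach, cosine similarity plus a simple greedy grouping. It is an illustration under assumptions, not the method Ruurd Photos uses: `cosine_similarity`, `group_embeddings`, and the `0.5` threshold are hypothetical, and `face.embedding` is assumed to behave like a sequence of floats.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def group_embeddings(
    embeddings: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[List[int]]:
    """Greedy grouping: join the first group whose first member is similar enough."""
    groups: List[List[int]] = []
    for index, embedding in enumerate(embeddings):
        for group in groups:
            representative = embeddings[group[0]]
            if cosine_similarity(embedding, representative) >= threshold:
                group.append(index)
                break
        else:
            # No existing group matched, so this embedding starts a new one.
            groups.append([index])
    return groups
```

With the example above, `group_embeddings([face.embedding for face in faces])` would return lists of indices into `faces`. A suitable threshold depends on the embedding model and has to be tuned on real data.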

Object Detection

from ruurd_photos_ml import get_object_detection, ObjectDetectionProvider

# Initialize the object detector
object_detector = get_object_detection(ObjectDetectionProvider.RESNET)

# Detect objects in the image
objects = object_detector.detect_objects(image)

for obj in objects:
    print(f"Detected '{obj.label}' with confidence {obj.confidence}")
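
Detections with low confidence are often noise, so callers may want to drop them before storing results. The following is a minimal sketch: `filter_by_confidence` and `FakeDetection` are hypothetical names, the `0.5` cutoff is arbitrary, and the only assumption about real results is the `confidence` attribute used in the example above.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List


def filter_by_confidence(items: Iterable[Any], min_confidence: float = 0.5) -> List[Any]:
    """Keep items whose confidence meets the cutoff, most confident first."""
    kept = [item for item in items if item.confidence >= min_confidence]
    kept.sort(key=lambda item: item.confidence, reverse=True)
    return kept


@dataclass
class FakeDetection:
    """Hypothetical stand-in for a detection result (label + confidence only)."""

    label: str
    confidence: float


if __name__ == "__main__":
    detections = [FakeDetection("cat", 0.9), FakeDetection("dog", 0.3)]
    for detection in filter_by_confidence(detections):
        print(f"Kept '{detection.label}' ({detection.confidence})")
```

The same helper works for the face results shown earlier, since they also expose a `confidence` attribute.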

Optical Character Recognition (OCR)

from ruurd_photos_ml import get_ocr, OCRProvider

# Initialize the OCR model
ocr = get_ocr(OCRProvider.RESNET_TESSERACT)

# Check for legible text
if ocr.has_legible_text(image):
    # Extract text (specify languages for better accuracy)
    text = ocr.get_text(image, languages=("eng", "nld"))
    print(f"Extracted Text: {text}")

    # Get text with bounding boxes
    boxes = ocr.get_boxes(image, languages=("eng", "nld"))
    for box in boxes:
        print(f"Found text: '{box.text}' at position {box.position}")
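
The check-then-extract pattern above can be folded into one helper so that callers never run extraction on images without legible text. This is a sketch under assumptions: `extract_text_if_legible` and `FakeOCR` are hypothetical names, and `get_text` is assumed to return a string.

```python
from typing import Any, Optional, Tuple


def extract_text_if_legible(
    ocr: Any, image: Any, languages: Tuple[str, ...] = ("eng",)
) -> Optional[str]:
    """Return extracted text, or None when the legibility check fails."""
    # Stage 1: the legibility check decides whether extraction runs at all.
    if not ocr.has_legible_text(image):
        return None
    # Stage 2: run extraction only for images that passed the check.
    return ocr.get_text(image, languages=languages)


class FakeOCR:
    """Hypothetical stand-in that records how often extraction runs."""

    def __init__(self, legible: bool) -> None:
        self.legible = legible
        self.extract_calls = 0

    def has_legible_text(self, image: Any) -> bool:
        return self.legible

    def get_text(self, image: Any, languages: Tuple[str, ...]) -> str:
        self.extract_calls += 1
        return "hello"
```

Returning `None` rather than an empty string lets callers tell "no legible text" apart from "text was expected but nothing was extracted".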

🛠️ Development

To contribute to this project, you can set up a local development environment.

  1. Clone the repository:

    git clone https://github.com/RuurdBijlsma/ruurd-photos-ml.git
    cd ruurd-photos-ml
    
  2. Install dependencies using uv:

    uv sync --all-extras --dev
    

  3. Run tests:

    uv run pytest

  4. Run quality checks:

    pre-commit run -a

📜 License

This project is licensed under the MIT License.
