
A library to preprocess image data.


Paidiverpy

Paidiverpy is a Python package designed to create pipelines for preprocessing image data for biodiversity analysis.

Note: This package is still in active development, and frequent updates and changes are expected. The API and features may evolve as we continue improving it.

Documentation

The official documentation is hosted on ReadTheDocs.org: https://paidiverpy.readthedocs.io/

Note: Comprehensive documentation is under construction.

Installation

To install paidiverpy, run:

pip install paidiverpy

Build from Source

You can install paidiverpy locally or on a notebook server such as JASMIN or the NOC Data Science Platform (DSP). The steps below apply to both environments. Steps 2 and 3 are marked optional, but they are required if you are using a notebook server.

  1. Clone the repository:

    # ssh
    git clone git@github.com:paidiver/paidiverpy.git
    
    # https
    # git clone https://github.com/paidiver/paidiverpy.git
    
    cd paidiverpy
    
  2. (Optional) Create a Python virtual environment to manage dependencies separately from other projects. For example, using conda:

    conda init
    
    # Restart the terminal. This may not be necessary if conda init has already been run successfully
    exec bash
    
    conda env create -f environment.yml
    conda activate Paidiverpy
    
  3. (Optional) For JASMIN or DSP users, you also need to install the environment in the Jupyter IPython kernel. Execute the following command:

    python -m ipykernel install --user --name Paidiverpy
    
  4. Install the paidiverpy package in editable mode:

    pip install -e .
    

Package Organisation

Configuration File

First, create a configuration file. Example configuration files for processing the sample datasets are available in the examples/config_files directory. You can use these files to test the example notebooks described in the Usage section. Note that running the examples will automatically download the sample data.

The configuration file should follow the JSON schema described in the configuration file schema. An online tool to validate configuration files is available here.

Metadata

To use this package, you may need a metadata file, which can be an IFDO.json file (following the IFDO standard) or a CSV file. For CSV files, ensure the filename column uses one of the following headers: ['image-filename', 'filename', 'file_name', 'FileName', 'File Name'].

Other columns like datetime, latitude, and longitude should follow these conventions:

  • Datetime: ['image-datetime', 'datetime', 'date_time', 'DateTime', 'Datetime']
  • Latitude: ['image-latitude', 'lat', 'latitude_deg', 'latitude', 'Latitude', 'Latitude_deg', 'Lat']
  • Longitude: ['image-longitude', 'lon', 'longitude_deg', 'longitude', 'Longitude', 'Longitude_deg', 'Lon']

Examples of CSV and IFDO metadata files are in the examples/metadata directory.
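
The header conventions above can be checked before running a pipeline. The following is a minimal sketch using only the Python standard library. The function name match_columns and the sample CSV are hypothetical and are not part of paidiverpy; only the accepted header spellings come from the lists above.

```python
import csv
import io

# Accepted header spellings, copied from the lists above.
COLUMN_ALIASES = {
    "filename": ["image-filename", "filename", "file_name", "FileName", "File Name"],
    "datetime": ["image-datetime", "datetime", "date_time", "DateTime", "Datetime"],
    "latitude": ["image-latitude", "lat", "latitude_deg", "latitude",
                 "Latitude", "Latitude_deg", "Lat"],
    "longitude": ["image-longitude", "lon", "longitude_deg", "longitude",
                  "Longitude", "Longitude_deg", "Lon"],
}


def match_columns(headers):
    """Map each field to the first accepted header found, or None if absent."""
    matched = {}
    for field, aliases in COLUMN_ALIASES.items():
        matched[field] = next((h for h in headers if h in aliases), None)
    return matched


# Hypothetical CSV content, used only to exercise the check.
sample = "FileName,date_time,Lat,depth\nimg_001.png,2024-01-01 00:00:00,49.5,100\n"
headers = next(csv.reader(io.StringIO(sample)))
columns = match_columns(headers)
print(columns)

# The filename column is the one the text above says must be present.
if columns["filename"] is None:
    raise ValueError("no recognised filename column in the CSV header")
```

In this sample the longitude field maps to None because the header row has no accepted longitude spelling, which is the kind of mismatch worth catching early.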

Layers

The package is organised into multiple layers.

(Figure: Package Organisation)

The Paidiverpy class serves as the main container for image processing functions. It manages several subclasses for specific processing tasks: OpenLayer, ConvertLayer, PositionLayer, ResampleLayer, and ColourLayer.

Supporting classes include:

  • Configuration: Parses and manages configuration files.
  • Metadata: Handles metadata.
  • ImagesLayer: Stores outputs from each image processing step.

The Pipeline class integrates all processing steps defined in the configuration file.
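
To illustrate the general idea of a configuration-driven pipeline that keeps the output of every step, here is a toy sketch. It is not paidiverpy's implementation or API: STEP_REGISTRY, run_pipeline, the step functions, and the structure of the steps list are all hypothetical, and images are plain nested lists rather than real image arrays.

```python
# Toy illustration only: none of these names are paidiverpy's real API.

def to_gray(image):
    """Average the RGB channels of every pixel (stand-in for a convert step)."""
    return [[sum(pixel) // 3 for pixel in row] for row in image]


def subsample(image, step=2):
    """Keep every `step`-th row and column (stand-in for a resample step)."""
    return [row[::step] for row in image[::step]]


# Maps the step names used in the configuration to processing functions.
STEP_REGISTRY = {"convert": to_gray, "resample": subsample}


def run_pipeline(image, steps):
    """Apply the configured steps in order, keeping every intermediate output."""
    outputs = [image]
    for step in steps:
        func = STEP_REGISTRY[step["name"]]
        outputs.append(func(outputs[-1], **step.get("params", {})))
    return outputs


# A 2x2 RGB image and a two-step configuration, both made up for the example.
image = [[(0, 0, 0), (30, 60, 90)], [(3, 3, 3), (255, 255, 255)]]
steps = [{"name": "convert"}, {"name": "resample", "params": {"step": 2}}]
outputs = run_pipeline(image, steps)
print(len(outputs), outputs[-1])
```

Keeping each intermediate result, as the outputs list does here, is what makes it possible to inspect or compare the image after any individual step.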

Usage

While comprehensive documentation is forthcoming, you can explore various use cases through the sample notebooks in the examples/example_notebooks directory.

Example Data

If you'd like to manually download example data for testing, you can run the following Python code:

from paidiverpy import data

# DATASET_NAME is a placeholder: replace it with one of the dataset names listed below
data.load(DATASET_NAME)

Available datasets:

  • pelagic_csv
  • benthic_csv
  • benthic_ifdo

Example data will be automatically downloaded when running the example notebooks.

Command-Line Arguments

Pipelines can be executed via command-line arguments. For example:

paidiverpy -c examples/config_files/config_simple.yaml

This runs the pipeline according to the configuration file and saves the output images to the directory set by output_path in that file.

Docker Command

You can also run Paidiverpy using Docker. You can either build the container locally or pull it from Docker Hub.

  1. Build the container locally:

    git clone git@github.com:paidiver/paidiverpy.git
    cd paidiverpy
    docker build -t paidiverpy .
    
  2. Pull the image from Docker Hub:

    docker pull soutobias/paidiverpy:latest
    docker tag soutobias/paidiverpy:latest paidiverpy:latest
    

Run the container with:

docker run --rm \
-v <INPUT_PATH>:/app/input/ \
-v <OUTPUT_PATH>:/app/output/ \
-v <FULL_PATH_OF_CONFIGURATION_FILE_WITHOUT_FILENAME>:/app/config_files \
paidiverpy \
paidiverpy -c /app/config_files/<CONFIGURATION_FILE_FILENAME>

In this command:

  • <INPUT_PATH>: The input path defined in your configuration file, where the input images are located.
  • <OUTPUT_PATH>: The output path defined in your configuration file.
  • <FULL_PATH_OF_CONFIGURATION_FILE_WITHOUT_FILENAME>: The local directory of your configuration file.
  • <CONFIGURATION_FILE_FILENAME>: The name of the configuration file.

The output images will be saved to the specified output_path.
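
Because the configuration file path has to be split into its directory (for the volume mount) and its filename (for the -c argument), it can help to assemble the command programmatically. The following is a minimal sketch using only the Python standard library. The helper name build_docker_command and the example host paths are hypothetical, and it assumes the configuration file is read from the same container directory that the third -v option mounts (/app/config_files).

```python
import shlex
from pathlib import PurePosixPath


def build_docker_command(input_path, output_path, config_file,
                         image="paidiverpy",
                         container_config_dir="/app/config_files"):
    """Return the `docker run` argument list for one pipeline run."""
    config = PurePosixPath(config_file)
    return [
        "docker", "run", "--rm",
        "-v", f"{input_path}:/app/input/",
        "-v", f"{output_path}:/app/output/",
        # Mount the host directory that holds the configuration file.
        "-v", f"{config.parent}:{container_config_dir}",
        image,
        # Point the CLI at the file inside the mounted directory (assumption).
        "paidiverpy", "-c", f"{container_config_dir}/{config.name}",
    ]


# Hypothetical host paths, used only to show the resulting command.
cmd = build_docker_command(
    "/data/images", "/data/output", "/home/user/configs/config_simple.yaml"
)
print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems when paths contain spaces.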

