
Library for easier access and research of wildlife re-identification datasets

Project description


Wildlife datasets

Pipeline for wildlife re-identification, including a dataset zoo, training tools and trained models. Applications include classifying new images against labelled databases and clustering individuals in unlabelled databases.
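Classifying a new image against a labelled database is commonly done by nearest-neighbour matching of feature vectors. The sketch below is a minimal illustration of that general idea, not this package's or Wildlife tools' API: the feature vectors, the identity names and the helper functions `cosine_similarity` and `identify` are all hypothetical. In practice, the features would come from a trained model such as MegaDescriptor.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(query, database):
    """Return the identity of the database entry most similar to `query`.

    `database` is a list of (identity, feature_vector) pairs.
    """
    best_identity, _ = max(
        database, key=lambda item: cosine_similarity(query, item[1])
    )
    return best_identity


# Hypothetical 3-dimensional features standing in for real image descriptors.
database = [
    ("macaque_01", [1.0, 0.1, 0.0]),
    ("macaque_02", [0.0, 1.0, 0.2]),
]
print(identify([0.9, 0.2, 0.1], database))  # macaque_01
```

Clustering individuals in an unlabelled database relies on the same similarity scores, but groups images with each other instead of matching them to known identities.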

Documentation · Report Bug · Request Feature · Email

Wildlife datasets: datasets for identification of individual animals
MegaDescriptor: trained model for individual re-identification
Wildlife tools: tools for training re-identification models

Wildlife Re-Identification (Re-ID) Datasets

The aim of the project is to provide a comprehensive overview of datasets for wildlife individual re-identification and an easy-to-use package for developers of machine learning methods. The core functionality includes:

  • overview of 41 publicly available wildlife re-identification datasets.
  • utilities to mass-download the datasets, convert them into a unified format and fix some incorrect labels.
  • default splits for several machine learning tasks, including the ability to create additional splits.
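One common split in re-identification keeps every individual in the training set and holds out some of its images for testing (often called a closed-set split). The following is a minimal stdlib sketch of that idea under stated assumptions: records are (path, identity) pairs, and the function `closed_set_split` with its `test_fraction` and `seed` parameters is hypothetical, not the package's own splitting API.

```python
import random
from collections import defaultdict


def closed_set_split(records, test_fraction=0.2, seed=0):
    """Split (path, identity) records so that every identity stays in the
    training set and, if it has at least two images, also appears in the
    test set."""
    rng = random.Random(seed)
    by_identity = defaultdict(list)
    for path, identity in records:
        by_identity[identity].append((path, identity))

    train, test = [], []
    for items in by_identity.values():
        items = items[:]
        rng.shuffle(items)
        # Hold out at least one image, but always keep one for training.
        n_test = min(len(items) - 1, max(1, round(test_fraction * len(items))))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Hypothetical records: five images of individual "a", one image of "b".
records = [(f"a_{i}.jpg", "a") for i in range(5)] + [("b_0.jpg", "b")]
train, test = closed_set_split(records)
print(len(train), len(test))
```

Individuals with a single image can only go to the training set here; other split types, such as ones where test individuals are never seen during training, would group the records by identity before splitting instead.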

An introductory example is provided in a Jupyter notebook. The package works naturally together with Wildlife tools, which provides our MegaDescriptor model and tools for training neural networks.

Changelog

[08/10/2024] Added AmvrakikosTurtles, ReunionTurtles, ZakynthosTurtles (sea turtles), ELPephants (elephants) and Chicks4FreeID (chickens).
[13/06/2024] Added WildlifeReID-10k (unification of multiple datasets).
[09/05/2024] Added CatIndividualImages (cats), CowDataset (cows) and DogFaceNet (dogs).
[28/02/2024] Added MPDD (dogs), PolarBearVidID (polar bears) and SeaStarReID2023 (sea stars).
[04/01/2024] Received the Best Paper Award at WACV 2024.

Summary of datasets

An overview of the provided datasets is available in the documentation, while a more detailed numerical summary is located in a Jupyter notebook. Due to the notebook's size, it may be necessary to view it via nbviewer.

We include basic characteristics such as publication year, number of images, number of individuals and dataset time span (the difference between the dates of the last and the first image taken), as well as additional information such as the source, number of poses, inclusion of timestamps, whether the animals were captured in the wild and whether the dataset contains multiple species.
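The time span defined above is simply the latest capture date minus the earliest one. A minimal sketch, assuming the capture dates are available as ISO-formatted strings; the helper `time_span` and the example dates are hypothetical and not taken from any dataset.

```python
from datetime import datetime


def time_span(dates):
    """Difference between the last and the first date in `dates`."""
    parsed = [datetime.fromisoformat(d) for d in dates]
    return max(parsed) - min(parsed)


# Hypothetical capture dates; the order of the list does not matter.
span = time_span(["2019-05-01", "2021-03-15 08:30:00", "2020-01-10"])
print(span.days)  # 684 (plus 8.5 hours)
```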

Dataset summary

Installation

The package can be installed with

pip install wildlife-datasets

Basic functionality

We show an example of downloading, extracting and processing the MacaqueFaces dataset.

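from wildlife_datasets import analysis, datasets

# Download and extract the raw MacaqueFaces data into data/MacaqueFaces
datasets.MacaqueFaces.get_data('data/MacaqueFaces')
# Load the extracted data and convert it into the unified format
dataset = datasets.MacaqueFaces('data/MacaqueFaces')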

The attribute dataset.df is a pandas DataFrame with information about the images. Its content depends on the dataset, but every dataset contains the identity of the animal and the path to each image. This particular dataset also contains information about the date taken and contrast. Other datasets store information about bounding boxes, segmentation masks, the position from which the image was taken, keypoints or various other information such as age or gender.

dataset.df
Overview of the MacaqueFaces dataset
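Because every dataset stores an identity for each image, basic counts follow directly from that column. The stdlib sketch below uses hypothetical rows that only mimic dataset.df; the column names `identity` and `path` and the helper `images_per_identity` are assumptions for illustration. On the real DataFrame, assuming the identity column is named `identity`, pandas' `dataset.df["identity"].value_counts()` and `dataset.df["identity"].nunique()` would give the corresponding numbers.

```python
from collections import Counter


def images_per_identity(rows):
    """Count how many images belong to each individual."""
    return Counter(row["identity"] for row in rows)


# Hypothetical rows mimicking dataset.df; the column names are assumptions.
rows = [
    {"identity": "id_0", "path": "images/id_0/0001.jpg"},
    {"identity": "id_0", "path": "images/id_0/0002.jpg"},
    {"identity": "id_1", "path": "images/id_1/0001.jpg"},
]
counts = images_per_identity(rows)
print("individuals:", len(counts))      # individuals: 2
print("images:", sum(counts.values()))  # images: 3
```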

The dataset also contains basic metadata, including the number of individuals, time span, licences and year of publication.

dataset.summary
Metadata of the MacaqueFaces dataset

This particular dataset already contains cropped images of faces. Other datasets may contain uncropped images with bounding boxes or even segmentation masks.

dataset.plot_grid()
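For datasets with uncropped images and bounding boxes, the box has to be turned into crop coordinates before cutting out the animal. A minimal sketch, assuming boxes are stored as [x, y, width, height] in pixels (an assumption; check the format of the particular dataset). The helper `bbox_to_crop_box` is hypothetical; it returns the (left, upper, right, lower) tuple that Pillow's `Image.crop` expects.

```python
def bbox_to_crop_box(bbox, image_width, image_height):
    """Convert an [x, y, width, height] box into a (left, upper, right, lower)
    tuple clipped to the image boundaries."""
    x, y, w, h = bbox
    left = max(0, round(x))
    upper = max(0, round(y))
    right = min(image_width, round(x + w))
    lower = min(image_height, round(y + h))
    if right <= left or lower <= upper:
        raise ValueError(f"bounding box {bbox} lies outside the image")
    return left, upper, right, lower


# A box partially outside a 100 x 100 image is clipped to its borders.
print(bbox_to_crop_box([-5, 90, 20, 20], 100, 100))  # (0, 90, 15, 100)
```

With Pillow installed, the result can be passed directly to `Image.open(path).crop(box)`.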

Additional functionality

For additional functionality, including mass loading, dataset splitting and evaluation metrics, we refer to the documentation or the notebooks.
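As a generic illustration of evaluation metrics for identification (not the package's own metrics module), the sketch below compares plain accuracy with balanced accuracy, the mean of per-identity recall. The function names and the example labels are hypothetical. Balanced accuracy prevents individuals with many images from dominating the score, which matters because re-identification datasets are often highly imbalanced.

```python
from collections import defaultdict


def _check(y_true, y_pred):
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")


def accuracy(y_true, y_pred):
    """Fraction of queries whose predicted identity is correct."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean over identities of the fraction of their images predicted correctly."""
    _check(y_true, y_pred)
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += t == p
    return sum(correct[k] / total[k] for k in total) / len(total)


# Predicting the majority individual everywhere looks good on plain accuracy only.
y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "a", "a"]
print(accuracy(y_true, y_pred))           # 0.75
print(balanced_accuracy(y_true, y_pred))  # 0.5
```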

Citation

If you like our package, please cite our paper. You may also be interested in our SeaTurtleID dataset, published in a separate paper.

@InProceedings{Cermak_2024_WACV,
    author    = {\v{C}erm\'ak, Vojt\v{e}ch and Picek, Luk\'a\v{s} and Adam, Luk\'a\v{s} and Papafitsoros, Kostas},
    title     = {{WildlifeDatasets: An Open-Source Toolkit for Animal Re-Identification}},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {5953-5963}
}



Download files


Source Distribution

wildlife_datasets-1.0.5.tar.gz (54.5 kB)

Uploaded Source

Built Distribution

wildlife_datasets-1.0.5-py3-none-any.whl (81.3 kB)

Uploaded Python 3

File details

Details for the file wildlife_datasets-1.0.5.tar.gz.

File metadata

  • Download URL: wildlife_datasets-1.0.5.tar.gz
  • Size: 54.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.11

File hashes

Hashes for wildlife_datasets-1.0.5.tar.gz
Algorithm Hash digest
SHA256 fd1d87e70df12a7b713e2c49d8f444b590299ad60c03c346a746c8d1bdf7d846
MD5 dfb3d3690513426f6b2a51128020a5f3
BLAKE2b-256 58322e2788dfabc69c703984b878249aa5821a6ff645c3015f0e1ba1619ad7aa
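A downloaded file can be checked against the published SHA256 digest using only the standard library. A minimal sketch, assuming the source distribution was saved under its original name in the current directory; the helper `sha256_of_file` is hypothetical.

```python
import hashlib
import os

EXPECTED_SHA256 = "fd1d87e70df12a7b713e2c49d8f444b590299ad60c03c346a746c8d1bdf7d846"


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = "wildlife_datasets-1.0.5.tar.gz"  # assumed download location
    if os.path.exists(archive):
        ok = sha256_of_file(archive) == EXPECTED_SHA256
        print("hash matches" if ok else "hash MISMATCH")
```

pip can also enforce such checks itself through its hash-checking mode (`--require-hashes`).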


File details

Details for the file wildlife_datasets-1.0.5-py3-none-any.whl.

File hashes

Hashes for wildlife_datasets-1.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 100ce0cb85c5de5c6d9dc6484839aefa055901a14486619daf114c87e1ae64b0
MD5 f3c77f76f60b4bdd5b0b1a81c220f501
BLAKE2b-256 ad478db104d5eaafed9a1b0ef23028c36c59c8f38a0868b74143d7c3d15ad3d0

