Detection datasets

Easily load and transform datasets for object detection.

Documentation: https://blinjrm.github.io/detection-datasets/

Source Code: https://github.com/blinjrm/detection-datasets

Datasets on Hugging Face Hub: https://huggingface.co/detection-datasets

detection_datasets aims to make it easier to work with detection datasets.

This library works alongside the Detection dataset organisation on the 🤗 Hub, where some detection datasets have been uploaded in the format expected by the library, and are ready to use.

The main features are:

- Read the dataset:
  - From disk if it has already been downloaded.
  - Directly from the Hugging Face Hub if it already exists there.
- Transform the dataset:
  - Select a subset of data.
  - Remap categories.
  - Create new train-val-test splits.
- Visualize the annotations and images.
- Write the dataset:
  - To disk, selecting the target detection format: COCO, YOLO and more to come.
  - To the Hugging Face Hub, for easy reuse in a different environment and for sharing with the community.

Read the quick start below, or jump directly to the tutorials:

Goal	Tutorial	Colab
Load from disk and upload to the Hub	Open in the docs
Load from the Hub and transform	Open in the docs

Getting started

0. Setup

Requirements

Python 3.8.1+

detection_datasets is built upon the great work of:

Pandas for manipulating data.
Hugging Face Datasets to store and load datasets from the Hub.

Installation

$ pip install detection_datasets

Import

from detection_datasets import DetectionDataset

1. Read

From local filesystem

config = {
    'dataset_format': 'coco',                   # the format of the dataset on disk
    'path': 'path/to/data/on/disk',             # where the dataset is located
    'splits': {                                 # how to read the files
        'train': ('train.json', 'train'),       # split name: (annotation file, images directory)
        'test': ('test.json', 'test'),
    },
}

dd = DetectionDataset()
dd.from_disk(**config)

# Note that you can use method chaining as well:
# dd = DetectionDataset().from_disk(**config)
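
For reference, a COCO-format dataset keeps images, annotations and categories in one JSON file per split, with boxes given as [x, y, width, height] in pixels. The sketch below writes a tiny dataset to a temporary directory, laid out to match the config above. The file names, the image entry and the category are invented for illustration and are not part of the library.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical, minimal COCO annotation content (one image, one box).
coco = {
    "images": [{"id": 1, "file_name": "0001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "shirt"}],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [100, 120, 200, 150],  # [x, y, width, height] in pixels
            "area": 200 * 150,
            "iscrowd": 0,
        }
    ],
}

root = Path(tempfile.mkdtemp())
for split in ("train", "test"):
    (root / split).mkdir()                  # images directory for the split
    with open(root / f"{split}.json", "w") as f:
        json.dump(coco, f)                  # annotation file for the split

# A config pointing at this layout, mirroring the example above.
config = {
    "dataset_format": "coco",
    "path": str(root),
    "splits": {
        "train": ("train.json", "train"),
        "test": ("test.json", "test"),
    },
}
```

Each entry in 'splits' pairs an annotation file with the directory that holds the images for that split, both relative to 'path'.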

From the Hugging Face Hub

The detection_datasets library works alongside the Detection dataset organisation on the Hugging Face Hub, where some detection datasets have been uploaded in the format expected by the library and are ready to use.

dd = DetectionDataset().from_hub(name='fashionpedia')

The currently supported formats for reading datasets are:

COCO
more to come

The list of datasets available from the Hub is given by:

# Search in the "detection-datasets" repository on the Hub.
DetectionDataset().available_in_hub()

# Search in another repository on the Hub.
DetectionDataset().available_in_hub(repo_name='MY_REPO_OR_ORGANISATION')

2. Transform

The supported transformations are:

# Select a subset of images, preserving the splits and their proportions
dd.select(n_images=1000)

# Shuffle the dataset, preserving the splits and their proportions
dd.shuffle(seed=42)

# Create new train-val-test splits, overwriting the splits from the original dataset
dd.split(splits=[0.8, 0.1, 0.1])

# Map existing categories to new categories.
# The annotations with a category absent from the mapping are dropped.
dd.map_categories(mapping={'existing_category': 'new_category'})
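
To make the dropping behaviour of the category mapping concrete, here is a plain-Python stand-in that applies the same rule to a list of annotation records. It only illustrates the documented semantics and is not the library's implementation; the records and category names are hypothetical.

```python
# Illustration only: hypothetical annotation records, not the library's schema.
annotations = [
    {"image_id": 1, "category": "shirt"},
    {"image_id": 1, "category": "hat"},
    {"image_id": 2, "category": "t-shirt"},
]
mapping = {"shirt": "top", "t-shirt": "top"}

remapped = [
    {**ann, "category": mapping[ann["category"]]}
    for ann in annotations
    if ann["category"] in mapping  # categories absent from the mapping are dropped
]
print(remapped)
```

Here the "hat" annotation disappears because "hat" is not a key of the mapping, while "shirt" and "t-shirt" are merged into a single "top" category.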

These transformations can be chained; for example, here we select a subset of 10,000 images and create new train-val-test splits:

dd = DetectionDataset()\
    .from_hub(name='fashionpedia')\
    .select(n_images=10000)\
    .split(splits=[0.8, 0.1, 0.1])
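
As a rough guide to what the proportions mean, the sketch below cuts 10,000 image ids into three chunks sized 0.8, 0.1 and 0.1. It is a standalone illustration under stated assumptions: the helper `split_ids`, the split names and the rounding rule are hypothetical and may differ from what the library does internally.

```python
import random

def split_ids(image_ids, proportions, names=("train", "val", "test"), seed=42):
    """Shuffle ids and cut them into consecutive chunks sized by `proportions`."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    result, start = {}, 0
    for i, (name, p) in enumerate(zip(names, proportions)):
        # The last split takes the remainder so that no id is lost to rounding.
        end = len(ids) if i == len(proportions) - 1 else start + round(p * len(ids))
        result[name] = ids[start:end]
        start = end
    return result

splits = split_ids(range(10000), [0.8, 0.1, 0.1])
sizes = {name: len(ids) for name, ids in splits.items()}
print(sizes)  # {'train': 8000, 'val': 1000, 'test': 1000}
```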

3. Visualize

DetectionDataset objects expose several properties for analyzing your data:

dd.data                     # This is equivalent to calling `dd.get_data('image')`,
                            # and returns a DataFrame with 1 row per image

dd.get_data('bbox')         # Returns a DataFrame with 1 row per annotation

dd.n_images                 # Number of images

dd.n_bbox                   # Number of annotations

dd.splits                   # List of split names

dd.split_proportions        # DataFrame with the % of images in each split

dd.categories               # DataFrame with the categories and their ids

dd.category_names           # List of categories

dd.n_categories             # Number of categories
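
The difference between the image-level and annotation-level views can be sketched with plain Python lists. The field names below are hypothetical and do not reflect the library's actual DataFrame columns; the point is only that one image row expands into one row per bounding box.

```python
# Hypothetical records; the field names are illustrative, not the library's schema.
images = [
    {"image_id": 1, "split": "train", "bboxes": [[10, 20, 30, 40], [50, 60, 70, 80]]},
    {"image_id": 2, "split": "test", "bboxes": [[5, 5, 10, 10]]},
]

# Image-level view: one row per image.
n_images = len(images)

# Annotation-level view: one row per bounding box.
bbox_rows = [
    {"image_id": img["image_id"], "split": img["split"], "bbox": bbox}
    for img in images
    for bbox in img["bboxes"]
]
n_bbox = len(bbox_rows)

print(n_images, n_bbox)  # 2 3
```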

You can also visualize an image with its annotations in a notebook:

dd.show()                   # Shows a random image from the dataset
dd.show(image_id=42)        # Shows the image with the given image_id

4. Write

To local filesystem

Once the dataset is ready, you can write it to the local filesystem in a given format:

dd.to_disk(
    dataset_format='yolo',
    name='MY_DATASET_NAME',
    path='DIRECTORY_TO_WRITE_TO',
)

The currently supported formats for writing datasets are:

YOLO
COCO
MMDET
more to come
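
To show what changes between two of these formats, the sketch below converts a single bounding box from the COCO convention ([x_min, y_min, width, height] in pixels) to the YOLO convention ([x_center, y_center, width, height], normalized by the image size). The helper `coco_to_yolo` is hypothetical and written for illustration; it is not a function of the library, and the conversion itself is presumably handled by `to_disk`.

```python
def coco_to_yolo(bbox, image_width, image_height):
    """Convert a COCO box [x_min, y_min, width, height] in pixels
    to a YOLO box [x_center, y_center, width, height] normalized to [0, 1]."""
    x, y, w, h = bbox
    return [
        (x + w / 2) / image_width,
        (y + h / 2) / image_height,
        w / image_width,
        h / image_height,
    ]

yolo_box = coco_to_yolo([100, 120, 200, 150], image_width=640, image_height=480)
print(yolo_box)  # [0.3125, 0.40625, 0.3125, 0.3125]
```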

To the Hugging Face Hub

The dataset can also be easily uploaded to the Hugging Face Hub, for reuse later on or in a different environment:

dd.to_hub(
    dataset_name='MY_DATASET_NAME',
    repo_name='MY_REPO_OR_ORGANISATION'
)

The dataset viewer on the Hub will work out of the box, and we encourage you to update the README in your new repo to make it easier for the community to use the dataset.

Release history

Version	Release date
0.3.8 (this version)	Dec 6, 2023
0.3.7	Oct 11, 2023
0.3.6	Aug 6, 2023
0.3.5	Jan 3, 2023
0.3.3	Oct 12, 2022
0.3.2	Oct 9, 2022
0.3.1	Sep 21, 2022
0.3.0	Sep 20, 2022
0.2.4	Sep 20, 2022
0.2.3	Sep 19, 2022
0.2.2	Sep 19, 2022
0.2.1	Sep 18, 2022
0.2.0	Sep 18, 2022
0.1.0	Sep 13, 2022

Download files

Source Distribution

detection_datasets-0.3.8.tar.gz (17.1 kB view details)

Uploaded Dec 6, 2023 Source

Built Distribution

detection_datasets-0.3.8-py3-none-any.whl (20.2 kB view details)

Uploaded Dec 6, 2023 Python 3

File details

Details for the file detection_datasets-0.3.8.tar.gz.

File metadata

Download URL: detection_datasets-0.3.8.tar.gz
Upload date: Dec 6, 2023
Size: 17.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.7.1 CPython/3.10.2 Linux/6.2.0-1016-azure

File hashes

Hashes for detection_datasets-0.3.8.tar.gz
Algorithm	Hash digest
SHA256	`e5847f31d9e11a5c8bce3741398b3c60f31b8f6667ed4702f1729b1982509b83`
MD5	`f72a36db97cdc478969db73dfe898794`
BLAKE2b-256	`55367c0f9a6f1af2eaab87ac99f3fa9fddfda47d5333d4df3e856502486358fe`

File details

Details for the file detection_datasets-0.3.8-py3-none-any.whl.

File metadata

Download URL: detection_datasets-0.3.8-py3-none-any.whl
Upload date: Dec 6, 2023
Size: 20.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.7.1 CPython/3.10.2 Linux/6.2.0-1016-azure

File hashes

Hashes for detection_datasets-0.3.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2df798a7e31fad822e75ad9f0c36803cc6d33799f456b87e8a02f951c2719b1b`
MD5	`cd9be1b0d971e69a1c177e62cb96678c`
BLAKE2b-256	`58767c162c5b7bd2adf9559e2f82d208b7e6563d127947bd7635954d697f982c`
