A utility repo for vision dataset access and management.

Vision Datasets

Introduction

This repo:

defines a unified contract for datasets used for training, visualization, and exploration, via DatasetManifest, ImageDataManifest, etc.
provides many commonly used dataset operations, such as sampling a dataset by categories, sampling a few-shot sub-dataset, sampling a dataset by ratios, train-test splitting, merging datasets, etc. (see Operations on manifests below)
provides an API for organizing and accessing datasets, via DatasetHub
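To make the few-shot sampling operation concrete, here is a minimal, standalone sketch of per-category sampling over a plain dict of multiclass labels. The function `sample_few_shot` and its input layout are hypothetical illustrations, not this repo's API; for real manifests, use the operations described under Operations on manifests.

```python
import random
from collections import defaultdict


def sample_few_shot(labels, k, seed=0):
    """Pick up to k image ids per category from a multiclass labeling.

    labels: dict mapping image id -> category index (exactly one per image).
    Returns the selected image ids in sorted order.
    """
    by_category = defaultdict(list)
    for img_id, category in labels.items():
        by_category[category].append(img_id)

    rng = random.Random(seed)  # seeded so the same sample comes back every run
    selected = []
    for category in sorted(by_category):
        ids = sorted(by_category[category])  # fixed order before sampling
        # categories with fewer than k images contribute all of their images
        selected.extend(rng.sample(ids, min(k, len(ids))))
    return sorted(selected)


labels = {"a.jpg": 0, "b.jpg": 0, "c.jpg": 0, "d.jpg": 1, "e.jpg": 2, "f.jpg": 2}
few_shot = sample_few_shot(labels, k=2)
print(few_shot)
```

Seeding a private `random.Random` instance keeps the sample reproducible without touching the global random state.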

Currently, the following basic types of data are supported:

image_classification_multiclass: each image has exactly one label.
image_classification_multilabel: each image has one or more labels (e.g., 'cat', 'animal', 'pet').
image_object_detection: each image is labeled with bounding boxes surrounding the objects of interest.
image_text_matching: each image is associated with a collection of texts describing the image, and whether each text description matches the image or not.
image_matting: each image has a pixel-wise annotation, where each pixel is labeled as 'foreground' or 'background'.
image_regression: each image is labeled with a real-valued numeric regression target.
image_caption: each image is labeled with a few texts describing the image.
text_2_image_retrieval: each image is labeled with a number of text queries describing the image. Optionally, an image is associated with one label.
visual_question_answering: each image is labeled with a number of question-answer pairs.
visual_object_grounding: each image is labeled with a number of question-answer-bboxes triplets.
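The difference between the two classification types can be seen by encoding both as a multi-hot vector. This sketch assumes labels are stored as indices into a labelmap; the helper `to_multi_hot` and the labelmap shown are hypothetical.

```python
def to_multi_hot(label_indices, num_classes):
    """Convert a list of category indices into a 0/1 vector of length num_classes."""
    vector = [0] * num_classes
    for idx in label_indices:
        if not 0 <= idx < num_classes:
            raise ValueError(f"label index {idx} out of range")
        vector[idx] = 1
    return vector


labelmap = ["cat", "animal", "pet", "car"]
multiclass_label = 3          # image_classification_multiclass: exactly one category
multilabel_label = [0, 1, 2]  # image_classification_multilabel: 'cat', 'animal', 'pet'
print(to_multi_hot([multiclass_label], len(labelmap)))  # [0, 0, 0, 1]
print(to_multi_hot(multilabel_label, len(labelmap)))    # [1, 1, 1, 0]
```

A multiclass vector always has exactly one 1, while a multilabel vector may have several.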

multitask type is a composition type, in which one set of images has multiple sets of annotations for different tasks, and each task can be of any basic type.

key_value_pair type is a generalized type, in which a sample consists of one or multiple images with optional text, labeled with key-value pairs. The keys and values are defined by a schema. Note that all the above basic types can be expressed as this type with specific schemas.
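The idea of a schema-constrained key-value-pair sample can be illustrated with a toy check. This is a deliberately simplified sketch: the `KVPSample` class, the `check_sample` function, and the schema shape (each key mapped to a Python type) are all hypothetical and much simpler than the real schema format shown in DATA_PREPARATION.md.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class KVPSample:
    """Hypothetical, simplified stand-in for one key_value_pair sample."""
    img_ids: List[int]            # one or more image indices
    fields: Dict[str, object]     # the labeled key-value pairs
    text: Optional[str] = None    # optional text input


def check_sample(sample, schema):
    """Return a list of problems; schema maps each required key to a Python type."""
    problems = []
    if not sample.img_ids:
        problems.append("sample must reference at least one image")
    for key, expected_type in schema.items():
        if key not in sample.fields:
            problems.append(f"missing key: {key}")
        elif not isinstance(sample.fields[key], expected_type):
            problems.append(f"{key}: expected {expected_type.__name__}")
    for key in sample.fields:
        if key not in schema:
            problems.append(f"unexpected key: {key}")
    return problems


schema = {"defect_type": str, "defect_count": int}
ok = KVPSample(img_ids=[0, 1], fields={"defect_type": "scratch", "defect_count": 2})
bad = KVPSample(img_ids=[2], fields={"defect_type": 3}, text="Is the part damaged?")
print(check_sample(ok, schema))   # []
print(check_sample(bad, schema))  # ['defect_type: expected str', 'missing key: defect_count']
```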

Note that image_caption and text_2_image_retrieval might be merged into image_text_matching in the future.

Dataset Contracts

We support datasets with two types of annotations:

single-image annotation (S), and
multi-image annotation (M)

The table below shows all the supported contracts:

Annotation	Contract class	Explanation
S	`DatasetManifest`	wraps the information about a dataset, including the labelmap, images (width, height, path to image), and annotations. Information about each image is stored in `ImageDataManifest`. For a multitask dataset, the labels stored in each ImageDataManifest are a dict mapping from task name to that task's labels, and the labelmap stored in DatasetManifest is likewise a dict mapping from task name to that task's labelmap.
S,M	`ImageDataManifest`	encapsulates image-specific information, such as image id, path, labels, and width/height. Note that the image path can be: 1. a local path (absolute `c:\images\1.jpg` or relative `images\1.jpg`), 2. a path inside a local non-compressed zip file (absolute `c:\images.zip@1.jpg` or relative `images.zip@1.jpg`), or 3. a URL. All three kinds of paths can be loaded by `VisionDataset`.
S	`ImageLabelManifest`	encapsulates one single image-level annotation
S	`CategoryManifest`	encapsulates the information about a category, such as its name and super category, if applicable
M	`MultiImageLabelManifest`	is an abstract class. It encapsulates one annotation associated with one or multiple images, each stored as an image index.
M	`DatasetManifestWithMultiImageLabel`	supports annotations associated with one or multiple images. Each annotation is represented by the `MultiImageLabelManifest` class, and each image is represented by `ImageDataManifest`.
M	`KeyValuePairDatasetManifest`	inherits from `DatasetManifestWithMultiImageLabel`. Each sample has a `KeyValuePairLabelManifest` label, and the dataset is associated with a schema that defines the expected keys and values.
M	`KeyValuePairLabelManifest`	inherits from `MultiImageLabelManifest` and encapsulates the label information of `KeyValuePairDatasetManifest`. Each label has the fields `img_ids` (associated images), `text` (associated text input), and `fields` (dictionary of the field keys and values of interest).
S,M	`VisionDataset`	is an iterable dataset class that consumes the information from `DatasetManifest` or `DatasetManifestWithMultiImageLabel`
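The three kinds of image paths listed for `ImageDataManifest` can be told apart with a few string checks. The sketch below, using only the standard library, classifies a path and reads a member out of a non-compressed zip through the `archive.zip@member` convention from the table. The helpers `classify_image_path` and `read_image_bytes` are hypothetical; `VisionDataset` already handles all of this for you, and URL downloading is deliberately left out.

```python
import os
import tempfile
import zipfile
from urllib.parse import urlparse


def classify_image_path(path):
    """Return ('url', url), ('zip', zip_path, member) or ('local', path)."""
    if urlparse(path).scheme in ("http", "https"):
        return ("url", path)
    marker = path.lower().find(".zip@")
    if marker != -1:
        # everything up to and including ".zip" is the archive, the rest is the member
        return ("zip", path[:marker + 4], path[marker + 5:])
    return ("local", path)


def read_image_bytes(path, root_dir="."):
    """Read raw image bytes for a local path or a zip-member path (URLs not handled)."""
    kind = classify_image_path(path)
    if kind[0] == "url":
        raise NotImplementedError("downloading URLs is outside this sketch")
    if kind[0] == "zip":
        _, zip_path, member = kind
        with zipfile.ZipFile(os.path.join(root_dir, zip_path)) as archive:
            return archive.read(member)
    with open(os.path.join(root_dir, kind[1]), "rb") as f:
        return f.read()


# Demo: build a non-compressed (ZIP_STORED) archive and read "images.zip@1.jpg" from it
with tempfile.TemporaryDirectory() as root:
    with zipfile.ZipFile(os.path.join(root, "images.zip"), "w", zipfile.ZIP_STORED) as archive:
        archive.writestr("1.jpg", b"fake image bytes")
    data = read_image_bytes("images.zip@1.jpg", root_dir=root)

print(classify_image_path(r"c:\images.zip@1.jpg"))      # ('zip', 'c:\\images.zip', '1.jpg')
print(classify_image_path("https://example.com/1.jpg"))  # ('url', 'https://example.com/1.jpg')
print(classify_image_path("images/1.jpg"))               # ('local', 'images/1.jpg')
print(data)                                              # b'fake image bytes'
```

Splitting on `.zip@` rather than on a bare `@` keeps URLs and file names that happen to contain `@` from being misread as zip paths.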

Creating DatasetManifest

In addition to loading a serialized DatasetManifest, this repo currently supports instantiating a DatasetManifest from two data formats, COCO and IRIS (legacy), using DatasetManifest.create_dataset_manifest(dataset_info, usage, container_sas_or_root_dir).

DatasetInfo, the first argument, wraps the metadata about the dataset, such as its name and the locations of the images and annotation files. See the sections below for examples in different data formats.

Once a DatasetManifest is created, you can create a VisionDataset to access the data in the dataset, especially the image data, for training, visualization, etc.:

dataset = VisionDataset(dataset_info, dataset_manifest, coordinates='relative')
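As a rough illustration of what `coordinates='relative'` refers to: relative coordinates express a bounding box as fractions of the image width and height instead of pixels. This sketch assumes a detection label laid out as `[class_id, left, top, right, bottom]` in absolute pixels; check COCO_DATA_FORMAT.md for the actual layout. The helpers `to_relative` and `to_absolute` are hypothetical.

```python
def to_relative(box, width, height):
    """Convert [class_id, left, top, right, bottom] from pixels to fractions of the image size."""
    if width <= 0 or height <= 0:
        raise ValueError("image width and height must be positive")
    class_id, left, top, right, bottom = box
    return [class_id, left / width, top / height, right / width, bottom / height]


def to_absolute(box, width, height):
    """Inverse of to_relative: scale fractional coordinates back to pixels."""
    class_id, left, top, right, bottom = box
    return [class_id, left * width, top * height, right * width, bottom * height]


rel = to_relative([2, 64, 48, 320, 240], width=640, height=480)
print(rel)  # [2, 0.1, 0.1, 0.5, 0.5]
print(to_absolute(rel, width=640, height=480))
```

Relative boxes stay valid when an image is resized, which is why they are convenient for training pipelines that rescale inputs.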

Creating KeyValuePairDatasetManifest

You can use CocoManifestAdaptorFactory to create the manifest from COCO-format data and a schema. A COCO data example can be found in COCO_DATA_FORMAT.md, and a schema example (dictionary) can be found in DATA_PREPARATION.md.

from vision_datasets.common import CocoManifestAdaptorFactory, DatasetTypes

# schema_dict: see the schema dictionary example in DATA_PREPARATION.md
adaptor = CocoManifestAdaptorFactory.create(DatasetTypes.KEY_VALUE_PAIR, schema=schema_dict)
# image paths in test.json are relative to url_or_root_dir
key_value_pair_dataset_manifest = adaptor.create_dataset_manifest(coco_file_path_or_url='test.json', url_or_root_dir='data/')
# inspect the first sample
print(key_value_pair_dataset_manifest.images[0].img_path)
print(key_value_pair_dataset_manifest.annotations[0].fields)
print(key_value_pair_dataset_manifest.annotations[0].text)
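Since image paths in the COCO file are relative to `url_or_root_dir`, they must be joined onto either a local folder or a URL. The sketch below shows one way to do that join while keeping a URL query string (such as an Azure SAS token) intact. The helper `resolve_image_path` is hypothetical and is not how the adaptor is necessarily implemented.

```python
import os
from urllib.parse import urlsplit, urlunsplit


def resolve_image_path(rel_path, url_or_root_dir):
    """Join an image path from the COCO file onto a local root dir or a (SAS) URL."""
    parts = urlsplit(url_or_root_dir)
    if parts.scheme in ("http", "https"):
        # append to the URL path but keep the query string (e.g., a SAS token) intact
        path = parts.path.rstrip("/") + "/" + rel_path.lstrip("/")
        return urlunsplit(parts._replace(path=path))
    return os.path.join(url_or_root_dir, rel_path)


print(resolve_image_path("images/1.jpg", "data/"))  # data/images/1.jpg
print(resolve_image_path("images/1.jpg", "https://account.blob.core.windows.net/container?sv=token"))
# https://account.blob.core.windows.net/container/images/1.jpg?sv=token
```

A plain `urljoin` would drop or misplace the query string, which is why the path component is edited separately here.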

Once a KeyValuePairDatasetManifest is created, combine it with a dataset_info to create a VisionDataset for accessing the data in the dataset:

from vision_datasets.common import DatasetInfoFactory, VisionDataset
# dataset_info_dict: see the dataset information dictionary example in DATA_PREPARATION.md
dataset_info = DatasetInfoFactory.create(dataset_info_dict)
dataset = VisionDataset(dataset_info, key_value_pair_dataset_manifest)
# test the first sample
imgs, target, _ = dataset[0]
print(imgs)
print(target)

Loading IC/OD/VQA Datasets in KeyValuePair (KVP) Format

You can convert an existing IC, OD, or VQA VisionDataset to the generalized KVP format using the following adapters:

from vision_datasets.common import VisionDataset

# For multiclass and multilabel IC datasets
from vision_datasets.image_classification import MulticlassClassificationAsKeyValuePairDataset, MultilabelClassificationAsKeyValuePairDataset
sample_multiclass_ic_dataset = VisionDataset(dataset_info, dataset_manifest)
kvp_dataset = MulticlassClassificationAsKeyValuePairDataset(sample_multiclass_ic_dataset)
sample_multilabel_ic_dataset = VisionDataset(dataset_info, dataset_manifest)
kvp_dataset = MultilabelClassificationAsKeyValuePairDataset(sample_multilabel_ic_dataset)

# For OD dataset
from vision_datasets.image_object_detection import DetectionAsKeyValuePairDataset, DetectionAsKeyValuePairDatasetForMultilabelClassification
sample_od_dataset = VisionDataset(dataset_info, dataset_manifest)
kvp_dataset = DetectionAsKeyValuePairDataset(sample_od_dataset)
kvp_dataset_for_multilabel_classification = DetectionAsKeyValuePairDatasetForMultilabelClassification(sample_od_dataset)

# For VQA dataset
from vision_datasets.visual_question_answering import VQAAsKeyValuePairDataset
sample_vqa_dataset = VisionDataset(dataset_info, dataset_manifest)
kvp_dataset = VQAAsKeyValuePairDataset(sample_vqa_dataset)

Coco format

Here is an annotated example of a DatasetInfo for the COCO format, serialized as JSON. The // comments are explanations only and must be removed from a real file, since JSON does not allow comments:

    {
        "name": "sampled-ms-coco",
        "version": 1,
        "description": "A sampled ms-coco dataset.",
        "type": "object_detection",
        "format": "coco", // indicating the annotation data are stored in coco format
        "root_folder": "detection/coco2017_20200401", // a root folder for all files listed
        "train": {
            "index_path": "train.json", // coco json file for training, see next section for example
            "files_for_local_usage": [ // associated files including data such as images
                "images/train_images.zip"
            ]
        },
        "val": {
            "index_path": "val.json",
            "files_for_local_usage": [
                "images/val_images.zip"
            ]
        },
        "test": {
            "index_path": "test.json",
            "files_for_local_usage": [
                "images/test_images.zip"
            ]
        }
    }
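Because `root_folder` is the root for all files listed, the files needed for one usage are the index file plus the `files_for_local_usage` entries, each prefixed with `root_folder`. The sketch below loads a comment-free, trimmed version of the example above and lists those paths. The helper `files_for_usage` is hypothetical and only mirrors the structure shown.

```python
import json
import posixpath

DATASET_INFO = """
{
    "name": "sampled-ms-coco",
    "version": 1,
    "type": "object_detection",
    "format": "coco",
    "root_folder": "detection/coco2017_20200401",
    "train": {"index_path": "train.json", "files_for_local_usage": ["images/train_images.zip"]},
    "val": {"index_path": "val.json", "files_for_local_usage": ["images/val_images.zip"]}
}
"""


def files_for_usage(info, usage):
    """List the index file and local-usage files of one usage, prefixed with root_folder."""
    split = info[usage]
    root = info.get("root_folder", "")
    rel_paths = [split["index_path"], *split.get("files_for_local_usage", [])]
    return [posixpath.join(root, p) for p in rel_paths]


info = json.loads(DATASET_INFO)
print(files_for_usage(info, "train"))
# ['detection/coco2017_20200401/train.json', 'detection/coco2017_20200401/images/train_images.zip']
```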

COCO annotation format details for image_classification_multiclass/multilabel, image_object_detection, image_caption, image_text_matching, key_value_pair, and multitask can be found in COCO_DATA_FORMAT.md.

The index file can be put into a zip file as well (e.g., annotations.zip@train.json); there is no need to add this zip file to "files_for_local_usage" explicitly.

Iris format

Iris is a legacy format, described in IRIS_DATA_FORMAT.md. Only multiclass/multilabel classification, object detection, and multitask are supported.

Dataset management and access

Check DATA_PREPARATION.md for a complete, step-by-step guide to preparing datasets.

Once you have multiple datasets, it is more convenient to keep all the DatasetInfo in one place and instantiate a DatasetManifest, or even a VisionDataset, by just specifying the dataset name, usage (train, val, test), and version.

This repo offers the DatasetHub class for this purpose. Once it is instantiated with a JSON string containing the DatasetInfo of all datasets, you can retrieve a VisionDataset as follows:

import pathlib
from vision_datasets.common import Usages, DatasetHub

dataset_infos_json_path = 'datasets.json'
blob_container_sas = None  # SAS URL of the Azure blob container; can be None when all data are on local disk
local_dir = './data'  # hypothetical local folder in which the hub looks for (and downloads) the data
dataset_hub = DatasetHub(pathlib.Path(dataset_infos_json_path).read_text(), blob_container_sas, local_dir)
stanford_cars = dataset_hub.create_vision_dataset('stanford-cars', version=1, usage=Usages.TRAIN)

# Note that you can pass the contents of multiple datasets.json files to DatasetHub, and it will combine them all,
# e.g., DatasetHub([ds_json1, ds_json2, ...])
# You can also specify multiple usages in a create_vision_dataset call,
# e.g., dataset_hub.create_vision_dataset('stanford-cars', version=1, usage=[Usages.TRAIN, Usages.VAL])

for img, targets, sample_idx_str in stanford_cars:
    if isinstance(img, list):  # for a key_value_pair dataset, the first item is a list of images
        img = img[0]
    img.show()
    img.close()
    print(targets)
    input()  # wait for Enter before showing the next sample
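Conceptually, combining several datasets.json files and retrieving a dataset by name and version is a concatenate-then-filter operation. This toy sketch assumes each datasets.json holds a JSON array of DatasetInfo objects with `name` and `version` keys; the functions `combine_dataset_infos` and `find_dataset_info` are hypothetical, and picking the highest version when none is given is this sketch's own choice, not necessarily what DatasetHub does.

```python
import json


def combine_dataset_infos(*json_texts):
    """Concatenate the DatasetInfo lists from several datasets.json contents."""
    infos = []
    for text in json_texts:
        infos.extend(json.loads(text))
    return infos


def find_dataset_info(infos, name, version=None):
    """Return the entry matching name and version; with version=None, the highest version."""
    candidates = [info for info in infos if info["name"] == name]
    if version is not None:
        candidates = [info for info in candidates if info["version"] == version]
    if not candidates:
        raise KeyError(f"no dataset named {name!r} with version {version}")
    return max(candidates, key=lambda info: info["version"])


ds_json1 = '[{"name": "stanford-cars", "version": 1, "format": "coco"}]'
ds_json2 = ('[{"name": "stanford-cars", "version": 2, "format": "coco"}, '
            '{"name": "sampled-ms-coco", "version": 1, "format": "coco"}]')
infos = combine_dataset_infos(ds_json1, ds_json2)
print(find_dataset_info(infos, "stanford-cars", version=1)["version"])  # 1
print(find_dataset_info(infos, "stanford-cars")["version"])             # 2
```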

Note that this hub class works with data saved in both Azure Blob container and on local disk.

If local_dir:

is provided, the hub will look for the resources locally and, if they are not present, download the data (files included in "files_for_local_usage", the index files, and, for the Iris format, the metadata and labelmap) from blob_container_sas;
is NOT provided (i.e., None), the hub will create a dataset manifest that directly consumes data from the blob indicated by blob_container_sas. Note that this does not work if the data are stored in zip files; you will have to unzip your data in the Azure blob. (Index files require no update even if their image paths point into zip files, e.g., a.zip@1.jpg.) This kind of Azure-based dataset is good for exploring large datasets, but can be slow for training.

When data exists on local disk, blob_container_sas can be None.

Operations on manifests

Various operations, such as split, merge, and sample, are supported on manifests of different data types. To see the operations supported for a specific data type, run

vision_list_supported_operations -d {DATA_TYPE}

You can use the factory classes in vision_datasets.common.factory to create operations for a given data type, for example:

from vision_datasets.common import DatasetTypes, SplitFactory, SplitConfig

data_manifest = ...  # a DatasetManifest, loaded or created as described above
splitter = SplitFactory.create(DatasetTypes.IMAGE_CLASSIFICATION_MULTICLASS, SplitConfig(ratio=0.3))
manifest_1, manifest_2 = splitter.run(data_manifest)
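A ratio split like the one above can be understood through a plain shuffle-and-cut over image ids. In this sketch the first part receives `round(ratio * n)` ids; that assignment is an assumption, and the repo's splitter may assign the ratio to the other part or balance categories. The function `split_by_ratio` is hypothetical.

```python
import random


def split_by_ratio(image_ids, ratio, seed=0):
    """Shuffle ids and return (first, second), where first holds round(ratio * n) ids."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # seeded for a reproducible split
    cut = round(len(ids) * ratio)
    return ids[:cut], ids[cut:]


part_1, part_2 = split_by_ratio(range(10), ratio=0.3)
print(len(part_1), len(part_2))  # 3 7
```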

Training with PyTorch

Training with PyTorch is straightforward. After instantiating a VisionDataset, pass it to vision_datasets.common.dataset.TorchDataset together with your transform; the result can then be used with the PyTorch DataLoader for training.
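The wrapper pattern behind this is small: a map-style dataset only needs `__len__` and `__getitem__`, and the transform is applied to the image of each sample. The sketch below is a torch-free stand-in, assuming samples are `(image, target, sample_id)` tuples as in the DatasetHub loop above; `TransformedDataset` is hypothetical and is not the repo's TorchDataset.

```python
class TransformedDataset:
    """Hypothetical map-style wrapper that applies a transform to each sample's image.

    torch.utils.data.DataLoader only needs __len__ and __getitem__ from a
    map-style dataset, so this shape is what such a wrapper has to provide.
    """

    def __init__(self, dataset, transform=None):
        self.dataset = dataset
        self.transform = transform

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        image, target, sample_id = self.dataset[index]
        if self.transform is not None:
            image = self.transform(image)
        return image, target, sample_id


def scale_to_unit(image):
    """Toy transform: map 0-255 pixel values (nested lists) to the 0-1 range."""
    return [[pixel / 255 for pixel in row] for row in image]


raw_samples = [([[0, 255], [51, 102]], 0, "0"), ([[255, 0], [0, 255]], 1, "1")]
dataset = TransformedDataset(raw_samples, transform=scale_to_unit)
print(len(dataset), dataset[0][0])
```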

Helpful commands

A few commands come with this repo once it is installed, such as dataset checking and downloading, conversion of detection datasets to classification datasets, and so on. See UTIL_COMMANDS.md for details.

Release history

Version	Release date
1.0.21 (this version)	Mar 24, 2025
1.0.20	Feb 20, 2025
1.0.19	Nov 6, 2024
1.0.18	Oct 18, 2024
1.0.17	Oct 2, 2024
1.0.16	Sep 24, 2024
1.0.15	Sep 10, 2024
1.0.14	Aug 26, 2024
1.0.13	Aug 2, 2024
1.0.12	Feb 2, 2024
1.0.11	Nov 23, 2023
1.0.10	Nov 16, 2023
1.0.9	Oct 19, 2023
1.0.8	Sep 29, 2023
1.0.7	Sep 14, 2023
1.0.6	Sep 11, 2023
1.0.5	Sep 11, 2023
1.0.4	Sep 1, 2023
1.0.3	Aug 20, 2023
1.0.2	Jul 12, 2023
1.0.1	Jul 10, 2023
1.0.0	Jun 28, 2023
0.2.29	Jun 20, 2023
0.2.28	May 19, 2023
0.2.27	Apr 11, 2023
0.2.26	Mar 8, 2023
0.2.25	Feb 27, 2023
0.2.24	Feb 11, 2023
0.2.23	Nov 4, 2022
0.2.22	Nov 2, 2022
0.2.20	Oct 14, 2022
0.2.19	Sep 30, 2022
0.2.18	Sep 22, 2022
0.2.17	Aug 25, 2022
0.2.16	Aug 23, 2022
0.2.15	Aug 17, 2022
0.2.14	Jul 22, 2022
0.2.13	Jun 30, 2022
0.2.12	Jun 17, 2022
0.2.11	May 7, 2022
0.2.10	Apr 6, 2022
0.2.9	Mar 24, 2022
0.2.8	Mar 22, 2022
0.2.7	Feb 17, 2022
0.2.6	Feb 14, 2022
0.2.5	Feb 9, 2022
0.2.4	Feb 7, 2022
0.2.3	Feb 3, 2022
0.2.2	Jan 26, 2022
0.2.1	Jan 20, 2022
0.2.0	Jan 6, 2022
0.1.9	Dec 15, 2021
0.1.8	Dec 13, 2021
0.1.7	Dec 8, 2021
0.1.6	Nov 24, 2021
0.1.5	Nov 24, 2021
0.1.4	Nov 19, 2021
0.1.3	Oct 9, 2021
0.1.2	Sep 30, 2021
0.1.1	Sep 24, 2021
0.1.0	Aug 24, 2021

Download files

Source Distribution

vision_datasets-1.0.21.tar.gz (91.6 kB)

Uploaded Mar 24, 2025 Source

Built Distribution

vision_datasets-1.0.21-py3-none-any.whl (144.7 kB)

Uploaded Mar 24, 2025 Python 3

File details

Details for the file vision_datasets-1.0.21.tar.gz.

File metadata

Download URL: vision_datasets-1.0.21.tar.gz
Upload date: Mar 24, 2025
Size: 91.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for vision_datasets-1.0.21.tar.gz
Algorithm	Hash digest
SHA256	`064a49ee7fd46b4121333dc572e1bdad6d8da8fc500c9e824d08a74bb332717a`
MD5	`9f3aa0385b6bd3a3ef18acad5e9ff731`
BLAKE2b-256	`f618b00d55988bd914e620b1708289f9b863b99fbdcef6aee21ba7fb43703ed8`

File details

Details for the file vision_datasets-1.0.21-py3-none-any.whl.

File metadata

Download URL: vision_datasets-1.0.21-py3-none-any.whl
Upload date: Mar 24, 2025
Size: 144.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for vision_datasets-1.0.21-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9713a093e5f666386872be440feb6af5057104b1a5a47262cc6ef234a6a41ff6`
MD5	`0fd33a38169d352e918b05763ff6ebee`
BLAKE2b-256	`c96052ce98fe96fc9cb6fdec8a164f6f4c2b8394b92a6bcc0779dba4fd055855`
