Utilities for building and working with computer vision datasets

xt-cvdata

Description

This repo contains utilities for building and working with computer vision datasets, developed by Xtract AI.

So far, APIs for the following open-source datasets and annotation formats are included:

  1. COCO 2017 (detection and segmentation): xt_cvdata.apis.COCO
  2. Open Images V5 (detection and segmentation): xt_cvdata.apis.OpenImages
  3. Visual Object Tagging Tool (VoTT) CSV output (detection): xt_cvdata.apis.VoTTCSV

More to come.

Installation

From PyPI:

pip install xt-cvdata

From source:

git clone https://github.com/XtractTech/xt-cvdata.git
pip install ./xt-cvdata

Usage

For details on a specific dataset class, use Python's built-in help(), e.g., help(xt_cvdata.apis.COCO).

Building a dataset

from xt_cvdata.apis import COCO, OpenImages

# Build an object populated with the COCO image list, categories, and annotations
coco = COCO('/nasty/data/common/COCO_2017')
print(coco)
print(coco.class_distribution)

# Same for Open Images
oi = OpenImages('/nasty/data/common/open_images_v5')
print(oi)
print(oi.class_distribution)

# Get just the person classes
coco.subset(['person'])
oi.subset(['Person']).rename({'Person': 'person'})

# Merge and build
merged = coco.merge(oi)
merged.build('./data/new_dataset_dir')

This package follows PyTorch-style chaining rules: methods that operate on an object modify it in place and also return the modified object. The exception is the merge() method, which does not modify either dataset in place and instead returns a new merged object. Hence, the above operations can also be written as a single chain:

from xt_cvdata.apis import COCO, OpenImages

merged = (
    COCO('/nasty/data/common/COCO_2017')
        .subset(['person'])
        .merge(
            OpenImages('/nasty/data/common/open_images_v5')
                .subset(['Person'])
                .rename({'Person': 'person'})
        )
)
merged.build('./data/new_dataset_dir')

In practice, a mix of the two styles will probably be the most readable.
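To make the chaining convention concrete, here is a minimal, self-contained sketch. ToyDataset is a hypothetical stand-in that stores only a list of class labels; it is not the xt_cvdata implementation and only mirrors the behavior described above (in-place methods return self, merge returns a new object).

```python
from collections import Counter


class ToyDataset:
    """Hypothetical stand-in for a dataset object (not the xt_cvdata classes)."""

    def __init__(self, labels):
        # One class label per annotation, e.g. ['person', 'dog'].
        self.labels = list(labels)

    def subset(self, classes):
        # Modifies the object in place, then returns it so calls can be chained.
        keep = set(classes)
        self.labels = [label for label in self.labels if label in keep]
        return self

    def rename(self, mapping):
        # Also in place: labels missing from the mapping are left unchanged.
        self.labels = [mapping.get(label, label) for label in self.labels]
        return self

    def merge(self, other):
        # The exception: returns a new object and leaves both inputs untouched.
        return ToyDataset(self.labels + other.labels)

    @property
    def class_distribution(self):
        return Counter(self.labels)


a = ToyDataset(['person', 'dog', 'person'])
b = ToyDataset(['Person', 'Cat'])

same = a.subset(['person'])  # 'same' and 'a' are the same object
merged = a.merge(b.subset(['Person']).rename({'Person': 'person'}))
print(merged.class_distribution)
```

Because subset and rename return the object they modified, the chained and step-by-step styles end up in the same state; only merge produces a separate object.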

The current set of dataset operations is:

  • analyze: recalculate dataset statistics (e.g., class distributions, train/val split)
  • verify_schema: check if class attributes follow required schema
  • subset: remove all but a subset of classes from the dataset
  • rename: rename/combine dataset classes
  • sample: sample a specified number of images from the train and validation sets
  • split: define the proportion of data in the validation set
  • merge: merge two datasets together
  • build: create the currently defined dataset, either by symlinking or by copying images
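As an illustration of the kind of statistic that analyze recalculates, the sketch below computes a class distribution from COCO-style records. The function class_distribution and the sample records are hypothetical and written for this example only; they assume the usual COCO layout in which each category has id and name fields and each annotation has a category_id. The package's own implementation may differ.

```python
from collections import Counter


def class_distribution(categories, annotations):
    """Count annotations per class name (illustrative helper, not the package API)."""
    # Map numeric category ids to readable names.
    id_to_name = {category['id']: category['name'] for category in categories}
    return Counter(id_to_name[ann['category_id']] for ann in annotations)


# Tiny made-up sample in COCO-style layout.
categories = [{'id': 1, 'name': 'person'}, {'id': 2, 'name': 'dog'}]
annotations = [
    {'image_id': 10, 'category_id': 1},
    {'image_id': 10, 'category_id': 2},
    {'image_id': 11, 'category_id': 1},
]

dist = class_distribution(categories, annotations)
print(dist)
```

Operations such as subset or rename change which annotations and class names exist, which is why statistics like this need to be recalculated afterwards.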

Implementing a new dataset type

New dataset types should inherit from the base xt_cvdata.Builder class. Use the Builder, COCO, and OpenImages classes as a guide. Specifically, the class initializer should define the info, licenses, categories, annotations, and images attributes such that self.verify_schema() runs without error. This ensures that all of the methods defined in the Builder class will operate correctly on the inheriting class.
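The pattern can be sketched as follows. ToyBuilder and MyDataset are hypothetical classes written for this example so that it runs without the package installed; ToyBuilder.verify_schema only checks that the five attributes exist, whereas the real Builder.verify_schema presumably validates more than that. In real code, the subclass would inherit from xt_cvdata.Builder instead.

```python
REQUIRED_ATTRIBUTES = ('info', 'licenses', 'categories', 'annotations', 'images')


class ToyBuilder:
    """Hypothetical stand-in for the base class; only checks attribute presence."""

    def verify_schema(self):
        missing = [name for name in REQUIRED_ATTRIBUTES if not hasattr(self, name)]
        if missing:
            raise AttributeError(f"missing required attributes: {missing}")
        return self


class MyDataset(ToyBuilder):
    """Example dataset type that defines every required attribute in __init__."""

    def __init__(self, image_names):
        self.info = {'description': 'example dataset'}
        self.licenses = []
        self.categories = [{'id': 1, 'name': 'person'}]
        self.images = [
            {'id': i, 'file_name': name} for i, name in enumerate(image_names)
        ]
        self.annotations = []
        # Fail early if the initializer forgot one of the required attributes.
        self.verify_schema()


dataset = MyDataset(['a.jpg', 'b.jpg'])
print(len(dataset.images))
```

Calling verify_schema() at the end of the initializer surfaces a missing attribute at construction time rather than later, inside an inherited method.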

Data Sources

[descriptions and links to data]

Dependencies/Licensing

[list of dependencies and their licenses, including data]

References

[list of references]
