Utilities for building and working with computer vision datasets
xt-cvdata
Description
This repo contains utilities for building and working with computer vision datasets, developed by Xtract AI.
So far, APIs for the following open-source datasets are included:
- COCO 2017 (detection and segmentation): xt_cvdata.apis.COCO
- Open Images V5 (detection and segmentation): xt_cvdata.apis.OpenImages
- Visual Object Tagging Tool (VoTT) CSV output (detection): xt_cvdata.apis.VoTTCSV
More to come.
Installation
From PyPI:
pip install xt-cvdata
From source:
git clone https://github.com/XtractTech/xt-cvdata.git
pip install ./xt-cvdata
Usage
For help on a specific dataset class, use help, e.g., help(xt_cvdata.apis.COCO).
Building a dataset
from xt_cvdata.apis import COCO, OpenImages
# Build an object populated with the COCO image list, categories, and annotations
coco = COCO('/nasty/data/common/COCO_2017')
print(coco)
print(coco.class_distribution)
# Same for Open Images
oi = OpenImages('/nasty/data/common/open_images_v5')
print(oi)
print(oi.class_distribution)
# Get just the person classes
coco.subset(['person'])
oi.subset(['Person']).rename({'Person': 'person'})
# Merge and build
merged = coco.merge(oi)
merged.build('./data/new_dataset_dir')
This package follows PyTorch-style chaining rules: methods that operate on an object modify it in place and also return the modified object. The exception is merge(), which does not modify the object in place and instead returns a new merged object. Hence, the above operations can also be written as:
from xt_cvdata.apis import COCO, OpenImages
merged = (
    COCO('/nasty/data/common/COCO_2017')
    .subset(['person'])
    .merge(
        OpenImages('/nasty/data/common/open_images_v5')
        .subset(['Person'])
        .rename({'Person': 'person'})
    )
)
merged.build('./data/new_dataset_dir')
In practice, a style somewhere between the two approaches will probably be the most readable.
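The chaining behaviour described above can be illustrated with a small, self-contained sketch. ToyDataset and its labels attribute are hypothetical stand-ins invented for this example; they are not part of xt_cvdata and say nothing about how the package is implemented internally. The sketch only shows the convention: subset and rename change the object and return it, while merge returns a new object.

```python
from __future__ import annotations


class ToyDataset:
    """Hypothetical stand-in for a dataset object; not the xt_cvdata API."""

    def __init__(self, labels: list[str]) -> None:
        self.labels = list(labels)

    def subset(self, classes: list[str]) -> ToyDataset:
        # In place: drop every label that is not in `classes`, then return self.
        self.labels = [label for label in self.labels if label in classes]
        return self

    def rename(self, mapping: dict[str, str]) -> ToyDataset:
        # In place: replace labels according to `mapping`, then return self.
        self.labels = [mapping.get(label, label) for label in self.labels]
        return self

    def merge(self, other: ToyDataset) -> ToyDataset:
        # Not in place: return a new object and leave both inputs untouched.
        return ToyDataset(self.labels + other.labels)


coco_like = ToyDataset(["person", "dog", "person"])
oi_like = ToyDataset(["Person", "Cat"])

# Chained calls work because each in-place method returns the same object.
same_object = coco_like.subset(["person"]) is coco_like
merged = coco_like.merge(oi_like.subset(["Person"]).rename({"Person": "person"}))

print(same_object)          # True
print(merged.labels)        # ['person', 'person', 'person']
print(merged is coco_like)  # False
```

Because the in-place methods return the object itself, assigning their result to a new name does not create a copy; both names refer to the same modified dataset.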
The current set of dataset operations is:
- analyze: recalculate dataset statistics (e.g., class distributions, train/val split)
- verify_schema: check if class attributes follow the required schema
- subset: remove all but a subset of classes from the dataset
- rename: rename/combine dataset classes
- sample: sample a specified number of images from the train and validation sets
- split: define the proportion of data in the validation set
- merge: merge two datasets together, returning the merged dataset
- build: create the currently defined dataset using either symlinks or by copying images
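To make the symlink-versus-copy choice in the build step concrete, here is a rough standard-library sketch. The function materialize_images and its use_symlinks parameter are hypothetical names chosen for this illustration; the actual signature and behaviour of build() may differ, so check help on the dataset class before relying on any of this.

```python
import os
import shutil
import tempfile
from pathlib import Path


def materialize_images(image_paths, out_dir, use_symlinks=True):
    """Place each image in out_dir as a symlink or a copy (hypothetical helper)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for src in map(Path, image_paths):
        # Files are keyed by name only, so identical names would overwrite each other.
        dst = out_dir / src.name
        if dst.exists() or dst.is_symlink():
            dst.unlink()  # replace stale entries left by an earlier run
        if use_symlinks:
            os.symlink(src.resolve(), dst)  # cheap, but depends on the source staying put
        else:
            shutil.copy2(src, dst)  # uses more disk, but the output is standalone
        created.append(dst)
    return created


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "source" / "0001.jpg"
    src.parent.mkdir()
    src.write_bytes(b"fake image bytes")

    linked = materialize_images([src], Path(tmp) / "linked", use_symlinks=True)
    copied = materialize_images([src], Path(tmp) / "copied", use_symlinks=False)

    link_is_symlink = linked[0].is_symlink()
    copy_is_symlink = copied[0].is_symlink()
    copy_bytes = copied[0].read_bytes()
```

Symlinks keep the built dataset small and fast to create, while copies stay valid if the source directory is later moved or deleted.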
Implementing a new dataset type
New dataset types should inherit from the base xt_cvdata.Builder class. See the Builder, COCO, and OpenImages classes as a guide. Specifically, the class initializer should define info, licenses, categories, annotations, and images attributes such that self.verify_schema() runs without error. This ensures that all of the methods defined in the Builder class will operate correctly on the inheriting class.
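The shape of such a class might look like the sketch below. So that it runs without the package installed, it defines its own stand-in Builder; in real code you would inherit from xt_cvdata.Builder instead. The stand-in verify_schema only checks that the five attributes exist, and the container types (dicts and lists of COCO-style records) are assumptions, so consult the real Builder, COCO, and OpenImages classes for the schema that is actually required. MyDataset and its source argument are hypothetical.

```python
class Builder:
    """Stand-in for xt_cvdata.Builder so this sketch runs without the package."""

    REQUIRED = ("info", "licenses", "categories", "annotations", "images")

    def verify_schema(self):
        # Assumed behaviour: fail loudly if a required attribute is missing.
        missing = [name for name in self.REQUIRED if not hasattr(self, name)]
        if missing:
            raise AttributeError(f"missing required attributes: {missing}")
        return self


class MyDataset(Builder):
    """Hypothetical dataset type that defines the five required attributes."""

    def __init__(self, source):
        self.source = source
        # Attribute contents below are illustrative placeholders, not a real schema.
        self.info = {"description": "my dataset", "source": source}
        self.licenses = []
        self.categories = [{"id": 1, "name": "person"}]
        self.images = [{"id": 10, "file_name": "0001.jpg"}]
        self.annotations = [
            {"id": 100, "image_id": 10, "category_id": 1, "bbox": [0, 0, 5, 5]}
        ]
        # Validate at construction time so problems surface immediately.
        self.verify_schema()


dataset = MyDataset("/path/to/my_dataset")
print(sorted(c["name"] for c in dataset.categories))  # ['person']
```

Calling verify_schema() at the end of the initializer is one way to catch a missing attribute as soon as the object is created rather than later, when a Builder method fails.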
Data Sources
[descriptions and links to data]
Dependencies/Licensing
[list of dependencies and their licenses, including data]
References
[list of references]