A CLI tool for building computer vision datasets.

tiny-data

A Rust-based CLI tool for building computer vision datasets, built with reqwest and tokio.

You can get a list of the available options by running the command below:

>> tiny-data -h
Usage: tiny-data [OPTIONS]

Options:
  -t, --topics <TOPICS>...   Space-delimited list of image classes
  -n, --nsamples <NSAMPLES>  number of images to download per-class [default: 20]
  -d, --dir <DIR>            name of directory to save to [default: images]
  -h, --help                 Print help

Example:

>> tiny-data --topics bats wombats -n 10 --dir images
>> tree images
images
├── bats
│   ├── 0.jpeg
│   ├── 1.jpeg
│   ├── 2.jpeg
│   ├── 3.jpeg
│   ├── 4.jpeg
│   ├── 5.jpeg
│   ├── 6.jpeg
│   ├── 7.jpeg
│   ├── 8.jpeg
│   └── 9.jpeg
└── wombats
    ├── 0.jpeg
    ├── 1.jpeg
    ├── 2.jpeg
    ├── 3.jpeg
    ├── 4.jpeg
    ├── 5.jpeg
    ├── 6.jpeg
    ├── 7.jpeg
    ├── 8.jpeg
    └── 9.jpeg
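
After a download finishes, it can be useful to confirm that each class directory holds the expected number of images. The following is a minimal sketch in Python, assuming the layout shown above (one subdirectory per topic under the output directory). The helper name count_images is hypothetical and is not part of tiny-data.

```python
from pathlib import Path


def count_images(root):
    """Return {class_name: number_of_files} for each subdirectory of root."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            # Count regular files only; nested directories are ignored.
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.is_file()
            )
    return counts


if __name__ == "__main__":
    # "images" is the default value of --dir in the help text above.
    if Path("images").is_dir():
        for name, n in count_images("images").items():
            print(f"{name}: {n}")
```

For the example tree above, count_images("images") would return {'bats': 10, 'wombats': 10}. A class with fewer files than requested may indicate failed downloads or an exhausted request quota.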

Installation

To get started with tiny-data, you need to enable the Custom Search API from Google and export SEARCH_ENGINE_ID and CUSTOM_SEARCH_API_KEY as environment variables.
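
Before running the tool, it may help to check that both variables are actually visible to child processes. This is a small sketch in Python using only the standard library. The function name missing_credentials is hypothetical; only the two variable names come from the text above.

```python
import os

# Variable names taken from the installation instructions.
REQUIRED_VARS = ("SEARCH_ENGINE_ID", "CUSTOM_SEARCH_API_KEY")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Credentials found.")
```

Passing a plain dictionary as env makes the check easy to exercise without touching the real environment.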

Note: Google limits the number of requests to 100 per day, which caps the number of images you can download.
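
To get a rough sense of how far the daily quota stretches, the sketch below estimates the number of requests a run would use. It assumes each request yields at most 10 images, which is the page-size cap of the Custom Search JSON API. How tiny-data actually batches its requests is not stated here, so treat the result as an estimate only. The function name requests_needed is hypothetical.

```python
import math

DAILY_REQUEST_LIMIT = 100  # from the note above
RESULTS_PER_REQUEST = 10   # assumption: at most 10 results per request


def requests_needed(n_topics, nsamples, per_request=RESULTS_PER_REQUEST):
    """Estimate requests used if each one yields up to per_request images."""
    return n_topics * math.ceil(nsamples / per_request)


def fits_in_daily_quota(n_topics, nsamples):
    """True if the estimated request count stays within the daily limit."""
    return requests_needed(n_topics, nsamples) <= DAILY_REQUEST_LIMIT
```

Under these assumptions, two topics at the default of 20 images each would use about 4 requests, while five topics at 250 images each would need about 125 and exceed the daily limit. Failed or filtered downloads could push the real number higher.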

The package itself can be installed from crates.io by running:

cargo install tiny-data

The Python bindings can be installed from PyPI, with additional features for post-download filtering using CLIP, by running:

pip install tinydata[ml]

If you want to use OpenCLIP, make sure you also install the appropriate version of torch.
