Bulk-download all thumbnails from an ImageNet synset, with optional rescaling

Command-line utility for downloading all thumbnail images from an ImageNet synset, optionally rescaling to a different resolution.

Usage

Usage: imagenetscraper [OPTIONS] SYNSET_ID [OUTPUT_DIR]

Options:
  -c, --concurrency INTEGER  Number of concurrent downloads (default: 8).
  -s, --size WIDTH,HEIGHT    If specified, images will be rescaled to the
                             given size.
  -q, --quiet                Suppress progress output.
  -h, --help                 Show this message and exit.
  --version                  Show the version and exit.

If the URL of a synset page looks like:

http://image-net.org/synset?wnid=n00000000
                                 ^^^^^^^^^
                                 SYNSET_ID

SYNSET_ID is the n00000000 part. For example, for the “person, individual, someone, somebody, mortal, soul” synset at http://image-net.org/synset?wnid=n00007846, the corresponding synset id is n00007846.
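
As an illustration of the mapping above, here is a minimal Python sketch that pulls the synset ID out of a synset page URL. The helper name `synset_id_from_url` is hypothetical and not part of imagenetscraper, and the "n followed by eight digits" check is an assumption based on the IDs shown in this document.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumption: synset IDs look like "n" followed by eight digits,
# as in the examples in this README.
_WNID_PATTERN = re.compile(r"n\d{8}")


def synset_id_from_url(url):
    """Return the value of the wnid query parameter (hypothetical helper)."""
    query = parse_qs(urlparse(url).query)
    values = query.get("wnid", [])
    if len(values) != 1 or not _WNID_PATTERN.fullmatch(values[0]):
        raise ValueError("no synset ID found in URL: %r" % url)
    return values[0]


print(synset_id_from_url("http://image-net.org/synset?wnid=n00007846"))
# prints: n00007846
```

The returned string can be passed directly as the SYNSET_ID argument.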

The default output directory (OUTPUT_DIR) is the current directory.

Examples

To download all thumbnail images from the synset mentioned above to the directory “person_images”, run:

imagenetscraper n00007846 person_images

To do the same, but with each thumbnail image rescaled to a width of 256 and a height of 128, add --size 256,128:

imagenetscraper n00007846 person_images --size 256,128

To run in “quiet mode”, suppressing progress output, add --quiet:

imagenetscraper n00007846 person_images --size 256,128 --quiet
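
When the tool is driven from a script rather than typed by hand, the same command lines can be assembled programmatically. The following is a rough sketch only: `build_command` and `run` are hypothetical helpers, not part of imagenetscraper, and they use only the options listed under Usage.

```python
import subprocess


def build_command(synset_id, output_dir=None, size=None,
                  concurrency=None, quiet=False):
    """Assemble an imagenetscraper argument list (hypothetical helper)."""
    command = ["imagenetscraper", synset_id]
    if output_dir is not None:
        command.append(output_dir)
    if size is not None:
        width, height = size
        # --size takes WIDTH,HEIGHT as a single comma-separated value.
        command += ["--size", "%d,%d" % (width, height)]
    if concurrency is not None:
        command += ["--concurrency", str(concurrency)]
    if quiet:
        command.append("--quiet")
    return command


def run(command):
    """Run the command; requires imagenetscraper to be installed and on PATH."""
    subprocess.run(command, check=True)


command = build_command("n00007846", "person_images", size=(256, 128), quiet=True)
print(" ".join(command))
# prints: imagenetscraper n00007846 person_images --size 256,128 --quiet
```

Passing the arguments as a list avoids shell quoting issues if the output directory name contains spaces.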

By default, imagenetscraper will download 8 images at once. To change this, use --concurrency:

imagenetscraper n00007846 person_images --size 256,128 --concurrency 4
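
To give a feel for what a concurrency limit means, the sketch below caps the number of simultaneous "downloads" with a thread pool. It illustrates the general idea only and is not a description of how imagenetscraper is implemented; `fetch_all` and `fake_download` are hypothetical names, and the download is simulated with a short sleep.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(items, fetch, concurrency=8):
    """Apply fetch to every item with at most `concurrency` calls in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map returns results in the same order as the input items.
        return list(pool.map(fetch, items))


# Bookkeeping used only to observe how many calls overlap.
lock = threading.Lock()
active = 0
peak = 0


def fake_download(name):
    """Stand-in for a real download: sleeps briefly and tracks overlap."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.01)
    with lock:
        active -= 1
    return name.upper()


results = fetch_all(["img%d" % i for i in range(20)], fake_download, concurrency=4)
print(len(results), peak <= 4)
# prints: 20 True
```

A lower limit is gentler on the remote server and on the local network; a higher one may finish sooner if bandwidth allows.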

Install

  1. Install Python 3, pip, and a development version of libjpeg. imagenetscraper is tested with Python 3.4-3.7 and libjpeg-turbo 8.

    sudo apt-get install python3 python3-pip libjpeg-turbo8-dev

  2. Download and install imagenetscraper with pip.

    sudo -H pip3 install imagenetscraper

Citation

If this tool helped with your research, a citation would be appreciated:

@Misc{imagenetscraper,
author = {Michael Smith},
title = {imagenetscraper: Bulk-download thumbnails from ImageNet synsets},
howpublished = {\url{https://github.com/spinda/imagenetscraper}},
year = {2017}
}

Testing

To run unit tests, use:

python3 setup.py test

License

Copyright (C) 2017-2018 Michael Smith <michael@spinda.net>

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

