Bulk-download all thumbnails from an ImageNet synset, with optional rescaling
Project description
Command-line utility for downloading all thumbnail images from an ImageNet synset, optionally rescaling to a different resolution.
Usage
Usage: imagenetscraper [OPTIONS] SYNSET_ID [OUTPUT_DIR] Options: -c, --concurrency INTEGER Number of concurrent downloads (default: 8). -s, --size WIDTH,HEIGHT If specified, images will be rescaled to the given size. -q, --quiet Suppress progress output. -h, --help Show this message and exit. --version Show the version and exit.
If the URL of a synset page looks like:
http://image-net.org/synset?wnid=n00000000 ^^^^^^^^^ SYNSET_ID
SYNSET_ID is the n00000000 part. For example, for the “person, individual, someone, somebody, mortal, soul” synset at http://image-net.org/synset?wnid=n00007846, the corresponding synset id is n00007846.
The default output directory (OUTPUT_DIR) is the current directory.
Examples
To download all thumbnail imagess from the synset mentioned above to the directory “person_images”, run:
imagenetscraper n00007846 person_images
To do the same, but with each thumbnail image rescaled to a width of 256 and a height of 128, add --size 256,128:
imagenetscraper n00007846 person_images --size 256,128
To run in “quiet mode”, suppressing progress output, add --quiet:
imagenetscraper n00007846 person_images --size 256,128 --quiet
By default, imagenetscraper will download 8 images at once. To change this, use --concurrency:
imagenetscraper n00007846 person_images --size 256,128 --concurrency 4
Install
Install Python 3, pip, and a development version of libjpeg. imagenetscraper is tested with Python 3.4-3.7 and libjpeg-turbo 8.
sudo apt-get install python3 python3-pip libjpeg-turbo8-dev
Download and install imagenetscraper with pip.
sudo -H pip3 install imagenetscraper
Citation
If this tool helped with your research, a citation would be appreciated:
@Misc{imagenetscraper, author = {Michael Smith}, title = {imagenetscraper: Bulk-download thumbnails from ImageNet synsets}, howpublished = {\url{https://github.com/spinda/imagenetscraper}}, year = {2017} }
Testing
To run unit tests, use:
python3 setup.py test
License
Copyright (C) 2017-2018 Michael Smith <michael@spinda.net>
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for imagenetscraper-1.0.1-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 9a59a1c84fcf02cd36c06388f6b858abfece6f76e8503d81cb8b691ba5280598 |
|
MD5 | 374e06bbfe2affe05a8863f37bbfc950 |
|
BLAKE2b-256 | 12f3cd60374c220c2149ec50245be50d0a3d1a4fa176a4244e8d498b596e4c4a |