Easily turn a set of image URLs into an image dataset
img2dataset
Install
pip install img2dataset
Usage
First, get a list of image URLs. For example:
echo 'https://placekitten.com/200/305' >> myimglist.txt
echo 'https://placekitten.com/200/304' >> myimglist.txt
echo 'https://placekitten.com/200/303' >> myimglist.txt
Then, run the tool:
img2dataset --url_list=myimglist.txt --output_folder=output_folder --thread_count=64 --image_size=256
The tool will then automatically download the URLs, resize the images, and store them in the following layout:
- output_folder
    - 0
        - 0.jpg
        - 1.jpg
        - 2.jpg
with each number being the position in the list. The subfolders avoid having too many files in a single folder.
This can then easily be fed into machine learning training or any other use case.
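As one way of feeding the output into a training pipeline, the layout above can be walked with the standard library. This is a minimal sketch, not part of img2dataset: the helper name list_images is hypothetical, and it assumes every image is stored as <subfolder number>/<position>.jpg as shown above.

```python
from pathlib import Path


def list_images(output_folder):
    """Return image paths ordered by subfolder number, then by position.

    Hypothetical helper: assumes numeric subfolders containing files
    named <position>.jpg, as in the layout shown above.
    """
    root = Path(output_folder)
    images = []
    for sub in root.iterdir():
        # Skip anything that is not a numbered subfolder.
        if not sub.is_dir() or not sub.name.isdigit():
            continue
        for image in sub.glob("*.jpg"):
            # Keep only files whose name is a position number.
            if image.stem.isdigit():
                images.append(image)
    # Sort numerically so that 10.jpg comes after 2.jpg.
    return sorted(images, key=lambda p: (int(p.parent.name), int(p.stem)))
```

Sorting on the integer value of the names, rather than on the strings, keeps the result in the same order as the original URL list.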
API
This module exposes a single function, download, which takes the same arguments as the command line tool:
- url_list A file with the list of image URLs to download, one per line (required)
- image_size The side length to resize images to (default 256)
- output_folder The path to the output folder (default "images")
- thread_count The number of threads used for downloading the pictures. A high value is important for performance. (default 256)
- resize_mode The way to resize pictures, can be no, border or keep_ratio (default border)
    - no doesn't resize at all
    - border will make the image image_size x image_size and add a border
    - keep_ratio will keep the ratio and make the smallest side of the picture image_size
- resize_only_if_bigger Resize pictures only if they are bigger than image_size (default False)
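The arguments above can be passed from Python in the same way as on the command line. The following is a minimal sketch: write_url_list and download_images are hypothetical helper names, and it assumes download can be imported from the top-level img2dataset package and accepts the listed options as keyword arguments.

```python
from pathlib import Path


def write_url_list(urls, path):
    """Write one URL per line, the input format described for url_list."""
    path = Path(path)
    path.write_text("\n".join(urls) + "\n", encoding="utf-8")
    return path


def download_images(url_list, output_folder):
    """Call download with the same options as the Usage example."""
    # Imported lazily so this sketch loads even without the package installed.
    # Assumption: download is importable from the top-level package.
    from img2dataset import download

    download(
        url_list=str(url_list),
        output_folder=str(output_folder),
        thread_count=64,
        image_size=256,
        resize_mode="border",          # documented default
        resize_only_if_bigger=False,   # documented default
    )
```

Calling write_url_list(urls, "myimglist.txt") followed by download_images("myimglist.txt", "output_folder") would then mirror the command shown in Usage.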
Road map
This tool works as is. Future goals include:
- WebDataset format option
- support for multiple input files
- support for csv or parquet files as input
- benchmarks for 1M, 10M, 100M pictures
For development
Either locally or in Gitpod (run export PIP_USER=false there).
Set up a virtualenv:
python3 -m venv .env
source .env/bin/activate
pip install -e .
To run tests:
pip install -r requirements-test.txt
then
python -m pytest -v tests
Hashes for img2dataset-1.1.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | b77bc4c66c524b46a34ca6a86ae8c7d547459bb5ceb367e7a2501dc2bc9268af
MD5 | 007dba96334ad10a723586867086d176
BLAKE2b-256 | 341d4eff747a9b44f9f71ae76698a1e49bdbf4184764825c6a3dfe895ed90463