Download hundreds of images from Google. Do image post processing later.
Easy Images
This repo contains a Python script that lets you download images from Google for given keywords. It also includes some additional functionality that helps with image post-processing.
Preparing an image dataset that is not publicly available is still a challenging task. Machine Learning Engineers need image data when building a Computer Vision application. When the data is not available, they are left with two choices: drop the idea or postpone it until some data becomes available. Manually downloading the images from Google could take forever.
With this Python script, you can easily download hundreds of images from Google within a couple of minutes and try out your Computer Vision idea. You can also remove duplicate images while downloading or later.
Features
- Download hundreds of images within a couple of minutes in one go.
- Remove duplicate images while downloading.
- Provide the summary of the download.
- Remove duplicate images (later) irrespective of the image size or resolution.
- Resize all the images in a directory.
- Convert all the images in a directory to grayscale.
- Calculate the average image size of all the images in a directory.
- Run the above 3 post-processing operations in one go.
Getting Started
Prerequisites
Requires Python >= 3.8
Installation
Using GitHub repo
- Clone the repo:
  git clone https://github.com/mohdsaqibhbi/easy_images.git
- Install the dependencies:
  pip3 install -r requirements.txt
Using pip
pip3 install easy-images-downloader
Usage
- To download images from Google.
from easy_images.easy_images import EasyImages
keywords = "dogs, cats, horse"
easy_response = EasyImages()
easy_response.download(keywords=keywords, max_limit=100)
- To post-process all the images in a directory, e.g. to remove duplicate images.
from easy_images.easy_images import EasyImages
image_dir = "easy_images/dogs"
easy_response = EasyImages()
easy_response.post_processing(image_dir=image_dir, remove_duplicates=True)
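This README does not describe how the package detects duplicates. As a rough illustration of how duplicates can be found irrespective of image size or resolution, the sketch below uses an average hash, a common perceptual-hashing technique that is swapped in here as an assumption. Each image is shrunk to a small grid, thresholded at its mean brightness, and images with equal hashes are treated as duplicates. Images are plain 2D lists of grayscale values so the example runs without dependencies; `shrink`, `average_hash` and `find_duplicates` are hypothetical names, not part of `easy_images`.

```python
def shrink(pixels, size=8):
    """Nearest-neighbour downscale of a 2D grayscale grid to size x size."""
    height, width = len(pixels), len(pixels[0])
    return [
        [pixels[r * height // size][c * width // size] for c in range(size)]
        for r in range(size)
    ]


def average_hash(pixels, size=8):
    """Return an integer whose bits are 1 where a cell is above the mean."""
    small = shrink(pixels, size)
    flat = [value for row in small for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def find_duplicates(images):
    """Map each duplicate name to the first image seen with the same hash.

    `images` is a dict of name -> 2D grayscale grid, visited in insertion order.
    """
    seen = {}
    duplicates = {}
    for name, pixels in images.items():
        key = average_hash(pixels)
        if key in seen:
            duplicates[name] = seen[key]
        else:
            seen[key] = name
    return duplicates


# A 16x16 gradient, the same picture at 32x32, and an unrelated image.
small = [[(r + c) * 8 for c in range(16)] for r in range(16)]
large = [[(r // 2 + c // 2) * 8 for c in range(32)] for r in range(32)]
other = [[255 if c < 8 else 0 for c in range(16)] for r in range(16)]

print(find_duplicates({"a.png": small, "a_big.png": large, "b.png": other}))
# prints {'a_big.png': 'a.png'}
```

Exact hash equality is the simplest possible rule. A more tolerant variant would compare hashes by Hamming distance and treat images within a small threshold as duplicates.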
Parameters
- Class initialization
  easy_response = EasyImages(browser_name="chrome", headless=True, loading_timeout=2)
  - browser_name : (str), {"chrome", "brave"}, default="chrome"
    The browser to use.
  - headless : (boolean), default=True
    Whether to run the browser without opening a window while downloading. Set headless=False to open the browser window.
  - loading_timeout : (float), default=2
    Page loading timeout. Use a smaller value on a fast internet connection and a larger one on a slow connection.
- Download images
  easy_response.download(keywords, output_dir="easy_images_dir", max_limit=10, image_formats={".jpg", ".jpeg", ".png"}, remove_duplicates=False)
  - keywords : (str / dict), e.g. "dogs, cats" or {"dogs": 100, "cats": 200}, required
    Keywords for which images will be downloaded.
  - output_dir : (str), default="easy_images_dir"
    Output directory where images will be downloaded for each keyword.
  - max_limit : (int), default=10
    Maximum number of images to download.
  - image_formats : (set), default={".jpg", ".jpeg", ".png"}
    Supported image formats.
  - remove_duplicates : (boolean), default=False
    Whether to remove duplicate images while downloading. Set remove_duplicates=True to remove duplicates.
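Because `keywords` accepts either a comma-separated string or a dict, it can help to picture both forms as one keyword-to-limit mapping. The sketch below is an assumption about the semantics, not the package's code: `normalize_keywords` is a hypothetical helper, and it assumes that the dict values are per-keyword download limits while keywords given as a string all share `max_limit`.

```python
def normalize_keywords(keywords, max_limit=10):
    """Return {keyword: limit} for a comma-separated string or a dict."""
    if isinstance(keywords, dict):
        # Assumed meaning: each value is the limit for that keyword.
        return {str(name).strip(): int(limit) for name, limit in keywords.items()}
    if isinstance(keywords, str):
        names = [part.strip() for part in keywords.split(",")]
        # Skip empty entries such as those produced by a trailing comma.
        return {name: max_limit for name in names if name}
    raise TypeError("keywords must be a str or a dict")


print(normalize_keywords("dogs, cats, horse", max_limit=100))
# prints {'dogs': 100, 'cats': 100, 'horse': 100}
print(normalize_keywords({"dogs": 100, "cats": 200}))
# prints {'dogs': 100, 'cats': 200}
```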
- Post processing on images
  easy_response.post_processing(image_dir, remove_duplicates=False, resize=None, grayscale=False, avg_image_size=False)
  - image_dir : (str), e.g. "easy_images/dogs", required
    Directory containing the images to process, e.g. the directory from which duplicate images need to be removed.
  - remove_duplicates : (boolean), default=False
    Whether to remove duplicate images from the directory. Set remove_duplicates=True to remove them.
  - resize : (tuple), e.g. (200, 200), default=None
    Target image size. If resize is a tuple of int, the images are resized to that size.
  - grayscale : (boolean), default=False
    Whether to convert the images in the directory to grayscale. Set grayscale=True to convert.
  - avg_image_size : (boolean), default=False
    Whether to calculate the average image size of all the images in the directory. Set avg_image_size=True to calculate.
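To make the `grayscale` and `avg_image_size` options more concrete, here is a dependency-free sketch of the arithmetic behind them. It is not the package's implementation: the luma weights are the standard ITU-R BT.601 ones (the same weights OpenCV uses for RGB-to-gray conversion), the rounding is an assumption, and `to_grayscale` and `average_size` are hypothetical names.

```python
def to_grayscale(rgb_pixels):
    """Convert a 2D grid of (R, G, B) tuples to a 2D grid of luma values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_pixels
    ]


def average_size(sizes):
    """Return the mean (width, height) of a list of image sizes, rounded."""
    if not sizes:
        raise ValueError("no images to average")
    count = len(sizes)
    mean_width = sum(width for width, _ in sizes) / count
    mean_height = sum(height for _, height in sizes) / count
    return round(mean_width), round(mean_height)


# A 2x2 image: red, green, blue and white pixels.
image = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
print(to_grayscale(image))
# prints [[76, 150], [29, 255]]

print(average_size([(200, 200), (180, 240), (220, 160)]))
# prints (200, 200)
```

The average size can be a useful input for `resize`: computing it first and then resizing every image to that size keeps the dataset close to its typical resolution.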
Limitations
Note: This script/package will not work in Colab.
This script downloads images at a size of approximately 200 x 200. This is because Google only allows images to be downloaded at their rendered size. Only a few images can be downloaded at their original size. The original URLs of the images are encrypted, and the encrypted URLs serve a fixed size that is smaller than the original image size.
Please share your ideas to overcome these limitations. Let's build a great Python script together that can help lots of people.
Next Steps
The following next steps are planned to improve the script:
- Find a method to download the images at their original size.
- Build the script without Selenium for faster downloading, since Selenium is somewhat slow.
- Add an image similarity factor so that more relevant images can be downloaded.
- Optimize the overall script with additional functionality for faster downloading of images.
- Add some more generic OpenCV functionality. Please share your ideas if you have any.
Everyone is welcome to contribute to this script. If you want to contribute, please message me on LinkedIn or email me.
Disclaimer
This Python script allows you to download hundreds of images from Google. Please do not download or use any image in a way that violates its copyright. Google indexes images and makes them searchable. It does not create the images itself and therefore does not own the copyright on any of them. The original creator of each image owns the copyright.
LICENSE
This project is licensed under the terms of the MIT license.
Follow me
- Follow me on LinkedIn: mohdsaqibhbi
- Subscribe to my YouTube channel: StarrAI
File details
Details for the file easy_images_downloader-0.0.6.3.tar.gz.
File metadata
- Download URL: easy_images_downloader-0.0.6.3.tar.gz
- Upload date:
- Size: 8.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.24.0 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.49.0 importlib-metadata/4.11.2 keyring/21.4.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | ba407f91742e40029f5bd0fe2adc452b0d99d63d8232dc4820f539992b03256d
MD5 | cd6f4a67f57dcad1b1d1990037124dd6
BLAKE2b-256 | 3b4da2c13ca7ba66f74bb1a3f18dbac48617c1237b3aeea2850eda0339b627d8
File details
Details for the file easy_images_downloader-0.0.6.3-py3-none-any.whl.
File metadata
- Download URL: easy_images_downloader-0.0.6.3-py3-none-any.whl
- Upload date:
- Size: 9.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.24.0 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.49.0 importlib-metadata/4.11.2 keyring/21.4.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | aaac821f6c8f25fac56f83df807d8288b73efa7f94d130c25bd4e0a457978b04
MD5 | 5a8a1d31356d55e1fbb1958b5e1b5f59
BLAKE2b-256 | 8bde327f5d732685c1c56eab1ba2c59c1d15b9c018bb992bc40156f32f973c34