
Download hundreds of images from Google. Do image post processing later.


Easy Images

This repo contains a Python script that downloads images from Google for a given keyword. It also includes utilities for post-processing the downloaded images.

Preparing an image dataset that is not publicly available is still a challenging task. Machine learning engineers need image data when building a computer vision project. When no data is available, they are left with two choices: drop the idea or postpone it until some data becomes available. Manually downloading the images from Google could take forever.

With this Python script, you can easily download hundreds of images from Google within a couple of minutes and try out your Computer Vision idea. You can also remove duplicate images while downloading or later.

Features

  • Download hundreds of images within a couple of minutes in one go.
  • Remove duplicate images while downloading.
  • Provide a summary of the download.
  • Remove duplicate images (later) irrespective of the image size or resolution.
  • Resize all the images in a directory.
  • Convert all the images in a directory to grayscale.
  • Calculate the average image size of all the images in a directory.
  • Run the 3 post-processing operations above in one go.
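To illustrate how duplicates can be detected irrespective of image size or resolution, here is a minimal sketch of an average hash in pure Python. It is not taken from this package, whose actual method is not documented here. The names `average_hash`, `hamming_distance`, `find_duplicates` and the `max_distance` threshold are hypothetical, and images are assumed to be plain 2D lists of grayscale values (0-255).

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: an image is a list of rows of grayscale values (0-255).
Image = Sequence[Sequence[int]]


def average_hash(image: Image, hash_size: int = 8) -> int:
    """Shrink the image to a hash_size x hash_size grid and threshold by the mean."""
    height, width = len(image), len(image[0])
    cells: List[float] = []
    for row in range(hash_size):
        # Pixel rows covered by this grid cell (always at least one row).
        top = row * height // hash_size
        bottom = max((row + 1) * height // hash_size, top + 1)
        for col in range(hash_size):
            left = col * width // hash_size
            right = max((col + 1) * width // hash_size, left + 1)
            block = [image[y][x] for y in range(top, bottom) for x in range(left, right)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per cell: 1 if the cell is brighter than the average cell.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def find_duplicates(images: Dict[str, Image], max_distance: int = 5) -> List[str]:
    """Return names whose fingerprint is close to that of an earlier image."""
    kept: List[Tuple[str, int]] = []
    duplicates: List[str] = []
    for name, image in images.items():
        fingerprint = average_hash(image)
        if any(hamming_distance(fingerprint, seen) <= max_distance for _, seen in kept):
            duplicates.append(name)
        else:
            kept.append((name, fingerprint))
    return duplicates


# The same picture at two resolutions, plus a different picture.
small = [[0 if x < 8 else 255 for x in range(16)] for _ in range(16)]
large = [[0 if x < 16 else 255 for x in range(32)] for _ in range(32)]
other = [[0 if y < 8 else 255 for _ in range(16)] for y in range(16)]
```

Because both copies of the picture are reduced to the same 8 x 8 grid before comparison, `find_duplicates({"small": small, "large": large, "other": other})` flags only `"large"`. A real implementation would first load and convert the files with an imaging library.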

Getting Started

Prerequisites

Requires Python >= 3.8.

Installation

Using the GitHub repo

  1. Clone the repo using git clone https://github.com/mohdsaqibhbi/easy_images.git.
  2. Install the dependencies by running pip3 install -r requirements.txt.

Using pip

pip3 install easy-images-downloader

Usage

  • To download images from Google.
from easy_images.easy_images import EasyImages

keywords = "dogs, cats, horse"

easy_response = EasyImages()
easy_response.download(keywords=keywords, max_limit=100)
  • Post-process all the images in a directory, e.g. remove duplicate images.
from easy_images.easy_images import EasyImages

image_dir = "easy_images/dogs"

easy_response = EasyImages()
easy_response.post_processing(image_dir=image_dir, remove_duplicates=True)

Parameters

  • Class initialization

    easy_response = EasyImages(browser_name="chrome", headless=True, loading_timeout=2)

    • browser_name : (str), {"chrome", "brave"}, default="chrome"

      The browser to use.

    • headless : (boolean), default=True

      Whether to run the browser without a visible window while downloading. Set headless=False to open the browser.

    • loading_timeout : (float), default=2

      Page loading timeout. Use a lower value for a fast internet connection and a higher value for a slow one.

  • Download images

    easy_response.download(keywords, output_dir="easy_images_dir", max_limit=10, image_formats={".jpg", ".jpeg", ".png"}, remove_duplicates=False)

    • keywords : (str / dict), e.g. "dogs, cats" or {"dogs": 100, "cats": 200}, default=Required

      Keywords for which images will be downloaded.

    • output_dir : (str), default="easy_images_dir"

      Output directory where images will be downloaded for each keyword.

    • max_limit : (int), default=10

      Maximum number of images to download.

    • image_formats : (set), default={".jpg", ".jpeg", ".png"}

      Supported image formats.

    • remove_duplicates : (boolean), default=False

      Whether to remove duplicate images or not while downloading. Set remove_duplicates=True to remove duplicates.
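Since keywords accepts either a comma-separated string or a dict, the sketch below shows one way the two forms could be normalized into a single mapping of keyword to download limit. It is an illustration, not the package's code: the helper name `normalize_keywords` is hypothetical, and it assumes that the dict values are per-keyword limits and that max_limit applies to every keyword in the string form.

```python
from typing import Dict, Union


def normalize_keywords(
    keywords: Union[str, Dict[str, int]], max_limit: int = 10
) -> Dict[str, int]:
    """Map each keyword to the number of images to download for it."""
    if isinstance(keywords, dict):
        # Assumption: dict values are per-keyword limits.
        return {name.strip(): int(limit) for name, limit in keywords.items()}
    # String form: split on commas, drop surrounding spaces and empty entries.
    names = [name.strip() for name in keywords.split(",")]
    return {name: max_limit for name in names if name}
```

Under these assumptions, `normalize_keywords("dogs, cats", max_limit=100)` and `normalize_keywords({"dogs": 100, "cats": 100})` produce the same mapping.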

  • Post processing on images

    easy_response.post_processing(image_dir, remove_duplicates=False, resize=None, grayscale=False, avg_image_size=False)

    • image_dir : (str), e.g. "easy_images/dogs", default=Required

      Directory containing the images to post-process.

    • remove_duplicates : (boolean), default=False

      Whether to remove duplicate images from a directory. Set remove_duplicates=True to remove.

    • resize : (tuple), e.g. (200, 200), default=None

      Target image size. If resize is a tuple of ints, the images are resized to that size.

    • grayscale : (boolean), default=False

      Whether to convert the images in a directory to grayscale. Set grayscale=True to convert.

    • avg_image_size : (boolean), default=False

      Whether to calculate the average image size of all the images in a directory. Set avg_image_size=True to calculate.
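As a rough illustration of what the avg_image_size and grayscale options compute, here is a pure-Python sketch. It does not reproduce the package's implementation. The helper names are hypothetical, sizes are assumed to be (width, height) tuples, and the grayscale weights are the common ITU-R BT.601 luma coefficients, which may differ from what the script uses.

```python
from typing import Iterable, Tuple


def average_image_size(sizes: Iterable[Tuple[int, int]]) -> Tuple[float, float]:
    """Return the mean (width, height) over a collection of image sizes."""
    sizes = list(sizes)
    if not sizes:
        raise ValueError("no image sizes given")
    avg_width = sum(width for width, _ in sizes) / len(sizes)
    avg_height = sum(height for _, height in sizes) / len(sizes)
    return avg_width, avg_height


def rgb_to_gray(pixel: Tuple[int, int, int]) -> int:
    """Convert one RGB pixel to a gray level using BT.601 luma weights."""
    red, green, blue = pixel
    return round(0.299 * red + 0.587 * green + 0.114 * blue)
```

For example, `average_image_size([(200, 200), (100, 300)])` gives `(150.0, 250.0)`. A result like this can help when choosing a value for resize.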

Limitations

Note: This script/package will not work in Colab.

This script downloads images at a size of approximately 200 x 200, because Google only allows images to be downloaded at their rendered size. Only a few images can be downloaded at their original size. The original image URLs are encrypted, and the encrypted URLs serve the image at a fixed size that is smaller than the original.

Please share your ideas for overcoming these limitations. Let's build a beautiful Python script together that can help lots of people.

Next Steps

The following steps are planned to improve the script:

  • Find a method to download the images at their original size.
  • Build the script without Selenium for faster downloading. Selenium is a bit slow.
  • Add an image similarity factor so that more relevant images can be downloaded.
  • Optimize the overall script with additional functionalities for faster downloading of images.
  • Add some more generic OpenCV functionalities. Please share your ideas if you have any.

Everyone is welcome to contribute to this script. If you want to contribute, please write to me on LinkedIn or email me.

Disclaimer

This Python script allows you to download hundreds of Google images. Please do not download or use any image in violation of its copyright. Google indexes images and makes them searchable; it does not create them and does not own their copyright. The original creator of each image owns the copyright.

LICENSE

This project is licensed under the terms of the MIT license.

