
Asynchronously download Google Images search results


Python Google Images Downloader

This tool downloads Google Images search results quickly by fetching them asynchronously with aiohttp. It is essentially a Python 3.7 rewrite of google-images-downloader, without that project's restriction against external dependencies.

Installation

The following should work to install the utility and check that it will run:

pip install pygidl
pygidl -h

Basic Usage

From the command line:

Basic usage looks something like pygidl "cats and dogs". This creates an output directory in your current working directory, named for the timestamp at which the command was run. Under it is a directory named with the slugified version of your query string, and under that are the downloaded images, named by their SHA-256 hashes and file type extensions:

pygidl "cats and dogs"
tree .
.
└── 2020-01-21-17-57-34
    └── cats-and-dogs
        ├── 01d2dde343a45e3a1fcc5e7cd3cace33398c9b06a97e494d4329f264e57d5f57.jpg
        ├── 026cef34db26cbd5fa246bc720c1234b39ffa07737e43523b160390c13d5d3e6.jpeg
        ├── 03a0f2ebeed5d91acaed73ad303bd724767e101688e33d1a5557cca9139972d7.webp
        ├── 0affcf3198b40063e9302c4515380d6796946098c7c1c3c043072815e29e2770.jpeg
        ...
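
To make the layout above concrete, here is a minimal sketch of how such a path could be put together. It is not pygidl's actual code: the slugify and output_path helpers are hypothetical, and the slug rule (lowercase, alphanumeric runs joined by hyphens) is an assumption inferred from the listing.

```python
import hashlib
import re
from datetime import datetime
from pathlib import Path


def slugify(text: str) -> str:
    """Lowercase the text and join its alphanumeric runs with hyphens.

    Hypothetical helper; pygidl's real slug rules may differ.
    """
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


def output_path(
    root: Path, started: datetime, query: str, data: bytes, ext: str
) -> Path:
    """Build <root>/<timestamp>/<slug>/<sha256>.<ext> for one image."""
    timestamp = started.strftime("%Y-%m-%d-%H-%M-%S")
    digest = hashlib.sha256(data).hexdigest()  # 64 hex characters
    return root / timestamp / slugify(query) / f"{digest}.{ext}"


# Placeholder bytes stand in for real image content.
example = output_path(
    Path("."),
    datetime(2020, 1, 21, 17, 57, 34),
    "cats and dogs",
    b"placeholder image bytes",
    "jpg",
)
print(example)
```

Naming files by content hash means that two identical downloads map to the same file name, so duplicates within a query directory collapse naturally.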

Advanced Usage

The command can be configured to support several more complex query scenarios:

Verbose Output

Use the -v/--verbose flag to change the log level to show more messages. The log level is "WARNING" by default. Supplying -v once sets it to "INFO", and two or more times sets it to "DEBUG".
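
A counted flag like this is commonly built with argparse's count action. The following is a sketch of that pattern under the level mapping described above, not a copy of pygidl's own argument parsing; the parser and the log_level helper are hypothetical.

```python
import argparse
import logging

# Hypothetical parser; only the -v/--verbose flag is modelled here.
parser = argparse.ArgumentParser(prog="pygidl-sketch")
parser.add_argument("-v", "--verbose", action="count", default=0)


def log_level(verbosity: int) -> int:
    """Map the number of -v flags to a logging level."""
    if verbosity >= 2:
        return logging.DEBUG
    if verbosity == 1:
        return logging.INFO
    return logging.WARNING


# "-vv" counts as two occurrences of the flag.
args = parser.parse_args(["-vv"])
logging.basicConfig(level=log_level(args.verbose))
```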

Prefixes and Suffixes

The -p/--prefix and -s/--suffix flags run multiple variants of the same Google Images query, one for each combination of the given prefix and suffix strings. For example:

pygidl -p Andorra -p Angola -s "on a ship" -s "on a plane" flag
tree -d .
.
└── 2020-01-21-18-15-59
    ├── andorra-flag-on-a-plane
    ├── andorra-flag-on-a-ship
    ├── angola-flag-on-a-plane
    └── angola-flag-on-a-ship

5 directories
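
The expansion into one query per prefix/suffix combination can be sketched with itertools.product. The expand_queries function below is hypothetical and only illustrates the idea. Dropping empty strings before joining is an assumption, chosen to be consistent with the directory names shown in this document's listings.

```python
from itertools import product
from typing import List, Sequence


def expand_queries(
    base: str,
    prefixes: Sequence[str] = (),
    suffixes: Sequence[str] = (),
) -> List[str]:
    """Return one query string per (prefix, suffix) combination."""
    # With no prefixes or suffixes, fall back to a single empty option
    # so the base query is still produced once.
    prefix_options = list(prefixes) or [""]
    suffix_options = list(suffixes) or [""]
    queries = []
    for prefix, suffix in product(prefix_options, suffix_options):
        # Skip empty parts so they do not leave stray spaces.
        parts = [part for part in (prefix, base, suffix) if part]
        queries.append(" ".join(parts))
    return queries


for query in expand_queries(
    "flag", ["Andorra", "Angola"], ["on a ship", "on a plane"]
):
    print(query)
```

Two prefixes and two suffixes give four queries, which matches the four query directories in the listing above.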

Output Groups

By default, the top-level directory that contains the results of each query is named for the timestamp. You can override that name with the -g/--group flag; the name you give is slugified. For example:

pygidl -g "Cute Animals" -p fluffy -p adorable -s dog -s cat "" -v
tree . -d
.
└── cute-animals
    ├── adorable-cat
    ├── adorable-dog
    ├── fluffy-cat
    └── fluffy-dog

5 directories

Face Search

You can tell Google Images to find faces with the -f/--face flag. For example:

pygidl -g "Tom Hanks" -p "" -p young "Tom Hanks" -s "" -s "Oscars" -f -v
tree -d .
.
└── tom-hanks
    ├── tom-hanks
    ├── tom-hanks-oscars
    ├── young-tom-hanks
    └── young-tom-hanks-oscars

5 directories

Programmatic Usage

Something like the following should work (assuming you have opencv-python installed in your environment):

import asyncio
import os

import cv2

from pygidl import scrape_google_images


downloaded_image_paths = asyncio.run(
    scrape_google_images(
        base_query="cats and dogs",
        prefixes=["cute", "adorable"],
        suffixes=["playing", "running"],
        group="cute-animals",
        output_dir=os.getcwd(),
        face=False,
    )
)
for path in downloaded_image_paths:
    image = cv2.imread(str(path))  # str() in case path is a pathlib.Path
    if image is None:
        print(f"could not load image {path}")
        continue
    height, width = image.shape[:2]
    print(f"image {path} has size {width}x{height}")
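
Once you have the returned paths, you may want to know how many images each query produced. The sketch below groups paths by their parent directory name. It assumes the returned paths follow the same group/query/file layout as the command-line output, which is not confirmed here; group_by_query_dir is a hypothetical helper and the sample paths are made-up placeholders.

```python
import os
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Union


def group_by_query_dir(
    paths: Iterable[Union[str, os.PathLike]],
) -> Dict[str, List[Path]]:
    """Group image paths by the name of the directory that holds them."""
    grouped: Dict[str, List[Path]] = defaultdict(list)
    for raw in paths:
        path = Path(raw)
        grouped[path.parent.name].append(path)
    return dict(grouped)


# Placeholder paths standing in for the list returned by the scraper.
sample_paths = [
    "cute-animals/cute-cats-and-dogs-playing/aaa.jpg",
    "cute-animals/cute-cats-and-dogs-playing/bbb.webp",
    "cute-animals/adorable-cats-and-dogs-running/ccc.jpeg",
]
for query_dir, images in group_by_query_dir(sample_paths).items():
    print(f"{query_dir}: {len(images)} image(s)")
```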

Known Issues and Limitations

  • Returns at most 100 results per query
  • Doesn't support the full range of advanced search options
  • No tests
  • No retries
  • No option to output a report of results or metadata
  • Sometimes Google returns results from a different template without easily parseable metadata
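
Since there are no built-in retries, callers who need them can wrap their own coroutine calls. This is a generic sketch of retrying with exponential backoff, not something pygidl provides; with_retries and flaky are hypothetical names, and the delays are arbitrary.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    make_attempt: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Await make_attempt() up to `attempts` times, doubling the delay."""
    for attempt in range(attempts):
        try:
            return await make_attempt()
        except Exception:
            # Broad catch for brevity; narrow it to network errors in
            # real code. Re-raise once the last attempt has failed.
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


calls = 0


async def flaky() -> str:
    """Simulated operation that fails twice and then succeeds."""
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("simulated failure")
    return "ok"


result = asyncio.run(with_retries(flaky, attempts=3, base_delay=0.01))
print(result, calls)
```

The wrapper takes a zero-argument callable rather than a coroutine object because a coroutine can only be awaited once, so each attempt needs a fresh one.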

Contributing

I don't think anyone will ever get this far, but if you want to open a pull request (or even better, take over ownership of the project for me!), go for it. At a minimum, new code should have type hints and docstrings and be auto-formatted with black with an 80-character max line length. Even better would be some tests and Sphinx documentation!
