Image scraper for DuckDuckGo for creating deep learning datasets

Project description

jmd_imagescraper

An image scraping library for creating deep learning datasets.

It uses DuckDuckGo for image scraping because DuckDuckGo returns nice big images and has some rather nice parameters to make your life easier. For example, we can filter searches to return only square images that are photos.

jmd_imagescraper.core contains the main scraping/downloading functionality.
jmd_imagescraper.imagecleaner contains an image cleaner you can use from within your notebook to clean up the results and delete anything unsuitable.

Install

pip install jmd_imagescraper

How to use

from jmd_imagescraper.core import *
from pathlib import Path

root = Path().cwd()/"images"
duckduckgo_search(root, "Puppies", "cute puppies", max_results=10)
Duckduckgo search: cute puppies
Downloading results into C:\Users\Joe\Documents\GitHub\jmd_imagescraper\images\Puppies
100.00% [10/10 00:02<00:00 Images downloaded]
[WindowsPath('C:/Users/Joe/Documents/GitHub/jmd_imagescraper/images/Puppies/001.jpg'),
 WindowsPath('C:/Users/Joe/Documents/GitHub/jmd_imagescraper/images/Puppies/002.jpg'),
 WindowsPath('C:/Users/Joe/Documents/GitHub/jmd_imagescraper/images/Puppies/003.jpg'),
 WindowsPath('C:/Users/Joe/Documents/GitHub/jmd_imagescraper/images/Puppies/004.jpg'),
 WindowsPath('C:/Users/Joe/Documents/GitHub/jmd_imagescraper/images/Puppies/005.jpg'),
 WindowsPath('C:/Users/Joe/Documents/GitHub/jmd_imagescraper/images/Puppies/006.jpg'),
 WindowsPath('C:/Users/Joe/Documents/GitHub/jmd_imagescraper/images/Puppies/007.jpg'),
 WindowsPath('C:/Users/Joe/Documents/GitHub/jmd_imagescraper/images/Puppies/008.jpg'),
 WindowsPath('C:/Users/Joe/Documents/GitHub/jmd_imagescraper/images/Puppies/009.jpg'),
 WindowsPath('C:/Users/Joe/Documents/GitHub/jmd_imagescraper/images/Puppies/010.jpg')]
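
A dataset usually needs more than one label. The following is a minimal sketch, not part of the library, that repeats the call above once per label. `scrape_labels` and the "Kittens" label are hypothetical names introduced for illustration; the search function is passed in as an argument and is assumed to accept the same arguments as the `duckduckgo_search` call shown above and to return a list of paths.

```python
from pathlib import Path
from typing import Callable, Dict, List


def scrape_labels(
    root: Path,
    queries: Dict[str, str],
    search: Callable[..., List[Path]],
    max_results: int = 10,
) -> Dict[str, List[Path]]:
    """Run one search per label and collect the returned paths by label.

    `queries` maps a label (used as the folder name) to its search keywords.
    `search` is assumed to have the call shape search(root, label, keywords,
    max_results=...), as in the duckduckgo_search example above.
    """
    downloaded: Dict[str, List[Path]] = {}
    for label, keywords in queries.items():
        # Same argument order as the single-label call: root, label, keywords.
        downloaded[label] = search(root, label, keywords, max_results=max_results)
    return downloaded


# Intended use with the real function (requires network access):
#   from jmd_imagescraper.core import duckduckgo_search
#   scrape_labels(Path.cwd() / "images",
#                 {"Puppies": "cute puppies", "Kittens": "cute kittens"},
#                 duckduckgo_search)
```

Passing the search function in keeps the helper easy to try out with a stand-in function before any images are downloaded.
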
from jmd_imagescraper.imagecleaner import *
display_image_cleaner(root)
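
After deleting unsuitable images, it can help to check how many files remain per label. The sketch below uses only the standard library and assumes the layout shown in the output above, with one sub-folder per label under `root`. `count_images` and the list of suffixes are assumptions made for illustration, not part of jmd_imagescraper.

```python
from pathlib import Path
from typing import Dict

# Assumption: downloaded files use one of these extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(root: Path) -> Dict[str, int]:
    """Return {label folder name: number of image files directly inside it}."""
    counts: Dict[str, int] = {}
    # Each immediate sub-folder of root is treated as one label.
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[folder.name] = sum(
            1
            for f in folder.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


# Example: count_images(Path.cwd() / "images")
```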

Download files

Source Distribution

jmd_imagescraper-0.0.1.tar.gz (15.1 kB)

Uploaded Source

Built Distribution

jmd_imagescraper-0.0.1-py3-none-any.whl (13.0 kB)

Uploaded Python 3

File details

Details for the file jmd_imagescraper-0.0.1.tar.gz.

File metadata

  • Download URL: jmd_imagescraper-0.0.1.tar.gz
  • Upload date:
  • Size: 15.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.30.0 CPython/3.8.2

File hashes

Hashes for jmd_imagescraper-0.0.1.tar.gz
Algorithm Hash digest
SHA256 f4c9cd49738d25bb1cf7016083108ca41bc313e2d0870361552716942f00fe91
MD5 a8ce9f4162d207152ee0d2ca9a81a9fa
BLAKE2b-256 cbe80d092d98c10032f065b4c170ef5c066d28513cfdb7847b387f5dd280f0cc
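
A downloaded file can be checked against the SHA256 digest listed above. This is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `matches_sha256` are hypothetical, and the file is assumed to have been saved in the current directory under its original name.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example, using the digest listed above (file location is an assumption):
#   matches_sha256(
#       Path("jmd_imagescraper-0.0.1.tar.gz"),
#       "f4c9cd49738d25bb1cf7016083108ca41bc313e2d0870361552716942f00fe91",
#   )
```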

File details

Details for the file jmd_imagescraper-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: jmd_imagescraper-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 13.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.30.0 CPython/3.8.2

File hashes

Hashes for jmd_imagescraper-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 0258ffba5f99cf827ea6c343f32b5a294d9ce4318d9d4d71bfc87e2268add0e0
MD5 159e33ab0ae6ecdbae5667d1f0567250
BLAKE2b-256 a9ed4a85d715d5d6b35b6cc4e1a78abdefa4e165fd24ae6aeacc9fd4690a7949
