A web bot to scrape images from websites.

Features

  • Supported platform: Linux / Python 2.x.

  • Uses the Scrapy web crawling framework.

  • Maintains a database of all downloaded images to avoid duplicate downloads.

  • Optionally, scraping can be restricted to a particular URL prefix, e.g. scraping “http://website.com/albums/new” with this option downloads images only from the “new” album.

  • You can specify a minimum size for images to be downloaded.

  • Live monitor window for displaying images as they are scraped.
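
The duplicate check described above can be pictured with a small sketch. This is only an illustration of the idea, not imagebot's actual code: the class name DownloadLog, the table layout, and the choice of SQLite are all assumptions made for this example.

```python
import sqlite3


class DownloadLog:
    """Hypothetical record of image URLs that were already downloaded."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps the example self-contained; a real bot would
        # presumably use a file so the record survives between runs.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS images (url TEXT PRIMARY KEY)"
        )

    def seen(self, url):
        """Return True if this URL is already recorded."""
        row = self.conn.execute(
            "SELECT 1 FROM images WHERE url = ?", (url,)
        ).fetchone()
        return row is not None

    def add(self, url):
        """Record a URL; INSERT OR IGNORE makes repeated adds harmless."""
        self.conn.execute(
            "INSERT OR IGNORE INTO images (url) VALUES (?)", (url,)
        )
        self.conn.commit()

    def should_download(self, url):
        """Return True only the first time a URL is offered."""
        if self.seen(url):
            return False
        self.add(url)
        return True


log = DownloadLog()
for url in ["http://website.com/a.jpg", "http://website.com/a.jpg"]:
    print(url, "download" if log.should_download(url) else "skip")
```

Keying on the URL is the simplest choice; keying on a hash of the image data would also catch the same image served from two different addresses.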

Usage

  1. Scrape images from http://website.com:

    imagebot http://website.com
  2. Scrape images from http://website.com while also allowing images hosted on a CDN such as amazonaws.com (to add multiple domains, pass a comma-separated list):

    imagebot http://website.com -d amazonaws.com
  3. Specify minimum size of image to be downloaded (width x height):

    imagebot http://website.com -s 300x300
  4. Stay under http://website.com/albums/new:

    imagebot http://website.com/albums/new -u http://website.com/albums/new
  5. Launch the monitor window to view images live as they are scraped:

    imagebot http://website.com -m
  6. Set user-agent:

    imagebot http://website.com -a "my_imagebot(http://mysite.com)"
  7. For more options, get help:

    imagebot -h
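
To make the meaning of the -s, -u and -d options more concrete, here is a rough sketch of the checks they imply. It is not taken from imagebot's source: the function names are hypothetical, the rule that both width and height must meet the minimum is an assumption, and the code uses Python 3's urllib.parse even though imagebot itself targets Python 2.x.

```python
from urllib.parse import urlparse


def parse_min_size(text):
    """Parse a 'WIDTHxHEIGHT' string such as '300x300' into (width, height)."""
    width, height = text.lower().split("x")
    return int(width), int(height)


def is_large_enough(size, min_size):
    """Assumed rule: both dimensions must reach the minimum."""
    return size[0] >= min_size[0] and size[1] >= min_size[1]


def is_under(url, base):
    """True if url is on the same host as base and at or below its path."""
    u, b = urlparse(url), urlparse(base)
    if u.netloc != b.netloc:
        return False
    base_path = b.path.rstrip("/")
    # Compare whole path segments so '/albums/newest' is not treated
    # as being under '/albums/new'.
    return u.path == base_path or u.path.startswith(base_path + "/")


def is_allowed_domain(url, allowed):
    """True if the URL's host is one of the allowed domains or a subdomain."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed)


min_size = parse_min_size("300x300")
print(is_large_enough((640, 480), min_size))
print(is_under("http://website.com/albums/new/1.jpg",
               "http://website.com/albums/new"))
print(is_allowed_domain("http://s3.amazonaws.com/x.png",
                        ["website.com", "amazonaws.com"]))
```

Matching on whole path segments and on dot-separated host suffixes avoids accidental matches such as “/albums/newest” or “notamazonaws.com”.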

Dependencies

  1. python-gi (Python GObject Introspection API)

    On Ubuntu:

    apt-get install python-gi
  2. scrapy (a powerful web crawling framework)

    It will be automatically installed by pip.

  3. Pillow (Python Imaging Library)

    It will be automatically installed by pip.

Download

Source Distribution

imagebot-1.0.tar.gz (11.8 kB)

Hashes for imagebot-1.0.tar.gz

  • SHA256: 5f0d8b8d3a497a6a07a89c0c30f52d802b85a34a96acc72e2316b87291625723

  • MD5: 0327754c305ad0095b376933c232cc7c

  • BLAKE2b-256: e1321d0b3e50547778ddc0def04c7ebbea90e2a278df0dd3f0149e3358e98677