A web bot to scrape images from websites.

Features

  • Supported platform: Linux / Python 2.x.

  • Uses the Scrapy web crawling framework.

  • Maintains a database of all downloaded images to avoid duplicate downloads.

  • Optionally, it can restrict scraping to pages under a particular URL, e.g. scraping “http://website.com/albums/new” with this option will only download images from the “new” album.

  • You can specify a minimum size for images to be downloaded.

  • Live monitor window for displaying images as they are scraped.
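
The duplicate-avoidance feature above can be illustrated with a minimal sketch. It is not imagebot's actual implementation: the SeenImages class, the table layout, and the choice to key on the image URL are all assumptions made for illustration, using only Python's standard sqlite3 module.

```python
import sqlite3


class SeenImages:
    """Hypothetical helper: remembers image URLs that were already downloaded."""

    def __init__(self, path=":memory:"):
        # Assumption: one table keyed by URL; the real schema is not documented here.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS images (url TEXT PRIMARY KEY)"
        )

    def mark_if_new(self, url):
        """Record the URL and return True only if it was not stored before."""
        cur = self.conn.execute(
            "INSERT OR IGNORE INTO images (url) VALUES (?)", (url,)
        )
        self.conn.commit()
        # rowcount is 1 when a row was inserted, 0 when the insert was ignored.
        return cur.rowcount == 1


if __name__ == "__main__":
    seen = SeenImages()
    for url in ["http://website.com/a.jpg", "http://website.com/a.jpg"]:
        action = "download" if seen.mark_if_new(url) else "skip"
        print(action, url)
```

Passing a file path instead of ":memory:" would make the record persist between runs, which is what avoiding duplicate downloads across sessions would require.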

Usage

  1. Scrape images from http://website.com:

    imagebot http://website.com
  2. Scrape images from http://website.com while also allowing images hosted on a CDN such as amazonaws.com (separate multiple domains with commas):

    imagebot http://website.com -d amazonaws.com
  3. Specify the minimum size of images to download (width x height):

    imagebot http://website.com -s 300x300
  4. Stay under http://website.com/albums/new:

    imagebot http://website.com/albums/new -u http://website.com/albums/new
  5. Launch the monitor window to display images live as they are scraped:

    imagebot http://website.com -m
  6. Set user-agent:

    imagebot http://website.com -a "my_imagebot(http://mysite.com)"
  7. For more options, get help:

    imagebot -h
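
To make the meaning of the -d, -s and -u options above more concrete, here is a rough sketch of the checks they imply. These helper functions are hypothetical and are not taken from imagebot's source; in particular, requiring both width and height to meet the minimum, and matching subdomains of an allowed domain, are assumptions. The sketch targets Python 3; on Python 2 the import would be "from urlparse import urlparse".

```python
from urllib.parse import urlparse


def parse_min_size(spec):
    """Parse a 'WIDTHxHEIGHT' string such as '300x300' into (width, height)."""
    width, height = spec.lower().split("x")
    return int(width), int(height)


def is_large_enough(size, min_size):
    """Assumption: an image qualifies only if both dimensions meet the minimum."""
    return size[0] >= min_size[0] and size[1] >= min_size[1]


def is_under(url, base_url):
    """True if url is on the same host as base_url and at or below its path."""
    base, target = urlparse(base_url), urlparse(url)
    if target.netloc != base.netloc:
        return False
    base_path = base.path.rstrip("/")
    return target.path == base_path or target.path.startswith(base_path + "/")


def is_allowed_domain(url, allowed_domains):
    """True if the URL's host equals an allowed domain or is a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


if __name__ == "__main__":
    min_size = parse_min_size("300x300")
    print(is_large_enough((640, 480), min_size))
    print(is_under("http://website.com/albums/new/page2",
                   "http://website.com/albums/new"))
    print(is_allowed_domain("https://s3.amazonaws.com/bucket/a.jpg",
                            ["website.com", "amazonaws.com"]))
```

Comparing whole path segments rather than raw string prefixes keeps a page such as /albums/newest from being treated as part of /albums/new.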

Dependencies

  1. python-gi (Python GObject Introspection API)

    On Ubuntu:

    apt-get install python-gi
  2. scrapy (a powerful web crawling framework)

    It will be automatically installed by pip.

  3. Pillow (Python Imaging Library)

    It will be automatically installed by pip.

Download

Source distribution: imagebot-1.0.1.tar.gz (11.9 kB)
