A web bot to scrape images from websites.

Features

  • Supported platform: Linux (with GNOME), Python 2.x.

  • Uses the Scrapy web crawling framework.

  • Maintains a database of all downloaded images to avoid duplicate downloads.

  • Optionally restricts scraping to a particular URL prefix, e.g. scraping “http://website.com/albums/new” with this option downloads only from the “new” album.

  • Lets you specify a minimum size for images to be downloaded.

  • Follows JavaScript popup links while scraping.

  • Live monitor window for displaying images as they are scraped.
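
The duplicate check, the "stay under a URL" option and the minimum-size filter listed above can be sketched as three small helpers. This is only an illustration of the ideas, not imagebot's actual code: the function names, the SQLite table layout, and the rule that both width and height must reach the minimum are all assumptions.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a hypothetical store of already-downloaded image URLs."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS images (url TEXT PRIMARY KEY)")
    return conn


def mark_if_new(conn, url):
    """Record url and return True if it was not seen before, else False."""
    cur = conn.execute("INSERT OR IGNORE INTO images (url) VALUES (?)", (url,))
    conn.commit()
    # rowcount is 1 when the row was inserted, 0 when it was ignored.
    return cur.rowcount == 1


def is_under(url, prefix):
    """True if url is the prefix itself or lies below it as a path."""
    return url == prefix or url.startswith(prefix.rstrip("/") + "/")


def meets_min_size(size, minimum):
    """Assumed rule: both width and height must reach the minimum."""
    width, height = size
    min_width, min_height = minimum
    return width >= min_width and height >= min_height
```

A crawler built on these helpers would download an image only when `is_under`, `meets_min_size` and `mark_if_new` all return True for it.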

Usage

  1. Scrape images from http://website.com:

    imagebot http://website.com
  2. Scrape images from http://website.com while also allowing images hosted on a CDN such as amazonaws.com (separate multiple domains with commas):

    imagebot http://website.com -d amazonaws.com
  3. Specify minimum size of image to be downloaded (width x height):

    imagebot http://website.com -s 300x300
  4. Stay under http://website.com/albums/new:

    imagebot http://website.com/albums/new -u http://website.com/albums/new
  5. Launch the monitor window to display images live as they are scraped:

    imagebot http://website.com -m
  6. Set user-agent:

    imagebot http://website.com -a "my_imagebot(http://mysite.com)"
  7. For more options, get help:

    imagebot -h
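
To make the argument formats above concrete, here is a minimal sketch of how the short flags shown in the usage list could be parsed with `argparse`. It is not imagebot's real parser: the program name `imagebot-sketch`, the `dest` names, and the defaults are hypothetical, and only the flags that appear above are modelled.

```python
import argparse


def parse_size(text):
    """Turn 'WIDTHxHEIGHT' (e.g. '300x300') into an (int, int) tuple."""
    width, sep, height = text.lower().partition("x")
    if not sep or not width.isdigit() or not height.isdigit():
        raise argparse.ArgumentTypeError("expected WIDTHxHEIGHT, got %r" % text)
    return int(width), int(height)


def parse_domains(text):
    """Split a comma-separated domain list, dropping empty entries."""
    return [d.strip() for d in text.split(",") if d.strip()]


def build_parser():
    # All dest names and defaults below are assumptions for illustration.
    parser = argparse.ArgumentParser(prog="imagebot-sketch")
    parser.add_argument("start_url")
    parser.add_argument("-d", dest="domains", type=parse_domains, default=[])
    parser.add_argument("-s", dest="min_size", type=parse_size, default=(0, 0))
    parser.add_argument("-u", dest="stay_under", default=None)
    parser.add_argument("-m", dest="monitor", action="store_true")
    parser.add_argument("-a", dest="user_agent", default=None)
    return parser
```

Calling `build_parser().parse_args(["http://website.com", "-s", "300x300"])` yields a namespace whose `min_size` is a pair of integers, which is easier to compare against real image dimensions than the raw string.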

Dependencies

  1. python-gi (Python GObject Introspection API)

    On Ubuntu:

    apt-get install python-gi
  2. scrapy (a powerful web crawling framework)

    It will be automatically installed by pip.

  3. Pillow (a fork of the Python Imaging Library)

    It will be automatically installed by pip.
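
Before running the bot, it can be handy to confirm that the three dependencies are importable. The helper below is a small sketch, with a hypothetical function name; `gi`, `scrapy` and `PIL` are the module names under which python-gi, scrapy and Pillow are imported.

```python
def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # An empty list means every dependency was found.
    print(missing_modules(["gi", "scrapy", "PIL"]))
```

If `gi` shows up in the result, install python-gi through the system package manager as described above, since pip does not install it automatically.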

Download

Source Distribution

imagebot-1.0.2.tar.gz (11.9 kB, source distribution)

  • Size: 11.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

Hashes for imagebot-1.0.2.tar.gz

  • SHA256: ea091b734269c896967a159a7de48c8aee8d351d2ac3ab335dc9cd42aa548551
  • MD5: c02cb8d652b511bbdce85796f2947b43
  • BLAKE2b-256: c80355a8b1cd1832cc992755b412b0d4ec2d8432922422a1bb2694c660fcbd0c
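
The SHA256 digest listed above can be compared against a downloaded copy of the archive. The following is a minimal sketch using only the standard library; the function names and the local file path are assumptions, while the expected digest is the one published for imagebot-1.0.2.tar.gz.

```python
import hashlib

# SHA256 digest published for imagebot-1.0.2.tar.gz.
EXPECTED_SHA256 = "ea091b734269c896967a159a7de48c8aee8d351d2ac3ab335dc9cd42aa548551"


def sha256_of(stream, chunk_size=65536):
    """Hash a binary stream in chunks and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected=EXPECTED_SHA256):
    """True if the file at path (a hypothetical local download) matches."""
    with open(path, "rb") as handle:
        return sha256_of(handle) == expected.lower()
```

For example, `verify_file("imagebot-1.0.2.tar.gz")` should return True for an intact download saved under that name in the current directory.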
