A simple image scraper to download all images from a given URL

ImageScraper
============
First Python app :D
A simple Python script that downloads all images from a given webpage.


Download
--------
tar file:
Grab the latest build from https://pypi.python.org/pypi/ImageScraper

pip install:
$ pip install ImageScraper


Usage
-----
Using the tar file:

Extract the contents of the tar file.
Note that ``ImageScraper`` depends on ``lxml`` and ``requests``.
If ``lxml`` fails to compile when installed through ``pip``, install the ``libxml2-dev`` and ``libxslt-dev`` packages on your system first.


$ cd ImageScraper/image_scraper/
$ python __init__.py
Enter URL to scrap: https://github.com
Found 6 images:
How many images do you want ? : 6
Done.

If installed using pip:

Open Python in a terminal.

$ python
>>> import image_scraper
Enter URL to scrap: https://github.com
Found 6 images:
How many images do you want ? : 6
Done.


NOTE:
A new folder called "images" will be created in the same place, containing all the downloaded images.
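
The flow shown in the sessions above (collect image URLs, ask how many to keep, write them into an "images" folder) might be sketched roughly as follows. This is a minimal illustration using only the standard library, not ImageScraper's actual code: the helper names ``pick_images`` and ``save_image`` are hypothetical, and the download step is left out so the example runs offline.

```python
import os
from urllib.parse import urljoin, urlparse


def pick_images(page_url, srcs, count):
    """Resolve each src against the page URL and keep the first `count`.

    Hypothetical helper: mirrors the "How many images do you want ?" prompt.
    """
    absolute = [urljoin(page_url, src) for src in srcs]
    return absolute[:max(count, 0)]


def save_image(data, image_url, folder="images"):
    """Write image bytes into `folder`, naming the file after the URL path.

    Hypothetical helper: the folder is created on first use.
    """
    os.makedirs(folder, exist_ok=True)
    # Fall back to a generic name when the URL path has no file component.
    name = os.path.basename(urlparse(image_url).path) or "image"
    path = os.path.join(folder, name)
    with open(path, "wb") as handle:
        handle.write(data)
    return path
```

Calling ``pick_images(page_url, srcs, 6)`` and then ``save_image`` on each downloaded body would, under these assumptions, leave up to six files in the "images" folder.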

Issues
------

Q.) Not all images were downloaded?
The missing images may have been injected into the page via JavaScript. This scraper only reads the HTML the server returns and doesn't run JavaScript, so it never sees them.
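
To illustrate why that happens, here is a small sketch that parses a static HTML string. It uses the standard library's ``html.parser`` instead of ``lxml`` so that it runs without extra packages; the class name ``ImgCollector`` and the sample page are made up for this example.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag found in static HTML."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


# Made-up page: one image in the markup, one that a browser would add
# only after running the script.
PAGE = """
<html><body>
<img src="a.png">
<script>
  var extra = document.createElement("img");
  extra.src = "b.png";
  document.body.appendChild(extra);
</script>
</body></html>
"""

collector = ImgCollector()
collector.feed(PAGE)
print(collector.sources)
```

Only ``a.png`` is collected. ``b.png`` exists solely as a script instruction, so a parser that never executes the script cannot find it.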


Todo
----
Support scraping sites that inject image tags via JavaScript, using PhantomJS or Selenium.

Release file (version 1.0.4): ImageScraper-1.0.4.tar.gz (1.8 kB), source distribution, uploaded May 31, 2014.
