Scraping library to retrieve data from useful pages, such as Amazon wishlists

API

The library exposes the following functions for scraping data and listing spiders:

  • scrape(SPIDER_NAME, URL): scrapes the given URL using the spider referenced by SPIDER_NAME.

  • spiders(): lists all spiders found by the library.
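The two functions can be combined to check that a spider exists before scraping. The following is only a sketch: it assumes that spiders() returns an iterable of spider names as strings, which the description above does not confirm. The helper scrape_checked and the stand-in object fake_sf are hypothetical and exist only so the example runs without the package or network access.

```python
from types import SimpleNamespace


def scrape_checked(sf, spider_name, url):
    """Scrape `url` only if `spider_name` is among the spiders `sf` reports.

    `sf` is any object offering spiders() and scrape(name, url), such as the
    imported scraper_factory module. Assumption: spiders() yields names.
    """
    available = list(sf.spiders())
    if spider_name not in available:
        raise ValueError(f"unknown spider {spider_name!r}; available: {available}")
    return sf.scrape(spider_name, url)


# Hypothetical stand-in for the real module, used only for illustration.
fake_sf = SimpleNamespace(
    spiders=lambda: ["amazon-wishlist"],
    scrape=lambda name, url: [{"spider": name, "url": url}],
)

result = scrape_checked(fake_sf, "amazon-wishlist", "https://example.com/list")
```

With the real package installed, passing the imported module in place of fake_sf should behave the same way, provided the assumption about spiders() holds.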

Custom Spiders

Custom spiders are supported, as long as they meet the following requirements:

  • The spider must be implemented as a class that inherits from BaseSpider.

  • The spider file must be located either in scraper_factory/spiders or in a custom directory, in which case the environment variable $SPIDER_PATH must be set to the directory that contains the spider.
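For the custom-location case, $SPIDER_PATH can also be set from Python rather than from the shell. This is a minimal sketch: the helper name use_spider_directory is hypothetical, and because it is not stated when the library reads the variable, the safe assumption is to set it before importing scraper_factory.

```python
import os
from pathlib import Path


def use_spider_directory(directory):
    """Point SPIDER_PATH at `directory` after checking that it exists.

    Hypothetical helper: only the SPIDER_PATH variable name comes from the
    project description; the validation is added for convenience.
    """
    path = Path(directory).expanduser().resolve()
    if not path.is_dir():
        raise NotADirectoryError(f"{path} is not a directory")
    os.environ["SPIDER_PATH"] = str(path)
    return path
```

Calling use_spider_directory with the directory that holds the spider file, and only then importing scraper_factory, avoids depending on when the variable is read.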

Usage example

>>> import scraper_factory as SF
>>> SF.scrape('amazon-wishlist', 'https://www.amazon.com/hz/wishlist/ls/24XY9873RPAYN')
[{
    'id': 'I1MZVK8RDPYK8P',
    'title': 'AmazonBasics Heavy Weight Ruled Lined Index Cards, White, 3x5 Inch Card, 100-Count - AMZ63500',
    'byline': None,
    'price': None,
    'link': 'https://www.amazon.com/dp/B06XSRLP51/',
    'img': 'https://images-na.ssl-images-amazon.com/images/I/71i7LVTzpsL._SS135_.jpg'
}, {
    'id': 'I14TUJ6TADACU5',
    'title': "Women's Walking Shoes Sock Sneakers - Mesh Slip On Air Cushion Lady Girls Modern Jazz Dance Easy Shoes Platform Loafers",
    'byline': None,
    'price': None,
    'link': 'https://www.amazon.com/dp/B07MWCDJ9X/',
    'img': 'https://images-na.ssl-images-amazon.com/images/I/61sHA7o-bxL._SS135_.jpg'
}, {
    'id': 'I3C97JA2JR06PN',
    'title': 'Tenergy Redigrill\xa0Smoke-Less Infrared Grill, Indoor Grill, Heating\xa0Electric Tabletop Grill, Non-Stick Easy to Clean\xa0BBQ Grill, for Party/Home, ETL Certified',
    'byline': None,
    'price': '$179.99',
    'link': 'https://www.amazon.com/dp/B07BZ412HT/',
    'img': 'https://images-na.ssl-images-amazon.com/images/I/41uGvSPg-ML._SS135_.jpg'
}, {
    'id': 'I1C7RJI2H0VWZ7',
    'title': 'Shelf Liners for Wire Shelf Liner Set of 4 - Graphite (14-Inch-by-36-Inch)',
    'byline': None,
    'price': '$29.99',
    'link': 'https://www.amazon.com/dp/B01N9V4A9A/',
    'img': 'https://images-na.ssl-images-amazon.com/images/I/71Lg6J7sGHL._SS135_.jpg'
},
...]
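The result is a plain list of dictionaries, so it can be post-processed with ordinary Python. As a sketch, the hypothetical helper total_price below sums the prices of the items that have one. It assumes prices are strings in the "$179.99" form shown in the output above; other currencies or formats would need different parsing.

```python
from decimal import Decimal


def total_price(items):
    """Sum the 'price' field of scraped items, skipping items priced None."""
    total = Decimal("0")
    for item in items:
        price = item.get("price")
        if price is None:
            continue
        # Assumption: US-style price strings such as '$1,179.99'.
        total += Decimal(price.replace("$", "").replace(",", ""))
    return total


# Trimmed-down items taken from the example output above.
items = [
    {"id": "I1MZVK8RDPYK8P", "price": None},
    {"id": "I3C97JA2JR06PN", "price": "$179.99"},
    {"id": "I1C7RJI2H0VWZ7", "price": "$29.99"},
]

print(total_price(items))  # prints 209.98
```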

Installation

Latest release through PyPI:

$ pip install scraper_factory

Development version:

$ git clone git@github.com:machinia/scraper-factory.git
$ cd scraper-factory
$ pip install -e .
