Skip to main content
Python Software Foundation 20th Year Anniversary Fundraiser  Donate today!

Scraping library to retrieve data from useful pages, such as Amazon wishlists

Project description

Travis Build Test coverage PyPI - Latest version PyPI - Python Version

Scraping library to retrieve data from useful pages, such as Amazon wishlists

API

The API to use the library, scrape data and manage spiders is the following:

  • scrape(SPIDER_NAME, URL): scrapes the given URL using the spider referenced on SPIDER_NAME.
  • spiders(): list all spiders found by the library.

Custom Spiders

Using custom spiders is possible, as long as they:

  • They must be implemented as a class, and inherit from BaseSpider.
  • The spider file need to be either on scraper_factory/spiders, or in a custom location, as long as the environment variable $SPIDER_PATH is set to the directory where the spider is located.

Usage example

>>> import scraper_factory as SF
>>> SF.scrape('amazon-wishlist', 'https://www.amazon.com/hz/wishlist/ls/24XY9873RPAYN')
[{
    'id': 'I1MZVK8RDPYK8P',
    'title': 'AmazonBasics Heavy Weight Ruled Lined Index Cards, White, 3x5 Inch Card, 100-Count - AMZ63500',
    'byline': None,
    'price': None,
    'link': 'https://www.amazon.com/dp/B06XSRLP51/',
    'img': 'https://images-na.ssl-images-amazon.com/images/I/71i7LVTzpsL._SS135_.jpg'
}, {
    'id': 'I14TUJ6TADACU5',
    'title': "Women's Walking Shoes Sock Sneakers - Mesh Slip On Air Cushion Lady Girls Modern Jazz Dance Easy Shoes Platform Loafers",
    'byline': None,
    'price': None,
    'link': 'https://www.amazon.com/dp/B07MWCDJ9X/',
    'img': 'https://images-na.ssl-images-amazon.com/images/I/61sHA7o-bxL._SS135_.jpg'
}, {
    'id': 'I3C97JA2JR06PN',
    'title': 'Tenergy Redigrill\xa0Smoke-Less Infrared Grill, Indoor Grill, Heating\xa0Electric Tabletop Grill, Non-Stick Easy to Clean\xa0BBQ Grill, for Party/Home, ETL Certified',
    'byline': None,
    'price': '$179.99',
    'link': 'https://www.amazon.com/dp/B07BZ412HT/',
    'img': 'https://images-na.ssl-images-amazon.com/images/I/41uGvSPg-ML._SS135_.jpg'
}, {
    'id': 'I1C7RJI2H0VWZ7',
    'title': 'Shelf Liners for Wire Shelf Liner Set of 4 - Graphite (14-Inch-by-36-Inch)',
    'byline': None,
    'price': '$29.99',
    'link': 'https://www.amazon.com/dp/B01N9V4A9A/',
    'img': 'https://images-na.ssl-images-amazon.com/images/I/71Lg6J7sGHL._SS135_.jpg'
},
...]

Installation

Latest release through PyPI:

$ pip install scraper_factory

Development version:

$ git clone git@github.com:machinia/scraper-factory.git
$ cd scraper_factory
$ pip install -e .

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for scraper-factory, version 0.2.1
Filename, size File type Python version Upload date Hashes
Filename, size scraper_factory-0.2.1-py3-none-any.whl (12.2 kB) File type Wheel Python version py3 Upload date Hashes View
Filename, size scraper-factory-0.2.1.tar.gz (9.4 kB) File type Source Python version None Upload date Hashes View

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring DigiCert DigiCert EV certificate Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page