Scraping library to retrieve data from useful pages, such as Amazon wishlists
API
The library exposes the following functions to scrape data and manage spiders:
scrape(SPIDER_NAME, URL): scrapes the given URL using the spider named SPIDER_NAME.
spiders(): lists all spiders found by the library.
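A minimal sketch of how the two functions might be combined: check that a spider name is known before scraping with it. It assumes that spiders() returns an iterable of spider names, which the description above does not state explicitly. The helper name scrape_with_known_spider and the stand-in module fake_sf are hypothetical; the stand-in is there so the example runs without the library or network access.

```python
from types import SimpleNamespace


def scrape_with_known_spider(sf, spider_name, url):
    """Call sf.scrape only if spider_name is reported by sf.spiders().

    `sf` is any object exposing scrape(name, url) and spiders(), such as
    the imported scraper_factory module.
    Assumption: spiders() yields spider names as strings.
    """
    available = list(sf.spiders())
    if spider_name not in available:
        raise ValueError(
            f"unknown spider {spider_name!r}; available: {available}"
        )
    return sf.scrape(spider_name, url)


# Hypothetical stand-in for the real module, so the sketch runs offline.
fake_sf = SimpleNamespace(
    spiders=lambda: ["amazon-wishlist"],
    scrape=lambda name, url: [{"spider": name, "url": url}],
)

items = scrape_with_known_spider(
    fake_sf,
    "amazon-wishlist",
    "https://www.amazon.com/hz/wishlist/ls/24XY9873RPAYN",
)
print(items)
```

With the library installed, the module itself could be passed in place of the stand-in, for example scrape_with_known_spider(SF, 'amazon-wishlist', URL) after import scraper_factory as SF.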
Custom Spiders
Custom spiders can be used, as long as they meet the following requirements:
The spider must be implemented as a class that inherits from BaseSpider.
The spider file must be located either in scraper_factory/spiders or in a custom directory. In the latter case, the environment variable $SPIDER_PATH must be set to the directory that contains the spider.
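The $SPIDER_PATH requirement could be handled from Python as in the sketch below, which validates a directory and exports it through os.environ. The helper name configure_spider_path is hypothetical, and the sketch assumes the library reads the variable when it looks for spiders, so the call would need to happen before scrape() or spiders() is used. A temporary directory stands in for a real spider directory.

```python
import os
import tempfile
from pathlib import Path


def configure_spider_path(directory):
    """Point $SPIDER_PATH at `directory` after checking that it exists."""
    path = Path(directory).expanduser().resolve()
    if not path.is_dir():
        raise NotADirectoryError(f"{path} is not a directory")
    os.environ["SPIDER_PATH"] = str(path)
    return path


# Demonstration: a temporary directory stands in for a real spider directory.
with tempfile.TemporaryDirectory() as tmp:
    configured = configure_spider_path(tmp)
    print(os.environ["SPIDER_PATH"] == str(configured))
```

Setting the variable in the shell before starting Python is equivalent; doing it in code only keeps the check and the setting in one place.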
Usage example
>>> import scraper_factory as SF
>>> SF.scrape('amazon-wishlist', 'https://www.amazon.com/hz/wishlist/ls/24XY9873RPAYN')
[{
'id': 'I1MZVK8RDPYK8P',
'title': 'AmazonBasics Heavy Weight Ruled Lined Index Cards, White, 3x5 Inch Card, 100-Count - AMZ63500',
'byline': None,
'price': None,
'link': 'https://www.amazon.com/dp/B06XSRLP51/',
'img': 'https://images-na.ssl-images-amazon.com/images/I/71i7LVTzpsL._SS135_.jpg'
}, {
'id': 'I14TUJ6TADACU5',
'title': "Women's Walking Shoes Sock Sneakers - Mesh Slip On Air Cushion Lady Girls Modern Jazz Dance Easy Shoes Platform Loafers",
'byline': None,
'price': None,
'link': 'https://www.amazon.com/dp/B07MWCDJ9X/',
'img': 'https://images-na.ssl-images-amazon.com/images/I/61sHA7o-bxL._SS135_.jpg'
}, {
'id': 'I3C97JA2JR06PN',
'title': 'Tenergy Redigrill\xa0Smoke-Less Infrared Grill, Indoor Grill, Heating\xa0Electric Tabletop Grill, Non-Stick Easy to Clean\xa0BBQ Grill, for Party/Home, ETL Certified',
'byline': None,
'price': '$179.99',
'link': 'https://www.amazon.com/dp/B07BZ412HT/',
'img': 'https://images-na.ssl-images-amazon.com/images/I/41uGvSPg-ML._SS135_.jpg'
}, {
'id': 'I1C7RJI2H0VWZ7',
'title': 'Shelf Liners for Wire Shelf Liner Set of 4 - Graphite (14-Inch-by-36-Inch)',
'byline': None,
'price': '$29.99',
'link': 'https://www.amazon.com/dp/B01N9V4A9A/',
'img': 'https://images-na.ssl-images-amazon.com/images/I/71Lg6J7sGHL._SS135_.jpg'
},
...]
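Because scrape() returns a list of plain dictionaries, the result can be post-processed with ordinary Python. The sketch below takes three items from the output above, trimmed to the id and price fields, and sums the prices that are present. It assumes prices are strings with a leading dollar sign, as in this sample; other currencies or formats would need extra handling. The helper parse_price is hypothetical and not part of the library.

```python
from decimal import Decimal

# Items copied from the sample output above, trimmed to the fields used here.
items = [
    {"id": "I1MZVK8RDPYK8P", "price": None},
    {"id": "I3C97JA2JR06PN", "price": "$179.99"},
    {"id": "I1C7RJI2H0VWZ7", "price": "$29.99"},
]


def parse_price(raw):
    """Convert a string like '$179.99' to a Decimal; return None if missing."""
    if raw is None:
        return None
    return Decimal(raw.lstrip("$").replace(",", ""))


# Keep only items that have a price, keyed by their wishlist item id.
priced = {
    item["id"]: parse_price(item["price"])
    for item in items
    if item["price"] is not None
}
total = sum(priced.values(), Decimal("0"))
print(total)
```

Decimal is used instead of float so that the sum of currency amounts stays exact.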
Installation
Latest release through PyPI:
$ pip install scraper_factory
Development version:
$ git clone git@github.com:machinia/scraper-factory.git
$ cd scraper-factory
$ pip install -e .
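After either installation method, one way to confirm that the package is visible to the current interpreter is to query its installed version through the standard library. This is a small sketch, not part of the project; the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version of `distribution`, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("scraper_factory"))
```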