amazon_scraper

Provides content not accessible through the standard Amazon API

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: BSD License
Natural Language
- English
Programming Language
- Python

Project description

A hybrid web scraper / API client. It supplements the standard Amazon API with web scraping to retrieve extra data, specifically product reviews.

Uses the Amazon Simple Product API to provide API-accessible data. API search functions are imported directly into the amazon_scraper module.

Parameters are kept in the same style as the underlying API, which in turn uses Bottlenose-style parameters. Hence the non-Pythonic parameter names (e.g. ItemId).

The AmazonScraper constructor passes ‘kwargs’ through to Bottlenose (via the Amazon Simple Product API). Bottlenose supports AWS regions, queries-per-second limiting, query caching and other useful features; see the Bottlenose documentation for details.
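Here is a minimal sketch of how a wrapper can keep its own arguments and forward every other keyword argument to an underlying client. `PassThroughClient` and `RecordingBackend` are hypothetical names, not amazon_scraper's actual code. `RecordingBackend` only stands in for `bottlenose.Amazon` so the example runs offline.

```python
from typing import Any, Dict


class RecordingBackend:
    """Hypothetical stand-in for bottlenose.Amazon: it only records its arguments."""

    def __init__(self, access_key: str, secret_key: str, associate_tag: str, **kwargs: Any) -> None:
        self.credentials = (access_key, secret_key, associate_tag)
        self.options: Dict[str, Any] = kwargs


class PassThroughClient:
    """Hypothetical wrapper: consumes its own arguments, forwards all other kwargs unchanged."""

    def __init__(self, access_key: str, secret_key: str, associate_tag: str,
                 backend: type = RecordingBackend, **kwargs: Any) -> None:
        self.api = backend(access_key, secret_key, associate_tag, **kwargs)


client = PassThroughClient("key", "secret", "tag", Region="UK", MaxQPS=0.9, Timeout=5.0)
print(client.api.options)  # {'Region': 'UK', 'MaxQPS': 0.9, 'Timeout': 5.0}
```

Because the wrapper never inspects the extra keywords, any option the backend understands can be used without the wrapper knowing about it. The trade-off is that a misspelt option only fails, if at all, inside the backend.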

The latest release of python-amazon-simple-product-api at the time of writing (1.5.0) doesn’t support these arguments, only Region. If you need them, install the latest code from its repository with the following command:

pip install git+https://github.com/yoavaviram/python-amazon-simple-product-api.git#egg=python-amazon-simple-product-api

Caveat

Amazon continually try to keep scrapers from working. They do this by:

- A/B testing (you may randomly receive different HTML).
- Using a huge number of HTML layouts for the same product categories.
- Changing HTML layouts.
- Moving content inside iFrames.

Amazon have resorted to moving more and more content into iFrames, which this scraper can’t handle. I envisage a time when most data will be inaccessible without more complex logic.
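One common way to cope with several layouts for the same data is to try a list of extraction rules in order and keep the first one that matches. The sketch below is illustrative only. The two patterns and element ids are made up, and regular expressions stand in for a proper HTML parser purely for brevity. None of it reflects how amazon_scraper itself parses pages. Content moved into an iFrame never appears in the fetched HTML, so it simply yields None.

```python
import re
from typing import Optional, Sequence

# Hypothetical patterns for two made-up layouts of the same product title.
# Real Amazon markup differs and changes over time.
TITLE_PATTERNS = [
    re.compile(r'<span[^>]*id="productTitle"[^>]*>\s*(.*?)\s*</span>', re.S),
    re.compile(r'<h1[^>]*id="title"[^>]*>\s*(.*?)\s*</h1>', re.S),
]


def extract_first(html: str, patterns: "Sequence[re.Pattern]") -> Optional[str]:
    """Try each pattern in order and return the first captured text, or None."""
    for pattern in patterns:
        match = pattern.search(html)
        if match:
            return match.group(1)
    return None  # no known layout matched (e.g. the content lives in an iFrame)
```

Adding support for a newly observed layout then means appending one more rule, without touching the code that uses the result.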

I’ve spent a long time trying to get these scrapers working and it’s a never-ending battle. I don’t have the time to keep pace with Amazon. If you are interested in improving Amazon Scraper, please let me know (creating an issue is fine). Any help is appreciated.

Installation

pip install amazon_scraper

Examples

All Products All The Time

Create an API instance:

>>> from amazon_scraper import AmazonScraper
>>> amzn = AmazonScraper("put your access key", "secret key", "and associate tag here")

The constructor accepts ‘kwargs’, which are passed to the ‘bottlenose.Amazon’ constructor:

>>> from amazon_scraper import AmazonScraper
>>> amzn = AmazonScraper("put your access key", "secret key", "and associate tag here", Region='UK', MaxQPS=0.9, Timeout=5.0)

Search:

>>> import itertools
>>> for p in itertools.islice(amzn.search(Keywords='python', SearchIndex='Books'), 5):
...     print(p.title)
...
Learning Python, 5th Edition
Python Programming: An Introduction to Computer Science 2nd Edition
Python In A Day: Learn The Basics, Learn It Quick, Start Coding Fast (In A Day Books) (Volume 1)
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
Python Cookbook
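itertools.islice caps how many results are consumed. The README does not say whether search yields results lazily, page by page. If it does, islice also avoids requesting pages beyond those needed. The hypothetical fake_search generator below simulates such a paginated search and logs each simulated page fetch.

```python
import itertools
from typing import Iterator, List


def fake_search(pages: List[List[str]], fetch_log: List[int]) -> Iterator[str]:
    """Hypothetical stand-in for a paginated search that fetches one page at a time."""
    for page_number, page in enumerate(pages, start=1):
        fetch_log.append(page_number)  # record each simulated page request
        yield from page


pages = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
fetched: List[int] = []
first_five = list(itertools.islice(fake_search(pages, fetched), 5))
print(first_five)  # ['a', 'b', 'c', 'd', 'e']
print(fetched)     # [1, 2]: the third page is never requested
```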

Lookup by ASIN/ItemId:

>>> p = amzn.lookup(ItemId='B00FLIJJSA')
>>> p.title
Kindle, Wi-Fi, 6" E Ink Display - for international shipment
>>> p.url
http://www.amazon.com/Kindle-Wi-Fi-Ink-Display-international/dp/B0051QVF7A/ref=cm_cr_pr_product_top

Batch Lookups:

>>> for p in amzn.lookup(ItemId='B0051QVF7A,B007HCCNJU,B00BTI6HBS'):
...     print(p.title)
...
Kindle, Wi-Fi, 6" E Ink Display - for international shipment
Kindle, 6" E Ink Display, Wi-Fi - Includes Special Offers (Black)
Kindle Paperwhite 3G, 6" High Resolution Display with Next-Gen Built-in Light, Free 3G + Wi-Fi - Includes Special Offers
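As the batch example shows, several ASINs can be passed as one comma-separated ItemId string. A small helper can split a longer list into batches. The default batch size of 10 is an assumption, based on the ItemLookup limit of the Product Advertising API at the time, so check it against the current API documentation. `batch_item_ids` is a hypothetical helper and is not part of amazon_scraper.

```python
from typing import Iterable, Iterator, List


def batch_item_ids(asins: Iterable[str], batch_size: int = 10) -> Iterator[str]:
    """Yield comma-separated ItemId strings containing at most batch_size ASINs each."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batch: List[str] = []
    for asin in asins:
        batch.append(asin)
        if len(batch) == batch_size:
            yield ",".join(batch)
            batch = []
    if batch:  # leftover ASINs that did not fill a whole batch
        yield ",".join(batch)
```

Judging by the single-ASIN examples in this README, a lookup with exactly one ItemId appears to return a single product rather than a list. A final batch containing one ASIN may therefore need to be handled separately by the caller.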

By URL:

>>> p = amzn.lookup(URL='http://www.amazon.com/Kindle-Wi-Fi-Ink-Display-international/dp/B0051QVF7A/ref=cm_cr_pr_product_top')
>>> p.title
Kindle, Wi-Fi, 6" E Ink Display - for international shipment
>>> p.asin
B0051QVF7A
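Looking up by URL depends on recovering the ASIN from the URL. The following is an illustration only, not the library's code. It takes the ASIN from the path segment that follows dp or product-reviews, the two product URL shapes that appear in this README. Other Amazon URL formats are not handled.

```python
from typing import Optional
from urllib.parse import urlparse


def asin_from_url(url: str) -> Optional[str]:
    """Return the ASIN following a 'dp' or 'product-reviews' path segment, else None."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    for marker in ("dp", "product-reviews"):
        if marker in segments:
            i = segments.index(marker)
            if i + 1 < len(segments):
                candidate = segments[i + 1]
                # The ASINs shown in this README are 10 alphanumeric characters
                if len(candidate) == 10 and candidate.isalnum():
                    return candidate
    return None
```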

Product Ratings:

>>> p = amzn.lookup(ItemId='B00FLIJJSA')
>>> p.ratings
[8, 4, 6, 4, 13]
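The README does not document exactly what ratings contains. Assuming it holds the number of reviews at each star level, this sketch computes the total and the mean star rating. The ordering (5 stars first or 1 star first) is also an assumption, so it is exposed as a parameter.

```python
from typing import Sequence, Tuple


def rating_summary(counts: Sequence[int], highest_first: bool = True) -> Tuple[int, float]:
    """Return (total reviews, mean star rating) from per-star review counts.

    highest_first=True treats counts[0] as the top star level (5 stars for a
    five-element list); False treats counts[0] as 1 star. Both are assumptions.
    """
    n = len(counts)
    stars = range(n, 0, -1) if highest_first else range(1, n + 1)
    total = sum(counts)
    if total == 0:
        return 0, 0.0
    mean = sum(s * c for s, c in zip(stars, counts)) / total
    return total, mean


total, mean = rating_summary([8, 4, 6, 4, 13])
# total == 35; mean is about 2.71 (5 stars first) or about 3.29 with highest_first=False
```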

Alternative Bindings:

>>> p = amzn.lookup(ItemId='B000GRFTPS')
>>> p.alternatives
['B00IVM5X7E', '9163192993', '0899669433', 'B00IPXPQ9O', '1482998742', '0441444814', '1497344824']
>>> for asin in p.alternatives:
...     alt = amzn.lookup(ItemId=asin)
...     print(u'{0} {1}'.format(alt.title, alt.binding))
...
The King in Yellow Kindle Edition
The King in Yellow Unknown Binding
King in Yellow Hardcover
The Yellow Sign Audible Audio Edition
The King in Yellow MP3 CD
THE KING IN YELLOW Mass Market Paperback
The King in Yellow Paperback

Supplemental text not available via the API:

>>> p = amzn.lookup(ItemId='0441016685')
>>> p.supplemental_text
[u"Bob Howard is a computer-hacker desk jockey ... ", u"Lovecraft\'s Cthulhu meets Len Deighton\'s spies ... ", u"This dark, funny blend of SF and ... "]

Review API

View lists of reviews:

>>> p = amzn.lookup(ItemId='B0051QVF7A')
>>> rs = amzn.reviews(URL=p.reviews_url)
>>> rs.asin
B0051QVF7A
>>> rs.ids
['R3MF0NIRI3BT1E', 'R3N2XPJT4I1XTI', 'RWG7OQ5NMGUMW', 'R1FKKJWTJC4EAP', 'RR8NWZ0IXWX7K', 'R32AU655LW6HPU', 'R33XK7OO7TO68E', 'R3NJRC6XH88RBR', 'R21JS32BNNQ82O', 'R2C9KPSEH78IF7']
>>> rs.url
http://www.amazon.com/product-reviews/B0051QVF7A/ref=cm_cr_pr_top_sort_recent?&sortBy=bySubmissionDateDescending

Quickly get a list of all reviews on a review page using the all_reviews property:

>>> p = amzn.lookup(ItemId='B0051QVF7A')
>>> rs = amzn.reviews(URL=p.reviews_url)
>>> all_reviews_on_page = rs.all_reviews
>>> len(all_reviews_on_page)
10
>>> all_reviews_on_page[0].to_dict()["title"]
'Fantastic device - pick your Kindle!'

By ASIN/ItemId:

>>> rs = amzn.reviews(ItemId='B0051QVF7A')
>>> rs.asin
B0051QVF7A
>>> rs.ids
['R3MF0NIRI3BT1E', 'R3N2XPJT4I1XTI', 'RWG7OQ5NMGUMW', 'R1FKKJWTJC4EAP', 'RR8NWZ0IXWX7K', 'R32AU655LW6HPU', 'R33XK7OO7TO68E', 'R3NJRC6XH88RBR', 'R21JS32BNNQ82O', 'R2C9KPSEH78IF7']

For individual reviews, use the review method. Note that this method is NOT recommended for bulk collection of reviews; use all_reviews instead:

>>> r = amzn.review(Id=rs.ids[0])
>>> r.id
R3MF0NIRI3BT1E
>>> r.asin
B00492CIC8
>>> r.url
http://www.amazon.com/review/R3MF0NIRI3BT1E
>>> r.date
2011-09-29 18:27:14+00:00
>>> r.author
FreeSpirit
>>> r.text
Having been a little overwhelmed by the choices between all the new Kindles ... <snip>

By URL:

>>> r = amzn.review(URL='http://www.amazon.com/review/R3MF0NIRI3BT1E')
>>> r.id
R3MF0NIRI3BT1E

Reviewer API

This package also supports getting information about specific reviewers and the reviews they have written over time. For now, it is advisable to reach a reviewer through a product they have reviewed: look up one of their reviews first, then follow its author_reviews_url. This situation should improve in the future.

Get reviews that a single reviewer has created:

r = amzn.review(Id="R3MF0NIRI3BT1E")
reviewer = amzn.reviewer(r.author_reviews_url)
all_reviews = reviewer.all_reviews

Move on to the author’s next page of reviews, if they have one:

r = amzn.review(Id="R3MF0NIRI3BT1E")
reviewer = amzn.reviewer(r.author_reviews_url)
# Follow the link to the reviewer's next page of reviews
reviewer = amzn.reviewer(reviewer.next_page_url)
second_page_reviews = reviewer.all_reviews
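The same step can be repeated to walk every page. This sketch makes two assumptions: each reviewer page exposes next_page_url, and that value is empty or None on the last page. The README only says "if they have one". `iter_reviewer_pages` is a hypothetical helper. The seen-set and max_pages are defensive guards against a page that links back to an earlier one.

```python
from typing import Any, Callable, Iterator, Optional, Set


def iter_reviewer_pages(
    fetch: Callable[[str], Any], start_url: str, max_pages: int = 50
) -> Iterator[Any]:
    """Yield page objects, following each page's next_page_url (assumed falsy on the last page)."""
    url: Optional[str] = start_url
    seen: Set[str] = set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page = fetch(url)  # e.g. amzn.reviewer(url)
        yield page
        # A missing attribute or a None/empty value ends the walk
        url = getattr(page, "next_page_url", None)
```

With the objects from the examples above, usage would look like `for page in iter_reviewer_pages(amzn.reviewer, r.author_reviews_url): reviews.extend(page.all_reviews)`.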

Authors

Adam Griffiths


Release history

- 0.3.3: Jan 19, 2016
- 0.3.2: Nov 30, 2015
- 0.3.1: Sep 30, 2015
- 0.3.0: Sep 30, 2015
- 0.2: May 13, 2015 (this version)
- 0.1.17: Sep 11, 2014
- 0.1.16: Jun 4, 2014
- 0.1.14: May 30, 2014
- 0.1.13: May 30, 2014
- 0.1.12: May 30, 2014
- 0.1.11: May 30, 2014
- 0.1.10: May 30, 2014
- 0.1.9: May 30, 2014
- 0.1.8: May 30, 2014
- 0.1.7: May 20, 2014
- 0.1.6: May 19, 2014
- 0.1.5: May 16, 2014
- 0.1.4: May 12, 2014
- 0.1.3: May 10, 2014
- 0.1.2: May 9, 2014
- 0.1.1: May 9, 2014
- 0.1.0: May 9, 2014

Download files


Source Distribution

amazon_scraper-0.2.tar.gz (12.8 kB)

Uploaded May 13, 2015 Source

File details

Details for the file amazon_scraper-0.2.tar.gz.

File metadata

Download URL: amazon_scraper-0.2.tar.gz
Upload date: May 13, 2015
Size: 12.8 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for amazon_scraper-0.2.tar.gz
Algorithm	Hash digest
SHA256	`37c23dee05c25ce249c6087c977bd3306e67bb9f4c978e97015971d13706c9fa`
MD5	`65d724f7f9fe8dd749827ff0cb40cc58`
BLAKE2b-256	`ed639779a99fa09e34d0b7393647a3034cd415bb21e36cc915347d4211274821`

