A Parser for Amazon Pages
AmazonParser
Python Library for Parsing Amazon Pages
Description
AmazonParser is a Python library that parses product information from Amazon product pages. It extracts data such as the product title, price, and ratings. Extraction relies mostly on XPath and regular expressions, which keeps the parser modular and configurable.
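To illustrate the XPath-plus-RegEx idea in general terms, here is a minimal sketch in which each field is described by a rule (an XPath expression and a regular expression) rather than by hand-written code. It is not AmazonParser's actual implementation: the sample markup, element ids, rule table, and the `extract` helper are all hypothetical, and the standard library's `xml.etree.ElementTree` (which supports only a subset of XPath and requires well-formed markup) stands in for lxml so the snippet runs without extra dependencies.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for a saved product page.
SAMPLE = """
<html><body>
  <span id="productTitle"> JETech Screen Protector </span>
  <span class="price">AED 30.99</span>
  <span id="reviewCount">21,049 ratings</span>
</body></html>
""".strip()

# Hypothetical rule table: field name -> (XPath locating the node,
# regex whose first group captures the value inside the node's text).
RULES = {
    "title": (".//span[@id='productTitle']", r"\s*(.+?)\s*$"),
    "price": (".//span[@class='price']", r"([\d.]+)"),
    "currency": (".//span[@class='price']", r"([A-Z]{3})"),
    "review_count": (".//span[@id='reviewCount']", r"([\d,]+)"),
}


def extract(markup, rules):
    """Apply every (xpath, regex) rule to the markup and collect the matches."""
    root = ET.fromstring(markup)
    result = {}
    for field, (xpath, pattern) in rules.items():
        node = root.find(xpath)
        # A missing node yields empty text, so the field ends up as None.
        text = "".join(node.itertext()) if node is not None else ""
        match = re.search(pattern, text)
        result[field] = match.group(1) if match else None
    return result


data = extract(SAMPLE, RULES)
print(data)
```

Keeping the selectors in a table like this means a layout change on the page can be handled by editing a rule instead of the extraction logic, which is one way to read the "modular and configurable" goal stated above.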
Prerequisites
- Python 3.6 or higher
- lxml library:
pip install lxml
Installation
You can install the library using pip:
pip install AmazonParser
Usage
Here is an example of how to use the AmazonParser module:
# Import path assumed; adjust it to match the installed version
from amazonparser import AmazonAEProductPageParser
# Load a saved product page from disk
path = 'tests/archives/page-ASIN.html'
html = AmazonAEProductPageParser.get_html_from_file(path)
# Create a parser instance for the page
product_data = AmazonAEProductPageParser(html=html, base_url="https://www.amazon.ae/")
# Print the parsed data
print(product_data.get_product_details())
Example Output
The get_product_details method returns a dictionary with the following structure:
{'best_sellers_rank': [{'category': 'Mobile Phones & Communication Products',
                        'category_url': 'https://...',
                        'rank': 5},
                       {'category': 'Mobile Phone Screen Protectors',
                        'category_url': 'https://...',
                        'rank': 2}],
 'bought_past_mounth': '500+',
 'brand': 'JETech',
 'bullet_points': 'STRING',
 'customers_reviews': {'count': 21049, 'rate': 4.3},
 'date_first_available': datetime.date(2024, 8, 6),
 'image': 'https://m.media-amazon.com/images/I/71B7WFLtovL._AC_SL1500_.jpg',
 'price': {'currency': 'AED', 'value': 30.99},
 'product_bundles': {'B09BVR4LFY': 'iPhone 13/13 Pro 6.1-Inch',
                     'B09BZ2YD6F': 'iPhone 13 Pro Max 6.7-Inch',
                     'B0B2L6R586': 'iPhone 12/12 Pro 6.1-Inch',
                     'B0B2RQP8MK': 'iPhone 12 Pro Max 6.7-Inch',
                     'B0DBZNC8DL': 'iPhone 16 Pro 6.3-Inch',
                     'B0DBZPXJRH': 'iPhone 16 Pro Max 6.9-Inch',
                     'B0DBZQ2WR3': 'iPhone 16 Plus 6.7-Inch',
                     'B0DBZR3TX7': 'iPhone 16 6.1-Inch'},
 'seller_detail': {'seller_id': 'A11TDSN2MJL3GW',
                   'seller_name': 'JE Products AE',
                   'seller_profile_url': 'https://www.amazon.ae/sp/?seller=A11TDSN2MJL3GW'},
 'stock_availability': {'quantity': 50, 'status': True},
 'title': 'JETech Screen Protector for iPhone 16 Pro Max 6.9-Inch, Tempered '
          'Glass Film with Easy Installation Tool, Case-Friendly, HD Clear, '
          '3-Pack'}
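Because the result contains a `datetime.date` value, passing it straight to `json.dumps` raises a `TypeError`. The sketch below shows one way to store such a dictionary as JSON. The `details` dictionary is a hand-copied subset of the example output above, and the `to_json` helper is hypothetical, not part of AmazonParser.

```python
import datetime
import json

# Subset of the example output above, copied by hand for illustration.
details = {
    "brand": "JETech",
    "customers_reviews": {"count": 21049, "rate": 4.3},
    "date_first_available": datetime.date(2024, 8, 6),
    "price": {"currency": "AED", "value": 30.99},
}


def to_json(record):
    """Serialize a parsed record, writing dates as ISO 8601 strings."""

    def encode(value):
        # json.dumps calls this only for objects it cannot serialize itself.
        if isinstance(value, datetime.date):
            return value.isoformat()
        raise TypeError(f"Cannot serialize {type(value).__name__}")

    return json.dumps(record, default=encode, sort_keys=True)


serialized = to_json(details)
restored = json.loads(serialized)
print(restored["date_first_available"])
```

After a round trip the date comes back as a plain string, so code that reads the JSON needs `datetime.date.fromisoformat` to recover a date object.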
Contributing
Contributions are welcome! Please open an issue or submit a pull request on GitHub.
File details
Details for the file amazonparser-0.1.6.tar.gz.
File metadata
- Download URL: amazonparser-0.1.6.tar.gz
- Upload date:
- Size: 6.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4b192b83de7695ab734608dc93d2683b93657cc9bd6315763d2478b7958a0a25 |
| MD5 | 83288826bbc89213f328df3da93ca3fc |
| BLAKE2b-256 | 0c55f02267fb6622d4009c5a597c10185ae3d7416ca19ddf15a2fc963be0d456 |
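A downloaded archive can be checked against the published SHA256 digest before installing it. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `matches_digest` are hypothetical, and the local file path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib

# SHA256 digest published above for amazonparser-0.1.6.tar.gz.
EXPECTED_SHA256 = "4b192b83de7695ab734608dc93d2683b93657cc9bd6315763d2478b7958a0a25"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's digest with the expected value, ignoring letter case."""
    return sha256_of(path) == expected_hex.lower()


# Usage, assuming the archive sits in the current directory:
# matches_digest("amazonparser-0.1.6.tar.gz", EXPECTED_SHA256)
```

pip offers a comparable check through its hash-checking mode, where digests are pinned in a requirements file with `--hash=sha256:...`.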
Provenance
The following attestation bundles were made for amazonparser-0.1.6.tar.gz:
Publisher: python-publish.yml on a4fr/AmazonParser
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: amazonparser-0.1.6.tar.gz
  - Subject digest: 4b192b83de7695ab734608dc93d2683b93657cc9bd6315763d2478b7958a0a25
- Sigstore transparency entry: 171274218
- Sigstore integration time:
- Permalink: a4fr/AmazonParser@1eae069ee7e705a5fb1c6abb2e3875534bc9740e
- Branch / Tag: refs/tags/v0.1.6
- Owner: https://github.com/a4fr
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@1eae069ee7e705a5fb1c6abb2e3875534bc9740e
- Trigger Event: release
File details
Details for the file AmazonParser-0.1.6-py3-none-any.whl.
File metadata
- Download URL: AmazonParser-0.1.6-py3-none-any.whl
- Upload date:
- Size: 8.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2f501543cda775512d740847a4c17bd78c13e9c1bd40b30f5db630635ad2895d |
| MD5 | 8cbd54173f280d62be760d9e17772d85 |
| BLAKE2b-256 | dfbf9d89c50f307b5cf8d319b08bda63c3a0841fe3553d56f887f36eb1a6b6e1 |
Provenance
The following attestation bundles were made for AmazonParser-0.1.6-py3-none-any.whl:
Publisher: python-publish.yml on a4fr/AmazonParser
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: amazonparser-0.1.6-py3-none-any.whl
  - Subject digest: 2f501543cda775512d740847a4c17bd78c13e9c1bd40b30f5db630635ad2895d
- Sigstore transparency entry: 171274220
- Sigstore integration time:
- Permalink: a4fr/AmazonParser@1eae069ee7e705a5fb1c6abb2e3875534bc9740e
- Branch / Tag: refs/tags/v0.1.6
- Owner: https://github.com/a4fr
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@1eae069ee7e705a5fb1c6abb2e3875534bc9740e
- Trigger Event: release