A Parser for Amazon Pages

AmazonParser

Python Library for Parsing Amazon Pages

Description

AmazonParser is a Python library for parsing product information from Amazon product pages. It extracts data such as the product title, price, and ratings. Extraction relies mostly on XPath expressions and regular expressions, which keeps the parser modular and configurable.
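To illustrate the XPath-plus-RegEx approach described above, here is a minimal sketch. It is not the library's own code: the `SAMPLE` markup, the `RULES` table, and the `extract` helper are all hypothetical names made up for this example. It also uses the standard library's ElementTree (which supports a subset of XPath) so that it runs without extra dependencies, whereas AmazonParser itself depends on lxml.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
SAMPLE = """
<html><body>
  <span id="productTitle"> JETech Screen Protector </span>
  <span class="a-price">AED 30.99</span>
</body></html>
""".strip()

# Field name -> (XPath that locates the node, regex that pulls values from its text).
# Keeping the rules in a table is what makes this style easy to reconfigure.
RULES = {
    "title": (".//span[@id='productTitle']", r"(?P<value>\S.*\S)"),
    "price": (
        ".//span[@class='a-price']",
        r"(?P<currency>[A-Z]{3})\s*(?P<value>\d+(?:\.\d+)?)",
    ),
}


def extract(markup, rules):
    """Apply each (XPath, regex) rule to the markup and collect the named groups."""
    root = ET.fromstring(markup)
    result = {}
    for field, (xpath, pattern) in rules.items():
        node = root.find(xpath)
        if node is None or node.text is None:
            result[field] = None  # node missing: report the field as absent
            continue
        match = re.search(pattern, node.text)
        result[field] = match.groupdict() if match else None
    return result


data = extract(SAMPLE, RULES)
print(data)
```

Adding a new field in this style means adding one entry to the rule table rather than writing new parsing code. The matched values come back as strings, so any conversion to numbers or dates would be a separate step.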

Prerequisites

  • Python 3.6 or higher
  • lxml library: pip install lxml

Installation

You can install the library using pip:

pip install AmazonParser

Usage

Here is an example of how to use the AmazonParser module:

# Assumption: AmazonAEProductPageParser is exported by the amazonparser package;
# adjust the import path if your installed version places it elsewhere.
from amazonparser import AmazonAEProductPageParser

# Load a saved product page from disk
path = 'tests/archives/page-ASIN.html'
html = AmazonAEProductPageParser.get_html_from_file(path)

# Create a parser for the loaded page
product_data = AmazonAEProductPageParser(html=html, base_url="https://www.amazon.ae/")

# Print the parsed data
print(product_data.get_product_details())

Example Output

The get_product_details method returns a dictionary with the following structure:

{'best_sellers_rank': [{'category': 'Mobile Phones & Communication Products',
                        'category_url': 'https://...',
                        'rank': 5},
                       {'category': 'Mobile Phone Screen Protectors',
                        'category_url': 'https://...',
                        'rank': 2}],
 'bought_past_mounth': '500+',
 'brand': 'JETech',
 'bullet_points': 'STRING',
 'customers_reviews': {'count': 21049, 'rate': 4.3},
 'date_first_available': datetime.date(2024, 8, 6),
 'image': 'https://m.media-amazon.com/images/I/71B7WFLtovL._AC_SL1500_.jpg',
 'price': {'currency': 'AED', 'value': 30.99},
 'product_bundles': {'B09BVR4LFY': 'iPhone 13/13 Pro 6.1-Inch',
                     'B09BZ2YD6F': 'iPhone 13 Pro Max 6.7-Inch',
                     'B0B2L6R586': 'iPhone 12/12 Pro 6.1-Inch',
                     'B0B2RQP8MK': 'iPhone 12 Pro Max 6.7-Inch',
                     'B0DBZNC8DL': 'iPhone 16 Pro 6.3-Inch',
                     'B0DBZPXJRH': 'iPhone 16 Pro Max 6.9-Inch',
                     'B0DBZQ2WR3': 'iPhone 16 Plus 6.7-Inch',
                     'B0DBZR3TX7': 'iPhone 16 6.1-Inch'},
 'seller_detail': {'seller_id': 'A11TDSN2MJL3GW',
                   'seller_name': 'JE Products AE',
                   'seller_profile_url': 'https://www.amazon.ae/sp/?seller=A11TDSN2MJL3GW'},
 'stock_availability': {'quantity': 50, 'status': True},
 'title': 'JETech Screen Protector for iPhone 16 Pro Max 6.9-Inch, Tempered '
          'Glass Film with Easy Installation Tool, Case-Friendly, HD Clear, '
          '3-Pack'}
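Since the result is a plain dictionary, it can be read with ordinary key access. One detail to watch when storing it: date_first_available is a datetime.date, which json.dumps cannot encode by default. The sketch below works on a trimmed copy of the example output above rather than a live parse, and the `to_json` helper is a hypothetical name, not part of the library.

```python
import datetime
import json

# Trimmed copy of the example output shown above (not a live parse).
details = {
    "brand": "JETech",
    "customers_reviews": {"count": 21049, "rate": 4.3},
    "date_first_available": datetime.date(2024, 8, 6),
    "price": {"currency": "AED", "value": 30.99},
    "stock_availability": {"quantity": 50, "status": True},
}


def to_json(record):
    """Serialize a product record, writing dates as ISO 8601 strings."""

    def encode(value):
        if isinstance(value, datetime.date):
            return value.isoformat()
        raise TypeError(f"Cannot serialize {type(value).__name__}")

    return json.dumps(record, default=encode, sort_keys=True)


serialized = to_json(details)

# Nested fields are reached with ordinary key access.
price_label = f"{details['price']['value']:.2f} {details['price']['currency']}"
in_stock = details["stock_availability"]["status"]

print(serialized)
print(price_label, in_stock)
```

Whether every key is present for every product page is not stated in this description, so code that processes many pages may prefer `dict.get` with a default over direct indexing.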

Contributing

Contributions are welcome! Please open an issue or submit a pull request on GitHub.

Download files

  • Source distribution: amazonparser-0.1.4.tar.gz (5.6 kB)
  • Built distribution: AmazonParser-0.1.4-py3-none-any.whl (6.2 kB, Python 3)

Publisher: python-publish.yml on a4fr/AmazonParser
