A Parser for Amazon Pages

Project description

AmazonParser

Python Library for Parsing Amazon Pages

Description

AmazonParser is a Python library that parses product information from Amazon product pages. It extracts data such as the product title, price, and ratings. Extraction relies mostly on XPath and regular expressions, which keeps the parsers modular and configurable.
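
To illustrate the XPath-plus-RegEx approach described above, here is a minimal sketch. It is not the library's own code: the sample markup, the `RULES` table, and the `extract` helper are all hypothetical, and it uses the limited XPath support in the standard library's `xml.etree.ElementTree` on a well-formed snippet so that it runs without extra dependencies. Real Amazon pages are not well-formed XML, which is one reason a library like this depends on lxml.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for a product page.
SAMPLE = """
<html><body>
  <span id="productTitle"> Example Screen Protector </span>
  <span class="price">AED 30.99</span>
</body></html>
"""

# Hypothetical rule table: each field pairs an XPath selector with a regex.
# Keeping selectors in data means a layout change only requires editing a rule.
RULES = {
    "title": {"xpath": ".//span[@id='productTitle']", "regex": r"\s*(.+?)\s*$"},
    "currency": {"xpath": ".//span[@class='price']", "regex": r"([A-Z]{3})"},
    "price": {"xpath": ".//span[@class='price']", "regex": r"(\d+(?:\.\d+)?)"},
}


def extract(root, rules):
    """Locate each field's node with XPath, then pull the value out with a regex."""
    result = {}
    for field, rule in rules.items():
        node = root.find(rule["xpath"])
        text = node.text if node is not None and node.text else ""
        match = re.search(rule["regex"], text)
        result[field] = match.group(1) if match else None
    return result


root = ET.fromstring(SAMPLE.strip())
data = extract(root, RULES)
print(data)
```

Fields whose node or pattern is missing come back as `None` rather than raising, which is one possible way to cope with pages that omit a section.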

Prerequisites

  • Python 3.6 or higher
  • lxml library: pip install lxml

Installation

You can install the library using pip:

pip install AmazonParser

Usage

Here is an example of how to use the AmazonParser module:

# Assumption: the parser class is importable from the same module as in the original import line
from amazonparser import AmazonAEProductPageParser

# Load a saved product page from disk
path = 'tests/archives/page-ASIN.html'
html = AmazonAEProductPageParser.get_html_from_file(path)

# Create a parser for the page
product_data = AmazonAEProductPageParser(html=html, base_url="https://www.amazon.ae/")

# Print the parsed data
print(product_data.get_product_details())

Example Output

The get_product_details method returns a dictionary with the following structure:

{'best_sellers_rank': [{'category': 'Mobile Phones & Communication Products',
                        'category_url': 'https://...',
                        'rank': 5},
                       {'category': 'Mobile Phone Screen Protectors',
                        'category_url': 'https://...',
                        'rank': 2}],
 'bought_past_mounth': '500+',
 'brand': 'JETech',
 'bullet_points': 'STRING',
 'customers_reviews': {'count': 21049, 'rate': 4.3},
 'date_first_available': datetime.date(2024, 8, 6),
 'image': 'https://m.media-amazon.com/images/I/71B7WFLtovL._AC_SL1500_.jpg',
 'price': {'currency': 'AED', 'value': 30.99},
 'product_bundles': {'B09BVR4LFY': 'iPhone 13/13 Pro 6.1-Inch',
                     'B09BZ2YD6F': 'iPhone 13 Pro Max 6.7-Inch',
                     'B0B2L6R586': 'iPhone 12/12 Pro 6.1-Inch',
                     'B0B2RQP8MK': 'iPhone 12 Pro Max 6.7-Inch',
                     'B0DBZNC8DL': 'iPhone 16 Pro 6.3-Inch',
                     'B0DBZPXJRH': 'iPhone 16 Pro Max 6.9-Inch',
                     'B0DBZQ2WR3': 'iPhone 16 Plus 6.7-Inch',
                     'B0DBZR3TX7': 'iPhone 16 6.1-Inch'},
 'seller_detail': {'seller_id': 'A11TDSN2MJL3GW',
                   'seller_name': 'JE Products AE',
                   'seller_profile_url': 'https://www.amazon.ae/sp/?seller=A11TDSN2MJL3GW'},
 'stock_availability': {'quantity': 50, 'status': True},
 'title': 'JETech Screen Protector for iPhone 16 Pro Max 6.9-Inch, Tempered '
          'Glass Film with Easy Installation Tool, Case-Friendly, HD Clear, '
          '3-Pack'}
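
As a sketch of how the returned dictionary might be consumed, the snippet below works on an abridged copy of the example output above. The `summarize` helper is hypothetical and not part of the library. It also assumes that a field may be absent or `None` when the page lacks that section, which the source does not confirm.

```python
import datetime

# Abridged copy of the example output shown above.
details = {
    "best_sellers_rank": [
        {"category": "Mobile Phones & Communication Products",
         "category_url": "https://...", "rank": 5},
        {"category": "Mobile Phone Screen Protectors",
         "category_url": "https://...", "rank": 2},
    ],
    "customers_reviews": {"count": 21049, "rate": 4.3},
    "date_first_available": datetime.date(2024, 8, 6),
    "price": {"currency": "AED", "value": 30.99},
    "stock_availability": {"quantity": 50, "status": True},
}


def summarize(details):
    """Hypothetical helper: flatten the nested fields into a short summary."""
    price = details.get("price") or {}
    ranks = details.get("best_sellers_rank") or []
    reviews = details.get("customers_reviews") or {}
    stock = details.get("stock_availability") or {}
    first = details.get("date_first_available")
    # The lowest rank number is the best position in a category.
    best = min(ranks, key=lambda entry: entry["rank"]) if ranks else None
    return {
        "price_label": f"{price['value']:.2f} {price['currency']}" if price else None,
        "best_rank": (best["rank"], best["category"]) if best else None,
        "rating": reviews.get("rate"),
        "first_available": first.isoformat() if first else None,
        "in_stock": bool(stock.get("status")),
    }


summary = summarize(details)
print(summary)
```

Note that `date_first_available` is a `datetime.date` object, so it needs converting (for example with `isoformat()`) before the result can be serialized to JSON.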

Contributing

Contributions are welcome! Please open an issue or submit a pull request on GitHub.

Download files

Source Distribution

amazonparser-0.1.5.tar.gz (6.2 kB)

Uploaded Source

Built Distribution

AmazonParser-0.1.5-py3-none-any.whl (7.1 kB)

Uploaded Python 3

File details

Details for the file amazonparser-0.1.5.tar.gz.

File metadata

  • Download URL: amazonparser-0.1.5.tar.gz
  • Size: 6.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for amazonparser-0.1.5.tar.gz
Algorithm    Hash digest
SHA256       376373f216d4bfc28117f80f1edf54c353df799157619a20e3f6030069581e49
MD5          295021f42f0e858dc094287cf5bd24e1
BLAKE2b-256  6832ba0b68af2a2784dac3fc14865dca40ee116076b87d03d30e443fb3d1b89c
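
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library's `hashlib`. The helper names `sha256_of` and `matches` are hypothetical, and the local file path in the usage comment is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest listed above for amazonparser-0.1.5.tar.gz.
EXPECTED_SHA256 = "376373f216d4bfc28117f80f1edf54c353df799157619a20e3f6030069581e49"


def sha256_of(path, chunk_size=65536):
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals the expected value."""
    return sha256_of(path) == expected.lower()


# Usage (assumes the archive was saved in the current directory):
# print(matches("amazonparser-0.1.5.tar.gz"))
```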

Provenance

The following attestation bundles were made for amazonparser-0.1.5.tar.gz:

Publisher: python-publish.yml on a4fr/AmazonParser

File details

Details for the file AmazonParser-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: AmazonParser-0.1.5-py3-none-any.whl
  • Size: 7.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for AmazonParser-0.1.5-py3-none-any.whl
Algorithm    Hash digest
SHA256       de485378e473c4dfee4d753e2566e8bb2329ab5bffcd72e6b53cf238a0cf71ef
MD5          6f1f2d3127a117f46a7a719c1d18dc19
BLAKE2b-256  9603d4a53c9c4ac286dfeadc79202915dbdc3f1e22c008d657733851bfda5a58

Provenance

The following attestation bundles were made for AmazonParser-0.1.5-py3-none-any.whl:

Publisher: python-publish.yml on a4fr/AmazonParser
