Scrapes product details from Amazon product pages and also downloads the images

Amazon Product Details Scraper

This script helps you scrape product details from Amazon product pages. It extracts information like title, description, and image URLs, saving them to JSON files.
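To make the kind of record described above concrete, here is a minimal sketch of pulling a title and image URLs out of an HTML snippet and serializing them as JSON. It is not the package's own code: it uses the standard library's `html.parser` in place of beautifulsoup4 so it runs without dependencies, and the `productTitle` element id, the `ProductParser` class, and the output keys are assumptions made for illustration.

```python
import json
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collects the text of one element (looked up by id) and all <img> src values."""

    def __init__(self, title_id="productTitle"):  # the id is an assumption
        super().__init__()
        self.title_id = title_id
        self.title_parts = []
        self.image_urls = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id") == self.title_id:
            self._in_title = True
        if tag == "img" and attrs.get("src"):
            self.image_urls.append(attrs["src"])

    def handle_endtag(self, tag):
        # The title element is assumed to hold only text, so any end tag closes it.
        self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_details(html):
    """Return a dict with the stripped title text and the list of image URLs."""
    parser = ProductParser()
    parser.feed(html)
    return {"title": "".join(parser.title_parts).strip(), "images": parser.image_urls}


sample = '<span id="productTitle"> Example Widget </span><img src="https://example.com/a.jpg">'
details = extract_details(sample)
print(json.dumps(details, indent=2))
```

Real product pages are far messier than this snippet, so treat the parser as a picture of the data flow rather than a working scraper.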

Features

  • Fetches product details from a single Amazon product URL or a list of URLs in a file.
  • Writes extracted data to JSON files for easy storage and processing.
  • Optionally downloads product images along with details.

Installation

Requirements:

  • Python 3 (tested with 3.7+)
  • Libraries:
    • requests
    • beautifulsoup4
    • urllib3
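If you want to confirm that the libraries listed above are importable in your current environment, a small check like the following may help. It is a sketch, not part of the package; `REQUIRED_MODULES` and `missing_modules` are names invented here. Note that the beautifulsoup4 distribution is imported under the module name `bs4`.

```python
import importlib.util

# Import names for the libraries listed above; the beautifulsoup4
# distribution is imported as "bs4".
REQUIRED_MODULES = ["requests", "bs4", "urllib3"]


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```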

Instructions:

  1. Make sure you have Python 3 installed. You can check by running python3 --version in your terminal.

  2. Create a virtual environment (recommended):

    • Virtual environments help isolate project dependencies and avoid conflicts with other Python installations on your system.

    • Here's how to create a virtual environment using venv:

      python3 -m venv my_env  # Replace "my_env" with your desired environment name
      
    • Activate the virtual environment:

      source my_env/bin/activate
      
  3. Install:

    python3 setup.py install
    

    This installs the package and the libraries it requires into the activated virtual environment.
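Before installing, it can be worth confirming that the virtual environment from step 2 is actually active. The sketch below relies on a documented property of `venv`: inside an environment, `sys.prefix` differs from `sys.base_prefix`. The helper name `in_virtualenv` is made up for this example.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv-created environment."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

Run it with the same `python3` you plan to use for the install step; if it reports `False`, activate the environment first.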

Usage

Basic Usage:

amazon-scraper --url https://www.amazon.com/product-1  # Replace with your product URL

This will scrape details from the provided Amazon product URL and write them to a JSON file in the "output" directory (default).
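Once a run has finished, the JSON output can be read back with a few lines of Python. The exact file names and keys written by the tool are not documented here, so this sketch simply loads every `.json` file it finds under the output directory; `load_results` is a hypothetical helper, not part of the package.

```python
import json
from pathlib import Path


def load_results(output_dir="output"):
    """Load every .json file found under output_dir, keyed by file path."""
    root = Path(output_dir)
    if not root.is_dir():
        return {}  # nothing has been scraped yet
    results = {}
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            results[str(path)] = json.load(fh)
    return results


if __name__ == "__main__":
    for path, data in load_results().items():
        summary = sorted(data) if isinstance(data, dict) else type(data).__name__
        print(path, "->", summary)
```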

Using a URL List:

  1. Create a text file containing a list of Amazon product URLs (one per line).
  2. Run the script with the --url-list option and provide the file path:

    amazon-scraper --url-list product_urls.txt

This will process each URL in the file and save the scraped details for each product in separate directories within "output".
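The one-URL-per-line format is easy to validate before handing the file to the scraper. The following sketch shows one way such a file could be read, skipping blank lines and anything that does not look like an http(s) URL. How the tool itself treats malformed lines is not stated, so `read_url_list` reflects an assumption, not the package's behavior.

```python
from urllib.parse import urlparse


def read_url_list(path):
    """Return the non-empty lines of a URL list file that look like http(s) URLs."""
    urls = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            candidate = line.strip()
            if not candidate:
                continue  # skip blank lines
            parsed = urlparse(candidate)
            if parsed.scheme in ("http", "https") and parsed.netloc:
                urls.append(candidate)
            else:
                print(f"skipping line that is not a URL: {candidate!r}")
    return urls
```

For example, `read_url_list("product_urls.txt")` returns the cleaned list, which you can inspect before starting a long run.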

Optional: Downloading Images:

amazon-scraper --url https://www.amazon.com/product-1 --download-image

The --download-image flag enables downloading product images along with other details.
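For readers curious what an image download step can look like, here is a rough sketch. It is not the package's implementation: it uses the standard library's `urllib.request` instead of requests so it runs without extra installs, and the function names and the fallback file name pattern are invented for this example.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def image_filename(url, index):
    """Derive a local file name from the URL path, falling back to an index."""
    name = os.path.basename(urlparse(url).path)
    return name or f"image_{index}.jpg"


def download_images(urls, target_dir):
    """Save each image URL into target_dir and return the written paths."""
    os.makedirs(target_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls, start=1):
        destination = os.path.join(target_dir, image_filename(url, index))
        # Stream the response body straight to disk.
        with urllib.request.urlopen(url, timeout=30) as response, open(destination, "wb") as out:
            shutil.copyfileobj(response, out)
        saved.append(destination)
    return saved
```

Deriving the file name from the URL path keeps the query string out of the name, and the index fallback covers URLs whose path ends in a slash.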

Getting Help:

The script offers a built-in help message that provides a quick overview of available options and usage instructions. To access the help, run the script with the --help option:

amazon-scraper --help
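The three options described in this section map naturally onto an `argparse` parser, which also generates the `--help` text. The sketch below is a guess at how such a command line could be declared, not the tool's actual source. In particular, treating `--url` and `--url-list` as mutually exclusive and required is an assumption.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="amazon-scraper",
        description="Scrape product details from Amazon product pages.",
    )
    # Assumption: exactly one source of URLs must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--url", help="a single Amazon product URL")
    source.add_argument("--url-list", help="text file with one product URL per line")
    parser.add_argument("--download-image", action="store_true",
                        help="also download product images")
    return parser


args = build_parser().parse_args(["--url-list", "products.txt", "--download-image"])
print(args.url, args.url_list, args.download_image)
```

Note that `argparse` converts the dashes in `--url-list` and `--download-image` to underscores when naming the parsed attributes.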

Configuration

Logging:

  • The script uses basic logging for information and error messages.
  • You can change the logging level by editing the DEFAULT_LOG_LEVEL line in config.py (refer to the Python documentation for logging configuration).
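As an illustration of what a level setting like this controls, the sketch below wires a `DEFAULT_LOG_LEVEL` constant into Python's standard `logging` module. The constant here is a stand-in defined in the example itself; how config.py actually stores or applies the value is not shown in this README, and the logger name and format string are arbitrary.

```python
import logging

# Hypothetical stand-in for the value kept in config.py.
DEFAULT_LOG_LEVEL = logging.INFO

logging.basicConfig(
    level=DEFAULT_LOG_LEVEL,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("scraper_example")
logger.setLevel(DEFAULT_LOG_LEVEL)

logger.debug("hidden while the level is INFO")
logger.info("shown while the level is INFO")
logger.error("errors are shown at INFO and at any lower level")
```

Switching the constant to `logging.DEBUG` would make the first message appear as well, while `logging.ERROR` would leave only the last one.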

Example

Scenario:

Scrape details for two products from a file named "products.txt" and download images:

  1. Create a file named "products.txt" with the following content:

    https://www.amazon.com/product-1
    https://www.amazon.com/product-2
    
  2. Run the script with the following command:

    amazon-scraper --url-list products.txt --download-image
    

This will process both URLs in the file, scrape details, create separate output directories for each product, and download images.
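The same scenario can be driven from a Python script, which is handy when the URL list is generated by other code. This sketch writes products.txt and then calls the command from step 2 through `subprocess`; `build_command` is a helper invented for this example, and the script only runs the tool if `amazon-scraper` is found on your PATH.

```python
import shutil
import subprocess
from pathlib import Path

urls = [
    "https://www.amazon.com/product-1",
    "https://www.amazon.com/product-2",
]


def build_command(url_file, download_images=True):
    """Assemble the command line shown in step 2."""
    command = ["amazon-scraper", "--url-list", str(url_file)]
    if download_images:
        command.append("--download-image")
    return command


if __name__ == "__main__":
    url_file = Path("products.txt")
    url_file.write_text("\n".join(urls) + "\n", encoding="utf-8")
    if shutil.which("amazon-scraper"):
        subprocess.run(build_command(url_file), check=True)
    else:
        print("amazon-scraper is not on PATH; would run:", build_command(url_file))
```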

Disclaimer

This script is for educational purposes only. Please be respectful of Amazon's terms of service when using it. Consider using official APIs provided by Amazon for extensive data collection.
