A lightweight Amazon scraper library.

AmzPy - Amazon Product Scraper

AmzPy is a lightweight Python library for scraping product information from Amazon. It provides a simple interface for fetching product details such as title, price, currency, and image URL, and it uses curl_cffi's browser impersonation to reduce the chance of requests being blocked by Amazon's anti-bot measures.

Features

  • Easy-to-use API for scraping Amazon product data
  • Supports multiple Amazon domains (.com, .in, .co.uk, etc.)
  • Enhanced anti-bot protection using curl_cffi with browser impersonation
  • Automatic retries when a request is detected as a bot, with a randomized delay between requests
  • Support for proxies to distribute requests
  • Configuration options that can be changed at runtime through scraper.config()
  • Extract color variants, discounts, delivery information, and more
  • Clean and typed Python interface

Installation

Install using pip:

pip install amzpy

Basic Usage

Fetching Product Details

from amzpy import AmazonScraper

# Create scraper with default settings (amazon.com)
scraper = AmazonScraper()

# Fetch product details
url = "https://www.amazon.com/dp/B0D4J2QDVY"
product = scraper.get_product_details(url)

if product:
    print(f"Title: {product['title']}")
    print(f"Price: {product['currency']}{product['price']}")
    print(f"Brand: {product['brand']}")
    print(f"Rating: {product['rating']}")
    print(f"Image URL: {product['img_url']}")
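
The product URL above identifies the item by the ASIN that follows /dp/. As a minimal sketch that is not part of AmzPy, the hypothetical helpers below (extract_asin and canonical_url) pull the ASIN out of a longer Amazon link and rebuild a short URL before it is passed to get_product_details. They assume an ASIN is 10 uppercase alphanumeric characters and that it follows /dp/ or /gp/product/.

```python
import re
from typing import Optional

# Assumption: an ASIN is 10 uppercase letters/digits after "/dp/" or "/gp/product/".
_ASIN_RE = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?#]|$)")


def extract_asin(url: str) -> Optional[str]:
    """Return the ASIN embedded in an Amazon product URL, or None if absent."""
    match = _ASIN_RE.search(url)
    return match.group(1) if match else None


def canonical_url(url: str, domain: str = "www.amazon.com") -> Optional[str]:
    """Rebuild a short /dp/<ASIN> URL, dropping tracking segments and query strings."""
    asin = extract_asin(url)
    return f"https://{domain}/dp/{asin}" if asin else None
```

A long link copied from a search result page can then be reduced with canonical_url(link) before scraping. A None result means the link is probably not a product page.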

Searching for Products

from amzpy import AmazonScraper

# Create scraper for a specific Amazon domain
scraper = AmazonScraper(country_code="in")

# Search by query - get up to 2 pages of results
products = scraper.search_products(query="wireless earbuds", max_pages=2)

# Display the results
for i, product in enumerate(products[:5], 1):
    print(f"{i}. {product['title']} - {product['currency']}{product['price']}")
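
Search results come back as a list of dictionaries, so ordinary Python is enough to rank them. The sketch below picks the cheapest items and is not part of AmzPy. The helpers to_number and cheapest are hypothetical. Because this README does not state the type of the price field, the code converts defensively and skips entries without a usable price.

```python
from typing import Any, Dict, List, Optional


def to_number(value: Any) -> Optional[float]:
    """Best-effort conversion of a price field to float; None if it cannot be parsed."""
    if value is None:
        return None
    try:
        # Tolerate strings with thousands separators such as "1,299".
        return float(str(value).replace(",", ""))
    except ValueError:
        return None


def cheapest(products: List[Dict[str, Any]], limit: int = 3) -> List[Dict[str, Any]]:
    """Return up to `limit` products with the lowest parseable price."""
    priced = [(to_number(p.get("price")), p) for p in products]
    priced = [(price, p) for price, p in priced if price is not None]
    priced.sort(key=lambda pair: pair[0])
    return [p for _, p in priced[:limit]]
```

Calling cheapest(products, limit=5) on the list returned by search_products would give the five lowest-priced results, assuming the price values can be parsed as numbers.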

Advanced Usage

Configuration Options

AmzPy offers flexible configuration options that can be set in multiple ways:

# Method 1: Set at initialization
scraper = AmazonScraper(
    country_code="in",
    impersonate="chrome119",
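    # Amazon pages are served over HTTPS, so an "https" entry is included as well
    proxies={
        "http": "http://user:pass@proxy.example.com:8080",
        "https": "http://user:pass@proxy.example.com:8080",
    }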
)

# Method 2: Using string-based configuration
scraper.config('MAX_RETRIES = 5, REQUEST_TIMEOUT = 30, DELAY_BETWEEN_REQUESTS = (3, 8)')

# Method 3: Using keyword arguments
scraper.config(MAX_RETRIES=4, DEFAULT_IMPERSONATE="safari15")
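
The string form in Method 2 contains a comma inside the tuple (3, 8), so splitting on commas would not parse it correctly. As an illustration only (this README does not show how AmzPy parses the string), the hypothetical parse_config_string below handles that case with the standard ast module. It reads the text as keyword arguments and accepts literal values only.

```python
import ast
from typing import Any, Dict


def parse_config_string(text: str) -> Dict[str, Any]:
    """Parse 'KEY = value, KEY2 = value2' into a dict, allowing literal values only."""
    # Wrapping the text in a call lets Python's parser split on top-level commas,
    # so the comma inside "(3, 8)" is not treated as a separator.
    call = ast.parse(f"dict({text})", mode="eval").body
    if not isinstance(call, ast.Call) or call.args:
        raise ValueError("expected only KEY = value pairs")
    result: Dict[str, Any] = {}
    for keyword in call.keywords:
        if keyword.arg is None:
            raise ValueError("'**' unpacking is not allowed")
        # literal_eval rejects anything that is not a plain literal.
        result[keyword.arg] = ast.literal_eval(keyword.value)
    return result
```

The resulting dictionary could be passed on as keyword arguments, mirroring Method 3: scraper.config(**parse_config_string(text)).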

Advanced Search Features

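The search functionality can extract rich product data, including discounts, ratings, color variants, and delivery information. The following example prints the available fields: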

# Search for products with 5 pages of results
products = scraper.search_products(query="men sneakers size 9", max_pages=5)

# Or search using a pre-constructed URL (e.g., filtered searches)
url = "https://www.amazon.in/s?i=shoes&rh=n%3A1983518031&s=popularity-rank"
products = scraper.search_products(search_url=url, max_pages=3)

# Access comprehensive product data
for product in products:
    # Basic information
    print(f"Title: {product.get('title')}")
    print(f"ASIN: {product.get('asin')}")
    print(f"URL: https://www.amazon.{scraper.country_code}/dp/{product.get('asin')}")
    print(f"Brand: {product.get('brand')}")
    print(f"Price: {product.get('currency')}{product.get('price')}")
    
    # Discount information
    if 'original_price' in product:
        print(f"Original Price: {product.get('currency')}{product.get('original_price')}")
        print(f"Discount: {product.get('discount_percent')}% off")
    
    # Ratings and reviews
    print(f"Rating: {product.get('rating')} / 5.0 ({product.get('reviews_count')} reviews)")
    
    # Color variants
    if 'color_variants' in product:
        print(f"Available in {len(product['color_variants'])} colors")
        for variant in product['color_variants']:
            print(f"  - {variant['name']}: https://www.amazon.{scraper.country_code}/dp/{variant['asin']}")
    
    # Additional information
    print(f"Prime Eligible: {'Yes' if product.get('prime') else 'No'}")
    if 'delivery_info' in product:
        print(f"Delivery: {product.get('delivery_info')}")
    if 'badge' in product:
        print(f"Badge: {product.get('badge')}")
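
Since each result is a plain dictionary, the data can be exported with the standard library. The sketch below writes a chosen set of fields to CSV text. The hypothetical products_to_csv and the FIELDS list are not part of AmzPy, and the field names are taken from the example above. Missing keys become empty cells, and nested values such as color_variants are ignored.

```python
import csv
import io
from typing import Any, Dict, Iterable, Sequence

# Field names as they appear in the search example; adjust to the data you need.
FIELDS = ["asin", "title", "brand", "currency", "price", "rating", "reviews_count"]


def products_to_csv(products: Iterable[Dict[str, Any]], fields: Sequence[str] = FIELDS) -> str:
    """Serialize selected product fields to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(fields),
        extrasaction="ignore",  # drop keys that are not in `fields`
        restval="",             # leave missing keys as empty cells
        lineterminator="\n",
    )
    writer.writeheader()
    for product in products:
        writer.writerow(product)
    return buffer.getvalue()
```

The returned string can be written to a file opened with newline="" and encoding="utf-8".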

Working with Proxies

To distribute requests and avoid IP blocks, you can use proxies:

# HTTP/HTTPS proxies
proxies = {
    "http": "http://user:pass@proxy.example.com:8080",
    "https": "http://user:pass@proxy.example.com:8080"
}

# SOCKS5 proxies
proxies = {
    "http": "socks5://user:pass@proxy.example.com:1080",
    "https": "socks5://user:pass@proxy.example.com:1080" 
}

scraper = AmazonScraper(proxies=proxies)
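
The example above uses a single proxy. One possible way to spread requests over several proxies is to rotate through a list and build a proxies dictionary for each one. This is a sketch under assumptions. The proxy_cycle helper is hypothetical, and this README does not say whether AmzPy rotates proxies itself.

```python
import itertools
from typing import Dict, Iterator, Sequence


def proxy_cycle(proxy_urls: Sequence[str]) -> Iterator[Dict[str, str]]:
    """Yield proxies dictionaries in round-robin order, forever."""
    urls = list(proxy_urls)
    if not urls:
        raise ValueError("at least one proxy URL is required")
    # The same proxy URL is used for both schemes, as in the examples above.
    return ({"http": url, "https": url} for url in itertools.cycle(urls))


# Usage (commented out because it needs amzpy and working proxies):
# pool = proxy_cycle(["http://user:pass@proxy1.example.com:8080",
#                     "http://user:pass@proxy2.example.com:8080"])
# scraper = AmazonScraper(proxies=next(pool))
```

Each call to next(pool) returns the next proxy configuration, so a new scraper can be created with a different proxy after a block is detected.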

Browser Impersonation

AmzPy uses curl_cffi's browser impersonation to make its requests look like those of a real browser, which can improve success rates when scraping Amazon:

# Specify a particular browser to impersonate
scraper = AmazonScraper(impersonate="chrome119")  # Chrome 119
scraper = AmazonScraper(impersonate="safari15")   # Safari 15
scraper = AmazonScraper(impersonate="firefox115") # Firefox 115

Configuration Reference

These configuration parameters can be adjusted:

Parameter               Default      Description
MAX_RETRIES             3            Maximum number of retry attempts for failed requests
REQUEST_TIMEOUT         25           Request timeout in seconds
DELAY_BETWEEN_REQUESTS  (2, 5)       Random delay range between requests, as (min, max) in seconds
DEFAULT_IMPERSONATE     'chrome119'  Default browser to impersonate
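
To illustrate how MAX_RETRIES and DELAY_BETWEEN_REQUESTS could interact, here is a generic retry loop with a random pause between attempts. It is not AmzPy's actual implementation. The fetch_with_retries function is hypothetical and treats max_retries as the total number of attempts, which is an assumption. The sleep function is a parameter so the loop can be tested without waiting.

```python
import random
import time
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[], Optional[T]],
    max_retries: int = 3,
    delay_range: Tuple[float, float] = (2.0, 5.0),
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `fetch` until it returns a non-None result or attempts run out."""
    for attempt in range(1, max_retries + 1):
        result = fetch()
        if result is not None:
            return result
        if attempt < max_retries:
            # Wait a random time within (min, max) seconds before the next attempt.
            sleep(random.uniform(*delay_range))
    return None
```

With the defaults from the table, a request that keeps failing would be attempted three times, with two pauses of between 2 and 5 seconds.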

Requirements

  • Python 3.6+
  • curl_cffi (for enhanced anti-bot protection)
  • beautifulsoup4
  • lxml (for faster HTML parsing)
  • fake_useragent

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please read CONTRIBUTING.md for details on how to contribute to this project.
