A lightweight Amazon scraper library.
AmzPy - Amazon Product Scraper 
AmzPy is a lightweight Python library for scraping product information from Amazon. It provides a simple interface for fetching product details such as title, price, currency, and image URL, and it uses curl_cffi's browser impersonation to reduce the chance of requests being blocked by anti-bot measures.
Features
- Easy-to-use API for scraping Amazon product data
- Supports multiple Amazon domains (.com, .in, .co.uk, etc.)
- Enhanced anti-bot protection using curl_cffi with browser impersonation
- Automatic retries when a request is detected or blocked, with randomized delays between requests
- Support for proxies to distribute requests
- Configuration that can be changed at runtime through string or keyword arguments
- Extract color variants, discounts, delivery information, and more
- Clean and typed Python interface
Installation
Install using pip:
pip install amzpy
Basic Usage
Fetching Product Details
from amzpy import AmazonScraper
# Create scraper with default settings (amazon.com)
scraper = AmazonScraper()
# Fetch product details
url = "https://www.amazon.com/dp/B0D4J2QDVY"
product = scraper.get_product_details(url)
if product:
    print(f"Title: {product['title']}")
    print(f"Price: {product['currency']}{product['price']}")
    print(f"Brand: {product['brand']}")
    print(f"Rating: {product['rating']}")
    print(f"Image URL: {product['img_url']}")
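The example above indexes the result directly, which raises KeyError if a field is missing. As a hedged sketch, the helper below formats a product dictionary defensively. The name summarize_product is hypothetical and not part of AmzPy, and the idea that some keys may be absent is an assumption; the key names are the ones used in the example above.

```python
def summarize_product(product):
    """Return a printable summary of a product dict (hypothetical helper)."""
    if not product:
        return "No product data returned"

    # Assumption: any field may be missing, so use .get() throughout.
    price = product.get("price")
    if price is not None:
        price_text = f"{product.get('currency') or ''}{price}"
    else:
        price_text = None

    rows = [
        ("Title", product.get("title")),
        ("Price", price_text),
        ("Brand", product.get("brand")),
        ("Rating", product.get("rating")),
        ("Image URL", product.get("img_url")),
    ]
    return "\n".join(
        f"{label}: {'n/a' if value is None else value}" for label, value in rows
    )


# Sample data for illustration only; not real scraper output.
sample = {"title": "Example Widget", "price": 19.99, "currency": "$"}
print(summarize_product(sample))
```

In practice the dictionary returned by get_product_details would be passed in place of the sample.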
Searching for Products
from amzpy import AmazonScraper
# Create scraper for a specific Amazon domain
scraper = AmazonScraper(country_code="in")
# Search by query - get up to 2 pages of results
products = scraper.search_products(query="wireless earbuds", max_pages=2)
# Display the results
for i, product in enumerate(products[:5], 1):
    print(f"{i}. {product['title']} - {product['currency']}{product['price']}")
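Search results are a list of dictionaries, so they can be post-processed with plain Python. The sketch below picks the cheapest items from such a list. The function cheapest is hypothetical, and the handling of prices (numbers, or strings with thousands separators) is an assumption, since the price type is not specified above.

```python
def cheapest(products, limit=5):
    """Return up to `limit` products sorted by ascending price (hypothetical helper)."""

    def price_of(product):
        # Assumption: price may be a number, a string such as "1,299", or missing.
        try:
            return float(str(product.get("price")).replace(",", ""))
        except ValueError:
            return None

    priced = [(price_of(p), p) for p in products]
    priced = [(price, p) for price, p in priced if price is not None]
    priced.sort(key=lambda pair: pair[0])
    return [p for _, p in priced[:limit]]


# Sample data for illustration only; not real scraper output.
sample_results = [
    {"title": "A", "price": "1,299"},
    {"title": "B", "price": 499.0},
    {"title": "C", "price": None},
    {"title": "D"},
]
for item in cheapest(sample_results, limit=2):
    print(item["title"], item["price"])
```

Entries without a parseable price are skipped rather than treated as free.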
Advanced Usage
Configuration Options
Configuration options can be set in three ways:
# Method 1: Set at initialization
scraper = AmazonScraper(
    country_code="in",
    impersonate="chrome119",
    proxies={"http": "http://user:pass@proxy.example.com:8080"},
)
# Method 2: Using string-based configuration
scraper.config('MAX_RETRIES = 5, REQUEST_TIMEOUT = 30, DELAY_BETWEEN_REQUESTS = (3, 8)')
# Method 3: Using keyword arguments
scraper.config(MAX_RETRIES=4, DEFAULT_IMPERSONATE="safari15")
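The string form in Method 2 uses the same syntax as Python keyword arguments, so it maps naturally onto the keyword form in Method 3. The sketch below shows one way such a string could be turned into a dictionary using the standard ast module. It is only an illustration: parse_config_string is a hypothetical name, and this is not claimed to be how AmzPy parses the string internally.

```python
import ast


def parse_config_string(text):
    """Parse 'KEY = value, KEY = value' into a dict of Python literals."""
    # Wrap the text in a dummy call so the parser treats it as keyword arguments.
    call = ast.parse(f"_cfg({text})", mode="eval").body
    if not isinstance(call, ast.Call) or call.args:
        raise ValueError("expected only KEY = value pairs")

    settings = {}
    for keyword in call.keywords:
        if keyword.arg is None:  # rejects '**something'
            raise ValueError("expected only KEY = value pairs")
        # literal_eval accepts numbers, strings, tuples, etc., but never runs code.
        settings[keyword.arg] = ast.literal_eval(keyword.value)
    return settings


settings = parse_config_string(
    "MAX_RETRIES = 5, REQUEST_TIMEOUT = 30, DELAY_BETWEEN_REQUESTS = (3, 8)"
)
print(settings)
```

Parsing through the syntax tree avoids splitting on commas, which would break the tuple value of DELAY_BETWEEN_REQUESTS.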
Advanced Search Features
Search results can include rich product data such as discounts, ratings, color variants, and delivery information:
# Search for products with 5 pages of results
products = scraper.search_products(query="men sneakers size 9", max_pages=5)
# Or search using a pre-constructed URL (e.g., filtered searches)
url = "https://www.amazon.in/s?i=shoes&rh=n%3A1983518031&s=popularity-rank"
products = scraper.search_products(search_url=url, max_pages=3)
# Access comprehensive product data
for product in products:
    # Basic information
    print(f"Title: {product.get('title')}")
    print(f"ASIN: {product.get('asin')}")
    print(f"URL: https://www.amazon.{scraper.country_code}/dp/{product.get('asin')}")
    print(f"Brand: {product.get('brand')}")
    print(f"Price: {product.get('currency')}{product.get('price')}")
    # Discount information
    if 'original_price' in product:
        print(f"Original Price: {product.get('currency')}{product.get('original_price')}")
        print(f"Discount: {product.get('discount_percent')}% off")
    # Ratings and reviews
    print(f"Rating: {product.get('rating')} / 5.0 ({product.get('reviews_count')} reviews)")
    # Color variants
    if 'color_variants' in product:
        print(f"Available in {len(product['color_variants'])} colors")
        for variant in product['color_variants']:
            print(f" - {variant['name']}: https://www.amazon.{scraper.country_code}/dp/{variant['asin']}")
    # Additional information
    print(f"Prime Eligible: {'Yes' if product.get('prime') else 'No'}")
    if 'delivery_info' in product:
        print(f"Delivery: {product.get('delivery_info')}")
    if 'badge' in product:
        print(f"Badge: {product.get('badge')}")
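For larger result sets, printing is less useful than exporting. As a rough sketch, the helper below writes selected fields to CSV text with the standard csv module and builds each product URL from the ASIN using the same pattern as the loop above. The function results_to_csv and its default field list are hypothetical choices, not part of AmzPy.

```python
import csv
import io


def results_to_csv(products, country_code="com",
                   fields=("asin", "title", "brand", "price", "rating")):
    """Serialize product dicts to CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([*fields, "url"])
    for product in products:
        asin = product.get("asin")
        # Same URL pattern as the printing loop: amazon.<country_code>/dp/<asin>
        url = f"https://www.amazon.{country_code}/dp/{asin}" if asin else ""
        # Missing keys become empty cells; csv also writes None as empty.
        writer.writerow([product.get(field, "") for field in fields] + [url])
    return buffer.getvalue()


# Sample data for illustration only; not real scraper output.
sample_results = [{"asin": "B000000000", "title": "Shoe, red", "price": 999}]
print(results_to_csv(sample_results, country_code="in"))
```

The csv module quotes values that contain commas, so titles do not need manual escaping.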
Working with Proxies
To distribute requests and avoid IP blocks, you can use proxies:
# HTTP/HTTPS proxies
proxies = {
    "http": "http://user:pass@proxy.example.com:8080",
    "https": "http://user:pass@proxy.example.com:8080",
}
# SOCKS5 proxies (alternative to the HTTP/HTTPS proxies above)
proxies = {
    "http": "socks5://user:pass@proxy.example.com:1080",
    "https": "socks5://user:pass@proxy.example.com:1080",
}
scraper = AmazonScraper(proxies=proxies)
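Proxy credentials that contain characters such as @ or : must be percent-encoded before they are placed in a proxy URL. The sketch below builds a proxies dictionary in the shape shown above; build_proxies is a hypothetical helper and the host, port, and credentials are placeholders.

```python
from urllib.parse import quote


def build_proxies(host, port, user=None, password=None, scheme="http"):
    """Return a proxies dict using one proxy URL for both http and https."""
    auth = ""
    if user is not None:
        # safe="" makes quote() encode reserved characters such as '@' and ':'.
        auth = quote(user, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    proxy_url = f"{scheme}://{auth}{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


# Placeholder values for illustration only.
print(build_proxies("proxy.example.com", 1080, "user", "p@ss", scheme="socks5"))
```

The resulting dictionary can be passed as the proxies argument shown above.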
Browser Impersonation
AmzPy uses curl_cffi's browser impersonation to mimic real browser requests, which can improve success rates when scraping Amazon:
# Specify a particular browser to impersonate
scraper = AmazonScraper(impersonate="chrome119") # Chrome 119
scraper = AmazonScraper(impersonate="safari15") # Safari 15
scraper = AmazonScraper(impersonate="firefox115") # Firefox 115
Configuration Reference
These configuration parameters can be adjusted:
| Parameter | Default | Description |
|---|---|---|
| MAX_RETRIES | 3 | Maximum number of retry attempts for failed requests |
| REQUEST_TIMEOUT | 25 | Request timeout in seconds |
| DELAY_BETWEEN_REQUESTS | (2, 5) | Random delay range between requests (min, max) in seconds |
| DEFAULT_IMPERSONATE | 'chrome119' | Default browser to impersonate |
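To make the interaction between MAX_RETRIES and DELAY_BETWEEN_REQUESTS concrete, here is a generic retry loop with a random delay. It is a sketch of the general pattern only, not AmzPy's actual implementation. The name call_with_retries is hypothetical, and it assumes that MAX_RETRIES counts retries after the first attempt.

```python
import random
import time


def call_with_retries(fetch, max_retries=3, delay_range=(2, 5), sleep=time.sleep):
    """Call fetch() until it succeeds, retrying up to max_retries times."""
    last_error = None
    # Assumption: one initial attempt plus up to max_retries retries.
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except Exception as error:
            last_error = error
            if attempt < max_retries:
                # Wait a random time in (min, max) seconds before retrying.
                sleep(random.uniform(*delay_range))
    raise last_error


# Demo with a fake fetch that fails twice; delays are recorded, not slept.
attempts = {"count": 0}


def flaky_fetch():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("blocked")
    return "ok"


recorded_delays = []
print(call_with_retries(flaky_fetch, sleep=recorded_delays.append))
print(len(recorded_delays), "delays recorded")
```

Passing the sleep function as a parameter keeps the loop testable without real waiting.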
Requirements
- Python 3.6+
- curl_cffi (for enhanced anti-bot protection)
- beautifulsoup4
- lxml (for faster HTML parsing)
- fake_useragent
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please read CONTRIBUTING.md for details on how to contribute to this project.
File details
Details for the file amzpy-1.0.0.tar.gz.
File metadata
- Download URL: amzpy-1.0.0.tar.gz
- Upload date:
- Size: 18.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7dec89ee2b8879f80358b8726de8ed2ec502ad8420a20652312a2a18556dc962 |
| MD5 | 5d0c0b68d9716a0391f41edd3623c800 |
| BLAKE2b-256 | 321f0e304c70ec2556447b297e26cbbfc715fdfbb4156320ebd30f8fdd0aa8a0 |
File details
Details for the file amzpy-1.0.0-py3-none-any.whl.
File metadata
- Download URL: amzpy-1.0.0-py3-none-any.whl
- Upload date:
- Size: 18.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 88d6dd1e8fd04cc1740a10aae60df00104a6fe957d75020a47a91bce3a44da2f |
| MD5 | 8e8644ecc5ef968058d92dd98fcfa611 |
| BLAKE2b-256 | 3d6426bc7bbd8d7723df370b561433f63e689c8803bb4d76ce439fa7be965a41 |