Amazon Product Scraper
This project is a web scraping tool for extracting data from Amazon. It can gather details about a specific product, collect product lists for a search keyword, or fetch product listings from a direct URL, and it solves Amazon CAPTCHAs automatically along the way.
🔍 Features
- Search by keyword: Provide a search term and the number of result pages to scrape. The scraper returns the matching products found on those pages.
- Get product details: Supply a product URL and receive detailed information such as:
  - Title
  - Price
  - Description
  - Features
  - Rating
  - Number of reviews
- Extract product list by link: Given a category or listing page URL, it fetches all the product entries up to the page limit.
- Automatic CAPTCHA Bypass: Solves Amazon CAPTCHAs automatically to allow seamless scraping.
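The detail fields listed above could be modeled with a small record type. This is a minimal sketch, not the package's actual return type: `ProductDetails`, `parse_price`, and `parse_count` are hypothetical names, and the display formats they handle (such as "$1,299.99" and "1,234 ratings") are assumptions about how the raw values might look.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProductDetails:
    """Hypothetical container for the detail fields named in the feature list."""
    title: str
    price: Optional[float] = None
    description: str = ""
    features: list[str] = field(default_factory=list)
    rating: Optional[float] = None
    review_count: Optional[int] = None


def parse_price(text: str) -> Optional[float]:
    """Convert a display price to a float, assuming US-style formatting."""
    cleaned = "".join(ch for ch in text if ch in "0123456789.")
    try:
        return float(cleaned)
    except ValueError:
        # No usable number in the string (for example "N/A").
        return None


def parse_count(text: str) -> Optional[int]:
    """Convert a display count such as '1,234 ratings' to an int."""
    digits = "".join(ch for ch in text if ch in "0123456789")
    return int(digits) if digits else None


example = ProductDetails(
    title="Example product",
    price=parse_price("$1,299.99"),
    review_count=parse_count("1,234 ratings"),
)
```

Keeping numeric fields optional lets a record represent pages where a value is missing instead of failing on them.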
🚀 Technologies Used
- Selenium: For browser automation and interaction with dynamic content.
- BeautifulSoup: For parsing and extracting data from HTML content.
- Pillow (PIL): Used to process and solve CAPTCHA images.
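To illustrate the parsing step, the sketch below extracts the text of one element from an HTML snippet. It uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra installs; with BeautifulSoup the equivalent lookup would be `soup.find(id="productTitle")`. The snippet, the `productTitle` id, and the `ElementText` class are assumptions for illustration and are not taken from this package's source.

```python
from html.parser import HTMLParser


class ElementText(HTMLParser):
    """Collects the text nested inside elements whose id equals target_id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self._depth = 0  # greater than 0 while inside the target element
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            self._depth += 1  # nested tag inside the target
        elif dict(attrs).get("id") == self.target_id:
            self._depth = 1   # entering the target element

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)

    @property
    def text(self) -> str:
        # Collapse runs of whitespace into single spaces.
        return " ".join("".join(self._chunks).split())


# Made-up markup standing in for a fetched product page.
html = '<div><span id="productTitle">  Example Laptop <b>Pro</b> </span><span id="other">x</span></div>'
parser = ElementText("productTitle")
parser.feed(html)
title = parser.text
```

The depth counter assumes every nested tag is closed, so void tags such as `<br>` inside the target would need extra handling.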
📖 How to Use
1. Initialize the Scraper
from amazon_scraper import AmazonScraper
scraper = AmazonScraper() # Initializes and runs the Chrome driver
2. Solve CAPTCHA
scraper.bypass_captcha()
When you see the success message, the CAPTCHA is solved and you can proceed to use the other methods.
3. Search Products by Keyword
results = scraper.get_product_by_search("laptop", page_limit=2)
This returns a dictionary of the products found on the first 2 pages of results for the search term "laptop".
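Since the search result is a dictionary, one way to keep it is to write it to a JSON file. This is a minimal sketch that assumes the dictionary contains only JSON-serializable values; the helpers `save_results` and `load_results` and the sample data are hypothetical, and the real shape of the dictionary is not documented here.

```python
import json
from pathlib import Path


def save_results(results: dict, path: str) -> Path:
    """Write a results dictionary to a UTF-8 JSON file and return its path."""
    out = Path(path)
    # ensure_ascii=False keeps non-ASCII product titles readable in the file.
    out.write_text(json.dumps(results, indent=2, ensure_ascii=False), encoding="utf-8")
    return out


def load_results(path: str) -> dict:
    """Read a results dictionary back from a JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Stand-in for the value returned by scraper.get_product_by_search(...);
# the keys and fields here are made up for illustration.
sample_results = {"0": {"title": "Example laptop", "price": "$999.00"}}
```

JSON object keys are always strings, so integer keys would come back as strings after a round trip.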
4. Get Product List by Link
product_list = scraper.get_product_list_by_link("https://www.amazon.com/s?k=smartphones", page_limit=2)
This scrapes product listings from the given URL, up to 2 pages.
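A page limit like this can be pictured as a list of per-page URLs derived from the listing URL. The sketch below assumes pagination through a `page` query parameter, which is an assumption about the site and not something this package documents; `page_urls` is a hypothetical helper and may differ from how the scraper actually moves between pages.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_urls(listing_url: str, page_limit: int) -> list[str]:
    """Build one URL per page, from page 1 up to page_limit inclusive."""
    parts = urlsplit(listing_url)
    # Keep the existing query parameters but drop any earlier page value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    urls = []
    for page in range(1, page_limit + 1):
        new_query = urlencode(query + [("page", str(page))])
        urls.append(urlunsplit(parts._replace(query=new_query)))
    return urls


urls = page_urls("https://www.amazon.com/s?k=smartphones", 2)
```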
5. Get Detailed Product Info
product_details = scraper.get_detail_product_by_link("https://www.amazon.com/dp/B0...example")
Returns detailed product information such as title, price, rating, features, and more.
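When details are needed for several products, the call can be wrapped in a loop that pauses between requests and records failures instead of stopping at the first one. This is a sketch under stated assumptions: `fetch_many` is a hypothetical helper, and `fetch` stands for any callable that takes a URL, such as `scraper.get_detail_product_by_link`.

```python
import time
from typing import Any, Callable, Iterable


def fetch_many(
    fetch: Callable[[str], Any],
    urls: Iterable[str],
    delay_seconds: float = 1.0,
) -> tuple[dict[str, Any], dict[str, str]]:
    """Call fetch on each URL and return (results, errors), both keyed by URL."""
    results: dict[str, Any] = {}
    errors: dict[str, str] = {}
    for index, url in enumerate(urls):
        if index:
            # Pause between requests; no pause before the first one.
            time.sleep(delay_seconds)
        try:
            results[url] = fetch(url)
        except Exception as exc:
            # Record the failure and continue with the remaining URLs.
            errors[url] = repr(exc)
    return results, errors
```

With a real scraper instance this might be called as `fetch_many(scraper.get_detail_product_by_link, product_urls)`, where `product_urls` is your own list of product links.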
🙏 Support and Contributions
If you have a feature request or find a bug, feel free to open an issue or pull request on GitHub. I’m actively maintaining this project and happy to improve it based on your feedback.
If you find this project helpful, please consider giving it a ⭐ on GitHub — it means a lot!
Happy Scraping! 🤖
File details
Details for the file master_scramazon-0.1.0.tar.gz.
File metadata
- Download URL: master_scramazon-0.1.0.tar.gz
- Upload date:
- Size: 11.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 25c3d5eef77dc0c7629abf9f131fda980d07c44f63bb514ab6dda2b8d995cc3a |
| MD5 | 7631ab6eafaeaf4c017b28e2e894082e |
| BLAKE2b-256 | 5e7fc27f5bbd453e7ac47913553ea41a4c1531d241783a4733b1cca6202e1ecb |
File details
Details for the file master_scramazon-0.1.0-py3-none-any.whl.
File metadata
- Download URL: master_scramazon-0.1.0-py3-none-any.whl
- Upload date:
- Size: 11.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ca56693073960e23a65f386f7e3ffe6e12ea58c33f0ddb494ade29fefd059129 |
| MD5 | 525c817ee9476cb0eca4a75943505a37 |
| BLAKE2b-256 | 192fd790f0be1ea2cfe6ad69f4753653ad3e6a0be6f71c28f57cbc7210fc4901 |