A library to search products on Amazon without using the PA API

# 🛍️ Amazon Product Search Library 📦

## Overview

Tired of manually browsing Amazon for the best deals? 🌐 Meet Amazon Product Search, a Python library that scrapes product details from Amazon's search results in a few lines of code. Built on BeautifulSoup4 (bs4) and Requests, with multithreading for speed, it gathers product titles, prices, reviews, images, and direct links. 🎉

## Key Features

- **Product Search:** Search for products by name, type, brand, and price range. 📱💻
- **Detailed Data:** Scrape titles, prices, reviews, images, and URLs. 🎯
- **Fast and Efficient:** Uses multithreading to speed up data extraction.
- **Easy-to-use:** Simple API for quick integration. ✨

## Setup 🛠️

Get started with Amazon Product Search by installing it via PyPI or GitHub.

### 1. Install via PyPI (Recommended) 🧑‍💻

The easiest way to install the library is using pip from PyPI:

    pip install amazon-product-search

This installs the latest stable release.

### 2. Install via GitHub (For Developers) 🦸‍♂️

If you want the very latest development version (which may have new features or bug fixes, but could also be less stable), clone the repository and install it in editable mode:

```bash
git clone --depth 1 https://github.com/ManojPanda3/amazon-product-search
cd amazon-product-search
pip install -e .
```

This allows you to modify the code and have the changes immediately reflected without reinstalling.

## Usage 📚

### Import the Library

First, import the `Amazon` class from the `amazon_product_search` module:

```python
from amazon_product_search import Amazon
```

### Searching for Products

The core functionality is provided by the `Amazon` class.

#### Instantiate the `Amazon` Class

```python
amazon = Amazon(is_debuging=False)  # Set is_debuging to True for verbose output
```

#### Use the `search()` Method

```python
results = amazon.search(productName="iPhone", productType="electronics", brand="Apple", priceRange="80000-100000")
```

**Parameters:**

- `productName` (str, required): The search term (e.g., "iPhone", "laptop").
- `productType` (str, optional): Filters by product type (e.g., "electronics", "books").
- `brand` (str, optional): Filters by brand (e.g., "Apple", "Samsung").
- `priceRange` (str, optional): Filters by price range using the format "min_price-max_price" (e.g., "100-200").

**Returns:**

- `list[dict]`: A list of dictionaries, where each dictionary represents a product and contains the following keys:
  - `"title"` (str | None): The product title.
  - `"link"` (str | None): The URL to the product page.
  - `"review"` (str | None): The product's star rating as text (e.g., "4.5 out of 5 stars").
  - `"price"` (str | None): The product price.
  - `"image"` (str | None): The URL of the product image.
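
Because every field can be `None` and `review` is plain text, it often helps to normalize results before comparing them. The sketch below is not part of the library: `parse_rating` is a hypothetical helper, it assumes the rating text follows the "4.5 out of 5 stars" pattern shown above, and the sample products are invented for illustration.

```python
import re
from typing import Optional

# Assumed format, taken from the example above: "4.5 out of 5 stars".
_RATING_PATTERN = re.compile(r"\s*(\d+(?:\.\d+)?)\s+out of\s+\d+")


def parse_rating(review: Optional[str]) -> Optional[float]:
    """Return the leading number of a rating string, or None if absent."""
    if not review:
        return None
    match = _RATING_PATTERN.match(review)
    return float(match.group(1)) if match else None


# Invented sample data in the documented result shape.
products = [
    {"title": "Phone A", "review": "4.5 out of 5 stars"},
    {"title": "Phone B", "review": None},
    {"title": "Phone C", "review": "3.9 out of 5 stars"},
]

# Sort best-rated first; products without a parsable rating count as 0.0.
ranked = sorted(
    products,
    key=lambda p: parse_rating(p["review"]) or 0.0,
    reverse=True,
)
print([p["title"] for p in ranked])
```

If Amazon returns the rating in another wording, the pattern will not match and `parse_rating` returns `None` rather than raising.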

#### Example

```python
from amazon_product_search import Amazon

amazon = Amazon()
products = amazon.search("iPhone", productType="electronics", brand="Apple", priceRange="80000-100000")

for product in products:
    print(f"Title: {product['title']}")
    print(f"Price: {product['price']}")
    print(f"Review: {product['review']}")
    print(f"Image: {product['image']}")
    print(f"Link: {product['link']}")
    print("-" * 40)
```
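
To keep results around instead of printing them, one option is to write them out as CSV. This is a minimal sketch using only the standard library; `products_to_csv` is a hypothetical helper, not something the library provides, and it assumes each product uses the five documented keys. `None` values are written as empty cells.

```python
import csv
import io

# The keys documented in the "Returns" section.
FIELDS = ["title", "price", "review", "image", "link"]


def products_to_csv(products):
    """Serialize a list of product dicts to CSV text."""
    buffer = io.StringIO()
    # extrasaction="ignore" skips any keys beyond the documented ones.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for product in products:
        # Replace missing or None fields with an empty string.
        writer.writerow({field: product.get(field) or "" for field in FIELDS})
    return buffer.getvalue()


# Invented sample row standing in for real search results.
sample = [
    {
        "title": "Example Phone",
        "price": "999",
        "review": None,
        "image": None,
        "link": "https://example.com/p",
    }
]
print(products_to_csv(sample))
```

With real data, pass the list returned by `amazon.search(...)` and write the returned text to a file opened with `newline=""`.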

## How It Works 🔍

This library works by:

1. **Constructing a Search URL:** It builds a URL for Amazon's search results page based on the provided search parameters.
2. **Making an HTTP Request:** It sends an HTTP GET request to the Amazon search URL using the `requests` library. It includes headers to mimic a web browser.
3. **Parsing the HTML:** It uses `BeautifulSoup4` to parse the HTML response and extract the relevant product information from the search result elements.
4. **Multithreading:** It uses `concurrent.futures.ThreadPoolExecutor` to process multiple search result elements concurrently, which can speed up data extraction.
5. **Returning Data:** It returns the extracted data as a list of dictionaries.
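
The steps above can be sketched with the standard library alone. This is an illustration of the pattern, not the library's actual code: `BASE_URL` is a placeholder, the `k` query parameter is an assumption, and `extract_product` stands in for the BeautifulSoup4 parsing step by cleaning already-extracted strings instead of reading HTML.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode

# Placeholder endpoint; the URL the library really builds is not shown here.
BASE_URL = "https://www.example.com/s"


def build_search_url(product_name):
    """Step 1: encode the search term into a query string."""
    # Assumption: the search term travels in a "k" parameter.
    return f"{BASE_URL}?{urlencode({'k': product_name})}"


def clean(value):
    """Strip whitespace; map missing or empty text to None."""
    if value is None:
        return None
    return value.strip() or None


def extract_product(raw):
    """Steps 3-4 stand-in: turn one raw result element into a product dict."""
    keys = ("title", "link", "review", "price", "image")
    return {key: clean(raw.get(key)) for key in keys}


def extract_all(raw_elements, max_workers=8):
    """Process elements concurrently; map() keeps the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(extract_product, raw_elements))


# Invented raw elements standing in for parsed search result nodes.
raw = [
    {"title": "  Phone A ", "price": "999"},
    {"title": "Phone B", "price": "   "},
]
print(build_search_url("iPhone 15"))
print(extract_all(raw))
```

Fields that are missing from an element come back as `None`, which matches the result shape documented in the Usage section.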

## Important Notes ⚠️

- **Rate Limiting:** Amazon may rate-limit or block your IP address if you make too many requests in a short period. Use this library responsibly. Consider adding delays or using proxies if you need to scrape a large amount of data. The library sets a `timeout` on its request so that a stalled connection does not hang indefinitely.
- **Terms of Service:** Scraping may be against Amazon's Terms of Service. Use this tool for **personal and educational purposes only**, and be aware of the potential legal and ethical implications.
- **Website Changes:** Amazon frequently updates its website structure. If the scraping stops working, the HTML parsing logic may need to be adjusted.
- **Error Handling:** The library includes basic error handling (e.g., for network errors), but you may need to add more robust error handling for production use.
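
One way to add the delays mentioned under Rate Limiting is a small wrapper that pauses between consecutive searches. The sketch below is an assumption-laden example, not a library feature: `search_many` is a hypothetical helper, the two-second default is arbitrary, and the search function is passed in so the wrapper works with any callable.

```python
import time


def search_many(search, queries, delay_seconds=2.0, sleep=time.sleep):
    """Run one search per query, pausing between consecutive requests.

    `search` is any callable that takes a query string, for example the
    bound `search` method of an `Amazon` instance.
    """
    results = {}
    for index, query in enumerate(queries):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        results[query] = search(query)
    return results


# Demonstration with a fake search function and no real waiting.
pauses = []
demo = search_many(
    lambda query: [{"title": query}],
    ["iPhone", "laptop"],
    delay_seconds=2.0,
    sleep=pauses.append,
)
print(demo)
print(pauses)
```

With the real library, the call would look like `search_many(Amazon().search, ["iPhone", "laptop"])`. A fixed delay does not guarantee that requests stay under whatever limits Amazon applies.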

## Troubleshooting 🛠️

1. **`ValueError: Error product Name is required`:** You must provide a `productName` when calling the `search()` method.
2. **`Exception: Error while geting data from Amazon`:** This indicates a problem fetching data from Amazon. It could be a network issue, a problem with your request, or Amazon blocking your request. Enable debugging (`is_debuging=True`) for more details.
3. **Empty Results:** An empty list means either that no products matched your search criteria or that Amazon's HTML structure has changed and the parsing logic needs to be updated.
4. **Missing Data (None Values):** If some fields (like `review` or `price`) are `None`, it means the library couldn't find that specific data for that product on the page. This is normal, as Amazon's page structure can vary.
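
The first two errors call for different handling: a missing product name will fail every time, while a fetch failure may succeed on a later attempt. The following sketch shows one possible retry wrapper with exponential backoff. `search_with_retry` is a hypothetical helper, and the attempt count and delays are arbitrary choices rather than recommendations from the library.

```python
import time


def search_with_retry(search, product_name, attempts=3, base_delay=1.0,
                      sleep=time.sleep, **filters):
    """Call `search`, retrying fetch failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return search(product_name, **filters)
        except ValueError:
            # Raised when productName is missing; retrying cannot help.
            raise
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error.
            # Wait 1x, 2x, 4x ... base_delay before the next attempt.
            sleep(base_delay * 2 ** attempt)


# Demonstration with a fake search that fails twice, then succeeds.
outcomes = [Exception("fetch failed"), Exception("fetch failed"), []]
waits = []


def flaky_search(product_name, **filters):
    outcome = outcomes.pop(0)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


print(search_with_retry(flaky_search, "iPhone", sleep=waits.append))
print(waits)
```

With the real library, pass `Amazon().search` as the first argument and any of `productType`, `brand`, or `priceRange` as keyword arguments.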

## Contributing 🤝

Contributions are welcome! If you find a bug, have a feature request, or want to improve the code, please open an issue or submit a pull request on GitHub.

## License 📜

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
