Easily scrape and export detailed product data from Product Hunt categories into JSON files using a Python library. Automate data collection for analysis and reporting.
ProductHunt API
ProductHunt API is a Python-based library developed by Unrealos (unrealos.com) to automate the process of fetching, processing, and saving Product Hunt data into structured JSON format. It’s designed for developers and businesses to streamline data collection from Product Hunt categories.
Key Features
- Automated Data Retrieval: Fetch products from any Product Hunt category using GraphQL.
- Data Parsing: Process product details and save them as structured JSON files.
- Developer-Friendly: Easy-to-use pipeline designed for automation.
- JSON Output: Save extracted product data in a standardized JSON file for analysis and reporting.
Installation
Install from PyPI
You can install the library directly from PyPI:
pip install producthunt-api
Verify Installation
To confirm the installation was successful, run:
python -m producthunt_api --help
Setting Up
Before running the pipeline, you need to copy two values from a Product Hunt network request: the Cookie header and the sha256Hash of the category query.
1. Parsing Category Data
To fetch data from a specific category, follow these steps:
- Open the Product Hunt category page you want to scrape (e.g., AI Software).
- Open your browser's developer tools (usually accessible via F12 or right-click > Inspect).
- Go to the Network tab.
- Locate the request to https://www.producthunt.com/frontend/graphql. This request contains:
  - OperationName: "CategoryPageQuery"
- Extract the following from the request:
  - Cookie: Copy the Cookie string from the request headers.
  - SHA256 Hash: Copy the sha256Hash value from the request payload.
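For orientation, the two copied values map onto a request roughly as follows. This is a minimal sketch, assuming the endpoint follows the Apollo-style persisted-query convention, where the sha256Hash sits under extensions.persistedQuery. The helper name build_category_request and the "slug" variable key are hypothetical, and the library's internal request may differ. Nothing is sent over the network.

```python
import json

GRAPHQL_URL = "https://www.producthunt.com/frontend/graphql"


def build_category_request(cookie: str, sha256_hash: str, slug: str) -> dict:
    """Assemble headers and a JSON body for a persisted GraphQL query (illustrative only)."""
    headers = {
        "Content-Type": "application/json",
        "Cookie": cookie,  # copied from the request headers in the Network tab
    }
    body = {
        "operationName": "CategoryPageQuery",
        # Assumption: the real query's variable names are not documented here.
        "variables": {"slug": slug},
        "extensions": {
            "persistedQuery": {"version": 1, "sha256Hash": sha256_hash}
        },
    }
    return {"url": GRAPHQL_URL, "headers": headers, "body": json.dumps(body)}


request = build_category_request("<YOUR_COOKIE>", "<YOUR_SHA256_HASH>", "ai-software")
print(request["body"])
```

Comparing the printed body with the payload shown in the Network tab is a quick way to confirm that the right hash was copied.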
Usage
Pipeline Workflow
The pipeline automates the following steps:
- Data Fetching: Fetch products from a specified Product Hunt category.
- Product Processing: Parse and save product details as individual JSON files.
- Data Consolidation: Combine all individual JSON files into a single structured JSON file.
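The consolidation step can be pictured with a short standard-library sketch. It is not the library's own code: the function name consolidate_products is hypothetical, and it assumes each product file holds one JSON value and that the combined file is a plain JSON array.

```python
import json
from pathlib import Path


def consolidate_products(products_dir: Path, output_file: Path) -> int:
    """Merge every *.json file in products_dir into a single JSON array."""
    records = []
    # Sort the paths so the output order is stable between runs.
    for path in sorted(products_dir.glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            records.append(json.load(fh))
    output_file.parent.mkdir(parents=True, exist_ok=True)
    with output_file.open("w", encoding="utf-8") as fh:
        json.dump(records, fh, ensure_ascii=False, indent=2)
    return len(records)
```

Calling consolidate_products(Path("downloads/products"), Path("downloads/products.json")) would return the number of merged records under these assumptions.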
Running the Pipeline
Create a Python script and use the following example to run the pipeline:
import os

from producthunt_api import ProductHuntPipeline

# Define necessary variables
directory = os.path.join(os.getcwd(), "downloads")
cookie_value = "<YOUR_COOKIE>"
sha256_hash_value = "<YOUR_SHA256_HASH>"
slug_value = "ai-software"  # Replace with your desired category slug

# Initialize and run the pipeline
pipeline = ProductHuntPipeline(
    directory=directory,
    cookie=cookie_value,
    sha256_hash=sha256_hash_value,
    slug=slug_value,
)
pipeline.run(limit=50, max_threads=10)
- Replace <YOUR_COOKIE> and <YOUR_SHA256_HASH> with the values you extracted earlier.
- Save the script and run it:
python your_script_name.py
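Because the cookie is a session credential, it may be preferable to keep it out of the script itself. The sketch below reads both values from environment variables instead. The variable names PH_COOKIE and PH_SHA256_HASH and the helper load_credentials are hypothetical and not part of the library.

```python
import os


def load_credentials(env=None) -> tuple[str, str]:
    """Return (cookie, sha256_hash) from environment variables, failing loudly if unset."""
    env = os.environ if env is None else env
    missing = [name for name in ("PH_COOKIE", "PH_SHA256_HASH") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["PH_COOKIE"], env["PH_SHA256_HASH"]
```

The returned pair can then be passed as the cookie and sha256_hash arguments in the script above.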
Output
- Individual JSON Files: Each product is saved as a separate JSON file in the downloads/products directory.
- Consolidated JSON File: All product data is combined into a single products.json file in the downloads directory.
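To pick up the consolidated file for analysis, something like the following may be enough. It is a sketch under assumptions: the exact schema of products.json is not documented here, so the hypothetical helper load_products accepts either a JSON array or an object keyed by product and makes no guesses about field names.

```python
import json
from pathlib import Path


def load_products(path: Path) -> list:
    """Load the consolidated file and return its records as a list."""
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    # Assumption: the file holds either a JSON array or an object keyed by product.
    return data if isinstance(data, list) else list(data.values())
```

For example, len(load_products(Path("downloads/products.json"))) gives the number of products collected in a run.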
Testing the Library
To ensure everything is working as expected, you can run the test suite.
- Install testing dependencies:
pip install pytest pytest-cov
- Run the tests:
pytest -v
Contributing
We welcome contributions to enhance this library. If you’d like to contribute:
- Fork the repository on GitHub.
- Create a feature branch.
- Submit a pull request with your changes.
About Unrealos
Unrealos (unrealos.com) is a software development company specializing in AI, SaaS, and PaaS solutions for businesses. With expertise in integrating artificial intelligence into scalable business processes, Unrealos delivers cutting-edge software tailored to your needs.
Additional Resources
- Source Code: GitHub Repository
- Issue Tracker: Submit Issues Here
File details
Details for the file producthunt_api-0.2.0.tar.gz.
File metadata
- Download URL: producthunt_api-0.2.0.tar.gz
- Upload date:
- Size: 5.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6498f990c156196c36b82f91b2182e7ec1a11ed011284f506059a0e647409a1c |
| MD5 | 347352218dba9d0c5f045c0bc9ae5464 |
| BLAKE2b-256 | c488fa441f1426b8fc4e7fdf33a8a63b8e0f6829fe66f041ffca4d7a175f30d2 |
File details
Details for the file producthunt_api-0.2.0-py3-none-any.whl.
File metadata
- Download URL: producthunt_api-0.2.0-py3-none-any.whl
- Upload date:
- Size: 4.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3528deb1f70bda7ee364fdb5ce9736806cd219a6d0c0acbae4295e6b7f8dae70 |
| MD5 | 819448b3ab65df040f2c7aefd6c45ca9 |
| BLAKE2b-256 | 4d789c2f01351217dc41f22e8473ea5c28dd887e5356d66d2f8ff40e6363376e |