
Easily scrape and export detailed product data from Product Hunt categories into JSON files using a Python library. Automate data collection for analysis and reporting.


ProductHunt API

ProductHunt API is a Python-based library developed by Unrealos (unrealos.com) to automate the process of fetching, processing, and saving Product Hunt data into structured JSON format. It’s designed for developers and businesses to streamline data collection from Product Hunt categories.


Key Features

  • Automated Data Retrieval: Fetch products from any Product Hunt category using GraphQL.
  • Data Parsing: Process product details and save them as structured JSON files.
  • Developer-Friendly: Easy-to-use pipeline designed for automation.
  • JSON Output: Save extracted product data in a standardized JSON file for analysis and reporting.

Installation

Install from PyPI

You can install the library directly from PyPI:

pip install producthunt-api

Verify Installation

To confirm the installation was successful, run:

python -m producthunt_api --help
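
The same check can be done from Python. The sketch below uses only the standard library's importlib.metadata; the helper name installed_version is illustrative and not part of producthunt-api.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None
```

Calling installed_version("producthunt-api") should return a version string once the pip install above has succeeded, and None otherwise.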

Setting Up

Before running the pipeline, you need two values from a Product Hunt browser request: the Cookie header and the sha256Hash of the category query.

1. Parsing Category Data

To fetch data from a specific category, follow these steps:

  1. Open the Product Hunt category page you want to scrape (e.g., AI Software).
  2. Open your browser's developer tools (usually accessible via F12 or right-click > Inspect).
  3. Go to the Network tab.
  4. Locate the request to https://www.producthunt.com/frontend/graphql whose payload contains:
    • operationName: "CategoryPageQuery"
  5. Extract the following from the request:
    • Cookie: Copy the Cookie string from the request headers.
    • SHA256 Hash: Copy the sha256Hash value from the request payload.
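
If you copy the whole request payload instead of hunting for the hash by eye, a small helper can pull it out. This is a minimal sketch that assumes the payload follows the common persisted-query layout (extensions.persistedQuery.sha256Hash); the layout Product Hunt actually sends may differ, and extract_sha256_hash is a hypothetical helper, not part of the library.

```python
import json


def extract_sha256_hash(payload_text: str) -> str:
    """Pull sha256Hash out of a GraphQL request payload copied as JSON text."""
    payload = json.loads(payload_text)
    # Some clients batch several operations into a list; handle both shapes.
    operations = payload if isinstance(payload, list) else [payload]
    for op in operations:
        if op.get("operationName") == "CategoryPageQuery":
            # Assumed layout: extensions -> persistedQuery -> sha256Hash
            return op["extensions"]["persistedQuery"]["sha256Hash"]
    raise ValueError("no CategoryPageQuery operation found in payload")
```

A KeyError from this function would suggest the payload does not use the assumed layout, in which case copy the sha256Hash value manually as described above.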

Usage

Pipeline Workflow

The pipeline automates the following steps:

  1. Data Fetching: Fetch products from a specified Product Hunt category.
  2. Product Processing: Parse and save product details as individual JSON files.
  3. Data Consolidation: Combine all individual JSON files into a single structured JSON file.
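
To make the consolidation step concrete, here is a rough standard-library sketch of how a directory of per-product JSON files could be merged into one file. It is not the library's actual implementation; the function name consolidate and the choice of a top-level JSON array are assumptions made for illustration.

```python
import json
from pathlib import Path


def consolidate(products_dir: Path, output_file: Path) -> int:
    """Merge every *.json file in products_dir into a single JSON array."""
    records = []
    # Sort for a deterministic order in the combined file.
    for path in sorted(products_dir.glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            records.append(json.load(fh))
    output_file.parent.mkdir(parents=True, exist_ok=True)
    with output_file.open("w", encoding="utf-8") as fh:
        json.dump(records, fh, ensure_ascii=False, indent=2)
    return len(records)
```

The return value is the number of files merged, which is handy for a quick sanity check against the limit you requested.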

Running the Pipeline

Create a Python script and use the following example to run the pipeline:

import os
from producthunt_api import ProductHuntPipeline

# Define necessary variables
directory = os.path.join(os.getcwd(), "downloads")
cookie_value = "<YOUR_COOKIE>"
sha256_hash_value = "<YOUR_SHA256_HASH>"
slug_value = "ai-software"  # Replace with your desired category slug

# Initialize and run the pipeline
pipeline = ProductHuntPipeline(
    directory=directory,
    cookie=cookie_value,
    sha256_hash=sha256_hash_value,
    slug=slug_value
)
pipeline.run(limit=50, max_threads=10)

To run the example:

  1. Replace <YOUR_COOKIE> and <YOUR_SHA256_HASH> with the values you extracted earlier.
  2. Save the script and run it:
    python your_script_name.py
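
Because the cookie is a session credential, you may prefer not to hard-code it in the script. The sketch below reads the values from environment variables instead. The variable names PH_COOKIE, PH_SHA256_HASH, and PH_SLUG and the helper load_pipeline_kwargs are hypothetical; only the keyword names (directory, cookie, sha256_hash, slug) come from the example above.

```python
import os


def load_pipeline_kwargs(env=os.environ) -> dict:
    """Build keyword arguments for ProductHuntPipeline from environment variables."""
    required = ("PH_COOKIE", "PH_SHA256_HASH")  # hypothetical variable names
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "directory": os.path.join(os.getcwd(), "downloads"),
        "cookie": env["PH_COOKIE"],
        "sha256_hash": env["PH_SHA256_HASH"],
        # Fall back to the category slug used in the example above.
        "slug": env.get("PH_SLUG", "ai-software"),
    }
```

With the variables exported in your shell, the pipeline call becomes ProductHuntPipeline(**load_pipeline_kwargs()).run(limit=50, max_threads=10).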
    

Output

  • Individual JSON Files: Each product is saved as a separate JSON file in the downloads/products directory.
  • Consolidated JSON File: All product data is combined into a single products.json file in the downloads directory.
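
After a run, a quick sanity check is to compare the number of individual files with the number of records in the consolidated file. The sketch below assumes the directory layout described above; the top-level structure of products.json is not documented here, so the code accepts either a JSON array or a JSON object. summarize_output is an illustrative name, not a library function.

```python
import json
from pathlib import Path


def summarize_output(downloads_dir: Path) -> dict:
    """Count per-product files and records in the consolidated file."""
    products_dir = downloads_dir / "products"
    consolidated = downloads_dir / "products.json"
    per_file = len(list(products_dir.glob("*.json"))) if products_dir.is_dir() else 0
    combined = None  # stays None when products.json does not exist
    if consolidated.is_file():
        with consolidated.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # Structure is an assumption: count items of a list or keys of a dict.
        combined = len(data) if isinstance(data, (list, dict)) else 1
    return {"individual_files": per_file, "consolidated_records": combined}
```

For example, summarize_output(Path("downloads")) returns a dictionary with both counts, which should normally match if every product was consolidated.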

Testing the Library

To ensure everything is working as expected, you can run the test suite.

  1. Install testing dependencies:

    pip install pytest pytest-cov
    
  2. Run the tests:

    pytest -v
    

Contributing

We welcome contributions to enhance this library. If you’d like to contribute:

  1. Fork the repository on GitHub.
  2. Create a feature branch.
  3. Submit a pull request with your changes.

About Unrealos

Unrealos (unrealos.com) is a software development company specializing in AI, SaaS, and PaaS solutions for businesses. With expertise in integrating artificial intelligence into scalable business processes, Unrealos delivers cutting-edge software tailored to your needs.

