
A high-performance asynchronous tool for fetching and storing Hacker News items in a SQLite database.


Hacker News Data Fetcher

A tool to fetch and store Hacker News data in a SQLite database.

Installation

To install and run the Hacker News Data Fetcher, follow these steps:

  1. Install:

    pip install hn-data-fetcher
    
  2. Run the Script:

    • The script can be run in three different modes: update, backfill, and overwrite.

    • Use the following command to run the script:

      hn-data-fetcher --mode <mode> [--start-id <start_id>] [--db-name <db_name>] [--concurrent-requests <concurrent_requests>] [--update-interval <update_interval>] [--db-queue-size <db_queue_size>] [--db-commit-interval <db_commit_interval>] [--tcp-limit <tcp_limit>]
      
    • Parameters:

      • --mode: Operation mode. Choices are update, backfill, or overwrite.
      • --start-id: Starting ID for overwrite mode (required if mode is overwrite).
      • --db-name: Path to the SQLite database file to store HN items (default: hn2.db).
      • --concurrent-requests: Maximum number of concurrent API requests to HN (default: 1000).
      • --update-interval: How often to update the progress bar, in number of items processed (default: 1000).
      • --db-queue-size: Maximum size of the database operation queue (default: 1000).
      • --db-commit-interval: How often to commit database transactions, in number of items (default: 1000).
      • --tcp-limit: Maximum number of TCP connections. 0 means unlimited (default: 0).
    • Examples:

      • To update the database with new items:
        hn-data-fetcher --mode update
        
      • To backfill the database with historical items:
        hn-data-fetcher --mode backfill
        
      • To overwrite existing items starting from a specific ID:
        hn-data-fetcher --mode overwrite --start-id 1000
        
  3. Monitor Progress:

    • The script displays a progress bar with an estimated time to completion (ETA).
    • It also handles errors gracefully and ensures that the database is updated correctly.
  4. Graceful Shutdown:

    • You can stop the script at any time by pressing Ctrl+C. The script will handle the shutdown gracefully, ensuring that all ongoing transactions are completed.
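
The tuning flags listed above (--concurrent-requests, --db-queue-size, --db-commit-interval) suggest a producer/consumer layout: many concurrent fetches feeding a bounded queue, with a single writer that commits in batches. The following is a minimal sketch of that pattern under stated assumptions, not the tool's actual implementation. The names fake_fetch, fetch_worker, db_writer, and run, as well as the items table and its columns, are hypothetical, and the network call is replaced by a stand-in so the example runs offline.

```python
import asyncio
import sqlite3


async def fake_fetch(item_id: int) -> dict:
    """Hypothetical stand-in for an HTTP request to the HN API."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"id": item_id, "title": f"item {item_id}"}


async def fetch_worker(item_id, semaphore, queue):
    # The semaphore caps in-flight fetches (compare --concurrent-requests).
    async with semaphore:
        item = await fake_fetch(item_id)
    # put() waits while the queue is full (compare --db-queue-size).
    await queue.put(item)


async def db_writer(conn, queue, commit_interval):
    pending = 0
    while True:
        item = await queue.get()
        if item is None:  # sentinel: no more items will arrive
            break
        conn.execute(
            "INSERT OR REPLACE INTO items (id, title) VALUES (?, ?)",
            (item["id"], item["title"]),
        )
        pending += 1
        if pending >= commit_interval:  # compare --db-commit-interval
            conn.commit()
            pending = 0
    conn.commit()  # flush whatever is left before exiting


async def run(conn, ids, concurrent_requests=4, queue_size=8, commit_interval=3):
    # Assumed schema, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, title TEXT)"
    )
    semaphore = asyncio.Semaphore(concurrent_requests)
    queue = asyncio.Queue(maxsize=queue_size)
    writer = asyncio.create_task(db_writer(conn, queue, commit_interval))
    try:
        await asyncio.gather(*(fetch_worker(i, semaphore, queue) for i in ids))
    finally:
        # Reached on normal completion and when the run is cancelled:
        # tell the writer to stop, then wait for its final commit.
        await queue.put(None)
        await writer


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    asyncio.run(run(conn, range(1, 11)))
    print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])
```

Keeping all writes in one task means only one coroutine touches the SQLite connection, and the final commit in db_writer is one way a shutdown could avoid losing items that were already queued.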

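After a run, the database file (hn2.db by default, or whatever was passed to --db-name) can be inspected with Python's built-in sqlite3 module. The table layout is not documented here, so this sketch makes no assumption about table names and simply lists every table with its row count. The helper name summarize_db is hypothetical.

```python
import sqlite3


def summarize_db(db_path: str) -> dict[str, int]:
    """Return {table_name: row_count} for every table in the database."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names are read from sqlite_master, not from user input,
        # so quoting them as identifiers is sufficient here.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in tables
        }
    finally:
        conn.close()


if __name__ == "__main__":
    for table, rows in summarize_db("hn2.db").items():
        print(f"{table}: {rows} rows")
```

Note that sqlite3.connect creates an empty database file if the path does not exist, so an empty result may simply mean the path is wrong.
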
Local Development

  1. Install Development Dependencies:

    • Install the package in editable mode, then install the development dependencies:
      pip install -e .
      pip install -r requirements-dev.txt
      
  2. Run Tests:

    • Execute the test suite:
      pytest tests/ -v
      
