
A high-performance asynchronous tool for fetching and storing Hacker News items in a SQLite database.

Hacker News Data Fetcher

A tool to fetch and store Hacker News data in a SQLite database.

Installation

To install the Hacker News Data Fetcher, follow these steps:

  1. Install:

    pip install hn-data-fetcher
    
  2. Run the Script:

    • The script can be run in four different modes: update, backfill, overwrite, and overwrite-from-date.

    • Use the following command to run the script:

      hn-data-fetcher --mode <mode> [--start-id <start_id>] [--start-date <start_date>] [--db-name <db_name>] [--concurrent-requests <concurrent_requests>] [--update-interval <update_interval>] [--db-queue-size <db_queue_size>] [--db-commit-interval <db_commit_interval>] [--tcp-limit <tcp_limit>]
      
    • Parameters:

      • --mode: Operation mode. Choices are update, backfill, overwrite, or overwrite-from-date.
      • --start-id: Starting ID for overwrite mode (required if mode is overwrite).
      • --start-date: Starting date for overwrite-from-date mode in YYYY-MM-DD format (required if mode is overwrite-from-date).
      • --db-name: Path to the SQLite database file to store HN items (default: hn2.db).
      • --concurrent-requests: Maximum number of concurrent API requests to HN (default: 1000).
      • --update-interval: How often to update the progress bar, in number of items processed (default: 1000).
      • --db-queue-size: Maximum size of the database operation queue (default: 1000).
      • --db-commit-interval: How often to commit database transactions, in number of items (default: 1000).
      • --tcp-limit: Maximum number of TCP connections. 0 means unlimited (default: 0).
    • Examples:

      • To update the database with new items:
        hn-data-fetcher --mode update
        
      • To backfill the database with historical items:
        hn-data-fetcher --mode backfill
        
      • To overwrite existing items starting from a specific ID:
        hn-data-fetcher --mode overwrite --start-id 1000
        
      • To overwrite existing items starting from a specific date:
        hn-data-fetcher --mode overwrite-from-date --start-date 2024-01-01
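
When the tool is driven from another Python program, the flag rules above can be checked before the process is launched. The following is a minimal sketch, not part of the package: `build_command` is a hypothetical helper, the executable name is taken from the examples above, and only the flags documented in the parameter list are assumed to exist.

```python
from __future__ import annotations

from datetime import datetime

# Executable name as it appears in the examples above.
EXECUTABLE = "hn-data-fetcher"
MODES = ("update", "backfill", "overwrite", "overwrite-from-date")


def build_command(mode: str, start_id: int | None = None,
                  start_date: str | None = None, **options: object) -> list[str]:
    """Build an argv list for the CLI (hypothetical helper, not shipped with the tool)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "overwrite" and start_id is None:
        raise ValueError("--start-id is required when mode is overwrite")
    if mode == "overwrite-from-date" and start_date is None:
        raise ValueError("--start-date is required when mode is overwrite-from-date")

    argv = [EXECUTABLE, "--mode", mode]
    if start_id is not None:
        argv += ["--start-id", str(start_id)]
    if start_date is not None:
        # Raises ValueError unless the date is in YYYY-MM-DD format.
        datetime.strptime(start_date, "%Y-%m-%d")
        argv += ["--start-date", start_date]
    # Remaining keyword options map onto the documented flags,
    # e.g. db_name="hn.db" becomes "--db-name hn.db".
    for name, value in options.items():
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv
```

The resulting list can be passed to `subprocess.run(..., check=True)`. Building a list rather than a shell string avoids quoting problems with database paths that contain spaces.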
        
  3. Monitor Progress:

    • The script displays a progress bar with an estimated time to completion (ETA).
    • It also handles errors gracefully and ensures that the database is updated correctly.
  4. Graceful Shutdown:

    • You can stop the script at any time by pressing Ctrl+C. The script will handle the shutdown gracefully, ensuring that all ongoing transactions are completed.
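
To illustrate how interval-based commits and a clean Ctrl+C shutdown can fit together, here is a minimal sketch using only the standard library. It is not the tool's actual implementation: the `store_items` function, the `items` table, and its column names are assumptions made for the example, and the real schema may differ.

```python
import sqlite3
from typing import Iterable, Tuple


def store_items(conn: sqlite3.Connection,
                items: Iterable[Tuple[int, str]],
                commit_interval: int = 1000) -> int:
    """Insert (id, json) rows, committing every `commit_interval` rows.

    Table and column names are hypothetical. Returns the number of rows
    written before the input ended or the run was interrupted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, item_json TEXT)"
    )
    written = 0
    try:
        for item_id, payload in items:
            conn.execute(
                "INSERT OR REPLACE INTO items (id, item_json) VALUES (?, ?)",
                (item_id, payload),
            )
            written += 1
            if written % commit_interval == 0:
                conn.commit()
    except KeyboardInterrupt:
        # Ctrl+C: stop taking new items and fall through to the final commit.
        pass
    finally:
        # Flush any rows inserted since the last interval commit.
        conn.commit()
    return written
```

A larger commit interval means fewer, larger transactions, which is usually faster for bulk inserts; the final commit in the `finally` block is what keeps an interrupted run from losing the rows written since the last interval.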

Local Development

  1. Install Development Dependencies:

    • Install the package in editable mode, then install the development dependencies:
      pip install -e .
      pip install -r requirements-dev.txt
      
  2. Run Tests:

    • Execute the test suite:
      pytest tests/ -v
      
