
A high-performance asynchronous tool for fetching and storing Hacker News items in a SQLite database.

Project description

Hacker News Data Fetcher

A tool to fetch and store Hacker News data in a SQLite database.

Installation

To install and run the Hacker News Data Fetcher, follow these steps:

  1. Install:

    pip install hn-data-fetcher
    
  2. Run the Script:

    • The script can be run in four different modes: update, backfill, overwrite, and overwrite-from-date.

    • Use the following command to run the script:

      hn-data-fetcher --mode <mode> [--start-id <start_id>] [--start-date <start_date>] [--db-name <db_name>] [--concurrent-requests <concurrent_requests>] [--update-interval <update_interval>] [--db-queue-size <db_queue_size>] [--db-commit-interval <db_commit_interval>] [--tcp-limit <tcp_limit>]
      
    • Parameters:

      • --mode: Operation mode. Choices are update, backfill, overwrite, or overwrite-from-date.
      • --start-id: Starting ID for overwrite mode (required if mode is overwrite).
      • --start-date: Starting date for overwrite-from-date mode in YYYY-MM-DD format (required if mode is overwrite-from-date).
      • --db-name: Path to the SQLite database file to store HN items (default: hn2.db).
      • --concurrent-requests: Maximum number of concurrent API requests to HN (default: 1000).
      • --update-interval: How often to update the progress bar, in number of items processed (default: 1000).
      • --db-queue-size: Maximum size of the database operation queue (default: 1000).
      • --db-commit-interval: How often to commit database transactions, in number of items (default: 1000).
      • --tcp-limit: Maximum number of TCP connections. 0 means unlimited (default: 0).

    • Examples:

      • To update the database with new items:
        hn-data-fetcher --mode update
        
      • To backfill the database with historical items:
        hn-data-fetcher --mode backfill
        
      • To overwrite existing items starting from a specific ID:
        hn-data-fetcher --mode overwrite --start-id 1000
        
      • To overwrite existing items starting from a specific date:
        hn-data-fetcher --mode overwrite-from-date --start-date 2024-01-01
        
  3. Monitor Progress:

    • The script provides a progress bar with an estimated time of arrival (ETA) for completion.
    • It also handles errors gracefully and ensures that the database is updated correctly.
  4. Graceful Shutdown:

    • You can stop the script at any time by pressing Ctrl+C. The script will handle the shutdown gracefully, ensuring that all ongoing transactions are completed.
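The options listed under "Run the Script" map naturally onto Python's standard argparse module. The following is a minimal sketch, not taken from the hn-data-fetcher source, that reproduces the documented flags and defaults and enforces the mode-dependent requirements (--start-id for overwrite, --start-date for overwrite-from-date). The helper names build_parser and parse_args are hypothetical.

```python
import argparse
from datetime import date

MODES = ["update", "backfill", "overwrite", "overwrite-from-date"]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper: mirrors the documented flags and defaults,
    # not the package's actual source code.
    parser = argparse.ArgumentParser(prog="hn-data-fetcher")
    parser.add_argument("--mode", required=True, choices=MODES)
    parser.add_argument("--start-id", type=int)
    # date.fromisoformat parses the documented YYYY-MM-DD format.
    parser.add_argument("--start-date", type=date.fromisoformat)
    parser.add_argument("--db-name", default="hn2.db")
    parser.add_argument("--concurrent-requests", type=int, default=1000)
    parser.add_argument("--update-interval", type=int, default=1000)
    parser.add_argument("--db-queue-size", type=int, default=1000)
    parser.add_argument("--db-commit-interval", type=int, default=1000)
    parser.add_argument("--tcp-limit", type=int, default=0)  # 0 = unlimited
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    # Hypothetical helper: parse, then apply the per-mode requirements.
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.mode == "overwrite" and args.start_id is None:
        parser.error("--start-id is required when --mode is overwrite")
    if args.mode == "overwrite-from-date" and args.start_date is None:
        parser.error("--start-date is required when --mode is overwrite-from-date")
    return args
```

For example, parse_args(["--mode", "overwrite", "--start-id", "1000"]) yields start_id == 1000 with every other option at its documented default, while parse_args(["--mode", "overwrite"]) exits with a usage error.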
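The --concurrent-requests, --db-queue-size, and --db-commit-interval options suggest a producer/consumer design: many concurrent fetchers feed a bounded queue that a single writer drains into SQLite, committing in batches. The sketch below shows one way such a pipeline can be built with asyncio and sqlite3. It is an illustration under assumptions, not the package's implementation: fetch is any async callable you supply (for example, one that requests https://hacker-news.firebaseio.com/v0/item/<id>.json from the public Hacker News API and returns None for missing items), and the items table layout and the run_pipeline name are hypothetical.

```python
import asyncio
import json
import sqlite3
from typing import Awaitable, Callable, Optional

# Caller-supplied async function: item ID -> item dict, or None if absent.
Fetch = Callable[[int], Awaitable[Optional[dict]]]


async def run_pipeline(
    item_ids: list[int],
    fetch: Fetch,
    db_path: str = "hn2.db",
    concurrent_requests: int = 1000,
    db_queue_size: int = 1000,
    db_commit_interval: int = 1000,
) -> int:
    """Hypothetical pipeline; returns the number of items written."""
    conn = sqlite3.connect(db_path)
    # Hypothetical schema used only for this illustration.
    conn.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, data TEXT)")
    queue: asyncio.Queue = asyncio.Queue(maxsize=db_queue_size)
    semaphore = asyncio.Semaphore(concurrent_requests)

    async def fetch_one(item_id: int) -> None:
        async with semaphore:  # at most `concurrent_requests` fetches in flight
            item = await fetch(item_id)
        if item is not None:
            await queue.put(item)  # waits while the queue is full (backpressure)

    async def writer() -> int:
        written = 0
        while True:
            item = await queue.get()
            if item is None:  # sentinel: every fetcher has finished
                break
            conn.execute(
                "INSERT OR REPLACE INTO items (id, data) VALUES (?, ?)",
                (item["id"], json.dumps(item)),
            )
            written += 1
            if written % db_commit_interval == 0:
                conn.commit()  # commit in batches of `db_commit_interval`
        conn.commit()  # flush the final partial batch
        return written

    writer_task = asyncio.create_task(writer())
    await asyncio.gather(*(fetch_one(i) for i in item_ids))
    await queue.put(None)
    written = await writer_task
    conn.close()
    return written
```

Creating one coroutine per ID keeps the example short but is wasteful for millions of items; a fuller version would feed IDs in batches and would also need retries, Ctrl+C handling, and recovery if the writer fails, none of which this sketch attempts.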

Local Development

  1. Install Development Dependencies:

    • Install the package in editable mode, then install the development dependencies:
      pip install -e .
      pip install -r requirements-dev.txt
      
  2. Run Tests:

    • Execute the test suite:
      pytest tests/ -v
      



Download files

Download the file for your platform.

Source Distribution

hn_data_fetcher-1.2.1.tar.gz (12.6 kB)

Uploaded Source

Built Distribution


hn_data_fetcher-1.2.1-py3-none-any.whl (12.5 kB)

Uploaded Python 3

File details

Details for the file hn_data_fetcher-1.2.1.tar.gz.

File metadata

  • Download URL: hn_data_fetcher-1.2.1.tar.gz
  • Upload date:
  • Size: 12.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for hn_data_fetcher-1.2.1.tar.gz
Algorithm    Hash digest
SHA256       aed9e47ae0d30fd6e2af8149aed72fdc8d641b5dc1d1cdf10745aea46a2fc9a2
MD5          db5c00cfb005b2803adfb71bb1547882
BLAKE2b-256  01c8c797d7bcb7232b002594977f97a5548e1d2ef3e628ccdd5db0ea7fe02942
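
The SHA256 digest in this table can be used to check that a downloaded file is intact before installing it. A minimal sketch using Python's standard hashlib module, assuming the source distribution has been saved locally; the helper names sha256_of and verify are hypothetical:

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for hn_data_fetcher-1.2.1.tar.gz.
SDIST_SHA256 = "aed9e47ae0d30fd6e2af8149aed72fdc8d641b5dc1d1cdf10745aea46a2fc9a2"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    # Hypothetical helper: hash in chunks so large files need not fit in memory.
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = SDIST_SHA256) -> bool:
    # Hypothetical helper: compare with the published digest, ignoring case.
    return sha256_of(path) == expected.strip().lower()
```

verify(Path("hn_data_fetcher-1.2.1.tar.gz")) returns True only if the file's digest matches the published one; for the wheel, pass the SHA256 digest from its own hash table as expected.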


Provenance

The following attestation bundles were made for hn_data_fetcher-1.2.1.tar.gz:

Publisher: workflow.yml on adhikasp/hn-data-fetcher

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file hn_data_fetcher-1.2.1-py3-none-any.whl.


File hashes

Hashes for hn_data_fetcher-1.2.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       3ff8ff59c71ff5cc4aef95418ad86cb0b80f2696a41f5406ca143517d4ef28d8
MD5          49035748d63395d44c27375616aa796d
BLAKE2b-256  f0b91f8193f81bc9a65f68afcf872ca0736e925db57670d108f995a18522f1aa


Provenance

The following attestation bundles were made for hn_data_fetcher-1.2.1-py3-none-any.whl:

Publisher: workflow.yml on adhikasp/hn-data-fetcher

