A high-performance asynchronous tool for fetching and storing Hacker News items in a SQLite database.

Hacker News Data Fetcher

A tool to fetch and store Hacker News data in a SQLite database.

Installation

To install and run the Hacker News Data Fetcher, follow these steps:

  1. Install:

    pip install hn-data-fetcher
    
  2. Run the Script:

    • The script can be run in four different modes: update, backfill, overwrite, and overwrite-from-date.

    • Use the following command to run the script:

      hn-data-fetcher --mode <mode> [--start-id <start_id>] [--start-date <start_date>] [--db-name <db_name>] [--concurrent-requests <concurrent_requests>] [--update-interval <update_interval>] [--db-queue-size <db_queue_size>] [--db-commit-interval <db_commit_interval>] [--tcp-limit <tcp_limit>]
      
    • Parameters:

      • --mode: Operation mode. Choices are update, backfill, overwrite, or overwrite-from-date.
      • --start-id: Starting ID for overwrite mode (required if mode is overwrite).
      • --start-date: Starting date for overwrite-from-date mode in YYYY-MM-DD format (required if mode is overwrite-from-date).
      • --db-name: Path to the SQLite database file to store HN items (default: hn2.db).
      • --concurrent-requests: Maximum number of concurrent API requests to HN (default: 1000).
      • --update-interval: How often to update the progress bar, in number of items processed (default: 1000).
      • --db-queue-size: Maximum size of the database operation queue (default: 1000).
      • --db-commit-interval: How often to commit database transactions, in number of items (default: 1000).
      • --tcp-limit: Maximum number of TCP connections. 0 means unlimited (default: 0).
    • Examples:

      • To update the database with new items:
        hn-data-fetcher --mode update
        
      • To backfill the database with historical items:
        hn-data-fetcher --mode backfill
        
      • To overwrite existing items starting from a specific ID:
        hn-data-fetcher --mode overwrite --start-id 1000
        
      • To overwrite existing items starting from a specific date:
        hn-data-fetcher --mode overwrite-from-date --start-date 2024-01-01
        
  3. Monitor Progress:

    • The script displays a progress bar with an estimated time to completion (ETA).
    • It also handles errors gracefully and ensures that the database is updated correctly.
  4. Graceful Shutdown:

    • You can stop the script at any time by pressing Ctrl+C. The script will handle the shutdown gracefully, ensuring that all ongoing transactions are completed.
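
The concurrency and commit parameters described above can be pictured with a small sketch. This is not the package's actual implementation: it assumes a fetch stage limited by a semaphore, a bounded queue between fetching and writing, and a writer that commits in batches. The `fetch_item` function, the `items` table, and its columns are hypothetical stand-ins, and the constants are scaled down from the documented defaults so the example runs instantly without network access.

```python
import asyncio
import sqlite3

CONCURRENT_REQUESTS = 5   # stands in for --concurrent-requests
DB_QUEUE_SIZE = 10        # stands in for --db-queue-size
DB_COMMIT_INTERVAL = 4    # stands in for --db-commit-interval


async def fetch_item(item_id: int) -> dict:
    """Hypothetical stand-in for an HTTP request to the HN API."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"id": item_id, "type": "story"}


async def producer(item_id: int, semaphore: asyncio.Semaphore, queue: asyncio.Queue) -> None:
    # The semaphore caps how many fetches are in flight at once.
    async with semaphore:
        item = await fetch_item(item_id)
    # put() waits while the queue is full, which applies back-pressure.
    await queue.put(item)


async def writer(conn: sqlite3.Connection, queue: asyncio.Queue) -> None:
    """Drain the queue, committing every DB_COMMIT_INTERVAL items."""
    pending = 0
    while True:
        item = await queue.get()
        if item is None:  # sentinel: no more items will arrive
            break
        conn.execute(
            "INSERT OR REPLACE INTO items (id, type) VALUES (?, ?)",
            (item["id"], item["type"]),
        )
        pending += 1
        if pending >= DB_COMMIT_INTERVAL:
            conn.commit()
            pending = 0
    conn.commit()  # flush whatever is left in the last partial batch


async def run(conn: sqlite3.Connection, ids) -> int:
    """Fetch every ID, store the results, and return the stored row count."""
    # Hypothetical schema, used only for this illustration.
    conn.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, type TEXT)")
    semaphore = asyncio.Semaphore(CONCURRENT_REQUESTS)
    queue: asyncio.Queue = asyncio.Queue(maxsize=DB_QUEUE_SIZE)
    writer_task = asyncio.create_task(writer(conn, queue))
    await asyncio.gather(*(producer(i, semaphore, queue) for i in ids))
    await queue.put(None)  # tell the writer to finish
    await writer_task
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

Calling `asyncio.run(run(sqlite3.connect(":memory:"), range(1, 21)))` exercises the sketch against an in-memory database. In a design like this, a larger commit interval means fewer, bigger transactions, while a smaller queue keeps memory bounded when fetching outpaces writing.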

Local Development

  1. Install Development Dependencies:

    • Install the package in editable mode and development dependencies:
      pip install -e .
      pip install -r requirements-dev.txt
      
  2. Run Tests:

    • Execute the test suite:
      pytest tests/ -v
      

Download files

Source Distribution

hn_data_fetcher-1.2.0.tar.gz (12.4 kB)

Uploaded Source

Built Distribution

hn_data_fetcher-1.2.0-py3-none-any.whl (12.3 kB)

Uploaded Python 3

File details

Details for the file hn_data_fetcher-1.2.0.tar.gz.

File metadata

  • Download URL: hn_data_fetcher-1.2.0.tar.gz
  • Upload date:
  • Size: 12.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for hn_data_fetcher-1.2.0.tar.gz
Algorithm    Hash digest
SHA256       6b9d064f92ea34f6e83b0c04fe3ce8faf23bfbcc1b43e7768d3aac6f5c100ddc
MD5          566c6f2edd815013cf4351ef8c81b725
BLAKE2b-256  d00972876d8fe6d71769ad83b677c2e0cf17e8689fc8b217ae66edb609e92ebb
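
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `verify` are illustrative, and the file is assumed to have been saved locally under its published name.

```python
import hashlib

# SHA256 digest published for hn_data_fetcher-1.2.0.tar.gz.
EXPECTED_SHA256 = "6b9d064f92ea34f6e83b0c04fe3ce8faf23bfbcc1b43e7768d3aac6f5c100ddc"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify("hn_data_fetcher-1.2.0.tar.gz", EXPECTED_SHA256)` should return `True` for an intact download in the current directory. Reading in chunks keeps memory use constant regardless of file size.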

Provenance

The following attestation bundles were made for hn_data_fetcher-1.2.0.tar.gz:

Publisher: workflow.yml on adhikasp/hn-data-fetcher

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file hn_data_fetcher-1.2.0-py3-none-any.whl.

File hashes

Hashes for hn_data_fetcher-1.2.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       96790805b01c11c81438854afa1ab0a61e1190d44a92ce8291fb3e611473a607
MD5          0afce9a73e22bd682f7c24e977f0a422
BLAKE2b-256  d4ff34c6d06bfc78381839c760eb61b00c173198bb2f206dad4c757c190644b8

Provenance

The following attestation bundles were made for hn_data_fetcher-1.2.0-py3-none-any.whl:

Publisher: workflow.yml on adhikasp/hn-data-fetcher

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
