A high-performance asynchronous tool for fetching and storing Hacker News items in a SQLite database.

Hacker News Data Fetcher

A tool to fetch and store Hacker News data in a SQLite database.

Installation

To install the Hacker News Data Fetcher, follow these steps:

  1. Clone the Repository:

    git clone git@github.com:adhikasp/hn-data-fetcher.git
    cd hn-data-fetcher
    
  2. Set Up the Environment:

    • Ensure you have Python 3.7+ installed.
    • Create a virtual environment and activate it:
      python -m venv venv
      source venv/bin/activate  # On Windows use `venv\Scripts\activate`
      
    • Install the required dependencies:
      pip install -r requirements.txt
      
  3. Run the Script:

    • The script can be run in three different modes: update, backfill, and overwrite.

    • Use the following command to run the script:

      python hn_data_fetcher.py --mode <mode> [--start-id <start_id>] [--db-name <db_name>] [--concurrent-requests <concurrent_requests>] [--update-interval <update_interval>] [--db-queue-size <db_queue_size>] [--db-commit-interval <db_commit_interval>] [--tcp-limit <tcp_limit>]
      
    • Parameters:

      • --mode: Operation mode. Choices are update, backfill, or overwrite.
      • --start-id: Starting ID for overwrite mode (required if mode is overwrite).
      • --db-name: Path to the SQLite database file to store HN items (default: hn2.db).
      • --concurrent-requests: Maximum number of concurrent API requests to HN (default: 1000).
      • --update-interval: How often to update the progress bar, in number of items processed (default: 1000).
      • --db-queue-size: Maximum size of the database operation queue (default: 1000).
      • --db-commit-interval: How often to commit database transactions, in number of items (default: 1000).
      • --tcp-limit: Maximum number of TCP connections. 0 means unlimited (default: 0).
    • Examples:

      • To update the database with new items:
        python hn_data_fetcher.py --mode update
        
      • To backfill the database with historical items:
        python hn_data_fetcher.py --mode backfill
        
      • To overwrite existing items starting from a specific ID:
        python hn_data_fetcher.py --mode overwrite --start-id 1000
        
  4. Monitor Progress:

    • The script displays a progress bar with an estimated time to completion (ETA).
    • It also handles errors gracefully and ensures that the database is updated correctly.
  5. Graceful Shutdown:

    • You can stop the script at any time by pressing Ctrl+C. The script shuts down gracefully and completes any in-progress database transactions before exiting.
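
The command-line options listed under Parameters can be summarized as an argument parser. The following is a minimal sketch, not the project's actual code: the function names `build_parser` and `parse_args` are hypothetical, and treating `--mode` as required is an assumption based on the usage line above. Flag names and default values are taken from the parameter list.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags and defaults documented above."""
    parser = argparse.ArgumentParser(prog="hn_data_fetcher.py")
    # Assumption: --mode is mandatory, since the usage line shows it without brackets.
    parser.add_argument("--mode", required=True,
                        choices=["update", "backfill", "overwrite"])
    parser.add_argument("--start-id", type=int, default=None)
    parser.add_argument("--db-name", default="hn2.db")
    parser.add_argument("--concurrent-requests", type=int, default=1000)
    parser.add_argument("--update-interval", type=int, default=1000)
    parser.add_argument("--db-queue-size", type=int, default=1000)
    parser.add_argument("--db-commit-interval", type=int, default=1000)
    parser.add_argument("--tcp-limit", type=int, default=0)  # 0 means unlimited
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse argv and enforce the documented rule for overwrite mode."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # The parameter list states that --start-id is required in overwrite mode.
    if args.mode == "overwrite" and args.start_id is None:
        parser.error("--start-id is required when --mode is overwrite")
    return args
```

For example, `parse_args(["--mode", "overwrite", "--start-id", "1000"])` yields a namespace with `start_id` set to 1000 and every other option at its documented default.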

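To illustrate how `--concurrent-requests`, `--db-queue-size`, and `--db-commit-interval` might interact, here is a minimal sketch of a bounded asynchronous fetch-and-store pipeline. It is an illustration under stated assumptions, not the project's implementation: `fake_fetch` stands in for a real HTTP request so the example runs offline, and the `items` table with its `id` and `title` columns is a hypothetical schema.

```python
import asyncio
import sqlite3


async def fake_fetch(item_id: int) -> dict:
    """Stand-in for an HTTP request to the HN API (hypothetical, no network)."""
    await asyncio.sleep(0)
    return {"id": item_id, "title": f"item {item_id}"}


async def fetch_worker(item_id, semaphore, queue):
    # The semaphore caps in-flight requests, in the spirit of --concurrent-requests.
    async with semaphore:
        item = await fake_fetch(item_id)
    # put() waits while the queue is full, in the spirit of --db-queue-size.
    await queue.put(item)


async def db_writer(conn, queue, commit_interval):
    pending = 0
    while True:
        item = await queue.get()
        if item is None:  # sentinel: no more items will arrive
            break
        conn.execute(
            "INSERT OR REPLACE INTO items (id, title) VALUES (?, ?)",
            (item["id"], item["title"]),
        )
        pending += 1
        # Commit in batches, in the spirit of --db-commit-interval.
        if pending >= commit_interval:
            conn.commit()
            pending = 0
    conn.commit()  # flush whatever is left in the last partial batch


async def run(ids, db_name=":memory:", concurrent_requests=100,
              db_queue_size=50, db_commit_interval=25):
    conn = sqlite3.connect(db_name)
    # Hypothetical schema, used only for this sketch.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, title TEXT)"
    )
    semaphore = asyncio.Semaphore(concurrent_requests)
    queue = asyncio.Queue(maxsize=db_queue_size)
    writer = asyncio.create_task(db_writer(conn, queue, db_commit_interval))
    try:
        # Simplification: one coroutine per ID; a real tool would likely batch.
        await asyncio.gather(*(fetch_worker(i, semaphore, queue) for i in ids))
    finally:
        # Let the writer drain the queue and commit before returning.
        await queue.put(None)
        await writer
    return conn
```

Calling `asyncio.run(run(range(1, 201)))` fills an in-memory database with 200 placeholder rows. The bounded queue applies backpressure: fetchers pause when the writer falls behind, so memory use stays limited regardless of how many requests are allowed in flight.
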
Local Development

  1. Install Development Dependencies:

    • Install the package in editable mode, then install the development dependencies:
      pip install -e .
      pip install -r requirements-dev.txt
      
  2. Run Tests:

    • Execute the test suite:
      pytest tests/ -v
      
