A high-performance asynchronous tool for fetching and storing Hacker News items in a SQLite database.
Hacker News Data Fetcher
A tool to fetch and store Hacker News data in a SQLite database.
Installation
To install the Hacker News Data Fetcher, follow these steps:
Install:
pip install hn-data-fetcher
Run the Script:
- The script can be run in four different modes: update, backfill, overwrite, and overwrite-from-date.
- Use the following command to run the script:
hn_data_fetcher --mode <mode> [--start-id <start_id>] [--start-date <start_date>] [--db-name <db_name>] [--concurrent-requests <concurrent_requests>] [--update-interval <update_interval>] [--db-queue-size <db_queue_size>] [--db-commit-interval <db_commit_interval>] [--tcp-limit <tcp_limit>]
Parameters:
- --mode: Operation mode. Choices are update, backfill, overwrite, or overwrite-from-date.
- --start-id: Starting ID for overwrite mode (required if mode is overwrite).
- --start-date: Starting date for overwrite-from-date mode in YYYY-MM-DD format (required if mode is overwrite-from-date).
- --db-name: Path to the SQLite database file to store HN items (default: hn2.db).
- --concurrent-requests: Maximum number of concurrent API requests to HN (default: 1000).
- --update-interval: How often to update the progress bar, in number of items processed (default: 1000).
- --db-queue-size: Maximum size of the database operation queue (default: 1000).
- --db-commit-interval: How often to commit database transactions, in number of items (default: 1000).
- --tcp-limit: Maximum number of TCP connections. 0 means unlimited (default: 0).
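To make the flag rules above concrete, here is a minimal sketch of an argparse parser that mirrors the documented options and defaults. It is not the tool's actual source: the function names `build_parser` and `parse_args` are hypothetical, and treating `--mode` as required is an assumption based on the usage line.

```python
import argparse
from datetime import datetime
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="hn-data-fetcher")
    parser.add_argument(
        "--mode",
        required=True,  # assumption: the usage line shows --mode without brackets
        choices=["update", "backfill", "overwrite", "overwrite-from-date"],
    )
    parser.add_argument("--start-id", type=int)
    parser.add_argument("--start-date")  # expected format: YYYY-MM-DD
    parser.add_argument("--db-name", default="hn2.db")
    parser.add_argument("--concurrent-requests", type=int, default=1000)
    parser.add_argument("--update-interval", type=int, default=1000)
    parser.add_argument("--db-queue-size", type=int, default=1000)
    parser.add_argument("--db-commit-interval", type=int, default=1000)
    parser.add_argument("--tcp-limit", type=int, default=0)  # 0 means unlimited
    return parser


def parse_args(argv: List[str]) -> argparse.Namespace:
    """Parse argv and enforce the mode-specific requirements."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.mode == "overwrite" and args.start_id is None:
        parser.error("--start-id is required when --mode is overwrite")
    if args.mode == "overwrite-from-date":
        if args.start_date is None:
            parser.error("--start-date is required when --mode is overwrite-from-date")
        try:
            datetime.strptime(args.start_date, "%Y-%m-%d")
        except ValueError:
            parser.error("--start-date must be in YYYY-MM-DD format")
    return args
```

With this sketch, `parse_args(["--mode", "update"])` yields the defaults listed above, while `parse_args(["--mode", "overwrite"])` exits with a usage error because `--start-id` is missing.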
Examples:
- To update the database with new items:
hn-data-fetcher --mode update
- To backfill the database with historical items:
hn-data-fetcher --mode backfill
- To overwrite existing items starting from a specific ID:
hn-data-fetcher --mode overwrite --start-id 1000
- To overwrite existing items starting from a specific date:
hn-data-fetcher --mode overwrite-from-date --start-date 2024-01-01
Monitor Progress:
- The script displays a progress bar with an estimated time remaining until completion (ETA).
- It also handles errors gracefully and ensures that the database is updated correctly.
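As an illustration of how an ETA like the one in the progress bar can be derived, the sketch below extrapolates from the average throughput so far. This is an assumed approach, not necessarily the calculation the tool performs, and `estimate_eta_seconds` is a hypothetical helper name.

```python
from typing import Optional


def estimate_eta_seconds(
    processed: int, total: int, elapsed_seconds: float
) -> Optional[float]:
    """Estimate the remaining time from the average rate observed so far.

    Returns None when no rate can be computed yet.
    """
    if processed <= 0 or elapsed_seconds <= 0:
        return None
    rate = processed / elapsed_seconds  # items per second
    remaining = max(total - processed, 0)
    return remaining / rate
```

For example, after 1000 of 5000 items in 10 seconds the average rate is 100 items per second, so the estimate for the remaining 4000 items is 40 seconds.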
Graceful Shutdown:
- You can stop the script at any time by pressing Ctrl+C. The script will handle the shutdown gracefully, ensuring that all ongoing transactions are completed.
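The following sketch shows one way a bounded queue, batched commits, and a graceful stop can fit together, which is the general pattern behind options such as --db-queue-size and --db-commit-interval. It is an assumption-laden illustration rather than the tool's implementation: the `items` table, its columns, and the `interrupt_at` parameter (which simulates Ctrl+C) are all hypothetical.

```python
import asyncio
import json
import sqlite3
from typing import Iterable, Optional


async def writer(queue: asyncio.Queue, conn: sqlite3.Connection, commit_interval: int) -> None:
    """Drain the queue into SQLite, committing every commit_interval rows."""
    written = 0
    while True:
        item = await queue.get()
        if item is None:  # sentinel: the producer finished or was stopped
            break
        conn.execute("INSERT OR REPLACE INTO items (id, payload) VALUES (?, ?)", item)
        written += 1
        if written % commit_interval == 0:
            conn.commit()
    conn.commit()  # flush the final partial batch before exiting


async def producer(
    queue: asyncio.Queue,
    ids: Iterable[int],
    stop: asyncio.Event,
    interrupt_at: Optional[int] = None,
) -> None:
    """Queue placeholder items until the ids run out or a stop is requested."""
    try:
        for item_id in ids:
            if item_id == interrupt_at:
                # Simulated Ctrl+C. A real script could instead have a SIGINT
                # handler call stop.set().
                stop.set()
            if stop.is_set():
                break
            await queue.put((item_id, json.dumps({"id": item_id})))
    finally:
        await queue.put(None)  # always let the writer finish cleanly


async def run(
    ids: Iterable[int],
    queue_size: int = 1000,
    commit_interval: int = 1000,
    interrupt_at: Optional[int] = None,
) -> int:
    """Run producer and writer together; return the number of stored rows."""
    conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
    queue: asyncio.Queue = asyncio.Queue(maxsize=queue_size)
    stop = asyncio.Event()
    writer_task = asyncio.create_task(writer(queue, conn, commit_interval))
    await producer(queue, ids, stop, interrupt_at)
    await writer_task
    count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    conn.close()
    return count
```

The sentinel placed on the queue in the `finally` block is what makes the stop graceful in this sketch: the writer keeps consuming everything already queued, commits the last partial batch, and only then exits.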
Local Development
Install Development Dependencies:
- Install the package in editable mode, then install the development dependencies:
pip install -e .
pip install -r requirements-dev.txt
Run Tests:
- Execute the test suite:
pytest tests/ -v
Download files
File details
Details for the file hn_data_fetcher-1.2.0.tar.gz.
File metadata
- Download URL: hn_data_fetcher-1.2.0.tar.gz
- Upload date:
- Size: 12.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6b9d064f92ea34f6e83b0c04fe3ce8faf23bfbcc1b43e7768d3aac6f5c100ddc |
| MD5 | 566c6f2edd815013cf4351ef8c81b725 |
| BLAKE2b-256 | d00972876d8fe6d71769ad83b677c2e0cf17e8689fc8b217ae66edb609e92ebb |
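A downloaded file can be checked against the published SHA256 digest with Python's standard hashlib module. The sketch below is one way to do that; the helper names `sha256_of_file` and `verify_download` are hypothetical, and the local file path is whatever location you saved the download to.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Compare a file's SHA256 digest with the expected hex digest."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For the source distribution, the expected value would be the SHA256 digest listed for hn_data_fetcher-1.2.0.tar.gz, passed as the second argument to `verify_download`.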
Provenance
The following attestation bundles were made for hn_data_fetcher-1.2.0.tar.gz:
Publisher: workflow.yml on adhikasp/hn-data-fetcher
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hn_data_fetcher-1.2.0.tar.gz
- Subject digest: 6b9d064f92ea34f6e83b0c04fe3ce8faf23bfbcc1b43e7768d3aac6f5c100ddc
- Sigstore transparency entry: 176864553
- Sigstore integration time:
- Permalink: adhikasp/hn-data-fetcher@715c112e076eb31b1a57288cfec1514fe11df969
- Branch / Tag: refs/heads/main
- Owner: https://github.com/adhikasp
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yml@715c112e076eb31b1a57288cfec1514fe11df969
- Trigger Event: push
File details
Details for the file hn_data_fetcher-1.2.0-py3-none-any.whl.
File metadata
- Download URL: hn_data_fetcher-1.2.0-py3-none-any.whl
- Upload date:
- Size: 12.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 96790805b01c11c81438854afa1ab0a61e1190d44a92ce8291fb3e611473a607 |
| MD5 | 0afce9a73e22bd682f7c24e977f0a422 |
| BLAKE2b-256 | d4ff34c6d06bfc78381839c760eb61b00c173198bb2f206dad4c757c190644b8 |
Provenance
The following attestation bundles were made for hn_data_fetcher-1.2.0-py3-none-any.whl:
Publisher: workflow.yml on adhikasp/hn-data-fetcher
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hn_data_fetcher-1.2.0-py3-none-any.whl
- Subject digest: 96790805b01c11c81438854afa1ab0a61e1190d44a92ce8291fb3e611473a607
- Sigstore transparency entry: 176864554
- Sigstore integration time:
- Permalink: adhikasp/hn-data-fetcher@715c112e076eb31b1a57288cfec1514fe11df969
- Branch / Tag: refs/heads/main
- Owner: https://github.com/adhikasp
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yml@715c112e076eb31b1a57288cfec1514fe11df969
- Trigger Event: push