Efficiently download HIBP new pwned password data by hash-prefix for a local-copy
hibp-downloader
This is a CLI tool that efficiently downloads a local copy of the pwned password hash data from the very awesome HIBP pwned passwords API endpoint. It uses all the good bits (multiprocessing, async processes, local caching, content ETags and HTTP/2 connection pooling) to probably make things as fast as is Pythonly possible.
Features
- Direct password lookups via the query command: check passwords against the compressed data store with no database or decompression step needed. Fast enough to use behind a web service.
- Download and store acquired data in gzip compressed format to save on storage and speed up queries.
- Download the full dataset in under 45 mins (generally CPU bound).
- Easily resume interrupted download operations into a --data-path without hammering the API source again.
- Only download hash-prefix content blocks when the source content has changed (via content ETag values), making it easy to periodically sync up when needed.
- Ability to generate a single text file with in-order pwned password hash values, similar to PwnedPasswordsDownloader from the awesome HIBP team.
- Per prefix file metadata in JSON format for easy data reuse by other tooling if required.
- Standalone validation command to verify the local copy dataset, clean up corrupted or incomplete files, and remove orphaned metadata files.
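The ETag-based sync mentioned above relies on standard HTTP conditional requests. The following is a minimal sketch of that idea using only the Python standard library; it is not taken from the hibp-downloader source, and the function names `build_request` and `fetch_if_changed` are hypothetical. It assumes the range endpoint `https://api.pwnedpasswords.com/range/{prefix}` returns an `ETag` header, as the feature list implies.

```python
import urllib.error
import urllib.request

# Public HIBP range endpoint; the prefix is the first 5 hex characters of a hash.
API_URL = "https://api.pwnedpasswords.com/range/{prefix}"


def build_request(prefix, cached_etag=None):
    """Build a range request, adding If-None-Match when an ETag is cached."""
    headers = {"User-Agent": "etag-sketch"}  # hypothetical client name
    if cached_etag:
        headers["If-None-Match"] = cached_etag
    return urllib.request.Request(API_URL.format(prefix=prefix), headers=headers)


def fetch_if_changed(prefix, cached_etag=None):
    """Return (body, etag), or (None, cached_etag) when the server answers 304."""
    request = build_request(prefix, cached_etag)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: the local copy is still current
            return None, cached_etag
        raise
```

A caller would store the returned ETag next to the downloaded block and pass it back on the next sync, so unchanged prefixes cost one small request instead of a full download.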
Install
pipx install hibp-downloader
Usage (download)
Performance
Sample download activity log from a host with 32 cores on a 500 Mbit/s connection.
...
2024-05-16T10:18:01-0400 | INFO | hibp-downloader | prefix=f80c7 source=[lc:13616 et:3 rc:1002358 ro:25 xx:1] processed=[17836.6MB ~414462H/s] api=[918req/s 17597.4MB] runtime=36.4min
2024-05-16T10:18:02-0400 | INFO | hibp-downloader | prefix=f81af source=[lc:13616 et:3 rc:1002558 ro:25 xx:1] processed=[17840.1MB ~414454H/s] api=[918req/s 17600.9MB] runtime=36.4min
2024-05-16T10:18:02-0400 | INFO | hibp-downloader | prefix=f826f source=[lc:13616 et:3 rc:1002758 ro:25 xx:1] processed=[17843.6MB ~414454H/s] api=[918req/s 17604.4MB] runtime=36.4min
2024-05-16T10:18:03-0400 | INFO | hibp-downloader | prefix=f833f source=[lc:13616 et:3 rc:1002958 ro:25 xx:1] processed=[17847.1MB ~414450H/s] api=[918req/s 17607.9MB] runtime=36.4min
- 918 requests per second to api.pwnedpasswords.com
- Log sources are shorthand:
  - lc: local-cache - request-responses handled locally without hitting the network.
  - et: ETag match - request-responses that confirmed our local data was up-to-date and did not require a new download.
  - rc: remote-cache - request-responses that were downloaded to local, but came from the remote-server cache.
  - ro: remote-origin - request-responses that were downloaded to local, and the download needed to be fetched from remote origin source.
  - xx: unknown/failed - request-responses that failed (and successfully retried).
- ~17GB downloaded in ~36 minutes (full dataset)
- Approximately 414k hash values received per second
- Processing in this example appears to be CPU bound; measured traffic was around 160 Mbit/s.
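To make the source shorthand concrete, here is a small sketch that pulls the `source=[...]` counters out of one of the sample log lines above. The helper name `parse_sources` is hypothetical and the parsing is inferred only from the log format shown here, so treat it as an illustration rather than a supported interface.

```python
import re

# Matches the bracketed counter group, e.g. source=[lc:13616 et:3 ...]
SOURCE_RE = re.compile(r"source=\[([^\]]*)\]")


def parse_sources(line):
    """Return the per-source counters from a log line as a dict of ints."""
    match = SOURCE_RE.search(line)
    if match is None:
        return {}
    counters = {}
    for item in match.group(1).split():
        name, _, value = item.partition(":")
        counters[name] = int(value)
    return counters


SAMPLE = (
    "2024-05-16T10:18:01-0400 | INFO | hibp-downloader | prefix=f80c7 "
    "source=[lc:13616 et:3 rc:1002358 ro:25 xx:1] "
    "processed=[17836.6MB ~414462H/s] api=[918req/s 17597.4MB] runtime=36.4min"
)

counts = parse_sources(SAMPLE)
total = sum(counts.values())            # all request-responses seen so far
network = counts["rc"] + counts["ro"]   # responses actually downloaded
print(f"{network} of {total} responses were downloaded from the network")
```

In this sample nearly every response came from the remote-server cache (rc), with only a handful reaching the remote origin (ro).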
Usage (query)
Query passwords directly against the compressed data store — no decompression, no database
import required. This is the recommended approach for any password-checking lookup.
Usage (generate)
Generate a single decompressed text file from the data store. If you are generating this to
import into a database for lookups, consider using the query command directly instead —
it is faster to set up, far easier to maintain, and uses a fraction of the storage.
hibp-downloader --data-path /path/to/data generate --filename pwned-hashes.txt --hash-type sha1
Usage (validate)
Validate local pwned password files and automatically clean up corrupted data or orphaned metadata files:
hibp-downloader --data-path /path/to/data validate --hash-type sha1
File details
Details for the file hibp_downloader-0.4.8.tar.gz.
File metadata
- Download URL: hibp_downloader-0.4.8.tar.gz
- Upload date:
- Size: 31.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f62d95b63a47fdc35dcc3c03c5674748afc310db0505d9682c94704f313dc2c3 |
| MD5 | 39cf35efbdd16be8231a77858836a8c0 |
| BLAKE2b-256 | 8fdad19987b30cddde6be3c1abed2139488026067adbdef497a6ac2969bea193 |
Provenance
The following attestation bundles were made for hibp_downloader-0.4.8.tar.gz:
Publisher: build-tests.yml on threatpatrols/hibp-downloader
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: hibp_downloader-0.4.8.tar.gz
  - Subject digest: f62d95b63a47fdc35dcc3c03c5674748afc310db0505d9682c94704f313dc2c3
- Sigstore transparency entry: 1742098287
- Sigstore integration time:
- Permalink: threatpatrols/hibp-downloader@f0518cc825af791de068541070fd1befa70fe8f6
- Branch / Tag: refs/tags/0.4.8
- Owner: https://github.com/threatpatrols
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build-tests.yml@f0518cc825af791de068541070fd1befa70fe8f6
- Trigger Event: push
File details
Details for the file hibp_downloader-0.4.8-py3-none-any.whl.
File metadata
- Download URL: hibp_downloader-0.4.8-py3-none-any.whl
- Upload date:
- Size: 31.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1f5eb78cc5737dbb5a47871690d78cfb97052aa03374420a87a84d2af6cc164c |
| MD5 | 911ca5d1c0f8eb7debf213f6d25058ae |
| BLAKE2b-256 | 128c357980c7ea138706b62ec584ffe98c622be2d06e218b63d7c5abcefb03b1 |
Provenance
The following attestation bundles were made for hibp_downloader-0.4.8-py3-none-any.whl:
Publisher: build-tests.yml on threatpatrols/hibp-downloader
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: hibp_downloader-0.4.8-py3-none-any.whl
  - Subject digest: 1f5eb78cc5737dbb5a47871690d78cfb97052aa03374420a87a84d2af6cc164c
- Sigstore transparency entry: 1742098431
- Sigstore integration time:
- Permalink: threatpatrols/hibp-downloader@f0518cc825af791de068541070fd1befa70fe8f6
- Branch / Tag: refs/tags/0.4.8
- Owner: https://github.com/threatpatrols
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build-tests.yml@f0518cc825af791de068541070fd1befa70fe8f6
- Trigger Event: push