Efficiently download new HIBP pwned password data by hash-prefix to maintain a local copy
Project description
hibp-downloader
This is a CLI tool to efficiently download a local copy of the pwned password hash data from the very awesome HIBP pwned passwords API endpoint. It uses all the good bits (multiprocessing, async processes, local caching, content ETags and HTTP/2 connection pooling) to make things probably as fast as is Pythonly possible.
Features
- Interface to directly query for compromised password values from the compressed file data-store!
- Download and store acquired data gzip-compressed to save on storage and speed up queries.
- Download the full dataset in under 45 mins (generally CPU bound)
- Easily resume interrupted download operations into a --data-path without re-clobbering the API source.
- Only download hash-prefix content blocks when the source content has changed (via content ETag values), making it easy to periodically sync up when needed.
- Query interface performance is efficient enough to back a user-facing web service under reasonable load (i.e. don't waste your own resources decompressing the dataset and storing it in a database!)
- Ability to generate a single text file with in-order pwned password hash values, similar to PwnedPasswordsDownloader from the awesome HIBP team.
- Per prefix file metadata in JSON format for easy data reuse by other tooling if required.
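The hash-prefix storage idea behind the features above can be sketched in a few lines of Python. This is an illustration only, not the tool's own code: the helper names `split_sha1` and `pwned_count` are hypothetical, and the on-disk layout (one `<PREFIX>.gz` file per 5-character prefix, holding the `SUFFIX:COUNT` lines returned by the HIBP range API) is an assumption that may differ from what hibp-downloader actually writes.

```python
import gzip
import hashlib
import tempfile
from pathlib import Path


def split_sha1(password: str) -> tuple[str, str]:
    """Return the (5-char prefix, 35-char suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def pwned_count(data_path: Path, password: str) -> int:
    """Return the breach count for a password, or 0 if it is not found.

    Assumed (hypothetical) layout: <data_path>/<PREFIX>.gz containing
    "SUFFIX:COUNT" lines, one per hash.
    """
    prefix, suffix = split_sha1(password)
    prefix_file = data_path / f"{prefix}.gz"
    if not prefix_file.exists():
        return 0
    # Only one small prefix file is decompressed per query.
    with gzip.open(prefix_file, "rt", encoding="ascii") as handle:
        for line in handle:
            found_suffix, _, count = line.strip().partition(":")
            if found_suffix == suffix:
                return int(count)
    return 0


if __name__ == "__main__":
    # Build a tiny made-up data store so the sketch runs without network access.
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        prefix, suffix = split_sha1("password")
        with gzip.open(store / f"{prefix}.gz", "wt", encoding="ascii") as handle:
            handle.write(f"{suffix}:42\n")  # 42 is sample data, not a real count
        print(pwned_count(store, "password"))
```

Because a lookup only touches the one prefix file that matches the hash, the dataset can stay compressed on disk and still be queried directly.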
Install
pipx install hibp-downloader
Usage (download)
Performance
Sample download activity log; host with 32 cores on 500Mbit/s connection.
...
2024-05-16T10:18:01-0400 | INFO | hibp-downloader | prefix=f80c7 source=[lc:13616 et:3 rc:1002358 ro:25 xx:1] processed=[17836.6MB ~414462H/s] api=[918req/s 17597.4MB] runtime=36.4min
2024-05-16T10:18:02-0400 | INFO | hibp-downloader | prefix=f81af source=[lc:13616 et:3 rc:1002558 ro:25 xx:1] processed=[17840.1MB ~414454H/s] api=[918req/s 17600.9MB] runtime=36.4min
2024-05-16T10:18:02-0400 | INFO | hibp-downloader | prefix=f826f source=[lc:13616 et:3 rc:1002758 ro:25 xx:1] processed=[17843.6MB ~414454H/s] api=[918req/s 17604.4MB] runtime=36.4min
2024-05-16T10:18:03-0400 | INFO | hibp-downloader | prefix=f833f source=[lc:13616 et:3 rc:1002958 ro:25 xx:1] processed=[17847.1MB ~414450H/s] api=[918req/s 17607.9MB] runtime=36.4min
- 918 requests per second to api.pwnedpasswords.com
- Log sources are shorthand:
  - lc: 13616 from local-cache (lc) - request-responses handled locally without hitting the network.
  - et: 3 etag-matched (et) - request-responses that confirmed our local data was up-to-date and did not require a new download.
  - rc: 1002958 from remote-cache (rc) - request-responses that were downloaded to local, but came from the remote-server cache.
  - ro: 25 from remote-origin (ro) - request-responses that were downloaded to local, and the download needed to be fetched from remote origin source.
  - xx: 1 failed response - request-responses that failed (and successfully retried).
- ~17GB downloaded in ~36 minutes (full dataset)
- Approximately 414k hash values received per second
- Processing in this example appears to be CPU bound; measured traffic was around 160 Mbit/s.
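If you want to track these numbers over time, the progress lines are regular enough to parse. The following is a minimal sketch that assumes the log format stays exactly as in the sample above; the pattern and the `parse_progress` helper are hypothetical and not part of hibp-downloader.

```python
import re

# Pattern written against the sample log lines shown above (an assumption:
# the format may change between releases).
LOG_PATTERN = re.compile(
    r"prefix=(?P<prefix>[0-9a-f]{5}) "
    r"source=\[lc:(?P<lc>\d+) et:(?P<et>\d+) rc:(?P<rc>\d+) "
    r"ro:(?P<ro>\d+) xx:(?P<xx>\d+)\] "
    r"processed=\[(?P<processed_mb>[\d.]+)MB ~(?P<hashes_per_sec>\d+)H/s\] "
    r"api=\[(?P<req_per_sec>\d+)req/s (?P<api_mb>[\d.]+)MB\] "
    r"runtime=(?P<runtime_min>[\d.]+)min"
)

# Fields that carry a decimal point in the sample log.
FLOAT_FIELDS = {"processed_mb", "api_mb", "runtime_min"}


def parse_progress(line: str) -> dict:
    """Extract the counters from one progress log line."""
    match = LOG_PATTERN.search(line)
    if match is None:
        raise ValueError("not a progress line")
    fields = match.groupdict()
    stats = {"prefix": fields.pop("prefix")}
    for name, value in fields.items():
        stats[name] = float(value) if name in FLOAT_FIELDS else int(value)
    return stats


if __name__ == "__main__":
    sample = (
        "2024-05-16T10:18:03-0400 | INFO | hibp-downloader | prefix=f833f "
        "source=[lc:13616 et:3 rc:1002958 ro:25 xx:1] "
        "processed=[17847.1MB ~414450H/s] api=[918req/s 17607.9MB] runtime=36.4min"
    )
    stats = parse_progress(sample)
    # Share of responses that needed no new download (local cache or ETag match).
    answered = stats["lc"] + stats["et"] + stats["rc"] + stats["ro"]
    print(stats["prefix"], (stats["lc"] + stats["et"]) / answered)
```

On a repeat sync against an up-to-date data-path, the et share would be expected to dominate, since unchanged prefixes are skipped.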
Usage (query)
Project
- Github - github.com/threatpatrols/hibp-downloader
- PyPI - pypi.org/project/hibp-downloader/
- ReadTheDocs - hibp-downloader.readthedocs.io
Copyright
- Copyright © 2023-2024 Threat Patrols Pty Ltd
- Copyright © 2023-2024 Nicholas de Jong
All rights reserved.
License
- BSD-3-Clause - see LICENSE file for details.
File details
Details for the file hibp_downloader-0.3.2.tar.gz.
File metadata
- Download URL: hibp_downloader-0.3.2.tar.gz
- Upload date:
- Size: 20.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7eaed086ec3b50af31e295850bb56e470e82676c52cb0de2f8774568e72c6023
MD5 | 8a789804f2e94f92db7418add548daca
BLAKE2b-256 | b1b7823b5db215c0892f0bfac5080303ea9078d22dbed95a6ead116a34575169
File details
Details for the file hibp_downloader-0.3.2-py3-none-any.whl.
File metadata
- Download URL: hibp_downloader-0.3.2-py3-none-any.whl
- Upload date:
- Size: 26.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | c2a5308eccdae351e33c5d99d0ef5652c6fa27230a84e77dd9987956d315d17f
MD5 | e093e6547b3d3fafff11f54a465f0d86
BLAKE2b-256 | 0c97383d94aa70c61026046f80a50b468a498e33be9b6d90311cbe0ee07726a5