Efficiently download new HIBP pwned password data by hash-prefix for a local copy
Project description
hibp-downloader
This is a CLI tool to efficiently download a local copy of the pwned password hash data from the very awesome HIBP Pwned Passwords API endpoint. It uses multiprocessing, async processes, local caching, content ETags and HTTP/2 connection pooling to make things as fast as (it seems) Pythonly possible.
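To illustrate the ETag idea mentioned above, here is a minimal sketch of a conditional download, not the tool's actual implementation. The names `CacheEntry`, `fetch_prefix` and `fake_fetch` are hypothetical, and the HTTP call is replaced by a stand-in function so the example runs without network access. It assumes the server answers `304 Not Modified` when the `If-None-Match` header matches the current ETag, which is standard HTTP behaviour.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# (status_code, etag, body) as returned by a hypothetical HTTP fetch function
Response = Tuple[int, Optional[str], bytes]
Fetch = Callable[[str, Dict[str, str]], Response]


@dataclass
class CacheEntry:
    etag: str
    body: bytes


def fetch_prefix(prefix: str, cache: Dict[str, CacheEntry], fetch: Fetch) -> Tuple[bytes, str]:
    """Return (body, source); source is 'remote' or 'unchanged'."""
    url = f"https://api.pwnedpasswords.com/range/{prefix.upper()}"
    entry = cache.get(prefix)
    # Only send If-None-Match when an ETag is already known for this prefix.
    headers = {"If-None-Match": entry.etag} if entry else {}
    status, etag, body = fetch(url, headers)
    if status == 304 and entry is not None:
        # Content unchanged: reuse the locally cached block.
        return entry.body, "unchanged"
    if status == 200:
        if etag:
            cache[prefix] = CacheEntry(etag, body)
        return body, "remote"
    raise RuntimeError(f"unexpected status {status} for prefix {prefix}")


def fake_fetch(url: str, headers: Dict[str, str]) -> Response:
    """Stand-in for a real HTTP client; the ETag and body are made up."""
    if headers.get("If-None-Match") == '"v1"':
        return 304, '"v1"', b""
    return 200, '"v1"', b"0018A45C4D1DEF81644B54AB7F969B88D65:1\r\n"


cache: Dict[str, CacheEntry] = {}
first_body, first_source = fetch_prefix("00000", cache, fake_fetch)
second_body, second_source = fetch_prefix("00000", cache, fake_fetch)
print(first_source, second_source)  # remote unchanged
```

The second call transfers no body at all, which is why re-running a download against an already populated local cache can be much cheaper than the first run.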
Features
- Only download a hash-prefix content block when its content has changed (detected via content ETag values).
- Start, stop and re-start the data-collection process without loss of data already collected.
- Ability to query clear text values and return results from the pwned password data set.
- Generate a single text file with pwned password hash values in-order, similar to PwnedPasswordsDownloader from the HIBP team.
- Per prefix file metadata in JSON format for easy data reuse.
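The query feature relies on the hash-prefix layout of the data set. As a rough sketch (not the tool's own code), a clear text value can be mapped to a prefix and suffix and then looked up in a prefix block as shown here. It assumes SHA-1 hashes and the `SUFFIX:COUNT` line format returned by the HIBP range API; the function names `split_sha1` and `count_in_block` are hypothetical, and the counts in the sample block are made up for illustration.

```python
import hashlib
from typing import Tuple


def split_sha1(password: str) -> Tuple[str, str]:
    """Return (5-character prefix, 35-character suffix) of the upper-case SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_block(suffix: str, block: str) -> int:
    """Return the count recorded for suffix in a 'SUFFIX:COUNT' block, or 0 if absent."""
    for line in block.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


prefix, suffix = split_sha1("password")
print(prefix)  # 5BAA6

# Illustrative block content only; real blocks hold many more lines and real counts.
sample_block = f"0018A45C4D1DEF81644B54AB7F969B88D65:1\r\n{suffix}:42\r\n"
print(count_in_block(suffix, sample_block))  # 42
```

Only the five-character prefix is needed to locate the right block, so a lookup against a local copy reads a single small file rather than the whole data set.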
Install
pip install --upgrade hibp-downloader
Usage
Performance
Sample download activity log; host with 12 cores on 45Mbit/s DSL connection.
2023-07-31T03:22:45+1000 | INFO | hibp-downloader | prefix=e585f source=[lc:265201 et:0 rc:722148 ro:3 xx:0] runtime_rate=[11.2MBit/s 86req/s ~71005H/s] runtime=2.33hr download=11748.0MB
2023-07-31T03:22:48+1000 | INFO | hibp-downloader | prefix=e5877 source=[lc:265201 et:0 rc:722268 ro:3 xx:0] runtime_rate=[11.2MBit/s 86req/s ~70998H/s] runtime=2.33hr download=11750.0MB
2023-07-31T03:22:50+1000 | INFO | hibp-downloader | prefix=f5837 source=[lc:265201 et:0 rc:722388 ro:3 xx:0] runtime_rate=[11.2MBit/s 86req/s ~70992H/s] runtime=2.33hr download=11751.9MB
- 86 requests per second to api.pwnedpasswords.com
- 265,201 prefix files from (lc) local-cache; 722,388 from (rc) remote-cache; 3 from (ro) remote-origin; 0 failed (xx) download
- estimated ~70k hash values downloaded per second
- 11.5GB (11,751MB) downloaded in 2.3 hours (full dataset is ~3.5 hours)
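The reported rates can be cross-checked from the counters in the last sample log line. The sketch below parses that line and recomputes the throughput. The helper `parse_line` is hypothetical, and treating only the `rc` and `ro` counters as network requests is an assumption made here, not something the log states.

```python
import re
from typing import Dict, Tuple

LINE = (
    "2023-07-31T03:22:50+1000 | INFO | hibp-downloader | prefix=f5837 "
    "source=[lc:265201 et:0 rc:722388 ro:3 xx:0] "
    "runtime_rate=[11.2MBit/s 86req/s ~70992H/s] runtime=2.33hr download=11751.9MB"
)


def parse_line(line: str) -> Tuple[Dict[str, int], float, float]:
    """Extract the source counters, runtime in hours and download size in MB."""
    source_text = re.search(r"source=\[([^\]]*)\]", line).group(1)
    source = {key: int(value) for key, value in (item.split(":") for item in source_text.split())}
    runtime_hr = float(re.search(r"runtime=([\d.]+)hr", line).group(1))
    download_mb = float(re.search(r"download=([\d.]+)MB", line).group(1))
    return source, runtime_hr, download_mb


source, runtime_hr, download_mb = parse_line(LINE)
seconds = runtime_hr * 3600

# Megabytes to megabits, divided by elapsed seconds.
mbit_per_s = download_mb * 8 / seconds
# Assumption: only remote-cache and remote-origin entries count as requests.
req_per_s = (source["rc"] + source["ro"]) / seconds

print(f"{mbit_per_s:.1f} MBit/s, {req_per_s:.0f} req/s")  # 11.2 MBit/s, 86 req/s
```

Both recomputed values agree with the `runtime_rate` figures in the log line, which suggests the rates are averages over the whole runtime rather than instantaneous readings.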
Project
- GitHub - github.com/threatpatrols/hibp-downloader
- PyPI - pypi.org/project/hibp-downloader/
- ReadTheDocs - hibp-downloader.readthedocs.io
Copyright
- Copyright © 2023 Threat Patrols Pty Ltd
- Copyright © 2023 Nicholas de Jong
All rights reserved.
License
- BSD-3-Clause - see LICENSE file for details.
File details
Details for the file hibp_downloader-0.1.5.tar.gz.
File metadata
- Download URL: hibp_downloader-0.1.5.tar.gz
- Upload date:
- Size: 18.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.3 readme-renderer/35.0 requests/2.28.1 requests-toolbelt/0.9.1 urllib3/1.26.13 tqdm/4.64.0 importlib-metadata/4.6.4 keyring/23.5.0 rfc3986/1.5.0 colorama/0.4.4 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 463b0f34ace217bff068f67bf1bcf6c22db438515a0141fdefc58ad4c498a765 |
| MD5 | 15c98f3fa373f83856a566c9d4d4aed6 |
| BLAKE2b-256 | e338248c69758fd762b404ff024392e02cd76f32015b534000171f072929ee37 |
File details
Details for the file hibp_downloader-0.1.5-py3-none-any.whl.
File metadata
- Download URL: hibp_downloader-0.1.5-py3-none-any.whl
- Upload date:
- Size: 24.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.3 readme-renderer/35.0 requests/2.28.1 requests-toolbelt/0.9.1 urllib3/1.26.13 tqdm/4.64.0 importlib-metadata/4.6.4 keyring/23.5.0 rfc3986/1.5.0 colorama/0.4.4 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6430bfb7f4e392db78b5b520158b22bc7417bec2339c9bdbb330ecd2bf8f8b9d |
| MD5 | f8c0c54da303de794496ed1fe6d9b498 |
| BLAKE2b-256 | f7f4fe021b76bddfe62f66e1b7a9dc373bdc5d10befcc3d20ea56b65db8d0202 |