Efficiently download new HIBP pwned password data by hash-prefix for a local copy
hibp-downloader
This is a CLI tool that efficiently downloads a local copy of the pwned password hash data from the very awesome HIBP pwned passwords API endpoint. It uses multiprocessing, async processes, local caching, content ETags and HTTP/2 connection pooling to make things as fast as seems Pythonly possible.
Features
- Only download a hash-prefix content block when its content has changed (detected via content ETag values).
- Start, stop and restart the data-collection process without losing data already collected.
- Query clear-text values against the pwned password data set and return the results.
- Generate a single text file with pwned password hash values in-order, similar to PwnedPasswordsDownloader from the HIBP team.
- Per-prefix-file metadata in JSON format for easy data reuse.
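The ETag and clear-text query features above can be illustrated with a minimal sketch. This is not the tool's own code: the function names are hypothetical, and the sketch assumes the public HIBP range endpoint, which is keyed by the first five characters of a password's SHA-1 hash and returns `SUFFIX:COUNT` lines. Nothing here touches the network; it only shows how a prefix is derived, which header makes a download conditional, and how a downloaded block could be searched.

```python
import hashlib
from typing import Dict, Optional, Tuple

# Public HIBP range endpoint; {prefix} is the first 5 hex chars of a SHA-1 hash.
RANGE_URL = "https://api.pwnedpasswords.com/range/{prefix}"


def split_sha1(password: str) -> Tuple[str, str]:
    """Return (5-char prefix, 35-char suffix) of the upper-case SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def conditional_headers(cached_etag: Optional[str]) -> Dict[str, str]:
    """Headers for a conditional GET (hypothetical helper).

    With If-None-Match set, a server that honours ETags replies 304 Not Modified
    when the block is unchanged, so the body does not need to be downloaded again.
    """
    return {"If-None-Match": cached_etag} if cached_etag else {}


def count_in_block(block_text: str, suffix: str) -> int:
    """Look up a hash suffix in a 'SUFFIX:COUNT' block; 0 means not found."""
    for line in block_text.splitlines():
        found, _, count = line.strip().partition(":")
        if found.upper() == suffix.upper():
            return int(count)
    return 0


if __name__ == "__main__":
    prefix, suffix = split_sha1("password")
    print("request:", RANGE_URL.format(prefix=prefix))
    print("headers:", conditional_headers('"example-etag"'))  # made-up ETag value
    sample_block = suffix + ":42\r\n"  # made-up count, for illustration only
    print("count:", count_in_block(sample_block, suffix))
```

Only the five-character prefix would ever be sent to the API in this scheme; the suffix comparison happens locally against the downloaded block.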
Install
pip install --upgrade hibp-downloader
Usage
Performance
Sample download activity log; host with 12 cores on 45Mbit/s DSL connection.
2023-07-31T03:22:45+1000 | INFO | hibp-downloader | prefix=e585f source=[lc:265201 et:0 rc:722148 ro:3 xx:0] runtime_rate=[11.2MBit/s 86req/s ~71005H/s] runtime=2.33hr download=11748.0MB
2023-07-31T03:22:48+1000 | INFO | hibp-downloader | prefix=e5877 source=[lc:265201 et:0 rc:722268 ro:3 xx:0] runtime_rate=[11.2MBit/s 86req/s ~70998H/s] runtime=2.33hr download=11750.0MB
2023-07-31T03:22:50+1000 | INFO | hibp-downloader | prefix=f5837 source=[lc:265201 et:0 rc:722388 ro:3 xx:0] runtime_rate=[11.2MBit/s 86req/s ~70992H/s] runtime=2.33hr download=11751.9MB
- 86 requests per second to api.pwnedpasswords.com
- 265,201 prefix files from local-cache (lc); 722,388 from remote-cache (rc); 3 from remote-origin (ro); 0 failed downloads (xx)
- estimated ~70k hash values downloaded per second
- 11.5GB (11,751MB) downloaded in 2.3 hours (full dataset is ~3.5 hours)
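The figures above can be cross-checked from the last sample log line. The sketch below is an illustration only, with hypothetical helper names; it assumes that remote requests are the `rc` plus `ro` counters (the `et` counter is not explained in the sample and is zero here), that the logged MB are decimal megabytes, and that the full data set has 16^5 = 1,048,576 five-character hex prefixes.

```python
import re

LOG_LINE = (
    "2023-07-31T03:22:50+1000 | INFO | hibp-downloader | prefix=f5837 "
    "source=[lc:265201 et:0 rc:722388 ro:3 xx:0] "
    "runtime_rate=[11.2MBit/s 86req/s ~70992H/s] runtime=2.33hr download=11751.9MB"
)


def parse_sources(line):
    """Return the source=[...] counters as a dict, e.g. {'lc': 265201, ...}."""
    inner = re.search(r"source=\[([^\]]*)\]", line).group(1)
    return {key: int(value) for key, value in (item.split(":") for item in inner.split())}


def parse_number(line, key, unit):
    """Return the float in a 'key=<number><unit>' field of the log line."""
    return float(re.search(rf"{key}=([\d.]+){unit}", line).group(1))


sources = parse_sources(LOG_LINE)
runtime_s = parse_number(LOG_LINE, "runtime", "hr") * 3600
download_mb = parse_number(LOG_LINE, "download", "MB")

# Assumption: only remote-cache and remote-origin entries were HTTP requests.
remote_requests = sources["rc"] + sources["ro"]
req_per_s = remote_requests / runtime_s
mbit_per_s = download_mb * 8 / runtime_s

# Rough time for a full run if every one of the 16**5 prefixes were fetched remotely.
full_run_hours = 16**5 / req_per_s / 3600

print(f"{req_per_s:.1f} req/s, {mbit_per_s:.1f} Mbit/s, full run ~{full_run_hours:.1f} hr")
```

Under these assumptions the derived rates round to the logged 86req/s and 11.2MBit/s, and the full-run estimate lands a little under the ~3.5 hours quoted above.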
Project
- GitHub - github.com/threatpatrols/hibp-downloader
- PyPI - pypi.org/project/hibp-downloader/
- ReadTheDocs - hibp-downloader.readthedocs.io
Copyright
- Copyright © 2023 Threat Patrols Pty Ltd
- Copyright © 2023 Nicholas de Jong
All rights reserved.
License
- BSD-3-Clause - see LICENSE file for details.