Efficiently download new HIBP pwned password data by hash-prefix for a local copy
hibp-downloader
This is a CLI tool to efficiently download a local copy of the pwned password hash data from the very awesome HIBP pwned passwords api-endpoint using all the good bits: multiprocessing, async processes, local caching, content ETags and HTTP/2 connection pooling, to make things as fast as is Pythonly possible.
Features
- Easily resume interrupted `download` operations into a `--data-path` without re-clobbering api-source.
- Only download hash-prefix content blocks when the source content has changed (via content ETag values), making it easy to periodically re-sync when needed.
- Ability to directly `query` for compromised password values from the data in-place; efficient enough to attach a service with reasonable loads.
- Ability to generate a single text file with in-order pwned password hash values, similar to PwnedPasswordsDownloader from the HIBP team.
- Per prefix file metadata in JSON format for easy data reuse by other tooling if required.
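The hash-prefix and ETag ideas in the features above can be sketched in a few lines of Python. This is only an illustration, not the tool's own code: the function names `split_sha1`, `conditional_headers` and `lookup_count` are hypothetical, and the `SUFFIX:COUNT` line layout is the one returned by the HIBP range API; how hibp-downloader stores the data on disk is not shown here and is assumed to be similar.

```python
import hashlib
from typing import Optional


def split_sha1(password: str, prefix_len: int = 5) -> tuple[str, str]:
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest of a password."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:prefix_len], digest[prefix_len:]


def conditional_headers(cached_etag: Optional[str]) -> dict[str, str]:
    """Headers for a conditional GET; a server can answer 304 when the ETag still matches."""
    return {"If-None-Match": cached_etag} if cached_etag else {}


def lookup_count(range_body: str, suffix: str) -> int:
    """Search 'SUFFIX:COUNT' lines for a hash suffix; return 0 when it is absent."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix.upper():
            return int(count)
    return 0


# Illustration only: the count below is made up, not real HIBP data.
prefix, suffix = split_sha1("password")
sample_body = f"{suffix}:3\n"
print(prefix, lookup_count(sample_body, suffix))
```

Only the five-character prefix identifies which content block to fetch or open, so a lookup touches one small block rather than the whole dataset, and a stored ETag lets a re-sync skip blocks that have not changed.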
Install
pip install --upgrade hibp-downloader
Usage
Performance
Sample download activity log from a host with 12 cores on a 45Mbit/s DSL connection.
2023-11-12T21:25:08+1000 | INFO | hibp-downloader | prefix=00ec3 source=[lc:10 et:2 rc:3800 ro:0 xx:0] processed=[62.0MB ~43589H/s] api=[105req/s 60.0MB] runtime=1.2min
2023-11-12T21:25:09+1000 | INFO | hibp-downloader | prefix=00eff source=[lc:10 et:2 rc:3850 ro:0 xx:0] processed=[62.8MB ~43547H/s] api=[105req/s 60.8MB] runtime=1.2min
2023-11-12T21:25:10+1000 | INFO | hibp-downloader | prefix=00f3b source=[lc:10 et:2 rc:3900 ro:0 xx:0] processed=[63.7MB ~43528H/s] api=[105req/s 61.7MB] runtime=1.2min
2023-11-12T21:25:11+1000 | INFO | hibp-downloader | prefix=00f6d source=[lc:10 et:2 rc:3950 ro:0 xx:0] processed=[64.5MB ~43541H/s] api=[105req/s 62.5MB] runtime=1.3min
- 105x requests per second to api.pwnedpasswords.com
- Log sources are shorthand:
  - lc: 10x prefix files from local-cache
  - et: 2x etag-match responses
  - rc: 3950x from remote-cache
  - ro: 0x from remote-origin
  - xx: 0x failed download
- 62MB downloaded in ~75 seconds
- Approximately 43k hash values per second
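As a rough worked example, the request rate in the sample log gives a ballpark for a complete first download. The sketch assumes every five-character hex prefix (such as `00ec3` in the log) is requested exactly once and that the 105 requests per second seen in this sample holds for the whole run; real runs will vary with network and host.

```python
PREFIX_HEX_CHARS = 5       # log prefixes such as "00ec3" are five hex characters
REQUESTS_PER_SECOND = 105  # rate reported in the sample log above

total_prefixes = 16 ** PREFIX_HEX_CHARS
estimated_seconds = total_prefixes / REQUESTS_PER_SECOND
estimated_hours = estimated_seconds / 3600

print(f"{total_prefixes:,} prefixes, about {estimated_hours:.1f} hours at {REQUESTS_PER_SECOND} req/s")
# prints: 1,048,576 prefixes, about 2.8 hours at 105 req/s
```

Later re-syncs should be shorter than this estimate wherever ETag matches and local-cache hits avoid a full download of a prefix block.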
Project
- GitHub - github.com/threatpatrols/hibp-downloader
- PyPI - pypi.org/project/hibp-downloader/
- ReadTheDocs - hibp-downloader.readthedocs.io
Copyright
- Copyright © 2023 Threat Patrols Pty Ltd
- Copyright © 2023 Nicholas de Jong
All rights reserved.
License
- BSD-3-Clause - see LICENSE file for details.