Efficiently download new HIBP pwned password data by hash prefix to maintain a local copy

Project description

hibp-downloader

This is a CLI tool that efficiently downloads a local copy of the pwned password hash data from the very awesome HIBP pwned passwords API endpoint. It uses all the good bits (multiprocessing, async processes, local caching, content ETags and HTTP/2 connection pooling) to make things as fast as is Pythonly possible.
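To make the hash-prefix idea concrete, here is a minimal sketch of how a password maps onto a prefix block. It assumes the public HIBP range format: the first 5 hex characters of the SHA-1 digest select a block, and each line of that block is `SUFFIX:COUNT`. The function names `split_sha1` and `count_in_range_body` are hypothetical and are not part of hibp-downloader.

```python
import hashlib


def split_sha1(password):
    """Return (prefix, suffix) of the upper-case SHA-1 hex digest.

    The 5-character prefix selects a block; the 35-character suffix
    is what gets matched inside that block.
    """
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range_body(suffix, body):
    """Look up a suffix in a block of 'SUFFIX:COUNT' lines.

    Returns the breach count, or 0 if the suffix is not listed.
    """
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


if __name__ == "__main__":
    prefix, suffix = split_sha1("password")
    # Illustrative block content only; the count below is made up.
    sample_body = "0" * 35 + ":3\n" + suffix + ":42\n"
    print(prefix, count_in_range_body(suffix, sample_body))
```

The sketch works on the plain-text block format only; how hibp-downloader stores blocks on disk is not covered here.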

Features

  • Easily resume interrupted download operations into a --data-path without hammering the API source again.
  • Only download hash-prefix content blocks when the source content has changed (detected via content ETag values), making it easy to re-sync periodically.
  • Query compromised password values directly from the downloaded data in place; efficient enough to back a service under reasonable load.
  • Generate a single text file with in-order pwned password hash values, similar to PwnedPasswordsDownloader from the HIBP team.
  • Per-prefix file metadata in JSON format for easy data reuse by other tooling if required.
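The ETag-based re-sync described above can be sketched with a standard HTTP conditional request. This is an illustration using only the Python standard library, not the tool's own implementation (which uses async HTTP/2 connections). The names `build_conditional_request` and `fetch_if_changed`, the User-Agent string, and the timeout are assumptions made for the example.

```python
import urllib.error
import urllib.request

RANGE_URL = "https://api.pwnedpasswords.com/range/{prefix}"


def build_conditional_request(prefix, cached_etag=None):
    """Build a GET request that asks for the block only if it changed."""
    headers = {"User-Agent": "etag-sketch"}  # hypothetical client name
    if cached_etag:
        # The server replies 304 Not Modified when the ETag still matches.
        headers["If-None-Match"] = cached_etag
    return urllib.request.Request(RANGE_URL.format(prefix=prefix), headers=headers)


def fetch_if_changed(prefix, cached_etag=None):
    """Return (body, etag), or (None, cached_etag) when nothing changed."""
    request = build_conditional_request(prefix, cached_etag)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read().decode("utf-8"), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Cached copy is still current; no body was transferred.
            return None, cached_etag
        raise
```

Storing the returned ETag next to each prefix block is what lets a later run skip unchanged blocks cheaply.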

Install

pip install --upgrade hibp-downloader

Usage

[Image: screenshot-help.png, a screenshot of the CLI help output]

Performance

Sample download activity log from a 12-core host on a 45 Mbit/s DSL connection:

2023-11-12T21:25:08+1000 | INFO | hibp-downloader | prefix=00ec3 source=[lc:10 et:2 rc:3800 ro:0 xx:0] processed=[62.0MB ~43589H/s] api=[105req/s 60.0MB] runtime=1.2min
2023-11-12T21:25:09+1000 | INFO | hibp-downloader | prefix=00eff source=[lc:10 et:2 rc:3850 ro:0 xx:0] processed=[62.8MB ~43547H/s] api=[105req/s 60.8MB] runtime=1.2min
2023-11-12T21:25:10+1000 | INFO | hibp-downloader | prefix=00f3b source=[lc:10 et:2 rc:3900 ro:0 xx:0] processed=[63.7MB ~43528H/s] api=[105req/s 61.7MB] runtime=1.2min
2023-11-12T21:25:11+1000 | INFO | hibp-downloader | prefix=00f6d source=[lc:10 et:2 rc:3950 ro:0 xx:0] processed=[64.5MB ~43541H/s] api=[105req/s 62.5MB] runtime=1.3min
  • 105 requests per second to api.pwnedpasswords.com
  • Log source fields are shorthand:
    • lc: 10 prefix files from local-cache
    • et: 2 etag-match responses
    • rc: 3950 from remote-cache
    • ro: 0 from remote-origin
    • xx: 0 failed downloads
  • 62MB downloaded in ~75 seconds
  • ~43k hash values per second
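For anyone who wants to track these figures over time, the log lines can be parsed with a regular expression. The pattern below is inferred only from the sample lines shown here, so treat the field layout as an assumption rather than a documented format; `parse_log_line` is a hypothetical helper.

```python
import re

# Matches only the fields visible in the sample log lines.
LOG_PATTERN = re.compile(
    r"prefix=(?P<prefix>[0-9a-f]{5}) "
    r"source=\[(?P<source>[^\]]*)\] "
    r"processed=\[(?P<processed_mb>[\d.]+)MB ~(?P<hashes_per_s>\d+)H/s\] "
    r"api=\[(?P<req_per_s>\d+)req/s (?P<api_mb>[\d.]+)MB\] "
    r"runtime=(?P<runtime_min>[\d.]+)min"
)


def parse_log_line(line):
    """Return a dict of parsed fields, or None if the line does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    # Split "lc:10 et:2 rc:3950 ro:0 xx:0" into {"lc": 10, "et": 2, ...}.
    sources = {}
    for item in match.group("source").split():
        key, _, value = item.partition(":")
        sources[key] = int(value)
    return {
        "prefix": match.group("prefix"),
        "sources": sources,
        "processed_mb": float(match.group("processed_mb")),
        "hashes_per_s": int(match.group("hashes_per_s")),
        "req_per_s": int(match.group("req_per_s")),
        "api_mb": float(match.group("api_mb")),
        "runtime_min": float(match.group("runtime_min")),
    }


if __name__ == "__main__":
    sample = (
        "2023-11-12T21:25:11+1000 | INFO | hibp-downloader | prefix=00f6d "
        "source=[lc:10 et:2 rc:3950 ro:0 xx:0] processed=[64.5MB ~43541H/s] "
        "api=[105req/s 62.5MB] runtime=1.3min"
    )
    print(parse_log_line(sample))
```

Since prefixes are 5 hex characters, `int(prefix, 16) / 16**5` gives a rough fraction of the prefix space reached, assuming prefixes are processed in ascending order as the sample suggests.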

Project

Copyright

All rights reserved.

License

  • BSD-3-Clause - see LICENSE file for details.

Download files

Source Distribution

hibp_downloader-0.3.1.tar.gz (20.5 kB view details)

Uploaded Source

Built Distribution

hibp_downloader-0.3.1-py3-none-any.whl (25.9 kB view details)

Uploaded Python 3

File details

Details for the file hibp_downloader-0.3.1.tar.gz.

File metadata

  • Download URL: hibp_downloader-0.3.1.tar.gz
  • Upload date:
  • Size: 20.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.9.6 readme-renderer/42.0 requests/2.31.0 requests-toolbelt/1.0.0 urllib3/2.1.0 tqdm/4.66.1 importlib-metadata/7.0.0 keyring/24.3.0 rfc3986/2.0.0 colorama/0.4.6 CPython/3.11.7

File hashes

Hashes for hibp_downloader-0.3.1.tar.gz
Algorithm Hash digest
SHA256 54a0119672bcf9d86a6e2a531c34c89a300532c52b1167ae9f6ecc67d8f95b1e
MD5 a9af145eb8ddd7e098cf5c7ae408b5e1
BLAKE2b-256 0eb9f18a66f51a8184abd788f6e1ce3bda629de6a4f846145ae76d6cceb7b222

File details

Details for the file hibp_downloader-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: hibp_downloader-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 25.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.9.6 readme-renderer/42.0 requests/2.31.0 requests-toolbelt/1.0.0 urllib3/2.1.0 tqdm/4.66.1 importlib-metadata/7.0.0 keyring/24.3.0 rfc3986/2.0.0 colorama/0.4.6 CPython/3.11.7

File hashes

Hashes for hibp_downloader-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 025d961f6957e1cb859178e553d2568890136913e1d67000d36f79a0ca9a3a29
MD5 7414e82e6d9c91248777c37af5290ce9
BLAKE2b-256 b349570b9fe497295aa6401e9b88f03c8692d1670aaeb568ca1e7a67f46ee54b
