Skip to main content

Efficiently download HIBP new pwned password data by hash-prefix for a local-copy

Project description

hibp-downloader

pypi python build tests docs license

This is a CLI tool to efficiently download a local copy of the pwned password hash data from the very awesome HIBP pwned passwords api-endpoint using all the good bits; multiprocessing, async-processes, local-caching, content-etags and http2-connection pooling to make things as fast as is Pythonly possible.

Features

  • Easily resume interrupted download operations into a --data-path without re-clobbering api-source.
  • Only download hash-prefix content blocks when the source content has changed (via content ETAG values); thus making it easy to periodically re-sync when needed.
  • Ability to directly query for compromised password values from the data in-place; efficient enough to attach a service with reasonable loads.
  • Ability to generate a single text file with in-order pwned password hash values, similar to PwnedPasswordsDownloader from the HIBP team.
  • Per prefix file metadata in JSON format for easy data reuse by other tooling if required.

Install

pip install --upgrade hibp-downloader

Usage

screenshot-help.png

Performance

Sample download activity log; host with 12 cores on 45Mbit/s DSL connection.

2023-11-12T21:25:08+1000 | INFO | hibp-downloader | prefix=00ec3 source=[lc:10 et:2 rc:3800 ro:0 xx:0] processed=[62.0MB ~43589H/s] api=[105req/s 60.0MB] runtime=1.2min
2023-11-12T21:25:09+1000 | INFO | hibp-downloader | prefix=00eff source=[lc:10 et:2 rc:3850 ro:0 xx:0] processed=[62.8MB ~43547H/s] api=[105req/s 60.8MB] runtime=1.2min
2023-11-12T21:25:10+1000 | INFO | hibp-downloader | prefix=00f3b source=[lc:10 et:2 rc:3900 ro:0 xx:0] processed=[63.7MB ~43528H/s] api=[105req/s 61.7MB] runtime=1.2min
2023-11-12T21:25:11+1000 | INFO | hibp-downloader | prefix=00f6d source=[lc:10 et:2 rc:3950 ro:0 xx:0] processed=[64.5MB ~43541H/s] api=[105req/s 62.5MB] runtime=1.3min
  • 105x requests per second to api.pwnedpasswords.com
  • Log sources are shorthand:
    • lc: 10x prefix files from local-cache
    • et: 2x etag-match responses
    • rc: 3950x from remote-cache
    • ro: 0x from remote-origin
    • xx: 0x failed download
  • 62MB downloaded in ~75 seconds
  • Approx ~43k hash values per second

Project

Copyright

All rights reserved.

License

  • BSD-3-Clause - see LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hibp_downloader-0.2.0.tar.gz (19.7 kB view hashes)

Uploaded Source

Built Distribution

hibp_downloader-0.2.0-py3-none-any.whl (24.9 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page