Efficiently download new HIBP pwned password data by hash-prefix to maintain a local copy

Project description

hibp-downloader

This is a CLI tool to efficiently download a local copy of the pwned password hash data from the very awesome HIBP pwned passwords API endpoint. It uses all the good bits (multiprocessing, async processes, local caching, content ETags and HTTP/2 connection pooling) to make things about as fast as is Pythonly possible.
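To make the hash-prefix idea concrete, here is a minimal sketch of how a password maps onto a prefix block. It assumes the public range endpoint's convention of 5-character SHA-1 prefixes and `SUFFIX:COUNT` lines; the function names and the count in the sample block are made up for illustration and are not part of hibp-downloader.

```python
import hashlib

PREFIX_LENGTH = 5  # assumption: blocks are keyed by the first 5 hex characters


def split_sha1(password: str) -> tuple[str, str]:
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest of a password."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:PREFIX_LENGTH], digest[PREFIX_LENGTH:]


def count_in_block(block_text: str, suffix: str) -> int:
    """Look up a suffix in a block of 'SUFFIX:COUNT' lines; 0 means not found."""
    for line in block_text.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix.upper():
            return int(count)
    return 0


prefix, suffix = split_sha1("password")
print(prefix)  # 5BAA6

# Hypothetical block content; the count 42 is invented for this example.
sample_block = "1E4C9B93F3F0682250B6CF8331B7EE68FD8:42\n"
print(count_in_block(sample_block, suffix))  # 42
```

With 5 hex characters there are 16**5 = 1,048,576 possible prefixes, which is why a full download amounts to roughly a million separate content blocks.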

Features

  • Interface to directly query for compromised password values from the compressed file data-store!
  • Download and store acquired data in gzip-compressed form to save on storage and speed up queries.
  • Download the full dataset in under 45 mins (generally CPU bound)
  • Easily resume interrupted download operations into a --data-path without hammering the API source again.
  • Only download hash-prefix content blocks when the source content has changed (via content ETag values), making it easy to periodically sync up when needed.
  • Query interface performance is efficient enough to back a user-facing web service under reasonable load (i.e. don't waste your own resources decompressing the dataset and storing it in a database!)
  • Ability to generate a single text file with in-order pwned password hash values, similar to PwnedPasswordsDownloader from the awesome HIBP team.
  • Per prefix file metadata in JSON format for easy data reuse by other tooling if required.
  • Standalone validation command to verify the local copy dataset, clean up corrupted or incomplete files, and remove orphaned metadata files.
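The ETag-based sync in the feature list follows the standard HTTP conditional-request pattern. The sketch below shows only that general pattern, under the assumption that an ETag string has been saved per prefix; the function names are hypothetical and this is not hibp-downloader's actual implementation.

```python
from typing import Optional


def conditional_headers(stored_etag: Optional[str]) -> dict[str, str]:
    """Headers for a conditional GET; empty when no ETag is stored locally."""
    return {"If-None-Match": stored_etag} if stored_etag else {}


def must_store_body(status_code: int) -> bool:
    """200 carries new content to store; 304 Not Modified means keep the local file."""
    if status_code == 304:
        return False
    if status_code == 200:
        return True
    raise RuntimeError(f"unexpected HTTP status {status_code}")


print(conditional_headers('"abc123"'))  # {'If-None-Match': '"abc123"'}
print(must_store_body(304))  # False
```

A 304 response has no body, so an unchanged prefix costs a round trip but almost no bandwidth.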

Install

pipx install hibp-downloader

Usage (download)

[Screenshot: command help output (screenshot-help.png)]

Performance

Sample download activity log from a host with 32 cores on a 500 Mbit/s connection:

...
2024-05-16T10:18:01-0400 | INFO | hibp-downloader | prefix=f80c7 source=[lc:13616 et:3 rc:1002358 ro:25 xx:1] processed=[17836.6MB ~414462H/s] api=[918req/s 17597.4MB] runtime=36.4min
2024-05-16T10:18:02-0400 | INFO | hibp-downloader | prefix=f81af source=[lc:13616 et:3 rc:1002558 ro:25 xx:1] processed=[17840.1MB ~414454H/s] api=[918req/s 17600.9MB] runtime=36.4min
2024-05-16T10:18:02-0400 | INFO | hibp-downloader | prefix=f826f source=[lc:13616 et:3 rc:1002758 ro:25 xx:1] processed=[17843.6MB ~414454H/s] api=[918req/s 17604.4MB] runtime=36.4min
2024-05-16T10:18:03-0400 | INFO | hibp-downloader | prefix=f833f source=[lc:13616 et:3 rc:1002958 ro:25 xx:1] processed=[17847.1MB ~414450H/s] api=[918req/s 17607.9MB] runtime=36.4min
  • ~918 requests per second to api.pwnedpasswords.com
  • Log sources are shorthand:
    • lc: local-cache - request-responses handled locally without hitting the network.
    • et: ETag match - request-responses that confirmed our local data was up-to-date and did not require a new download.
    • rc: remote-cache - request-responses that were downloaded to local, but came from the remote-server cache.
    • ro: remote-origin - request-responses that were downloaded to local, and the download needed to be fetched from remote origin source.
    • xx: unknown/failed - request-responses that failed (and successfully retried).
  • ~17GB downloaded in ~36 minutes (full dataset)
  • ~414k hash values received per second
  • Processing in this example appears to be CPU bound; measured traffic was around 160 Mbit/s.
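If you want to track these numbers over time, the progress lines are regular enough to parse. The following is a rough sketch that assumes the exact line layout shown in the sample log above; the regular expressions and the `parse_progress` name are illustrative and may need adjusting if the log format differs between versions.

```python
import re

# One progress line copied from the sample log above.
LINE = (
    "2024-05-16T10:18:03-0400 | INFO | hibp-downloader | prefix=f833f "
    "source=[lc:13616 et:3 rc:1002958 ro:25 xx:1] "
    "processed=[17847.1MB ~414450H/s] api=[918req/s 17607.9MB] runtime=36.4min"
)

SOURCE_RE = re.compile(r"source=\[([^\]]*)\]")
API_RE = re.compile(r"api=\[(\d+)req/s ([\d.]+)MB\]")
RUNTIME_RE = re.compile(r"runtime=([\d.]+)min")


def parse_progress(line: str) -> dict:
    """Extract the source counters, API rate/volume and runtime from one line."""
    source_match = SOURCE_RE.search(line)
    api_match = API_RE.search(line)
    runtime_match = RUNTIME_RE.search(line)
    if not (source_match and api_match and runtime_match):
        raise ValueError("not a progress line")
    counters = {}
    for item in source_match.group(1).split():
        key, _, value = item.partition(":")
        counters[key] = int(value)
    return {
        "source": counters,
        "requests_per_s": int(api_match.group(1)),
        "api_mb": float(api_match.group(2)),
        "runtime_min": float(runtime_match.group(1)),
    }


stats = parse_progress(LINE)
handled = sum(stats["source"].values())
downloaded = stats["source"]["rc"] + stats["source"]["ro"]
print(f"{handled} request-responses handled, {downloaded / handled:.1%} downloaded")
```

Summing the `source` counters gives the number of request-responses handled so far, and `rc + ro` is the share that actually had to be downloaded.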

Usage (query)

[Screenshot: command help output (screenshot-help.png)]

Usage (generate)

Generate a single in-order text file with compromised hashes from your local data store:

hibp-downloader --data-path /path/to/data generate --filename pwned-hashes.txt --hash-type sha1
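
Because the generated file is in order, other tooling can stop reading as soon as it passes the point where a hash would be. The sketch below assumes one hash per line in ascending order, optionally followed by `:COUNT` as in PwnedPasswordsDownloader output; that line format is an assumption rather than something confirmed here, and `find_hash` is a hypothetical helper.

```python
from typing import Iterable, Optional


def find_hash(lines: Iterable[str], target: str) -> Optional[str]:
    """Scan sorted hash lines; return the matching line, or None if absent."""
    target = target.upper()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        value = line.partition(":")[0].upper()
        if value == target:
            return line
        if value > target:  # sorted input: the target cannot appear later
            return None
    return None


# Tiny in-memory stand-in for the generated file; the counts are made up.
sample = [
    "0000000A0E3B9F25FF41DE4B5AC238C2D545C7A8:3\n",
    "5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8:42\n",
    "FFFFFFFEE791CBAC0F6305CAF0CEE06BBE131160:1\n",
]
print(find_hash(sample, "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"))
```

For a real file, pass an open file handle instead of the list. A linear scan over the full dataset is slow, so for frequent lookups the built-in query interface is the better fit.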

Usage (validate)

Validate local pwned password files and automatically clean up corrupted data or orphaned metadata files:

hibp-downloader --data-path /path/to/data validate --hash-type sha1

Download files

Source Distribution

hibp_downloader-0.4.6.tar.gz (32.1 kB)

Uploaded Source

Built Distribution

hibp_downloader-0.4.6-py3-none-any.whl (30.8 kB)

Uploaded Python 3

File details

Details for the file hibp_downloader-0.4.6.tar.gz.

File metadata

  • Download URL: hibp_downloader-0.4.6.tar.gz
  • Size: 32.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.15

File hashes

Hashes for hibp_downloader-0.4.6.tar.gz
Algorithm    Hash digest
SHA256       a9b6cc71cd42dd30b7a2f599fd8315672bc35a845cb50c9a524f5fbe0eddf3db
MD5          fcc3e973f83033c5e965f2978aafae74
BLAKE2b-256  de691d3f607ac3f6ac1eb258cd9de19a51dc96f84e20d093e5c18fb03ba68fb4

File details

Details for the file hibp_downloader-0.4.6-py3-none-any.whl.

File metadata

  • Download URL: hibp_downloader-0.4.6-py3-none-any.whl
  • Size: 30.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.15

File hashes

Hashes for hibp_downloader-0.4.6-py3-none-any.whl
Algorithm    Hash digest
SHA256       1e468db055529e20dfe89cf2ed71b9fa8cacfa07ab108f4b5547fd23e872cffa
MD5          3a18d557b37a2720ddb250c796891bb0
BLAKE2b-256  5446a6206da0d3a1bc6a90f07b39cffa7e2f43191e45a9332b66a6e54c18f8ca
