Module to wrap around hashlib and facilitate generating checksums / hashes of files and directories.

Project description

Python module to facilitate calculating the checksum or hash of a file. Tested against Python 2.7, Python 3.6, PyPy 2.7 and PyPy 3.5.

FileHash class

The FileHash class wraps around the hashlib module and contains the following methods:

  • hash_file(filename) - Calculate the file hash for a single file. Returns a string with the hex digest.

  • hash_dir(path, pattern='*') - Calculate the file hashes for an entire directory. Returns a list of tuples where each tuple contains the filename and the calculated hash.

  • verify_checksums(checksum_filename) - Reads the specified file and calculates the hashes for the files listed, comparing the calculated hashes against the specified expected hashes. Returns a list of tuples where each tuple contains the filename and a boolean value indicating if the calculated hash matches the expected hash.
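As a rough illustration of what a directory-hashing method like hash_dir involves, here is a minimal sketch built only on the standard library. The names hash_dir_sketch and DirHashResult are hypothetical, and this is not the library's actual implementation: it assumes the pattern is a glob pattern matched inside the directory and that only regular files are hashed.

```python
import glob
import hashlib
import os
from collections import namedtuple

# Hypothetical result type mirroring the (filename, hash) tuples described above.
DirHashResult = namedtuple("DirHashResult", ["filename", "hash"])


def hash_dir_sketch(path, pattern="*", hash_algorithm="sha256"):
    """Hash every regular file in `path` whose name matches the glob `pattern`."""
    results = []
    for full_path in sorted(glob.glob(os.path.join(path, pattern))):
        if not os.path.isfile(full_path):
            continue  # skip subdirectories that happen to match the pattern
        with open(full_path, "rb") as f:
            # Reads the whole file at once for brevity; suitable for small files only.
            digest = hashlib.new(hash_algorithm, f.read()).hexdigest()
        results.append(DirHashResult(os.path.basename(full_path), digest))
    return results
```

Results are sorted by path here so that the output order is predictable; whether the library guarantees any ordering is not stated in this description.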

The checksum file is expected to be a plain text file in which each line is an entry formatted as follows:

{hash}[SPACE][ASTERISK]{filename}

This is the format used by the sha1sum family of tools when generating checksum files. Here is an example generated by sha1sum:

f7ef3b7afaf1518032da1b832436ef3bbfd4e6f0 *lorem_ipsum.txt
03da86258449317e8834a54cf8c4d5b41e7c7128 *lorem_ipsum.zip
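To make the line format concrete, the following sketch parses one entry into its hash and filename. The names parse_checksum_line and ChecksumEntry are hypothetical and not part of the library. As an assumption beyond the format described above, it also accepts the two-space variant that the sha1sum tools write in text mode; the library itself may be stricter.

```python
import re
from collections import namedtuple

# Hypothetical container for one parsed line of a checksum file.
ChecksumEntry = namedtuple("ChecksumEntry", ["expected_hash", "filename"])

# {hash}, one space, then "*" (binary mode) or a second space (text mode), then {filename}.
_CHECKSUM_LINE = re.compile(r"^([0-9A-Fa-f]+) [ *](.+)$")


def parse_checksum_line(line):
    """Return a ChecksumEntry for a well-formed line, or None otherwise."""
    match = _CHECKSUM_LINE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    # Lowercase the digest so it compares equal to hashlib's hexdigest() output.
    return ChecksumEntry(match.group(1).lower(), match.group(2))
```

Returning None for malformed lines leaves it to the caller to decide whether to skip them or treat them as errors.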

The FileHash constructor has two optional arguments:

  • hash_algorithm='sha256' - Specifies the hashing algorithm to use. See hashlib.algorithms_available for the set of algorithm names supported on your system. Defaults to SHA256.

  • chunk_size=4096 - Integer specifying the chunk size (in bytes) to use when reading the file. Reading in chunks is useful when processing very large files, since it avoids loading the entire file into memory at once. Defaults to 4096 bytes.
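The effect of chunk_size can be illustrated with plain hashlib. This is a minimal sketch of chunked hashing in general, not the library's own code; the function name hash_file_chunked is hypothetical, and its two keyword arguments simply mirror the constructor arguments described above.

```python
import hashlib


def hash_file_chunked(filename, hash_algorithm="sha256", chunk_size=4096):
    """Hash a file by feeding it to hashlib chunk_size bytes at a time."""
    hasher = hashlib.new(hash_algorithm)
    with open(filename, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Because update() can be called repeatedly, the digest is the same as hashing the whole file in one call, while memory use stays bounded by chunk_size.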

Example usage

The library can be used as follows:

>>> import os
>>> from filehash import FileHash
>>> md5hasher = FileHash('md5')
>>> md5hasher.hash_file("./testdata/lorem_ipsum.txt")
'72f5d9e3a5fa2f2e591487ae02489388'
>>> sha1hasher = FileHash('sha1')
>>> sha1hasher.hash_dir("./testdata", "*.zip")
[FileHashResult(filename='lorem_ipsum.zip', hash='03da86258449317e8834a54cf8c4d5b41e7c7128')]
>>> sha512hasher = FileHash('sha512')
>>> os.chdir("./testdata")
>>> sha512hasher.verify_checksums("./hashes.sha512")
[VerifyHashResult(filename='lorem_ipsum.txt', hashes_match=True), VerifyHashResult(filename='lorem_ipsum.zip', hashes_match=True)]

License

This is released under an MIT license. See the LICENSE file in this repository for more information.
