A generalised, efficient checksum library built on Python's hashlib

Checksum

A flexible, high-performance Python package for calculating and caching file checksums using any hash algorithm supported by hashlib.

Note for source code readers

We use invoke to drive internal development tasks. The tasks are defined in tasks.py at the top level of the repository.

Features

  • Multiple Algorithm Support: Use any hash algorithm available in hashlib (MD5, SHA1, SHA256, SHA512, etc.)
  • Type-Safe Enum Interface: HashAlgorithm enum for better type safety and IDE auto-completion
  • Performance Optimized: Configurable block sizes for reading large files efficiently
  • Smart Caching: Built-in caching system based on file modification times and content
  • Low Memory Footprint: Stream-based processing keeps memory usage low, even for very large files
  • Comprehensive CLI: Command-line tool (clichecksum) for computing and verifying checksums, with recursive directory processing and multiple output formats
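
The block-size and low-memory features listed above come down to reading a file in fixed-size chunks and feeding each chunk to a hashlib object. Here is a minimal sketch of that technique using only the standard library. The function name `stream_digest` and its default block size are illustrative assumptions, not part of the pychecksumtool API.

```python
import hashlib
import os
import tempfile


def stream_digest(path, algorithm="sha256", block_size=65536):
    """Hash a file block by block so memory use stays bounded by block_size."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for block in iter(lambda: handle.read(block_size), b""):
            hasher.update(block)
    return hasher.hexdigest()


# Demo: the digest does not depend on the block size, only the I/O pattern does.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Hello, World!" * 1000)
demo_path = tmp.name

small = stream_digest(demo_path, block_size=16)
large = stream_digest(demo_path, block_size=1048576)
os.remove(demo_path)
print(small == large)
```

A larger block size means fewer read calls per file, which is why it tends to help with large files, while the result is identical either way.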

Installation

# Using pip
pip install pychecksumtool

# From source
git clone https://github.com/yourusername/checksum.git
cd checksum
pip install -e .

Basic Usage

Computing Checksums

from pychecksumtool import Checksum, HashAlgorithm

# Calculate a SHA-256 checksum (default)
checksum = Checksum("myfile.txt")
print(f"SHA-256: {checksum.checksum}")

# Calculate with a different algorithm
md5_checksum = Checksum("myfile.txt", hash_algorithm=HashAlgorithm.MD5)
print(f"MD5: {md5_checksum.checksum}")

# Calculate with a custom block size (for large files)
large_file_checksum = Checksum("largefile.iso", block_size=1048576, hash_algorithm=HashAlgorithm.SHA512)
print(f"SHA-512: {large_file_checksum.checksum}")

Using the Cache

from pychecksumtool import CachedChecksum, HashAlgorithm

# First calculation computes and caches
cached = CachedChecksum("myfile.txt", hash_algorithm=HashAlgorithm.SHA256)
print(f"SHA-256: {cached.checksum}")

# Second calculation uses the cache if the file hasn't changed
cached2 = CachedChecksum("myfile.txt", hash_algorithm=HashAlgorithm.SHA256)
print(f"SHA-256 (from cache): {cached2.checksum}")

# Force a fresh calculation
no_cache = CachedChecksum("myfile.txt", hash_algorithm=HashAlgorithm.SHA256, use_cache=False)
print(f"SHA-256 (fresh): {no_cache.checksum}")

# Using a different algorithm creates a separate cache entry
md5_cached = CachedChecksum("myfile.txt", hash_algorithm=HashAlgorithm.MD5)
print(f"MD5: {md5_cached.checksum}")
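
To illustrate the idea behind the caching behaviour shown above, here is a rough standard-library sketch of a cache keyed on file path and algorithm, and invalidated when the file's modification time or size changes. It is not the package's implementation: `cached_digest`, the `_cache` dictionary, and the choice of modification time plus size as the change signal are all assumptions made for this example.

```python
import hashlib
import os
import tempfile

# Hypothetical in-memory store; the package's own cache may work differently.
_cache = {}


def cached_digest(path, algorithm="sha256", use_cache=True):
    """Return (hexdigest, cache_hit) for a file."""
    stat = os.stat(path)
    key = (os.path.abspath(path), algorithm)      # one entry per file and algorithm
    signature = (stat.st_mtime_ns, stat.st_size)  # differs if mtime or size differs
    entry = _cache.get(key)
    if use_cache and entry is not None and entry[0] == signature:
        return entry[1], True
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            hasher.update(block)
    digest = hasher.hexdigest()
    _cache[key] = (signature, digest)
    return digest, False


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"first version")
path = tmp.name

first = cached_digest(path)                    # computed and stored
second = cached_digest(path)                   # served from the cache
other_algo = cached_digest(path, "sha512")     # separate entry per algorithm
forced = cached_digest(path, use_cache=False)  # bypasses the lookup
with open(path, "ab") as handle:
    handle.write(b" plus an edit")
after_edit = cached_digest(path)               # size changed, so recomputed
os.remove(path)

print(first[1], second[1], other_algo[1], forced[1], after_edit[1])
# prints: False True False False False
```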

Using Static Methods

from pychecksumtool import CachedChecksum, HashAlgorithm

# Static method for computing hash with caching
sha256_hash = CachedChecksum.compute_hash("myfile.txt", HashAlgorithm.SHA256)
print(f"SHA-256: {sha256_hash}")

# Hash in-memory data
data = b"Hello, World!"
data_hash = CachedChecksum.hash_data(data, HashAlgorithm.SHA256)
print(f"SHA-256 of data: {data_hash}")

Available Hash Algorithms

from pychecksumtool import HashAlgorithm

# List all available algorithms
available_algos = HashAlgorithm.get_available()
print("Available algorithms:", [algo.value for algo in available_algos])

# Check if an algorithm is available
if HashAlgorithm.is_available(HashAlgorithm.BLAKE2B):
    print("BLAKE2B is available")
else:
    print("BLAKE2B is not available")
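
An availability check like the one above could plausibly be built on `hashlib.algorithms_available`, which lists the algorithm names the running interpreter can construct. Below is a minimal sketch of that approach. The `Algo` enum is a hypothetical stand-in written for this example and may differ from how HashAlgorithm is actually implemented.

```python
import hashlib
from enum import Enum


class Algo(Enum):
    """Hypothetical stand-in for HashAlgorithm, for illustration only."""

    MD5 = "md5"
    SHA256 = "sha256"
    BLAKE2B = "blake2b"

    @classmethod
    def is_available(cls, member):
        # hashlib.algorithms_available is a set of names usable in this interpreter.
        return member.value in hashlib.algorithms_available

    @classmethod
    def get_available(cls):
        return [member for member in cls if cls.is_available(member)]


available_names = [member.value for member in Algo.get_available()]
print("Available algorithms:", available_names)
```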

Command Line Interface

The package includes a powerful command-line tool for computing and verifying checksums.

Generating Checksums

# Basic usage with default SHA-256
checksum hash myfile.txt

# Specify a different algorithm
checksum hash myfile.txt --algorithm md5

# Generate multiple hashes at once
checksum hash myfile.txt --multi md5 --multi sha1 --multi sha256

# Process multiple files
checksum hash file1.txt file2.txt file3.txt

# Process directories recursively
checksum hash /path/to/directory --recursive

# Exclude files/directories
checksum hash /path/to/directory --recursive --exclude "*.tmp" --exclude "*cache*"

# Change output format
checksum hash myfile.txt --format json

# Save results to a file
checksum hash myfile.txt --output results.json --format json
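
Generating several hashes at once, as with the --multi option above, does not require reading the file once per algorithm. A minimal sketch of single-pass multi-hashing with the standard library follows. The function `multi_digest` is an illustrative assumption and says nothing about how the CLI implements --multi.

```python
import hashlib
import io


def multi_digest(stream, algorithms, block_size=65536):
    """Feed every block to all hashers so the data is read only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    for block in iter(lambda: stream.read(block_size), b""):
        for hasher in hashers.values():
            hasher.update(block)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# An in-memory stream stands in for an open binary file.
results = multi_digest(io.BytesIO(b"Hello, World!"), ["md5", "sha1", "sha256"])
for name, digest in results.items():
    print(f"{name}: {digest}")
```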

Verifying Checksums

# Verify a file against a checksum
checksum verify myfile.txt abc123def456...

# Specify algorithm
checksum verify myfile.txt abc123def456... --algorithm md5

# Batch verify from a checksums file
checksum batch checksums.txt

# Specify a base directory for relative paths in batch file
checksum batch checksums.txt --base-dir /path/to/files
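
Verification amounts to recomputing a digest and comparing it with the expected value. The sketch below shows single-file and batch verification with the standard library. The batch line format used here ("<hexdigest>  <relative path>", in the style of sha256sum output) is an assumption, since the format expected by the batch command is not described here. The helper names `file_digest`, `verify_file`, and `verify_batch` are likewise hypothetical.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha256", block_size=65536):
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            hasher.update(block)
    return hasher.hexdigest()


def verify_file(path, expected, algorithm="sha256"):
    # Hex digests are case-insensitive, so normalise before comparing.
    return file_digest(path, algorithm) == expected.strip().lower()


def verify_batch(lines, base_dir=".", algorithm="sha256"):
    """Assumed line format: '<hexdigest>  <relative path>'."""
    results = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        expected, name = line.split(None, 1)
        # Relative paths are resolved against base_dir.
        results[name] = verify_file(os.path.join(base_dir, name), expected, algorithm)
    return results


with tempfile.TemporaryDirectory() as base:
    with open(os.path.join(base, "a.txt"), "wb") as handle:
        handle.write(b"alpha")
    with open(os.path.join(base, "b.txt"), "wb") as handle:
        handle.write(b"beta")
    good = hashlib.sha256(b"alpha").hexdigest()
    wrong = hashlib.sha256(b"not beta").hexdigest()
    report = verify_batch([f"{good}  a.txt", f"{wrong}  b.txt"], base_dir=base)

print(report)
# prints: {'a.txt': True, 'b.txt': False}
```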

Getting Help

# List all commands
checksum --help

# Command-specific help
checksum hash --help
checksum verify --help
checksum batch --help

# List available hash algorithms
checksum hash --list-algorithms

API Reference

Core Classes

  • HashAlgorithm: Enum of supported hash algorithms
  • Checksum: Base class for computing file checksums
  • HashCache: Class for caching hash values
  • CachedChecksum: Wrapper that adds caching to Checksum operations

HashAlgorithm Enum

Member      Value
MD5         'md5'
SHA1        'sha1'
SHA224      'sha224'
SHA256      'sha256'
SHA384      'sha384'
SHA512      'sha512'
BLAKE2B     'blake2b'
BLAKE2S     'blake2s'
SHA3_224    'sha3_224'
SHA3_256    'sha3_256'
SHA3_384    'sha3_384'
SHA3_512    'sha3_512'
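
The values in this table match the algorithm names that hashlib accepts, so each can be passed to `hashlib.new()`. As a small illustration, the sketch below prints the length of the hexadecimal digest each algorithm produces with its default settings, which can help when checking that an expected checksum matches the algorithm you intend to use. It uses hashlib directly and is not part of the package.

```python
import hashlib

names = [
    "md5", "sha1", "sha224", "sha256", "sha384", "sha512",
    "blake2b", "blake2s", "sha3_224", "sha3_256", "sha3_384", "sha3_512",
]

hex_lengths = {}
for name in names:
    try:
        # digest_size is in bytes; each byte becomes two hex characters.
        hex_lengths[name] = hashlib.new(name).digest_size * 2
    except ValueError:
        # Some restricted builds may refuse to construct an algorithm.
        hex_lengths[name] = None

for name, length in hex_lengths.items():
    print(f"{name:<10}{length}")
```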

Performance Tips

  • For large files, a larger block size (e.g., 1 MiB = 1048576 bytes) can improve performance
  • For many small files, using the default block size is generally optimal
  • The cache dramatically improves performance when checking the same files multiple times
  • When verifying a large number of files, use the batch command with --parallel for multi-threading
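
As a rough illustration of the multi-threading tip above, the following sketch hashes several files concurrently with a thread pool. Threads can help here because hashlib releases the GIL while hashing larger chunks of data. How the --parallel option is implemented is not described here, so `hash_many`, `file_digest`, and the worker count are assumptions for this example only.

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def file_digest(path, algorithm="sha256", block_size=1048576):
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            hasher.update(block)
    return hasher.hexdigest()


def hash_many(paths, algorithm="sha256", workers=4):
    """Hash files concurrently; pool.map keeps results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = pool.map(lambda p: file_digest(p, algorithm), paths)
        return dict(zip(paths, digests))


with tempfile.TemporaryDirectory() as base:
    paths = []
    for index in range(8):
        path = os.path.join(base, f"file{index}.bin")
        with open(path, "wb") as handle:
            handle.write(bytes([index]) * 100000)
        paths.append(path)
    parallel = hash_many(paths)
    serial = {path: file_digest(path) for path in paths}

print(parallel == serial)
```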

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
