A generalised, efficient checksum library built on Python's hashlib
Checksum
A flexible, high-performance Python package for calculating and caching file checksums using any hash algorithm supported by hashlib.
Note for source code readers
We use invoke to drive a bunch of internal tasks.
You can see the list of tasks in tasks.py at the top level.
Features
- Multiple Algorithm Support: Use any hash algorithm available in hashlib (MD5, SHA1, SHA256, SHA512, etc.)
- Type-Safe Enum Interface: HashAlgorithm enum for better type safety and IDE auto-completion
- Performance Optimized: Configurable block sizes for reading large files efficiently
- Smart Caching: Built-in caching system based on file modification times and content
- Low Memory Footprint: Stream-based processing keeps memory usage low, even for very large files
- Comprehensive CLI: Powerful command-line interface with recursive directory processing and multiple output formats
- Full CLI Support: The package includes a command-line tool (clichecksum) for computing and verifying checksums
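The block-size and low-memory features listed above rest on a standard hashlib pattern: read the file in fixed-size blocks and feed each block to the hash object. The following is a minimal sketch of that pattern using only the standard library. The names stream_checksum and demo are hypothetical and are not part of this package's API.

```python
import hashlib
import os
import tempfile


def stream_checksum(path, algorithm="sha256", block_size=65536):
    """Return the hex digest of a file, reading block_size bytes at a time."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for block in iter(lambda: handle.read(block_size), b""):
            hasher.update(block)
    return hasher.hexdigest()


def demo():
    """Hash a throwaway file with two different block sizes."""
    payload = b"checksum demo " * 50_000
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.bin")
        with open(path, "wb") as handle:
            handle.write(payload)
        small = stream_checksum(path, block_size=4096)
        large = stream_checksum(path, block_size=1048576)
    return payload, small, large


if __name__ == "__main__":
    payload, small, large = demo()
    # The block size changes how the file is read, not the resulting digest.
    print(small == large == hashlib.sha256(payload).hexdigest())
```

Only one block is held in memory at a time, so peak memory use depends on block_size rather than on the file size.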
Installation
# Using pip
pip install pychecksumtool
# From source
git clone https://github.com/yourusername/checksum.git
cd checksum
pip install -e .
Basic Usage
Computing Checksums
from src.pychecksumtool import Checksum, HashAlgorithm
# Calculate a SHA-256 checksum (default)
checksum = Checksum("myfile.txt")
print(f"SHA-256: {checksum.checksum}")
# Calculate with a different algorithm
md5_checksum = Checksum("myfile.txt", hash_algorithm=HashAlgorithm.MD5)
print(f"MD5: {md5_checksum.checksum}")
# Calculate with a custom block size (for large files)
large_file_checksum = Checksum("largefile.iso", block_size=1048576, hash_algorithm=HashAlgorithm.SHA512)
print(f"SHA-512: {large_file_checksum.checksum}")
Using the Cache
from src.pychecksumtool import CachedChecksum, HashAlgorithm
# First calculation computes and caches
cached = CachedChecksum("myfile.txt", hash_algorithm=HashAlgorithm.SHA256)
print(f"SHA-256: {cached.checksum}")
# Second calculation uses the cache if the file hasn't changed
cached2 = CachedChecksum("myfile.txt", hash_algorithm=HashAlgorithm.SHA256)
print(f"SHA-256 (from cache): {cached2.checksum}")
# Force a fresh calculation
no_cache = CachedChecksum("myfile.txt", hash_algorithm=HashAlgorithm.SHA256, use_cache=False)
print(f"SHA-256 (fresh): {no_cache.checksum}")
# Using a different algorithm creates a separate cache entry
md5_cached = CachedChecksum("myfile.txt", hash_algorithm=HashAlgorithm.MD5)
print(f"MD5: {md5_cached.checksum}")
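To illustrate the idea behind the calls above, here is a rough sketch of a cache that reuses a digest while a file appears unchanged. It assumes the cache key is the path plus the algorithm, and that "unchanged" means the same modification time and size. The package's actual HashCache may work differently; MtimeHashCache and demo are hypothetical names.

```python
import hashlib
import os
import tempfile


class MtimeHashCache:
    """Hypothetical in-memory cache; not the package's HashCache."""

    def __init__(self):
        self._entries = {}  # (absolute path, algorithm) -> (fingerprint, digest)
        self.computed = 0   # counts real hash computations, for demonstration

    def checksum(self, path, algorithm="sha256", use_cache=True):
        key = (os.path.abspath(path), algorithm)
        info = os.stat(path)
        fingerprint = (info.st_mtime_ns, info.st_size)
        entry = self._entries.get(key)
        if use_cache and entry is not None and entry[0] == fingerprint:
            return entry[1]  # file looks unchanged, reuse the stored digest
        hasher = hashlib.new(algorithm)
        with open(path, "rb") as handle:
            for block in iter(lambda: handle.read(65536), b""):
                hasher.update(block)
        self.computed += 1
        self._entries[key] = (fingerprint, hasher.hexdigest())
        return self._entries[key][1]


def demo():
    cache = MtimeHashCache()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "myfile.txt")
        with open(path, "wb") as handle:
            handle.write(b"first version")
        first = cache.checksum(path)
        second = cache.checksum(path)    # served from the cache
        cache.checksum(path, "sha512")   # separate entry per algorithm
        with open(path, "wb") as handle:
            handle.write(b"second, longer version")
        third = cache.checksum(path)     # size changed, so recomputed
    return cache.computed, first, second, third


if __name__ == "__main__":
    print(demo()[0])  # number of times a hash was actually computed
```

Keying on the algorithm as well as the path is what lets an MD5 and a SHA-256 digest of the same file live side by side.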
Using Static Methods
from src.pychecksumtool import CachedChecksum, HashAlgorithm
# Static method for computing hash with caching
sha256_hash = CachedChecksum.compute_hash("myfile.txt", HashAlgorithm.SHA256)
print(f"SHA-256: {sha256_hash}")
# Hash in-memory data
data = b"Hello, World!"
data_hash = CachedChecksum.hash_data(data, HashAlgorithm.SHA256)
print(f"SHA-256 of data: {data_hash}")
Available Hash Algorithms
from src.pychecksumtool import HashAlgorithm
# List all available algorithms
available_algos = HashAlgorithm.get_available()
print("Available algorithms:", [algo.value for algo in available_algos])
# Check if an algorithm is available
if HashAlgorithm.is_available(HashAlgorithm.BLAKE2B):
    print("BLAKE2B is available")
else:
    print("BLAKE2B is not available")
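An availability check like the one above can be built on hashlib.algorithms_available, which lists the algorithm names the running interpreter knows about. The sketch below shows one possible shape for such an enum. Algo is a hypothetical stand-in with only four members, and its methods are assumptions rather than the package's HashAlgorithm implementation.

```python
import hashlib
from enum import Enum


class Algo(Enum):
    """Hypothetical stand-in for an algorithm enum; values are hashlib names."""

    MD5 = "md5"
    SHA256 = "sha256"
    BLAKE2B = "blake2b"
    SHA3_256 = "sha3_256"

    @classmethod
    def get_available(cls):
        """Members whose name hashlib reports as available."""
        return [member for member in cls if member.is_available()]

    def is_available(self):
        return self.value in hashlib.algorithms_available

    def new(self):
        """Create a fresh hashlib object for this algorithm."""
        return hashlib.new(self.value)


if __name__ == "__main__":
    print([member.value for member in Algo.get_available()])
    print(Algo.BLAKE2B.new().name)
```

Storing the hashlib name as the enum value keeps the mapping trivial: hashlib.new(member.value) is all that is needed to get a hash object.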
Command Line Interface
The package includes a powerful command-line tool for computing and verifying checksums.
Generating Checksums
# Basic usage with default SHA-256
checksum hash myfile.txt
# Specify a different algorithm
checksum hash myfile.txt --algorithm md5
# Generate multiple hashes at once
checksum hash myfile.txt --multi md5 --multi sha1 --multi sha256
# Process multiple files
checksum hash file1.txt file2.txt file3.txt
# Process directories recursively
checksum hash /path/to/directory --recursive
# Exclude files/directories
checksum hash /path/to/directory --recursive --exclude "*.tmp" --exclude "*cache*"
# Change output format
checksum hash myfile.txt --format json
# Save results to a file
checksum hash myfile.txt --output results.json --format json
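For readers who want the same behaviour from a script, the recursive, exclude, and JSON options above can be approximated with os.walk, fnmatch, and json. This is a sketch under assumptions: it matches exclude patterns against base names only, which may not be how the CLI interprets them, and hash_tree is a hypothetical helper, not a function exported by the package.

```python
import fnmatch
import hashlib
import json
import os
import tempfile


def _excluded(name, patterns):
    """True if a file or directory base name matches any glob pattern."""
    return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)


def hash_tree(root, algorithm="sha256", exclude=()):
    """Map each non-excluded file under root (relative path) to its hex digest."""
    results = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if not _excluded(d, exclude)]
        for name in sorted(filenames):
            if _excluded(name, exclude):
                continue
            full = os.path.join(dirpath, name)
            hasher = hashlib.new(algorithm)
            with open(full, "rb") as handle:
                for block in iter(lambda: handle.read(65536), b""):
                    hasher.update(block)
            results[os.path.relpath(full, root)] = hasher.hexdigest()
    return results


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "my_cache"))
        files = {"a.txt": b"alpha", "scratch.tmp": b"x",
                 os.path.join("my_cache", "x.txt"): b"y"}
        for relative, data in files.items():
            with open(os.path.join(tmp, relative), "wb") as handle:
                handle.write(data)
        return hash_tree(tmp, exclude=["*.tmp", "*cache*"])


if __name__ == "__main__":
    print(json.dumps(demo(), indent=2))
```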
Verifying Checksums
# Verify a file against a checksum
checksum verify myfile.txt abc123def456...
# Specify algorithm
checksum verify myfile.txt abc123def456... --algorithm md5
# Batch verify from a checksums file
checksum batch checksums.txt
# Specify a base directory for relative paths in batch file
checksum batch checksums.txt --base-dir /path/to/files
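Verification amounts to recomputing a digest and comparing it with the expected value. The sketch below shows that logic for a single file and for a list of lines. The format of the checksums file is not documented here, so the sketch assumes lines of the form "<hex digest>  <relative path>", as produced by tools such as sha256sum. The function names are hypothetical.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha256"):
    """Hex digest of a file, read in 64 KiB blocks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            hasher.update(block)
    return hasher.hexdigest()


def verify(path, expected, algorithm="sha256"):
    """Compare a file's digest with an expected hex string, ignoring case."""
    return file_digest(path, algorithm) == expected.strip().lower()


def verify_batch(lines, base_dir=".", algorithm="sha256"):
    """Check lines of the assumed form '<hex digest>  <relative path>'."""
    outcome = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        expected, _, name = line.partition(" ")
        name = name.strip()
        target = os.path.join(base_dir, name)
        # A missing file counts as a failed verification.
        outcome[name] = os.path.isfile(target) and verify(target, expected, algorithm)
    return outcome


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "good.txt"), "wb") as handle:
            handle.write(b"payload")
        good = hashlib.sha256(b"payload").hexdigest()
        lines = [f"{good.upper()}  good.txt", f"{good}  missing.txt", "# comment"]
        return verify_batch(lines, base_dir=tmp)


if __name__ == "__main__":
    print(demo())
```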
Getting Help
# List all commands
checksum --help
# Command-specific help
checksum hash --help
checksum verify --help
checksum batch --help
# List available hash algorithms
checksum hash --list-algorithms
API Reference
Core Classes
- HashAlgorithm: Enum of supported hash algorithms
- Checksum: Base class for computing file checksums
- HashCache: Class for caching hash values
- CachedChecksum: Wrapper that adds caching to Checksum operations
HashAlgorithm Enum
| Member | Value |
|---|---|
| MD5 | 'md5' |
| SHA1 | 'sha1' |
| SHA224 | 'sha224' |
| SHA256 | 'sha256' |
| SHA384 | 'sha384' |
| SHA512 | 'sha512' |
| BLAKE2B | 'blake2b' |
| BLAKE2S | 'blake2s' |
| SHA3_224 | 'sha3_224' |
| SHA3_256 | 'sha3_256' |
| SHA3_384 | 'sha3_384' |
| SHA3_512 | 'sha3_512' |
Performance Tips
- For large files, a larger block size (e.g., 1 MiB = 1048576 bytes) can improve performance
- For many small files, using the default block size is generally optimal
- The cache avoids recomputing hashes for unchanged files, which helps most when the same files are checked repeatedly
- When verifying a large number of files, use the batch command with --parallel for multi-threading
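As an illustration of the multi-threading tip, the sketch below hashes several files concurrently with a thread pool. It does not show how --parallel is implemented, which is not documented here; hash_many and its max_workers parameter are hypothetical. Threads can help because file reads block on I/O, and the hashlib documentation notes that the GIL is released during updates on larger buffers for OpenSSL-backed algorithms.

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def hash_file(path, algorithm="sha256", block_size=1048576):
    """Hex digest of one file, read in 1 MiB blocks by default."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            hasher.update(block)
    return hasher.hexdigest()


def hash_many(paths, algorithm="sha256", max_workers=4):
    """Hash several files concurrently and return {path: hex digest}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so zip pairs each path with its digest.
        digests = list(pool.map(lambda p: hash_file(p, algorithm), paths))
    return dict(zip(paths, digests))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        contents = {}
        for index in range(8):
            path = os.path.join(tmp, f"file{index}.bin")
            data = bytes([index]) * 200_000
            with open(path, "wb") as handle:
                handle.write(data)
            contents[path] = data
        return contents, hash_many(contents)


if __name__ == "__main__":
    contents, digests = demo()
    print(len(digests))
```

Whether threads actually speed things up depends on the storage device and file sizes, so it is worth timing both variants on real data.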
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
File details
Details for the file pychecksumtool-0.1.5.tar.gz.
File metadata
- Download URL: pychecksumtool-0.1.5.tar.gz
- Upload date:
- Size: 18.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.12.3 Darwin/24.3.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d66de64bd1d9d7719105fde2a22c2c8892d9a9c3aaffbd131a1dc7400e39840b |
| MD5 | d69b9ac2ab80048e92a7102faf00ed84 |
| BLAKE2b-256 | 198c30416706ccae32e6524c02648c18634d18fad1538ad39f3e7f1993f55f4b |
File details
Details for the file pychecksumtool-0.1.5-py3-none-any.whl.
File metadata
- Download URL: pychecksumtool-0.1.5-py3-none-any.whl
- Upload date:
- Size: 20.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.12.3 Darwin/24.3.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e4284e509bb64b057e416de2efd3b2dc48a1eee9fecff7ba98897cdb3257a87f |
| MD5 | 10cf4a7969e26f1603e0a994781be63c |
| BLAKE2b-256 | 93edc775cfbf144159983f3ce72a78c8c121abd68f06d3e0d55d1defe07cd8e3 |