
A generalised, efficient checksum library built on Python's hashlib

Project description

Checksum

A flexible, high-performance Python package for calculating and caching file checksums using any hash algorithm supported by hashlib.

Note for source code readers

We use invoke to run internal development tasks. The full list of tasks is defined in tasks.py at the top level of the repository.

Features

  • Multiple Algorithm Support: Use any hash algorithm available in hashlib (MD5, SHA1, SHA256, SHA512, etc.)
  • Type-Safe Enum Interface: HashAlgorithm enum for better type safety and IDE auto-completion
  • Performance Optimized: Configurable block sizes for reading large files efficiently
  • Smart Caching: Built-in caching system based on file modification times and content
  • Low Memory Footprint: Stream-based processing keeps memory usage low, even for very large files
  • Comprehensive CLI: A command-line tool (clichecksum) for computing and verifying checksums, with recursive directory processing and multiple output formats

Installation

# Using pip
pip install pychecksumtool

# From source
git clone https://github.com/yourusername/checksum.git
cd checksum
pip install -e .

Basic Usage

Computing Checksums

from pychecksumtool import Checksum, HashAlgorithm

# Calculate a SHA-256 checksum (default)
checksum = Checksum("myfile.txt")
print(f"SHA-256: {checksum.checksum}")

# Calculate with a different algorithm
md5_checksum = Checksum("myfile.txt", hash_algorithm=HashAlgorithm.MD5)
print(f"MD5: {md5_checksum.checksum}")

# Calculate with a custom block size (for large files)
large_file_checksum = Checksum("largefile.iso", block_size=1048576, hash_algorithm=HashAlgorithm.SHA512)
print(f"SHA-512: {large_file_checksum.checksum}")
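To cross-check a result, or to see what the block_size parameter controls, block-wise hashing can be reproduced with hashlib alone. This is a minimal sketch assuming Checksum reads the file in fixed-size blocks; the function name streaming_digest and its 64 KiB default are illustrative choices, not part of the package API. Memory use stays bounded by the block size, whatever the file size.

```python
import hashlib


def streaming_digest(path, algorithm="sha256", block_size=65536):
    """Hypothetical helper: hash a file block by block with hashlib.

    Only one block (block_size bytes) is held in memory at a time.
    """
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # iter() with a b"" sentinel keeps calling read() until EOF
        for block in iter(lambda: fh.read(block_size), b""):
            hasher.update(block)
    return hasher.hexdigest()
```

For example, streaming_digest("largefile.iso", "sha512", block_size=1048576) should give the same hex digest as hashing the whole file in one call; the block size changes only speed and memory use, never the result.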

Using the Cache

from pychecksumtool import CachedChecksum, HashAlgorithm

# First calculation computes and caches
cached = CachedChecksum("myfile.txt", hash_algorithm=HashAlgorithm.SHA256)
print(f"SHA-256: {cached.checksum}")

# Second calculation uses the cache if the file hasn't changed
cached2 = CachedChecksum("myfile.txt", hash_algorithm=HashAlgorithm.SHA256)
print(f"SHA-256 (from cache): {cached2.checksum}")

# Force a fresh calculation
no_cache = CachedChecksum("myfile.txt", hash_algorithm=HashAlgorithm.SHA256, use_cache=False)
print(f"SHA-256 (fresh): {no_cache.checksum}")

# Using a different algorithm creates a separate cache entry
md5_cached = CachedChecksum("myfile.txt", hash_algorithm=HashAlgorithm.MD5)
print(f"MD5: {md5_cached.checksum}")
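The caching behaviour above (reuse while the file is unchanged, one entry per algorithm, opt-out with use_cache=False) can be illustrated with a small in-memory sketch. MtimeHashCache is hypothetical and is not the package's HashCache: it assumes file size plus modification time are a good enough staleness signal, whereas the real cache may persist entries to disk and also take content into account.

```python
import hashlib
import os


class MtimeHashCache:
    """Hypothetical in-memory cache keyed by (absolute path, algorithm).

    An entry is reused only while the file's size and mtime (ns) are unchanged.
    """

    def __init__(self):
        self._entries = {}  # (path, algorithm) -> (size, mtime_ns, hexdigest)
        self.misses = 0     # how many times a hash was actually computed

    def get(self, path, algorithm="sha256", use_cache=True):
        path = os.path.abspath(path)
        st = os.stat(path)
        key = (path, algorithm)  # a different algorithm means a separate entry
        entry = self._entries.get(key)
        if use_cache and entry is not None and entry[:2] == (st.st_size, st.st_mtime_ns):
            return entry[2]
        self.misses += 1
        hasher = hashlib.new(algorithm)
        with open(path, "rb") as fh:
            for block in iter(lambda: fh.read(65536), b""):
                hasher.update(block)
        digest = hasher.hexdigest()
        self._entries[key] = (st.st_size, st.st_mtime_ns, digest)
        return digest
```

Calling get() twice on an unchanged file computes the hash once; rewriting the file (changing its size or mtime) forces a recomputation on the next call.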

Using Static Methods

from pychecksumtool import CachedChecksum, HashAlgorithm

# Static method for computing hash with caching
sha256_hash = CachedChecksum.compute_hash("myfile.txt", HashAlgorithm.SHA256)
print(f"SHA-256: {sha256_hash}")

# Hash in-memory data
data = b"Hello, World!"
data_hash = CachedChecksum.hash_data(data, HashAlgorithm.SHA256)
print(f"SHA-256 of data: {data_hash}")
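For comparison, hashing in-memory bytes needs only hashlib.new. The sketch below assumes that hash_data returns a hexadecimal string (the example above prints it, but the format is not stated); hash_bytes is a hypothetical name that accepts either a plain hashlib name or any Enum member whose value is one, such as HashAlgorithm.SHA256.

```python
import hashlib
from enum import Enum


def hash_bytes(data: bytes, algorithm="sha256") -> str:
    """Hypothetical helper: hex digest of in-memory bytes.

    `algorithm` may be a hashlib name ("sha256") or an Enum whose value is one.
    """
    name = algorithm.value if isinstance(algorithm, Enum) else algorithm
    return hashlib.new(name, data).hexdigest()
```

If the assumption about hex output holds, hash_bytes(b"Hello, World!") should equal the data_hash value printed above.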

Available Hash Algorithms

from pychecksumtool import HashAlgorithm

# List all available algorithms
available_algos = HashAlgorithm.get_available()
print("Available algorithms:", [algo.value for algo in available_algos])

# Check if an algorithm is available
if HashAlgorithm.is_available(HashAlgorithm.BLAKE2B):
    print("BLAKE2B is available")
else:
    print("BLAKE2B is not available")
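Availability depends on the interpreter and its OpenSSL build, and hashlib exposes it via hashlib.algorithms_available. The following is a minimal sketch of how get_available and is_available could be built on that set, assuming HashAlgorithm works along these lines; Algo is a hypothetical stand-in with only a subset of the members.

```python
import hashlib
from enum import Enum


class Algo(Enum):
    """Hypothetical stand-in mirroring a few HashAlgorithm members."""

    MD5 = "md5"
    SHA1 = "sha1"
    SHA256 = "sha256"
    SHA512 = "sha512"
    BLAKE2B = "blake2b"
    SHA3_256 = "sha3_256"

    @classmethod
    def get_available(cls):
        # algorithms_available lists the names hashlib.new() accepts here
        return [a for a in cls if a.value in hashlib.algorithms_available]

    @classmethod
    def is_available(cls, algo):
        return algo in cls.get_available()
```

Names in hashlib.algorithms_guaranteed (which include sha256) are always present, so Algo.is_available(Algo.SHA256) is always true.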

Command Line Interface

The package includes a powerful command-line tool for computing and verifying checksums.

Generating Checksums

# Basic usage with default SHA-256
checksum hash myfile.txt

# Specify a different algorithm
checksum hash myfile.txt --algorithm md5

# Generate multiple hashes at once
checksum hash myfile.txt --multi md5 --multi sha1 --multi sha256

# Process multiple files
checksum hash file1.txt file2.txt file3.txt

# Process directories recursively
checksum hash /path/to/directory --recursive

# Exclude files/directories
checksum hash /path/to/directory --recursive --exclude "*.tmp" --exclude "*cache*"

# Change output format
checksum hash myfile.txt --format json

# Save results to a file
checksum hash myfile.txt --output results.json --format json
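The recursive, filtered hashing that --recursive and --exclude perform can be approximated with pathlib and fnmatch. This is a sketch only: hash_tree is hypothetical, and it assumes an exclude pattern matches against any single path component (file or directory name), which may differ from how the CLI applies patterns.

```python
import fnmatch
import hashlib
from pathlib import Path


def _excluded(rel, patterns):
    # Assumption: a pattern excludes an entry if it matches any path component
    return any(fnmatch.fnmatch(part, pat) for part in rel.parts for pat in patterns)


def hash_tree(root, exclude=(), algorithm="sha256", block_size=65536):
    """Hypothetical helper: {relative POSIX path: hex digest} for files under root."""
    root = Path(root)
    results = {}
    # rglob still walks into excluded directories; their files are skipped
    # because the directory name appears among rel.parts
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or _excluded(rel, exclude):
            continue
        hasher = hashlib.new(algorithm)
        with path.open("rb") as fh:
            for block in iter(lambda: fh.read(block_size), b""):
                hasher.update(block)
        results[rel.as_posix()] = hasher.hexdigest()
    return results
```

Passing the returned dictionary to json.dumps gives output similar in spirit to --format json, although the CLI's actual JSON layout is not documented here.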

Verifying Checksums

# Verify a file against a checksum
checksum verify myfile.txt abc123def456...

# Specify algorithm
checksum verify myfile.txt abc123def456... --algorithm md5

# Batch verify from a checksums file
checksum batch checksums.txt

# Specify a base directory for relative paths in batch file
checksum batch checksums.txt --base-dir /path/to/files
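Verification comes down to recomputing the digest and comparing it with the expected value. The format of the batch checksums file is not documented here, so this sketch assumes a sha256sum-style listing of `<hexdigest>  <relative path>` per line (a leading `*` before the name marks binary mode in GNU coreutils output). The names verify and batch_verify are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def file_hexdigest(path, algorithm="sha256", block_size=65536):
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            hasher.update(block)
    return hasher.hexdigest()


def verify(path, expected, algorithm="sha256"):
    # Normalise case, then compare with hmac.compare_digest (constant time)
    return hmac.compare_digest(file_hexdigest(path, algorithm), expected.strip().lower())


def batch_verify(listing, base_dir=".", algorithm="sha256"):
    """Hypothetical helper: check each '<digest>  <path>' line, return {path: ok}.

    Assumes a sha256sum-style listing; missing files count as failures.
    """
    results = {}
    for line in Path(listing).read_text().splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        digest, _, name = line.partition(" ")
        name = name.lstrip(" *")  # drop the separator and optional binary marker
        target = Path(base_dir) / name
        results[name] = target.is_file() and verify(target, digest, algorithm)
    return results
```

Resolving names against base_dir mirrors the role of --base-dir for relative paths in the batch file.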

Getting Help

# List all commands
checksum --help

# Command-specific help
checksum hash --help
checksum verify --help
checksum batch --help

# List available hash algorithms
checksum hash --list-algorithms

API Reference

Core Classes

  • HashAlgorithm: Enum of supported hash algorithms
  • Checksum: Base class for computing file checksums
  • HashCache: Class for caching hash values
  • CachedChecksum: Wrapper that adds caching to Checksum operations

HashAlgorithm Enum

Member      Value
MD5         'md5'
SHA1        'sha1'
SHA224      'sha224'
SHA256      'sha256'
SHA384      'sha384'
SHA512      'sha512'
BLAKE2B     'blake2b'
BLAKE2S     'blake2s'
SHA3_224    'sha3_224'
SHA3_256    'sha3_256'
SHA3_384    'sha3_384'
SHA3_512    'sha3_512'

Performance Tips

  • For large files, a larger block size (e.g., 1 MiB = 1048576 bytes) can improve performance
  • For many small files, using the default block size is generally optimal
  • The cache avoids recomputing hashes for files that have not changed, which makes repeated checks of the same files much faster
  • When verifying a large number of files, use the batch command with --parallel for multi-threading
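Threads help with many files because file I/O and CPython's hashlib (when hashing large buffers) release the GIL, so several files can be read and hashed at once. The sketch below uses concurrent.futures.ThreadPoolExecutor; parallel_digests and its defaults (1 MiB blocks, 4 workers) are hypothetical and say nothing about how --parallel is implemented.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def _digest(path, algorithm, block_size):
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            hasher.update(block)
    return hasher.hexdigest()


def parallel_digests(paths, algorithm="sha256", block_size=1 << 20, workers=4):
    """Hypothetical helper: hash files concurrently, return {path: hexdigest}."""
    paths = list(paths)  # materialise so zip() below sees the same sequence
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, whatever the completion order
        digests = pool.map(lambda p: _digest(p, algorithm, block_size), paths)
        return dict(zip(paths, digests))
```

The best worker count depends on the storage: fast SSDs usually benefit from more threads than spinning disks, so it is worth measuring.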

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.



Download files

Download the file for your platform.

Source Distribution

pychecksumtool-0.1.2.tar.gz (14.1 kB)

Built Distribution

pychecksumtool-0.1.2-py3-none-any.whl (15.2 kB)

File details

Details for the file pychecksumtool-0.1.2.tar.gz.

File metadata

  • Download URL: pychecksumtool-0.1.2.tar.gz
  • Size: 14.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.3 Darwin/24.3.0

File hashes

Hashes for pychecksumtool-0.1.2.tar.gz
Algorithm Hash digest
SHA256 9f4bc632de458360238e4a52c1af8740eb546ba74d3e827b835c501065742e2e
MD5 1b5c0d53d9716d74eca7416b656a5c2d
BLAKE2b-256 f47d48596f72e9965610b802b80590f5c97953ff97cceedf40a9f3fc5f6e542a

File details

Details for the file pychecksumtool-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: pychecksumtool-0.1.2-py3-none-any.whl
  • Size: 15.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.3 Darwin/24.3.0

File hashes

Hashes for pychecksumtool-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 696bdd74c90699d339091391187befd428cfdb9d22f991592079cb506387cf29
MD5 de64cc7b16101359664268336cc7121a
BLAKE2b-256 ad115e14a280a03651e1be64d8991598e33e41b8de42cbd31974c29aa071b9d6