A lightweight Python library that uses MD5 checksums to track file changes and execute code only when files have been modified. PyChecksumCache maintains a persistent cache of file checksums, making it ideal for build systems, asset processors, incremental compilers, and any tools that need efficient change detection.
Project description
PyChecksumCache
A lightweight Python library that uses MD5 checksums to track file changes and execute code only when files have been modified. PyChecksumCache maintains a persistent cache of file checksums, making it ideal for build systems, asset processors, incremental compilers, and any tools that need efficient change detection.
How It Works
flowchart TD
A[File to Process] --> B{Calculate MD5}
B --> C[Current Checksum]
D[(Checksum Cache)] --> E{Compare with\nCached Checksum}
C --> E
E -->|Different or\nNot in Cache| F[Execute Code]
E -->|Same| G[Skip Processing]
F --> H[Update Cache\nwith New Checksum]
style A fill:#f9f9f9,stroke:#333,stroke-width:2px
style D fill:#c9e6ff,stroke:#0066cc,stroke-width:2px
style F fill:#d9f7be,stroke:#389e0d,stroke-width:2px
style G fill:#ffccc7,stroke:#cf1322,stroke-width:2px
Features
- Track changes to files using MD5 checksums
- Persistent cache storage in JSON format
- Execute functions only when file content has changed
- Monitor individual files or groups of files
- Batch transform multiple files with automatic output management
- Full async/await support for improved performance
- Consistent handling of both absolute and relative paths
- Optional base directory for project-relative paths
- Simple, intuitive API
- Works with Python 3.8+
- No external dependencies
Installation
pip install pychecksumcache
Or with uv:
uv pip install pychecksumcache
Quick Start
Basic Usage
from pychecksumcache import ChecksumCache
# Initialize with default cache file (.checksum_cache.json)
cache = ChecksumCache()
# Check if a file has changed since last check
if cache.has_changed("path/to/file.txt"):
print("File has changed!")
# Process the file...
# Execute a function only if the file has changed
def process_file(file_path):
with open(file_path, 'r') as f:
content = f.read()
return f"Processed {len(content)} characters"
result = cache.execute_if_changed("path/to/file.txt", process_file, "path/to/file.txt")
if result:
print(result)
else:
print("File unchanged, processing skipped")
# Check if any file in a list has changed
files_to_check = ["file1.txt", "file2.py", "file3.json"]
if cache.any_changed(files_to_check):
print("At least one file has changed!")
Using with Project-Relative Paths
from pychecksumcache import ChecksumCache
from pathlib import Path
# Set up a project directory structure
project_root = Path("/path/to/project")
cache_file = project_root / ".cache" / "checksums.json"
# Initialize with project-specific cache file and base directory
cache = ChecksumCache(
cache_file=cache_file,
base_dir=project_root # All relative paths will be resolved against this
)
# Now you can use project-relative paths throughout your code
if cache.has_changed("src/main.py"): # Will resolve to /path/to/project/src/main.py
print("Main source file has changed!")
# Transform files using relative paths
input_files = ["src/templates/page1.html", "src/templates/page2.html"]
output_folder = "build/html" # Will resolve to /path/to/project/build/html
cache.transform(input_files, output_folder, ".processed.html", process_template)
Async Usage
import asyncio
from pychecksumcache import ChecksumCache
# Define an async processing function
async def process_file_async(file_path):
# Simulate some async IO operation
with open(file_path, 'r') as f:
content = f.read()
await asyncio.sleep(0.1) # Simulate async work
return f"Processed {len(content)} characters asynchronously"
async def main():
cache = ChecksumCache()
# Check if a file has changed asynchronously
has_changed = await cache.has_changed_async("path/to/file.txt")
if has_changed:
print("File has changed!")
# Execute an async function only if file changed
result = await cache.execute_if_changed_async(
"path/to/file.txt",
process_file_async,
"path/to/file.txt"
)
if result:
print(result)
else:
print("File unchanged, processing skipped")
# Check if any files have changed asynchronously
files = ["file1.txt", "file2.py", "file3.json"]
if await cache.any_changed_async(files):
print("At least one file has changed!")
# Run the async example
asyncio.run(main())
Batch File Processing
The library provides a high-level API for batch transformations of multiple files:
# Define input files and output settings
input_files = ['file1.txt', 'file2.txt']
output_folder = 'output'
output_extension = '.generated.txt'
# Define a transformation function
def transform_func(input_path, output_path):
with open(input_path, 'r') as infile, open(output_path, 'w') as outfile:
content = infile.read()
# Apply some transformation
transformed = content.upper() # For example, convert to uppercase
outfile.write(transformed)
# Perform the transformation - only changed files will be processed
results = cache.transform(input_files, output_folder, output_extension, transform_func)
# See which files were processed
for output_file, was_transformed in results:
if was_transformed:
print(f"Transformed: {output_file}")
else:
print(f"Skipped (unchanged): {output_file}")
Async Batch Processing
Process multiple files concurrently for improved performance:
import asyncio
from pychecksumcache import ChecksumCache
# Define an async transformation function
async def transform_func_async(input_path, output_path):
# Read input file
with open(input_path, 'r') as infile:
content = infile.read()
# Simulate async processing
await asyncio.sleep(0.1)
transformed = content.upper()
# Write output file
with open(output_path, 'w') as outfile:
outfile.write(transformed)
async def main():
cache = ChecksumCache()
input_files = ['file1.txt', 'file2.txt', 'file3.txt', 'file4.txt']
output_folder = 'output'
# Process files concurrently (up to 5 at a time)
results = await cache.transform_async(
input_files,
output_folder,
'.processed.txt',
transform_func_async,
concurrency_limit=5
)
# Report results
for output_file, was_transformed in results:
if was_transformed:
print(f"Transformed: {output_file}")
else:
print(f"Skipped (unchanged): {output_file}")
# Run the async batch processing
asyncio.run(main())
Use Cases
- Build Systems: Rebuild only when source files change
- Static Site Generators: Regenerate pages only for modified content
- Data Processing: Process data files only when they've been updated
- Asset Compilation: Recompile assets only when source files change
- Testing: Run tests only for modified modules
- Documentation Generation: Update docs only for changed source files
- Code Generation: Generate code from templates only when inputs change
- Parallel Processing: Process multiple files concurrently with async support
- Project Tools: Work with project-relative paths for better portability
API Reference
ChecksumCache
cache = ChecksumCache(
cache_file=".checksum_cache.json",
base_dir=None # Defaults to current working directory
)
Basic Methods
calculate_md5(file_path): Calculate MD5 hash for a filehas_changed(file_path): Check if a file has changed since last checkexecute_if_changed(file_path, function, *args, **kwargs): Execute function only if file changedany_changed(file_list): Check if any file in the list has changedall_changed(file_list): Check if all files in the list have changedexecute_if_any_changed(file_list, function, *args, **kwargs): Execute if any file changed
Async Methods
calculate_md5_async(file_path): Calculate MD5 hash asynchronouslyhas_changed_async(file_path): Check if a file has changed asynchronouslyexecute_if_changed_async(file_path, function, *args, **kwargs): Execute function async if file changedany_changed_async(file_list): Check if any file changed asynchronouslyall_changed_async(file_list): Check if all files changed asynchronouslyexecute_if_any_changed_async(file_list, function, *args, **kwargs): Execute async if any
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pychecksumcache-1.0.0.tar.gz.
File metadata
- Download URL: pychecksumcache-1.0.0.tar.gz
- Upload date:
- Size: 21.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3fd58d23c6041952da67030e57d6744b5b460f544245dfb7f69772927cb156f6
|
|
| MD5 |
82abaff96b35768c221b295fced74557
|
|
| BLAKE2b-256 |
2818439046eda4fe4699e9ef48a786387f53a1b02f2185647ccb8e2a86db3ec2
|
File details
Details for the file pychecksumcache-1.0.0-py3-none-any.whl.
File metadata
- Download URL: pychecksumcache-1.0.0-py3-none-any.whl
- Upload date:
- Size: 9.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
42e1f8996daee7a1f146e99a473ae4fa0bd42b136b39f1ca27d5c4ca3e43b438
|
|
| MD5 |
f555490f357caf355b1da8de87e38689
|
|
| BLAKE2b-256 |
46c7e04953da4b3299446b3a502a8aeb5752b36eadadac99bf79e5d967e27da8
|