PyChecksumCache

A lightweight Python library that uses MD5 checksums to track file changes and execute code only when files have been modified. PyChecksumCache maintains a persistent cache of file checksums, making it ideal for build systems, asset processors, incremental compilers, and any tools that need efficient change detection.

How It Works

```mermaid
flowchart TD
    subgraph "Individual Processing"
        A[File to process] --> B{Calculate MD5}
        B --> C[Current checksum]
        D[(Checksum cache)] --> E{Compare with cached checksum}
        C --> E

        E -->|Different or not in cache| F[Execute code]
        E -->|Same| G[Skip processing]

        F --> H[Update cache with new checksum]
    end

    style A fill:#f9f9f9,stroke:#333,stroke-width:2px
    style D fill:#c9e6ff,stroke:#0066cc,stroke-width:2px
    style F fill:#d9f7be,stroke:#389e0d,stroke-width:2px
    style G fill:#ffccc7,stroke:#cf1322,stroke-width:2px
```

```mermaid
flowchart TD
    subgraph "Individual Processing"
        A1[Input Files] --> B1{Check\nChecksums}
        B1 -->|Changed| C1[Process\nEach Changed File]
        B1 -->|Unchanged| D1[Skip\nUnchanged Files]
        C1 --> E1[Multiple\nOutput Files]
    end

    subgraph "Aggregate Processing"
        A2[Input Files] --> B2{Any File\nChanged?}
        B2 -->|Yes| C2[Process All Files\nTogether]
        B2 -->|No| D2[Skip\nProcessing]
        C2 --> E2[Single\nOutput File]
    end

    style A1 fill:#f9f9f9,stroke:#333,stroke-width:2px
    style A2 fill:#f9f9f9,stroke:#333,stroke-width:2px

    style C1 fill:#d9f7be,stroke:#389e0d,stroke-width:2px
    style C2 fill:#d9f7be,stroke:#389e0d,stroke-width:2px

    style D1 fill:#ffccc7,stroke:#cf1322,stroke-width:2px
    style D2 fill:#ffccc7,stroke:#cf1322,stroke-width:2px

    style E1 fill:#b5f5ec,stroke:#13a8a8,stroke-width:2px
    style E2 fill:#ffd666,stroke:#d48806,stroke-width:2px
```
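The per-file flow in the first diagram (calculate MD5, compare with the cache, execute or skip, update the cache) can be sketched with the standard library alone. This is a minimal illustration, not PyChecksumCache's actual implementation: the helper names `file_md5`, `load_cache`, `save_cache` and `run_if_changed` are hypothetical, and the cache is assumed to be a flat JSON object mapping file paths to MD5 hex digests, like the example cache shown under Features.

```python
import hashlib
import json
import os
from typing import Callable


def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_cache(cache_path: str) -> dict[str, str]:
    """Load the path -> checksum mapping, or start empty if no cache file exists yet."""
    if not os.path.exists(cache_path):
        return {}
    with open(cache_path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_cache(cache_path: str, cache: dict[str, str]) -> None:
    """Persist the mapping as pretty-printed JSON."""
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(cache, f, indent=2)


def run_if_changed(path: str, action: Callable[[str], None], cache_path: str) -> bool:
    """Hypothetical helper: run action(path) only if the file's checksum changed.

    Returns True if the action ran, False if it was skipped.
    """
    cache = load_cache(cache_path)
    current = file_md5(path)
    if cache.get(path) == current:
        return False           # Same checksum: skip processing
    action(path)               # Different or not in cache: execute code
    cache[path] = current      # Update cache with the new checksum
    save_cache(cache_path, cache)
    return True
```

Calling `run_if_changed` twice on an unchanged file runs the action only the first time. The sketch updates the cache only after the action returns, so if the action raises, the file stays "dirty" and is retried on the next run.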

Features

  • Track changes to files using MD5 checksums
  • Persistent cache storage in JSON format, for example:

    ```json
    {
      "tests/file1.txt": "ff1e0283123d14cf8bd52ac449770017",
      "tests/file2.txt": "b445bf8b5da4cf880dd14e98c18c1bfa"
    }
    ```

  • Execute functions only when file content has changed
  • Aggregate multiple files into a single output file
  • Batch transform multiple files with automatic output management
  • Full async/await support
  • Works with Python 3.10+
  • No external dependencies
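Aggregate processing, as drawn in the second How It Works diagram, rebuilds one output file whenever any input's checksum differs from the cache. The following is a minimal standard-library sketch under the same assumed JSON cache layout; `aggregate_if_changed`, `md5_of` and `concatenate` are hypothetical names, not part of PyChecksumCache's API. Rebuilding when the output file is missing is an extra design choice of this sketch, not something the diagram shows.

```python
import hashlib
import json
import os
from typing import Callable


def md5_of(path: str) -> str:
    """Return the MD5 hex digest of a file's bytes."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def aggregate_if_changed(
    inputs: list[str],
    output_path: str,
    combine: Callable[[list[str], str], None],
    cache_path: str,
) -> bool:
    """Hypothetical helper: rebuild output_path from all inputs if any input changed.

    Returns True if combine() ran, False if every checksum matched the cache.
    """
    cache: dict[str, str] = {}
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            cache = json.load(f)

    current = {p: md5_of(p) for p in inputs}
    changed = any(cache.get(p) != digest for p, digest in current.items())
    # Sketch-specific choice: also rebuild if the output was deleted
    if not changed and os.path.exists(output_path):
        return False

    combine(inputs, output_path)  # Process all files together into one output
    cache.update(current)
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(cache, f, indent=2)
    return True


def concatenate(inputs: list[str], output_path: str) -> None:
    """Example combine step: concatenate the inputs in the given order."""
    with open(output_path, "w", encoding="utf-8") as out:
        for p in inputs:
            with open(p, "r", encoding="utf-8") as f:
                out.write(f.read())
```

Because the decision is "any file changed", a single edited input triggers reprocessing of the whole set, which is what a bundler or concatenation step needs.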

Installation

```bash
pip install pychecksumcache
```

Or with uv:

```bash
uv pip install pychecksumcache
```

Quick Start

Basic Usage

```python
import os

from pychecksumcache import PyChecksumCache

input_files = ["tests/file1.txt", "tests/file2.txt"]
output_folder = "output"
output_extension = ".generated.txt"


def transform_func(input_path, output_path):
    """Example transformation: write an uppercased copy of the input file."""
    with open(input_path, "r") as infile, open(output_path, "w") as outfile:
        content = infile.read()
        outfile.write(content.upper())


# Make sure the output folder exists (harmless if transform() also creates it)
os.makedirs(output_folder, exist_ok=True)

# Run transform_func only for input files whose checksum has changed
results = PyChecksumCache().transform(input_files, output_folder, output_extension, transform_func)

# Each result is an (output_file, was_transformed) pair: on the first run every
# file is transformed; on a later run with unchanged inputs every file is skipped
for output_file, was_transformed in results:
    status = "Transformed" if was_transformed else "Skipped (unchanged)"
    print(f"{status}: {output_file}")

# Checksums are persisted to checksum_cache.json in the working directory
assert os.path.exists("checksum_cache.json")
```
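To make the batch-transform idea concrete, here is a hypothetical standard-library sketch of a function that yields the same `(output_file, was_transformed)` pairs as the example above. It is not PyChecksumCache's implementation: the name `batch_transform`, the output naming rule (input file stem plus `output_extension`, placed in `output_folder`) and the default cache file name `sketch_cache.json` are all assumptions made for illustration.

```python
import hashlib
import json
import os
from pathlib import Path
from typing import Callable


def batch_transform(
    input_files: list[str],
    output_folder: str,
    output_extension: str,
    transform_func: Callable[[str, str], None],
    cache_path: str = "sketch_cache.json",  # hypothetical default
) -> list[tuple[str, bool]]:
    """Hypothetical batch transform: re-run transform_func only for changed inputs."""
    cache: dict[str, str] = {}
    if os.path.exists(cache_path):
        cache = json.loads(Path(cache_path).read_text(encoding="utf-8"))

    os.makedirs(output_folder, exist_ok=True)
    results: list[tuple[str, bool]] = []
    for input_path in input_files:
        # Assumed naming rule: "tests/file1.txt" -> "<output_folder>/file1<output_extension>"
        output_path = os.path.join(output_folder, Path(input_path).stem + output_extension)
        digest = hashlib.md5(Path(input_path).read_bytes()).hexdigest()
        if cache.get(input_path) == digest and os.path.exists(output_path):
            results.append((output_path, False))  # Unchanged and output present: skip
            continue
        transform_func(input_path, output_path)   # New or changed: transform
        cache[input_path] = digest
        results.append((output_path, True))

    Path(cache_path).write_text(json.dumps(cache, indent=2), encoding="utf-8")
    return results
```

Writing the cache once at the end keeps the sketch simple; a more robust version might save after each file so that an interrupted batch does not redo completed work.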


Download files

Download the file for your platform.

Source Distribution

pychecksumcache-1.1.0.tar.gz (21.4 kB, source)

Built Distribution

pychecksumcache-1.1.0-py3-none-any.whl (8.7 kB, Python 3)

File details

Details for the file pychecksumcache-1.1.0.tar.gz.

File metadata

  • Download URL: pychecksumcache-1.1.0.tar.gz
  • Upload date:
  • Size: 21.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for pychecksumcache-1.1.0.tar.gz
Algorithm Hash digest
SHA256 310825b5f07172409a993fc7790e6af2267c837832505daf0fed44eb58aa1c1e
MD5 68d1a703572c420da446114713f3e8cf
BLAKE2b-256 7d7e78c8424d5cd0744d403a260d993ac6b0b7404e123da132a62c6e51bf868f


File details

Details for the file pychecksumcache-1.1.0-py3-none-any.whl.

File hashes

Hashes for pychecksumcache-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4033ae3a2602e4ddd3fa37bcf553a41999cdd37c827f9a98c0619d6729c271e9
MD5 345dd0a923906e5bc8784cb2136487fe
BLAKE2b-256 bba370dfaef58dd2697d5ca00786f5eaf66a50b7e8f10c7710446ad7d0b039a9

