Skip pipeline steps when inputs are unchanged — content-aware, with module dependency tracking

pycache_skip

Skip pipeline steps when their inputs have not changed.

uv add pycache_skip

What it does

cache_skip wraps a pipeline step function and skips re-execution when all inputs are unchanged. It stores a compact state file (.input_state.json) alongside each output directory. On subsequent calls it compares the current inputs against the stored state and only reruns the function when something actually changed.
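The skip-on-unchanged idea can be illustrated with a toy decorator. This is only a sketch under stated assumptions, not the library's implementation: `toy_skip` and `fingerprint` are hypothetical names, the state file is assumed to live inside the output directory, and SHA-256 from the standard library stands in for whatever hash the package uses.

```python
import functools
import hashlib
import json
import shutil
from pathlib import Path

STATE_FILE = ".input_state.json"  # file name taken from the description above


def fingerprint(root: Path) -> str:
    """Hash the relative names and bytes of every file under root (or root itself)."""
    digest = hashlib.sha256()
    files = [root] if root.is_file() else sorted(p for p in root.rglob("*") if p.is_file())
    for f in files:
        name = f.name if f == root else f.relative_to(root).as_posix()
        digest.update(name.encode())
        digest.update(f.read_bytes())
    return digest.hexdigest()


def toy_skip(func):
    """Hypothetical stand-in for cache_skip: one input Path, one output directory."""

    @functools.wraps(func)
    def wrapper(src: Path, *, _output: Path) -> Path:
        state_path = _output / STATE_FILE  # assumption: state lives inside the output dir
        current = {"input": fingerprint(src)}
        if state_path.exists() and json.loads(state_path.read_text()) == current:
            return _output  # inputs unchanged: skip the step
        if _output.exists():
            shutil.rmtree(_output)  # start the rerun from a clean directory
        _output.mkdir(parents=True)
        result = func(src, _output=_output)
        state_path.write_text(json.dumps(current))  # record state only after success
        return result

    return wrapper
```

Writing the state file only after the step returns means a step that raises is retried on the next call instead of being treated as cached.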

Usage

Basic example (single input directory)

from pathlib import Path
from cache_skip import cache_skip, Dirmaker

dm = Dirmaker(Path("/data/pipeline/run-001"))

@cache_skip
def step_transform(raw: Path, *, _output: Path) -> Path:
    # heavy transformation ...
    return _output

# First call — runs the function and records input state.
step_transform(Path("/data/raw"), _output=dm.path_for("transform"))

# Second call — skips the function, returns the output path immediately.
step_transform(Path("/data/raw"), _output=dm.path_for("transform"))

Example with non-Path args

Non-Path arguments (dates, strings, ints, etc.) are also part of the cache key. Changing them triggers a rerun.

import datetime as dt
from pathlib import Path

from cache_skip import cache_skip

@cache_skip(track_dependencies=False)
def step_build_config(
    schedule_date: dt.date,
    template: Path,
    *,
    _output: Path,
) -> Path:
    ...

# Changing schedule_date from 2025-01-01 to 2025-01-02 invalidates the cache.
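
To make the effect of a changed scalar argument concrete, here is a minimal sketch of a repr()-based argument hash. `args_hash` is a hypothetical helper, not part of the package, and SHA-256 is used in place of the library's own hash function.

```python
import datetime as dt
import hashlib
from pathlib import Path


def args_hash(args: tuple, kwargs: dict) -> str:
    """Hash the repr() of every argument that is neither a Path nor `_output`."""
    parts = [repr(a) for a in args if not isinstance(a, Path)]
    parts += [
        f"{name}={value!r}"
        for name, value in sorted(kwargs.items())
        if name != "_output" and not isinstance(value, Path)
    ]
    # Join with a separator so ("ab", "c") and ("a", "bc") hash differently.
    return hashlib.sha256("\x1f".join(parts).encode()).hexdigest()


template = Path("config.tmpl")
before = args_hash((dt.date(2025, 1, 1), template), {"_output": Path("out")})
after = args_hash((dt.date(2025, 1, 2), template), {"_output": Path("out")})
same = args_hash((dt.date(2025, 1, 1), template), {"_output": Path("elsewhere")})

print(before != after)  # the date changed, so the hash changes
print(before == same)   # only _output changed, so the hash is stable
```

A repr()-based key is only stable for values whose repr() is deterministic. Objects that fall back to the default repr embed a memory address and would look changed on every run.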

Dirmaker companion

Dirmaker allocates named output directories under a staging root. Use path_for(name) to resolve a path without side effects (the form to pass to a @cache_skip step), or new_output_dir(name) to delete and recreate the directory explicitly.

dm = Dirmaker(Path("/data/pipeline/run-001"))

# Pass the path to the decorated step; the decorator manages deletion on rerun.
step_transform(Path("/data/raw"), _output=dm.path_for("transform"))

# Or manage the directory yourself:
out = dm.new_output_dir("transform")   # deletes existing, creates fresh
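
The difference between the two methods can be sketched with a small stand-in class. `SketchDirmaker` is hypothetical and only mirrors the behavior described above; the real Dirmaker may differ in details such as naming rules or error handling.

```python
import shutil
from pathlib import Path


class SketchDirmaker:
    """Hypothetical stand-in that mirrors the two documented methods."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def path_for(self, name: str) -> Path:
        # Pure path arithmetic: nothing is created or deleted on disk.
        return self.root / name

    def new_output_dir(self, name: str) -> Path:
        # Delete any existing directory, then create an empty one.
        target = self.path_for(name)
        if target.exists():
            shutil.rmtree(target)
        target.mkdir(parents=True)
        return target
```

Keeping path_for free of side effects matters for the decorator case: resolving the output path must not wipe a directory whose cached result is about to be reused.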

How invalidation works

Three-tier change detection on every call after the first:

  1. Args hash — all non-Path, non-_output arguments are hashed via repr(). A change in any scalar argument (date, string, int, …) triggers a rerun immediately.

  2. Dependency hash — the source files of the decorated function and of all modules it imports (found by static AST analysis) are hashed. Editing the function's source code or any of those modules triggers a rerun. Disable with track_dependencies=False.

  3. File content hash — every file under each input Path is compared. Metadata (mtime, inode, size) is checked first as a fast path. If the metadata is identical, the stored hash is trusted. If the metadata drifted but the content hash still matches, the state file is updated silently without a rerun (this handles rsync / cp -p copies with timestamp noise).
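
The metadata fast path in the third tier can be sketched for a single file as follows. The helper names and the shape of the stored record are assumptions made for illustration, and SHA-256 replaces the library's hash.

```python
import hashlib
from pathlib import Path


def file_meta(path: Path) -> dict:
    """Cheap metadata that can be read without opening the file."""
    st = path.stat()
    return {"mtime_ns": st.st_mtime_ns, "inode": st.st_ino, "size": st.st_size}


def content_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_file(path: Path, stored: dict) -> str:
    """Return 'unchanged', 'refreshed' or 'changed' for one tracked file.

    `stored` is a hypothetical record of the form {"meta": {...}, "hash": "..."}.
    """
    meta = file_meta(path)
    if meta == stored["meta"]:
        return "unchanged"  # fast path: metadata identical, trust the stored hash
    if content_hash(path) == stored["hash"]:
        stored["meta"] = meta  # timestamp noise only: refresh the record, no rerun
        return "refreshed"
    return "changed"  # content really differs: the step must rerun
```

Only the 'changed' result would force a rerun. The 'refreshed' case pays for one full read of the file, after which later calls hit the fast path again.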

track_dependencies

@cache_skip(track_dependencies=False)
def step(...):
    ...

Set track_dependencies=False to skip module source hashing. This is useful in tests, or when the function imports large, rarely-changing libraries and startup cost matters.
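
For a sense of what dependency tracking involves, the sketch below collects imports with the standard ast module and hashes the source files it can locate. `imported_modules` and `dependency_hash` are hypothetical names. Unlike the tracking described above, this version looks only one level deep and ignores relative imports.

```python
import ast
import hashlib
import importlib.util
from pathlib import Path


def imported_modules(source: str) -> set[str]:
    """Collect module names from `import x` and absolute `from x import y` statements."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)
    return names


def dependency_hash(source_file: Path) -> str:
    """Hash a source file together with the .py files of its direct imports."""
    source = source_file.read_text()
    digest = hashlib.sha256(source.encode())
    for name in sorted(imported_modules(source)):
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None  # module cannot be located: leave it out of the hash
        origin = spec.origin if spec else None
        if origin and origin.endswith(".py"):  # skips built-in and extension modules
            digest.update(Path(origin).read_bytes())
    return digest.hexdigest()
```

Even this shallow version has to locate and read one file per import, which gives a feel for why skipping the check can pay off when a step imports large libraries.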

Comparison with auto_skip

cache_skip is a simpler, self-contained alternative to auto_skip:

Feature              cache_skip                  auto_skip
-------------------  --------------------------  --------------------
Input detection      explicit Path args          strace / audit hooks
Non-Path args        hashed                      ignored
Module dep tracking  static AST                  runtime import list
External deps        xxhash, loguru              heavier stack
Output format        dir with .input_state.json  opaque cache store
