Skip pipeline steps when inputs are unchanged — content-aware, with module dependency tracking

pycache_skip

Skip pipeline steps when their inputs have not changed.

uv add pycache_skip

What it does

cache_skip wraps a pipeline step function and skips re-execution when all inputs are unchanged. It stores a compact state file (.input_state.json) alongside each output directory. On subsequent calls it compares the current inputs against the stored state and only reruns the function when something actually changed.
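The compare-and-skip idea can be sketched in a few lines. This is a hypothetical helper (`run_or_skip`), not cache_skip's code. It assumes the inputs have already been summarised into a JSON-serialisable `state` dict. It also wipes the output directory before a rerun, modelled on the note in the Dirmaker section that the decorator manages deletion on rerun.

```python
import json
import shutil
from pathlib import Path
from typing import Callable

STATE_NAME = ".input_state.json"  # state-file name taken from the description above


def run_or_skip(step: Callable[[Path], None], state: dict, output: Path) -> bool:
    """Run `step` into `output` unless `state` equals the stored state.

    Hypothetical helper, not part of cache_skip. `state` must hold only
    JSON-native types (str, int, float, bool, None, list, dict) so that it
    still compares equal after a JSON round-trip.
    Returns True if the step ran, False if it was skipped.
    """
    state_file = output / STATE_NAME
    if state_file.exists():
        stored = json.loads(state_file.read_text())
        if stored == state:
            return False  # nothing changed: skip the step entirely
    # Something changed (or there is no state yet): start from an empty directory.
    if output.exists():
        shutil.rmtree(output)
    output.mkdir(parents=True)
    step(output)
    # Write the state only after the step returned without raising,
    # so a failed run is retried on the next call.
    state_file.write_text(json.dumps(state, sort_keys=True))
    return True
```

Recording the state last means a crash mid-step leaves no state file behind, so the next call reruns instead of trusting a half-written output.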

Usage

Basic example (single input directory)

from pathlib import Path
from cache_skip import cache_skip, Dirmaker

dm = Dirmaker(Path("/data/pipeline/run-001"))

@cache_skip
def step_transform(raw: Path, *, _output: Path) -> Path:
    # heavy transformation ...
    return _output

# First call — runs the function and records input state.
step_transform(Path("/data/raw"), _output=dm.path_for("transform"))

# Second call — skips the function, returns the output path immediately.
step_transform(Path("/data/raw"), _output=dm.path_for("transform"))

Example with non-Path args

Non-Path arguments (dates, strings, ints, etc.) are also part of the cache key. Changing them triggers a rerun.

import datetime as dt

@cache_skip(track_dependencies=False)
def step_build_config(
    schedule_date: dt.date,
    template: Path,
    *,
    _output: Path,
) -> Path:
    ...

# Changing schedule_date from 2025-01-01 to 2025-01-02 invalidates the cache.
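A sketch of how such an args hash might be computed. The `args_hash` function is hypothetical: it binds the call's arguments, drops `_output` and every `Path`, and hashes the `repr()` of the rest. It uses `hashlib.sha256` from the standard library in place of xxhash.

```python
import datetime as dt
import hashlib
import inspect
from pathlib import Path


def args_hash(fn, *args, **kwargs) -> str:
    """Hash every argument of `fn` that is neither a Path nor `_output`.

    Hypothetical sketch: repr() serialises each value, sha256 stands in for xxhash.
    """
    bound = inspect.signature(fn).bind(*args, **kwargs)
    bound.apply_defaults()  # defaults count too, so changing a default also invalidates
    scalars = {
        name: repr(value)
        for name, value in bound.arguments.items()
        if name != "_output" and not isinstance(value, Path)
    }
    payload = repr(sorted(scalars.items()))  # sorted by name for a stable order
    return hashlib.sha256(payload.encode()).hexdigest()


def step_build_config(schedule_date: dt.date, template: Path, *, _output: Path) -> Path:
    return _output


h1 = args_hash(step_build_config, dt.date(2025, 1, 1), Path("t.toml"), _output=Path("out"))
h2 = args_hash(step_build_config, dt.date(2025, 1, 2), Path("t.toml"), _output=Path("out"))
h3 = args_hash(step_build_config, dt.date(2025, 1, 1), Path("other.toml"), _output=Path("o2"))
```

Here `h1 != h2` because the date changed, while `h1 == h3` because Path arguments and `_output` are left to the file-content tier. One consequence of keying on `repr()`: an object whose repr embeds its memory address (the default `object.__repr__`) would produce a different hash on every run.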

Dirmaker companion

Dirmaker allocates named output directories under a staging root. Use path_for(name) to resolve a directory's path without creating or deleting anything (the form to pass to a @cache_skip step), or new_output_dir(name) to delete any existing directory of that name and create a fresh, empty one.

dm = Dirmaker(Path("/data/pipeline/run-001"))

# Pass the resolved path to the decorated step; the decorator deletes and recreates it on rerun.
step_transform(Path("/data/raw"), _output=dm.path_for("transform"))

# Or manage the directory yourself:
out = dm.new_output_dir("transform")   # deletes existing, creates fresh
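The difference between the two Dirmaker methods can be illustrated with a stand-in class. `SketchDirmaker` is hypothetical and written only from the description above; the real Dirmaker may behave differently in details such as error handling.

```python
import shutil
from pathlib import Path


class SketchDirmaker:
    """Hypothetical stand-in for Dirmaker, written from the README description."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def path_for(self, name: str) -> Path:
        # Pure path arithmetic: nothing is created or deleted on disk.
        return self.root / name

    def new_output_dir(self, name: str) -> Path:
        # Delete whatever is already there, then create an empty directory.
        out = self.path_for(name)
        if out.exists():
            shutil.rmtree(out)
        out.mkdir(parents=True)
        return out
```

Keeping `path_for` side-effect free matters for cache_skip: if resolving the path wiped the directory, a skipped step would lose the output it was supposed to reuse.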

How invalidation works

Three-tier change detection on every call after the first:

  1. Args hash — all non-Path, non-_output arguments are hashed via repr(). A change in any scalar argument (date, string, int, …) triggers a rerun immediately.

  2. Dependency hash — the source files of the decorated function and all modules it imports (static AST analysis) are hashed. Editing the function's source code triggers a rerun. Disable with track_dependencies=False.

  3. File content hash — every file under each input Path is compared. Metadata (mtime, inode, size) is checked first as a fast path. If metadata is identical the stored hash is trusted. If metadata drifted but content hash matches, the state file is updated silently without a rerun (handles rsync / cp -p copies with timestamp noise).
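Tier 3's metadata fast path can be sketched as below. `snapshot` and `check` are hypothetical helpers, not cache_skip functions, and `hashlib.sha256` stands in for xxhash. `check` returns whether anything changed, plus the state that should be written back. That covers the silent update when only metadata drifted.

```python
import hashlib
from pathlib import Path


def file_meta(p: Path) -> list:
    # A list (not a tuple) so it compares equal to the JSON-decoded stored value.
    st = p.stat()
    return [st.st_mtime_ns, st.st_ino, st.st_size]


def content_hash(p: Path) -> str:
    h = hashlib.sha256()
    with p.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()


def list_files(root: Path) -> list:
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def snapshot(root: Path) -> dict:
    """Full state: metadata plus content hash for every file under `root`."""
    return {
        rel: {"meta": file_meta(root / rel), "hash": content_hash(root / rel)}
        for rel in list_files(root)
    }


def check(root: Path, stored: dict) -> tuple:
    """Return (changed, state_to_store) for the files under `root`."""
    if list_files(root) != sorted(stored):
        return True, snapshot(root)  # a file was added or removed
    refreshed = {}
    for rel, entry in stored.items():
        meta = file_meta(root / rel)
        if meta == entry["meta"]:
            refreshed[rel] = entry  # fast path: trust the stored hash
            continue
        digest = content_hash(root / rel)
        if digest != entry["hash"]:
            return True, snapshot(root)  # real content change: rerun
        # Metadata drifted but content is identical: refresh the state, no rerun.
        refreshed[rel] = {"meta": meta, "hash": digest}
    return False, refreshed
```

Only files whose metadata moved get re-read, so an untouched input tree costs one `stat()` per file rather than a full read.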

track_dependencies

@cache_skip(track_dependencies=False)
def step(src: Path, *, _output: Path) -> Path:
    ...

Set track_dependencies=False to skip module source hashing. Useful when the function imports large, rarely-changing libraries and startup cost matters, or in tests.
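A rough sketch of what static-AST dependency hashing involves, and why it costs something at startup. The helpers below are hypothetical. They read one source file, collect its absolute imports with `ast`, locate each module's `.py` file via `importlib.util.find_spec`, and hash everything together. They follow imports only one level deep, skip relative imports, and skip modules with no `.py` origin (built-in, frozen, or C extensions). The real implementation may go further on each of these points.

```python
import ast
import hashlib
import importlib.util
from pathlib import Path


def imported_modules(source: str) -> set:
    """Module names imported by `source` (static scan, absolute imports only)."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)
    return names


def dependency_hash(source_file: Path) -> str:
    """Hash `source_file` plus the .py source of each module it imports directly."""
    h = hashlib.sha256(source_file.read_bytes())
    for name in sorted(imported_modules(source_file.read_text())):
        try:
            # Note: for dotted names this imports the parent package as a side effect.
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            continue  # module (or its parent) cannot be located
        origin = spec.origin if spec else None
        if origin and origin.endswith(".py"):
            h.update(name.encode())
            h.update(Path(origin).read_bytes())
    return h.hexdigest()
```

Every imported module's file is read and hashed on each call, which is the cost that `track_dependencies=False` avoids.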

Comparison with auto_skip

cache_skip is a simpler, self-contained alternative to auto_skip:

| Feature | cache_skip | auto_skip |
|---|---|---|
| Input detection | explicit Path args | strace / audit hooks |
| Non-Path args | hashed | ignored |
| Module dep tracking | static AST | runtime import list |
| External deps | xxhash, loguru | heavier stack |
| Output format | dir with .input_state.json | opaque cache store |


Download files

Source Distribution

pycache_skip-0.1.0.tar.gz (42.6 kB)

Uploaded Source

Built Distribution

pycache_skip-0.1.0-py3-none-any.whl (12.5 kB)

Uploaded Python 3

File details

Details for the file pycache_skip-0.1.0.tar.gz.

File metadata

  • Download URL: pycache_skip-0.1.0.tar.gz
  • Upload date:
  • Size: 42.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for pycache_skip-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 14c51f52d297be3d08afdb98dc550d3052cd9b3faeba044d3123bb05be6e253a |
| MD5 | d1a2ffa6d783c1e4586383ca8a0fb5e6 |
| BLAKE2b-256 | 1cefe8952ccbe38b02f8b05c9b7116bbc43b58595816253afa6463b95730795a |

File details

Details for the file pycache_skip-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pycache_skip-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 12.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for pycache_skip-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 310712c3fd3ffa2c187fba8e5c94ec4d929742356b3afaf8882c796fe95221f7 |
| MD5 | 50ce6b964d21dd8b816b5401239d8e6c |
| BLAKE2b-256 | 8f6204693591ac23c02a9edfeb075f326d40e60646aed8023bfed3441a3e349e |
