Skip pipeline steps when inputs are unchanged — content-aware, with module dependency tracking
Project description
pycache_skip
Skip pipeline steps when their inputs have not changed.
uv add pycache_skip
What it does
cache_skip wraps a pipeline step function and skips re-execution when all
inputs are unchanged. It stores a compact state file (.input_state.json)
alongside each output directory. On subsequent calls it compares the current
inputs against the stored state and only reruns the function when something
actually changed.
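The compare-then-rerun decision described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the helper names `should_run` and `record_state` and the layout of the state dictionary are assumptions; only the `.input_state.json` file name comes from the description.

```python
import json
import tempfile
from pathlib import Path

STATE_FILE = ".input_state.json"  # file name taken from the description above


def should_run(output_dir: Path, current_state: dict) -> bool:
    """Return True when no stored state exists or it differs from current_state."""
    state_path = output_dir / STATE_FILE
    if not state_path.is_file():
        return True
    try:
        stored = json.loads(state_path.read_text())
    except json.JSONDecodeError:
        return True  # an unreadable state file is treated as "changed"
    return stored != current_state


def record_state(output_dir: Path, current_state: dict) -> None:
    """Write the state next to the step's output (hypothetical layout)."""
    output_dir.mkdir(parents=True, exist_ok=True)
    (output_dir / STATE_FILE).write_text(json.dumps(current_state, sort_keys=True))


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "transform"
    state = {"args": "abc", "files": {"a.csv": "123"}}
    first = should_run(out, state)                      # no state file yet
    record_state(out, state)
    second = should_run(out, state)                     # identical state
    third = should_run(out, {**state, "args": "xyz"})   # one field changed
```

In this sketch the first call reports that the step must run, the second reports a skip, and the third reports a rerun because the stored and current states differ.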
Usage
Basic example (single input directory)
from pathlib import Path
from cache_skip import cache_skip, Dirmaker
dm = Dirmaker(Path("/data/pipeline/run-001"))
@cache_skip
def step_transform(raw: Path, *, _output: Path) -> Path:
    # heavy transformation ...
    return _output
# First call — runs the function and records input state.
step_transform(Path("/data/raw"), _output=dm.path_for("transform"))
# Second call — skips the function, returns the output path immediately.
step_transform(Path("/data/raw"), _output=dm.path_for("transform"))
Example with non-Path args
Non-Path arguments (dates, strings, ints, etc.) are also part of the cache
key. Changing them triggers a rerun.
import datetime as dt
@cache_skip(track_dependencies=False)
def step_build_config(
    schedule_date: dt.date,
    template: Path,
    *,
    _output: Path,
) -> Path:
    ...
# Changing schedule_date from 2025-01-01 to 2025-01-02 invalidates the cache.
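To make the effect of non-Path arguments concrete, here is a rough sketch of a repr()-based key. The function name `args_hash`, the separator, the template path, and the use of SHA-256 from the standard library are all assumptions made for illustration; the package itself lists xxhash as a dependency and may build its key differently.

```python
import datetime as dt
import hashlib
from pathlib import Path


def args_hash(args: tuple, kwargs: dict) -> str:
    """Hash the repr() of every argument that is not a Path and not `_output`."""
    parts = [repr(a) for a in args if not isinstance(a, Path)]
    parts += [
        f"{name}={value!r}"
        for name, value in sorted(kwargs.items())
        if name != "_output" and not isinstance(value, Path)
    ]
    # Join with a separator so ("ab", "c") and ("a", "bc") hash differently.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


template = Path("/data/templates/base.yaml")  # hypothetical input path
h1 = args_hash((dt.date(2025, 1, 1), template), {"_output": Path("/tmp/out")})
h2 = args_hash((dt.date(2025, 1, 2), template), {"_output": Path("/tmp/out")})
h3 = args_hash((dt.date(2025, 1, 1), template), {"_output": Path("/tmp/other")})
```

Here `h1` and `h2` differ because the date changed, while `h1` and `h3` match because only `_output` changed. One caveat of any repr()-based key: objects whose default repr() includes a memory address produce a different hash on every run.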
Dirmaker companion
Dirmaker allocates named output directories under a staging root. Use
path_for(name) to resolve the path without side effects (for @cache_skip),
or new_output_dir(name) to delete and recreate explicitly.
dm = Dirmaker(Path("/data/pipeline/run-001"))
# Pass path to decorator — decorator manages deletion on rerun.
step_transform(raw, _output=dm.path_for("transform"))
# Or manage the directory yourself:
out = dm.new_output_dir("transform") # deletes existing, creates fresh
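The difference between the two methods can be illustrated with a small stand-in class. `DirmakerSketch` is hypothetical and only mirrors the behaviour described above (resolve without side effects versus delete and recreate); the real Dirmaker may do more.

```python
import shutil
import tempfile
from pathlib import Path


class DirmakerSketch:
    """Hypothetical stand-in that mimics the two documented methods."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def path_for(self, name: str) -> Path:
        # Pure path arithmetic: nothing is created or deleted.
        return self.root / name

    def new_output_dir(self, name: str) -> Path:
        # Remove any previous output, then create a fresh empty directory.
        path = self.path_for(name)
        if path.exists():
            shutil.rmtree(path)
        path.mkdir(parents=True)
        return path


with tempfile.TemporaryDirectory() as tmp:
    dm = DirmakerSketch(Path(tmp) / "run-001")
    resolved = dm.path_for("transform")
    existed_before = resolved.exists()        # path_for did not create it
    out = dm.new_output_dir("transform")
    (out / "stale.txt").write_text("old")
    out = dm.new_output_dir("transform")      # wipes stale.txt
    leftover = list(out.iterdir())
    same_path = out == resolved
```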
How invalidation works
Three-tier change detection on every call after the first:
- Args hash: all non-Path, non-_output arguments are hashed via repr(). A change in any scalar argument (date, string, int, …) triggers a rerun immediately.
- Dependency hash: the source files of the decorated function and all modules it imports (static AST analysis) are hashed. Editing the function's source code triggers a rerun. Disable with track_dependencies=False.
- File content hash: every file under each input Path is compared. Metadata (mtime, inode, size) is checked first as a fast path. If the metadata is identical, the stored hash is trusted. If the metadata drifted but the content hash matches, the state file is updated silently without a rerun (handles rsync / cp -p copies with timestamp noise).
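The file-level tier, with its metadata fast path and content-hash fallback, might look roughly like this. The function names, the three return labels, and the stored-record layout are assumptions, and SHA-256 stands in for whatever hash the package actually uses.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def file_meta(path: Path) -> dict:
    st = path.stat()
    return {"mtime_ns": st.st_mtime_ns, "inode": st.st_ino, "size": st.st_size}


def content_hash(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):  # read in 1 MiB chunks
            digest.update(chunk)
    return digest.hexdigest()


def check_file(path: Path, stored: dict) -> str:
    """Return 'unchanged', 'metadata-only', or 'changed' (labels are illustrative)."""
    if file_meta(path) == stored["meta"]:
        return "unchanged"        # fast path: trust the stored hash
    if content_hash(path) == stored["hash"]:
        return "metadata-only"    # refresh the state file, no rerun needed
    return "changed"              # content differs: rerun the step


with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / "input.csv"
    f.write_text("a,b\n1,2\n")
    stored = {"meta": file_meta(f), "hash": content_hash(f)}
    r1 = check_file(f, stored)

    # Shift mtime by five seconds without touching the content.
    st = f.stat()
    os.utime(f, ns=(st.st_atime_ns, st.st_mtime_ns + 5_000_000_000))
    r2 = check_file(f, stored)

    # Change the content (and the size).
    f.write_text("a,b\n1,2\n3,4\n")
    r3 = check_file(f, stored)
```

The fast path avoids reading file contents at all when nothing about the file's metadata moved, which is what keeps repeated calls cheap on large input directories.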
track_dependencies
@cache_skip(track_dependencies=False)
def step(raw: Path, *, _output: Path) -> Path:
    ...
Set track_dependencies=False to skip module source hashing. Useful when the
function imports large, rarely-changing libraries and startup cost matters, or
in tests.
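For a sense of what module source hashing involves, the sketch below finds absolute imports with the standard `ast` module and hashes a list of source files. It is an assumption-laden illustration: the names `imported_modules`, `module_file`, and `dependency_hash` are invented here, relative imports are ignored, and the package's own analysis may follow imports transitively or filter modules differently.

```python
import ast
import hashlib
import importlib.util
import tempfile
from pathlib import Path
from typing import Optional


def imported_modules(source: str) -> set:
    """Collect absolute module names imported anywhere in the source text."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)
    return names


def module_file(name: str) -> Optional[Path]:
    """Resolve a module name to its .py file, or None (built-in, frozen, missing)."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None or not spec.origin or not spec.origin.endswith(".py"):
        return None
    return Path(spec.origin)


def dependency_hash(files: list) -> str:
    """Hash the path and bytes of each source file in a stable order."""
    digest = hashlib.sha256()
    for f in sorted(files):
        digest.update(str(f).encode("utf-8"))
        digest.update(f.read_bytes())
    return digest.hexdigest()


source = "import json\nfrom pathlib import Path\nfrom . import sibling\n"
mods = imported_modules(source)  # the relative import is skipped

with tempfile.TemporaryDirectory() as tmp:
    step_file = Path(tmp) / "step.py"
    step_file.write_text("def step():\n    return 1\n")
    h_before = dependency_hash([step_file])
    step_file.write_text("def step():\n    return 2\n")
    h_after = dependency_hash([step_file])
```

Reading and hashing many library source files on every call is the startup cost that track_dependencies=False avoids.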
Comparison with auto_skip
cache_skip is a simpler, self-contained alternative to auto_skip:
| Feature | cache_skip | auto_skip |
|---|---|---|
| Input detection | explicit Path args | strace / audit hooks |
| Non-Path args | hashed | ignored |
| Module dep tracking | static AST | runtime import list |
| External deps | xxhash, loguru | heavier stack |
| Output format | dir with .input_state.json | opaque cache store |
File details
Details for the file pycache_skip-0.1.2.tar.gz.
File metadata
- Download URL: pycache_skip-0.1.2.tar.gz
- Upload date:
- Size: 43.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 65c8dc207cff7b725a40f1691eb02017afe42a26225f99f91c3fea08abbbf451 |
| MD5 | 266d35619cfa53ce1b5bacfdf5bc6391 |
| BLAKE2b-256 | e0bcaabc3bd529ac056ee93e5d93ff3d7cd9b36fe5a867dfe4ecfa5d331ce214 |
File details
Details for the file pycache_skip-0.1.2-py3-none-any.whl.
File metadata
- Download URL: pycache_skip-0.1.2-py3-none-any.whl
- Upload date:
- Size: 13.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ebb166cd3c7a5b0fea80c942b17079093d4108553a69e458f33e5c3d17ef1aa0 |
| MD5 | ee7929447feba15c2358256a4faf4d63 |
| BLAKE2b-256 | 1f55b719f8f3d5b7d89c3ed62917de24c53932764a10ca87c8f0f2bab48667ce |