Skip pipeline steps when inputs are unchanged — content-aware, with module dependency tracking

pycache_skip

Skip pipeline steps when their inputs have not changed.

uv add pycache_skip

What it does

cache_skip wraps a pipeline step function and skips re-execution when all of its inputs are unchanged. The package installs as pycache_skip and is imported as cache_skip. It stores a compact state file (.input_state.json) alongside each output directory. On subsequent calls it compares the current inputs against the stored state and reruns the function only when something actually changed.
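The compare-then-skip idea can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the helper names fingerprint and run_if_changed are hypothetical, hashlib.sha256 stands in for whatever hash the package uses, and only a single input directory is handled.

```python
import hashlib
import json
from pathlib import Path

STATE_FILE = ".input_state.json"  # file name taken from the description above


def fingerprint(input_dir: Path) -> str:
    """Hash the relative path and content of every file under input_dir."""
    h = hashlib.sha256()
    for f in sorted(p for p in input_dir.rglob("*") if p.is_file()):
        h.update(f.relative_to(input_dir).as_posix().encode())
        h.update(f.read_bytes())
    return h.hexdigest()


def run_if_changed(step, input_dir: Path, output_dir: Path) -> bool:
    """Run step(input_dir, output_dir) unless the stored fingerprint matches.

    Returns True when the step ran and False when it was skipped.
    (Hypothetical helper; the real decorator has a different interface.)
    """
    state_path = output_dir / STATE_FILE
    current = {"inputs": fingerprint(input_dir)}
    if state_path.exists() and json.loads(state_path.read_text()) == current:
        return False  # inputs unchanged: skip the step
    output_dir.mkdir(parents=True, exist_ok=True)
    step(input_dir, output_dir)
    # Record the state only after the step finished without raising.
    state_path.write_text(json.dumps(current))
    return True
```

Writing the state file after the step completes means a crashed step is retried on the next call, which is one plausible reason to keep the state next to the output it describes.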

Usage

Basic example (single input directory)

from pathlib import Path
from cache_skip import cache_skip, Dirmaker

dm = Dirmaker(Path("/data/pipeline/run-001"))

@cache_skip
def step_transform(raw: Path, *, _output: Path) -> Path:
    # heavy transformation ...
    return _output

# First call — runs the function and records input state.
step_transform(Path("/data/raw"), _output=dm.path_for("transform"))

# Second call — skips the function, returns the output path immediately.
step_transform(Path("/data/raw"), _output=dm.path_for("transform"))

Example with non-Path args

Non-Path arguments (dates, strings, ints, etc.) are also part of the cache key. Changing them triggers a rerun.

import datetime as dt
from pathlib import Path

from cache_skip import cache_skip

@cache_skip(track_dependencies=False)
def step_build_config(
    schedule_date: dt.date,
    template: Path,
    *,
    _output: Path,
) -> Path:
    ...

# Changing schedule_date from 2025-01-01 to 2025-01-02 invalidates the cache.
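To make the effect of scalar arguments concrete, here is a small sketch of a repr()-based argument hash. The function args_hash and the template path are hypothetical, and hashlib.sha256 is an assumption; the sketch only shows why a changed date yields a different key while a changed Path does not.

```python
import datetime as dt
import hashlib
from pathlib import Path


def args_hash(args: tuple, kwargs: dict) -> str:
    """Hash the repr() of every argument that is not a Path and not _output."""
    parts = [repr(a) for a in args if not isinstance(a, Path)]
    parts += [
        f"{name}={value!r}"
        for name, value in sorted(kwargs.items())
        if name != "_output" and not isinstance(value, Path)
    ]
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()


template = Path("/data/templates/base.yaml")  # hypothetical path
out = {"_output": Path("/tmp/out")}

jan_1 = args_hash((dt.date(2025, 1, 1), template), out)
jan_2 = args_hash((dt.date(2025, 1, 2), template), out)
# jan_1 and jan_2 differ, so the second call would not be served from cache.
```

A repr()-based key is only stable for values whose repr is deterministic. Objects that fall back to the default repr include a memory address, so they would look changed on every run.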

Dirmaker companion

Dirmaker allocates named output directories under a staging root. Use path_for(name) to resolve a path without side effects (this is the form to pass to a @cache_skip step), or new_output_dir(name) to delete and recreate the directory yourself.

dm = Dirmaker(Path("/data/pipeline/run-001"))
raw = Path("/data/raw")

# Pass path to decorator — decorator manages deletion on rerun.
step_transform(raw, _output=dm.path_for("transform"))

# Or manage the directory yourself:
out = dm.new_output_dir("transform")   # deletes existing, creates fresh
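The difference between the two methods can be illustrated with a stand-in class. SketchDirmaker below is hypothetical and written only from the behaviour described above; the real Dirmaker may differ in details such as error handling.

```python
import shutil
from pathlib import Path


class SketchDirmaker:
    """Hypothetical stand-in for Dirmaker, not the library's code."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def path_for(self, name: str) -> Path:
        # Pure path arithmetic: nothing is created or deleted.
        return self.root / name

    def new_output_dir(self, name: str) -> Path:
        # Remove any previous output, then create an empty directory.
        path = self.path_for(name)
        shutil.rmtree(path, ignore_errors=True)
        path.mkdir(parents=True)
        return path
```

Keeping path_for free of side effects matters for a skipping decorator: if resolving the path wiped the directory, the stored state file would be gone before it could be compared.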

How invalidation works

Every call after the first goes through three tiers of change detection:

  1. Args hash — all non-Path, non-_output arguments are hashed via repr(). A change in any scalar argument (date, string, int, …) triggers a rerun immediately.

  2. Dependency hash — the source files of the decorated function and all modules it imports (found by static AST analysis) are hashed. Editing the function's source code or one of those modules triggers a rerun. Disable with track_dependencies=False.

  3. File content hash — every file under each input Path is compared. Metadata (mtime, inode, size) is checked first as a fast path. If the metadata is identical, the stored hash is trusted. If the metadata drifted but the content hash still matches, the state file is updated silently without a rerun (this handles rsync / cp -p copies with timestamp noise).
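The third tier can be sketched for a single file as follows. This is a minimal illustration under stated assumptions: the helper names (file_meta, content_hash, record, check_file) and the layout of the stored entry are hypothetical, and hashlib.sha256 is used in place of the package's actual hash.

```python
import hashlib
from pathlib import Path


def file_meta(path: Path) -> dict:
    """Cheap metadata used for the fast path."""
    st = path.stat()
    return {"mtime_ns": st.st_mtime_ns, "inode": st.st_ino, "size": st.st_size}


def content_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def record(path: Path) -> dict:
    """Entry stored for one file after a successful run (assumed layout)."""
    return {"meta": file_meta(path), "hash": content_hash(path)}


def check_file(path: Path, stored: dict) -> tuple[bool, dict]:
    """Return (changed, entry_to_store) for one input file."""
    meta = file_meta(path)
    if meta == stored["meta"]:
        # Fast path: identical metadata, trust the stored hash.
        return False, stored
    # Metadata drifted: fall back to hashing the content.
    digest = content_hash(path)
    fresh = {"meta": meta, "hash": digest}
    # Same content means only the state needs refreshing, not a rerun.
    return digest != stored["hash"], fresh
```

A full implementation would also have to treat added and removed files as changes; the sketch covers only the per-file comparison.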

track_dependencies

@cache_skip(track_dependencies=False)
def step(raw: Path, *, _output: Path) -> Path:
    ...

Set track_dependencies=False to skip module source hashing. This is useful in tests, or when the function imports large, rarely-changing libraries and startup cost matters.
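To give a sense of what dependency tracking costs, here is a rough sketch of static import discovery with the standard ast module. The functions imported_module_names and dependency_hash are hypothetical; the sketch follows only direct, absolute imports of one source file and uses hashlib.sha256, so it should be read as an approximation of the idea rather than the package's algorithm.

```python
import ast
import hashlib
import importlib.util
from pathlib import Path


def imported_module_names(source: str) -> set[str]:
    """Collect absolute module names from import statements without running the code."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)
    return names


def dependency_hash(source_file: Path) -> str:
    """Hash a source file together with the .py files of its direct imports."""
    source = source_file.read_text()
    h = hashlib.sha256(source.encode())
    for name in sorted(imported_module_names(source)):
        try:
            # Note: for dotted names find_spec imports the parent package.
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        origin = getattr(spec, "origin", None)
        if origin and origin.endswith(".py"):
            h.update(Path(origin).read_bytes())
    return h.hexdigest()
```

Every imported module adds a file read and a hash on each call, which is why switching the check off can pay off for steps that import large libraries.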

Comparison with auto_skip

cache_skip is a simpler, self-contained alternative to auto_skip:

| Feature | cache_skip | auto_skip |
| --- | --- | --- |
| Input detection | explicit Path args | strace / audit hooks |
| Non-Path args | hashed | ignored |
| Module dep tracking | static AST | runtime import list |
| External deps | xxhash, loguru | heavier stack |
| Output format | dir with .input_state.json | opaque cache store |
