Cache function results and side effects (stdout, stderr, file writes) with automatic file I/O discovery via strace or audit hooks

pycasher

Cache Python function results and their side effects — stdout, stderr, and filesystem writes — with automatic invalidation.

uv add pycasher

If you want the casher CLI outside a project environment, install it as a tool instead:

uv tool install pycasher

What makes it different

Most caching libraries cache only return values. casher also captures and replays:

  • stdout/stderr printed during execution
  • Files written by the function (restored from cache on hit)
  • Files read by the function (used as cache keys: when a true upstream input file changes, the cache is invalidated automatically)

No manual file declarations needed. casher discovers file I/O automatically via strace (subprocess mode) or wrapped Python file handles plus tracked shutil.copyfile()-based copies in in-process mode. Files that are written before they are first read during one invocation are treated as generated outputs, not cache inputs.
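
The "written before first read" rule can be illustrated with a small sketch. This is not casher's internal code: the event format and the classify_files() helper are hypothetical, and the sketch only assumes that the tracer yields file operations in the order they happened.

```python
from typing import Iterable, Literal

# Hypothetical trace record: (operation, path), in execution order.
Event = tuple[Literal["read", "write"], str]


def classify_files(events: Iterable[Event]) -> tuple[set[str], set[str]]:
    """Split traced paths into (inputs, outputs) by first-access order."""
    inputs: set[str] = set()
    outputs: set[str] = set()
    for op, path in events:
        if op == "write":
            outputs.add(path)
        elif path not in outputs:
            # Read with no earlier write in this invocation: a true upstream input.
            inputs.add(path)
    return inputs, outputs


trace: list[Event] = [
    ("read", "train.csv"),
    ("write", "model.pkl"),
    ("read", "model.pkl"),  # read after write: generated output, not an input
]
inputs, outputs = classify_files(trace)
```

Here train.csv ends up as the only input, while model.pkl is recorded as an output even though the function reads it back later.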

Usage

from casher import cached

# read_csv(), fit(), and save() stand in for your own loading, training, and saving code.

@cached
def train(data_path: str, output_path: str, lr: float = 0.01) -> dict:
    df = read_csv(data_path)
    model = fit(df, lr=lr)
    save(model, output_path)
    return {"accuracy": model.score}

# First call — runs function, traces file I/O, caches everything
result = train("train.csv", "model.pkl")

# Second call — instant replay from cache (model.pkl restored too)
result = train("train.csv", "model.pkl")

# Change train.csv — casher detects it, re-runs automatically

For directory-shaped inputs, keep the argument semantics explicit instead of making every directory Path recursive by magic:

from pathlib import Path

from casher import cached, expand_input_dir


@cached(input_files=lambda data_dir: expand_input_dir(data_dir, "*.csv"))
def build_dataset(data_dir: Path) -> int:
    return len(list(data_dir.glob("*.csv")))

Path arguments that point to files are hashed by file content for the function-argument portion of the cache key. Auto-discovered input files remain path-sensitive and content-sensitive.
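
To make the content-hashing of Path arguments concrete, here is a minimal sketch. The arg_fingerprint() helper is hypothetical, and SHA-256 and the repr() fallback are assumptions made for illustration; the hash casher actually uses is not stated here.

```python
import hashlib
import tempfile
from pathlib import Path


def arg_fingerprint(value: object) -> str:
    """Hypothetical helper: hash file contents for Path-to-file arguments."""
    if isinstance(value, Path) and value.is_file():
        return hashlib.sha256(value.read_bytes()).hexdigest()
    # Assumed fallback for every other argument type.
    return hashlib.sha256(repr(value).encode()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "a.csv"
    b = Path(tmp) / "b.csv"
    a.write_text("x,y\n1,2\n")
    b.write_text("x,y\n1,2\n")
    # Different paths, identical bytes: the argument fingerprints agree.
    same_content = arg_fingerprint(a) == arg_fingerprint(b)
    b.write_text("x,y\n3,4\n")
    # Once the bytes differ, so do the fingerprints.
    changed_content = arg_fingerprint(a) != arg_fingerprint(b)
```

Under this scheme, renaming an input file passed as an argument does not by itself change the argument portion of the key, whereas auto-discovered inputs would still differ by path.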

Workflow-style functions can declare output directories explicitly to keep reads under those roots out of input_files:

from pathlib import Path

from casher import cached


work_dir = Path("work")


@cached(output_roots=[work_dir], replay_outputs="if-missing")
def assemble_workset() -> Path:
    generated = work_dir / "reference" / "mworld.par"
    generated.parent.mkdir(parents=True, exist_ok=True)
    generated.write_text("patched content")
    generated.read_text()  # read under an output root: kept out of input_files
    return generated

On a cache hit, output files whose current content already matches the cached copy are not restored again. You can also set replay_outputs=False or replay_outputs="if-missing" to control file replay.
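
The replay decision can be sketched as follows. This is an illustration under assumptions, not casher's implementation: replay_output() is a hypothetical helper, SHA-256 is an assumed hash, and the semantics given to "if-missing" (restore only when the target does not exist) are inferred from the name.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def replay_output(cached_copy: Path, target: Path, mode: object = True) -> bool:
    """Restore target from cached_copy; return True if a copy was made."""
    if mode is False:
        return False  # replay disabled
    if target.exists():
        if mode == "if-missing":
            return False  # assumed meaning: never overwrite an existing file
        if file_digest(target) == file_digest(cached_copy):
            return False  # unchanged output: skip the copy
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(cached_copy, target)
    return True


with tempfile.TemporaryDirectory() as tmp:
    cached_copy = Path(tmp) / "cache" / "model.pkl"
    target = Path(tmp) / "out" / "model.pkl"
    cached_copy.parent.mkdir()
    cached_copy.write_bytes(b"weights")
    first = replay_output(cached_copy, target)   # target missing: restored
    second = replay_output(cached_copy, target)  # identical content: skipped
    target.write_bytes(b"edited")
    third = replay_output(cached_copy, target, mode="if-missing")  # exists: left alone
    fourth = replay_output(cached_copy, target)  # content differs: restored
```

Skipping the copy when hashes match is what keeps hits cheap for large artifacts that are already in place.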

Cache any shell command without code changes:

casher -- python train.py --data train.csv

Key features

  • Automatic file tracking: strace (kernel-level, catches C extensions) or wrapped file handles plus shutil.copy2() / copytree() coverage in in-process mode
  • Generated-output awareness: files written before their first read are excluded from input_files
  • Dependency invalidation: changes to imported .py files invalidate the cache
  • Narrow dependency overrides: use dep_files=[...] when only specific modules should invalidate a function
  • File-hash memoization: unchanged files reuse cached content hashes from a small SQLite metadata store
  • LRU eviction: configurable via max_cache_bytes or CASHER_MAX_CACHE_BYTES env var (default 32 GB)
  • Faster hits for large artifacts: output replay skips files whose current hash already matches the cached output
  • DataFrame support: polars and pandas DataFrames serialized via Arrow IPC
  • Environment-aware: include env vars in cache key with env_vars=["MY_VAR"]
  • Miss diagnostics: diagnose_misses=True logs which recorded input changed or disappeared
  • Structured logging: loguru INFO for config changes, enablement, hit/miss, mode, eviction
  • Explicit directory expansion helper: expand_input_dir() for stable input_files lists
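
As an illustration of the file-hash memoization idea in the list above, the sketch below stores digests in SQLite and reuses them while a file's size and modification time are unchanged. The HashMemo class, the table layout, the (size, mtime_ns) freshness check, and SHA-256 are all assumptions; casher's actual metadata schema is not described here.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


class HashMemo:
    """Hypothetical memo of content hashes keyed on path, size, and mtime."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS hashes ("
            "path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, digest TEXT)"
        )
        self.computed = 0  # counts real hash computations, for illustration

    def digest(self, path: Path) -> str:
        st = path.stat()
        row = self.db.execute(
            "SELECT size, mtime_ns, digest FROM hashes WHERE path = ?",
            (str(path),),
        ).fetchone()
        if row is not None and row[0] == st.st_size and row[1] == st.st_mtime_ns:
            return row[2]  # metadata unchanged: reuse the stored digest
        value = hashlib.sha256(path.read_bytes()).hexdigest()
        self.computed += 1
        self.db.execute(
            "INSERT OR REPLACE INTO hashes VALUES (?, ?, ?, ?)",
            (str(path), st.st_size, st.st_mtime_ns, value),
        )
        self.db.commit()
        return value


with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / "train.csv"
    f.write_text("a,b\n1,2\n")
    memo = HashMemo()
    d1 = memo.digest(f)
    d2 = memo.digest(f)  # served from SQLite, no re-read of the file
    computed_after_two = memo.computed
    f.write_text("a,b\n1,2\n3,4\n")  # size changes, so the memo entry is stale
    d3 = memo.digest(f)
```

The trade-off in this kind of scheme is that a file rewritten with the same size and timestamp would be missed, which is why the freshness check is labeled an assumption.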

Configuration

| Env var | Default | Description |
| --- | --- | --- |
| CASHER_CACHE_DIR | unset | Cache storage directory. Caching stays disabled until this is set. |
| CASHER_MAX_CACHE_BYTES | 34359738368 (32 GB) | Max cache size before LRU eviction |

Or set programmatically (takes priority over env vars):

from casher import configure, get_config

configure(cache_dir="/data/my_cache", max_cache_bytes=10 * 1024**3)
print(get_config())  # effective config
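
The max_cache_bytes limit implies size-bounded LRU eviction. A minimal sketch of that selection step follows; the Entry record and pick_evictions() are hypothetical names, and how casher tracks recency or entry size internally is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    key: str
    size: int         # bytes on disk
    last_used: float  # timestamp of the most recent hit or write


def pick_evictions(entries: list[Entry], max_cache_bytes: int) -> list[str]:
    """Return keys to delete, least recently used first, until the total fits."""
    total = sum(e.size for e in entries)
    victims: list[str] = []
    for entry in sorted(entries, key=lambda e: e.last_used):
        if total <= max_cache_bytes:
            break
        victims.append(entry.key)
        total -= entry.size
    return victims


entries = [
    Entry("a", size=40, last_used=1.0),
    Entry("b", size=40, last_used=3.0),
    Entry("c", size=40, last_used=2.0),
]
victims = pick_evictions(entries, max_cache_bytes=90)
```

With 120 bytes stored and a 90-byte limit, removing the oldest entry "a" is enough, so the more recently used "c" and "b" survive.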

If no cache directory is configured via CASHER_CACHE_DIR, configure(cache_dir=...), @cached(cache_dir=...), or casher --cache-dir ..., casher runs transparently without caching and emits a one-time warning.
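
The lookup described above can be sketched as a simple first-match resolution. resolve_cache_dir() is a hypothetical helper; the text only states that configure() takes priority over env vars, so placing the decorator argument first is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_cache_dir(
    decorator_arg: Optional[str] = None,
    configured: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the first cache directory found, or None (caching disabled)."""
    # Assumed order: @cached(cache_dir=...), then configure(), then the env var.
    for candidate in (decorator_arg, configured, env.get("CASHER_CACHE_DIR")):
        if candidate:
            return candidate
    return None


disabled = resolve_cache_dir(env={})
from_env = resolve_cache_dir(env={"CASHER_CACHE_DIR": "/tmp/env_cache"})
from_configure = resolve_cache_dir(
    configured="/data/my_cache", env={"CASHER_CACHE_DIR": "/tmp/env_cache"}
)
```

A None result corresponds to the pass-through case, where the function simply runs uncached.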

Platform support

Full caching is supported on Linux only (it requires strace for subprocess mode and fcntl for locking). On macOS and Windows the decorator is a transparent pass-through: functions execute normally, and caching is skipped with a one-time warning.
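
The pass-through behavior can be pictured with the sketch below. maybe_cached() and caching_supported() are hypothetical stand-ins, not casher's decorator, and the warning text is invented; the sketch only shows the platform check and the one-time warning.

```python
import sys
import warnings

_warned = False  # module-level flag so the warning is emitted only once


def caching_supported(platform: str = sys.platform) -> bool:
    return platform.startswith("linux")


def maybe_cached(func, platform: str = sys.platform):
    """Hypothetical decorator that degrades to a pass-through off Linux."""
    global _warned
    if caching_supported(platform):
        # A real implementation would wrap func with caching logic here.
        return func
    if not _warned:
        warnings.warn(
            "caching is disabled on this platform; running uncached",
            stacklevel=2,
        )
        _warned = True
    return func  # unchanged function: calls behave exactly as without the decorator
```

Because the unsupported branch returns the original function untouched, code decorated this way runs identically on macOS and Windows, just without cache hits.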

Documentation

See documentation/ for detailed docs.

License

MIT
