Automatic, dependency-aware memoization for Python — a modern pure-Python reimplementation of IncPy (Guo & Engler, ISSTA 2011).
rote
Automatic, dependency-aware memoization for Python research scripts. No interpreter fork, no decorators required.
rote is a pure-Python reimplementation of IncPy (Guo & Engler, ISSTA 2011) on contemporary CPython (≥3.12). Same goal as the original: observe a script at runtime, find the function calls that are pure and long-running, and persist their results across runs. The implementation is new, built on sys.monitoring (PEP 669) and audit hooks (PEP 578), so no patched interpreter is needed.
There's a companion site that walks through the design, the speedups, and where rote diverges from the paper. If you're reading this for the first time, start there.
Why
You change one line in analyze.py, save, re-run. Plain Python re-does the 90 seconds of feature extraction, the 30 seconds of model training, and the 2 seconds of plotting, all to look at one tweaked plot. That re-work is what IncPy was built to remove in 2011. It's still the problem.
Install
rote isn't on PyPI yet, so install from source for now:
# Plain pip
pip install "git+https://github.com/puppyum/rote.git"
pip install "rote[all] @ git+https://github.com/puppyum/rote.git" # plus pyarrow, numpy, safetensors
# uv (recommended for research workflows)
uv add "git+https://github.com/puppyum/rote.git"
Local development:
git clone https://github.com/puppyum/rote.git
cd rote
uv venv --python 3.13 && source .venv/bin/activate
uv pip install -e ".[dev,all]"
Requires Python 3.12 or later. Apache-2.0.
Use
Three ways, ordered by how much you have to opt in.
Zero-config, paper-style
Prefix your script invocation:
rote run analyze.py
The CLI AST-wraps every top-level function in your script and in any helper modules it imports. Run the script a second time after a downstream edit; only the changed function re-executes.
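The wrapping step can be pictured with the standard-library ast module. This is only a sketch under assumptions: rote itself rewrites through libcst (see Limitations), and the decorator name memo and the helper wrap_top_level are hypothetical names chosen for illustration.

```python
import ast


class WrapTopLevel(ast.NodeTransformer):
    """Add a decorator to every function defined at module top level."""

    def __init__(self, decorator_name: str) -> None:
        self.decorator_name = decorator_name

    def visit_Module(self, node: ast.Module) -> ast.Module:
        # Only direct children of the module are touched; methods and
        # nested functions are left as they are.
        for child in node.body:
            if isinstance(child, ast.FunctionDef):
                child.decorator_list.insert(
                    0, ast.Name(id=self.decorator_name, ctx=ast.Load())
                )
        return node


def wrap_top_level(source: str, decorator_name: str = "memo") -> str:
    """Return `source` with `@decorator_name` on each top-level function."""
    tree = WrapTopLevel(decorator_name).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


SCRIPT = '''
def load(path):
    return open(path).read()

class Model:
    def fit(self, x):
        return x

def main():
    def helper():
        pass
    return load("data.csv")
'''

rewritten = wrap_top_level(SCRIPT)
print(rewritten)
```

In this sketch load and main pick up the decorator, while Model.fit and the nested helper do not.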
Decorator
When you want to be explicit:
import rote

@rote.cache
def build_features(df):
    ...
Inside a notebook or REPL
import rote

with rote.auto():
    result = my_pipeline(data)
In Jupyter, %load_ext rote makes every cell a memoization candidate.
What gets cached
A function call is memoized when all of these hold:
- It ran for at least min_duration_s (default 1 s). Below that, the cache write costs more than re-running.
- No impure I/O happened during the call. Network, subprocess, file appends, exec/eval, and stdlib non-determinism sources (time.time(), random.random(), uuid.uuid4(), os.environ) all disqualify it.
- No argument mutated. Arguments are fingerprinted on entry and re-checked on exit.
- The function's source, every function it transitively calls, and every file it read are unchanged from the cached version.
If any check fails, the cache misses and the function runs. A cached value that can't be proven safe never gets returned; the tests/correctness/ suite includes 36 perturbation tests and 60 differential tests that fail loudly if a cached value drifts from a fresh run.
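Two of the checks above, the duration threshold and the argument-mutation check, can be sketched in a few lines. This is an illustration, not rote's implementation: fingerprint and run_and_judge are hypothetical helpers, and hashing pickled bytes is an assumed stand-in for whatever fingerprint rote actually uses.

```python
import hashlib
import pickle
import time
from typing import Any, Callable


def fingerprint(value: Any) -> str:
    # Assumed stand-in: hash the pickled bytes of the argument.
    return hashlib.sha256(pickle.dumps(value)).hexdigest()


def run_and_judge(
    func: Callable[..., Any], *args: Any, min_duration_s: float = 1.0
) -> tuple[Any, bool]:
    """Run func(*args) and report whether the call qualifies for caching."""
    before = [fingerprint(a) for a in args]  # fingerprint on entry
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    after = [fingerprint(a) for a in args]   # re-check on exit

    mutated = before != after
    long_enough = elapsed >= min_duration_s
    return result, (long_enough and not mutated)


def pure_sum(xs: list[int]) -> int:
    return sum(xs)


def sorts_in_place(xs: list[int]) -> int:
    xs.sort()  # mutates its argument
    return xs[0]


# The threshold is dropped to 0 so the demo does not have to run for a second.
_, pure_ok = run_and_judge(pure_sum, [3, 1, 2], min_duration_s=0.0)
_, mutating_ok = run_and_judge(sorts_in_place, [3, 1, 2], min_duration_s=0.0)
# With the 1 s default, a microsecond-scale call is rejected as too short.
_, fast_ok = run_and_judge(pure_sum, [3, 1, 2])

print(pure_ok, mutating_ok, fast_ok)
```

The I/O and dependency checks are not modelled here; they need the runtime hooks described at the top of this README.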
The serializer dispatches by type: Arrow IPC for DataFrames, numpy.save for arrays, safetensors for Torch tensors, msgpack for primitives, cloudpickle as a last resort. Rationale in docs/DECISIONS.md.
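The dispatch-by-type idea can be sketched with functools.singledispatch. The backends below are standard-library stand-ins (json, raw bytes, pickle) so the snippet runs without extras; they are assumptions for illustration and not rote's actual Arrow IPC, numpy.save, safetensors, msgpack, or cloudpickle paths. The names serialize, deserialize, and the tag strings are hypothetical.

```python
import json
import pickle
from functools import singledispatch
from typing import Any


@singledispatch
def serialize(value: Any) -> tuple[str, bytes]:
    # Last resort for any type without a dedicated handler.
    return "pickle", pickle.dumps(value)


@serialize.register(int)
@serialize.register(float)
@serialize.register(str)
def _serialize_primitive(value: Any) -> tuple[str, bytes]:
    # Primitives get a compact, language-neutral encoding.
    return "json", json.dumps(value).encode("utf-8")


@serialize.register(bytes)
def _serialize_bytes(value: bytes) -> tuple[str, bytes]:
    # Raw bytes are stored as-is.
    return "raw", value


# The tag written next to the payload picks the matching loader on read.
DESERIALIZERS = {
    "json": lambda payload: json.loads(payload.decode("utf-8")),
    "raw": bytes,
    "pickle": pickle.loads,
}


def deserialize(tag: str, payload: bytes) -> Any:
    return DESERIALIZERS[tag](payload)


for sample in (42, "text", b"\x00\x01", {1, 2, 3}):
    tag, payload = serialize(sample)
    assert deserialize(tag, payload) == sample
    print(type(sample).__name__, "->", tag)
```

Storing the tag alongside the payload keeps reads independent of the order in which handlers were registered.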
Measured performance
Apple Silicon, Python 3.13. Warm-hit timings are medians of 20 iterations; the cross-process and pipeline numbers are medians of 5 runs.
Per-function warm-hit cost against joblib.Memory:
| Workload | joblib warm | rote warm | speedup |
|---|---|---|---|
| 2 M-term Leibniz | 96 µs | 31 µs | 3.09× |
| Basel sum | 101 µs | 30 µs | 3.37× |
| 400×400 NumPy QR | 253 µs | 33 µs | 7.68× |
| 200K-char bag-of-words | 93 µs | 31 µs | 2.97× |
| 200×200 matrix inverse | 104 µs | 49 µs | 2.14× |
Geomean across the five workloads: 3.48× faster than joblib.Memory.
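For readers who want to check the arithmetic, the geometric mean can be recomputed from the speedup column of the table above. The dictionary name speedups is just a label for this snippet.

```python
from statistics import geometric_mean

# Speedup column from the warm-hit table above.
speedups = {
    "2 M-term Leibniz": 3.09,
    "Basel sum": 3.37,
    "400x400 NumPy QR": 7.68,
    "200K-char bag-of-words": 2.97,
    "200x200 matrix inverse": 2.14,
}

geomean = geometric_mean(list(speedups.values()))
print(f"{geomean:.2f}x")
```

The geometric mean is used rather than the arithmetic mean so that the single 7.68× result does not dominate the summary.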
On the paper-style multi-stage pipeline (parse → aggregate → format), with an edit to the final stage and everything in one process: plain Python re-runs the whole thing in 264 ms; rote skips the upstream stages and finishes the warm run in 6.3 ms, about 42× faster than the cold pipeline. joblib.Memory is faster on the same benchmark (1.4 ms warm) because it keys purely on argument values, whereas rote content-hashes the intermediate files on every hit so that an mtime-preserving edit cannot return a stale result.
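The mtime-preserving case is easy to reproduce with the standard library. The sketch below edits a file, restores its old timestamps with os.utime, and shows that only the content hash notices. The file name stage1.csv and the helper sha256_of are made up for the demo, and SHA-256 is an assumed choice of hash.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "stage1.csv"
    path.write_text("a,b\n1,2\n")
    st = path.stat()
    old_mtime_ns, old_hash = st.st_mtime_ns, sha256_of(path)

    # Edit the file (same length), then put the old timestamps back.
    path.write_text("a,b\n9,9\n")
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))

    mtime_unchanged = path.stat().st_mtime_ns == old_mtime_ns
    hash_changed = sha256_of(path) != old_hash

print("mtime unchanged:", mtime_unchanged)
print("content hash changed:", hash_changed)
```

A validator that trusted size and mtime alone would treat this file as unchanged.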
The tradeoff at the level you actually live with — edit, save, rerun, fresh Python process each time:
| | wall-clock | vs plain |
|---|---|---|
| plain Python (whole pipeline) | 1.83 s | — |
| rote warm (fresh interpreter) | 0.38 s | 4.8× |
| joblib warm (fresh interpreter) | 0.19 s | 9.6× |
A persistent stat → content-hash table in the cache store is what keeps rote's file-dep validation cheap across process boundaries: each warm subprocess does a stat() per dependency and reuses the stored hash unless (size, mtime_ns, ctime_ns) change. Joblib still wins here because it skips content validation outright. Full numbers, the correctness/speed tradeoff, and a serializer breakdown live in docs/BENCHMARKS.md.
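The stat → content-hash idea can be sketched as a small lookup keyed on (size, mtime_ns, ctime_ns). This is a minimal illustration, not rote's code: the class StatHashTable is hypothetical, an in-memory dict stands in for the persistent table in the cache store, and SHA-256 is an assumed hash.

```python
import hashlib
import os
import tempfile
from pathlib import Path

StatKey = tuple[int, int, int]  # (size, mtime_ns, ctime_ns)


class StatHashTable:
    """Reuse a stored content hash while a file's stat key is unchanged."""

    def __init__(self) -> None:
        # In-memory stand-in for the persistent table.
        self._table: dict[str, tuple[StatKey, str]] = {}
        self.rehash_count = 0

    def content_hash(self, path: Path) -> str:
        st = os.stat(path)
        key = (st.st_size, st.st_mtime_ns, st.st_ctime_ns)
        cached = self._table.get(str(path))
        if cached is not None and cached[0] == key:
            return cached[1]  # cheap path: one stat(), no file read
        # Stat key changed or file unseen: read and hash the content.
        self.rehash_count += 1
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        self._table[str(path)] = (key, digest)
        return digest


with tempfile.TemporaryDirectory() as tmp:
    dep = Path(tmp) / "features.csv"
    dep.write_text("x\n1\n")

    table = StatHashTable()
    first = table.content_hash(dep)
    second = table.content_hash(dep)   # served from the table
    rehashes_before_edit = table.rehash_count

    dep.write_text("x\n1\n2\n3\n")     # size changes, so the key changes
    third = table.content_hash(dep)
    rehashes_after_edit = table.rehash_count

print(rehashes_before_edit, rehashes_after_edit, first == second, first != third)
```

The file is read once per change rather than once per warm run; every other lookup costs a single stat() call.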
Test suite: 381 tests pass, including 60 differential and 36 perturbation tests. On the corpus/realistic/ subset (five multi-second scripts), auto-mode eliminates 100% of cold compute on warm re-run. mypy --strict and ruff clean. CI runs Linux, macOS, and Windows on Python 3.12 and 3.13.
Public API
| Name | Purpose |
|---|---|
| rote.cache | Decorator. The explicit escape hatch. |
| rote.auto() | Context manager. Every call inside the block is a candidate. |
| rote.invalidate(target=None) | Drop entries. target is a function, a qualname string, or None for everything. |
| rote.clear() | Wipe all tiers (in-memory + SQLite + blobs). |
| rote.configure(**kwargs) | Override defaults (cache dir, min_duration_s, fsync, telemetry, ...). |
| rote.stats() | Hits, misses, time saved, invalidation reasons. |
| rote.graph() | A networkx.DiGraph of observed caller → callee edges. |
| rote run <script> | CLI: run a script under auto-mode. |
| rote status | CLI: print stats for the cache in the CWD. |
| rote clear | CLI: wipe the cache in the CWD. |
Layout
src/rote/   the package (13 modules, ~4K lines)
tests/      unit / property / integration / correctness suites
docs/       architecture, decisions log, benchmarks, evaluation
bench/      workload + serializer microbenchmarks
corpus/     30 fast scripts for differential tests, plus a realistic/ subset for coverage
examples/   demos used by the integration tests
Architecture in detail: docs/architecture.md. Every paper deviation logged: docs/DECISIONS.md. Recent changes: CHANGELOG.md.
Limitations
- Python 3.12+ only. sys.monitoring (PEP 669) is the load-bearing primitive; there's no fallback for older interpreters.
- Functions doing real I/O are skipped. Network reads, append-mode file writes, and subprocess calls all disqualify a call. The system is built for compute-heavy steps that take a data file in and return a value out.
- First run pays an AST-transform cost. Auto-mode rewrites your script through libcst once per source change; the rewrite is cached on disk after that.
- The 1-second default threshold is conservative. Sub-second calls aren't memoized unless you lower it explicitly with rote.configure(min_duration_s=0.05).
License
Apache-2.0. See LICENSE.
Citing IncPy
If you use rote in academic work, cite the original paper:
Guo, P. J., & Engler, D. (2011). Using automatic persistent memoization to
facilitate data analysis scripting. Proceedings of the 2011 International
Symposium on Software Testing and Analysis (ISSTA '11), 287–297.