
checkpointer adds code-aware caching to Python functions, maintaining correctness and speeding up execution as your code changes.

checkpointer

checkpointer is a Python library that provides a decorator-based API for memoizing (caching) function results with code-aware cache invalidation. It works with sync and async functions, supports multiple storage backends, and automatically invalidates caches when your code or its dependencies change. This keeps results correct while skipping redundant, costly operations.

📦 Installation

pip install checkpointer

🚀 Quick Start

Apply the @checkpoint decorator to any function:

from checkpointer import checkpoint

@checkpoint
def expensive_function(x: int) -> int:
    print("Computing...")
    return x ** 2

result = expensive_function(4)  # Computes and stores the result
result = expensive_function(4)  # Loads from the cache

🧠 How It Works

When you decorate a function with @checkpoint and call it, checkpointer computes a unique identifier that represents that specific call. This identifier is based on:

  • The function's source code and all its user-defined dependencies,
  • Global variables used by the function (if capturing is enabled or explicitly annotated),
  • The actual arguments passed to the function.

checkpointer then looks up this identifier in its cache. If a valid cached result exists, it returns that immediately. Otherwise, it runs the original function, stores the result, and returns it.
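The lookup flow above can be illustrated with a toy memoizer. This is only a sketch of the general idea, not checkpointer's implementation: `toy_checkpoint` and `_cache` are hypothetical names, the function's bytecode stands in for its source code, and dependencies and captured globals are ignored.

```python
import hashlib
import pickle
from functools import wraps

_cache: dict[str, object] = {}  # hypothetical stand-in for a storage backend


def toy_checkpoint(fn):
    """Toy memoizer keyed on function code + arguments (illustration only)."""
    code = fn.__code__
    # Assumption: bytecode and constants approximate "the function's code".
    fn_hash = hashlib.sha256(code.co_code + repr(code.co_consts).encode()).hexdigest()

    @wraps(fn)
    def wrapper(*args, **kwargs):
        # Combine the function hash with the call's arguments into one identifier.
        call_bytes = pickle.dumps((args, sorted(kwargs.items())))
        key = hashlib.sha256(fn_hash.encode() + call_bytes).hexdigest()
        if key not in _cache:
            _cache[key] = fn(*args, **kwargs)  # cache miss: run and store
        return _cache[key]  # cache hit: return the stored result

    return wrapper


calls = []


@toy_checkpoint
def square(x: int) -> int:
    calls.append(x)  # records each real execution
    return x ** 2


first = square(4)   # runs the function
second = square(4)  # served from the toy cache
```

After both calls, `calls` contains a single entry, because the second call never reaches the function body.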

checkpointer is designed to be flexible through features like:

  • Support for decorated methods, correctly caching results bound to instances.
  • Support for decorated async functions, compatible with any async runtime.
  • Robust hashing, covering complex Python objects and large NumPy/PyTorch arrays via its internal ObjectHash.
  • Targeted hashing, allowing you to optimize how arguments and captured variables are hashed.
  • Multi-layered caching, letting you stack decorators for layered caching strategies without losing cache consistency.

🚨 What Causes Cache Invalidation?

To ensure cache correctness, checkpointer tracks two types of hashes:

1. Function Identity Hash (Computed Once Per Function)

This hash represents the decorated function itself and is computed once (usually on first invocation). It covers:

  • Function Code and Signature:
    The function's logic and parameters are hashed. Parameter type annotations and formatting details such as whitespace, newlines, comments, and trailing commas are not, so changing them does not trigger invalidation.

  • Dependencies:
    All user-defined functions and methods that the decorated function calls or relies on, including indirect dependencies, are included recursively. Dependencies are identified by:

    • Inspecting the function's global scope for referenced functions and objects.
    • Inferring from the function's argument type annotations.
    • Analyzing object constructions and method calls to identify classes and methods used.
  • Exclusions:
    Changes elsewhere in the module unrelated to the function or its dependencies do not cause invalidation.
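One way to get a hash that ignores formatting, comments, and annotations is to hash a normalized syntax tree instead of the raw text. The sketch below uses the standard library's `ast` module and is an assumption about how such hashing could work, not a description of checkpointer's internals. `normalized_code_hash` and `_StripAnnotations` are hypothetical names, and async functions and dependencies are not handled.

```python
import ast
import hashlib


class _StripAnnotations(ast.NodeTransformer):
    """Drop parameter and return annotations before hashing."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # visits the arguments, then the body
        node.returns = None
        return node

    def visit_arg(self, node):
        node.annotation = None
        return node


def normalized_code_hash(source: str) -> str:
    # Parsing discards comments, whitespace, and trailing commas;
    # ast.dump omits line and column numbers by default.
    tree = _StripAnnotations().visit(ast.parse(source))
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()


version_a = "def area(w: int, h: int) -> int:\n    return w * h\n"
version_b = "def area(\n    w,\n    h,  # height\n):\n    # multiply sides\n    return w * h\n"
version_c = "def area(w, h):\n    return w + h\n"

same_logic = normalized_code_hash(version_a) == normalized_code_hash(version_b)
changed_logic = normalized_code_hash(version_a) != normalized_code_hash(version_c)
```

Here `version_a` and `version_b` differ only in annotations, layout, and comments, so they hash identically, while `version_c` changes the logic and hashes differently.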

2. Call Hash (Computed on Every Function Call)

Every function call produces a call hash, combining:

  • Passed Arguments:
    Includes positional and keyword arguments, combined with default values. Changing a default triggers invalidation only when it changes the values a call actually receives.

  • Captured Global Variables:
    When capture=True or explicit capture annotations are used, checkpointer includes referenced global variables in the call hash. Variables annotated with CaptureMe are hashed on every call, causing immediate cache invalidation if they change. Variables annotated with CaptureMeOnce are hashed only once per Python session, improving performance by avoiding repeated hashing.

  • Custom Argument Hashing:
    Using HashBy annotations, arguments or captured variables can be transformed before hashing (e.g., sorting lists to ignore order), allowing more precise or efficient call hashes.
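To see why a call that omits an argument and a call that passes the default explicitly can share one hash, it helps to bind the arguments to the signature first. The following is a minimal sketch using `inspect.signature`; `call_values` and `toy_call_hash` are hypothetical helpers, and the `repr`-based hashing is a simplification rather than checkpointer's ObjectHash.

```python
import hashlib
import inspect


def call_values(fn, *args, **kwargs) -> dict:
    """Resolve positional, keyword, and default values into one mapping."""
    bound = inspect.signature(fn).bind(*args, **kwargs)
    bound.apply_defaults()  # fill in any parameters the caller omitted
    return dict(bound.arguments)


def toy_call_hash(fn, *args, **kwargs) -> str:
    values = call_values(fn, *args, **kwargs)
    # Assumption: repr() is a good enough serialization for this toy example.
    return hashlib.sha256(repr(sorted(values.items())).encode()).hexdigest()


def resize(width, height=100, mode="fit"):
    return (width, height, mode)


# Omitting defaults and spelling them out resolve to the same values.
same = toy_call_hash(resize, 50) == toy_call_hash(resize, 50, height=100, mode="fit")
# A different effective value produces a different hash.
different = toy_call_hash(resize, 50) != toy_call_hash(resize, 50, height=200)
```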

💡 Usage

Once a function is decorated with @checkpoint, you can interact with its caching behavior using the following methods:

  • expensive_function(...):
    Call the function normally. This will compute and cache the result or load it from cache.

  • expensive_function.rerun(...):
    Force the original function to execute and overwrite any existing cached result.

  • expensive_function.fn(...):
    Call the undecorated function directly, bypassing the cache (useful in recursion to prevent caching intermediate steps).

  • expensive_function.get(...):
    Retrieve the cached result without executing the function. Raises CheckpointError if no valid cache exists.

  • expensive_function.exists(...):
    Check if a cached result exists without computing or loading it.

  • expensive_function.delete(...):
    Remove the cached entry for given arguments.

  • expensive_function.reinit(recursive: bool = True):
    Recalculate the function identity hash and recapture CaptureMeOnce variables, updating the cached function state within the same Python session.

⚙️ Configuration & Customization

The @checkpoint decorator accepts the following parameters:

  • storage (Type: str or checkpointer.Storage, Default: "pickle")
    Storage backend to use: "pickle" (disk-based, persistent), "memory" (in-memory, non-persistent), or a custom Storage class.

  • directory (Type: str or pathlib.Path or None, Default: ~/.cache/checkpoints)
    Base directory for disk-based checkpoints (only for "pickle" storage).

  • capture (Type: bool, Default: False)
    If True, includes global variables referenced by the function in call hashes (except those excluded via NoHash).

  • expiry (Type: Callable[[datetime.datetime], bool] or datetime.timedelta, Default: None)
    Either a datetime.timedelta giving the maximum age of a cached result, or a custom callable that receives the datetime timestamp of a cached result. The callable should return True if the cached result is considered expired and needs recomputation, or False otherwise.

  • fn_hash_from (Type: Any, Default: None)
    Override the computed function identity hash with any hashable object you provide (e.g., version strings, config IDs). This gives you explicit control over the function's version and when its cache should be invalidated.

  • when (Type: bool, Default: True)
    Enable or disable checkpointing dynamically, useful for environment-based toggling.

  • verbosity (Type: int (0, 1, or 2), Default: 1)
    Controls the level of logging output from checkpointer.

    • 0: No output.
    • 1: Shows when functions are computed and cached.
    • 2: Also shows when cached results are remembered (loaded from cache).
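As an illustration of the expiry parameter, a callable matching the documented `Callable[[datetime.datetime], bool]` shape might look like the sketch below. `older_than_max_age` and `MAX_AGE` are hypothetical names, and the comparison assumes the timestamp is a naive local datetime, which the description above does not confirm.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=1)  # hypothetical freshness window


def older_than_max_age(checkpoint_time: datetime) -> bool:
    """Return True when the cached result should be treated as expired."""
    # Assumption: checkpoint_time is a naive datetime in local time.
    return datetime.now() - checkpoint_time > MAX_AGE


# Intended use, per the parameter list above (not executed here):
#   @checkpoint(expiry=older_than_max_age)
#   def fetch_report(day: str): ...
stale = older_than_max_age(datetime.now() - timedelta(days=2))
fresh = older_than_max_age(datetime.now())
```

A callable is mainly useful when the rule is more involved than a fixed age, for example expiring everything cached before a known data refresh.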

🔬 Customize Argument Hashing

You can customize how arguments are hashed to improve cache hit rates or speed up hashing. The function still receives the original, unmodified argument values.

  • Annotated[T, HashBy[fn]]:
    Transform the argument via fn(argument) before hashing. Useful for normalization (e.g., sorting lists) or optimized hashing for complex inputs.

  • NoHash[T]:
    Exclude the argument from hashing completely, so changes to it won't trigger cache invalidation.

Example:

from typing import Annotated
from checkpointer import checkpoint, HashBy, NoHash
from pathlib import Path
import logging

def file_bytes(path: Path) -> bytes:
    return path.read_bytes()

@checkpoint
def process(
    numbers: Annotated[list[int], HashBy[sorted]],   # Hash by sorted list
    data_file: Annotated[Path, HashBy[file_bytes]],  # Hash by file content
    log: NoHash[logging.Logger],                     # Exclude logger from hashing
):
    ...

In this example, the hash for numbers ignores order, data_file is hashed based on its contents rather than path, and changes to log don't affect caching.

🎯 Capturing Global Variables

checkpointer can include captured global variables in call hashes - these are globals your function reads during execution that may affect results.

Use capture=True on @checkpoint to capture all referenced globals (except those explicitly excluded with NoHash).

Alternatively, you can opt in selectively by annotating globals with:

  • CaptureMe[T]:
    Capture the variable on every call (triggers invalidation on changes).

  • CaptureMeOnce[T]:
    Capture once per Python session (for expensive, immutable globals).

You can also combine these with HashBy to customize how captured variables are hashed (e.g., hash by subset of attributes).

Example:

from typing import Annotated
from checkpointer import checkpoint, CaptureMe, CaptureMeOnce, HashBy
from pathlib import Path

def file_bytes(path: Path) -> bytes:
    return path.read_bytes()

captured_data: CaptureMe[Annotated[Path, HashBy[file_bytes]]] = Path("data.txt")
session_config: CaptureMeOnce[dict] = {"mode": "prod"}

@checkpoint
def process():
    # `captured_data` is included in the call hash on every call, hashed by file content
    # `session_config` is hashed once per session
    ...
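For a sense of what "globals your function reads" means, the standard library can list them. This sketch uses `inspect.getclosurevars` and is not how checkpointer necessarily discovers captured variables. `classify`, `THRESHOLD`, `LABELS`, and `UNUSED` are made-up names for the example.

```python
import inspect

THRESHOLD = 0.5
LABELS = ["low", "high"]
UNUSED = "never read by classify"


def classify(score: float) -> str:
    # Reads two module-level globals; UNUSED is not referenced.
    return LABELS[1] if score > THRESHOLD else LABELS[0]


# Mapping of global names referenced in the function body to their current values.
referenced = inspect.getclosurevars(classify).globals
```

Only `THRESHOLD` and `LABELS` appear in `referenced`, which matches the idea that changes to unrelated globals should not affect the cache.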

🗄️ Custom Storage Backends

Implement your own storage backend by subclassing checkpointer.Storage and overriding required methods.

Within storage methods, call_hash identifies a call by its arguments. Use self.fn_id() to get the function identity (name + hash/version), which is useful for organizing checkpoints.

Example:

from checkpointer import checkpoint, Storage
from datetime import datetime

class MyCustomStorage(Storage):
    def exists(self, call_hash):
        fn_dir = self.checkpointer.directory / self.fn_id()
        return (fn_dir / call_hash).exists()

    def store(self, call_hash, data):
        ...  # Store serialized data
        return data  # Must return data to checkpointer

    def checkpoint_date(self, call_hash): ...
    def load(self, call_hash): ...
    def delete(self, call_hash): ...

@checkpoint(storage=MyCustomStorage)
def custom_cached_function(x: int):
    return x ** 2
