
checkpointing

Persistent cache for Python functions.


Documentation

Introduction

checkpointing provides a decorator that caches the return value of a pure function[1], by default as a pickle file on disk. When the function is later called with the same arguments, it skips execution, retrieves the cached value, and returns it.

For example,

from checkpointing import checkpoint

@checkpoint()
def calc(a, b):
    print(f"calc is running for {a}, {b}")
    return a + b

if __name__ == "__main__":
    result = calc(1, 2)
    print(f"result: {result}")

Run this script, and the output will be

calc is running for 1, 2
result: 3

Now the return value has been cached, and if you rerun this script, the output will be

result: 3

The execution of calc is skipped. Instead, the result is retrieved from disk and returned as expected.

However, if the function call context has changed, the function is re-executed and returns the new value. For example,

  • if it is called with different arguments, e.g. calc(1, 3), calc reruns and returns 4
  • if the code logic has changed, e.g. to return a - b, calc reruns and returns -1

The checkpoint has a built-in strategy for deciding when it does or does not need to re-execute the function; see Behavior on Code Change for details. This strategy is also the main advantage of checkpointing over similar packages; see Comparison with similar packages.
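To make the idea of a "call context" concrete, here is a minimal sketch of how a cache key could combine a fingerprint of the function's bytecode with its pickled arguments. The names code_fingerprint and cache_key are hypothetical, and this is not checkpointing's actual strategy (documented in Behavior on Code Change), which, judging from the usage notes, also takes referenced global variables into account.

```python
import hashlib
import pickle
import types


def code_fingerprint(code: types.CodeType) -> bytes:
    """Hash a code object's bytecode, names, and constants (simplified sketch)."""
    h = hashlib.sha256()
    h.update(code.co_code)
    h.update(repr(code.co_names).encode())
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            # Nested functions/lambdas: recurse instead of using repr(),
            # whose output contains a memory address.
            h.update(code_fingerprint(const))
        else:
            h.update(repr(const).encode())
    return h.digest()


def cache_key(func, args, kwargs) -> str:
    """Hypothetical key: code fingerprint combined with the pickled arguments."""
    h = hashlib.sha256(code_fingerprint(func.__code__))
    h.update(pickle.dumps((args, sorted(kwargs.items()))))
    return h.hexdigest()
```

Because only bytecode, names, and constants are hashed, editing comments or blank lines leaves the key unchanged, while changing a + b to a - b produces a new key. Bytecode differs between Python versions, so such a key is only stable for one interpreter version.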

Attention: there are some cases where the checkpoint cannot make the correct rerun decision. Please read through the Caveats page and avoid those patterns.

Although the package focuses on persisting the cache across different executions, it also works if you call the same function multiple times within one execution.
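The basic mechanism described above can be sketched with the standard library alone. disk_cache below is a hypothetical, simplified decorator, not checkpointing's implementation: it keys each call on the function's qualified name and pickled arguments and, unlike checkpointing, ignores code changes entirely.

```python
import functools
import hashlib
import pickle
from pathlib import Path


def disk_cache(directory):
    """Hypothetical minimal disk cache; NOT the checkpointing implementation."""
    cache_dir = Path(directory)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Key the cache on the function name and the pickled arguments.
            payload = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
            path = cache_dir / (hashlib.sha256(payload).hexdigest() + ".pickle")
            if path.exists():
                # Cache hit: skip execution and load the stored return value.
                with path.open("rb") as f:
                    return pickle.load(f)
            # Cache miss: run the function and persist its return value.
            result = func(*args, **kwargs)
            cache_dir.mkdir(parents=True, exist_ok=True)
            with path.open("wb") as f:
                pickle.dump(result, f)
            return result

        return wrapper

    return decorator
```

Since the cache lives on disk, a hit also happens on a second call within the same execution, not only across executions.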

Use cases

The built-in checkpoint is designed for projects that

  • run in a local development environment
  • repeatedly execute long-running pure functions[1] on the same set of arguments
  • are somewhat "experimental" and therefore involve a lot of back-and-forth code changes

For example, such use cases are very common in the preliminary stage of machine learning projects.

Installation

This package is available on PyPI, and can be installed with pip.

$ pip install checkpointing

Basic usage

Create a checkpoint

Import checkpoint from this package and use it as a decorator on a function (note the () after checkpoint):

from checkpointing import checkpoint

@checkpoint()
def foo():
    return 0

After that, foo will be automatically cached, skipped, or re-executed as described previously. You can call foo in the same way as you normally would.

Configure the checkpoint

Cache directory

By default, the results are saved as pickle files in ./.checkpointing/. To store them elsewhere, pass the directory option:

@checkpoint(directory="other_dir")
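How a directory setting might translate into cache file paths can be sketched as follows. The layout (one sub-directory per function, one file per call key) and the name cache_file are assumptions for illustration; the source only states that results go to ./.checkpointing/ by default.

```python
from pathlib import Path

DEFAULT_DIRECTORY = ".checkpointing"  # default location named in the docs


def cache_file(func_qualname, key, directory=None):
    """Hypothetical path layout: <directory>/<function>/<key>.pickle."""
    # Fall back to the default directory when none is given.
    base = Path(directory if directory is not None else DEFAULT_DIRECTORY)
    return base / func_qualname / f"{key}.pickle"
```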

Behavior on internal error

During execution, unexpected errors may occur within the checkpoint itself. By default, the checkpoint then issues a warning and simply runs the function without caching, so your code won't fail because of this package. You can change this behavior with the on_error option.

@checkpoint(on_error="raise")

This will terminate the function call and raise the internal error.

@checkpoint(on_error="ignore")

When an internal error occurs, this runs the function without caching and without issuing any warning.
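The three on_error behaviors can be sketched as a small dispatch around the cache's load and store steps. call_with_cache, the load/store callables, and the name "warn" for the default policy are all hypothetical; the source does not say how the library structures this internally. Errors raised by the user's function itself are deliberately not treated as internal errors.

```python
import warnings


def call_with_cache(func, args, load, store, on_error="warn"):
    """Hypothetical sketch of the on_error policy; not the library's code.

    load(args) -> (hit, value) and store(args, value) stand in for the
    cache backend; either one may fail with an internal error.
    """
    if on_error not in ("warn", "raise", "ignore"):
        raise ValueError(f"unknown on_error policy: {on_error!r}")

    def internal_error(exc):
        if on_error == "raise":
            raise exc  # terminate the call with the internal error
        if on_error == "warn":
            warnings.warn(f"checkpoint failed, running without cache: {exc!r}")
        # "ignore": fall through silently

    try:
        hit, value = load(args)
    except Exception as exc:
        internal_error(exc)
        return func(*args)  # run without caching
    if hit:
        return value

    result = func(*args)  # errors raised by func itself propagate unchanged
    try:
        store(args, result)
    except Exception as exc:
        internal_error(exc)
    return result
```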

Pickle protocol

The function's return value is saved with the built-in pickle module. By default, protocol 5 is used for all Python versions because it handles large data efficiently. To use a different protocol, set the cache_pickle_protocol option.

import pickle

@checkpoint(cache_pickle_protocol=pickle.DEFAULT_PROTOCOL)
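The protocol only affects how a value is written: pickle.load detects the protocol from the data itself, so nothing extra is needed when reading. A minimal sketch using hypothetical helper names save_result and load_result (protocol 5 requires Python 3.8 or later):

```python
import pickle


def save_result(path, value, protocol=5):
    """Write value to path with the given pickle protocol (5 needs Python 3.8+)."""
    with open(path, "wb") as f:
        pickle.dump(value, f, protocol=protocol)


def load_result(path):
    """Read a cached value back; the protocol is detected automatically."""
    with open(path, "rb") as f:
        return pickle.load(f)
```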

Global setting

By modifying the global defaults dictionary, you can change the configuration of all checkpoints.

from checkpointing import defaults
import pickle

defaults["cache.filesystem.directory"] = "other_dir"
defaults["checkpoint.on_error"] = "ignore"
defaults["cache.pickle_protocol"] = pickle.DEFAULT_PROTOCOL

Please set this at the top level of your module/script, before you create any checkpoint.
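One plausible reason for this ordering requirement is that a checkpoint resolves its configuration when it is created, so later changes to the defaults never reach checkpoints that already exist. The sketch below illustrates that pattern with a hypothetical DEFAULTS dictionary and make_checkpoint factory; it is not the library's code.

```python
import functools

# Hypothetical stand-in for a global defaults mapping.
DEFAULTS = {"checkpoint.on_error": "warn"}


def make_checkpoint(on_error=None):
    """Hypothetical factory: defaults are read once, when the decorator is created."""
    resolved = on_error if on_error is not None else DEFAULTS["checkpoint.on_error"]

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)  # caching omitted in this sketch

        wrapper.on_error = resolved  # configuration frozen at creation time
        return wrapper

    return decorator
```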

Further customization

If you need more flexibility, such as storing the cache in a format other than pickle, or ignoring or considering additional aspects of the function call context, see Extending the Checkpoint for details.
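As an illustration of storing the cache in a format other than pickle, here is a hypothetical JSON-based store with save and load methods. The class name and its interface are assumptions and do not reflect the extension API described in Extending the Checkpoint.

```python
import json
from pathlib import Path


class JsonCache:
    """Hypothetical cache store that writes return values as JSON files.

    Only JSON-serializable values (dict, list, str, numbers, bool, None)
    are supported; tuples come back as lists.
    """

    def __init__(self, directory):
        self.directory = Path(directory)

    def _path(self, key):
        return self.directory / f"{key}.json"

    def save(self, key, value):
        self.directory.mkdir(parents=True, exist_ok=True)
        self._path(key).write_text(json.dumps(value))

    def load(self, key):
        """Return (hit, value); hit is False when nothing is cached for key."""
        path = self._path(key)
        if not path.exists():
            return False, None
        return True, json.loads(path.read_text())
```

JSON files are human-readable and portable, at the cost of supporting far fewer types than pickle.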

Force rerun a checkpoint

You can force a checkpointed function to rerun with

foo.rerun(arg)

where foo is the decorated function. This is equivalent to invoking foo(arg), except that the function body is always executed. The return value of this rerun is cached to disk and overwrites the previously cached value, if one exists.

This is useful when some factor that affects the function's return value has changed but the checkpoint failed to detect the difference, as described in Caveats.
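The situation above can be reproduced with a small in-memory stand-in for the disk cache. memo_with_rerun is hypothetical; its key covers only the arguments, so a change in the mutable offset dictionary goes unnoticed until rerun() is called.

```python
import functools
import pickle


def memo_with_rerun(func):
    """Hypothetical in-memory stand-in for a disk cache, with a rerun() method."""
    store = {}

    def key_of(args, kwargs):
        return pickle.dumps((args, sorted(kwargs.items())))

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = key_of(args, kwargs)
        if key not in store:
            store[key] = func(*args, **kwargs)
        return store[key]

    def rerun(*args, **kwargs):
        # Always execute, then overwrite any cached value for these arguments.
        result = func(*args, **kwargs)
        store[key_of(args, kwargs)] = result
        return result

    wrapper.rerun = rerun
    return wrapper


offset = {"value": 0}  # hidden dependency that the cache key does not capture


@memo_with_rerun
def shifted(x):
    return x + offset["value"]
```

After offset["value"] changes, shifted(1) keeps returning the stale cached result until shifted.rerun(1) refreshes it.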

Usage notes

Please be aware that

  • Since the function is skipped when a cached result exists, you should not mutate an argument in the function body (as required by the definition of a pure function)
  • If the project involves randomness, it is your responsibility to set the random seed or random state so that the arguments and referenced global variables of the cached function are identical across runs
  • The built-in strategy for deciding whether a function needs to be re-executed is imperfect. Please see Caveats, and avoid the cases where the rerun condition cannot be determined correctly.
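The first note can be seen with a tiny in-memory cache. memoize below is a hypothetical stand-in keyed on pickled arguments, and append_and_count deliberately violates the purity requirement by mutating its argument. On a cache hit the function body is skipped, so the caller's list is never updated.

```python
import functools
import pickle


def memoize(func):
    """Hypothetical in-memory cache keyed on pickled arguments."""
    store = {}

    @functools.wraps(func)
    def wrapper(*args):
        key = pickle.dumps(args)  # computed before func can mutate args
        if key not in store:
            store[key] = func(*args)
        return store[key]

    return wrapper


@memoize
def append_and_count(items, x):
    items.append(x)  # mutates the argument: violates the purity requirement
    return len(items)
```

Calling append_and_count on two separate empty lists returns 1 both times, but only the first list actually gains an element.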

Footnotes

[1]: We use an alternative definition of a "pure function": one that only has property 2, "the function has no side effects (no mutation of local static variables, non-local variables, mutable reference arguments or input/output streams)". We do allow the return value to vary due to changes in non-local variables and other factors, as is often the case during project development.

