Persistent caching for Python functions

pkld

pkld (pronounced "pickled") caches function calls to your disk.

This saves you from repeating the same function calls every time you run your code. It's especially useful in data engineering or machine learning pipelines, where function calls are often expensive or time-consuming.

from pkld import pkld

@pkld
def foo(input):
    # Slow or expensive operations...
    return stuff

Features:

  • Uses pickle to store function outputs locally
  • Supports functions with mutable or un-hashable arguments (e.g. dicts, lists, numpy arrays)
  • Can also be used as an in-memory (i.e. transient) cache
  • Supports asynchronous functions
  • Thread-safe

Installation

> pip install pkld

Usage

To use, just add the @pkld decorator to your function:

from pkld import pkld

@pkld
def foo():
    return stuff

Then if you run the program, the function will be executed:

stuff = foo() # Takes a long time

And if you run it again:

stuff = foo() # Fast af

The function will not execute, and instead the output will be pulled from the cache.
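
To make the hit-or-miss behavior concrete, here is a minimal sketch of a disk-backed cache decorator built only on the standard library. It is not pkld's implementation: the `disk_cached` name, the `CACHE_DIR` location, and the key scheme (SHA-256 of the pickled arguments, truncated to 8 hex characters) are all assumptions made for illustration.

```python
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path

# Hypothetical cache location for this sketch only; pkld uses its own layout.
CACHE_DIR = Path(tempfile.mkdtemp())


def disk_cached(func):
    """Cache func's return values on disk, keyed by its pickled arguments."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Pickle the arguments and hash the bytes to build a cache key.
        # This works for unhashable inputs such as lists and dicts.
        raw = pickle.dumps((args, sorted(kwargs.items())))
        key = hashlib.sha256(raw).hexdigest()[:8]
        path = CACHE_DIR / f"{func.__name__}_{key}.pkl"

        if path.exists():
            # Cache hit: load the stored result instead of calling func.
            return pickle.loads(path.read_bytes())

        # Cache miss: run the function and store its result for next time.
        result = func(*args, **kwargs)
        path.write_bytes(pickle.dumps(result))
        return result

    return wrapper


calls = []


@disk_cached
def slow_square(x):
    calls.append(x)  # Records each real execution of the body
    return x * x


first = slow_square(4)   # Cache miss: the body runs
second = slow_square(4)  # Cache hit: the result is read from the pickle file
```

After both calls, `calls` holds a single entry, because the second call never reaches the function body.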

Clearing the cache

Every pickled function has a clear method attached to it. You can use it to reset the cache:

foo.clear()

Disabling the cache

You can disable caching for a pickled function using the disabled parameter:

@pkld(disabled=True)
def foo():
    return stuff

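This will execute the function as if it wasn't decorated: nothing is loaded from or stored in the cache. That's useful if you've modified the function and don't want stale results returned. Note that disabling only bypasses the cache; to actually remove old entries, call foo.clear().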

Changing cache location

By default, pickled function outputs are stored in the same directory as the files the functions are defined in. You'll find them in a folder called .pkljar.

codebase/
│
├── my_file.py # foo is defined in here
│
└── .pkljar/
    ├── foo_cd7648e2.pkl # foo w/ one set of args
    └── foo_95ad612b.pkl # foo w/ a different set of args

However, you can change this by setting the cache_dir parameter:

@pkld(cache_dir="~/my_cache_dir")
def foo():
    return stuff

You can also specify a cache directory for all pickled functions:

from pkld import set_cache_dir

set_cache_dir("~/my_cache_dir")
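
The three ways of choosing a location (the default .pkljar folder, a per-function cache_dir, and a global setting) can be pictured as a simple lookup. The sketch below is an assumption about how such a resolution might work, not pkld's code: the `resolve_cache_dir` and `set_global_dir` helpers are hypothetical, and both the precedence order (per-function over global over default) and the expansion of "~" are guesses.

```python
from pathlib import Path
from typing import Optional

# Hypothetical module-level setting, standing in for a global cache directory.
_global_cache_dir: Optional[Path] = None


def set_global_dir(path: str) -> None:
    """Set a cache directory shared by every cached function (sketch only)."""
    global _global_cache_dir
    _global_cache_dir = Path(path).expanduser()


def resolve_cache_dir(source_file: str, cache_dir: Optional[str] = None) -> Path:
    """Pick a cache directory; the precedence order here is an assumption."""
    if cache_dir is not None:
        # A per-function setting wins over everything else.
        return Path(cache_dir).expanduser()
    if _global_cache_dir is not None:
        # Otherwise fall back to the global setting, if one was made.
        return _global_cache_dir
    # Default: a .pkljar folder next to the file that defines the function.
    return Path(source_file).parent / ".pkljar"


default_dir = resolve_cache_dir("codebase/my_file.py")
custom_dir = resolve_cache_dir("codebase/my_file.py", cache_dir="~/my_cache_dir")
```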

Using the memory cache

pkld caches results to disk by default. But you can also use it as an in-memory cache:

@pkld(store="memory")
def foo():
    return stuff # Output will be loaded/stored in memory

This is preferred if you only care about memoizing operations within a single run of your program, rather than across runs.

You can also enable both in-memory and on-disk caching by setting store="both". Reading from memory is faster than reading from disk, so combining the two gives you the speed of an in-memory cache and the persistence of an on-disk one.
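
As an illustration of why combining the two stores helps, here is a minimal two-tier cache sketch using only the standard library. The `TwoTierCache` class is hypothetical and is not how pkld is necessarily implemented; it only shows the general pattern of checking memory first and falling back to disk.

```python
import pickle
import tempfile
from pathlib import Path


class TwoTierCache:
    """Check an in-memory dict first, then fall back to pickle files on disk."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.memory = {}

    def _path(self, key):
        return self.directory / f"{key}.pkl"

    def get(self, key, default=None):
        if key in self.memory:
            # Fast path: no file I/O or unpickling needed.
            return self.memory[key]
        path = self._path(key)
        if path.exists():
            # Slow path: load from disk, then keep a copy in memory.
            value = pickle.loads(path.read_bytes())
            self.memory[key] = value
            return value
        return default

    def set(self, key, value):
        # Write to both tiers so the value survives a restart.
        self.memory[key] = value
        self._path(key).write_bytes(pickle.dumps(value))


cache_dir = tempfile.mkdtemp()
cache = TwoTierCache(cache_dir)
cache.set("answer", [1, 2, 3])

# A fresh instance simulates a new run: its memory tier starts out empty.
restarted = TwoTierCache(cache_dir)
value = restarted.get("answer")  # Loaded from disk, then kept in memory
```

The first lookup after a restart pays the disk cost once; later lookups for the same key are served from the dict.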

Arguments

pkld(cache_fp=None, cache_dir=None, disabled=False, store="disk", verbose=False, branch_factor=0)

  • cache_fp: str: File where the cached results will be stored.
  • cache_dir: str: Directory where the cached results will be stored.
  • disabled: bool: If set to True, caching is disabled and the function will execute normally without storing or loading results.
  • store: "disk" | "memory" | "both": Determines the caching method. "disk" for on-disk caching, "memory" for in-memory caching, and "both" for using both methods.
  • verbose: bool: If set to True, enables logging of cache operations for debugging purposes.
  • branch_factor: int: Number of subdirectories to group pickle files into. Useful for functions that are called many times with many different arguments. If a single cache directory holds too many pickle files, performance degrades.
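
The exact way branch_factor distributes files is not described here. One plausible scheme, shown purely as an assumption, is to hash each file name into one of branch_factor numbered subdirectories so that no single folder grows too large. The `bucketed_path` helper below is hypothetical.

```python
import hashlib
from pathlib import Path


def bucketed_path(cache_dir: str, file_name: str, branch_factor: int) -> Path:
    """Place file_name in one of branch_factor subdirectories (sketch only)."""
    root = Path(cache_dir)
    if branch_factor <= 0:
        # No branching: every pickle file sits directly in the cache directory.
        return root / file_name
    # Hash the file name so the same file always maps to the same bucket.
    digest = hashlib.sha256(file_name.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % branch_factor
    return root / str(bucket) / file_name


flat = bucketed_path(".pkljar", "foo_cd7648e2.pkl", branch_factor=0)
nested = bucketed_path(".pkljar", "foo_cd7648e2.pkl", branch_factor=4)
```

With branch_factor=0 the file lands directly in .pkljar; with branch_factor=4 it lands in one of four numbered subfolders.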

Limitations

Not all functions can or should be pickled. The requirements are:

  1. Functions cannot have side-effects. This means they cannot mutate objects defined outside of the function (including its arguments).
  2. Functions cannot return an unpickleable object, e.g. a socket or database connection.
  3. Functions must be deterministic, meaning they always produce the same output given the same input.
  4. If you're passing an instance of a user-defined class as a function input, it must have a __hash__ method defined on it.
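
The first and last requirements are easiest to see in a small demonstration. The sketch below uses the standard library's functools.lru_cache as a stand-in for any memoizing decorator (it is not pkld, and pkld may use __hash__ differently). It shows that a side effect is silently skipped on a cache hit, and that a user-defined class needs a content-based __hash__ for equal inputs to share a cache entry.

```python
import functools

audit_log = []


@functools.lru_cache(maxsize=None)
def record_and_double(x):
    audit_log.append(x)  # Side effect: mutates an object outside the function
    return x * 2


record_and_double(3)  # Cache miss: the body runs and the log grows
record_and_double(3)  # Cache hit: the body is skipped, so nothing is logged


class Point:
    """User-defined input with content-based equality and hashing."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))


@functools.lru_cache(maxsize=None)
def norm_squared(p):
    return p.x ** 2 + p.y ** 2


norm_squared(Point(1, 2))
norm_squared(Point(1, 2))  # Equal hash and equality, so this is a cache hit
hits = norm_squared.cache_info().hits
```

Here `audit_log` ends up with one entry even though the function was called twice, which is exactly the kind of surprise the side-effect rule guards against.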

Authors

Created by Paul Bogdan and Jonathan Shobrook.
