Skip to main content

Various caching utilities

Project description

Filecache

Utilities for caching various things on file. Currently main two cachers are a FunctionCacher similar to the python standard library functools.lru_cache, and a file hashing utility FileCacher which can be used to hash files in a directory nestedly.

Examples might contain some examples of interest.

FunctionCacher

FunctionCacher allows caching the invocations of functions, with subsequent invocations being loaded from the in-memory cache. The cache can also be saved to and loaded from file, allowing separate runs of a program to reuse the calculated values. The main aim with this solution is to save even more time over the likes of the cache utilities in the standard library.

As a simple example: (copied from examples)

function_cacher = FunctionCacher(
    save_path = Path() / "caches" / __name__ / "cache",
    cache_size = 3,
    auto_save = True
)

@function_cacher()
def dummy_function(value = 0):
    time.sleep(0.5)
    return f"You passed {value=}"

The save path -- where the cache is saved to -- is set, cache_size sets a limit to how many different invocations of each function can be saved, and auto_save saves the in-memory cache to file after each invocation if the invocation is a never before seen one. The cacher is then used to wrap the wanted function, and the function can be called after this. Multiple functions can be wrapped with the same cacher which will keep track of all their invocations. By default, loading in a cache happens automatically when a new instance is created and the passed save path contains a viable cache. The cacher also allows setting a validity period for the cached data, which allows getting rid of very old, unused invocations more easily to limit how large the cache becomes.

Behind the scenes, FunctionCacher uses the shelve module to save and load the cached data. This allows working with a variety of types without having to explicitly define (de)serialisation. Values gotten from the cache are deepcopied such that the returned value can be modified without modifying the value in the cache.

Determining different invocations

In order for FunctionCacher to know when a new function is invoked, it needs to know two things: the state of the function it is wrapping and the passed-in arguments.

If a function's body changes (e.g. above, dummy_function prints "executing" on its first line), the function's operation might have changed and might now return something new. To keep track of changes to the body, the body's state is hashed with the help of inspect and hashlib. The cached invocations are matched to this hash.

When the passed in arguments change, the function may return something new. Therefore, the inputs are also tracked, again using inspect, and cached along with the output from the function. While shelve allows saving and loading a wide variety of Python data types, the solution here requires also that the input types are comparable, which is not always the case by default. One example is using pandas dataframes, which don't allow direct equality comparisons based on content alone:

>>> df = pd.DataFrame(dict(values = range(5)))
>>> bool(df == df)
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> df2 = df.copy()
>>> _dict = dict(df = df)
>>> _dict2 = dict(df = df2)
>>> _dict == _dict2
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

To solve this issue, it is possible to pass comparison functions to the wrapper. These functions are used to compare individual objects in the input argument dictionary. An example (copied from examples):

def compare_df(one, two):
    if all_instance_of(pd.DataFrame, one, two):

        if not (
            (len(one.index) == len(two.index))
            and (len(one.columns) == len(two.columns))
        ):
            return False
        
        return bool((one == two).all().all())

@function_cacher(compare_funcs = [compare_df])
def add_one(df: pd.DataFrame):
    time.sleep(1)
    called["add_one"] += 1
    return df + 1

compare_df obviously compares two dataframes, returning True if they are the same (under the comparison). If the passed objects are deemed uncomparable by the function (it returns None), the next comparison function is called on the objects, or the basic equality comparison is defaulted to if no comparison functions remain.

FileCacher

Hashes the contents of files at given paths, allowing caching the hashes in a JSON format. Useful for e.g. checking if the contents of a data folder has changed and should be loaded in again. Overall fairly simple and also less developed than FunctionCacher. See the example.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

filecacheutils-0.1.0.post1.tar.gz (21.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

filecacheutils-0.1.0.post1-py3-none-any.whl (20.4 kB view details)

Uploaded Python 3

File details

Details for the file filecacheutils-0.1.0.post1.tar.gz.

File metadata

  • Download URL: filecacheutils-0.1.0.post1.tar.gz
  • Upload date:
  • Size: 21.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for filecacheutils-0.1.0.post1.tar.gz
Algorithm Hash digest
SHA256 7da2398722383c15b8aa3bcc3d1746895a861e125133a92f33accc90c1727492
MD5 9a161d4a066949b4b909f5383706bbbf
BLAKE2b-256 e2113cb695f4e2c2e65df2832901161125845b0c821ddd45f1944bbd0af57db5

See more details on using hashes here.

Provenance

The following attestation bundles were made for filecacheutils-0.1.0.post1.tar.gz:

Publisher: publish.yml on aaltop/filecache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file filecacheutils-0.1.0.post1-py3-none-any.whl.

File metadata

File hashes

Hashes for filecacheutils-0.1.0.post1-py3-none-any.whl
Algorithm Hash digest
SHA256 c9e2fa148fd5504521a1c1aa860cc95d0cd77ca12f69f24b563e998fb06e6b33
MD5 2153f14df28c1ae0ecc470678835a359
BLAKE2b-256 174bcc50b6deca3439416fe92c528f44cbf46554f11c97136f74fb32f20faac7

See more details on using hashes here.

Provenance

The following attestation bundles were made for filecacheutils-0.1.0.post1-py3-none-any.whl:

Publisher: publish.yml on aaltop/filecache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page