Various caching utilities

Project description

Filecache

Utilities for caching various things to file. The two main cachers are currently FunctionCacher, which is similar to functools.lru_cache from the Python standard library, and FileCacher, a file hashing utility that can hash the files in a directory recursively.

Examples contains some examples that may be of interest.

FunctionCacher

FunctionCacher caches the invocations of functions, so that subsequent invocations are loaded from the in-memory cache. The cache can also be saved to and loaded from file, which lets separate runs of a program reuse the calculated values. The main aim is to save more time than the cache utilities in the standard library, whose caches do not persist between runs.

A simple example (copied from examples):

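import time
from pathlib import Path

# FunctionCacher is provided by this package
function_cacher = FunctionCacher(
    # Where the cache is saved on file
    save_path=Path() / "caches" / __name__ / "cache",
    # Maximum number of different invocations stored per function
    cache_size=3,
    # Save to file after each previously unseen invocation
    auto_save=True,
)

@function_cacher()
def dummy_function(value=0):
    time.sleep(0.5)  # stand-in for an expensive computation
    return f"You passed {value=}"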

save_path sets where the cache is saved, cache_size limits how many different invocations of each function can be stored, and auto_save writes the in-memory cache to file after every invocation that has not been seen before. The cacher is then used to wrap the desired function, which can be called as usual afterwards. Multiple functions can be wrapped with the same cacher, which keeps track of all their invocations. By default, a cache is loaded automatically when a new instance is created and the given save path contains a viable cache. The cacher also allows setting a validity period for the cached data, which makes it easier to discard very old, unused invocations and so limit how large the cache grows.
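To make the size limit and validity period more concrete, here is a minimal sketch of how such bookkeeping could work. It is not FunctionCacher's implementation: the class BoundedCache, its methods, and the oldest-first eviction policy are assumptions chosen for illustration.

```python
from collections import OrderedDict
from datetime import datetime, timedelta


class BoundedCache:
    """Hypothetical cache with a size limit and a validity period."""

    def __init__(self, cache_size: int, valid_for: timedelta):
        self.cache_size = cache_size
        self.valid_for = valid_for
        self._entries = OrderedDict()  # key -> (stored_at, value)

    def put(self, key, value, now: datetime) -> None:
        """Store a value and evict the oldest entries beyond cache_size."""
        self._entries[key] = (now, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self.cache_size:
            self._entries.popitem(last=False)  # assumed policy: drop oldest first

    def prune(self, now: datetime) -> None:
        """Remove entries that are older than the validity period."""
        expired = [
            key
            for key, (stored_at, _) in self._entries.items()
            if now - stored_at > self.valid_for
        ]
        for key in expired:
            del self._entries[key]

    def keys(self) -> list:
        return list(self._entries)


start = datetime(2025, 4, 28)
cache = BoundedCache(cache_size=3, valid_for=timedelta(days=7))

# Four invocations, one per day; only the three newest fit in the cache.
for day in range(4):
    cache.put(f"value={day}", f"You passed value={day}", now=start + timedelta(days=day))
keys_after_inserts = cache.keys()

# Nine days after the start, the entry stored on day 1 is more than 7 days old.
cache.prune(now=start + timedelta(days=9))
keys_after_prune = cache.keys()
```

Passing the current time explicitly keeps the sketch deterministic; a real cacher would more likely read the clock itself.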

Behind the scenes, FunctionCacher uses the shelve module to save and load the cached data. This allows working with a variety of types without having to explicitly define (de)serialisation. Values retrieved from the cache are deep-copied, so the returned value can be modified without modifying the value in the cache.
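The combination of shelve and deep copies can be sketched as follows. This is only an illustration of the two standard library pieces; the helper names save_cache, load_cache and get_cached, as well as the key format, are hypothetical and not part of FunctionCacher.

```python
import copy
import shelve
import tempfile
from pathlib import Path


def save_cache(path: Path, cache: dict) -> None:
    """Write every entry of an in-memory cache to a shelf file."""
    with shelve.open(str(path)) as shelf:
        for key, value in cache.items():
            shelf[key] = value  # shelve pickles the value; keys must be strings


def load_cache(path: Path) -> dict:
    """Read a shelf file back into a plain dictionary."""
    with shelve.open(str(path)) as shelf:
        return dict(shelf)


def get_cached(cache: dict, key: str):
    """Return a deep copy so callers cannot mutate the cached value."""
    return copy.deepcopy(cache[key])


with tempfile.TemporaryDirectory() as tmp_dir:
    shelf_path = Path(tmp_dir) / "cache"
    save_cache(shelf_path, {"dummy_function:value=1": [1, 2, 3]})
    restored = load_cache(shelf_path)

result = get_cached(restored, "dummy_function:value=1")
result.append(4)  # only the copy changes, the cached list stays intact
```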

Determining different invocations

For FunctionCacher to know whether an invocation is new, it needs to know two things: the state of the function it is wrapping and the passed-in arguments.

If a function's body changes (e.g. dummy_function above is edited to print "executing" on its first line), the function's behaviour might have changed and it might now return something new. To keep track of changes to the body, the body's state is hashed with the help of inspect and hashlib. The cached invocations are matched to this hash.
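One way such a hash could be computed is sketched here. It is an assumption about the general approach, not FunctionCacher's actual code: hash_function_state is a hypothetical helper, and the fallback to the compiled code object is only there so the sketch also runs where the source text is unavailable (for example in an interactive session).

```python
import hashlib
import inspect


def hash_function_state(func) -> str:
    """Hash a function's source text, or its bytecode if no source exists."""
    try:
        payload = inspect.getsource(func).encode("utf-8")
    except (OSError, TypeError):
        # Fallback (assumption): bytecode plus constants of the code object
        code = func.__code__
        payload = code.co_code + repr(code.co_consts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def dummy_function(value=0):
    return f"You passed {value=}"


hash_before = hash_function_state(dummy_function)


# The same function after its body was edited
def dummy_function(value=0):
    print("executing")
    return f"You passed {value=}"


hash_after = hash_function_state(dummy_function)

# A changed body gives a new hash, so old cached invocations no longer match
body_changed = hash_before != hash_after
```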

When the passed-in arguments change, the function may return something new. Therefore, the inputs are also tracked, again using inspect, and cached along with the output of the function. While shelve allows saving and loading a wide variety of Python data types, the solution here also requires that the input values are comparable, which is not always the case by default. One example is pandas dataframes, which don't allow direct equality comparisons based on content alone:

>>> import pandas as pd
>>> df = pd.DataFrame(dict(values = range(5)))
>>> bool(df == df)
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> df2 = df.copy()
>>> _dict = dict(df = df)
>>> _dict2 = dict(df = df2)
>>> _dict == _dict2
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
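The argument tracking described before the DataFrame example could, for instance, rely on inspect.signature to turn every call into a dictionary of argument names and values. The sketch here shows that idea under this assumption; bound_arguments is a hypothetical helper and the scale parameter is added only for illustration.

```python
import inspect


def bound_arguments(func, *args, **kwargs) -> dict:
    """Map a call's arguments to parameter names, filling in defaults."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()
    return dict(bound.arguments)


def dummy_function(value=0, *, scale=1):
    return value * scale


# Three spellings of the same invocation produce the same dictionary
implicit = bound_arguments(dummy_function)
positional = bound_arguments(dummy_function, 0)
keyword = bound_arguments(dummy_function, value=0, scale=1)

# A different input produces a different dictionary
other = bound_arguments(dummy_function, 2)
```

Comparing such dictionaries with == is what fails for values like dataframes, since the comparison of the individual values is delegated to the values themselves.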

To solve this issue, it is possible to pass comparison functions to the wrapper. These functions are used to compare individual objects in the input argument dictionary. An example (copied from examples):

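def compare_df(one, two):
    # all_instance_of is a helper assumed to be defined alongside the examples
    if all_instance_of(pd.DataFrame, one, two):

        # Frames of different shapes can never be equal
        if not (
            (len(one.index) == len(two.index))
            and (len(one.columns) == len(two.columns))
        ):
            return False

        # Element-wise comparison, reduced to a single boolean
        return bool((one == two).all().all())

    # Not two dataframes: let the next comparison function decide
    return None

@function_cacher(compare_funcs=[compare_df])
def add_one(df: pd.DataFrame):
    time.sleep(1)
    # called is a counter dictionary assumed to be defined alongside the examples
    called["add_one"] += 1
    return df + 1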

compare_df compares two dataframes and returns True if they are the same under the comparison, and False if they are not. If the function cannot compare the passed objects (it returns None), the next comparison function is called on them. If no comparison functions remain, the basic equality comparison is used.
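The fallback chain can be sketched without pandas. The names values_match, arguments_match and compare_floats are hypothetical and only illustrate the described behaviour: a comparison function returns True or False when it can decide and None when it cannot.

```python
from typing import Any, Callable, Optional, Sequence

CompareFunc = Callable[[Any, Any], Optional[bool]]


def values_match(one, two, compare_funcs: Sequence[CompareFunc] = ()) -> bool:
    """Use the first comparison function that gives a verdict, else ==."""
    for compare in compare_funcs:
        verdict = compare(one, two)
        if verdict is not None:
            return bool(verdict)
    return bool(one == two)


def arguments_match(cached: dict, new: dict, compare_funcs=()) -> bool:
    """Compare two argument dictionaries value by value."""
    if cached.keys() != new.keys():
        return False
    return all(
        values_match(cached[name], new[name], compare_funcs) for name in cached
    )


def compare_floats(one, two):
    """Hypothetical comparison: floats match within a small tolerance."""
    if isinstance(one, float) and isinstance(two, float):
        return abs(one - two) < 1e-9
    return None  # not two floats, defer to the next comparison


cached_args = {"x": 0.1 + 0.2, "name": "a"}
new_args = {"x": 0.3, "name": "a"}

plain_match = arguments_match(cached_args, new_args)
custom_match = arguments_match(cached_args, new_args, [compare_floats])
```

With plain equality the two calls count as different invocations because 0.1 + 0.2 is not exactly 0.3; with compare_floats they count as the same one, while the string argument still falls through to ==.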

FileCacher

FileCacher hashes the contents of files at given paths and can cache the hashes in JSON format. This is useful, for example, for checking whether the contents of a data folder have changed and should be loaded again. It is fairly simple overall and less developed than FunctionCacher. See the example.
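The general idea of hashing a folder and caching the hashes as JSON can be sketched with the standard library alone. This does not use FileCacher's API; hash_file, hash_directory, the choice of SHA-256 and the file names are assumptions made for the illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file's contents in chunks to keep memory use small."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_directory(root: Path) -> dict:
    """Hash every file under root, including files in nested folders."""
    return {
        path.relative_to(root).as_posix(): hash_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


with tempfile.TemporaryDirectory() as tmp_dir:
    data_dir = Path(tmp_dir) / "data"
    (data_dir / "nested").mkdir(parents=True)
    (data_dir / "a.txt").write_text("first")
    (data_dir / "nested" / "b.txt").write_text("second")

    # Cache the hashes as JSON
    hashes_file = Path(tmp_dir) / "hashes.json"
    hashes_file.write_text(json.dumps(hash_directory(data_dir), indent=2))

    # Later: one file changes, and the cached hashes reveal which one
    (data_dir / "a.txt").write_text("changed")
    cached_hashes = json.loads(hashes_file.read_text())
    current_hashes = hash_directory(data_dir)
    changed_files = sorted(
        name
        for name in current_hashes
        if cached_hashes.get(name) != current_hashes[name]
    )
```

Only the files listed in changed_files would need to be loaded again; everything else can be reused.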

Project details

Release history

0.1.0.post1 (Apr 29, 2025)

0.1.0 (Apr 28, 2025, this version)

Download files


Source Distribution

filecacheutils-0.1.0.tar.gz (22.0 kB)

Uploaded Apr 28, 2025 Source

Built Distribution


filecacheutils-0.1.0-py3-none-any.whl (20.4 kB)

Uploaded Apr 28, 2025 Python 3

File details

Details for the file filecacheutils-0.1.0.tar.gz.

File metadata

Download URL: filecacheutils-0.1.0.tar.gz
Upload date: Apr 28, 2025
Size: 22.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for filecacheutils-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`4ad02444fb132d79a85ed36c18c378a391debbe81bb4d666e8b783272a20cdb3`
MD5	`1e8eb9f21222fa697a91c319b04830b7`
BLAKE2b-256	`651dad682ad29b644ed226f1be3181a92f18f170c0fb24c65eac78fe292683a6`


File details

Details for the file filecacheutils-0.1.0-py3-none-any.whl.

File metadata

Download URL: filecacheutils-0.1.0-py3-none-any.whl
Upload date: Apr 28, 2025
Size: 20.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for filecacheutils-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`09e08ad19731536b0c4cad7ea3b5c40ef0969514630102531cc4bbe29cf823b9`
MD5	`f77d253017720e0c270476dcdbb43b0b`
BLAKE2b-256	`1f879d0d5650c045896cf2ff0beb0c4525b8e0120e64066f9bccf160db50128f`


filecacheutils 0.1.0
