
Python data cache decorator


PyPI version Python Versions License: MIT

Data cache

The cache works by hashing the arguments of a function call together with the function name to create a unique id for a table retrieval. If the function call is new, the original function is called and the resulting table(s) are stored in an HDFStore indexed by the hashed key. The next time the function is called with the same arguments, the table(s) are retrieved from the store instead of executing the function.
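The lookup-or-compute flow described above can be sketched as follows. This is a minimal illustration, not the package's implementation: a plain dict stands in for the HDFStore, and the names toy_cache and load_table are hypothetical.

```python
import functools


def toy_cache(func):
    """Cache results keyed by function name and stringified arguments."""
    store = {}  # assumption: an in-memory dict stands in for the on-disk HDFStore

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = (func.__name__, str(args), str(sorted(kwargs.items())))
        if key not in store:
            # New call: run the original function and store the result.
            store[key] = func(*args, **kwargs)
        # Known call: return the stored result without running the function.
        return store[key]

    return wrapper


calls = []  # records every real execution of load_table


@toy_cache
def load_table(n):
    calls.append(n)
    return [[i, i * 2] for i in range(n)]


first = load_table(3)   # executes the function
second = load_table(3)  # served from the store
third = load_table(4)   # new arguments, executes again
```

After these three calls, load_table has only executed twice, once per distinct argument combination.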

The arguments are hashed by first applying str() to each argument and then taking the MD5 hash of the combination of these strings together with the function name. This means that if an argument for some reason does not have a str representation, key generation will fail. To avoid this issue, one can specify which arguments the cache should consider, so that 'un-stringable' arguments are skipped. This functionality is also used for skipping arguments that should by design not be considered for key generation, for example database clients.
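A rough sketch of such a key function is shown below. The helper make_key, its include parameter, the "|" separator, and the FakeClient class are all assumptions made for illustration; the package's actual key layout may differ.

```python
import hashlib


def make_key(func_name, arguments, include=None):
    """Build an MD5 key from the function name and selected arguments.

    arguments: dict mapping argument names to values.
    include:   names to consider, or None to consider every argument.
    """
    if include is None:
        names = sorted(arguments)
    else:
        names = [name for name in include if name in arguments]
    # str() is applied to each considered argument, as described above.
    parts = [func_name] + [f"{name}={str(arguments[name])}" for name in names]
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


class FakeClient:
    """Hypothetical stand-in for a database client that should not affect the key."""


key1 = make_key("load", {"a": 1, "client": FakeClient()}, include=("a",))
key2 = make_key("load", {"a": 1, "client": FakeClient()}, include=("a",))
key3 = make_key("load", {"a": 2, "client": FakeClient()}, include=("a",))
```

Here key1 and key2 are identical because only "a" is considered, while key3 differs. Had the client been included, its default string representation embeds a memory address, so the key would generally not be stable between runs.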

Setting cache file location

The module automatically creates a cache/data.h5 file relative to __main__. To change this, set the environment variable CACHE_PATH to the desired directory for the data.h5 file.
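As a hedged illustration of how such a lookup could work, the sketch below resolves the file location from CACHE_PATH with a fallback directory. The function resolve_cache_file is hypothetical, and the "relative to __main__" part of the default is not modelled here.

```python
import os
from pathlib import Path


def resolve_cache_file(environ=None, default_dir="cache"):
    """Return the data.h5 path, preferring the CACHE_PATH directory if set."""
    environ = os.environ if environ is None else environ
    directory = environ.get("CACHE_PATH", default_dir)
    return Path(directory) / "data.h5"


default_location = resolve_cache_file({})
custom_location = resolve_cache_file({"CACHE_PATH": "/tmp/my_cache"})
```

Whether the package reads the variable at import time or on each call is not stated above, so setting CACHE_PATH before importing data_cache is the safer choice.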

Disabling the cache with env-variable

To disable the cache set the environment variable DISABLE_CACHE to TRUE.
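One way a decorator could honour this switch is sketched below. The names cache_disabled and bypassable_cache are hypothetical, the dict is a stand-in for the real store, and the exact, case-sensitive comparison with "TRUE" is an assumption.

```python
import functools
import os
from unittest.mock import patch


def cache_disabled():
    # Assumption: only the exact string "TRUE" disables the cache.
    return os.environ.get("DISABLE_CACHE") == "TRUE"


def bypassable_cache(func):
    store = {}  # in-memory stand-in for the real cache store

    @functools.wraps(func)
    def wrapper(*args):
        if cache_disabled():
            # Bypass the store entirely and always run the function.
            return func(*args)
        if args not in store:
            store[args] = func(*args)
        return store[args]

    return wrapper


calls = []  # records every real execution of square


@bypassable_cache
def square(x):
    calls.append(x)
    return x * x


with patch.dict("os.environ", {"DISABLE_CACHE": "TRUE"}):
    square(2)
    square(2)
disabled_calls = len(calls)  # both calls executed the function

with patch.dict("os.environ", {}, clear=True):
    square(3)
    square(3)
enabled_calls = len(calls) - disabled_calls  # second call was served from the store
```

The environment is checked on every call in this sketch, which is what lets a test toggle the behaviour with patch.dict.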

Usage

Decorating functions

from data_cache import pandas_cache
from time import sleep
from datetime import datetime
import pandas as pd

@pandas_cache
def simple_func():
    sleep(5)
    return pd.DataFrame([[1,2,3], [2,3,4]])


t0 = datetime.now()
print(simple_func())
print(datetime.now() - t0)

t0 = datetime.now()
print(simple_func())
print(datetime.now() - t0)

Results:

   0  1  2
0  1  2  3
1  2  3  4
0:00:05.343027
   0  1  2
0  1  2  3
1  2  3  4
0:00:00.015987

Decorating class methods

The decorator ignores arguments named 'self', so that cached results are shared across different instances of the same class.

from data_cache import pandas_cache
from time import sleep
from datetime import datetime
import pandas as pd


class PandasClass:
    def __init__(self):
        print(self)

    @pandas_cache
    def simple_func(self):
        sleep(5)
        return pd.DataFrame([[1,2,3], [2,3,4]])

c = PandasClass()
t0 = datetime.now()
print(c.simple_func())
print(datetime.now() - t0)

c = PandasClass()
t0 = datetime.now()
print(c.simple_func())
print(datetime.now() - t0)

Results:

<__main__.PandasClass object at 0x003451F0>
   0  1  2
0  1  2  3
1  2  3  4
0:00:05.375342
<__main__.PandasClass object at 0x124814B0>
   0  1  2
0  1  2  3
1  2  3  4
0:00:00.014959
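The effect of skipping 'self' can be illustrated with the standard inspect module. This is only a sketch of the idea; key_arguments and Loader are hypothetical names, and the package may collect its arguments differently.

```python
import inspect


def key_arguments(func, args, kwargs, ignored=("self",)):
    """Map a call to {name: value}, dropping names that must not affect the key."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()  # defaults count too, so f(1) and f(1, b=2) agree
    return {
        name: value
        for name, value in bound.arguments.items()
        if name not in ignored
    }


class Loader:
    def load(self, a, b=2):
        return a + b


# Two different instances, same remaining arguments.
first = key_arguments(Loader.load, (Loader(), 1), {})
second = key_arguments(Loader.load, (Loader(), 1), {"b": 2})
```

Both calls reduce to the same mapping of "a" and "b", so a key derived from it would be identical for the two instances.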

Selecting arguments

from data_cache import pandas_cache
from time import sleep
from datetime import datetime
import pandas as pd

@pandas_cache("a", "c")
def simple_func(a, b, c=True):
    sleep(5)
    return pd.DataFrame([[1,2,3], [2,3,4]])


t0 = datetime.now()
print(simple_func(a=1, b=2))
print(datetime.now() - t0)

# b is not considered
t0 = datetime.now()
print(simple_func(a=1, b=3))
print(datetime.now() - t0)

Results:

   0  1  2
0  1  2  3
1  2  3  4
0:00:05.619620
   0  1  2
0  1  2  3
1  2  3  4
0:00:00.017980

Multi-DataFrame returns

from data_cache import pandas_cache
from time import sleep
from datetime import datetime
import pandas as pd


@pandas_cache("a", "c")
def simple_func(a, *args, **kwargs):
    sleep(5)
    return pd.DataFrame([[1,2,3], [2,3,4]]), pd.DataFrame([[1,2,3], [2,3,4]]) * 10


t0 = datetime.now()
print(simple_func(1, b=2, c=True))
print(datetime.now() - t0)

t0 = datetime.now()
print(simple_func(a=1, b=3, c=True))
print(datetime.now() - t0)

Results:

(   0  1  2
0  1  2  3
1  2  3  4,     0   1   2
0  10  20  30
1  20  30  40)
0:00:05.368545
(   0  1  2
0  1  2  3
1  2  3  4,     0   1   2
0  10  20  30
1  20  30  40)
0:00:00.019578

Disabling cache for tests

Caching can be disabled in tests by setting the environment variable DISABLE_CACHE to TRUE.

from unittest.mock import patch
def test_cached_function():
    with patch.dict("os.environ", {"DISABLE_CACHE": "TRUE"}, clear=True):
        assert cached_function() == target

Numpy caching

from data_cache import numpy_cache
from time import sleep
from datetime import datetime
import numpy as np


@numpy_cache("a", "c")
def simple_func(a, *args, **kwargs):
    sleep(5)
    return np.array([[1, 2, 3], [2, 3, 4]]), np.array([[1, 2, 3], [2, 3, 4]]) * 10


t0 = datetime.now()
print(simple_func(1, b=2, c=True))
print(datetime.now() - t0)

t0 = datetime.now()
print(simple_func(a=1, b=3, c=True))
print(datetime.now() - t0)

Results:

(array([[1, 2, 3],
       [2, 3, 4]]), array([[10, 20, 30],
       [20, 30, 40]]))
0:00:05.009084
(array([[1, 2, 3],
       [2, 3, 4]]), array([[10, 20, 30],
       [20, 30, 40]]))
0:00:00.002000

Metadata

Metadata is automatically stored with the data on the group node containing the DataFrame/Array.

from data_cache import numpy_cache, pandas_cache, read_metadata
import pandas as pd
import numpy as np
import datetime  # module import, so that datetime.date(...) below works


@pandas_cache
def function1(a, *args, b=1, **kwargs):
    return pd.DataFrame()

@numpy_cache
def function2(a, *args, b=1, **kwargs):
    return np.array([])

function1(1, True, datetime.date(2019, 11, 11))
function2(2, False, b=2, c=1.1)
read_metadata("path_to_data.h5")

Results:

{
    "/a86f0a323bf20998b5deda81e9f90bb49/a5d320e5dcdc5d3f35a4ca366980b2dc1": {
        "a": "1",
        "arglist": "(True, datetime.date(2019, 11, 11))",
        "b": "1",
        "date_stored": "01/05/2020, 10:00:00",
        "function_name": "function1",
        "module_path": "path_to_module"
    },
    "/a56ad8af46bc5fd8b9320b00b12e6c115/a62734531fc99855292c9db04d5eba60a": {
        "a": "2",
        "arglist": "(False,)",
        "b": "2",
        "c": "1.1",
        "date_stored": "01/05/2020, 10:00:00",
        "function_name": "function2",
        "module_path":  "path_to_module"
    }
}
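The shape of the entries above suggests that named arguments, extra positional arguments, and extra keyword arguments are each stringified separately. A hedged sketch of how such a dictionary could be assembled is given below; build_metadata is a hypothetical helper, and the date format and the use of the module name for module_path are guesses rather than the package's documented behaviour.

```python
import inspect
from datetime import datetime


def build_metadata(func, args, kwargs):
    """Assemble a metadata dict of stringified arguments for one call."""
    signature = inspect.signature(func)
    bound = signature.bind(*args, **kwargs)
    bound.apply_defaults()

    metadata = {}
    for name, value in bound.arguments.items():
        kind = signature.parameters[name].kind
        if kind is inspect.Parameter.VAR_POSITIONAL:
            metadata["arglist"] = str(value)  # extra positional args as one tuple
        elif kind is inspect.Parameter.VAR_KEYWORD:
            metadata.update({k: str(v) for k, v in value.items()})
        else:
            metadata[name] = str(value)

    metadata["function_name"] = func.__name__
    metadata["module_path"] = func.__module__  # assumption: module name as the path
    # Assumption: the timestamp format is guessed from the example output.
    metadata["date_stored"] = datetime.now().strftime("%m/%d/%Y, %H:%M:%S")
    return metadata


def function2(a, *args, b=1, **kwargs):
    return None


meta = build_metadata(function2, (2, False), {"b": 2, "c": 1.1})
```

For this call the sketch yields "a", "arglist", "b", and "c" entries matching the second record shown above, plus the function name, module, and timestamp.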
