Just a simple system to manage a set of metrics (string name / function / returned data) that supports caching (memory and disk)
Project description
Simple Metrics Manager facilitates managing metrics.
A "metric" as defined here consists of:
- A string name (and an associated python constant)
- A function that takes no arguments and returns some data
- The data returned from the metric function
StorageInterfaces
A StorageInterface is a class that supports storing metric data using some persistent backend. The current default is NpzStorageInterface, which uses the fairly generic '.npz' format for storing numpy arrays and simple python objects.
To prevent data corruption, it always saves data to a temp file and then moves it into the real file (potentially replacing any older copies).
It has a simple save/load/exists API.
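The temp-file-then-move pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the package's actual implementation; the helper name atomic_write_bytes is hypothetical.

```python
import os
import tempfile


def atomic_write_bytes(path, data):
    """Write data to a temp file next to path, then move it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the final move
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(data)
        # os.replace overwrites any older copy of the target file.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything failed before the move.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A reader never sees a half-written file this way: either the old copy is still there, or the complete new one is.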
Pre-defined StorageInterface classes:
JsonStorageInterface uses json.load and json.dump and automatically coerces numpy arrays to lists.
NpyStorageInterface just uses np.save and np.load directly.
NpzStorageInterface uses numpy's np.load and (hidden) _savez method. It also does some simple checks to allow saving arrays as well as lists of arrays and dictionaries of arrays. This is very fast, efficient, reliable, and gives quite a bit of flexibility.
The allow_pickle=False option means that saving object arrays will fail, so change it to True if you need that. Be aware that pickle is considered volatile, so changes to python or relevant package versions might mean your data becomes unreadable. It is super convenient though, which is why this hasn't stopped big packages like scikit-learn from relying on it.
You could also easily add a pure pickle-based storage interface yourself with pickle.load and pickle.dump, but I don't include it here for the reasons listed above :)
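As a rough illustration of how a JSON-based interface might coerce arrays to lists, here is a sketch built on the default hook of json.dumps. The helper name to_jsonable is hypothetical, and array.array stands in for a numpy array (both provide tolist()) so the sketch runs without numpy; the package's own coercion logic may differ.

```python
import json
from array import array


def to_jsonable(obj):
    """json 'default' hook: turn array-like objects into plain lists."""
    if hasattr(obj, "tolist"):
        return obj.tolist()
    raise TypeError(
        f"Object of type {type(obj).__name__} is not JSON serializable"
    )


# array.array is a stand-in for a numpy array in this sketch.
data = {"values": array("d", [1.0, 2.5]), "label": "demo"}
text = json.dumps(data, default=to_jsonable)
restored = json.loads(text)
```

Note that the round trip gives back a plain list, not an array, which is the trade-off of storing metrics as JSON.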
Cache Managers
A DatedCacheManager uses a StorageInterface and supports the following:
- Automatic caching to both memory and persistent storage
- Automatic cache utilization, falling back on memory cache and then persistent cache (override with force=False)
- Dating of all metrics using "side-car" metrics (*_date)
- Printing of all major actions (disable with verbose=False)
It has the following API:
set_functions_dict
- Set the core data for the manager, a dictionary that actually defines the metrics. Keys are metric names and values are metric functions (no arguments). By calling this after __init__, the manager itself can be used within metric functions. See "Usage" below.
exists
- Boolean, whether the metric is in the memory cache.
clear_cache
- Remove all metrics from the cache.
compute
- Call a metric function. Caches and returns the data.
save
- Compute a metric and save the result with the StorageInterface. Caches to memory and disk and returns the data.
load
- Load a metric (assumed to exist). Tries the cache, then the StorageInterface, and fails otherwise. Returns the data.
get
- Call save and then load. Caches to memory and the StorageInterface. Returns the cached data. Overloaded with [] (aka __getitem__).
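To make the lookup order and the "side-car" dating concrete, here is a toy model of the behaviour described above. It is only a sketch under stated assumptions: DictStorage, ToyCacheManager and the recompute flag are hypothetical names, the storage is a plain dict rather than files, and the real DatedCacheManager may differ in its details.

```python
import datetime


class DictStorage:
    """Hypothetical stand-in for a StorageInterface (save/load/exists)."""

    def __init__(self):
        self._files = {}

    def save(self, name, data):
        self._files[name] = data

    def load(self, name):
        return self._files[name]

    def exists(self, name):
        return name in self._files


class ToyCacheManager:
    """Toy lookup order: memory cache, then storage, then compute."""

    def __init__(self, storage):
        self.storage = storage
        self.cache = {}
        self.functions = {}

    def set_functions_dict(self, functions):
        self.functions = functions

    def save(self, name):
        # Compute the metric, then store it and its "<name>_date" side-car
        # in both the memory cache and the storage backend.
        data = self.functions[name]()
        stamp = datetime.datetime.now().isoformat()
        for key, value in ((name, data), (name + "_date", stamp)):
            self.cache[key] = value
            self.storage.save(key, value)
        return data

    def load(self, name):
        # Try memory first, then storage; raises KeyError if never saved.
        if name not in self.cache:
            self.cache[name] = self.storage.load(name)
        return self.cache[name]

    def get(self, name, recompute=False):
        # 'recompute' is an assumption of this sketch, not the real flag.
        known = name in self.cache or self.storage.exists(name)
        if recompute or not known:
            self.save(name)
        return self.load(name)

    __getitem__ = get


calls = []


def slow_metric():
    calls.append("ran")
    return 42


storage = DictStorage()
cm = ToyCacheManager(storage)
cm.set_functions_dict({"answer": slow_metric})
first = cm["answer"]   # computed and stored
cm.cache.clear()       # simulate a fresh process with an empty memory cache
second = cm["answer"]  # served from storage, not recomputed
```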
In addition, ParameterizedDatedCacheManager adds a powerful decorator called collect that collects metrics automatically.
Basic usage:

    CM = DatedCacheManager(NpyStorageInterface(SOME_DIRECTORY))

    @CM.collect
    def metric_1():
        stuff = do_something()
        return stuff

    @CM.collect
    def metric_2():
        stuff = do_something_else(CM.metric_1())  # <- use of the cached version here
        other_stuff = convert(stuff)
        return other_stuff

Now calls to CM.metric_1() and CM.metric_2() seamlessly use the cache (both in-memory and on disk). This allows for complex dependencies to be handled automatically and efficiently.
If you prefer, metric_1(cache=True) is equivalent to CM.metric_1(). Both will invoke CM['metric_1'], which in turn will invoke the "undecorated" metric_1() as it appears above when the metric does not yet exist.
Passing use_stored=False on any of these will invalidate the cache and overwrite it with the new value.
The original function (called without cache=True) will be unchanged. Example: metric_1() will intentionally skip the cache and just run like normal.
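For readers curious how a decorator like collect could produce this behaviour, the following is a toy sketch, assuming an in-memory cache only. ToyCollector is a hypothetical class written for illustration and is not the package's implementation.

```python
import functools


class ToyCollector:
    """Toy version of a manager with a 'collect' decorator (memory only)."""

    def __init__(self):
        self.cache = {}

    def collect(self, func):
        name = func.__name__

        def cached(use_stored=True):
            # Recompute when asked to, or when nothing is cached yet.
            if not use_stored or name not in self.cache:
                self.cache[name] = func()
            return self.cache[name]

        @functools.wraps(func)
        def wrapper(cache=False, use_stored=True):
            # Without cache=True the original function just runs as normal.
            if cache:
                return cached(use_stored=use_stored)
            return func()

        # Expose the cached version as an attribute, e.g. CM.metric_1().
        setattr(self, name, cached)
        return wrapper


CM = ToyCollector()
runs = []


@CM.collect
def metric_1():
    runs.append("metric_1")
    return 10


@CM.collect
def metric_2():
    return CM.metric_1() + 1  # uses the cached version of metric_1
```

Calling CM.metric_2() twice runs metric_1 only once, while a plain metric_1() call always runs the function body.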
Parameterized usage:

    CM = ParameterizedDatedCacheManager(NpyStorageInterface(SOME_DIRECTORY))

    @CM.collect(params_list=[[1], [4]])
    def poly_X(a):
        return do_something_hard(a)

    @CM.collect(params_list=[[1, 2], [1, 3], [4, 5]])
    def poly_Y(a, b):
        return CM.poly_X(a) ** 2 + do_something_else(b)

CM.poly_Y(1, 2) will run poly_Y(1, 2), which will run CM.poly_X(1), which will run poly_X(1).
Then another call to CM.poly_Y(1, 2) will load CM['poly_Y_1_2'] and return it.
A call to poly_Y(1, 3) will run poly_Y(1, 3), which will run CM.poly_X(1), which will load CM['poly_X_1'].
The point is that if poly_X or poly_Y take a long time, subsequent calls will be fast.
Pre-running everything can be done like so:

    CM.poly_X.get_all()
    CM.poly_Y.get_all()

which is handy, for instance, if you need to run a large set of computations overnight.
The above will fail if you pass undefined parameters (which is good if you want to ensure you have pre-cached everything you might invoke).
If you really want to call the functions in the cache manager with any parameters, just add this option to the code above:

    CM = ParameterizedDatedCacheManager(NpyStorageInterface(SOME_DIRECTORY),
                                        dynamic_metric_creation=True)

With this change, things like CM.poly_Y(12, 13) will also work.
Finally, to reiterate, the original functions poly_X and poly_Y will still work with any parameters regardless of this setting because they are unchanged unless you specify cache=True.
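The parameterized behaviour can be modelled with a small toy, shown here as a sketch only. The key format (function name plus parameters joined by underscores) is inferred from the 'poly_Y_1_2' and 'poly_X_1' examples above; ToyParamCollector and metric_name are hypothetical names, the cache is in-memory only, and for simplicity the decorator returns the original function without the cache=True option.

```python
def metric_name(func_name, params):
    """Build a key like 'poly_Y_1_2' from a function name and parameters."""
    return "_".join([func_name] + [str(p) for p in params])


class ToyParamCollector:
    """Toy parameterized collector (memory cache only)."""

    def __init__(self, dynamic_metric_creation=False):
        self.dynamic_metric_creation = dynamic_metric_creation
        self.cache = {}

    def collect(self, params_list):
        allowed = {tuple(p) for p in params_list}

        def decorator(func):
            def cached(*params):
                # Reject parameters that were never declared, unless
                # dynamic metric creation is switched on.
                if params not in allowed and not self.dynamic_metric_creation:
                    raise KeyError(
                        f"undefined parameters {params} for {func.__name__}"
                    )
                key = metric_name(func.__name__, params)
                if key not in self.cache:
                    self.cache[key] = func(*params)
                return self.cache[key]

            # Pre-run every declared parameter set, in declaration order.
            cached.get_all = lambda: [cached(*p) for p in params_list]
            setattr(self, func.__name__, cached)
            return func  # simplification: original returned unchanged

        return decorator


CM = ToyParamCollector()


@CM.collect(params_list=[[1], [4]])
def poly_X(a):
    return a + 1


@CM.collect(params_list=[[1, 2], [1, 3], [4, 5]])
def poly_Y(a, b):
    return CM.poly_X(a) ** 2 + b


results = CM.poly_Y.get_all()
```

After get_all(), every declared combination sits in the cache under its generated name, and undeclared combinations raise unless dynamic_metric_creation is enabled.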
Manual usage of DatedCacheManager:

    METRIC_1 = 'metric_1'
    METRIC_2 = 'metric_2'

    CM = DatedCacheManager(NpyStorageInterface(SOME_DIRECTORY))

    def metric_1_function():
        stuff = do_something()
        return stuff

    def metric_2_function():
        stuff = do_something_else(CM[METRIC_1])
        other_stuff = convert(stuff)
        return other_stuff

    FUNCTIONS_DICT = {
        METRIC_1: metric_1_function,
        METRIC_2: metric_2_function,
        # ...
    }

    CM.set_functions_dict(FUNCTIONS_DICT)

Then in some other place invoke CM[METRIC_1] or CM[METRIC_2].
This has the same effect as the "Basic usage" example above but with a lot more boilerplate. You can still use this if you despise decorator magic though ;)