Persist expensive operations on disk.
Installation
pip install .
By default, a folder called .persist_to_disk is created under your home directory and used to store cache files. To change this location, see "Global Settings" below.
Global Settings
To set global settings (for example, where the cache should go by default), please do the following:
import persist_to_disk as ptd
ptd.config.generate_config()
Then edit the settings in the generated configuration:
- persist_path: where to store the cache. Every project on this machine gets a folder under persist_path by default, unless you specify a different path within the project (see examples below).
- hashsize: how many hash buckets to use to store each function's outputs. Default: 500.
- lock_granularity: how granular the lock is. This can be call, func, or global.
  - call: each hash bucket has its own lock, so only processes reading from or writing to the same hash bucket share a lock.
  - func: each function has one lock, so all processes calling the same function use the same lock.
  - global: all processes share the same lock (I tested that nested locking works on Unix).
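To make hash buckets and lock granularity more concrete, here is a minimal sketch of how a call could be mapped to a bucket and then to a lock. It is an illustration under stated assumptions, not persist_to_disk's actual code: the helpers bucket_of and lock_key are hypothetical, the bucket is assumed to come from a SHA-256 hash of the pickled arguments, and the locks are in-process threading.Lock objects rather than the cross-process locks a shared disk cache would need.

```python
import hashlib
import pickle
import threading
from collections import defaultdict

# Hypothetical sketch: NOT persist_to_disk's implementation.
HASHSIZE = 500  # mirrors the documented default for hashsize


def bucket_of(args: tuple, kwargs: dict, hashsize: int = HASHSIZE) -> int:
    """Map a call's arguments to one of `hashsize` buckets."""
    # Assumption: arguments are picklable. Sorting kwargs makes the key
    # independent of keyword order. Pickle bytes may differ across Python
    # versions, so this key is only stable within one environment.
    payload = pickle.dumps((args, sorted(kwargs.items())))
    digest = hashlib.sha256(payload).hexdigest()
    return int(digest, 16) % hashsize


def lock_key(granularity: str, func_name: str, bucket: int) -> tuple:
    """Pick which lock a call takes, depending on the granularity."""
    if granularity == "call":
        return (func_name, bucket)  # one lock per hash bucket
    if granularity == "func":
        return (func_name,)         # one lock per function
    if granularity == "global":
        return ()                   # a single lock for everything
    raise ValueError(f"unknown lock_granularity: {granularity!r}")


# In-process stand-in for the locks; a real multi-process cache would use file locks.
_locks = defaultdict(threading.Lock)

b = bucket_of(("mnist", "cnn"), {"lr": 0.1, "epochs": 10})
with _locks[lock_key("call", "train_a_model", b)]:
    pass  # read or write the bucket's cache file here
```

With call granularity, two calls only contend when they land in the same bucket; func and global trade that concurrency for simpler, coarser locking.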
Example
Using persist_to_disk is easy:
import persist_to_disk as ptd

@ptd.persistf()
def train_a_model(dataset, model_cls, lr, epochs):
    ...  # the expensive work goes here
    return trained_model_or_key
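Conceptually, a decorator like ptd.persistf saves a function's result to disk and reloads it on later calls with the same arguments. The toy decorator below (toy_persist, a hypothetical name) sketches only that idea with pickle files in a temporary folder; it leaves out the hash buckets, locks and settings described above and is not how the library is implemented.

```python
import functools
import hashlib
import os
import pickle
import tempfile

# Hypothetical toy decorator illustrating disk persistence; it is NOT
# ptd.persistf and has no buckets, locks or configuration.
CACHE_DIR = tempfile.mkdtemp(prefix="toy_persist_")


def toy_persist(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Key the cache file on the function's name plus its arguments.
        key = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
        path = os.path.join(CACHE_DIR, hashlib.sha256(key).hexdigest() + ".pkl")
        if os.path.exists(path):
            with open(path, "rb") as f:
                return pickle.load(f)  # cache hit: skip the expensive call
        result = func(*args, **kwargs)
        with open(path, "wb") as f:
            pickle.dump(result, f)     # cache miss: compute, then persist
        return result
    return wrapper


calls = []


@toy_persist
def slow_square(x):
    calls.append(x)  # records each real execution
    return x * x


print(slow_square(4), slow_square(4), calls)  # prints: 16 16 [4]
```

The second call never runs the function body; its result is read back from the pickle file, which is why calls only records one execution.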
ptd.persistf can be used directly with multiprocessing.
If the target function (e.g. train_a_model) will not be pickled by such pipelines, you can use persist instead:
@ptd.persist()
def _train_a_model(dataset, model_cls, lr, epochs):
    ...
    return trained_model_or_key

def train_a_model(*args, **kwargs):
    trained_model_or_key = _train_a_model(*args, **kwargs)
    ...  # Do more stuff
    return trained_model_or_key
persist and persistf take the same arguments.
For example, to group the cache folders by dataset (so they are easier to manage manually) when your function also takes a dictionary as input (which is not hashable), you could do:
@ptd.persistf(groupby=['dataset'], expand_dict_kwargs=['model_kwargs'])
def train_a_model(dataset, model_cls, model_kwargs, lr, epochs):
    ...
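One plausible reading of groupby and expand_dict_kwargs is sketched below: the dictionary argument is turned into a sorted, hashable tuple, and each groupby argument becomes a sub-folder of the cache. The helpers expand_dict and cache_subdir are hypothetical and the "name=value" folder naming is an assumption; the library's actual layout may differ.

```python
import os

# Hypothetical sketch of what groupby / expand_dict_kwargs could do; these
# helper names are illustrative and not part of persist_to_disk's API.


def expand_dict(d: dict) -> tuple:
    """Turn a (possibly nested) dict into a hashable, order-independent tuple."""
    return tuple(
        sorted((k, expand_dict(v) if isinstance(v, dict) else v) for k, v in d.items())
    )


def cache_subdir(root: str, func_name: str, groupby: list, bound: dict) -> str:
    """Place the cache under one sub-folder per groupby argument value."""
    parts = [f"{name}={bound[name]}" for name in groupby]
    return os.path.join(root, func_name, *parts)


call = {
    "dataset": "cifar10",
    "model_cls": "resnet",
    "model_kwargs": {"depth": 18, "width": 64},  # a dict: not hashable as-is
    "lr": 0.1,
    "epochs": 10,
}
hashable = dict(call, model_kwargs=expand_dict(call["model_kwargs"]))
key = tuple(sorted(hashable.items()))
key_hash = hash(key)  # works now; hashing the raw dict would raise TypeError
print(cache_subdir("/cache", "train_a_model", ["dataset"], call))
```

Sorting the items means two dictionaries with the same contents but different insertion order produce the same key.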
Project-specific persist_path
You can set where the cache is saved on the fly:
import persist_to_disk as ptd
ptd.config.set_persist_path(YOUR_PATH)
The hash size can be set the same way, via set_hashsize.
Project-level settings override the global settings, and function-level settings (e.g. hashsize) in turn override project-level settings.
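The precedence rules can be pictured as a lookup chain. The sketch below uses Python's collections.ChainMap with made-up settings dictionaries; it only illustrates the documented order (function-level, then project-level, then global), not how persist_to_disk actually stores or resolves its configuration.

```python
from collections import ChainMap

# Hypothetical illustration of the documented precedence; all values are made up.
global_settings = {"persist_path": "~/.persist_to_disk", "hashsize": 500}
project_settings = {"persist_path": "/data/my_project_cache"}
function_settings = {"hashsize": 50}

# ChainMap searches its maps left to right, so earlier maps override later ones.
effective = ChainMap(function_settings, project_settings, global_settings)

print(effective["hashsize"], effective["persist_path"])  # prints: 50 /data/my_project_cache
```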
Built Distribution
Hashes for persist_to_disk-0.0.2-py2.py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | 510638139c97b935997604d0618c70d2f8e1b036b7d9fbc1e824ed263a8c587f
MD5 | 37e2dded1ecfe555358feefde554f7a4
BLAKE2b-256 | 0a2b1320fe70bdd4c30967d7e07504e64d0b59900f4db8fb8dd0f47438c1e0f1