Simple on-disk dictionary
A dictionary that spills to disk.
Chest acts like a dictionary, but it can write its contents to disk. This is useful in two cases:
- Chest can hold datasets that are larger than memory
- Chest persists and so can be saved and loaded for later use
How it works
Chest stores data in two locations
- An in-memory dictionary
- On the filesystem in a directory owned by the chest
As a user adds contents to the chest, the in-memory dictionary fills up. When a chest stores more data in memory than desired (see the available_memory= keyword argument), it writes the larger contents of the chest to disk as pickle files (the choice of pickle is configurable). When a user asks for a value, chest checks the in-memory store, then checks on disk and loads the value into memory if necessary, pushing other values to disk.
Chest is a simple project. It was intended to provide a simple interface to assist in the storage and retrieval of numpy arrays. However, its design and implementation are agnostic to this case, so it could be used in a variety of other situations.
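The spilling behaviour described above can be illustrated with a toy class. This is a minimal sketch, not chest's actual implementation: the name `SpillDict`, its attributes, and the use of `sys.getsizeof` as the size estimate are all assumptions made for illustration.

```python
import os
import pickle
import sys
import tempfile


class SpillDict:
    """Toy dictionary that spills its largest values to disk (hypothetical)."""

    def __init__(self, path=None, available_memory=1000):
        self.path = path or tempfile.mkdtemp()
        self.available_memory = available_memory  # rough budget in bytes
        self.inmem = {}       # key -> value currently held in memory
        self.on_disk = set()  # keys whose values live only on disk

    def _filename(self, key):
        return os.path.join(self.path, str(key))

    def _memory_usage(self):
        # Shallow size estimate; an assumption, not how chest measures size
        return sum(sys.getsizeof(v) for v in self.inmem.values())

    def _shrink(self, keep=None):
        # Write the largest in-memory values to disk until under budget,
        # never evicting the key named by `keep`
        while self._memory_usage() > self.available_memory:
            candidates = [k for k in self.inmem if k != keep]
            if not candidates:
                break
            key = max(candidates, key=lambda k: sys.getsizeof(self.inmem[k]))
            with open(self._filename(key), 'wb') as f:
                pickle.dump(self.inmem.pop(key), f)
            self.on_disk.add(key)

    def __setitem__(self, key, value):
        self.on_disk.discard(key)
        self.inmem[key] = value
        self._shrink()

    def __getitem__(self, key):
        if key in self.inmem:
            return self.inmem[key]
        if key not in self.on_disk:
            raise KeyError(key)
        # Load from disk, then push other values out if over budget
        with open(self._filename(key), 'rb') as f:
            value = pickle.load(f)
        os.remove(self._filename(key))
        self.on_disk.discard(key)
        self.inmem[key] = value
        self._shrink(keep=key)
        return value


d = SpillDict(available_memory=2500)
d['a'] = bytes(1000)
d['b'] = bytes(2000)  # over budget, so the larger value is written to disk
value = d['b']        # loaded back into memory; 'a' is pushed to disk instead
```

The sketch evicts the largest values first, mirroring the "writes the larger contents" policy in the text; a real implementation would also need to persist the set of keys so the store can be reopened later.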
With minimal work chest could be extended to serve as a communication point between multiple processes.
Chest was designed to hold a moderate number of largish numpy arrays. It doesn't handle the use case of very many small key-value pairs (though it could with a small effort). In particular, chest has the following deficiencies:
- Chest is not multi-process safe. We should institute a file lock at least around the .keys file.
- Chest does not support mutation of variables on disk.
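The file lock suggested in the first item could look something like the sketch below. It is not part of chest: the helper `file_lock`, its parameters, and the lock file name `.keys.lock` are hypothetical, and the approach (an exclusively created lock file) is only one of several options. It targets Python 3 only.

```python
import contextlib
import os
import tempfile
import time


@contextlib.contextmanager
def file_lock(lock_path, timeout=10.0, poll=0.05):
    """Hold an exclusive lock by creating lock_path (hypothetical helper)."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError('could not acquire %s' % lock_path)
            time.sleep(poll)
    try:
        yield
    finally:
        # Release the lock so other processes can proceed
        os.close(fd)
        os.remove(lock_path)


# Usage: guard reads and writes of a chest directory's .keys file
directory = tempfile.mkdtemp()
with file_lock(os.path.join(directory, '.keys.lock')):
    with open(os.path.join(directory, '.keys'), 'wb') as f:
        f.write(b'')  # placeholder for the real key index
```

A process that crashes while holding the lock leaves the lock file behind, so a production version would need some form of stale-lock detection.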
New BSD. See License
chest is available through conda:
conda install chest
chest is on the Python Package Index (PyPI):
pip install chest
>>> from chest import Chest
>>> c = Chest()

>>> # Acts like a normal dictionary
>>> c['x'] = [1, 2, 3]
>>> c['x']
[1, 2, 3]

>>> # Data persists to local files
>>> c.flush()
>>> import os
>>> os.listdir(c.path)
['.keys', 'x']

>>> # These files hold pickled results (open in binary mode for pickle)
>>> import pickle
>>> pickle.load(open(c.key_to_filename('x'), 'rb'))
[1, 2, 3]

>>> # Though one normally accesses these files with chest itself
>>> c2 = Chest(path=c.path)
>>> c2.keys()
['x']
>>> c2['x']
[1, 2, 3]

>>> # Chest is configurable, so one can use json, etc. instead of pickle
>>> import json
>>> c = Chest(path='my-chest', dump=json.dump, load=json.load)
>>> c['x'] = [1, 2, 3]
>>> c.flush()
>>> json.load(open(c.key_to_filename('x')))
[1, 2, 3]
Chest supports Python 2.6+ and Python 3.2+ with a common codebase.
It currently depends on the heapdict library, which is a lightweight dependency.