Memory mapping of datasets
Memmpy
Memmpy is a Python library for storing datasets in, and loading datasets from, memory mapped files. This is particularly useful for large datasets that do not fit in memory and therefore need to be processed in batches. Memmpy is based on the numpy.memmap implementation.
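To illustrate the underlying mechanism, here is a minimal sketch that uses numpy.memmap directly rather than Memmpy itself. The file name, array shape, and batch size are assumptions chosen for the example. It writes an array to disk, reopens it read-only, and processes it in batches so that only one slice needs to be resident in memory at a time.

```python
import os
import tempfile

import numpy as np

# Hypothetical file location and sizes, chosen only for illustration.
path = os.path.join(tempfile.mkdtemp(), "example.dat")
shape = (1000, 3)

# Create a file-backed array and fill it with the integers 0..2999.
writer = np.memmap(path, dtype=np.int64, mode="w+", shape=shape)
writer[:] = np.arange(shape[0] * shape[1], dtype=np.int64).reshape(shape)
writer.flush()  # make sure the data reaches the file
del writer

# Reopen read-only. Data is paged in from disk only when it is sliced.
reader = np.memmap(path, dtype=np.int64, mode="r", shape=shape)

# Process the file batch by batch instead of loading it whole.
batch_size = 256
batch_sums = [
    int(reader[start:start + batch_size].sum())
    for start in range(0, shape[0], batch_size)
]
```

Note that a raw numpy.memmap file stores no shape or dtype, so both must be supplied again when the file is reopened.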
Who should use Memmpy?
Memmpy is primarily intended for use in medium to large scale machine learning applications in high energy particle physics, where the whole dataset would not fit into memory at once and iterating over the ROOT files is too slow. This could be because shuffling of datapoints is desired, or because only a fraction of the information or events is needed for training.
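The access pattern described above (shuffled events, with only a fraction of the information read) can be sketched with plain numpy.memmap. This is not Memmpy's own API; the dataset size, feature columns, and batch size below are hypothetical.

```python
import os
import tempfile

import numpy as np

# Hypothetical dataset: 1,000 events with 8 features each.
n_events, n_features = 1000, 8
path = os.path.join(tempfile.mkdtemp(), "events.dat")

data = np.memmap(path, dtype=np.float32, mode="w+", shape=(n_events, n_features))
data[:] = np.arange(n_events, dtype=np.float32)[:, None]  # row i holds the value i
data.flush()
del data

events = np.memmap(path, dtype=np.float32, mode="r", shape=(n_events, n_features))

rng = np.random.default_rng(seed=0)
order = rng.permutation(n_events)  # shuffled event order for one epoch
keep_features = [0, 3]             # only a fraction of the columns is needed
batch_size = 128

seen = []
for start in range(0, n_events, batch_size):
    # Sorting the indices within a batch keeps disk reads roughly sequential.
    idx = np.sort(order[start:start + batch_size])
    # Fancy indexing copies only the selected rows into memory.
    batch = np.asarray(events[idx][:, keep_features])
    seen.append(batch[:, 0].astype(np.int64))

seen = np.concatenate(seen)
```

Each batch draws events from random positions in the file, yet every event is visited exactly once per epoch and the full dataset is never loaded at once.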
Memmpy is not intended for use in small applications where the entire dataset fits into memory and can be loaded at once. It is also not intended for use in very large applications where training is massively distributed.
Installation
Memmpy can be installed directly from PyPI using pip. It requires Python 3.10 or higher.
If you want to process .root files, uproot is required. This can also be installed using pip.
pip install memmpy
Usage
A simple memory mapped file can be created as follows:
import numpy as np
# Assumption: both names are exported at the package top level.
from memmpy import WriteVector, read_vector

with WriteVector(path="data.mmpy", name="testdata") as memfile:
    # Append a single numpy array.
    # The shape and dtype will be inferred from the array.
    memfile.append(np.array([1, 2, 3]))
    # Append another numpy array of the same shape and dtype.
    memfile.append(np.array([4, 5, 6]))
    # Extend the file by an array with an additional axis.
    memfile.extend(np.array([[7, 8, 9], [10, 11, 12]]))

# Load the stored data back as a memory mapped array.
memmap_data = read_vector(path="data.mmpy", name="testdata")
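The difference between append and extend in the example can be illustrated with a plain Python list. This is only an analogy. It assumes that Memmpy follows the same convention, with append adding one entry and extend adding one entry per element along the leading axis, which the comments in the example suggest but which is not confirmed here.

```python
import numpy as np

rows = []

# append adds the array as a single entry of shape (3,).
rows.append(np.array([1, 2, 3]))
rows.append(np.array([4, 5, 6]))

# extend iterates over the leading axis and adds two entries of shape (3,).
rows.extend(np.array([[7, 8, 9], [10, 11, 12]]))

# Stacking the entries gives the array one would expect to read back
# under the assumption stated above.
expected = np.stack(rows)
```

Under that assumption, the stored data would hold four entries of three values each.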