Memory mapping of datasets
Memmpy
Memmpy is a Python library for working with memory mapped numpy arrays. It supports
- Saving to temporary or permanent memory mapped files
- Appending and extending arrays in constant time
- Loading arrays in batches
- Train-validation-test splits and k-fold cross validation
- Fast & lazy shuffling by shuffling in blocks and bands
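To illustrate the idea behind shuffling in blocks, here is a minimal sketch using only the standard library. It is not Memmpy's implementation: the function name `block_shuffled_indices` and its parameters are hypothetical, and the library's own algorithm may differ. The sketch shuffles the order of fixed-size blocks and then the order inside each block, so reads from a memory mapped file stay local while the overall order is still randomized.

```python
import random
from typing import Iterator


def block_shuffled_indices(size: int, block_size: int, seed: int = 0) -> Iterator[int]:
    """Lazily yield every index in range(size) exactly once, shuffled block by block.

    Hypothetical helper for illustration only; not part of Memmpy.
    """
    rng = random.Random(seed)
    # Only the block start offsets are materialized up front.
    starts = list(range(0, size, block_size))
    rng.shuffle(starts)
    for start in starts:
        # The last block may be shorter than block_size.
        block = list(range(start, min(start + block_size, size)))
        rng.shuffle(block)
        yield from block


# Usage: indices come out in a randomized order, one block at a time.
order = list(block_shuffled_indices(10, block_size=4, seed=1))
```

The trade-off is that samples from the same block always appear close together, so the result is less random than a full permutation but needs far less memory and far fewer scattered reads.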
Who should use Memmpy?
Memmpy is primarily intended for data science and machine learning applications where the dataset is too large to fit into memory at once. It is not intended for small applications where the entire dataset fits into memory and can be loaded at once, nor for very large applications where training is massively distributed.
Installation
Memmpy can be installed directly from PyPI using pip. It requires Python 3.10 or higher.
pip install memmpy
Usage
Writing to a memory mapped file.
import numpy as np
import memmpy

# Create a memory mapped array
file = memmpy.Vector()
for i in range(4):
    file.append(np.random.rand(100))  # O(1)
file.extend(np.random.rand(32, 100))
# Access the array
assert file.array.shape == (4 + 32, 100)
# Save to a non-temporary file
file.save("data.npy")
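For intuition about how constant-time appends to a memory mapped file can work, the following sketch grows a plain `np.memmap` by doubling its capacity, which makes appends amortized O(1). It is an assumption about the general technique, not Memmpy's actual implementation; the class `GrowableMemmap` and all of its attributes are hypothetical names.

```python
import os
import tempfile

import numpy as np


class GrowableMemmap:
    """Hypothetical append-only 2-D array backed by a temporary memory mapped file."""

    def __init__(self, row_width, dtype=np.float64, capacity=16):
        self.row_width = row_width
        self.dtype = np.dtype(dtype)
        self.length = 0  # number of rows actually written
        self.capacity = capacity  # number of rows the file can hold
        fd, self.path = tempfile.mkstemp(suffix=".bin")
        os.close(fd)
        self._map = np.memmap(
            self.path, dtype=self.dtype, mode="w+", shape=(capacity, row_width)
        )

    def _grow(self, min_capacity):
        # Double the capacity until the requested number of rows fits.
        new_capacity = self.capacity
        while new_capacity < min_capacity:
            new_capacity *= 2
        self._map.flush()
        del self._map  # release the old mapping before resizing the file
        with open(self.path, "r+b") as f:
            f.truncate(new_capacity * self.row_width * self.dtype.itemsize)
        self._map = np.memmap(
            self.path, dtype=self.dtype, mode="r+", shape=(new_capacity, self.row_width)
        )
        self.capacity = new_capacity

    def extend(self, rows):
        rows = np.atleast_2d(rows)
        end = self.length + len(rows)
        if end > self.capacity:
            self._grow(end)
        self._map[self.length:end] = rows
        self.length = end

    def append(self, row):
        self.extend(np.asarray(row)[None, :])

    @property
    def array(self):
        # View of the rows written so far; views taken before a later
        # growth keep referring to the old mapping.
        return self._map[: self.length]
```

Because the file is only resized when the capacity is exhausted, most appends are a single slice assignment into the existing mapping.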
Loading random batches from a memory mapped file.
import numpy as np
import memmpy

array = np.memmap("data.npy", dtype=np.float64, mode='r', shape=(36, 100))
# Load the array in batches
batch_indicies = memmpy.batch_indicies_split(
    array.shape[0],
    4,
    "train",
    valid_part=10,  # size of the validation and train set
    kfold_index=2,  # take the second out of 10 folds
)
shuffle = memmpy.shuffle_fast(array.shape[0], seed=42)  # O(1)
for indicies in batch_indicies:
    indicies = shuffle(indicies)
    batch = array[indicies]
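To make the combination of k-fold splitting and batching more concrete, here is a standard-library sketch of how batch indices for one split could be computed. It does not reproduce `memmpy.batch_indicies_split`: the function `kfold_batches`, its parameters, and the choice of contiguous folds are assumptions made for illustration.

```python
from typing import Iterator, List


def kfold_batches(
    n_samples: int, batch_size: int, split: str, n_folds: int, fold: int
) -> Iterator[List[int]]:
    """Yield batches of sample indices for the "train" or "valid" split.

    Hypothetical helper: the data is cut into n_folds contiguous folds,
    fold number `fold` (0-based) is the validation set, the rest is training.
    """
    if split not in ("train", "valid"):
        raise ValueError(f"unknown split: {split!r}")
    if not 0 <= fold < n_folds:
        raise ValueError("fold must be in range(n_folds)")
    # Boundaries of the validation fold.
    lo = fold * n_samples // n_folds
    hi = (fold + 1) * n_samples // n_folds
    if split == "valid":
        selected = list(range(lo, hi))
    else:
        selected = list(range(0, lo)) + list(range(hi, n_samples))
    # Cut the selected indices into batches; the last one may be shorter.
    for start in range(0, len(selected), batch_size):
        yield selected[start:start + batch_size]


# Usage: iterate over training batches for the third of ten folds.
for batch_indices in kfold_batches(36, 4, "train", n_folds=10, fold=2):
    pass  # e.g. batch = array[batch_indices] with a memory mapped array
```

Each batch is a short list of indices, so only the rows of the current batch need to be read from the memory mapped file.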
File details
Details for the file memmpy-0.1.10.tar.gz.
File metadata
- Download URL: memmpy-0.1.10.tar.gz
- Upload date:
- Size: 4.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 650e0288e8107e53315d29cb65746ce59368c690aa74859614c381a42cc5c834
MD5 | 458b7e850e7185c206c25ffc1a741e37
BLAKE2b-256 | 5dd61784a402c2ee6f63ae5aa92955bfcf7c788207215f39a331c9b3c3e38fd7
File details
Details for the file memmpy-0.1.10-py3-none-any.whl.
File metadata
- Download URL: memmpy-0.1.10-py3-none-any.whl
- Upload date:
- Size: 15.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7f4abf19d5811c0f388c6c0575e7b8b5a6c74a1ea3fc44def36beec1b4c47db0
MD5 | f37bba0815b95940714a71d774636b25
BLAKE2b-256 | 92f83a0ad01a6a90de597645c4535a54982cff0165811f6321d1582078f176f7