Memory mapping of datasets

Memmpy

Memmpy is a Python library for working with memory mapped numpy arrays. It supports

  • Saving to temporary or permanent memory mapped files
  • Appending and extending arrays in constant time
  • Loading arrays in batches
  • Train-validation-test-splits and k-fold cross validation
  • Fast & lazy shuffling by shuffling in blocks and bands

Who should use Memmpy?

Memmpy is primarily intended for data science and machine learning applications where the dataset is too large to fit into memory at once. It is not intended for small applications where the entire dataset fits into memory and can be loaded at once, nor for very large applications where training is massively distributed.

Installation

Memmpy can be installed directly from PyPI using pip. It requires Python 3.10 or higher.

pip install memmpy

Usage

Writing to a memory mapped file.

import numpy as np
import memmpy

# Create a memory mapped array
file = memmpy.Vector()

for i in range(4):
    file.append(np.random.rand(100))  # O(1)


file.extend(np.random.rand(32, 100))

# access the array
assert file.array.shape == (4 + 32, 100)

# save to non-temporary file
file.save("data.npy")
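
For readers curious how constant-time appends to a file-backed array can work, here is a minimal sketch that uses only NumPy. It is not Memmpy's implementation: the class name `GrowableMemmap`, its parameters, and the capacity-doubling strategy are assumptions made for illustration. The backing file is doubled in size whenever it fills up, so appends are O(1) amortized.

```python
import os
import tempfile

import numpy as np


class GrowableMemmap:
    """Hypothetical append-only array backed by a raw binary file.

    Illustration only; this is not part of Memmpy's API.
    """

    def __init__(self, path, row_shape, dtype=np.float64, capacity=16):
        self.path = path
        self.row_shape = tuple(row_shape)
        self.dtype = np.dtype(dtype)
        self.capacity = capacity  # number of rows the file can hold
        self.length = 0           # number of rows actually written
        self._data = np.memmap(
            path, dtype=self.dtype, mode="w+",
            shape=(capacity, *self.row_shape),
        )

    def _grow(self, min_capacity):
        # Double the capacity until the requested number of rows fits.
        new_capacity = self.capacity
        while new_capacity < min_capacity:
            new_capacity *= 2
        self._data.flush()
        del self._data  # release the old mapping before resizing the file
        row_bytes = int(np.prod(self.row_shape)) * self.dtype.itemsize
        with open(self.path, "r+b") as f:
            f.truncate(new_capacity * row_bytes)  # existing bytes are kept
        self._data = np.memmap(
            self.path, dtype=self.dtype, mode="r+",
            shape=(new_capacity, *self.row_shape),
        )
        self.capacity = new_capacity

    def extend(self, rows):
        rows = np.asarray(rows, dtype=self.dtype)
        end = self.length + rows.shape[0]
        if end > self.capacity:
            self._grow(end)
        self._data[self.length:end] = rows
        self.length = end

    def append(self, row):
        # A single row is an extend with a leading axis of length one.
        self.extend(np.asarray(row, dtype=self.dtype)[None])

    @property
    def array(self):
        # View of the rows written so far, without the unused capacity.
        return self._data[: self.length]
```

Used the same way as the example above, `append` and `extend` grow the file only occasionally, and `array` exposes just the rows written so far.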

Loading random batches from a memory mapped file.

array = np.memmap("data.npy", dtype=np.float64, mode='r', shape=(36, 100))

# Load the array in batches
batch_indicies = memmpy.batch_indicies_split(
    array.shape[0],
    4,
    "train",
    valid_part=10,  # size of the validation and train set
    kfold_index=2,  # take the second out of 10 folds
)
shuffle = memmpy.shuffle_fast(array.shape[0], seed=42)  # O(1)

for indicies in batch_indicies:
    indicies = shuffle(indicies)
    batch = array[indicies]
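
The idea behind shuffling in blocks can be sketched with plain NumPy. The version below is a simplified, eager illustration and not Memmpy's `shuffle_fast`: the function names `block_shuffle_indices` and `iter_batches` and the `block_size` parameter are assumptions. It permutes the order of contiguous blocks and then the rows inside each block, so each batch draws from only a few nearby regions of the file.

```python
import numpy as np


def block_shuffle_indices(n, block_size, seed=0):
    """Permutation of range(n): shuffle the blocks, then shuffle inside each block."""
    rng = np.random.default_rng(seed)
    starts = np.arange(0, n, block_size)
    rng.shuffle(starts)  # random order of the blocks
    parts = []
    for start in starts:
        block = np.arange(start, min(start + block_size, n))
        rng.shuffle(block)  # random order within the block
        parts.append(block)
    if not parts:
        return np.empty(0, dtype=np.int64)
    return np.concatenate(parts)


def iter_batches(array, batch_size, block_size, seed=0):
    """Yield batches of rows from `array` in block-shuffled order."""
    order = block_shuffle_indices(array.shape[0], block_size, seed)
    for i in range(0, len(order), batch_size):
        # Sorting the indices of one batch keeps reads closer to sequential.
        idx = np.sort(order[i : i + batch_size])
        yield array[idx]
```

Because `iter_batches` only indexes its argument, it accepts an `np.memmap` such as `array` above as well as an ordinary in-memory array. The trade-off is that rows from the same block tend to land in the same or neighbouring batches, so the result is less random than a full shuffle.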
