Memory-mapped datasets

Memmpy

Memmpy is a Python library for working with memory-mapped NumPy arrays. It supports:

  • Saving to temporary or permanent memory mapped files
  • Appending and extending arrays in constant time
  • Loading arrays in batches
  • Train-validation-test-splits and k-fold cross validation
  • Fast & lazy shuffling by shuffling in blocks and bands
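
For readers new to memory mapping, the following is a minimal sketch of the underlying idea using only NumPy's `np.memmap`, not Memmpy itself. The file name `example.dat` and the array shape are assumptions chosen for illustration. The array is backed by a file on disk, and only the rows that are actually indexed get copied into memory.

```python
import os
import tempfile

import numpy as np

# Hypothetical file name inside a throwaway directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.dat")

# Create a file-backed array; the data lives on disk, not in RAM.
writer = np.memmap(path, dtype=np.float64, mode="w+", shape=(1000, 100))
writer[:] = np.arange(1000, dtype=np.float64)[:, None]  # row i is filled with i
writer.flush()  # push pending changes to disk
del writer

# Reopen read-only and load a small batch of rows.
reader = np.memmap(path, dtype=np.float64, mode="r", shape=(1000, 100))
batch = np.asarray(reader[10:14])  # copies only these 4 rows into memory
```

Note that a raw memmap file stores no shape or dtype, so both must be supplied again when reopening it.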

Who should use Memmpy?

Memmpy is primarily intended for data science and machine learning applications where the dataset is too large to fit into memory at once. It is not intended for small applications where the entire dataset fits into memory and can be loaded at once, nor for very large applications where training is massively distributed.

Installation

Memmpy can be installed directly from PyPI using pip. It requires Python 3.10 or higher.

pip install memmpy

Usage

Writing to a memory mapped file.

import numpy as np
import memmpy

# Create a memory mapped array
file = memmpy.Vector()

for i in range(4):
    file.append(np.random.rand(100))  # O(1)


file.extend(np.random.rand(32, 100))

# access the array
assert file.array.shape == (4 + 32, 100)

# save to non-temporary file
file.save("data.npy")
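
One common way to get amortized constant-time appends on a file-backed array is to over-allocate and double the capacity whenever it runs out. The sketch below illustrates that general technique with plain NumPy; it is not taken from Memmpy's source, and the class `GrowableMemmap` and all of its attributes are hypothetical names.

```python
import os
import tempfile

import numpy as np


class GrowableMemmap:
    """Hypothetical helper: file-backed 2D array with amortized O(1) append."""

    def __init__(self, path, row_len, dtype=np.float64, capacity=4):
        self.path = path
        self.row_len = row_len
        self.dtype = np.dtype(dtype)
        self.size = 0  # number of rows actually written
        self.capacity = capacity  # number of rows allocated on disk
        self._data = np.memmap(
            path, dtype=self.dtype, mode="w+", shape=(capacity, row_len)
        )

    def _grow(self, min_capacity):
        # Double the capacity until the request fits, so resizes stay rare.
        new_capacity = self.capacity
        while new_capacity < min_capacity:
            new_capacity *= 2
        self._data.flush()
        del self._data  # release the mapping before resizing the file
        with open(self.path, "r+b") as f:
            f.truncate(new_capacity * self.row_len * self.dtype.itemsize)
        self._data = np.memmap(
            self.path, dtype=self.dtype, mode="r+",
            shape=(new_capacity, self.row_len),
        )
        self.capacity = new_capacity

    def extend(self, rows):
        rows = np.atleast_2d(rows)
        end = self.size + len(rows)
        if end > self.capacity:
            self._grow(end)
        self._data[self.size:end] = rows
        self.size = end

    def append(self, row):
        self.extend(row)

    @property
    def array(self):
        # View of the rows written so far; spare capacity stays hidden.
        return self._data[:self.size]


path = os.path.join(tempfile.mkdtemp(), "growable.dat")
vec = GrowableMemmap(path, row_len=100)
first_row = np.random.rand(100)
vec.append(first_row)
for _ in range(3):
    vec.append(np.random.rand(100))
vec.extend(np.random.rand(32, 100))
```

Because the capacity doubles, the number of file resizes grows only logarithmically with the number of rows, at the cost of some unused space at the end of the file.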

Loading random batches from a memory mapped file.

array = np.memmap("data.npy", dtype=np.float64, mode='r', shape=(36, 100))

# Load the array in batches
batch_indicies = memmpy.batch_indicies_split(
    array.shape[0],
    4,
    "train",
    valid_part=10,  # size of the validation and train set
    kfold_index=2,  # take the second out of 10 folds
)
shuffle = memmpy.shuffle_fast(array.shape[0], seed=42)  # O(1)

for indicies in batch_indicies:
    indicies = shuffle(indicies)
    batch = array[indicies]
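
To illustrate why shuffling in blocks suits memory-mapped data, here is a rough sketch of one possible block-shuffling scheme in plain NumPy. It is an assumption about the general idea, not Memmpy's actual algorithm, and the function `block_shuffled_batches` and its parameters are hypothetical. Rows are grouped into contiguous blocks, the block order is permuted, and rows are permuted within each block, so every batch reads from a small region of the file.

```python
import numpy as np


def block_shuffled_batches(n_rows, batch_size, block_size, seed=0):
    """Yield index batches that cover range(n_rows) exactly once.

    Hypothetical helper: the block order is permuted, and rows are
    permuted inside each block.
    """
    rng = np.random.default_rng(seed)
    starts = np.arange(0, n_rows, block_size)
    rng.shuffle(starts)  # visit blocks in random order
    order = np.concatenate([
        rng.permutation(np.arange(s, min(s + block_size, n_rows)))
        for s in starts
    ])
    for i in range(0, n_rows, batch_size):
        # Sorting within a batch keeps reads from the file mostly sequential.
        yield np.sort(order[i:i + batch_size])


# In-memory stand-in for a memory mapped array of shape (36, 100).
array = np.arange(36 * 100, dtype=np.float64).reshape(36, 100)
batches = list(
    block_shuffled_batches(array.shape[0], batch_size=4, block_size=12, seed=42)
)
seen = np.concatenate(batches)
```

This version builds the full index array up front, which is small compared to the data itself. The result is less random than a full permutation, since rows from the same block tend to land in the same or neighbouring batches.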
