Memory mapping of datasets

Project description

Memmpy

Memmpy is a Python library for storing datasets in, and loading datasets from, memory-mapped files. This is particularly useful for large datasets that do not fit in memory and therefore need to be processed in batches. Memmpy is built on NumPy's numpy.memmap.
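
For readers unfamiliar with numpy.memmap, the following is a minimal sketch of the underlying mechanism using plain NumPy only. It does not use Memmpy and does not reflect Memmpy's file format; the helper name `write_and_reopen` and the file name `example.bin` are made up for illustration.

```python
import os
import tempfile

import numpy as np


def write_and_reopen(values: np.ndarray, path: str) -> np.memmap:
    """Store `values` in a raw binary file and reopen it as a read-only memmap."""
    # mode="w+" creates (or overwrites) the file with the requested shape.
    out = np.memmap(path, dtype=values.dtype, mode="w+", shape=values.shape)
    out[:] = values
    out.flush()  # write pending changes to disk
    del out      # release the writable mapping

    # A raw memmap file stores no metadata, so dtype and shape are given again.
    return np.memmap(path, dtype=values.dtype, mode="r", shape=values.shape)


with tempfile.TemporaryDirectory() as tmp:
    data = np.arange(12, dtype=np.float32).reshape(4, 3)
    mapped = write_and_reopen(data, os.path.join(tmp, "example.bin"))
    # Indexing copies only the requested row into memory.
    second_row = np.array(mapped[1])
    del mapped  # close the mapping before the directory is removed
```

Because a raw memmap file carries no shape or dtype information, a wrapper library has to track that metadata somewhere itself; how Memmpy does so is not covered here.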

Who should use Memmpy?

Memmpy is primarily intended for use in medium to large scale machine learning applications in high energy particle physics, where the whole dataset would not fit into memory at once and iterating over the ROOT files is too slow. This could be because shuffling of datapoints is desired, or because only a fraction of the information or events is needed for training.
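
The shuffling use case can be sketched as follows. This is an illustration in plain NumPy, not Memmpy's own loader: the function name `iter_shuffled_batches` and its parameters are hypothetical. It works on any array-like that supports integer-array indexing, including a numpy.memmap.

```python
from typing import Iterator

import numpy as np


def iter_shuffled_batches(
    data: np.ndarray, batch_size: int, seed: int = 0
) -> Iterator[np.ndarray]:
    """Yield batches of rows from `data` in a random order."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))
    for start in range(0, len(order), batch_size):
        # Sorting the indices inside a batch keeps file reads closer to
        # sequential; the batch is still a random subset of the rows.
        idx = np.sort(order[start:start + batch_size])
        # Fancy indexing copies only the selected rows into memory.
        yield np.asarray(data[idx])
```

With a memory-mapped array, only the rows of the current batch are loaded, so the full dataset never has to fit into memory at once.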

Memmpy is not intended for use in small applications where the entire dataset fits into memory and can be loaded at once. It is also not intended for use in very large applications where training is massively distributed.

Installation

Memmpy can be installed directly from PyPI using pip. It requires Python 3.10 or higher. To process .root files, uproot is also required; it can be installed using pip as well.

pip install memmpy

Usage

A simple memory-mapped file can be created and read back as follows:

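import numpy as np

# Assumption: both names are importable from the package's top level.
from memmpy import WriteVector, read_vector

with WriteVector(path="data.mmpy", name="testdata") as memfile:
    # Append a single numpy array.
    # The shape and dtype will be inferred from the array.
    memfile.append(np.array([1, 2, 3]))

    # Append another numpy array of the same shape and dtype.
    memfile.append(np.array([4, 5, 6]))

    # Extend the file by an array with an additional leading axis,
    # i.e. several entries at once.
    memfile.extend(np.array([[7, 8, 9], [10, 11, 12]]))

# Load the stored data back as a memory-mapped array.
memmap_data = read_vector(path="data.mmpy", name="testdata")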
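
The difference between append and extend in the example above can be pictured with an in-memory analogue. This sketch uses a plain Python list and NumPy only; it assumes, based on the comments in the example, that append adds one entry and extend adds one entry per row of its argument. It does not call Memmpy, and the stored layout may differ.

```python
import numpy as np

rows = []

# Analogue of append: add a single entry of shape (3,).
rows.append(np.array([1, 2, 3]))
rows.append(np.array([4, 5, 6]))

# Analogue of extend: the argument has one extra leading axis,
# and each row along that axis becomes its own entry.
rows.extend(np.array([[7, 8, 9], [10, 11, 12]]))

# Under the stated assumption, four entries of shape (3,) result.
expected = np.stack(rows)
```

Under that assumption, the data read back in the example would have shape (4, 3).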
