Array format specialized for Machine Learning with Blosc2 backend and standardized metadata.

tl;dr: Working with large medical or scientific images for machine learning? -> Use MLArray.

MLArray is a purpose-built file format for N-dimensional medical and scientific array data in machine learning workflows. It replaces the usual patchwork of source formats and late-stage conversions to NumPy/Zarr/Blosc2 by layering standardized metadata on top of a Blosc2-backed storage layout, so the same files work reliably across training, analysis, and visualization tools (including Napari and MITK).

Installation

You can install mlarray via pip:

pip install mlarray

To enable the mlarray_convert CLI command, install MLArray with the necessary extra dependencies:

pip install "mlarray[all]"

Documentation

See the documentation for the API reference, the metadata schema, usage examples, and CLI usage.

Usage

Below are common usage patterns for loading, saving, and working with metadata.

Default usage

import numpy as np
from mlarray import MLArray

array = np.random.random((128, 256, 256))
image = MLArray(array)  # Create MLArray image
image.save("sample.mla")

image = MLArray("sample.mla")  # Loads image

Memory-mapped usage

from mlarray import MLArray
import numpy as np

# read-only, partial access (default)
image = MLArray.open("sample.mla", mmap_mode='r')  
crop = image[10:20, 50:60]  # Read crop

# read/write, partial access
image = MLArray.open("sample.mla", mmap_mode='r+')
image[10:20, 50:60] *= 5  # Modify the crop in memory and on disk

# read/write, partial access, create/overwrite
array = np.random.random((128, 256, 256))
image = MLArray.create("sample.mla", shape=array.shape, dtype=array.dtype, mmap_mode='w+')
image[...] = array  # Write the full image in memory and on disk
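
Partial access is most useful when a training loop only needs small random crops. The following is a minimal sketch of such a sampler, not part of MLArray itself: `sample_patch` is a hypothetical helper that relies only on `.shape` and NumPy-style slicing, so it is demonstrated on a plain NumPy array here. It is assumed, but not verified, that an image returned by `MLArray.open` can be passed in the same way.

```python
import numpy as np


def sample_patch(image, patch_shape, rng):
    """Read one random crop of `patch_shape` from `image` using slicing only.

    `image` can be any object that exposes `.shape` and NumPy-style slicing.
    This helper is hypothetical and not part of the mlarray package.
    """
    if len(patch_shape) != len(image.shape):
        raise ValueError("patch_shape must have one entry per image axis")
    slices = []
    for axis_len, patch_len in zip(image.shape, patch_shape):
        if patch_len > axis_len:
            raise ValueError("patch is larger than the image along an axis")
        # Pick a start index so the patch stays fully inside the image.
        start = int(rng.integers(0, axis_len - patch_len + 1))
        slices.append(slice(start, start + patch_len))
    # Only the requested region is read from the underlying storage.
    return np.asarray(image[tuple(slices)])


# Stand-in data; a memory-mapped image could be used instead (assumption).
volume = np.random.random((32, 64, 64))
rng = np.random.default_rng(0)
patch = sample_patch(volume, (8, 16, 16), rng)
print(patch.shape)
```

Passing the random generator explicitly keeps the sampling reproducible across runs.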

Metadata inspection and manipulation

import numpy as np
from mlarray import MLArray

array = np.random.random((64, 128, 128))
image = MLArray(
    array,
    spacing=(1.0, 1.0, 1.5),
    origin=(10.0, 10.0, 30.0),
    direction=[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    meta={"patient_id": "123", "modality": "CT"},  # Any metadata from the original image source (for example raw DICOM metadata)
)

print(image.spacing)  # [1.0, 1.0, 1.5]
print(image.origin)  # [10.0, 10.0, 30.0]
print(image.meta.source)  # {"patient_id": "123", "modality": "CT"}

image.spacing[1] = 5.3
image.meta.source["study_id"] = "study-001"
image.save("with-metadata.mla")

# Open memory-mapped
image = MLArray.open("with-metadata.mla", mmap_mode='r+')  
image.meta.source["study_id"] = "new-study"  # Modify metadata
image.close()  # Closing writes modified metadata to disk; only needed if metadata changed
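
To illustrate what spacing, origin, and direction describe together, here is a small NumPy-only sketch that maps a voxel index to a physical coordinate. It assumes the common ITK-style convention `world = origin + direction @ (index * spacing)`; whether MLArray uses exactly this convention and axis order is an assumption and should be checked against the metadata schema. `voxel_to_world` is a hypothetical helper, not an MLArray function.

```python
import numpy as np


def voxel_to_world(index, spacing, origin, direction):
    """Map a voxel index to a physical coordinate.

    Assumes the ITK-style convention (not confirmed for MLArray):
        world = origin + direction @ (index * spacing)
    """
    index = np.asarray(index, dtype=float)
    spacing = np.asarray(spacing, dtype=float)
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    # Scale the index by the voxel size, rotate it, then shift by the origin.
    return origin + direction @ (index * spacing)


# Same spatial metadata as in the example above.
spacing = (1.0, 1.0, 1.5)
origin = (10.0, 10.0, 30.0)
direction = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

world = voxel_to_world((2, 4, 6), spacing, origin, direction)
print(world)  # [12. 14. 39.]
```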

Copy metadata with overrides

import numpy as np
from mlarray import MLArray

base = MLArray("sample.mla")
array = np.random.random(base.shape)

image = MLArray(
    array,
    spacing=(0.8, 0.8, 1.0),
    copy=base,  # Copies every argument that is not set explicitly from base
)

image.save("copied-metadata.mla")

Standardized metadata usage

import numpy as np
from mlarray import MLArray, Meta

array = np.random.random((64, 128, 128))
image = MLArray(
    array,
    meta=Meta(source={"patient_id": "123", "modality": "CT"}, is_seg=True),  # Add metadata in a pre-defined format
)

print(image.meta.source)  # {"patient_id": "123", "modality": "CT"}
print(image.meta.is_seg)  # True

image.meta.source["study_id"] = "study-001"
image.meta.is_seg = False
image.save("with-metadata.mla")

Patch size variants

Default patch size (192):

from mlarray import MLArray

image = MLArray("sample.mla")  # Existing file
image.save("default-patch.mla")  # Keeps existing layout metadata

loaded = MLArray("sample.mla")
image = MLArray(loaded.to_numpy(), patch_size='default')
image.save("default-patch-relayout.mla")  # Uses constructor patch_size='default' (192)

Custom isotropic patch size (512):

from mlarray import MLArray

loaded = MLArray("sample.mla")
image = MLArray(loaded.to_numpy(), patch_size=512)
image.save("patch-512.mla")

Custom non-isotropic patch size:

from mlarray import MLArray

loaded = MLArray("sample.mla")
image = MLArray(loaded.to_numpy(), patch_size=(128, 192, 256))
image.save("patch-non-iso.mla")

Manual chunk/block size:

from mlarray import MLArray

loaded = MLArray("sample.mla")
image = MLArray(
    loaded.to_numpy(),
    patch_size=None,
    chunk_size=(1, 128, 128),
    block_size=(1, 32, 32),
)
image.save("manual-chunk-block.mla")

Let Blosc2 itself configure chunk/block size:

from mlarray import MLArray

loaded = MLArray("sample.mla")
image = MLArray(loaded.to_numpy(), patch_size=None)
# If patch_size, chunk_size and block_size are all None, Blosc2 will auto-configure chunk and block size
image.save("blosc2-auto.mla")
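
As a rough illustration of why the storage layout matters for patch-based training, the sketch below counts how many chunks overlap a single patch read for different chunk shapes. This is generic arithmetic for chunked storage, not a description of how MLArray derives chunk and block sizes from `patch_size`; `chunks_touched` is a hypothetical helper.

```python
def chunks_touched(start, patch_shape, chunk_shape):
    """Count the chunks that overlap a patch beginning at `start`.

    Generic chunk-grid arithmetic; hypothetical helper, not part of mlarray.
    """
    count = 1
    for s, p, c in zip(start, patch_shape, chunk_shape):
        first = s // c            # index of the first chunk hit on this axis
        last = (s + p - 1) // c   # index of the last chunk hit on this axis
        count *= last - first + 1
    return count


patch = (192, 192, 192)

# Chunk shape equal to the patch, read aligned to the chunk grid.
aligned = chunks_touched((0, 0, 0), patch, (192, 192, 192))
# Same chunk shape, but the patch starts in the middle of a chunk.
shifted = chunks_touched((96, 96, 96), patch, (192, 192, 192))
# Thin slab-like chunks, as in the manual chunk/block example above.
slabs = chunks_touched((0, 0, 0), patch, (1, 128, 128))

print(aligned, shifted, slabs)  # 1 8 768
```

Fewer touched chunks generally means less data to read and decompress per patch, which is the motivation for matching the layout to the patch size used in training.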

CLI

mlarray_header

Print the metadata header from a .mla file.

mlarray_header sample.mla

mlarray_convert

Convert between MLArray and NIfTI/NRRD files.

When converting from NIfTI/NRRD to MLArray, source metadata is copied into meta.source.

When converting from MLArray to NIfTI/NRRD, only meta.source is copied into the output header. Spatial metadata (spacing, origin, direction) is set explicitly from meta.spatial.

mlarray_convert sample.nii.gz output.mla
mlarray_convert sample.mla output.nii.gz
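
For converting a whole dataset, the commands can be generated in a loop. The sketch below only builds the argument lists using the `mlarray_convert <input> <output>` form shown above and does not execute anything. `build_convert_commands`, the example paths, and the list of recognized suffixes (`.nii.gz`, `.nii`, `.nrrd`) are assumptions made for illustration.

```python
from pathlib import Path

# Suffixes treated as NIfTI/NRRD inputs (assumed list, adjust as needed).
SOURCE_SUFFIXES = (".nii.gz", ".nii", ".nrrd")


def build_convert_commands(input_paths, output_dir):
    """Build one `mlarray_convert <input> <output>` command per source file.

    Hypothetical helper; it returns argument lists and runs nothing.
    """
    output_dir = Path(output_dir)
    commands = []
    for path in map(Path, input_paths):
        suffix = next((s for s in SOURCE_SUFFIXES if path.name.endswith(s)), None)
        if suffix is None:
            continue  # skip files that are not NIfTI/NRRD
        # Replace the source suffix with ".mla" for the output file name.
        target = output_dir / (path.name[: -len(suffix)] + ".mla")
        commands.append(["mlarray_convert", str(path), str(target)])
    return commands


commands = build_convert_commands(
    ["data/case1.nii.gz", "data/case2.nrrd", "data/notes.txt"],
    "converted",
)
for command in commands:
    print(" ".join(command))
```

Each argument list can then be passed to `subprocess.run(command, check=True)` once `mlarray[all]` is installed and the output directory exists.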

Contributing

Contributions are welcome! Please open a pull request with clear changes and add tests when appropriate.

Acknowledgments

    

This repository is developed and maintained by the Applied Computer Vision Lab (ACVL) of Helmholtz Imaging and the Division of Medical Image Computing at DKFZ.
