Array format specialized for Machine Learning with Blosc2 backend and standardized metadata.
tl;dr: Working with large medical or scientific images for machine learning? -> Use MLArray.
MLArray is a purpose-built file format for N-dimensional medical and scientific array data in machine learning workflows. It replaces the usual patchwork of source formats and late-stage conversions to NumPy/Zarr/Blosc2 by layering standardized metadata on top of a Blosc2-backed storage layout, so the same files work reliably across training, analysis, and visualization tools (including Napari and MITK).
Installation
You can install mlarray via pip:
pip install mlarray
To enable the mlarray_convert CLI command, install MLArray with the necessary extra dependencies:
pip install mlarray[all]
Documentation
See the documentation for the API reference, the metadata schema, usage examples, and CLI usage.
Usage
Below are common usage patterns for loading, saving, and working with metadata.
Default usage
import numpy as np
from mlarray import MLArray
array = np.random.random((128, 256, 256))
image = MLArray(array) # Create MLArray image
image.save("sample.mla")
image = MLArray("sample.mla") # Loads image
Memory-mapped usage
from mlarray import MLArray
import numpy as np
# read-only, partial access (default)
image = MLArray.open("sample.mla", mmap_mode='r')
crop = image[10:20, 50:60] # Read crop
# read/write, partial access
image = MLArray.open("sample.mla", mmap_mode='r+')
image[10:20, 50:60] *= 5 # Modify crop in memory and on disk
# read/write, partial access, create/overwrite
array = np.random.random((128, 256, 256))
image = MLArray.create("sample.mla", shape=array.shape, dtype=array.dtype, mmap_mode='w+')
image[...] = array # Modify image in memory and on disk
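Partial access pays off because a chunked store only has to read the chunks that a crop overlaps. The following is a minimal sketch of that arithmetic in plain Python, not MLArray's implementation. The chunk shape (1, 128, 128) is an assumption chosen for illustration, and `chunks_touched` and `total_chunks` are hypothetical helper names.

```python
from math import prod


def chunks_touched(crop, shape, chunk_shape):
    """Count the chunks overlapped by a crop given as (start, stop) pairs."""
    # Axes missing from the crop are read in full, as with NumPy-style slicing.
    full = list(crop) + [(0, n) for n in shape[len(crop):]]
    per_axis = []
    for (start, stop), chunk in zip(full, chunk_shape):
        first = start // chunk       # index of the first chunk touched
        last = (stop - 1) // chunk   # index of the last chunk touched
        per_axis.append(last - first + 1)
    return prod(per_axis)


def total_chunks(shape, chunk_shape):
    """Count all chunks needed to cover the array (ceiling division per axis)."""
    return prod(-(-n // chunk) for n, chunk in zip(shape, chunk_shape))


shape = (128, 256, 256)        # array shape from the example above
chunk_shape = (1, 128, 128)    # assumed chunk shape, for illustration only
crop = [(10, 20), (50, 60)]    # corresponds to image[10:20, 50:60]

touched = chunks_touched(crop, shape, chunk_shape)
total = total_chunks(shape, chunk_shape)
print(f"{touched} of {total} chunks overlap the crop")
```

Under these assumptions the crop overlaps only a small fraction of the chunks, which is why reading it does not require loading the whole image.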
Metadata inspection and manipulation
import numpy as np
from mlarray import MLArray
array = np.random.random((64, 128, 128))
image = MLArray(
    array,
    spacing=(1.0, 1.0, 1.5),
    origin=(10.0, 10.0, 30.0),
    direction=[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    meta={"patient_id": "123", "modality": "CT"}, # Any metadata from the original image source (for example raw DICOM metadata)
)
print(image.spacing) # [1.0, 1.0, 1.5]
print(image.origin) # [10.0, 10.0, 30.0]
print(image.meta.source) # {"patient_id": "123", "modality": "CT"}
image.spacing[1] = 5.3
image.meta.source["study_id"] = "study-001"
image.save("with-metadata.mla")
# Open memory-mapped
image = MLArray.open("with-metadata.mla", mmap_mode='r+')
image.meta.source["study_id"] = "new-study" # Modify metadata
image.close() # Close the file and write modified metadata to disk; only needed if metadata was changed
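Spacing, origin, and direction together describe how voxel indices map to physical coordinates. The sketch below uses the common convention world = origin + direction · (spacing × index), as found in ITK-style image geometry. Whether MLArray orders axes in exactly this way is an assumption here, and `index_to_world` is a hypothetical helper, not part of the mlarray API.

```python
def index_to_world(index, spacing, origin, direction):
    """Map a voxel index to physical coordinates (assumed ITK-style convention)."""
    # Scale each index component by the voxel spacing along its axis.
    scaled = [i * s for i, s in zip(index, spacing)]
    # Rotate by the direction matrix, then shift by the origin.
    return [
        o + sum(d * v for d, v in zip(row, scaled))
        for o, row in zip(origin, direction)
    ]


# Values taken from the metadata example above.
spacing = (1.0, 1.0, 1.5)
origin = (10.0, 10.0, 30.0)
direction = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

world = index_to_world((2, 3, 4), spacing, origin, direction)
print(world)  # [12.0, 13.0, 36.0]
```

With an identity direction matrix the mapping reduces to a per-axis scale and shift, so changing `spacing[1]` only affects the second physical coordinate.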
Copy metadata with overrides
import numpy as np
from mlarray import MLArray
base = MLArray("sample.mla")
array = np.random.random(base.shape)
image = MLArray(
    array,
    spacing=(0.8, 0.8, 1.0),
    copy=base, # Copies all non-explicitly set arguments from base
)
image.save("copied-metadata.mla")
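The override behavior described above (explicit arguments win, everything else comes from the base) can be pictured as a dictionary merge. This is only an illustration of the rule in plain Python: `resolve_metadata` is a hypothetical helper, and how MLArray implements `copy=` internally is not shown in this README.

```python
def resolve_metadata(base, **explicit):
    """Return base metadata with explicitly set (non-None) values applied on top."""
    merged = dict(base)  # start from a copy so the base stays untouched
    merged.update({key: value for key, value in explicit.items() if value is not None})
    return merged


base_meta = {"spacing": (1.0, 1.0, 1.5), "origin": (10.0, 10.0, 30.0)}

# spacing is set explicitly; origin is left unset and falls back to the base.
resolved = resolve_metadata(base_meta, spacing=(0.8, 0.8, 1.0), origin=None)
print(resolved)
```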
Standardized metadata usage
import numpy as np
from mlarray import MLArray, Meta
array = np.random.random((64, 128, 128))
image = MLArray(
    array,
    meta=Meta(source={"patient_id": "123", "modality": "CT"}, is_seg=True), # Add metadata in a pre-defined format
)
print(image.meta.source) # {"patient_id": "123", "modality": "CT"}
print(image.meta.is_seg) # True
image.meta.source["study_id"] = "study-001"
image.meta.is_seg = False
image.save("with-metadata.mla")
Patch size variants
Default patch size (192):
from mlarray import MLArray
image = MLArray("sample.mla")
image.save("default-patch.mla") # Default patch_size is 'default' -> Isotropic patch size of 192 pixels
image.save("default-patch.mla", patch_size='default')
Custom isotropic patch size (512):
from mlarray import MLArray
image = MLArray("sample.mla")
image.save("patch-512.mla", patch_size=512)
Custom non-isotropic patch size:
from mlarray import MLArray
image = MLArray("sample.mla")
image.save("patch-non-iso.mla", patch_size=(128, 192, 256))
Manual chunk/block size:
from mlarray import MLArray
image = MLArray("sample.mla")
image.save("manual-chunk-block.mla", chunk_size=(1, 128, 128), block_size=(1, 32, 32))
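To get a feel for what a manual chunk/block choice implies, it can help to work out the resulting layout by hand. The sketch below does that arithmetic for the values used above. It assumes the (128, 256, 256) float64 array from the earlier examples, and `grid` is a hypothetical helper; none of this calls into Blosc2 or MLArray.

```python
from math import ceil, prod


def grid(outer, inner):
    """Number of inner tiles needed to cover the outer shape along each axis."""
    return tuple(ceil(o / i) for o, i in zip(outer, inner))


shape = (128, 256, 256)      # assumed array shape (from the earlier examples)
chunk_size = (1, 128, 128)   # values passed to save() above
block_size = (1, 32, 32)
itemsize = 8                 # assumed float64, as produced by np.random.random

chunk_grid = grid(shape, chunk_size)        # chunks per axis
block_grid = grid(chunk_size, block_size)   # blocks per chunk, per axis

n_chunks = prod(chunk_grid)
blocks_per_chunk = prod(block_grid)
chunk_bytes = prod(chunk_size) * itemsize   # uncompressed size of one chunk
block_bytes = prod(block_size) * itemsize   # uncompressed size of one block

print(chunk_grid, n_chunks)           # (128, 2, 2) 512
print(block_grid, blocks_per_chunk)   # (1, 4, 4) 16
print(chunk_bytes, block_bytes)       # 131072 8192
```

Under these assumptions each chunk holds 128 KiB and each block 8 KiB of uncompressed data, which is the kind of figure to compare against typical read sizes when tuning these parameters.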
Let Blosc2 itself configure chunk/block size:
from mlarray import MLArray
image = MLArray("sample.mla")
# If patch_size, chunk_size and block_size are all None, Blosc2 will auto-configure chunk and block size
image.save("blosc2-auto.mla", patch_size=None)
CLI
mlarray_header
Print the metadata header from a .mla or .b2nd file.
mlarray_header sample.mla
mlarray_convert
Convert a NIfTI or NRRD file to MLArray and copy metadata.
mlarray_convert sample.nii.gz output.mla
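For converting many files, the command above can be generated per input file. The sketch below only builds the argument lists in the `mlarray_convert <input> <output>` form shown above and does not run anything. The helper names, the `scans/` and `converted/` directories, and the list of recognized suffixes are assumptions for illustration.

```python
from pathlib import Path

# Assumed input suffixes for NIfTI and NRRD files.
SUFFIXES = (".nii.gz", ".nii", ".nrrd")


def output_path(src, out_dir):
    """Derive the .mla output path, stripping multi-part suffixes like .nii.gz."""
    name = Path(src).name
    for suffix in SUFFIXES:
        if name.lower().endswith(suffix):
            return Path(out_dir) / f"{name[:-len(suffix)]}.mla"
    raise ValueError(f"Unsupported input file: {src}")


def convert_command(src, out_dir):
    """Build the argument list for one mlarray_convert call."""
    return ["mlarray_convert", str(src), str(output_path(src, out_dir))]


cmd = convert_command("scans/sample.nii.gz", "converted")
print(cmd)
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once `mlarray[all]` is installed and the output directory exists.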
Contributing
Contributions are welcome! Please open a pull request with clear changes and add tests when appropriate.
Acknowledgments
This repository is developed and maintained by the Applied Computer Vision Lab (ACVL) of Helmholtz Imaging and the Division of Medical Image Computing at DKFZ.