Skip to main content

Mass spectrometry imaging file format mspix, designed to be an efficient storage and analysis format for mass spectrometry imaging data produced using time-of-flight mass spectrometetry, with an emphasis on being highly space and access-time efficient when working with large numbers (>5 million) of pixels.

Project description

mspix File Format

mspix files are designed to be an efficient storage and analysis format for mass spectrometry imaging data produced using time-of-flight mass spectrometetry, with an emphasis on being highly space and access-time efficient when working with large numbers (>5 million) of pixels.

The main goals of this file format may be summarized as:

  • Store large pixel-count mass images in a space and access-efficient manner mspix files are designed to accomodate ~1 billion pixel mass images and are recommended for mass images exceeding 5 million pixels, mass images with a single mass axis, and/or targeted mass images.

  • Provide easy inspection and simple, obvious, formats for the loose files. The intention behind this decision is to enable ease of tooling and libraries to be written by others.

  • Allow for user-added data in the form of additional files, axes, and metadata This is done using a folder/HDF5-based structure.


File or Folder Structure

A .mspix file may be either a 'loose' collection of files in a folder or a 'packed', HDF5-based file with a similar structure. The advantages of the loose representation is that it:

  • Allows for multiple tools present to inspect the files
  • Enables easy memory mapping larger files
  • Enables ease of modificaiton to individual composite files
  • May be easier to implement in some programming languages and environments that do not have extensive HDF5 support

The advantages of the packed representation are that it facilitates file storage and sharing by ensuring related data remain together


Loose Representation File Structure

The loose file is a folder that contains the following files:

metadata.json

A JSON-compliant, UTF-8 encoded text file of all metadata for the image. At this time, only the following entries are strictly required in this file:

  1. a semantically-versioned "mspix_version", which is "1.0.0". This version number allows for backwards-compatible changes to the file-format.
  2. the width of the image in pixels with key "image_width_pixels".
  3. the height of the image in pixels with key "image_height_pixels".
  4. the list of spectral (mass) channels with key "spectral_channels".
    This list of spectral channels is what the "indices" file indexes into. The largest index in the indices file should be 1 less than the length of this list of spectral channels. These spectral channels will most often be m/z values, however they may also be times-of-flight or other more abstract values to enable easier or more consistent data calibration and processing.
    In cases where these values are not m/z values, it is advised to provide a list of m/z values as an additional metadata list.
  5. the corresponding list of intensities for each spectral channel with key "spectral_intensities".
    The reason for the spectral_intensities entry is to enable quick viewing of the overall spectrum of the mspix data to facilitate some common operations such as spectral normalization and peak picking. This data is redundantly stored and could be generated from the "indices" and "intensities" files.

Entries 2, 3, 4, and 5 consist of numbers or lists of numbers. All these numbers must all be interpretable as 64-bit IEEE floating point numbers.

In addition to the "mspix_version", it is advisable to add metadata such as:

A. the physical dimensions of pixels
B. the overall image's physical dimensions
C. instrument settings that were used for the acquisition (e.g., instrument polarity)
D. image creation and construction parameters
E. Checksums of the constituent files
F. Relevant sample data (e.g., matrix or sample preparation used)
G. Small annotations
H. Alternative axis information (e.g., recalibrated mass axes, m/z labels, etc.)

An example of such a file might be:

{
    "mspix_version": "1.0.0",
    "image_width_pixels": 2412,
    "image_height_pixels": 2411,
    "spectral_channels": [1.000, 1.005, 1.010, ... 999.990, 999.995, 1000.000 ],
    "spectral_intensities": [ 217, 151, 307, ... 58, 30, 176],
    "original_file": "C:\\Users\\user\\Desktop\\example.tpx3",
    "width_mm": 2.0025000934838317,
    "height_mm": 2.001499891281128,
    "acquisition_time_ns": 173700783473,
    "tof_pulse_count": 3358385,
    ...,
}

with truncations indicated by an ellipsis in this example.

pixel_intensities.xXX

A little-endian encoded binary file consisting of either 8, 16, 32, or 64-bit numbers with the total number of numbers equal to the "image_width_pixels" multiplied by the "image_height_pixels". The ".xXX" file extension is a placeholder for indicating the numeric type and size of the file. Valid extensions are ".u8", ".u16", ".u32", ".u64", ".f32", and ".f64" where "u" indicates an unsigned integer, and "f" indicates an IEEE floating point number. The bits of the numbers are indicated with the 8, 16, etc.

This file is similar to the "spectral_intensities" entry in the "metadata.json" file in that its purpose is to facilitate some common operations such as image normalization and easy region-of-interest selection. This data is redundantly stored and could be generated from other files.

pixel_channels.uXX

A little-endian encoded binary file consisting of either 8, 16, 32, or 64-bit numbers with the total number of numbers equal to the "image_width_pixels" multiplied by the "image_height_pixels". The ".uXX" file extension is a placeholder for indicating the numeric type and size of the file. Valid extensions are ".u8", ".u16", ".u32", and ".u64" where "u" indicates an unsigned integer. The bits of the numbers are indicated with the 8, 16, etc.

Each "pixel_channels.uXX" pixel's value is the number of nonzero spectral channels in that pixel. E.g., if the mspix pixel has 15 nonzero m/z channels then the corresponding pixel_channels.uXX value would be 15.

This file facilitates creation of an offset table (via cumulative summation of each pixel's channels) for access of the indices and intensities in the corresponding "intensities.xXX" and "indices.uXX" files specified below.

intensities.xXX

A little-endian encoded binary file of unsigned integers or floating point values. The size and type of the binary numbers is specified by the .xXX extension, where 'x' is either 'u' for 'unsigned integer' or 'f' for 'floating point value'. 'XX' represents the number of bits for the unsigned integer or floating point value and can be either 8, 16, 32, or 64 if 'x' is 'u' or 32 or 64 if 'x' is 'f'. (e.g., "intensities.u8" and "intensities.f64" are valid names)

Each value in the binary file is the intensity of a channel (i.e., m/z value) within a pixel. Which pixel is specified by the offset table created from the "pixel_channels.uXX" file. Which channel is specified by the corresponding unsigned index value in the "indices.uXX" file. Mass channels with intensities of 0 may be included but are recommended to be omitted to save space and, potentially, processing time.

The number of intensities in the "intensities.uXX" and indices in the "indices.uXX" file must be identical.

indices.uXX

A little-endian encoded binary file of unsigned integers. The size of the integers is specified by the .uXX extension, where XX represents the number of bits for the unsigned integer and can be either 8, 16, 32, or 64, depending on the size of the axis (i.e., the number of entries in the axis.csv file).

Each unsigned integer is the index of a nonzero intensity mass/time channel within a mspix pixel. The corresponding intensity of the indexed mass/time channel is found in the "intensities.uXX" file.

The above files may be supplemented with additional (optional) files. These files may be convienient to group with the .mspix data for storage and specific program implementations.

Examples of such optional files are:

  • Another UTF-8 based file that has the same number of entries as the mass axis and time axis, but that contains molecular, fragment, or elemental identifications of particular ions.
  • A stored set of indices indicating different methods of centroiding that could be possible to reduce the data.
  • A corresponding optical light microscopy image of the area imaged

Packed representation file structure

The packed file representation is an HDF5 file that contains equivalent information to the loose files as HDF5 datasets. These files translated to datasets are:

metadata

A 1x1-sized dataset with a single UTF-8-encoded, JSON-compliant string that is equivalent to the data of the loose-representation "metadata.json" file specified in 3.

pixel_intensities

A 2-dimensional dataset of one of the compatible datatypes corresponding to the "pixel_intensities.xXX" loose-representation file specified in 3. The numeric type does not need to be otherwise specified as it is known by the HDF5 dataset. The 2-dimensional dataset is natural as the pixel_intensities represent the overall image data. The width and height of the 2-dimensional dataset must correspond to the specified "image_width_pixels" and "image_height_pixels", respectively.

pixel_channels

A 2-dimensional dataset of one of the compatible datatypes corresponding to the "pixel_channels.uXX" loose-representation file specified in 3. The numeric type does not need to be otherwise specified as it is known by the HDF5 dataset. The 2-dimensional dataset is natural as the pixel_channels represents the overall image data. The width and height of the 2-dimensional dataset must correspond to the specified "image_width_pixels" and "image_height_pixels", respectively.

indices

A 1-dimensional dataset of one of the compatible datatypes corresponding to the "indices.uXX" loose-representation file specified in section 3. The numeric type does not need to be otherwise specified as it is known by the HDF5 dataset.

intensities

A 1-dimensional dataset of one of the compatible datatypes corresponding to the "intensities.xXX" loose-representation file specified in section 3. The numeric type does not need to be otherwise specified as it is known by the HDF5 dataset.

##Image orientation and shape All mspix images start with pixel at index 0, row: 0, column: 0 where row 0 is at the top of the image and column 0 is at the left of the image. The image then proceeds in the "x" direction (along the row 0, incrementing column; row-major) until it is at the right border, the next pixel is then at the far-left in row 1.

The layout of mspix images is constant and independant of the stage or beam motion during acquisition (e.g. serpentine, typewriter, up-down, or left-right). If information regarding the stage motion during acquisition is desired, it may be provided in the metadata.json file or as a separate file within the folder/zip file.

Limitations and restrictions

The mspix file format takes advantage of some restrictions, limitations, and constraints that are common in current mass spectrometry imaging experiments. These constraints do not apply to many mass spectrometry images, which means that mspix is not suitable for losslessly representing such images. Some considerations are that:

  • The data are "sparse" and do not contain digitizer noise. This means that the "processed" and "continuous" options of, for exmaple, the .imzML file format do not apply as all mspix files are "processed". Similarly, "event-based" data that may be encountered with some TOF-SIMS data must be converted to "processed" data for mspix conversion.

  • mspix images are rectangular. It is possible to store non-rectangular imaged areas by "cropping" or deleting pixels outside of a defined area. Represenging these empty "corner" pixels can be very space efficient (taking only a few bytes each). Images with pixels that are not sampled on a rectangular grid cannot be represented without re-binning the pixels (and thus some data loss) using mspix.

  • The data must be represented with a single 1-dimensional axis. This excludes some multi-tandem-MS and ion mobility data as some data acquired with these modes are not able to be represented efficiently with a single axis. MS2 data with a common second axis may be able to be represented by extending the primary mass axis or providing additional metadata that tag specific pixels or axis channels with fragmentation information; however such representations are left unspecified.

  • Only a single, 2-dimensional image will be contained in the mspix file. This means that 3-d images or depth-profiling images are not supported in the 1.0.0 version of mspix.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mspix-1.0.4.tar.gz (12.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mspix-1.0.4-py3-none-any.whl (16.6 kB view details)

Uploaded Python 3

File details

Details for the file mspix-1.0.4.tar.gz.

File metadata

  • Download URL: mspix-1.0.4.tar.gz
  • Upload date:
  • Size: 12.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for mspix-1.0.4.tar.gz
Algorithm Hash digest
SHA256 74d30717e867e4b9d8d990188d19f398f3e09744685e8c56416c15463e7c2f67
MD5 adde90a027ff3ed3a5cd5789f693ab8a
BLAKE2b-256 bd683b42332517db8d03283bc87a175ef3183ce48ac7559cf0347540fef4bd9e

See more details on using hashes here.

File details

Details for the file mspix-1.0.4-py3-none-any.whl.

File metadata

  • Download URL: mspix-1.0.4-py3-none-any.whl
  • Upload date:
  • Size: 16.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for mspix-1.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 27102c2042d68eef83219ae5494a7008419f07fb89deae11685dd5491ba161a0
MD5 9d75c9569176d5d301af0826d570b48d
BLAKE2b-256 a68b032cbfad57a7dc79fd831bcc9506e982ca0aba3c7ab77009cb5efc33e2bd

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page