Python package for the high-throughput nontargeted metabolite fingerprinting of nominal mass direct injection mass spectrometry.

DIMEpy: Direct Infusion MEtabolomics processing in python

Project Status: Active - The project has reached a stable, usable state and is being actively developed.

Python package for the high-throughput nontargeted metabolite fingerprinting of nominal mass direct injection mass spectrometry directly from mzML files.

This work is very much inspired by the methods detailed in High-throughput, nontargeted metabolite fingerprinting using nominal mass flow injection electrospray mass spectrometry (Beckmann, et al, 2008).

Features

  • Loading mass spectrometry files from mzML.
    • Support for polarity switching.
    • MAD-estimated infusion profiling.
  • Assay-wide outlier spectrum detection.
  • Spurious peak elimination.
  • Spectrum export for direct dissemination using Metaboanalyst.
  • Spectral binning.
  • Value imputation.
  • Spectral normalisation.
    • including TIC, median, mean...
  • Spectral transformation.
    • including log10, cube, nlog, log2, glog, sqrt, ihs...
  • Export to array for statistical analysis in Metaboanalyst.

Installation

DIMEpy requires Python 3+ and is unfortunately not compatible with Python 2. If you are still using Python 2, a clever workaround is to install Python 3 and use that instead.

You can install it through pypi using pip:

pip install dimepy

If you want the 'bleeding edge' version of this, you can also install directly from this repository using git - but beware of dragons:

pip install git+https://www.github.com/AberystwythSystemsBiology/DIMEpy

Usage

To use the package, type the following into your Python console:

>>> import dimepy

At the moment, this pipeline only supports mzML files. You can easily convert proprietary formats to mzML using ProteoWizard.

Loading a single file

If you are only going to load in a single file for fingerprint matrix estimation, then just create a new Spectrum object. If the sample belongs to a class (stratification), it is recommended that you also pass it through when instantiating the Spectrum object.

>>> filepath = "/file/to/file.mzML"
>>> spec = dimepy.Spectrum(filepath, identifier="example", stratification="class_one")
/file/to/file.mzML

By default, the Spectrum object doesn't set a signal-to-noise (SNR) estimator. It is strongly recommended that you set a signal-to-noise estimation method when instantiating the Spectrum object.

If your experimental protocol makes use of mixed-polarity scanning, then please ensure that you limit the scan ranges to the polarity you're interested in analysing:

>>> spec.limit_polarity("negative")

If you are using FIE-MS it is strongly recommended that you use just the infusion profile to generate your mass spectrum. For example, if your scan profiles look like this:

        |        _
      T |       / \
      I |      /   \_
      C |_____/       \_________________
        0     0.5     1     1.5     2 [min]

Then it is fair to assume that the infusion occurred during the scans ranging from 30 seconds to 1 minute. The limit_infusion() method finds this range by estimating the median absolute deviation (MAD) of the total ion counts (TIC) and then limiting the profile to the time range in which the TIC exceeds the given multiple of the MAD:

>>> spec.limit_infusion(2) # 2 times the MAD.
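
To make the idea behind MAD-based infusion profiling concrete, here is a minimal, standalone sketch. It is not DIMEpy's implementation: the helper names mad and infusion_window are hypothetical, and using "median + multiple * MAD" as the cut-off is an assumption made for illustration.

```python
from statistics import median


def mad(values):
    """Median absolute deviation of a sequence of numbers."""
    values = list(values)
    centre = median(values)
    return median(abs(v - centre) for v in values)


def infusion_window(times, tics, multiple=2.0):
    """Return (start, end) times of the scans whose TIC exceeds
    median + multiple * MAD, or None when no scan passes the threshold.

    The threshold rule is an assumption for this sketch.
    """
    threshold = median(tics) + multiple * mad(tics)
    selected = [t for t, tic in zip(times, tics) if tic > threshold]
    if not selected:
        return None
    return selected[0], selected[-1]


# Toy profile: flat baseline with a peak between 0.5 and 1.0 minutes.
times = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
tics = [10, 11, 90, 120, 80, 12, 10, 11, 10]
window = infusion_window(times, tics, multiple=2.0)
print(window)
```

With this toy profile the baseline scans fall below the threshold, so only the scans around the peak are kept.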

Now, we are free to load in the scans to generate a base mass_spectrum:

>>> spec.load_scans()

You should now be able to access the generated mass spectrum using the masses and intensities attributes:

>>> spec.masses
array([ ... ])
>>> spec.intensities
array([ ... ])

Working with multiple files

A more realistic pipeline would be to use multiple mass-spectrum files. This is where things really start to get interesting. The SpectrumList object facilitates this through the use of the append method:

>>> speclist = dimepy.SpectrumList()
>>> speclist.append(spec)

You can use a loop to generate a Spectrum object for each file, or do it manually if you want.
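
As a rough sketch of the looping approach, the helper below collects the mzML files in a directory and derives an identifier from each file name. The function name list_mzml_files and the choice of the file stem as identifier are assumptions for illustration, not part of DIMEpy.

```python
from pathlib import Path


def list_mzml_files(directory):
    """Return (identifier, path) pairs for every .mzML file in directory.

    The identifier is the file name without its extension (an assumption);
    the pairs are sorted by path so the order is reproducible.
    """
    paths = sorted(Path(directory).glob("*.mzML"))
    return [(path.stem, str(path)) for path in paths]


# Lists the mzML files in the current working directory (possibly none).
pairs = list_mzml_files(".")
for identifier, path in pairs:
    print(identifier, path)
```

Each pair could then be passed to dimepy.Spectrum(path, identifier=identifier) and the resulting object appended to the SpectrumList, as in the examples above.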

If you're only using this pipeline to extract mass spectra for Metaboanalyst, then you can now simply call the to_csv method:

>>> speclist.to_csv("/path/to/output.csv", output_type="metaboanalyst")

That being said, this pipeline contains many of the preprocessing methods found in Metaboanalyst - so it may be easier for you to just use ours.

As a diagnostic measure, the TIC can provide an estimate of factors that may adversely affect the overall intensity count of a run. As a rule, it is common to remove spectra in which the TIC deviates from the median by more than 2 to 3 times the median absolute deviation. We can do this by calling the detect_outliers method:

>>> speclist.detect_outliers(thresh = 2, verbose=True)
Detected Outliers: outlier_one;outlier_two 
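
The rule of thumb above can be sketched in a few lines of standalone Python. This is only an illustration of MAD-based outlier flagging, not DIMEpy's own code; detect_tic_outliers and the sample names are hypothetical.

```python
from statistics import median


def detect_tic_outliers(tic_by_sample, thresh=2.0):
    """Return the identifiers whose TIC lies more than thresh * MAD
    away from the median TIC of the whole assay."""
    values = list(tic_by_sample.values())
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    return [
        name
        for name, tic in tic_by_sample.items()
        if abs(tic - centre) > thresh * mad
    ]


# Hypothetical per-sample total ion counts.
tics = {
    "sample_a": 1000,
    "sample_b": 1040,
    "sample_c": 980,
    "sample_d": 1020,
    "outlier_one": 4000,
}
outliers = detect_tic_outliers(tics, thresh=2)
print("Detected Outliers:", ";".join(outliers))
```

Because both the centre and the spread are medians, a single extreme run has little influence on the threshold used to judge the others.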

A common first step in the analysis of mass-spectrometry data is to bin the data to a given mass-to-charge (m/z) width. To do this for every Spectrum held within our SpectrumList object, simply apply the bin method:

>>> speclist.bin(0.25) # binning our data to a bin width of 0.25 m/z
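
For readers unfamiliar with spectral binning, the following standalone sketch shows one way it can work. It assumes that intensities falling in the same bin are summed; other statistics (mean, median) are equally plausible, and this is not necessarily what the bin method does internally. The name bin_spectrum is hypothetical.

```python
import math
from collections import defaultdict


def bin_spectrum(masses, intensities, bin_width=0.25):
    """Sum the intensities that fall in consecutive m/z bins of bin_width.

    Returns two lists: the lower edge of each occupied bin and the
    summed intensity for that bin. Summing is an assumption here.
    """
    totals = defaultdict(float)
    for mass, intensity in zip(masses, intensities):
        index = math.floor(mass / bin_width)
        totals[index] += intensity
    indices = sorted(totals)
    return [i * bin_width for i in indices], [totals[i] for i in indices]


masses = [100.02, 100.10, 100.30, 101.00]
intensities = [5.0, 7.0, 3.0, 2.0]
binned_masses, binned_intensities = bin_spectrum(masses, intensities, 0.25)
print(binned_masses, binned_intensities)
```

The first two peaks share the bin starting at 100.0 m/z, so their intensities are combined.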

In FIE-MS, null values should account for no more than 3% of the total number of identified bins. However, imputation is still required to streamline the analysis (most multivariate techniques are unable to accommodate missing data points). To perform value imputation, just use value_imputate:

>>> speclist.value_imputate()
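
The text above does not say which imputation strategy is applied, so the sketch below uses one common choice as an assumption: replacing each missing value with half the smallest observed value in the same bin. The function name impute_half_minimum is hypothetical.

```python
def impute_half_minimum(matrix):
    """Replace None entries in each column with half of that column's
    smallest observed value.

    matrix is a list of rows (one per sample); columns are bins.
    Columns with no observed value at all are filled with 0.0.
    """
    n_columns = len(matrix[0])
    fill = []
    for col in range(n_columns):
        observed = [row[col] for row in matrix if row[col] is not None]
        fill.append(min(observed) / 2 if observed else 0.0)
    return [
        [fill[col] if value is None else value for col, value in enumerate(row)]
        for row in matrix
    ]


# Three samples, three bins; None marks a missing intensity.
matrix = [
    [10.0, None, 4.0],
    [8.0, 6.0, None],
    [None, 2.0, 4.0],
]
imputed = impute_half_minimum(matrix)
print(imputed)
```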

Transforming and normalising the Spectrum objects in a sample-independent fashion can now be done using the following:

>>> speclist.transform()
>>> speclist.normalise()
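
To illustrate what these two steps do to a single spectrum, here is a standalone sketch of a log10 transformation followed by TIC normalisation, two of the methods named in the feature list. The helper names are hypothetical, and the "+ 1" offset inside the logarithm is an assumption added to keep zero intensities finite.

```python
import math


def log10_transform(intensities):
    """Apply log10(x + 1); the + 1 offset is an assumption of this sketch."""
    return [math.log10(x + 1) for x in intensities]


def tic_normalise(intensities):
    """Divide every intensity by the spectrum's total so the values sum to 1.

    An all-zero spectrum is returned unchanged to avoid dividing by zero.
    """
    total = sum(intensities)
    if total == 0:
        return list(intensities)
    return [x / total for x in intensities]


spectrum = [9.0, 99.0, 999.0]
transformed = log10_transform(spectrum)
normalised = tic_normalise(transformed)
print(transformed, normalised)
```

Both functions operate on one spectrum at a time, which is what makes them sample-independent: no value from another sample enters the calculation.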

Once completed, you are now free to export the data to a data matrix:

>>> speclist.to_csv("/path/to/proc_metabo.csv", output_type="matrix")

This should give you something akin to:

Sample ID   M0     M1    M2     M3     ...
Sample 1    213    634   3213   546    ...
Sample 2    132    34    713    6546   ...
Sample 3    1337   42    69     420    ...
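
The layout above (one row per sample, one column per bin) can be reproduced with the standard csv module. This is only a sketch of the matrix shape, using the example values shown; write_matrix is a hypothetical helper and the exact header names written by to_csv may differ.

```python
import csv
import io


def write_matrix(samples, handle):
    """Write one row per sample: the identifier first, then one column per bin."""
    n_bins = len(next(iter(samples.values())))
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Sample ID"] + [f"M{i}" for i in range(n_bins)])
    for identifier, intensities in samples.items():
        writer.writerow([identifier] + list(intensities))


samples = {
    "Sample 1": [213, 634, 3213, 546],
    "Sample 2": [132, 34, 713, 6546],
}
buffer = io.StringIO()
write_matrix(samples, buffer)
print(buffer.getvalue())
```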

Bug reporting and feature suggestions

Please report all bugs or feature suggestions to the issues tracker. Please do not email me directly as I'm struggling to keep track of what needs to be fixed.

We welcome all sorts of contributions, so please be as candid as you want(!)

Documentation

Documentation for the project can be found on its readthedocs page.

License

DIMEpy is licensed under the GNU General Public License v3.0.
