Clean mass spectrometry imaging (MSI) datasets and extract geologically meaningful features

SediMine

A data cleaning and datamining workflow for sedimentary MSI data sets.

Prerequisites

Before using the workflow, data in a proprietary mass spectrometry format (e.g., .D from Bruker) must be exported as a plain text file (referred to as da_exported_txt in the following examples). Only the spot coordinates, the centroid mass-to-charge ratios, and the peak intensities are needed from each spectrum.

Python >= 3.5 is needed, and the required libraries are listed in requirements.txt.

The package has been tested on Windows (Windows 10), OSX, and Linux (Arch Linux).

Installation

Run the following command to install the package together with all its dependencies.

pip install git+https://github.com/weimin-liu/msi_feature_extraction.git

Instructions

Mass calibration

The dataset should be calibrated first if it has not been calibrated already. Currently, the package provides a quadratic mass error calibration function.

from mfe.calibration import suggest_calibrates, SimpleFallbackCalibrate
from mfe.from_txt import msi_from_txt

# get a list of the most abundant peaks in the dataset

candidates, _ = suggest_calibrates(da_exported_txt)

# create a dictionary to store the dataset
msi = msi_from_txt(da_exported_txt)

sfc = SimpleFallbackCalibrate()

# Feed the list of calibrates to SimpleFallbackCalibrate, which assigns one
# calibrate to each spectrum. The choice works as follows: the first calibrate
# in the list is tried on all spectra. Spectra in which it is missing are then
# tried with the second calibrate in the list, and so on, until all spectra
# are calibrated or the list of calibrates is exhausted.
sfc.fit(msi, candidates)

# do the actual calibration on the dataset
msi_calibrated = sfc.transform(msi)
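
The fallback rule described in the comment above can be sketched in plain Python. This is not the code of SimpleFallbackCalibrate. The function name `assign_calibrants`, the matching tolerance `tol`, and the representation of a spectrum as a list of m/z values are assumptions made purely for illustration.

```python
def assign_calibrants(spectra, candidates, tol=0.01):
    """Assign each spectrum the first candidate m/z it contains.

    spectra: dict mapping a spot id to a list of centroid m/z values.
    candidates: candidate m/z values, in order of preference.
    tol: hypothetical matching tolerance in m/z units.
    Returns a dict mapping spot id to the chosen candidate, or None if no
    candidate was found in that spectrum.
    """
    assigned = {spot: None for spot in spectra}
    for cand in candidates:
        # Only spectra that are still unassigned fall back to this candidate.
        for spot, mzs in spectra.items():
            if assigned[spot] is None and any(abs(mz - cand) <= tol for mz in mzs):
                assigned[spot] = cand
        if all(v is not None for v in assigned.values()):
            break  # every spectrum has a calibrant, so stop early
    return assigned


spectra = {
    "spot_1": [255.233, 283.264, 551.503],
    "spot_2": [283.265, 551.502],  # first candidate is missing here
    "spot_3": [120.000],           # no candidate is present
}
result = assign_calibrants(spectra, candidates=[255.232, 283.264])
```

In this toy input, spot_1 uses the first candidate, spot_2 falls back to the second, and spot_3 stays unassigned because the candidate list is exhausted.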

Align peaks into discrete mass bins

Currently, the discrete mass bins are evenly spaced at a user-specified interval.

from mfe.from_txt import create_feature_table

feature_table = create_feature_table(msi_calibrated)

This step produces a 2D table whose columns are named after the mass bins (m/z ratios) and whose rows each represent one spot.
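
To make the idea of evenly spaced bins concrete, here is a minimal sketch in plain Python. It is not the implementation of create_feature_table. The function name `bin_peaks`, the default `interval`, and the choice to sum intensities that fall into the same bin are all assumptions.

```python
from collections import defaultdict


def bin_peaks(spectra, interval=0.01):
    """Build a spot-by-bin table from centroid peaks.

    spectra: dict mapping a spot id (e.g. an (x, y) tuple) to a list of
             (mz, intensity) pairs.
    interval: bin width in m/z units (hypothetical default).
    Returns (centers, table), where table[spot] is a list of summed
    intensities, one per bin center.
    """
    binned = {}
    all_bins = set()
    for spot, peaks in spectra.items():
        row = defaultdict(float)
        for mz, intensity in peaks:
            idx = int(round(mz / interval))  # index of the nearest bin center
            row[idx] += intensity            # peaks sharing a bin are summed
        binned[spot] = row
        all_bins.update(row)
    order = sorted(all_bins)
    centers = [round(i * interval, 6) for i in order]
    table = {spot: [row.get(i, 0.0) for i in order] for spot, row in binned.items()}
    return centers, table


spectra = {
    (0, 0): [(255.231, 10.0), (255.233, 5.0), (283.264, 7.0)],
    (0, 1): [(283.262, 3.0)],
}
centers, table = bin_peaks(spectra, interval=0.01)
```

Each entry of `table` corresponds to one row (spot) of the 2D table, and `centers` gives the column names. Bins that are empty for a spot are filled with zero.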

Pick peaks using grey-level co-occurrence matrices

No peaks have been dropped up to this step. Here, grey-level co-occurrence matrices (GLCM) are used to measure how spatially structured each ion image is and to rank the peaks accordingly.

from mfe.peak_picking import get_peak_ranks

t_df, deflated_arr = get_peak_ranks(feature_table)

The result contains the ranked peaks with their corresponding ion images. Manual examination is needed to choose a threshold (th) above which peaks are kept.

from mfe.peak_picking import sel_peak_by_rank

feature_table, ims = sel_peak_by_rank(t_df, deflated_arr, feature_table, th)
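
As a rough illustration of why a GLCM can separate structured ion images from speckled ones, the sketch below builds a GLCM for horizontal neighbours and scores it with homogeneity. The score that get_peak_ranks actually uses is not stated here, so homogeneity, the single neighbour offset, and the function names are assumptions.

```python
def glcm(image, levels):
    """Symmetric, normalized GLCM for horizontal neighbours (offset 0, 1).

    image: 2D list of integer grey levels in range(levels).
    """
    counts = [[0.0] * levels for _ in range(levels)]
    for row in image:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
            counts[b][a] += 1  # count both directions to make it symmetric
    total = sum(map(sum, counts))
    return [[c / total for c in r] for r in counts]


def homogeneity(p):
    """Inverse difference moment: high when neighbours share similar levels."""
    return sum(p[i][j] / (1 + (i - j) ** 2)
               for i in range(len(p)) for j in range(len(p)))


# Two toy 4x4 images, already quantized to 4 grey levels.
structured = [[0, 0, 3, 3],
              [0, 0, 3, 3],
              [0, 0, 3, 3],
              [0, 0, 3, 3]]
speckled = [[0, 3, 0, 3],
            [3, 0, 3, 0],
            [0, 3, 0, 3],
            [3, 0, 3, 0]]

score_structured = homogeneity(glcm(structured, levels=4))
score_speckled = homogeneity(glcm(speckled, levels=4))
```

The image with two coherent regions scores higher than the image whose neighbouring pixels always differ. A ranking built on such a score would place spatially coherent ion images first.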

Feature extraction using non-negative matrix factorization

from mfe.feature import rank_estimate, nmf

# First estimate an appropriate rank for the data. The list of images is used
# here instead of the feature table because the images have already been
# normalized with quantiles removed.
rank_candidates = list(range(2, 20))

rank_estimate(rank_candidates, ims)

# then do the factorization with an appropriate rank `rk`, getting the basis matrix and the coefficients
basis, components = nmf(ims, feature_table, rk)

# to get the co-localization molecular network, n_run > 1 must be set
basis, components, G = nmf(ims, feature_table, rk, n_run=20)
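
For readers unfamiliar with non-negative matrix factorization, the sketch below factorizes a small non-negative table V into two non-negative matrices W and H using the classic multiplicative update rules. It does not reproduce the package's nmf function. The name `nmf_sketch`, the iteration count, and the random initialization are assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf_sketch(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factorize non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Toy table: 4 spots x 3 peaks built from two non-negative patterns.
v = [[1.0, 0.0, 2.0],
     [2.0, 0.0, 4.0],
     [0.0, 3.0, 0.0],
     [1.0, 3.0, 2.0]]
w, h = nmf_sketch(v, rank=2, n_iter=500)
error = frobenius_error(v, w, h)
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout. Comparing `error` across several values of `rank` is one simple way to see how the reconstruction improves as the rank grows.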
