LCMS Processing tools used by the Metabolomics Platform at the Broad Institute.

BMXP - The Metabolomics Platform at the Broad Institute

pip install bmxp

Please cite: https://www.biorxiv.org/content/10.1101/2023.06.09.544417v1.full

This is a collection of tools for processing our data; together they power our cloud processing workflow. Each tool is meant to be a standalone module that performs one step in our processing pipeline. The tools are written in Python and C and are designed to be performant and cloud-compatible.

  • Eclipse - Align two or more same-method nontargeted LCMS datasets
  • Gravity - Cluster redundant LCMS features based on RT and correlation (and, someday, XIC shape)
  • Blueshift - Drift Correction via pooled technical replicates and internal standards
  • Formation - Formatting and Final QC
  • Chroma - Read .raw and .mzml files

We expect users to be familiar with Python and already have an understanding of LCMS Metabolomics data processing and the specific steps they wish to accomplish.

While the tools are and always will be standalone, we are working on linking them more closely through a shared schema, and may eventually add a pipeline mode that runs all steps from a given set of parameters.

We are open to feedback and suggestions, with a focus on performance and application in pipelines.

Shared Schema

All BMXP modules use a shared schema and file formats with our preferred column headers. The files, along with their schema labels, are:

  • Feature Metadata bmxp.FMDATA - Describes the feature. Index default is Compound_ID
  • Injection Metadata bmxp.IMDATA - Describes the Injection. Index default is injection_id
  • Sample Metadata bmxp.SMDATA - Describes the biospecimen from which the Injection is derived. Index default is broad_id
  • Feature Abundances - Pivot table of Feature x Injection (Compound_ID x injection_id) containing the abundances.

Some modules (Blueshift, Eclipse) require merging Feature Metadata + Feature Abundances.
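
The merge described above can be sketched with the standard library alone. This is a minimal illustration, not the layout Eclipse or Blueshift necessarily expect: the table contents are made up, `read_indexed` is a hypothetical helper, and placing the metadata columns before the abundance columns is an assumption.

```python
import csv
import io

# Hypothetical example tables; all values are invented for illustration.
feature_metadata_csv = """Compound_ID,RT,MZ
F1,1.25,180.0634
F2,3.40,132.1019
"""
feature_abundances_csv = """Compound_ID,inj_001,inj_002
F1,1000,1100
F2,250,240
"""

def read_indexed(text, index="Compound_ID"):
    """Read CSV text into a dict of rows keyed by the index column."""
    rows = csv.DictReader(io.StringIO(text))
    return {row[index]: row for row in rows}

metadata = read_indexed(feature_metadata_csv)
abundances = read_indexed(feature_abundances_csv)

# Join the two tables on the shared Compound_ID index,
# keeping only features that appear in both.
merged = [
    {**metadata[feature_id], **abundances[feature_id]}
    for feature_id in metadata
    if feature_id in abundances
]

for row in merged:
    print(row)
```

Each merged row carries the feature's metadata followed by one abundance column per injection.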

The column headers can be changed globally so that all packages use the same terminology. To update the schema, modify the dictionary objects in the module directly before running any code. For example:

import bmxp
from bmxp.eclipse import MSAligner
from bmxp.blueshift import DriftCorrection
from bmxp.gravity import cluster
bmxp.FMDATA['Compound_ID'] = 'Feature_ID'
bmxp.IMDATA['injection_id'] = 'Filename'

# continue with work...

With the changes above, Eclipse, Blueshift and Gravity will use "Feature_ID" and "Filename" as column headers instead of "Compound_ID" and "injection_id".
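
To see why the override has to happen before any processing code runs, here is a minimal sketch of the lookup pattern. It does not show bmxp's internals: `FMDATA` below is a plain stand-in dictionary and `feature_ids` is a hypothetical helper, both assumed for illustration.

```python
# Stand-in for a schema dictionary such as bmxp.FMDATA.
# Keys are the default labels; values are the labels currently in use.
FMDATA = {"Compound_ID": "Compound_ID", "RT": "RT", "MZ": "MZ"}

def feature_ids(rows):
    """Hypothetical helper: find the index column through the schema lookup."""
    index_label = FMDATA["Compound_ID"]
    return [row[index_label] for row in rows]

# A dataset that uses "Feature_ID" instead of the default header.
rows = [{"Feature_ID": "F1", "RT": 1.25}, {"Feature_ID": "F2", "RT": 3.40}]

# Override the default header first; code that runs afterwards sees the new label.
FMDATA["Compound_ID"] = "Feature_ID"
ids = feature_ids(rows)
print(ids)
```

Because the helper reads the label at call time, changing the dictionary once is enough for every function that goes through the same lookup.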

Feature Metadata - bmxp.FMDATA

Feature Metadata describes the LCMS feature. This is a mixture of fundamental nontargeted feature information, annotation info, and anything else.

Feature Specific

  • Compound_ID - Index, Project-unique feature ID (a bit of a misnomer)
  • RT - Unitless retention time, may or may not be scaled
  • MZ - Unsigned mass-to-charge ratio
  • Intensity - Average feature intensity
  • Method - Human Readable name of LCMS method used
  • __extraction_method - Name of extraction method/software used. Used to denote mixed Targeted/Nontargeted

Annotation

  • Annotation_ID - Method-unique annotation label
  • Adduct - Adduct form of the annotation
  • __annotation_id - Globally unique annotation identifier
  • Metabolite - Preferred display/reporting name of metabolite
  • Non_Quant - Boolean denoting that a feature is not quantifiable

Generated by Gravity

  • Cluster_Num - Cluster number assigned during Gravity clustering
  • Cluster_Size - Number of members in the cluster
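
Gravity generates both columns itself. The sketch below only illustrates how they relate, by recomputing Cluster_Size from Cluster_Num on made-up rows; it is not Gravity's code.

```python
from collections import Counter

# Hypothetical feature rows with cluster numbers already assigned.
features = [
    {"Compound_ID": "F1", "Cluster_Num": 1},
    {"Compound_ID": "F2", "Cluster_Num": 1},
    {"Compound_ID": "F3", "Cluster_Num": 2},
    {"Compound_ID": "F4", "Cluster_Num": 1},
]

# Count how many features share each cluster number.
members_per_cluster = Counter(row["Cluster_Num"] for row in features)

# Cluster_Size is the member count of the cluster a feature belongs to.
for row in features:
    row["Cluster_Size"] = members_per_cluster[row["Cluster_Num"]]

for row in features:
    print(row)
```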

Generated by Blueshift

  • Batches Skipped - Batches that were skipped due to lack of PREFs

Injection Metadata - bmxp.IMDATA

  • injection_id - Index, Injection name, usually filename without the extension
  • broad_id - Assigned biospecimen label
  • program_id - Biospecimen label as received (inherited from Sample Metadata)
  • injection_type - Type of injection ("sample", "prefa", "prefb", "blank", "other-", "not_used-")
  • comments - Comments about the injection
  • column_number - Column number, in multi-column studies
  • injection_order - Injection number, not skipping blanks or non-samples
  • batches - Denotes batches ('batch_start' or 'batch_end')
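
One way to read the injection_order and batches columns together is sketched below. It assumes that 'batch_start' marks the first injection of a batch, that 'batch_end' marks the last, and that injections outside any start/end pair belong to no batch. Blueshift's actual handling may differ; `assign_batches` is a hypothetical helper and the rows are made up.

```python
# Hypothetical injection rows; an empty string means no batch marker.
injections = [
    {"injection_id": "inj_001", "injection_order": 1, "batches": "batch_start"},
    {"injection_id": "inj_002", "injection_order": 2, "batches": ""},
    {"injection_id": "inj_003", "injection_order": 3, "batches": "batch_end"},
    {"injection_id": "inj_004", "injection_order": 4, "batches": ""},
    {"injection_id": "inj_005", "injection_order": 5, "batches": "batch_start"},
    {"injection_id": "inj_006", "injection_order": 6, "batches": "batch_end"},
]

def assign_batches(rows):
    """Map each injection_id to a batch number, or None when outside a batch."""
    batch_of = {}
    batch_number = 0
    inside_batch = False
    for row in sorted(rows, key=lambda r: r["injection_order"]):
        marker = row["batches"]
        if marker == "batch_start":
            batch_number += 1
            inside_batch = True
        batch_of[row["injection_id"]] = batch_number if inside_batch else None
        if marker == "batch_end":
            inside_batch = False
    return batch_of

batch_of = assign_batches(injections)
print(batch_of)
```

Sorting by injection_order first means the result does not depend on the row order of the input file.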

Generated by Blueshift

  • QCRole - Role in drift correction ("QC-drift_correction", "QC-pooled_ref", "QC-not_used", "sample")

Sample Metadata - bmxp.SMDATA

  • broad_id - Assigned biospecimen label
  • Arbitrary Metadata Columns - Any column label except labels in Injection Metadata
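
Since broad_id appears in both tables, Sample Metadata can be attached to injections by that label. The following is a rough sketch on made-up rows, with two hypothetical helpers: one flags sample columns that reuse an injection column label, the other performs the join.

```python
# Hypothetical Sample Metadata, keyed by broad_id.
sample_metadata = {
    "S1": {"tissue": "plasma", "age": 54},
    "S2": {"tissue": "plasma", "age": 61},
}

# Hypothetical Injection Metadata rows; the blank has no biospecimen.
injection_metadata = [
    {"injection_id": "inj_001", "broad_id": "S1", "injection_type": "sample"},
    {"injection_id": "inj_002", "broad_id": "S2", "injection_type": "sample"},
    {"injection_id": "inj_003", "broad_id": "", "injection_type": "blank"},
]

def colliding_columns(samples, injections):
    """Return sample column labels that are already used by injection rows."""
    injection_labels = set()
    for row in injections:
        injection_labels.update(row)
    sample_labels = set()
    for row in samples.values():
        sample_labels.update(row)
    return sorted(sample_labels & injection_labels)

def attach_sample_metadata(injections, samples):
    """Add each biospecimen's columns to the injections derived from it."""
    return [{**row, **samples.get(row["broad_id"], {})} for row in injections]

collisions = colliding_columns(sample_metadata, injection_metadata)
annotated = attach_sample_metadata(injection_metadata, sample_metadata)
print(collisions)
print(annotated[0])
```

Checking for collisions before the join avoids silently overwriting an injection column with a sample column of the same name.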
