This targeted mass spectrometry quality encoder (tmasque) evaluates peak quality systematically for targeted mass spectrometry.

tmasque: Targeted Mass Spectrometry Quality Encoder

When processing targeted mass spectrometry datasets, the tmasque package helps evaluate the peak quality of chromatograms objectively after automatic peak-picking software has been run. We applied an unsupervised learning approach, the beta-total correlation variational autoencoder (beta-TCVAE), to learn peak quality from our collected TargetedMS data cohort (including both MRM and PRM experiments). The data cohort contains 1,703,827 peak groups from 75 studies, acquired on 22 different mass spectrometer models from four leading manufacturers: AB Sciex, Agilent Technologies, Thermo Fisher Scientific, and Waters Corporation. Such a diverse dataset supports a reliable quality encoder that scores chromatographic peaks objectively.

This Python package includes a command line tool and provides a programming interface to calculate TMSQE scores and generate readable reports.

Installation

The package is published as a Python package. You need Python 3.8 or later installed; you can then use pip to install tmasque.

pip install tmasque

Then, you can use tmasque -h or tmasque --help to see the arguments. You can also use tmasque example to have a quick example (see below).

Required data format

We currently support the TSV (tab-separated values) format file for chromatograms and the CSV (comma-separated values) format file for peak boundaries. These two files can be directly exported via Skyline.

The required nine column headers for chromatograms are listed as follows.

FileName, PeptideModifiedSequence, PrecursorCharge, ProductMz, FragmentIon, ProductCharge, IsotopeLabelType, Times, Intensities

For the peak boundary CSV file, the following five column headers are required.

File Name, Peptide Modified Sequence, Min Start Time, Max End Time, Precursor Charge

You may notice the spaces between words in the peak boundary column headers. This is because the column headers follow the Skyline export formats for chromatograms and peak boundaries.

The PeptideModifiedSequence and Peptide Modified Sequence columns are used for display and to identify analyte targets. For metabolites, you can keep the header as it is and put the metabolite name in this column.
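Before running tmasque, it can be handy to confirm that both exported files carry the required headers. The following is a minimal sketch using only the Python standard library; the helper name missing_headers is hypothetical and is not part of the tmasque package.

```python
import csv

# Required headers, copied from the lists above.
CHROMATOGRAM_HEADERS = [
    "FileName", "PeptideModifiedSequence", "PrecursorCharge", "ProductMz",
    "FragmentIon", "ProductCharge", "IsotopeLabelType", "Times", "Intensities",
]
PEAK_BOUNDARY_HEADERS = [
    "File Name", "Peptide Modified Sequence", "Min Start Time",
    "Max End Time", "Precursor Charge",
]


def missing_headers(path, required, delimiter):
    """Return the required headers that are absent from the file's first row.

    Hypothetical helper for illustration; tmasque may validate differently.
    """
    # utf-8-sig drops a byte-order mark if the export tool wrote one.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        header_row = next(csv.reader(handle, delimiter=delimiter), [])
    present = {name.strip() for name in header_row}
    return [name for name in required if name not in present]
```

For a chromatogram file, call missing_headers(path, CHROMATOGRAM_HEADERS, "\t"); for a peak boundary file, call missing_headers(path, PEAK_BOUNDARY_HEADERS, ","). An empty list means all required headers were found.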

Command line tool

Once you have installed tmasque, you can use it as a command line tool. In your terminal, type tmasque --help to see detailed descriptions of the arguments. The full argument list is also given in a later section.

1. Quick example

Use the following command to run the quick example.

tmasque example

We have prepared a small example in this package. After you run the tmasque example command, a folder is created in your current working directory. This folder contains the input files chromatogram.tsv and peak_boundary.csv, which you can open to check the column headers and formats. The output files in this folder are sample_quality.xlsx and chromatogram_plots.pdf. The former holds the summarized peak quality scores for each sample, and the latter holds the chromatogram plots. The folder also contains a transition_quality folder with one Excel file for each target. Each Excel file lists the transition quality scores together with all the underlying quality features.

2. Basic usage

tmasque run <chromatogram_path> <peak_boundary_path> <output_sample_quality_excel_path> <output_transition_quality_folder_path>

In the above command, the first two arguments, chromatogram_path and peak_boundary_path, are the required input files in TSV and CSV format, and the last two are the output paths. output_sample_quality_excel_path expects an Excel file path and output_transition_quality_folder_path expects a folder path. If the folder does not exist, it is created during tmasque execution. In this folder, the detailed quality feature values of each target are exported as an Excel file.

3. Quality scores with additional chromatogram plots

We offer an option to plot chromatograms for visual examination and easy identification of problematic cases. Use the following command arguments to perform quality evaluation and generate chromatogram plots.

tmasque run <chromatogram_path> <peak_boundary_path> <output_sample_quality_excel_path> <output_transition_quality_folder_path> --output_chromatogram_pdf <output_chromatogram_pdf_path>

By adding the optional argument --output_chromatogram_pdf with the value of the expected output PDF path, a PDF file will be generated to display all chromatogram plots with color indications of good, acceptable, and poor peak quality.
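When tmasque is driven from a script, the same command can be assembled as an argument list. The sketch below only builds the list from the positional arguments and the --output_chromatogram_pdf option described above; the function name build_run_command is hypothetical and is not part of tmasque.

```python
def build_run_command(chromatogram_path, peak_boundary_path,
                      sample_quality_xlsx, transition_folder,
                      chromatogram_pdf=None):
    """Assemble the `tmasque run` argument list (hypothetical helper)."""
    command = [
        "tmasque", "run",
        chromatogram_path, peak_boundary_path,
        sample_quality_xlsx, transition_folder,
    ]
    # The PDF option is only appended when a path is given.
    if chromatogram_pdf is not None:
        command += ["--output_chromatogram_pdf", chromatogram_pdf]
    return command
```

The returned list can be passed to subprocess.run(command, check=True) from the standard library, which avoids shell quoting problems with paths that contain spaces.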

4. Sample grouping for batches or response curve experiments

We can group several samples and summarize their peak quality scores by the median to examine batch quality, or peak quality at different concentrations in response curve experiments. To group samples, use the file_group_delimiter and file_group_suffix_type arguments to specify the grouping rules. For example, if the sample names are as listed below, we can use file_group_delimiter=- and file_group_suffix_type='d' to extract the suffix of the file name as the group name.

Sample name | Group
20210727_DSS Response curve_121plex_150um_1-001.wiff | 1
20210727_DSS Response curve_121plex_150um_2-001.wiff | 1
20210727_DSS Response curve_121plex_150um_1-002.wiff | 2
20210727_DSS Response curve_121plex_150um_2-002.wiff | 2

Once file_group_delimiter and file_group_suffix_type are set, group quality scores are further summarized from the sample quality scores. An additional sheet, Group Quality, is created in the output_sample_quality_excel_path file.
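To make the grouping rule concrete, here is a rough sketch of how a group name could be derived from a file name and how per-sample scores could be summarized by the median. It is an illustration under assumptions, not the package's actual implementation: the function names are hypothetical, and it assumes the group suffix is the text between the last delimiter and the file extension.

```python
import os
from statistics import median


def group_from_filename(file_name, delimiter="-", suffix_type="d"):
    """Return the group name parsed from the file name suffix, or None.

    Hypothetical helper; assumes the suffix follows the last delimiter.
    """
    stem, _extension = os.path.splitext(file_name)
    if delimiter not in stem:
        return None
    suffix = stem.rsplit(delimiter, 1)[1]
    if suffix_type == "d":
        # Digit suffixes such as "001" are parsed as numbers (1).
        return int(suffix) if suffix.isdigit() else None
    return suffix


def median_by_group(scores_by_file, delimiter="-", suffix_type="d"):
    """Summarize per-sample scores into one median score per group."""
    grouped = {}
    for file_name, score in scores_by_file.items():
        group = group_from_filename(file_name, delimiter, suffix_type)
        grouped.setdefault(group, []).append(score)
    return {group: median(values) for group, values in grouped.items()}
```

With this sketch, the file name 20210727_DSS Response curve_121plex_150um_1-001.wiff from the list above maps to group 1, matching the Group column.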

Full command argument list for tmasque run

Argument | Description | Value type | Default value
INPUT
chromatogram_tsv | The chromatogram TSV file path | File path | no default
peak_boundary_csv | The peak boundary CSV file path | File path | no default
OUTPUT
output_quality_xlsx | The output peak quality Excel file path | File path | no default
output_transition_folder | The output transition quality folder path | Folder path | no default
OPTIONAL
--help, -h | Show the detailed argument list | (no value) | unset
--version, -v | Display the package version | (no value) | unset
--thread_num, -t | The number of parallel threads used to calculate quality feature values for all peak groups | integer | 4
--internal_standard_type, -s | Set the internal standards to heavy or light ions | {heavy, light} | heavy
GROUPING
--file_group_delimiter, -gs | The delimiter in the file name used to define file groups | {None, -, _} | None
--file_group_suffix_type, -gt | The suffix character type used to define file groups (d for digits; w for words) | {d, w} | None
SCORING
--type1_acceptable_threshold, -t1a | The threshold of acceptable quality for the Type 1 quality score. Scores below this threshold are considered poor quality. | float | 6.7
--type1_good_threshold, -t1b | The threshold of good quality for the Type 1 quality score. Scores greater than or equal to this threshold are considered good quality. | float | 7.7
--type2_acceptable_threshold, -t2a | The threshold of acceptable quality for the Type 2 quality score. Scores below this threshold are considered poor quality. | float | 5.0
--type2_good_threshold, -t2b | The threshold of good quality for the Type 2 quality score. Scores greater than or equal to this threshold are considered good quality. | float | 7.2
--type3_acceptable_threshold, -t3a | The threshold of acceptable quality for the Type 3 quality score. Scores below this threshold are considered poor quality. | float | 7.901
--type3_good_threshold, -t3b | The threshold of good quality for the Type 3 quality score. Scores greater than or equal to this threshold are considered good quality. | float | 8.665
CHROMATOGRAM PLOTS
--output_chromatogram_pdf, -plot | The output PDF path for chromatogram plots. If set, all chromatograms are plotted in a PDF file. | File path | None
--output_mixed_mol, -mix | If set, chromatogram data for each target are mixed on one PDF page | (no value) | unset
--output_chromatogram_nrow, -nrow | The number of chromatograms per row on one PDF page | integer | 6
--output_chromatogram_ncol, -ncol | The number of chromatograms per column on one PDF page | integer | 6
--output_chromatogram_fig_w, -figw | The figure width in inches | integer | 30
--output_chromatogram_fig_h, -figh | The figure height in inches | integer | 42
--output_chromatogram_dpi, -dpi | The DPI of chromatogram plots | integer | 200
--output_chromatogram_threads, -pthread | The number of threads used to draw chromatogram plots. Using more threads consumes more memory. | integer | 1
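The scoring thresholds can be read as a three-way classification. The sketch below shows one plausible reading, assuming that higher scores mean better quality (as the good-threshold descriptions and the default values suggest): scores at or above the good threshold are good, scores at or above the acceptable threshold are acceptable, and anything lower is poor. The function classify_score is hypothetical and may not match how tmasque applies the thresholds internally.

```python
# Default Type 1 thresholds, taken from the argument list above.
TYPE1_ACCEPTABLE_THRESHOLD = 6.7
TYPE1_GOOD_THRESHOLD = 7.7


def classify_score(score, acceptable_threshold, good_threshold):
    """Map a quality score to 'good', 'acceptable', or 'poor'.

    Assumption: higher scores indicate better peak quality.
    """
    if score >= good_threshold:
        return "good"
    if score >= acceptable_threshold:
        return "acceptable"
    return "poor"
```

Under this reading, a Type 1 score of 7.0 with the default thresholds would be labelled acceptable.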

Use tmasque as a module

The tmasque package can also be used as a module to integrate quality scoring functions in Python scripts.

from tmasque import TargetedMSQualityEncoder as TMSQE
chromatogram_tsv = '/path/to/chromatogram.tsv'
peak_boundary_csv = '/path/to/peak_boundary.csv'

tmsqe = TMSQE(chromatogram_tsv, peak_boundary_csv)
tmsqe.run_encoder()

# Get all quality features for all transitions in transition_df and summarized sample quality in sample_quality_df
transition_df, sample_quality_df = tmsqe.summarize_dataset()


# Output quality results in Excel files
output_transition_folder = '/path/to/a/folder'
output_summarized_sample_quality_xlsx = '/path/to/output_sample_quality.xlsx'
tmsqe.output_transition_quality(output_transition_folder)  # Output quality scores for each transition with all the underlying quality features. One excel per target.
tmsqe.output_sample_quality(output_summarized_sample_quality_xlsx) # Output summarized quality scores for each sample.

# Generate all chromatogram plots in a PDF file.
output_chromatogram_pdf = '/path/to/output_chromatogram_plots.pdf'
tmsqe.plot_chromatograms(output_chromatogram_pdf)

For sample grouping

# for sample names like xxxx-01.raw or xxxx-02.raw, the suffix 01 and 02 will be parsed as numbers (1 and 2) and group quality will be summarized in a different sheet in output_summarized_sample_quality_xlsx.
transition_df, sample_quality_df = tmsqe.summarize_dataset(file_group_delimiter='-', file_group_suffix_type='d')
tmsqe.output_transition(output_transition_folder, file_group_delimiter='-', file_group_suffix_type='d')
tmsqe.output_quality(output_summarized_sample_quality_xlsx, file_group_delimiter='-', file_group_suffix_type='d')
