
A powerful GC/LC-HRMS data analysis tool


PyHRMS: Tools for working with High Resolution Mass Spectrometry (HRMS) data in Environmental Science

PyHRMS is a Python package for processing high-resolution mass spectrometry data acquired with gas chromatography (GC) or liquid chromatography (LC) coupling. Its primary objective is to give scientists a user-friendly tool to read, process, and visualize LC/GC-HRMS data.

By utilizing PyHRMS, users can more easily analyze complex data sets, resulting in a more efficient and streamlined research process. Whether working with GC or LC coupled HRMS data, PyHRMS is a reliable and effective solution that can help researchers to achieve their scientific goals.

Contributor: Rui Wang

First release date: Nov.15.2021

Update

Mar.23.2023: pyhrms 0.5.5 new features:

  • Added the sat_intensity parameter to the peak_picking function. The saturation intensity is the intensity above which the measured m/z value may no longer be accurate. In such cases, the retention time at which the m/z value is read can be shifted to a point where the intensity is below the saturation intensity, so that the m/z value is measured accurately.
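
The idea behind a saturation threshold can be sketched in plain Python. This is only an illustration of the concept described above, not the code used inside peak_picking: the function name mz_below_saturation, the (rt, intensity, mz) tuple layout, and all numbers are made up for the example.

```python
def mz_below_saturation(scans, sat_intensity):
    """Return (rt, mz) of the scan nearest to the apex that is not saturated.

    scans: list of (rt, intensity, mz) tuples covering one chromatographic peak
    (a hypothetical layout chosen for this sketch).
    """
    apex = max(range(len(scans)), key=lambda i: scans[i][1])
    # Walk outwards from the apex until a scan below the saturation intensity is found.
    for offset in range(len(scans)):
        for i in (apex - offset, apex + offset):
            if 0 <= i < len(scans) and scans[i][1] < sat_intensity:
                rt, _, mz = scans[i]
                return rt, mz
    return None  # every scan in the peak is saturated


# Made-up peak: the two most intense scans are saturated and report a shifted m/z.
peak = [
    (5.00, 2.0e5, 241.0541),
    (5.02, 8.0e5, 241.0543),
    (5.04, 3.0e6, 241.0580),
    (5.06, 1.5e6, 241.0565),
    (5.08, 4.0e5, 241.0542),
]
print(mz_below_saturation(peak, sat_intensity=1.0e6))
```

With these values the apex at rt 5.04 is skipped and the m/z is read from the neighbouring scan at rt 5.02.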

pyhrms can be installed as follows:

pip install pyhrms

To update an existing installation to the latest version, run:

pip install pyhrms -U

pyhrms requires the following major dependencies:

  • numpy>=1.19.2

  • pandas>1.3.3

  • matplotlib>=3.3.2

  • pymzml>=2.4.7

  • scipy>=1.6.2

  • molmass>=2021.6.18

  • tqdm>=4.62.3

  • openpyxl>=3.0.9

  • networkx>=2.6.3

  • scikit-learn>=1.0.2

Features

PyHRMS provides the following functions:

  • Reading raw LC/GC-HRMS data in mzML format;

  • Powerful and accurate peak picking for LC/GC-HRMS data;

  • Alignment of retention time (rt) and mass-to-charge ratio (m/z) within a user-defined error range;

  • Accurate comparison of responses between two or more samples;

  • Conversion of profile data to centroid data;

  • Parallel computing to improve efficiency;

  • Interactive visualization of raw mzML data;

  • Searching against a local database and MassBank;

  • MS quality evaluation for MS data in profile mode;

  • Processing of SWATH data.
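
As an illustration of what aligning rt and m/z within a user-defined error range means, here is a minimal sketch in plain Python. It is not the peak_alignment implementation: the function align_features, its parameters rt_error and mz_error, the use of an absolute m/z tolerance, and the sample values are all assumptions made for the example.

```python
def align_features(samples, rt_error=0.1, mz_error=0.015):
    """Group features from several samples that agree in rt and m/z.

    samples: dict mapping a sample name to a list of (rt, mz) tuples.
    Returns a list of dicts holding the reference rt, mz and the samples containing it.
    """
    aligned = []
    for name, features in samples.items():
        for rt, mz in features:
            for ref in aligned:
                # A feature joins an existing group if both errors are within tolerance.
                if abs(ref["rt"] - rt) <= rt_error and abs(ref["mz"] - mz) <= mz_error:
                    ref["samples"].add(name)
                    break
            else:
                # No group matched, so this feature starts a new one.
                aligned.append({"rt": rt, "mz": mz, "samples": {name}})
    return aligned


samples = {
    "site01": [(15.48, 241.0541), (10.11, 591.3243)],
    "site02": [(15.50, 241.0546), (3.20, 180.0655)],
}
for group in align_features(samples):
    print(group["rt"], group["mz"], sorted(group["samples"]))
```

The first feature found defines the reference rt and m/z of its group; a real alignment routine may instead build the reference from all samples.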

Papers Published Using PyHRMS

  • Jiang, X., Xue, Z., Chen, W., Xu, M., Liu, H., Liang, J., Zhang, L., Sun, Y., Liu, C., Yang, X., 2023. Biotransformation kinetics and pathways of typical synthetic progestins in soil microcosms. Journal of Hazardous Materials 446, 130684. https://doi.org/10.1016/j.jhazmat.2022.130684

  • Liang, J., Wang, R., Liu, H., Xie, D., Tao, X., Zhou, J., Yin, H., Dang, Z., Lu, G., 2022. Unintentional formation of mixed chloro-bromo diphenyl ethers (PBCDEs), dibenzo-p-dioxins and dibenzofurans (PBCDD/Fs) from pyrolysis of polybrominated diphenyl ethers (PBDEs). Chemosphere 308, 136246. https://doi.org/10.1016/j.chemosphere.2022.136246

  • Xia, D., Liu, H., Lu, Y., Liu, Y., Liang, J., Xie, D., Lu, G., Qiu, J., Wang, R., 2023. Utility of a non-target screening method to explore the chlorination of similar sulfonamide antibiotics: Pathways and N-Cl intermediates. Science of The Total Environment 858, 160042. https://doi.org/10.1016/j.scitotenv.2022.160042

  • Yang, X., Wang, R., He, Z., 2023. Abiotic transformation of synthetic progestins in representative soil mineral suspension. Journal of Environmental Science 127, 375-388. https://doi.org/10.1016/j.jes.2022.06.007

Licensing

The package is open source and can be used under the MIT license. Please find the details in the license file.

PyHRMS documentation

I want to start using PyHRMS

from pyhrms import pyhrms as pms

Project structure:

pyhrms/
1. Basic functions
==================
|- multi_process/
   |- first_process
      |- sep_scans
      |- gen_df
      |- peak_picking
          |- peak_finding
          |- evaluate_ms
              |- target_spec
              |- spec_at_rt
              |- interpolate_series
          |- find_locators
          |- cal_bg
          |- isotope_distribution
      |- split_peak_picking
      |- remove_unnamed_columns
      |- identify_isotopes
   |- peak_alignment
      |- gen_ref
   |- second_process
      |- peak_checking_area
      |- peak_checking_area_split
   |- fold_change_filter
      |- concat_alignment
   |- gen_DDA_ms2_df
      |- ms_to_centroid
|- multi_process_database_matching
  |- database_match
      |- ms2_matching
          |- ms2_matching
              |- compare_frag
      |- rt_matching
|- parent_tp_analysis
|- post_filter
|- remove_adducts_all
  |- remove_adducts
|- summarize_results
|- summarized_results_concat
|- summarize_pos_neg_result
|- final_result_filter
|- isotope_matching
  |- formula_to_distribution


2. Swath data processing
=========================
|- swath_process
    |- split_peak_picking_swath
    |- swath_frag_extract
    |- swath_frag_raw
    |- extract

3. Omics functions
==================
|- omics_final_area
|- omics_index_dict
|- omics_filter
|- map_values
|- PCA_analysis
|- omics_cmp_numbers
|- omics_cmp_total_area
|- omics_correcting_area
|- check_istd_quality
|- KMD_cal

4. FT-ICRMS data processing
===========================
|- FT_ICRMS_process
  |- gen_possible_formula
  |- frag_correction
      |- formula_prediction
          |- append_list
  |- formula_sep

5. other functions
==================
|- get_ms2_from_DDA
|- extract_tic
|- ms_bg_removal
|- JsonToExcel
|- suspect_list_matching
|- rename_files
|- Calibration

Table of Contents

  1. Quick start

  • Feature prioritization : multi_process()

  • Database matching : multi_process_database_matching()

  • Result filtering : post_filter()

  • Result summarizing : summarize_results()

  • Combining results of all samples : summarized_results_concat()

  • Combining results of pos & neg : summarize_pos_neg_result()

1. Quick start

1.1 Feature prioritization

This function primarily performs peak picking, peak alignment, and blank comparison in order to prioritize features that are unique to the sample compared to the blank. To ensure that the program distinguishes between the sample set and the control set, include the strings ‘methanol’, ‘blank’, and ‘control’ in your control set files, and exclude these strings from your sample set files.

# Folder holding the data files of both the sample set and the control set
path = '../Users/Desktop/my_HRMS_files'
company = 'Waters'
pms.multi_process(path, company, profile=True, processors=1, p_value=1, ms2_analysis=True,
                  fold_change=1, area_threshold=200, filter_type=3)

The output file will have the suffix ‘_unique_cmps.xlsx’ and will be structured as follows:

new_index      rt     mz        intensity  S/N      area
15.48_241.05   15.5   241.0541  90817      1135.21  53476
10.11_591.32   10.11  591.3243  78236      1738.58  12272
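
The keyword rule from section 1.1 can be pictured with a short sketch. It only illustrates how file names might be split into a control set and a sample set; the helper split_controls, the case-insensitive comparison, and the file names are assumptions and are not taken from the pyhrms source.

```python
from pathlib import Path

# Strings that mark a file as part of the control set (from section 1.1).
CONTROL_KEYWORDS = ("methanol", "blank", "control")


def split_controls(file_names):
    """Split file names into (controls, samples) based on the keywords above."""
    controls, samples = [], []
    for name in file_names:
        # Assumption: only the file name is checked, ignoring case and the folder part.
        stem = Path(name).name.lower()
        if any(keyword in stem for keyword in CONTROL_KEYWORDS):
            controls.append(name)
        else:
            samples.append(name)
    return controls, samples


files = ["site01.mzML", "site02.mzML", "Blank_01.mzML", "methanol_wash.mzML"]
controls, samples = split_controls(files)
print("controls:", controls)
print("samples:", samples)
```

A sample file whose name accidentally contains one of the keywords would end up in the control set, which is why section 1.1 asks you to keep these strings out of sample file names.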

1.2 Database matching

How to create a database using Excel?

  • Here is an example template for an Excel database of compounds:

Inchikey   Precursor  Frag                  Formula     Smile   Mode  RT     Source    Source info
Inchikey1  211.1109   [117.0459, 92.0506]   C13H13N3    smile1  pos   15.36  massbank  MoNA
Inchikey2  165.0425   [135.0293, 135.0301]  C11H14N4O5  smile2  neg   8.54   massbank  MoNA

After setting up your local database, you can use the following function to match compounds and generate output files with the suffix “_rt_ms2_match.xlsx”.

import pandas as pd

path = '../Users/Desktop/my_HRMS_files'
# Load the local database built from the Excel template above
database = pd.read_excel('../Users/Desktop/my_database.xlsx')
pms.multi_process_database_matching(path, database, processors=4, ms1_error=50, ms2_error=0.015,
                                    rt_error=0.1, mode='pos')
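
To make the three tolerances more concrete, here is a minimal sketch of how a feature could be checked against one database entry. It is not the pyhrms matching code. It assumes that ms1_error is a relative error in ppm, that ms2_error is an absolute error on fragment m/z, and that rt_error is in the same unit as the RT column; none of these units is stated in this document. The helper names, the theoretical_mz input, and the feature values are hypothetical.

```python
import ast


def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return abs(observed - theoretical) / theoretical * 1e6


def match_feature(feature, entry, ms1_error=50, ms2_error=0.015, rt_error=0.1):
    """Compare one feature with one database entry and report which criteria pass."""
    ms1_ok = ppm_error(feature["mz"], entry["theoretical_mz"]) <= ms1_error
    rt_ok = abs(feature["rt"] - entry["RT"]) <= rt_error
    # The Frag cell is assumed to be read from Excel as text such as "[117.0459, 92.0506]".
    reference_frags = ast.literal_eval(entry["Frag"])
    matched = [
        ref for ref in reference_frags
        if any(abs(ref - obs) <= ms2_error for obs in feature["frag"])
    ]
    return {"ms1": ms1_ok, "rt": rt_ok, "matched_frags": matched}


# Made-up values; theoretical_mz stands for the expected ion m/z of the entry.
entry = {"theoretical_mz": 212.1182, "Frag": "[117.0459, 92.0506]", "RT": 15.36}
feature = {"mz": 212.1188, "rt": 15.40, "frag": [117.0465, 65.0390]}
print(match_feature(feature, entry))
```

Reporting the three criteria separately makes it possible to distinguish an MS1-only hit from a hit that is also supported by retention time and MS2 fragments.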

1.3 Result filtering

This function lets users filter results based on criteria such as p-value, fold change, intensity, and area. Any feature with a p-value greater than the user-defined threshold (e.g., 0.05) will be removed from the result dataframe. The filtered result will be automatically exported with a filename suffix “_filter.xlsx”.

path = r'../Users/Desktop/my_HRMS_files/excel_files_need_filter'
pms.post_filter(path, fold_change=5, p_value=0.05, i_threshold=500, area_threshold=500, drop=None)
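
The filtering logic can be sketched as follows. Only the p-value rule is described above; treating fold_change, i_threshold, and area_threshold as lower bounds is an assumption, as are the function name filter_features, the column names, and the example rows.

```python
def filter_features(rows, fold_change=5, p_value=0.05, i_threshold=500, area_threshold=500):
    """Keep only the rows that pass every threshold."""
    kept = []
    for row in rows:
        if row["p_value"] > p_value:
            continue  # not significant: removed, as described above
        if row["fold_change"] < fold_change:
            continue  # assumption: fold change is a lower bound
        if row["intensity"] < i_threshold or row["area"] < area_threshold:
            continue  # assumption: intensity and area are lower bounds
        kept.append(row)
    return kept


# Made-up rows using hypothetical column names.
rows = [
    {"new_index": "15.48_241.05", "p_value": 0.01, "fold_change": 12.0, "intensity": 90817, "area": 53476},
    {"new_index": "10.11_591.32", "p_value": 0.20, "fold_change": 8.0, "intensity": 78236, "area": 12272},
    {"new_index": "3.20_180.07", "p_value": 0.01, "fold_change": 2.0, "intensity": 900, "area": 700},
]
for row in filter_features(rows):
    print(row["new_index"])
```

Here the second row is dropped because of its p-value and the third because of its fold change.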

1.4 Single result summarizing

This function collects the identified features, ignores the unidentified ones, and returns a dataframe with the relevant information. It requires three input dataframes: a suspect list from the NORMAN network, an ecotoxicity database from the NORMAN network, and an Excel file of compound categories. For each identified feature, the function extracts the name, smile, CAS number, categories, and toxicity data, and compiles them into a new dataframe. This saves users from sifting through large amounts of data by hand.

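import pandas as pd
# db_category, suspect_list and db_toxicity are the three dataframes described above, loaded beforehand
df = pd.read_excel(r'../Users/Desktop/my_HRMS_files/sample_rt_ms2_match_filter.xlsx')
result_df = pms.summarize_results(df, db_category, suspect_list, db_toxicity)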

How to build a category database?

  • Here is an example template for a category database:

Inchikey                     category
AAEJJSZYNKXKSW-UHFFFAOYSA-N  ['PFAS']
AAIXLNBYXIVUKR-UHFFFAOYSA-N  ['PFAS']
                             ['..','..']
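
The lookup step behind the summary can be illustrated with plain dictionaries. This sketch only shows how identified features might be joined to the category database through the Inchikey column; attach_categories is a hypothetical helper, and the pairing of features with InChIKeys is made up. It assumes the category cell is stored as text such as "['PFAS']".

```python
import ast


def attach_categories(features, category_rows):
    """Return identified features only, each with its list of categories."""
    # Parse the text cell "['PFAS']" into a real Python list.
    lookup = {row["Inchikey"]: ast.literal_eval(row["category"]) for row in category_rows}
    summary = []
    for feature in features:
        key = feature.get("Inchikey")
        if not key:
            continue  # unidentified features are ignored
        summary.append({**feature, "category": lookup.get(key, [])})
    return summary


category_rows = [
    {"Inchikey": "AAEJJSZYNKXKSW-UHFFFAOYSA-N", "category": "['PFAS']"},
    {"Inchikey": "AAIXLNBYXIVUKR-UHFFFAOYSA-N", "category": "['PFAS']"},
]
features = [
    {"new_index": "15.48_241.05", "Inchikey": "AAEJJSZYNKXKSW-UHFFFAOYSA-N"},
    {"new_index": "10.11_591.32", "Inchikey": None},
]
print(attach_categories(features, category_rows))
```

Storing the categories as a list lets one compound belong to several categories at once.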

1.5 Combining results from samples with a specific ESI polarity

The function iterates through all result files of one ESI polarity (positive or negative) and summarizes the results, generating a new Excel file that contains the summarized information.

path = r'../Users/Desktop/my_HRMS_files/summarized_result'
all_name_index = ['site01','site02','site03','site04',...]
mode = 'pos'
result_df = pms.summarized_results_concat(path, all_name_index, mode)

1.6 Combining results of pos & neg

This function combines the positive and negative summarized results into one final result.

# path_pos and path_neg are the folders holding the summarized results of each polarity
all_df_pos = pms.summarized_results_concat(path_pos, all_name_index, 'pos')
all_df_neg = pms.summarized_results_concat(path_neg, all_name_index, 'neg')
result_df = pms.summarize_pos_neg_result(all_df_pos, all_df_neg)
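
One way to picture the merge of both polarities is shown below. This is a sketch under the assumption that compounds are keyed by InChIKey and that a compound found in both modes should appear once; the actual behaviour of summarize_pos_neg_result is not documented here, and combine_pos_neg, the "modes" field, and the records are hypothetical.

```python
def combine_pos_neg(pos_rows, neg_rows, key="Inchikey"):
    """Merge two lists of compound records, noting the modes each compound was found in."""
    combined = {}
    for mode, rows in (("pos", pos_rows), ("neg", neg_rows)):
        for row in rows:
            # The first record seen for a compound is kept; later ones only add their mode.
            record = combined.setdefault(row[key], {**row, "modes": []})
            record["modes"].append(mode)
    return list(combined.values())


# Made-up records with placeholder keys.
pos_rows = [{"Inchikey": "KEY-A", "name": "compound A"}, {"Inchikey": "KEY-B", "name": "compound B"}]
neg_rows = [{"Inchikey": "KEY-B", "name": "compound B"}, {"Inchikey": "KEY-C", "name": "compound C"}]
for record in combine_pos_neg(pos_rows, neg_rows):
    print(record["Inchikey"], record["modes"])
```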

NOTE

Please note that the documentation is currently a work in progress, and there is more content that is being written. I apologize for any inconvenience this may cause, but rest assured that I am continually updating the documentation to provide you with the most comprehensive guide to using PyHRMS.
