A powerful GC/LC-HRMS data analysis tool
PyHRMS: Tools for working with High Resolution Mass Spectrometry (HRMS) data in Environmental Science
PyHRMS is a Python package for processing high-resolution mass spectrometry data acquired on instruments coupled with gas chromatography (GC) or liquid chromatography (LC). Its primary objective is to give scientists a user-friendly tool to read, process, and visualize LC/GC-HRMS data.
With PyHRMS, users can analyze complex GC- or LC-HRMS data sets more easily, which makes for a more efficient and streamlined research process.
Contributor: Rui Wang
First release date: Nov.15.2021
Update
Aug.9.2023: pyhrms 0.6.6 new features:
Added get_chinese_name function
pyhrms can be installed as follows:
pip install pyhrms
If you already have pyhrms installed and just want to update to a new version, run:
pip install pyhrms -U
pyhrms requires the following major dependencies:
numpy>=1.19.2
pandas>1.3.3
matplotlib>=3.3.2
pymzml>=2.4.7
scipy>=1.6.2
molmass>=2021.6.18
tqdm>=4.62.3
openpyxl>=3.0.9
networkx>=2.6.3
scikit-learn>=1.0.2
pyopenms>=3.0.0
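To see which of these dependencies are already present in an environment, a small check along the following lines may help. This is only a sketch using the standard library; the helper name `installed_versions` is made up for illustration and is not part of PyHRMS, and the script reports versions without comparing them against the minimums listed above.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names copied from the dependency list above.
REQUIRED = [
    "numpy", "pandas", "matplotlib", "pymzml", "scipy", "molmass",
    "tqdm", "openpyxl", "networkx", "scikit-learn", "pyopenms",
]


def installed_versions(packages):
    """Return {package: installed version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver or 'not installed'}")
```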
Features
PyHRMS provides the following functions:
Read raw LC/GC-HRMS data in mzML format;
Powerful and accurate peak picking for LC/GC-HRMS data;
Alignment of retention time (rt) and mass-to-charge ratio (m/z, where z is the charge number of the ion) within a user-defined error range;
Accurate comparison of responses between two or more samples;
Convert profile data to centroid;
Parallel computing to improve efficiency;
Interactive visualizations of raw mzML data;
Searching against a local database and MassBank;
MS quality evaluation for profile-mode MS data;
Processing of SWATH data.
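The rt and m/z alignment listed above can be illustrated with a short sketch. This is not the PyHRMS implementation: the function names, the (rt, mz) tuple layout, and the tolerance units (minutes for rt, ppm for m/z) are assumptions chosen for the example.

```python
def ppm_diff(mz_a, mz_b):
    """Relative m/z difference in parts per million, referenced to mz_a."""
    return abs(mz_a - mz_b) / mz_a * 1e6


def align_features(reference, sample, rt_tol=0.1, mz_tol_ppm=10.0):
    """Pair each reference feature with the sample feature closest in m/z
    that lies within both tolerances (hypothetical helper, not a PyHRMS API).

    Features are (rt, mz) tuples. Returns a list of (ref_index, sample_index).
    """
    pairs = []
    for i, (rt_r, mz_r) in enumerate(reference):
        best = None
        for j, (rt_s, mz_s) in enumerate(sample):
            if abs(rt_r - rt_s) > rt_tol:
                continue  # outside the retention time window
            d = ppm_diff(mz_r, mz_s)
            if d <= mz_tol_ppm and (best is None or d < best[1]):
                best = (j, d)
        if best is not None:
            pairs.append((i, best[0]))
    return pairs


reference = [(15.50, 241.0541), (10.11, 591.3243)]
sample = [(10.13, 591.3251), (15.47, 241.0544), (3.20, 241.0541)]
print(align_features(reference, sample))  # [(0, 1), (1, 0)]
```

The third sample feature has the same m/z as the first reference feature but elutes far outside the rt window, so it is not paired. This simple version allows one sample feature to be paired with several reference features; a production aligner would usually resolve such conflicts.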
Papers Published Using PyHRMS
Wang, R., Yan, Y., Liu, H., Li, Y., Jin, M., Li, Y., Tao, R., Chen, Q., Wang, X., Zhao, B., Xie, D., 2023. Integrating data dependent and data independent non-target screening methods for monitoring emerging contaminants in the Pearl River of Guangdong Province, China. Science of the Total Environment 891 (2023) 164445. http://dx.doi.org/10.1016/j.scitotenv.2023.164445
Jiang, X., Xue, Z., Chen, W., Xu, M., Liu, H., Liang, J., Zhang, L., Sun, Y., Liu, C., Yang, X., 2023. Biotransformation kinetics and pathways of typical synthetic progestins in soil microcosms. Journal of Hazardous Materials 446, 130684. https://doi.org/10.1016/j.jhazmat.2022.130684
Liang, J., Wang, R., Liu, H., Xie, D., Tao, X., Zhou, J., Yin, H., Dang, Z., Lu, G., 2022. Unintentional formation of mixed chloro-bromo diphenyl ethers (PBCDEs), dibenzo-p-dioxins and dibenzofurans (PBCDD/Fs) from pyrolysis of polybrominated diphenyl ethers (PBDEs). Chemosphere 308, 136246. https://doi.org/10.1016/j.chemosphere.2022.136246
Xia, D., Liu, H., Lu, Y., Liu, Y., Liang, J., Xie, D., Lu, G., Qiu, J., Wang, R., 2023. Utility of a non-target screening method to explore the chlorination of similar sulfonamide antibiotics: Pathways and N-Cl intermediates. Science of The Total Environment 858, 160042. https://doi.org/10.1016/j.scitotenv.2022.160042
Yang, X., Wang, R., He, Z., 2023. Abiotic transformation of synthetic progestins in representative soil mineral suspension. Journal of Environmental Science 127, 375-388. https://doi.org/10.1016/j.jes.2022.06.007
Licensing
The package is open source and can be used under the MIT license. Details are in the license file.
PyHRMS documentation
Getting started with PyHRMS
from pyhrms import pyhrms as pms
Project structure:
pyhrms/
1. Basic functions
==================
|- multi_process/
|- first_process
|- sep_scans
|- gen_df
|- peak_picking
|- peak_finding
|- evaluate_ms
|- target_spec
|- spec_at_rt
|- interpolate_series
|- find_locators
|- cal_bg
|- isotope_distribution
|- split_peak_picking
|- remove_unnamed_columns
|- identify_isotopes
|- peak_alignment
|- gen_ref
|- second_process
|- peak_checking_area
|- peak_checking_area_split
|- fold_change_filter
|- concat_alignment
|- gen_DDA_ms2_df
|- ms_to_centroid
|- multi_process_database_matching
|- database_match
|- ms2_matching
|- ms2_matching
|- compare_frag
|- rt_matching
|- parent_tp_analysis
|- post_filter
|- remove_adducts_all
|- remove_adducts
|- summarize_results
|- summarized_results_concat
|- summarize_pos_neg_result
|- final_result_filter
|- isotope_matching
|- formula_to_distribution
2. Swath data processing
=========================
|- swath_process
|- split_peak_picking_swath
|- swath_frag_extract
|- swath_frag_raw
|- extract
3. Omics functions
==================
|- omics_final_area
|- omics_index_dict
|- omics_filter
|- map_values
|- PCA_analysis
|- omics_cmp_numbers
|- omics_cmp_total_area
|- omics_correcting_area
|- check_istd_quality
|- KMD_cal
4. FT-ICRMS data processing
===========================
|- FT_ICRMS_process
|- gen_possible_formula
|- frag_correction
|- formula_prediction
|- append_list
|- formula_sep
5. Ion mobility mass data processing
==================
|- first_step_for_IMS
|- peak_picking_ion_mobility_DIA1
|- split_peak_picking2
6. Other functions
==================
|- get_ms2_from_DDA
|- extract_tic
|- ms_bg_removal
|- JsonToExcel
|- suspect_list_matching
|- rename_files
|- Calibration
|- get_frag_DIA
|- get_chinese_name
|- AIF_multi_ce
|- pubchem_search
|- draw_pie_chart
Table of Contents
Quick start
Feature prioritization : multi_process()
Database matching : multi_process_database_matching()
Result filtering : post_filter()
Result summarizing : summarize_results()
Combining results of all samples : summarized_results_concat()
Combining results of pos & neg : summarize_pos_neg_result()
1. Quick start
1.1 Feature prioritization:
This function covers peak picking, peak alignment, and blank comparison, and prioritizes features that are unique to the samples compared to the blanks. So that the program can distinguish the control set from the sample set, include an identifying string such as ‘methanol’, ‘blank’, or ‘control’ in the names of your control set files, and keep these strings out of your sample set file names.
path = '../Users/Desktop/my_HRMS_files'
company = 'Waters'
pms.multi_process(path, company, profile=True, control_group=['lab_blank', 'methanol'], processors=1,
                  ms2_analysis=True, area_threshold=200, filter_type=2)
The output file will have the suffix ‘_unique_cmps.xlsx’ and will be structured as follows:
new_index | rt | mz | intensity | S/N | area | … |
---|---|---|---|---|---|---|
15.48_241.05 | 15.5 | 241.0541 | 90817 | 1135.21 | 53476 | … |
10.11_591.32 | 10.11 | 591.3243 | 78236 | 1738.58 | 12272 | … |
… | … | … | … | … | … | … |
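The file-naming convention described at the start of this section can be pictured with the sketch below. It assumes that control files are recognized by a substring match between the file name and the strings passed as `control_group`; the helper `split_by_control` and the case-insensitive comparison are illustrative assumptions, not the documented behavior of `multi_process`.

```python
def split_by_control(filenames, control_group):
    """Split file names into (controls, samples) by substring match.

    Hypothetical helper: a file counts as a control if its name contains
    any of the control_group strings, ignoring case.
    """
    keys = [key.lower() for key in control_group]
    controls, samples = [], []
    for name in filenames:
        if any(key in name.lower() for key in keys):
            controls.append(name)
        else:
            samples.append(name)
    return controls, samples


files = ["lab_blank_01.mzML", "methanol_01.mzML", "site01.mzML", "site02.mzML"]
controls, samples = split_by_control(files, ["lab_blank", "methanol"])
print(controls)  # ['lab_blank_01.mzML', 'methanol_01.mzML']
print(samples)   # ['site01.mzML', 'site02.mzML']
```

Running a check like this on a folder listing before processing is a cheap way to confirm that no sample file accidentally carries a control keyword in its name.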
1.2 Database matching
How to create a database using Excel?
Here is an example template for an Excel database of compounds:
Inchikey | Precursor | Frag | Formula | Smile | Mode | RT | Source | Source info |
---|---|---|---|---|---|---|---|---|
Inchikey1 | 211.1109 | [117.0459, 92.0506] | C13H13N3 | smile1 | pos | 15.36 | massbank | MoNA |
Inchikey2 | 165.0425 | [135.0293, 135.0301] | C11H14N4O5 | smile2 | neg | 8.54 | massbank | MoNA |
… | … | … | … | … | … | … | … | … |
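If you prefer to build the template programmatically rather than by hand, something like the following may work. It is a sketch: the column names and placeholder rows are taken from the template above, while storing `Frag` as text and parsing it with `ast.literal_eval` is an assumption about how list-valued cells round-trip through Excel, not a statement about how PyHRMS reads them.

```python
import ast

import pandas as pd

# Column names follow the template above; the rows are its placeholder values.
database = pd.DataFrame(
    {
        "Inchikey": ["Inchikey1", "Inchikey2"],
        "Precursor": [211.1109, 165.0425],
        "Frag": ["[117.0459, 92.0506]", "[135.0293, 135.0301]"],
        "Formula": ["C13H13N3", "C11H14N4O5"],
        "Smile": ["smile1", "smile2"],
        "Mode": ["pos", "neg"],
        "RT": [15.36, 8.54],
        "Source": ["massbank", "massbank"],
        "Source info": ["MoNA", "MoNA"],
    }
)

# Excel cells hold the fragment list as text, so turn it back into a list of floats.
fragments = database["Frag"].apply(ast.literal_eval)
print(fragments[0])  # [117.0459, 92.0506]

# Writing the file needs openpyxl, which is already a PyHRMS dependency:
# database.to_excel("my_database.xlsx", index=False)
```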
After setting up your local database, you can use the following function to match compounds and generate output files with the suffix “_rt_ms2_match.xlsx”.
import pandas as pd

path = '../Users/Desktop/my_HRMS_files'
database = pd.read_excel(r'../Users/Desktop/my_database.xlsx')
pms.multi_process_database_matching(path, database, processors=4, ms1_error=50, ms2_error=0.015, rt_error=0.1,
                                    mode='pos')
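The three tolerance arguments are easier to reason about with a concrete check. The sketch below assumes that `ms1_error` is a ppm tolerance, `ms2_error` is an absolute tolerance in Da, and `rt_error` is in minutes; these units are not stated in this documentation and are guesses based on the magnitudes in the example call. The helper names are hypothetical, and adduct handling is ignored entirely.

```python
def within_ppm(observed_mz, reference_mz, tol_ppm):
    """True if the two m/z values differ by no more than tol_ppm parts per million."""
    return abs(observed_mz - reference_mz) / reference_mz * 1e6 <= tol_ppm


def matched_fragments(observed, reference, tol_da):
    """Reference fragments with at least one observed fragment within tol_da."""
    return [ref for ref in reference
            if any(abs(obs - ref) <= tol_da for obs in observed)]


def evaluate_match(feature, entry, ms1_error=50, ms2_error=0.015, rt_error=0.1):
    """Compare one detected feature with one database row (illustrative only)."""
    return {
        "ms1": within_ppm(feature["mz"], entry["Precursor"], ms1_error),
        "rt": abs(feature["rt"] - entry["RT"]) <= rt_error,
        "frags": matched_fragments(feature["frag"], entry["Frag"], ms2_error),
    }


# The database entry reuses the first template row; the feature values are invented.
entry = {"Precursor": 211.1109, "RT": 15.36, "Frag": [117.0459, 92.0506]}
feature = {"mz": 211.1105, "rt": 15.40, "frag": [117.0462, 92.0500, 65.0390]}
print(evaluate_match(feature, entry))
# {'ms1': True, 'rt': True, 'frags': [117.0459, 92.0506]}
```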
1.3 Result filtering
This function lets users filter results based on criteria such as p-value, fold change, intensity, and area. Any feature with a p-value greater than the user-defined threshold (e.g., 0.05) will be removed from the result dataframe. The filtered result will be automatically exported with a filename suffix “_filter.xlsx”.
path = r'../Users/Desktop/my_HRMS_files/excel_files_need_filter'
pms.post_filter(path, fold_change=5, p_value=0.05, i_threshold=500, area_threshold=500, drop=None)
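The filtering logic can be pictured as a set of row-wise conditions on a dataframe. In the sketch below the column names (`fold_change`, `p_value`, `intensity`, `area`) and the helper `filter_features` are assumptions made for illustration; only the p-value rule (drop rows above the threshold) is stated in the text above, and the direction of the other three thresholds is a guess.

```python
import pandas as pd


def filter_features(df, fold_change=5, p_value=0.05, i_threshold=500, area_threshold=500):
    """Keep rows that pass all four criteria (hypothetical column names)."""
    keep = (
        (df["fold_change"] >= fold_change)
        & (df["p_value"] <= p_value)
        & (df["intensity"] >= i_threshold)
        & (df["area"] >= area_threshold)
    )
    return df[keep].reset_index(drop=True)


# Illustrative values only; fold changes and p-values are invented.
features = pd.DataFrame(
    {
        "new_index": ["15.48_241.05", "10.11_591.32", "3.20_180.07"],
        "fold_change": [12.0, 3.0, 8.0],
        "p_value": [0.01, 0.02, 0.20],
        "intensity": [90817, 78236, 1500],
        "area": [53476, 12272, 900],
    }
)
print(filter_features(features)["new_index"].tolist())  # ['15.48_241.05']
```

The second row fails the fold-change criterion and the third fails the p-value criterion, so only the first feature survives.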
1.4 Single result summarizing
This function collects identified features and ignores unidentified ones, returning a dataframe with the relevant information. It requires three input dataframes: a suspect list from the Norman network, an ecotoxicity database from the Norman network, and a compound category Excel table. For each identified feature, it extracts the name, SMILES, CAS number, categories, and toxicity data and compiles them into a new dataframe that contains only the identified features and their associated data, so users do not have to sift through the full results manually.
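# db_category, suspect_list, and db_toxicity are the three input dataframes described above, loaded beforehand.
df = pd.read_excel(r'../Users/Desktop/my_HRMS_files/sample_rt_ms2_match_filter.xlsx')
result_df = pms.summarize_results(df, db_category, suspect_list, db_toxicity)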
How to build a category database?
Here is an example template for a category database:
Inchikey | category |
---|---|
AAEJJSZYNKXKSW-UHFFFAOYSA-N | ['PFAS'] |
AAIXLNBYXIVUKR-UHFFFAOYSA-N | ['PFAS'] |
… | ['..','..'] |
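As a rough illustration of how such a category table could be used, the sketch below builds the two example rows and turns them into an InChIKey-to-categories lookup. Treating the `category` cells as text that is parsed with `ast.literal_eval` is an assumption about the Excel round trip, and the `lookup` dictionary is purely illustrative, not something PyHRMS exposes.

```python
import ast

import pandas as pd

# The two example rows from the template above.
db_category = pd.DataFrame(
    {
        "Inchikey": ["AAEJJSZYNKXKSW-UHFFFAOYSA-N", "AAIXLNBYXIVUKR-UHFFFAOYSA-N"],
        "category": ["['PFAS']", "['PFAS']"],
    }
)

# Parse the text cells into real Python lists, then index them by InChIKey.
db_category["category"] = db_category["category"].apply(ast.literal_eval)
lookup = dict(zip(db_category["Inchikey"], db_category["category"]))

print(lookup.get("AAEJJSZYNKXKSW-UHFFFAOYSA-N"))  # ['PFAS']
print(lookup.get("UNKNOWN-KEY", []))              # []
```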
1.5 Combining results from samples with a specific ESI polarity
The function iterates through all result files with specific ESI polarity (positive or negative) and summarizes the results, generating a new Excel file that contains the summarized information.
path = r'../Users/Desktop/my_HRMS_files/summarized_result'
all_name_index = ['site01','site02','site03','site04',...]
mode = 'pos'
result_df = pms.summarized_results_concat(path, all_name_index, mode)
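Conceptually, this step stacks per-site result tables for one polarity into a single table. The sketch below shows that idea with in-memory dataframes; it does not reproduce what `summarized_results_concat` computes, and the helper name, the `site` and `mode` columns, and the example values are all assumptions.

```python
import pandas as pd


def concat_site_results(site_frames, mode):
    """Stack per-site result tables, tagging each row with its site and polarity."""
    tagged = [
        frame.assign(site=site, mode=mode) for site, frame in site_frames.items()
    ]
    return pd.concat(tagged, ignore_index=True)


# Invented per-site results keyed by the names used in all_name_index.
site_frames = {
    "site01": pd.DataFrame({"Inchikey": ["Inchikey1"], "area": [53476]}),
    "site02": pd.DataFrame({"Inchikey": ["Inchikey1", "Inchikey2"], "area": [41200, 12272]}),
}
combined = concat_site_results(site_frames, "pos")
print(combined)
```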
1.6 Combining results of pos & neg
This function combines the positive and negative summarized results into one final result.
all_df_pos = pms.summarized_results_concat(path_pos, all_name_index, 'pos')
all_df_neg = pms.summarized_results_concat(path_neg, all_name_index, 'neg')
result_df = pms.summarize_pos_neg_result(all_df_pos, all_df_neg)
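One way to think about merging the two polarities is to stack both tables and collapse compounds that appear in both. The sketch below does this by InChIKey; the merge key, the `mode` column, and the helper `merge_polarities` are assumptions for illustration and may differ from what `summarize_pos_neg_result` actually does.

```python
import pandas as pd


def merge_polarities(df_pos, df_neg, key="Inchikey"):
    """Stack pos and neg tables and list, per compound, the modes it was seen in."""
    both = pd.concat(
        [df_pos.assign(mode="pos"), df_neg.assign(mode="neg")], ignore_index=True
    )
    # Collapse compounds detected in both polarities into a single row.
    return (
        both.groupby(key, sort=False)["mode"]
        .agg(lambda modes: "/".join(sorted(set(modes), reverse=True)))
        .reset_index()
    )


# Placeholder identifiers, not real InChIKeys.
df_pos = pd.DataFrame({"Inchikey": ["KEY-A", "KEY-B"]})
df_neg = pd.DataFrame({"Inchikey": ["KEY-B", "KEY-C"]})
merged = merge_polarities(df_pos, df_neg)
print(merged["mode"].tolist())  # ['pos', 'pos/neg', 'neg']
```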
NOTE
Please note that the documentation is currently a work in progress, and there is more content that is being written. I apologize for any inconvenience this may cause, but rest assured that I am continually updating the documentation to provide you with the most comprehensive guide to using PyHRMS.