Automatic analysis of GC-MS data

Project description

PyGCMS

A Python tool to manage multiple GCMS qualitative tables and automatically split chemicals into functional groups.

An open-source Python tool that can automatically:

  • handle multiple GCMS semi-quantitative data tables (derivatized or not)
  • build a database of all identified compounds and their relevant properties using PubChemPy
  • split each compound into its functional groups using a published fragmentation algorithm
  • apply calibrations and/or semi-calibration using Tanimoto and molecular weight similarities
  • produce single-sample reports, comprehensive multi-sample reports, and aggregated reports based on functional group mass fractions in the samples
  • provide plotting capabilities
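
The Tanimoto similarity used for semi-calibration can be illustrated on fingerprint bit sets. This is a minimal pure-Python sketch with made-up fingerprints; the actual tool derives compound structures and similarities from its compounds database:

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprint bit sets:
    |A & B| / |A | B|, from 0 (no bits shared) to 1 (identical)."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# toy fingerprints: bit positions set for two structurally similar compounds
phenol = {1, 4, 7, 9}
cresol = {1, 4, 7, 12}
print(tanimoto(phenol, cresol))  # 0.6 (3 shared bits out of 5 total)
```

A compound without its own calibration can then borrow the calibration of the most similar calibrated compound, weighted by similarity and molecular weight.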

Naming convention for samples

To ensure the code handles replicates of the same sample correctly, names have to follow the convention: name-of-sample-with-dashes-only_replicatenumber

Examples that are correctly handled:

  • Bio-oil-foodwaste-250C_1
  • FW_2

Examples of NON-ACCEPTABLE names:

  • bio_oil_1
  • FW1
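
A check of this convention (a name part using dashes only, then an underscore and the replicate number) could be sketched with a regular expression; the pattern below is illustrative, not the package's own validation code:

```python
import re

# name-of-sample-with-dashes-only, underscore, replicate number
NAME_PATTERN = re.compile(r"^[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*_\d+$")

def is_valid_sample_name(name: str) -> bool:
    return NAME_PATTERN.fullmatch(name) is not None

print(is_valid_sample_name("Bio-oil-foodwaste-250C_1"))  # True
print(is_valid_sample_name("FW_2"))                      # True
print(is_valid_sample_name("bio_oil_1"))  # False: underscore inside the name part
print(is_valid_sample_name("FW1"))        # False: no underscore before replicate
```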

Example

A comprehensive example is provided in the GitHub repository to show how inputs should be formatted. To test the module, install the gcms_data_analysis module, download the example folder from the repository, and run example_gcms_data_analysis.py. The folder_path needs to be set to where your data folder is located.

The example code is shown here for convenience:

import pathlib as plib  # used for the folder_path
from gcms_data_analysis import Project

# you might need to change this to the path where you have the data
folder_path = plib.Path(plib.Path(__file__).cwd(), 'data')

# class methods need to be called at the beginning to influence all instances
Project.set_folder_path(folder_path)  # necessary for every project
Project.set_plot_grid(False)  # set to True to add gridlines to plots
Project.set_plot_font('Sans')  # to use sans font in plots

# initialize project
p = Project()

# load files_info as provided by the user, if not given, create it
# using the GC-MS .txt files in the folder
files_info_0 = p.load_files_info()

# load the provided calibrations as a dict; the bool flags derivatized calibrations
calibrations, is_calibr_deriv = p.load_calibrations()
c1, c2 = calibrations['calibration'], calibrations['deriv_calibration']

# load provided classification codes and mass fractions for functional groups
class_code_frac = p.load_class_code_frac()

# load all GCMS txt files as single files
files0, is_files_deriv0 = p.load_all_files()
f1, f2, f3 = files0['A_1'], files0['Ader_1'], files0['B_1']

# create the list with all compounds in all samples
list_of_all_compounds = p.create_list_of_all_compounds()

# create the list with all derivatized compounds in all samples
list_of_all_deriv_compounds = p.create_list_of_all_deriv_compounds()

if 0: # set to 1 if you want to recreate compounds properties databases
    compounds_properties = p.create_compounds_properties()
    deriv_compounds_properties = p.create_deriv_compounds_properties()
else:  # otherwise load the available one (if unavailable it creates them)
    compounds_properties = p.load_compounds_properties()
    deriv_compounds_properties = p.load_deriv_compounds_properties()

# apply the calibration to all files and store updated files as dict
files, is_files_deriv = p.apply_calibration_to_files()
f11, f22, f33 = files['A_1'], files['Ader_1'], files['B_1']

# compute stats for each file in the files_info df
files_info = p.add_stats_to_files_info()

# create samples_info (ave and std) based on replicate data in files_info
samples_info_0 = p.create_samples_info()

# create samples and samples_std from files and store as dict
samples, samples_std = p.create_samples_from_files()
s1, s2, s3 = samples['A'], samples['Ader'], samples['B']
sd1, sd2, sd3 = samples_std['A'], samples_std['Ader'], samples_std['B']

# add stats to samples_info df
samples_info = p.add_stats_to_samples_info()

# create report (compounds based) for different parameters
rep_files_conc = p.create_files_param_report(param='conc_vial_mg_L')
rep_files_fr = p.create_files_param_report(param='fraction_of_sample_fr')
rep_samples_conc, rep_samples_conc_std = p.create_samples_param_report(param='conc_vial_mg_L')
rep_samples_fr, rep_samples_fr_std = p.create_samples_param_report(param='fraction_of_sample_fr')

# create aggregated reports (based on functional groups) for different parameters
agg_files_conc = p.create_files_param_aggrrep(param='conc_vial_mg_L')
agg_files_fr = p.create_files_param_aggrrep(param='fraction_of_sample_fr')
agg_samples_conc, agg_samples_conc_std = p.create_samples_param_aggrrep(param='conc_vial_mg_L')
agg_samples_fr, agg_samples_fr_std = p.create_samples_param_aggrrep(param='fraction_of_sample_fr')

# plot results based on the files report
p.plot_ave_std(param='fraction_of_sample_fr', min_y_thresh=0,
               files_or_samples='files', legend_location='outside',
               only_samples_to_plot=['A_1', 'A_2', 'Ader_1', 'Ader_2'],
               # y_lim=[0, 5000],
               )
# plot results based on the files aggregated report
p.plot_ave_std(param='fraction_of_sample_fr', aggr=True,
               files_or_samples='files', min_y_thresh=0.01,
               y_lim=[0, .5], color_palette='Set2')

# plot results based on the samples report
p.plot_ave_std(param='fraction_of_sample_fr', min_y_thresh=0,
               legend_location='outside', only_samples_to_plot=['A', 'Ader'],
               # y_lim=[0, 5000],
               )
# plot results based on the samples aggregated report
p.plot_ave_std(param='fraction_of_sample_fr', aggr=True, min_y_thresh=0.01,
               y_lim=[0, .5], color_palette='Set2')

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gcms_data_analysis-0.1.6.tar.gz (33.2 kB)


Built Distribution

gcms_data_analysis-0.1.6-py3-none-any.whl (31.3 kB)


File details

Details for the file gcms_data_analysis-0.1.6.tar.gz.

File metadata

  • Download URL: gcms_data_analysis-0.1.6.tar.gz
  • Size: 33.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for gcms_data_analysis-0.1.6.tar.gz:

  • SHA256: d9a688eae0816493105c9375d2df7d465598a0bc3f5923ae686ed0af115e9462
  • MD5: c7201e016f5a23bf2fc630d7f212adcc
  • BLAKE2b-256: 10532e8314cb5b13ae163ef7d1520fcb9dbd8b1322ef36b3160d8ff9f0fd007f

See more details on using hashes here.

File details

Details for the file gcms_data_analysis-0.1.6-py3-none-any.whl.

File metadata

File hashes

Hashes for gcms_data_analysis-0.1.6-py3-none-any.whl:

  • SHA256: b5aa9bb7393d44549ebe11b4d222e94447b6da6ba21b3597311afb462532bea3
  • MD5: b1c3f25adf4efafb88e6e0c429c2ab09
  • BLAKE2b-256: 0c049ef91ad89a5556ade88aa73482857ca2319870477824a294598f92ed77e5

