Skip to main content

Automatic analysis of GC-MS data

Project description

gcms_data_analysis

A Python tool to manage multiple GCMS qualitative tables and automatically split chemicals into functional groups.

An open-source Python tool that can automatically:

  • handle multiple GCMS semi-quantitative data tables (derivatized or not)
  • duild a database of all identified compounds and their relevant properties using PubChemPy
  • split each compound into its functional groups using a published fragmentation algorithm
  • apply calibrations and/or semi-calibration using Tanimoto and molecular weight similarities
  • produce single sample reports, comprehensive multi-sample reports and aggregated reports based on functional group mass fractions in the samples
  • provides plotting capabilities

Naming convention for samples

To ensure the code handles replicates of the same sample correctly, names have to follow the convention: name-of-sample-with-dashes-only_replicatenumber

Examples that are correctly handled:

  • Bio-oil-foodwaste-250C_1
  • FW_2

Examples of NON-ACCEPTABLE names are

  • bio_oil_1
  • FW1

Example

A comprehensive example is provided on the GitHub repository to show how inputs should be formatted. To test the module, install the gcms_data_analysis module, download the example folder given in the repository, and run the example_gcms_data_analysis.py. The folder_path needs to be set to where your data folder is.

The example code is shown here for convenience:

import pathlib as plib  # used for the folder_path
from gcms_data_analysis import Project

# you might need to change this to the path were you have the data
folder_path = plib.Path(plib.Path(__file__).cwd(), 'data')

# class methods need to be called at the beginning to influence all instances
Project.set_folder_path(folder_path)  # necessary for every project
Project.set_plot_grid(False)  # to make plots with gridlines
Project.set_plot_font('Sans')  # to use sans font in plots

# initialize project
p = Project()

# load files_info as provided by the user, if not given, create it
# using the GC-MS .txt files in the folder
files_info_0 = p.load_files_info()

# load the provided calibrations as dict, store bool to know if are deriv
calibrations, is_calibr_deriv = p.load_calibrations()
c1, c2 = calibrations['calibration'], calibrations['deriv_calibration']

# load provided classificaiton codes and mass fractions for fun. groups
class_code_frac = p.load_class_code_frac()

# load all GCMS txt files as single files
files0, is_files_deriv0 = p.load_all_files()
f1, f2, f3 = files0['A_1'], files0['Ader_1'], files0['B_1']

# create the list with all compounds in all samples
list_of_all_compounds = p.create_list_of_all_compounds()

# create the list with all derivatized compounds in all samples
list_of_all_deriv_compounds = p.create_list_of_all_deriv_compounds()

if 0: # set to 1 if you want to recreate compounds properties databases
    compounds_properties = p.create_compounds_properties()
    deriv_compounds_properties = p.create_deriv_compounds_properties()
else:  # otherwise load the available one (if unavailable it creates them)
    compounds_properties = p.load_compounds_properties()
    deriv_compounds_properties = p.load_deriv_compounds_properties()

# apply the calibration to all files and store updated files as dict
files, is_files_deriv = p.apply_calibration_to_files()
f11, f22, f33 = files['A_1'], files['Ader_1'], files['B_1']

# compute stats for each file in the files_info df
files_info = p.add_stats_to_files_info()

# create samples_info (ave and std) based on replicate data in files_info
samples_info_0 = p.create_samples_info()

# create samples and samples_std from files and store as dict
samples, samples_std = p.create_samples_from_files()
s1, s2, s3 = samples['A'], samples['Ader'], samples['B']
sd1, sd2, sd3 = samples_std['A'], samples_std['Ader'], samples_std['B']

# add stats to samples_info df
samples_info = p.add_stats_to_samples_info()

# create report (compounds based) for different parameters
rep_files_conc = p.create_files_param_report(param='conc_vial_mg_L')
rep_files_fr= p.create_files_param_report(param='fraction_of_sample_fr')
rep_samples_conc, rep_samples_conc_std = p.create_samples_param_report(param='conc_vial_mg_L')
rep_samples_fr, rep_samples_fr_std = p.create_samples_param_report(param='fraction_of_sample_fr')

# create aggreport (functionl group aggreageted based) for different parameters
agg_files_conc = p.create_files_param_aggrrep(param='conc_vial_mg_L')
agg_files_fr = p.create_files_param_aggrrep(param='fraction_of_sample_fr')
agg_samples_conc, agg_samples_conc_std = p.create_samples_param_aggrrep(param='conc_vial_mg_L')
agg_samples_fr, agg_samples_fr_std = p.create_samples_param_aggrrep(param='fraction_of_sample_fr')

# plot results bases on report
p.plot_ave_std(param='fraction_of_sample_fr', min_y_thresh=0, files_or_samples='files',
    legend_location='outside',
    only_samples_to_plot=['A_1', 'A_2', 'Ader_1', 'Ader_2'], #y_lim=[0, 5000]
            )
# plot results bases on aggreport
p.plot_ave_std(param='fraction_of_sample_fr', aggr=True, files_or_samples='files',
                min_y_thresh=0.01,
    y_lim=[0, .5], color_palette='Set2')

p.plot_ave_std(param='fraction_of_sample_fr', min_y_thresh=0,
    legend_location='outside', only_samples_to_plot=['A', 'Ader'], #y_lim=[0, 5000]
            )
# plot results bases on aggreport
p.plot_ave_std(param='fraction_of_sample_fr', aggr=True, min_y_thresh=0.01,
    y_lim=[0, .5], color_palette='Set2')

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gcms_data_analysis-0.1.7.tar.gz (33.2 kB view details)

Uploaded Source

Built Distribution

gcms_data_analysis-0.1.7-py3-none-any.whl (31.3 kB view details)

Uploaded Python 3

File details

Details for the file gcms_data_analysis-0.1.7.tar.gz.

File metadata

  • Download URL: gcms_data_analysis-0.1.7.tar.gz
  • Upload date:
  • Size: 33.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for gcms_data_analysis-0.1.7.tar.gz
Algorithm Hash digest
SHA256 ddd6cbab40fc17aee785e8f589ddc5270a4aeecbec95e326ffeabad3fb8b35e5
MD5 55d26dc2b1227a9e7c0d5f0eb33e1727
BLAKE2b-256 d3e4993b2f6cd43e147c8e13ba15b9a4b5efeb86e144313e058b27bd189ca7a0

See more details on using hashes here.

File details

Details for the file gcms_data_analysis-0.1.7-py3-none-any.whl.

File metadata

File hashes

Hashes for gcms_data_analysis-0.1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 b1c2ed6b58a1ac92f117ecfebe90c7d2f2b748ff10c4a53891108c031406c202
MD5 7f009a01ccf786d12a86e40429027d3d
BLAKE2b-256 ee5f70d03100e543a9ca5b60a2bd3a52f6c4adc6a4709306055918fa3dbeb5dc

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page