Automatic analysis of GC-MS data

Project description

gcms_data_analysis

A Python tool to manage multiple GC-MS qualitative tables and automatically split chemicals into functional groups.

An open-source Python tool that can automatically:

handle multiple GC-MS semi-quantitative data tables (derivatized or not)
build a database of all identified compounds and their relevant properties using PubChemPy
split each compound into its functional groups using a published fragmentation algorithm
apply calibrations and/or semi-calibrations based on Tanimoto and molecular weight similarities
produce single-sample reports, comprehensive multi-sample reports, and aggregated reports based on the mass fractions of functional groups in the samples
provide plotting capabilities
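The idea behind semi-calibration can be illustrated with a small sketch. It assumes that each compound is represented by a set of "on" fingerprint bits, and that an uncalibrated compound borrows the calibration of the calibrated compound with the highest Tanimoto similarity, with ties broken by the smallest molecular-weight difference. The helper names (tanimoto, pick_surrogate), the fingerprint representation, the tie-breaking rule and the toy fingerprints are all assumptions for illustration, not the package's internal implementation.

```python
from typing import Dict, FrozenSet, Tuple


def tanimoto(fp_a: FrozenSet[int], fp_b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = fp_a | fp_b
    if not union:
        return 0.0
    return len(fp_a & fp_b) / len(union)


def pick_surrogate(
    target_fp: FrozenSet[int],
    target_mw: float,
    calibrated: Dict[str, Tuple[FrozenSet[int], float]],
) -> str:
    """Hypothetical helper: return the calibrated compound most similar to the target.

    Highest Tanimoto similarity wins; ties are broken by the smallest
    absolute molecular-weight difference.
    """
    return max(
        calibrated,
        key=lambda name: (
            tanimoto(target_fp, calibrated[name][0]),
            -abs(target_mw - calibrated[name][1]),
        ),
    )


# toy fingerprints (made up) and approximate molecular weights in g/mol
calibrated = {
    "phenol": (frozenset({1, 2, 3, 4}), 94.11),
    "guaiacol": (frozenset({1, 2, 3, 5}), 124.14),
    "acetic acid": (frozenset({7, 8}), 60.05),
}
print(pick_surrogate(frozenset({1, 2, 3, 5, 6}), 138.16, calibrated))  # prints: guaiacol
```

Here the target shares 4 of 5 bits with guaiacol (similarity 0.8) but only 3 of 6 with phenol (0.5), so guaiacol is chosen. In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets.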

Naming convention for samples

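To ensure that the code handles replicates of the same sample correctly, names must follow the convention name-of-sample-with-dashes-only_replicatenumber: the sample name may contain dashes but no underscores, and a single underscore separates it from the replicate number.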

Examples that are correctly handled:

Bio-oil-foodwaste-250C_1
FW_2

Examples of NON-ACCEPTABLE names:

bio_oil_1 (underscores inside the sample name)
FW1 (no underscore before the replicate number)
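The convention can be checked mechanically. The sketch below is an illustration, not the package's own parser: it assumes that everything before the single underscore (any characters except underscores) is the sample name and that everything after it is an integer replicate number. It then groups replicates into samples and computes their average and sample standard deviation, similar in spirit to the replicate averaging done by create_samples_from_files in the example. The names split_name, average_replicates and NAME_PATTERN are hypothetical.

```python
import re
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed pattern: sample name without underscores, one underscore, replicate number
NAME_PATTERN = re.compile(r"^([^_]+)_(\d+)$")


def split_name(name: str) -> Tuple[str, int]:
    """Hypothetical helper: split 'sample-name_3' into ('sample-name', 3)."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    return match.group(1), int(match.group(2))


def average_replicates(values: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Group per-replicate values by sample name; return (mean, sample std) per sample.

    A sample with a single replicate gets a standard deviation of 0.0 (an assumption).
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    for name, value in values.items():
        sample, _replicate = split_name(name)
        groups[sample].append(value)
    return {
        sample: (
            statistics.mean(vals),
            statistics.stdev(vals) if len(vals) > 1 else 0.0,
        )
        for sample, vals in groups.items()
    }


print(split_name("Bio-oil-foodwaste-250C_1"))  # ('Bio-oil-foodwaste-250C', 1)
for bad in ("bio_oil_1", "FW1"):
    try:
        split_name(bad)
    except ValueError as err:
        print(err)
print(average_replicates({"FW_1": 0.20, "FW_2": 0.30, "A_1": 0.50}))
```

Both non-acceptable names raise a ValueError, while FW_1 and FW_2 are grouped into the single sample FW.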

Example

A comprehensive example is provided in the GitHub repository to show how inputs should be formatted. To test the module, install gcms_data_analysis (for example with pip install gcms_data_analysis), download the example folder from the repository, and run example_gcms_data_analysis.py. Set folder_path to the location of your data folder.

The example code is shown here for convenience:

```python
import pathlib as plib  # used for the folder_path
from gcms_data_analysis import Project

# you might need to change this to the path where you have the data;
# this points to the 'data' folder in the current working directory
# (use plib.Path(__file__).parent / 'data' for the script's own folder)
folder_path = plib.Path.cwd() / 'data'

# class methods need to be called at the beginning to influence all instances
Project.set_folder_path(folder_path)  # necessary for every project
Project.set_plot_grid(False)  # set to True to draw gridlines in plots
Project.set_plot_font('Sans')  # to use sans font in plots

# initialize project
p = Project()

# load files_info as provided by the user; if it is not given, create it
# from the GC-MS .txt files in the folder
files_info_0 = p.load_files_info()

# load the provided calibrations as a dict, with flags indicating whether
# each calibration refers to derivatized compounds
calibrations, is_calibr_deriv = p.load_calibrations()
c1, c2 = calibrations['calibration'], calibrations['deriv_calibration']

# load the provided classification codes and mass fractions for functional groups
class_code_frac = p.load_class_code_frac()

# load all GC-MS .txt files as single files
files0, is_files_deriv0 = p.load_all_files()
f1, f2, f3 = files0['A_1'], files0['Ader_1'], files0['B_1']

# create the list of all compounds in all samples
list_of_all_compounds = p.create_list_of_all_compounds()

# create the list of all derivatized compounds in all samples
list_of_all_deriv_compounds = p.create_list_of_all_deriv_compounds()

recreate_properties = False  # set to True to recreate the compound properties databases
if recreate_properties:
    compounds_properties = p.create_compounds_properties()
    deriv_compounds_properties = p.create_deriv_compounds_properties()
else:  # otherwise load the existing databases (they are created if unavailable)
    compounds_properties = p.load_compounds_properties()
    deriv_compounds_properties = p.load_deriv_compounds_properties()

# apply the calibration to all files and store the updated files as a dict
files, is_files_deriv = p.apply_calibration_to_files()
f11, f22, f33 = files['A_1'], files['Ader_1'], files['B_1']

# compute stats for each file in the files_info DataFrame
files_info = p.add_stats_to_files_info()

# create samples_info (average and std) based on the replicate data in files_info
samples_info_0 = p.create_samples_info()

# create samples and samples_std from files and store them as dicts
samples, samples_std = p.create_samples_from_files()
s1, s2, s3 = samples['A'], samples['Ader'], samples['B']
sd1, sd2, sd3 = samples_std['A'], samples_std['Ader'], samples_std['B']

# add stats to the samples_info DataFrame
samples_info = p.add_stats_to_samples_info()

# create compound-based reports for different parameters
rep_files_conc = p.create_files_param_report(param='conc_vial_mg_L')
rep_files_fr = p.create_files_param_report(param='fraction_of_sample_fr')
rep_samples_conc, rep_samples_conc_std = p.create_samples_param_report(param='conc_vial_mg_L')
rep_samples_fr, rep_samples_fr_std = p.create_samples_param_report(param='fraction_of_sample_fr')

# create aggregated reports (functional-group based) for different parameters
agg_files_conc = p.create_files_param_aggrrep(param='conc_vial_mg_L')
agg_files_fr = p.create_files_param_aggrrep(param='fraction_of_sample_fr')
agg_samples_conc, agg_samples_conc_std = p.create_samples_param_aggrrep(param='conc_vial_mg_L')
agg_samples_fr, agg_samples_fr_std = p.create_samples_param_aggrrep(param='fraction_of_sample_fr')

# plot file results based on the report
p.plot_ave_std(param='fraction_of_sample_fr', min_y_thresh=0, files_or_samples='files',
               legend_location='outside',
               only_samples_to_plot=['A_1', 'A_2', 'Ader_1', 'Ader_2'],
               # y_lim=[0, 5000],
               )
# plot file results based on the aggregated report
p.plot_ave_std(param='fraction_of_sample_fr', aggr=True, files_or_samples='files',
               min_y_thresh=0.01, y_lim=[0, .5], color_palette='Set2')

# plot sample results based on the report
p.plot_ave_std(param='fraction_of_sample_fr', min_y_thresh=0,
               legend_location='outside', only_samples_to_plot=['A', 'Ader'],
               # y_lim=[0, 5000],
               )
# plot sample results based on the aggregated report
p.plot_ave_std(param='fraction_of_sample_fr', aggr=True, min_y_thresh=0.01,
               y_lim=[0, .5], color_palette='Set2')
```
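The aggregated reports express results per functional group rather than per compound. A minimal sketch of that aggregation, assuming that each compound's value (for example its fraction_of_sample_fr) is distributed over its functional groups in proportion to the groups' mass fractions within the compound, as listed in a table like class_code_frac. The helper aggregate_by_functional_group, the group labels and the rounded mass fractions are illustrative assumptions, not the package's implementation.

```python
from collections import defaultdict
from typing import Dict


def aggregate_by_functional_group(
    compound_values: Dict[str, float],
    group_mass_fractions: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Hypothetical helper: sum value * group mass fraction over all compounds."""
    totals: Dict[str, float] = defaultdict(float)
    for compound, value in compound_values.items():
        for group, mass_fraction in group_mass_fractions[compound].items():
            totals[group] += value * mass_fraction
    return dict(totals)


# toy data: fraction of the sample for two compounds, and rounded
# functional-group mass fractions within each compound (each row sums to 1)
compound_values = {"phenol": 0.10, "acetic acid": 0.20}
group_mass_fractions = {
    "phenol": {"aromatic": 0.8, "hydroxyl": 0.2},
    "acetic acid": {"carboxyl": 0.75, "methyl": 0.25},
}
print(aggregate_by_functional_group(compound_values, group_mass_fractions))
```

Because each compound's mass fractions sum to 1, the aggregated values add up to the same total as the compound values (0.30 here), so no mass is created or lost by the regrouping.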

Release history

1.2.0 (Mar 29, 2024)
1.1.0 (Mar 20, 2024)
1.0.7 (Mar 19, 2024)
1.0.6 (Mar 19, 2024)
1.0.5 (Mar 18, 2024)
1.0.3 (Feb 28, 2024)
1.0.2 (Feb 27, 2024)
1.0.1 (Feb 20, 2024)
0.1.8 (Feb 20, 2024)
0.1.7 (Feb 16, 2024), this version
0.1.6 (Feb 12, 2024)
0.1.5 (Feb 9, 2024)
0.1.4 (Feb 9, 2024)
0.1.3 (Feb 8, 2024)
0.1.2 (Feb 8, 2024)
0.1.1 (Feb 8, 2024)
0.1.0 (Feb 8, 2024)
0.0.9 (Feb 7, 2024)
0.0.8 (Feb 7, 2024)
0.0.7 (Feb 7, 2024)
0.0.6 (Feb 7, 2024)
0.0.5 (Feb 7, 2024)
0.0.4 (Feb 7, 2024)
0.0.3 (Feb 7, 2024)
0.0.2 (Feb 7, 2024)

Download files

Source distribution

gcms_data_analysis-0.1.7.tar.gz (33.2 kB), uploaded Feb 16, 2024

Built distribution

gcms_data_analysis-0.1.7-py3-none-any.whl (31.3 kB, Python 3), uploaded Feb 16, 2024

Hashes for gcms_data_analysis-0.1.7.tar.gz
Algorithm	Hash digest
SHA256	`ddd6cbab40fc17aee785e8f589ddc5270a4aeecbec95e326ffeabad3fb8b35e5`
MD5	`55d26dc2b1227a9e7c0d5f0eb33e1727`
BLAKE2b-256	`d3e4993b2f6cd43e147c8e13ba15b9a4b5efeb86e144313e058b27bd189ca7a0`

Hashes for gcms_data_analysis-0.1.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b1c2ed6b58a1ac92f117ecfebe90c7d2f2b748ff10c4a53891108c031406c202`
MD5	`7f009a01ccf786d12a86e40429027d3d`
BLAKE2b-256	`ee5f70d03100e543a9ca5b60a2bd3a52f6c4adc6a4709306055918fa3dbeb5dc`
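To check a downloaded distribution against the SHA256 digests listed above, the digest can be recomputed locally. A minimal sketch using Python's standard hashlib module; the helper name sha256_of_file is illustrative, and the wheel path assumes the file was saved to the current directory under its original name.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed above for the 0.1.7 wheel
expected = "b1c2ed6b58a1ac92f117ecfebe90c7d2f2b748ff10c4a53891108c031406c202"
wheel = Path("gcms_data_analysis-0.1.7-py3-none-any.whl")
if wheel.exists():
    print("match" if sha256_of_file(wheel) == expected else "MISMATCH")
```

pip can also enforce digests at install time through its hash-checking mode (--require-hashes with --hash entries in a requirements file).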