TraceGroomer

TraceGroomer is a command-line tool that formats and normalises tracer metabolomics measurements file(s), producing .csv files that are ready for the DIMet tool.

Currently, three formats of tracer (isotope-labeled) metabolomics measurements files are accepted:

  1. IsoCor results (.tsv measurements file).
  2. Results provided by the VIB Metabolomics Expertise Center (El-Maven results are shaped by VIB MEC team into a multi-sheet .xlsx file).
  3. A 'generic' .xlsx measurements file.

For any of these supported inputs, TraceGroomer generates a separate file for each of: i) total metabolite abundances, ii) isotopologues, iii) isotopologues' proportions, and iv) mean enrichment (a.k.a. fractional contributions).

Automatic formatting is performed, as well as the normalization chosen by the user: by the amount of material and/or by an internal standard. Useful advanced options are offered (e.g. if the user has only the isotopologues' absolute values, TraceGroomer can generate all the other measurements files automatically).

Note: this script does not correct for naturally occurring isotopologues. Your data must already have been processed by other software that performs this correction.


Requirements

TraceGroomer requires Python 3.10+, installed system-wide or in a virtual environment.

To install locally, clone this repository and make sure that your virtual environment with Python 3.10+ is activated (source MY_VIRTUAL_ENV/bin/activate) and that poetry is installed.

Then install the dependencies: move into the TraceGroomer directory and run

poetry install

After this the tool is ready to use:

python -m tracegroomer --help

How to use TraceGroomer

Execution takes only a few seconds. Below we explain how to proceed.

Input files

Compulsory files:

  • the measurements file (.tsv, tab-delimited .csv, or .xlsx)

    Description

    The measurements file is provided by a metabolomics facility. It is the output of correction software such as IsoCor, El-Maven, etc. Sometimes the facility formats the file further before delivering it to the end user, as is the case for files delivered by VIB MEC.

    TraceGroomer also accepts a "generic" format; see more details in 'generic' type of data.

    The user will find here how to get examples of direct IsoCor output, a VIB MEC file, and a generic file. We also provide more details in the sections users having VIB results as input, users having IsoCor results and 'generic' type of data.

  • the metadata file, which describes the experimental setup.

    Description and example

    The metadata is a tab-delimited .csv file provided by the user, which must contain 6 columns named name_to_plot, timepoint, timenum, condition, compartment, original_name.

    Here is the meaning of the columns:

    • name_to_plot is the string that will appear on the figures produced by DIMet
    • condition is the experimental condition
    • timepoint is the sampling time as it is defined in your experimental setup (it is an arbitrary string that can contain non-numerical characters)
    • timenum is the numerical encoding of the timepoint
    • compartment is the name of the cellular compartment in which the measurement was made (e.g. "endo", "endocellular", "cyto", etc.)
    • original_name contains the column names that are provided in the quantification files

    Example:

    name_to_plot  condition  timepoint  timenum  compartment  original_name
    Cond1 T0      cond1      T0         0        comp_name    T0_cond_1
    Cond1 T24     cond1      T24        24       comp_name    T24_cond_1
    Cond2 T0      cond2      T0         0        comp_name    T0_cond_2
    Cond3 T24     cond2      T24        24       comp_name    T24_cond_2

    The column name_to_plot is not used by TraceGroomer, but it will be used by DIMet, so it is practical to set it from the start.

    Note: You can create this file with any spreadsheet program such as Excel, Google Sheets or LibreOffice Calc. When saving the file, specify that the delimiter must be a tab ("Tab delimiter" or a similar option, depending on your program); see https://support.microsoft.com/en-us/office/save-a-workbook-to-text-format-txt-or-csv-3e9a9d6c-70da-4255-aa28-fcacf1f081e6.
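As an alternative to a spreadsheet program, the metadata file could also be written with Python's standard csv module. The following is only a minimal sketch, not part of TraceGroomer: the helper name write_metadata is hypothetical, and the two rows are taken from the example above.

```python
import csv

# The six column names that the metadata file must contain.
REQUIRED_COLUMNS = [
    "name_to_plot", "condition", "timepoint",
    "timenum", "compartment", "original_name",
]

# Two rows taken from the example above.
rows = [
    {"name_to_plot": "Cond1 T0", "condition": "cond1", "timepoint": "T0",
     "timenum": 0, "compartment": "comp_name", "original_name": "T0_cond_1"},
    {"name_to_plot": "Cond1 T24", "condition": "cond1", "timepoint": "T24",
     "timenum": 24, "compartment": "comp_name", "original_name": "T24_cond_1"},
]


def write_metadata(path, rows):
    """Check that every row has the required columns, then write a tab-delimited file."""
    for row in rows:
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            raise ValueError(f"missing metadata columns: {missing}")
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=REQUIRED_COLUMNS, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
```

Calling write_metadata("metadata_1.csv", rows) would produce a tab-delimited file with the header line followed by one line per sample.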

  • the configuration file

    Description and example

    This file contains the basic information needed: the name of the metadata file, the names (but not the paths) of the output files, and the absolute path to the output folder.

    The comments (#) serve as a guide. The user must fill in the value after the colon of each field:

    # comments start with #
    # -----------------------------
    
    # absolute path to output DIRECTORY : 
    groom_out_path :  ~/examples_TraceGroomer/data/example-isocor_data 
    
    metadata: metadata1   # file name, no extension. Must be in the output DIR 
    # names of the sheets in xlsx file that exist (null otherwise) 
    abundances : null  # total abundance
    mean_enrichment : null  # mean enrichment
    isotopologue_proportions : null  # isotopologue proportions
    isotopologues : isotopologuesCorrValues  # isotopologue absolute values  
    
    

    When the fields abundances, mean_enrichment, and/or isotopologue_proportions are set to null (as shown in the example above), the respective output file will be generated automatically (when possible from the existing quantifications).

    The user will find here how to get examples.

    Note: Online editors for .yml files exist, such as https://yamlchecker.com/; just copy, paste and edit!
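The configuration file can also be produced from a script. The sketch below builds the text of such a file with plain string formatting, so no YAML library is needed. It is an illustration only: the function render_config is hypothetical, and the field names and values are copied from the example above.

```python
def render_config(groom_out_path, metadata, isotopologues,
                  abundances=None, mean_enrichment=None,
                  isotopologue_proportions=None):
    """Return the text of a configuration file with the fields shown above."""
    def fmt(value):
        # A quantification that is absent from the input is written as null.
        return "null" if value is None else str(value)

    fields = {
        "groom_out_path": groom_out_path,
        "metadata": metadata,
        "abundances": abundances,
        "mean_enrichment": mean_enrichment,
        "isotopologue_proportions": isotopologue_proportions,
        "isotopologues": isotopologues,
    }
    return "\n".join(f"{key}: {fmt(value)}" for key, value in fields.items()) + "\n"


# Values copied from the example configuration above.
text = render_config(
    groom_out_path="~/examples_TraceGroomer/data/example-isocor_data",
    metadata="metadata1",
    isotopologues="isotopologuesCorrValues",
)
print(text)
```

The returned text can then be saved under a name such as config-1-groom.yml.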

Optional files

  • the amount of material by sample (tab-delimited .csv file)
  • a file with metabolites to exclude (tab-delimited .csv file)

The optional files are passed on the command line, as explained in Advanced options.

You must organize your files as follows:

MyProject
├── data
│   └── dataset1_data
│       ├── metadata_1.csv
│       └── TRACER_IsoCor_out_example.tsv
└── groom_files
    └── dataset1
        ├── amount_material_weightorcells.csv
        └── config-1-groom.yml

This structure is recommended to easily re-use the data folder for DIMet.

The generic command line is:

python3 -m tracegroomer --targetedMetabo_path $MEASUREMENTS \
    --type_of_file $MY_TYPE_OF_INPUT \
    $MY_BASIC_CONFIG

Where:

  • MEASUREMENTS is the absolute path to the file that contains the measurements.
  • MY_TYPE_OF_INPUT is one of: IsoCor_out_tsv, VIBMEC_xlsx, generic_xlsx
  • MY_BASIC_CONFIG is the path to the configuration .yml file.
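When TraceGroomer has to be launched from a script, the generic command can be assembled as an argument list. This is a sketch under assumptions: build_command and run_tracegroomer are hypothetical helpers, not part of TraceGroomer, and they use only the options shown in the command above. The paths are those of the provided IsoCor example.

```python
import os
import subprocess
import sys

# The accepted values of --type_of_file, as listed above.
ALLOWED_TYPES = ("IsoCor_out_tsv", "VIBMEC_xlsx", "generic_xlsx")


def build_command(measurements, type_of_file, config):
    """Assemble the generic TraceGroomer command as an argument list."""
    if type_of_file not in ALLOWED_TYPES:
        raise ValueError(f"type_of_file must be one of {ALLOWED_TYPES}")
    return [
        sys.executable, "-m", "tracegroomer",
        # "~" is not expanded without a shell, so expand it here.
        "--targetedMetabo_path", os.path.expanduser(measurements),
        "--type_of_file", type_of_file,
        os.path.expanduser(config),
    ]


def run_tracegroomer(measurements, type_of_file, config):
    """Launch TraceGroomer; raises CalledProcessError if the run fails."""
    return subprocess.run(build_command(measurements, type_of_file, config), check=True)


cmd = build_command(
    "~/examples_TraceGroomer/data/example-isocor_data/TRACER_IsoCor_out_example.tsv",
    "IsoCor_out_tsv",
    "~/examples_TraceGroomer/groom_files/example-isocor/config-1-groom.yml",
)
print(" ".join(cmd))
```

Building the list separately from running it makes it possible to inspect the command before anything is executed.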

If this is the first time you use TraceGroomer, we recommend running a test with the provided examples. Then re-use the organization and the configurations, and adapt the command line to your data.


Running a test with the provided examples

To perform a test using the examples we provide, please download and uncompress our example from Zenodo. The structure of the folder is:

examples_TraceGroomer
├── data
│   ├── example-isocor_data
│   │   ├── metadata_1.csv
│   │   └── TRACER_IsoCor_out_example.tsv
│   ├── example-sheet_data
│   │   ├── metadata_3.csv
│   │   └── TRACER_generic_custom.xlsx
│   └── example-vib_data
│       ├── metadata_2.csv
│       └── TRACER_metabo_2.xlsx
└── groom_files
    ├── example-isocor
    │   ├── amount_material_weightorcells.csv
    │   └── config-1-groom.yml
    ├── example-sheet
    │   └── config-3-groom.yml
    └── example-vib
        ├── config-2-groom.yml
        ├── nbcells-or-amountOfMaterial.csv
        └── reject_list.csv

Pick the example most suited to your data:

  1. IsoCor output (tsv file)
  2. VIB MEC xlsx file
  3. A generic type of xlsx file

Run the script

Note: if the working folder is not the 'home' directory, modify the absolute paths in the .yml files and in the bash commands accordingly. Then run the command for your case.

For the IsoCor case:

python3 -m tracegroomer \
   --targetedMetabo_path ~/examples_TraceGroomer/data/example-isocor_data/TRACER_IsoCor_out_example.tsv \
   --type_of_file IsoCor_out_tsv \
   ~/examples_TraceGroomer/groom_files/example-isocor/config-1-groom.yml

or, for the VIB MEC case:

python3 -m tracegroomer \
  --targetedMetabo_path ~/examples_TraceGroomer/data/example-vib_data/TRACER_metabo_2.xlsx \
  --type_of_file VIBMEC_xlsx \
  ~/examples_TraceGroomer/groom_files/example-vib/config-2-groom.yml

or, for the generic case:

python3 -m tracegroomer \
   --targetedMetabo_path ~/examples_TraceGroomer/data/example-sheet_data/TRACER_generic_sheet.xlsx \
   --type_of_file generic_xlsx \
   ~/examples_TraceGroomer/groom_files/example-sheet/config-3-groom.yml

The output

The output files are saved in the folder that you specified in the config .yml file (groom_out_path field). The data/[my_dataset] location is recommended. A total of 4 output files are generated if the absolute isotopologues are provided; otherwise 3 files are generated.

This way, you can simply copy the entire data/ content into the folder structure that you will use to run DIMet.

The format of these output files is tab-delimited .csv.


Advanced options

We provide advanced options for this script; check the help:

python -m tracegroomer --help

They appear as 'optional arguments' in the help menu.

You can:

  • normalize by the amount of material (number of cells, tissue weight), and/or
  • normalize by an internal standard (present in your data) of your choice

The indications differ slightly between users having VIB results as input, users having IsoCor results, and users having 'generic' type of data. After reading the section for your case, please see the Advanced options section for the available normalisations.

Users having IsoCor results

Before explaining the advanced options for this kind of data, here is a short explanation of what TraceGroomer performs automatically as basic formatting.

A typical IsoCor results table is described at https://isocor.readthedocs.io/en/latest/tutorials.html#output-files. It consists of a .tsv file whose columns are the sample, metabolite, isotopologue and all quantifications, with the rows in long (stacked) format: the samples are repeated vertically.

Our script transforms specific columns of that file into tables. As the total metabolite abundance column is not present in the input data, the total abundance per metabolite is computed automatically as the sum, per metabolite, of the isotopologues' absolute values (see AbundanceCorrected below). The correspondences are:

column in the IsoCor file   TraceGroomer output filename
corrected_area              IsotopologuesAbsolute
isotopologue_fraction       IsotopologuesProportions
mean_enrichment             MeanEnrichment
-                           AbundanceCorrected
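The reshaping described above can be illustrated in plain Python. This is a sketch of the idea, not TraceGroomer's actual code: the toy records, the helper names pivot and total_abundance, and the row naming (borrowed from the metaboliteID_labelX convention of the generic format) are assumptions made for the illustration.

```python
from collections import defaultdict

# Toy long-format records using the IsoCor columns named in the text.
records = [
    {"sample": "S1", "metabolite": "Cit", "isotopologue": 0, "corrected_area": 80.0},
    {"sample": "S1", "metabolite": "Cit", "isotopologue": 1, "corrected_area": 20.0},
    {"sample": "S2", "metabolite": "Cit", "isotopologue": 0, "corrected_area": 30.0},
    {"sample": "S2", "metabolite": "Cit", "isotopologue": 1, "corrected_area": 10.0},
]


def pivot(records, value_column):
    """Turn stacked rows into a table {isotopologue_name: {sample: value}}."""
    table = defaultdict(dict)
    for rec in records:
        # Row naming is an assumption made for this sketch.
        row_name = f"{rec['metabolite']}_label{rec['isotopologue']}"
        table[row_name][rec["sample"]] = rec[value_column]
    return dict(table)


def total_abundance(records):
    """Sum corrected_area over the isotopologues of each metabolite, per sample."""
    totals = defaultdict(lambda: defaultdict(float))
    for rec in records:
        totals[rec["metabolite"]][rec["sample"]] += rec["corrected_area"]
    return {metabolite: dict(per_sample) for metabolite, per_sample in totals.items()}


absolute = pivot(records, "corrected_area")
abundance = total_abundance(records)
```

Here absolute plays the role of the IsotopologuesAbsolute table and abundance that of AbundanceCorrected.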

We provide the example downloadable from Zenodo (see here)

Advanced options related to the limit of detection (LOD) and blanks have no effect on IsoCor data, because neither the LOD nor blanks are provided in the data.

All the other advanced options do have an effect: those related to internal standard, amount of material, and isotopologues.

Users having VIB results

As shown in the example, set the sheet names in the .yml file so that they match the sheets present in your Excel file.

Our script performs, by default:

  • subtraction of the mean of the blanks from each metabolite's abundance, for each sample.
  • setting to NaN the abundance values that are under the limit of detection (LOD).
  • excluding metabolites whose abundance values are under the LOD across all samples (they are then excluded from all tables by default).
  • clipping fraction values to lie between 0 and 1 (some negative values and some values above 1 can occur after correction for naturally occurring isotopologues by certain software dedicated to such corrections).

You can modify all these options depending on your needs; they appear as 'optional arguments' in the help menu.
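Three of these default steps (LOD masking, exclusion, and clipping) can be sketched with small helpers. The function names are hypothetical and the code is not TraceGroomer's implementation; in particular, treating a value equal to the LOD as detected is an assumption of this sketch.

```python
import math


def mask_below_lod(values, lod):
    """Replace abundance values under the limit of detection by NaN."""
    # Assumption: a value exactly equal to the LOD is kept.
    return [v if v >= lod else math.nan for v in values]


def all_below_lod(values, lod):
    """True when a metabolite is under the LOD in every sample (candidate for exclusion)."""
    return all(v < lod for v in values)


def clip_fraction(x):
    """Force a fraction into the closed interval [0, 1]."""
    return min(1.0, max(0.0, x))


masked = mask_below_lod([5.0, 0.5], lod=1.0)
clipped = [clip_fraction(x) for x in (-0.02, 0.4, 1.03)]
```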

Users having generic data

We created this option for formats that fit neither of the other two scenarios. Your data is expected to be an .xlsx file with sheets similar to those provided in example-sheet:

  • this .xlsx file must NOT contain: formulas, symbols accompanying the numeric values, nor special characters.
  • each sheet must correspond to one type of quantification, see Notes below.
  • the header (first row, with the molecules IDs), and the first column (with the samples) can contain non numeric values.
  • the isotopologues names, in the header, must follow the convention metaboliteID_labelX: the substring _label is compulsory and is located between the metabolite name (or identifier) metaboliteID and the number of marked carbon atoms X (e.g. glucose6phosphate_label0).
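To check that a header follows the metaboliteID_labelX convention, a small parser such as the one below can be used. It is a sketch; the function name split_isotopologue_name is hypothetical and not part of TraceGroomer.

```python
def split_isotopologue_name(name):
    """Split 'metaboliteID_labelX' into (metaboliteID, X), with X as an integer."""
    # rpartition splits on the last occurrence of the compulsory substring.
    metabolite, separator, label = name.rpartition("_label")
    if not separator or not metabolite or not label.isdecimal():
        raise ValueError(f"not of the form metaboliteID_labelX: {name!r}")
    return metabolite, int(label)


print(split_isotopologue_name("glucose6phosphate_label0"))
```

Running the parser over every column name of an isotopologues sheet before launching TraceGroomer gives an early warning about headers that do not follow the convention.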

Important: if you only have isotopologue absolute values, but not the other tables, put them as a single named sheet in your .xlsx file, and TraceGroomer will automatically generate all the other types of tables for you!
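To give an idea of how the other tables follow from the absolute values, here is a sketch for one metabolite in one sample. It uses the usual definitions (total abundance as the sum, proportions as value over total, mean enrichment as the label-weighted average divided by the highest label number). These formulas and the helper name derive_tables are assumptions of this sketch; the source does not state how TraceGroomer computes them.

```python
import math


def derive_tables(absolute):
    """Derive abundance, proportions and mean enrichment.

    absolute maps the label number X to the absolute value of the
    isotopologue metaboliteID_labelX, for one metabolite in one sample.
    """
    total = sum(absolute.values())
    highest_label = max(absolute)
    if total == 0:
        # Nothing was measured: the fractions are undefined.
        proportions = {label: math.nan for label in absolute}
        return {"abundance": 0.0, "proportions": proportions, "mean_enrichment": math.nan}
    proportions = {label: value / total for label, value in absolute.items()}
    if highest_label == 0:
        mean_enrichment = 0.0
    else:
        # Assumption: the highest label equals the number of labelable atoms.
        weighted = sum(label * p for label, p in proportions.items())
        mean_enrichment = weighted / highest_label
    return {"abundance": total, "proportions": proportions, "mean_enrichment": mean_enrichment}


result = derive_tables({0: 50.0, 1: 30.0, 2: 20.0})
```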

Notes:

  • sheets corresponding to isotopologue proportions (when available) and isotopologue absolute values (compulsory if the proportions are not available) must have isotopologues as columns and samples as rows.
  • sheets corresponding to abundance and mean enrichment (when available) must have metabolites as columns and samples as rows.
  • the sheets corresponding to isotopologues measurements must be named with a name containing the string "isotopol". The names of the sheets must be unambiguous.

Advanced options:

Available for all the types of supported input files:

  • normalize by the amount of material (number of cells, tissue weight): set the path to the file with the --amountMaterial_path option. The file must be like this csv file, and its first column must contain the same names as the 'original_name' column of the metadata.
  • normalize by an internal standard (present in your data) of your choice: use the advanced option --use_internal_standard.
  • remove metabolites
  • print a preview of isotopologues values
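The two normalisations can be pictured as simple divisions. The sketch below assumes that each value is divided by the amount of material of its sample, and that each metabolite is divided by the internal standard measured in the same sample. The exact computation behind --amountMaterial_path and --use_internal_standard is not described here, so both helpers are hypothetical illustrations.

```python
def normalize_by_amount(abundances, amounts):
    """Divide the value of each sample by the amount of material of that sample."""
    return {sample: value / amounts[sample] for sample, value in abundances.items()}


def normalize_by_internal_standard(table, standard):
    """Divide every metabolite by the internal standard measured in the same sample."""
    reference = table[standard]
    return {
        metabolite: {sample: value / reference[sample] for sample, value in per_sample.items()}
        for metabolite, per_sample in table.items()
    }


# Sample names must match the 'original_name' column of the metadata.
by_amount = normalize_by_amount(
    {"T0_cond_1": 200.0, "T24_cond_1": 300.0},
    {"T0_cond_1": 2.0, "T24_cond_1": 3.0},
)
by_standard = normalize_by_internal_standard(
    {"Cit": {"T0_cond_1": 10.0}, "Std": {"T0_cond_1": 5.0}},
    standard="Std",
)
```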
