No project description provided
Project description
TraceGroomer
TraceGroomer is a solution for formatting and normalising Tracer metabolomics given file(s), to produce the .csv files which are ready for DIMet tool.
Currently, three styles of format of Tracer (or Isotope-labeled) metabolomics measurements files are accepted:
- IsoCor results (.tsv measurments file).
- Results provided by the VIB Metabolomics Expertise Center (El-Maven results are shaped by VIB MEC team into a multi-sheet .xlsx file).
- A 'generic' .xlsx measurements file, manually set by the user.
For any type of these supported inputs, TraceGroomer generates an independent file for: i) total metabolite abundances ii) Isotopologues iii) Isotopologues' proportions and iv) mean enrichment (a.k.a fractional contributions).
Automatic formatting is performed, as well as the normalization chosen by the user: whether by the amount of material and/or by an internal standard. Useful advanced options are offered (e.g. if the user has only Isotopologues' absolute values, TraceGroomer can generate all the other measurements files automatically).
Note : this script does not correct for naturally occurring isotopologues. Your data must be already processed by another software that performs such correction.
Requirements
TraceGroomer requires Python 3.10+. Running in a virtual environment is highly recommended.
Install via PyPI:
pip install tracegroomer
After this, the tool is ready to use:
python -m tracegroomer --help
Local install (alternatively)
Alternatively, for installing locally, clone this repository, make sure you have activated your virtual environment with Python 3.10+ (source MY_VIRTUAL_ENV/bin/activate
), with poetry
installed.
Then install dependencies: locate yourself in TraceGroomer
and run
poetry install
After this, the tool is ready to use:
python -m tracegroomer --help
How to use TraceGroomer
Its execution takes only few seconds. Below we explain how to proceed.
Input files
Compulsory files:
-
the measurements file (tsv, csv -tab delimited-, or xlsx)
Description
The measurements file is given by a Metabolomics facility. It is the result of the correction by software such as IsoCor, El-Maven, etc. Some times the file can be further formatted in the Metabolomics facility, before being delivered to the end user, which is the case of VIB MEC delivered files.
TraceGroomer also accepts a "generic" format, see more details in 'generic' type of data.
The user will find here how get examples of IsoCor direct output, VIB MEC file, and generic file. We provide also more details in the sections users having VIB results as input, users having IsoCor results and 'generic' type of data.
-
the metadata file, which describes the experimental setup.
Description and example
The metadata is a tab delimited .csv file provided by the user, which has to contain 6 columns named
name_to_plot
,timepoint
,timenum
,condition
,compartment
,original_name
.Here is the semantics of the columns:
name_to_plot
is the string that will appear on the figures produced by DIMetcondition
is the experimental conditiontimepoint
is the sampling time as it is defined in your experimental setup (it is an arbitary string that can contain non numerical characters)timenum
is the numerical encoding of thetimepoint
compartment
is the name of the cellular compartment for which the measuring has been done (e.g. "endo", "endocellular", "cyto", etc)original_name
contains the column names that are provided in the quantification files
Example:
name_to_plot condition timepoint timenum compartment original_name Cond1 T0 cond1 T0 0 comp_name T0_cond_1 Cond1 T24 cond1 T24 24 comp_name T24_cond_1 Cond2 T0 cond2 T0 0 comp_name T0_cond_2 Cond3 T24 cond2 T24 24 comp_name T24_cond_2 The column
name_to_plot
is not used bytracegroomer
but it will be used by DIMet, so it is practical to set it from the start.Note: You can create this file with any spreadsheet program such as Excel or Google Sheets or LibreOfice Calc. At the moment of saving your file you specify that the delimiter must be a
tab
("Tab delimiter" or similar option depending of your context), see https://support.microsoft.com/en-us/office/save-a-workbook-to-text-format-txt-or-csv-3e9a9d6c-70da-4255-aa28-fcacf1f081e6. -
the configuration file
Description and example
This file contains basic needed information: the name of the metadata file, the names (but not the paths) of the output files, and the absolute path to the output folder.The comments (
#
) serve as guide. The user must fill after the colon of each field:# coments start with # # ----------------------------- # absolute path to output DIRECTORY : groom_out_path : ~/examples_TraceGrommer/data/example-isocor_data metadata: metadata1 # file name, no extension. Must be in the output DIR # names of the sheets in xlsx file that exist (null otherwise) abundances : null # total abundance mean_enrichment : null # mean enrichment isotopologue_proportions : null # isotopologue proportions isotopologues : isotopologuesCorrValues # isotopologue absolute values
As shown in the example of configuration file above:
When the fields
abundances
,mean_enrichment
, and/orisotopologue_proportions
are setnull
, the respective output file will be automatically generated (when possible from existing quantifications).The user will find here how to get examples.
Note: There exist online editors for .yml files, such as https://yamlchecker.com/, just copy-paste and edit!
Facultative files
- the amount of material by sample (tab delimited csv file)
- a file with metabolites to exclude (tab delimited csv file)
The facultative files are used through the command line, which is explained in Advanced options
You must organize your files as follows:
MyProject
├── data
│ ├── dataset1_data
│ │ ├── metadata_1.csv
│ │ └── TRACER_IsoCor_out_example.tsv
└── groom_files
└── dataset1
├── amount_material_weightorcells.csv
└── config-1-groom.yml
This structure is recommended to easily re-use the data
folder for DIMet.
The command line is:
python3 -m tracegroomer --targetedMetabo_path $MEASUREMENTS \
--type_of_file $MY_TYPE_OF_INPUT \
$MY_BASIC_CONFIG
Where :
MEASUREMENTS
is the file that contains the measurements, in absolute path.MY_TYPE_OF_INPUT
corresponds to one of:IsoCor_out_tsv
,VIBMEC_xlsx
,generic_xlsx
We recommend to run a test with the provided examples if this is the first time you use TraceGroomer. Then re-use the organization and the configurations, and modify the command line to be suitable to your data.
Running a test with the provided examples
To perform a test using the examples we provide, please download and uncompress our examples from Zenodo. The structure of the folder is:
examples_TraceGroomer
├── data
│ ├── example-isocor_data
│ │ ├── metadata_1.csv
│ │ └── TRACER_IsoCor_out_example.tsv
│ ├── example-sheet_data
│ │ ├── metadata_3.csv
│ │ └── TRACER_generic_sheet.xlsx
│ └── example-vib_data
│ ├── metadata_2.csv
│ └── TRACER_metabo_2.xlsx
└── groom_files
├── example-isocor
│ ├── amount_material_weightorcells.csv
│ └── config-1-groom.yml
├── example-sheet
│ └── config-3-groom.yml
└── example-vib
├── config-2-groom.yml
├── nbcells-or-amountOfMaterial.csv
└── reject_list.csv
Pick the example most suited to your data:
- IsoCor data (tsv file)
- VIB MEC xlsx file
- or a generic type of xlsx file
Run the script
Note : if the working folder is not the 'home' directory, modify accordingly the absolute paths in the .yml files and in the bash commands.
For IsoCor data:
python3 -m tracegroomer \
--targetedMetabo_path ~/examples_TraceGroomer/data/example-isocor_data/TRACER_IsoCor_out_example.tsv \
--type_of_file IsoCor_out_tsv \
~/examples_TraceGroomer/groom_files/example-isocor/config-1-groom.yml
or, for VIB MEC data:
python3 -m tracegroomer \
--targetedMetabo_path ~/examples_TraceGroomer/data/example-vib_data/TRACER_metabo_2.xlsx \
--type_of_file VIBMEC_xlsx
~/examples_TraceGroomer/groom_files/example-vib/config-2-groom.yml
or, for the generic case:
python3 -m tracegroomer --targetedMetabo_path ~/examples_TraceGroomer/data/example-sheet_data/TRACER_generic_sheet.xlsx \
--type_of_file generic_xlsx
~/examples_TraceGroomer/groom_files/example-sheet/config-3-groom.yml
The output
Thet files are saved in the folder that you specified in the config .yml
file (groom_out_path
field).
The data/[my_dataset]
location is recommended for saving the output.
A total of 4 output files are generated if the absolute isotopologues are provided, otherwise 3 files are generated.
In this way you simply copy the entire data/
content to the folder structure that we want to run with DIMet !
The format of theset files is tab-delimited .csv.
Offered processing by type of input
We have some indications that can slightly differ for users having VIB results as input, users having IsoCor results or users having 'generic' type of data. After consulting the one of your case, please visit Advanced options section for the offered normalisations.
Users having IsoCor results
A short explanation about what TraceGroomer performs automatically as basic formatting:
A typical IsoCor results table is described in: https://isocor.readthedocs.io/en/latest/tutorials.html#output-files It consists of a .tsv file which has in columns the sample, metabolite, isotopologue and all quantifications, and the rows are in piled version (the samples are repeated vertically).
Our script transforms specific columns of that file into tables. As the total metabolite abundance column is not present in the input data, the total abundance per metabolite is the automatic result of the sum, per metabolite, of Isotopologues' absolute values (see AbundanceCorrected
below). So here the correspondances:
column in the IsoCor file | TraceGroomert filename |
---|---|
corrected_area | IsotopologuesAbsolute |
isotopologue_fraction | IsotopologuesProportions |
mean_enrichment | MeanEnrichment |
- | AbundanceCorrected |
We provide the example downloadable from Zenodo (see here)
Users having VIB results
As shown in the example, give the names of the sheets that are present in your excel file coherently in the .yml file.
Our script performs, by default:
- the subtraction of the means of the blanks across all metabolites' abundance for each sample.
- seting to NaN the values of abundance that are under the limit of detection (LOD).
- excluding metabolites whose abundance values across all samples are under LOD (excluded then from all tables by default).
- stomping fractions values to be comprised between 0 and 1 (some negative and some superior to 1 values can occur after correction of naturally occurring isotopologues by certain software dedicated to such corrections)
You can modify all those options depending on your needs, they appear as 'optional arguments' in the help menu.
Users having generic data
We have created this option for those formats that are not the other two scenarios, so your data is expectd to be in the form of a .xlsx file with sheets similar as in the provided in example-sheet
:
- this .xlsx file must NOT contain: formulas, symbols accompanying the numeric values, nor special characters.
- each sheet must correspond to one type of quantification, see Notes below.
- the header (first row, with the molecules IDs), and the first column (with the samples) can contain non numeric values.
- the isotopologues names, in the header, must follow the convention
metaboliteID_labelX
: the substring_label
is compulsory and is located between the metabolite name (or identifier)metaboliteID
and the number of marked carbon atomsX
(e.g.glucose6phosphate_label0
).
Important : If you only have Isotopologue Absolute values, but not the other tables: put them as a single named sheet in your .xlsx file, and TraceGroomer automatically generate all the other types of tables for you !
Notes:
- sheets corresponding to isotopologue Proportions (when available) and isotopologue Absolute values (compulsory if the proportions not available) must have isotopologues as columns and samples as rows.
- sheets corresponding to abundance and mean enrichment (when available) must have metabolites as columns and samples as rows.
- the sheets corresponding to isotopologues measurements must be named with a name containing the string "isotopol". The names of the sheets must be unambiguous.
Advanced options:
We provide advanced options for this script, available for all the types of supported input files; check the help:
python -m tracegroomer --help
they appear as 'optional arguments' in the help menu.
You can:
- normalize by the amount of material (number of cells, tissue weight): setting the path to the file in
--amountMaterial_path
option. The file must be like this csv file, and the first column must contain the same names as in metadata 'original_name'. - normalize by an internal standard (present in your data) at choice: using the advanced option
--use_internal_standard
. - remove metabolites
- print a preview of isotopologues values
Note: Advanced options regarding to detection limit (LOD) and blanks will only have effect in the VIB-MEC type of data.
Getting help
For any information or help running DIMet, you can get in touch with:
LICENSE MIT
Copyright (c) 2024
Johanna Galvis (1,2) deisy-johanna.galvis-rodriguez@u-bordeaux.fr
Benjamin Dartigues (2) benjamin.dartigues@u-bordeaux.fr
Slim Karkar (1,2) slim.karkar@u-bordeaux.fr
Helge Hecht (3,5) helge.hecht@recetox.muni.cz
Bjorn Gruening (4,5) bjoern.gruening@gmail.com
Macha Nikolski (1,2) macha.nikolski@u-bordeaux.fr
(1) CNRS, IBGC - University of Bordeaux,
1, rue Camille Saint-Saens, Bordeaux, France
(2) CBiB - University of Bordeaux,
146, rue Leo Saignat, Bordeaux, France
(3) RECETOX
Faculty of Science, Masaryk University, Kotlářksá 2, 611 37 Brno, Czech Republic
(4) University of Freiburg,
Freiburg, Germany
(5) Galaxy Europe
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file tracegroomer-0.1.1.tar.gz
.
File metadata
- Download URL: tracegroomer-0.1.1.tar.gz
- Upload date:
- Size: 24.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.11.0 Linux/6.2.0-1019-azure
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 87728daf011976062eabb7967b3b286d7905b4cd5b92387bc1b3f07abfd6cf16 |
|
MD5 | 6108fd3ac06aea9b57c8d2752f69a072 |
|
BLAKE2b-256 | f0e20a3557a8ef16123f603affa2052f49ea06dfc50e8a13fb8af64206f5945c |
Provenance
File details
Details for the file tracegroomer-0.1.1-py3-none-any.whl
.
File metadata
- Download URL: tracegroomer-0.1.1-py3-none-any.whl
- Upload date:
- Size: 20.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.11.0 Linux/6.2.0-1019-azure
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 037a9dddecdcfc2d5ccca3b456c583e3be3fff8143b906d3a23a8d3512de9a4c |
|
MD5 | 93606c2e3d7f8bd2e77699001f898af4 |
|
BLAKE2b-256 | a6d533e71d6b1884030c5cb879f569f2c0358fd4efad1b4ae30e8bee66b460c9 |