Lipid Trafficking Analysis
Lipid Traffic Analysis
aka LTA, aka LipidTA
A Python command-line interface for analysing lipidomics data.
The source code lives on GitHub.
The documentation lives at ReadTheDocs.
The project can be installed from PyPI.
Abstract
Lipid Traffic Analysis (LTA) is a tool for using lipidomics data to test hypotheses about how metabolism is controlled. Lipidomics data from several metabolically connected tissues, collected from control and experimental groups, can be used to plot the spatial or temporal distribution of lipids. These distributions identify where changes in lipid metabolism occur and in which lipid pathways, indicating the locus and nature of the biochemical alterations that occur in a given phenotype. LTA was conceived in two parts. The first is an Abundance Analysis, in which the error-normalised fold change (ENFC) between the control and phenotype groups is calculated. Because the ratio of experimental to control values is scaled by the error, ENFCs are easy to plot and compare between compartments. The second part is a Switch Analysis, which computes the presence or absence of variables across the network. Current development focuses on extending the technique to complex networks and to the rate of lipid transport.
Using LTA from the command line
Installation
Installing from PyPI
This is the most straightforward way to set up the tool. When installing from PyPI, we strongly recommend using a virtual environment. There are many ways to do this! If you already have a preferred method (I use pipx for command-line tools), feel free to use that. Otherwise, use Python's built-in venv module. The exact instructions are OS-specific and detailed in the venv documentation. Instructions for installing the most recent version of LTA on macOS are given below:
```bash
# Make a directory for the project
mkdir lta && cd lta
# Create the virtual environment
python3 -m venv .venv
# Activate the environment
source .venv/bin/activate
# Install lta
pip install -U LipidTA
```
Our pip package is `LipidTA`.
Unfortunately,
`lta` was "too similar to existing package names",
so PyPI wouldn't let us use it.
If you want to install a specific version, then change the last line in the previous code block to:
```bash
pip install LipidTA==0.12.1
```
replacing `0.12.1` with the version you want. A list of all released versions can be found at our tags.
Installing from Source
Most users **will not need** these instructions.
If you need to customise the code in some manner, you'll need to install from source. To do that, either clone the repository from GitHub or download one of our releases. For full instructions, please see our guide on contributing.
(data)=
The Data
This should be a single CSV file where the first 11 rows contain sample metadata and the first 3 columns contain lipid metadata. Within the sample metadata, rows 4-9 should contain the:
- Mode (e.g. -ve vs +ve)
- Sample ID
- Phenotype (e.g. lean vs obese)
- Generation (e.g. F1 vs F2)
- Tissue (e.g. heart)
- Handling (any notes about sample prep)
respectively.
You can name these metadata rows whatever you want,
and tell lta
where to find them with the appropriate flags.
Please see the section on customising your run.
In order to read the data,
some assumptions about the format must be made.
Should we make any changes to data format expectations,
they will be well documented and will only occur in a major/breaking release.
We hope to generalise file reading in a future release
to improve usability.
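To make the layout above concrete, here is a minimal Python sketch of splitting such a file into its three blocks. It is not LTA's reader: the helper name `split_lta_csv` is hypothetical, it assumes each metadata row's name sits in the first column, and it treats empty cells as 0.0 (not detected).

```python
import csv
import io

# Layout described above: the first 11 rows hold sample metadata and the
# first 3 columns hold lipid metadata.
# ASSUMPTION: each metadata row's name is in the first column.
N_META_ROWS = 11
N_LIPID_COLS = 3


def split_lta_csv(text):
    """Split an LTA-style CSV into (sample_meta, lipid_meta, values).

    Hypothetical helper for illustration; not part of the lta package.
    """
    rows = list(csv.reader(io.StringIO(text)))
    meta_rows = rows[:N_META_ROWS]
    data_rows = rows[N_META_ROWS:]

    # Sample metadata: {row name: [one value per sample column]}
    sample_meta = {row[0]: row[N_LIPID_COLS:] for row in meta_rows}
    # Lipid metadata: the first 3 columns of every data row
    lipid_meta = [row[:N_LIPID_COLS] for row in data_rows]
    # Abundances: everything right of the lipid metadata.
    # ASSUMPTION: an empty cell means "not detected" and becomes 0.0.
    values = [
        [float(x) if x else 0.0 for x in row[N_LIPID_COLS:]]
        for row in data_rows
    ]
    return sample_meta, lipid_meta, values
```

With this layout, looking up a metadata row by its name (for example `sample_meta["Tissue"]`) gives one entry per sample column, which is the information the `--tissue`, `--phenotype` and `--mode` flags point LTA towards.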
Running the analysis
Once you've installed the tool and activated your virtual environment, running the analysis is as simple as:
```bash
lta data.csv results
```
The first argument is the path to the combined input file. If the file doesn't exist, is a directory, or contains no data, the command will exit with an appropriate error message. The second argument is the folder in which the results will be saved; it will be created if it doesn't exist.
If you ever have any questions about the tool, you can access a condensed help menu by running:
```bash
lta -h
```
(customising)=
Customising
There are a few options that can be customised for any given run.
The statistics are calculated using a bootstrapping approach,
which (by definition) involves many resampled replicates.
To control the number of replicates,
pass the `-b/--boot-reps`
flag with a number.
Generally, more reps improve the accuracy of the estimates,
though I find little improvement beyond 20,000 reps.
The default of 1,000 reps seems to provide a good balance between speed and accuracy.
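For readers unfamiliar with bootstrapping, the sketch below shows what the number of replicates controls in a generic nonparametric bootstrap: each replicate resamples the data with replacement, and the spread of the replicate statistics estimates the error. The function `bootstrap_se` is hypothetical and uses the mean as its statistic; LTA's actual statistic and resampling scheme may differ.

```python
import random
import statistics


def bootstrap_se(samples, boot_reps=1000, seed=None):
    """Bootstrap estimate of the standard error of the mean.

    Generic nonparametric bootstrap for illustration only; boot_reps must be >= 2.
    """
    rng = random.Random(seed)
    n = len(samples)
    # Each replicate resamples the data with replacement and records its mean
    means = [statistics.fmean(rng.choices(samples, k=n)) for _ in range(boot_reps)]
    # The spread of the replicate means estimates the standard error
    return statistics.stdev(means)
```

Raising `boot_reps` makes this estimate less noisy from one run to the next at the cost of more computation, which is the speed/accuracy trade-off described above.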
A critical step of the analysis is binarizing the lipid expression.
A lipid is classed as 0 in a tissue/condition if
the lipid is not detected in more than a particular fraction of samples.
The default value is 0.2 (one-fifth of the samples).
If you want to change it,
pass the -t/--threshold
flag with a decimal between 0 and 1.
This value can have a significant impact on the analysis,
so explore how it impacts your data!
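The threshold rule can be sketched as follows. This is an assumption-laden illustration rather than LTA's implementation: the helper `binarise` is hypothetical, it counts a sample as "detected" when its value is greater than 0, and it reads the rule as "present only if detected in more than `threshold` of the samples".

```python
def binarise(values, threshold=0.2):
    """Return 1 if a lipid counts as present in one tissue/condition group, else 0.

    ASSUMPTIONS: "detected" means a value > 0, and the lipid is present only
    when the fraction of samples in which it was detected exceeds `threshold`.
    """
    if not values:
        return 0
    detected = sum(1 for v in values if v > 0)
    return int(detected / len(values) > threshold)
```

With five samples and the default of 0.2, a lipid seen in only one sample sits exactly at the threshold and is classed as 0, yet the same lipid is classed as 1 with a threshold of 0.1. Small changes to this value can flip many lipids, which is why it is worth exploring.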
Many calculations depend on knowing where certain metadata is stored:
namely, the experimental conditions (specified with `--phenotype`),
the tissue of origin (specified with `--tissue`),
and the lipidomics mode (specified with `--mode`).
If these are not passed,
they default to "Phenotype", "Tissue", and "Mode" respectively.
Please see the section on the expected data file structure for more information.
For the fold-change calculation in ENFC to make any sense,
we need to know which group in the phenotype metadata
is which.
You can specify this using the `--order`
option like so:
```bash
lta data results --order obese lean
```
The first word following `--order` will be treated as the experimental group,
while the second word will be treated as the control group.
In this example, then,
fold change would be given as `obese / lean`.
If you don't specify,
this defaults to `experimental / control`.
If you find yourself regularly passing arguments via the CLI, you might want to try a configuration file! This is a plain text file that stores options in a simple format:
```
option=value
```
By default,
LTA looks for lta_conf.txt
in your current directory.
However,
you can name this file whatever you want,
and let LTA know where to find it,
by passing the config flag like so:
```bash
lta -c path/to/your/config.txt data results
```
If you specify an option in the configuration file, it will override LTA's defaults, and specifying an option at the command line will override the configuration file! The config file doesn't need to exist, however; it is just a bit of sugar.
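The precedence rules above can be sketched in a few lines of Python. The helpers `read_config` and `resolve_options` are hypothetical, as are the option names in the test values; the sketch also assumes one `option=value` pair per line and that lines starting with `#` are comments, which may not match LTA's own parser.

```python
def read_config(text):
    """Parse option=value lines into a dict (hypothetical, not LTA's parser).

    ASSUMPTIONS: one option per line; blank lines and lines starting with '#'
    are ignored; whitespace around keys and values is stripped.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # skip lines without an '=' rather than failing
            options[key.strip()] = value.strip()
    return options


def resolve_options(defaults, config, cli):
    """Later sources win: defaults < config file < command line."""
    merged = dict(defaults)
    merged.update(config)
    # Only options actually given on the command line (not None) override
    merged.update({k: v for k, v in cli.items() if v is not None})
    return merged
```

For example, with `threshold=0.3` in the config file and `-t 0.1` on the command line, the command-line value wins, while an option set only in the config file still overrides the default.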
The Output
Re-running the analysis overwrites existing results,
so be sure to either back up your data,
or pass a different output folder!
The output folder will contain 5 files. For each type of lipid, you should see the following:
- `enfc_individual_lipids.csv`: the ENFC results for each lipid.
- `enfc_lipid_classes.csv`: the mean and St.Dev. of ENFC, grouped by lipid class.
- `switch_individual_lipid.csv`: a table of lipids and their A/B/U/N classification.
- `switch_lipid_classes.csv`: a table counting the frequency of each lipid class within the A/B/U/N classification.
- `jaccard_similarity.csv`: the Jaccard similarity and p-value for each lipid class.
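As an illustration of what the class-level ENFC summary contains, the sketch below groups per-lipid values by class and reports the mean and sample standard deviation. The function `summarise_by_class` and its input shape are hypothetical, and skipping NaN values (which arise when a fold change is undefined) is an assumption about how LTA handles them.

```python
import math
import statistics
from collections import defaultdict


def summarise_by_class(enfc_by_lipid):
    """Mean and standard deviation of ENFC per lipid class.

    `enfc_by_lipid` maps (lipid_class, lipid_name) -> ENFC value.
    ASSUMPTION: NaN values are skipped. Illustrative only; the columns of
    enfc_lipid_classes.csv may differ.
    """
    groups = defaultdict(list)
    for (lipid_class, _name), value in enfc_by_lipid.items():
        if not math.isnan(value):
            groups[lipid_class].append(value)
    summary = {}
    for lipid_class, values in groups.items():
        mean = statistics.fmean(values)
        # The sample standard deviation needs at least two values
        sd = statistics.stdev(values) if len(values) > 1 else math.nan
        summary[lipid_class] = (mean, sd)
    return summary
```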
A few notes!
Fold change will always be `order[0] / order[1]`.
The Jaccard distances are calculated between conditions specified in `--phenotype`,
across both tissues and lipid classes.
The p-values for these distances are calculated using the method outlined by
N. Chung et al.
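The Jaccard similarity of two presence/absence vectors is the number of lipids present in both divided by the number present in either; the Jaccard distance is one minus that value. The sketch below computes it and, purely for illustration, attaches a simple permutation-test p-value. That test is swapped in for brevity and is **not** the method of Chung et al. that LTA uses; the function names are hypothetical.

```python
import random


def jaccard(a, b):
    """Jaccard similarity of two equal-length 0/1 presence vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # ASSUMED convention: two all-absent vectors get a similarity of 0
    return both / either if either else 0.0


def permutation_pvalue(a, b, n_perm=10_000, seed=None):
    """One-sided p-value for a similarity at least as large as observed.

    A simple permutation test, NOT the Chung et al. method used by LTA.
    """
    rng = random.Random(seed)
    observed = jaccard(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any association between a and b
        if jaccard(a, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)
```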
For ENFC,
fold change is only meaningful if both values are non-zero.
Where this is not true,
NaN has been substituted,
leaving an empty cell in the output.
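The ordering and zero-handling rules above amount to something like the following sketch. The helper `fold_change` is hypothetical, and it computes a plain fold change only; the error normalisation that turns it into an ENFC is not reproduced here.

```python
import math


def fold_change(means, order=("experimental", "control")):
    """Fold change as order[0] / order[1]; NaN unless both values are non-zero.

    `means` maps each phenotype group to a value for one lipid in one tissue.
    Plain fold change only: no error normalisation is applied.
    """
    numerator, denominator = means[order[0]], means[order[1]]
    if numerator == 0 or denominator == 0:
        return math.nan
    return numerator / denominator
```

Calling it with `order=("obese", "lean")` mirrors `--order obese lean`, so the result reads as obese relative to lean.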
Contributing
Open-source software is only open-source because of the excellent community, so we welcome any and all contributions! If you think you have found a bug, please log a report in our issues. If you think you can fix a bug, or have an idea for a new feature, please see our guide on contributing for more information on how to get started! While here, we request that you follow our code of conduct to help maintain a welcoming, respectful environment.
Future Developments
- Improve GitHub Actions to use caching for Poetry and Nox
- Increase test coverage
- Automate plotting
Citations
If you use LTA in your work, please cite the following manuscripts:
- Furse, S., Watkins, A.J., Hojat, N. et al. Lipid Traffic Analysis reveals the impact of high paternal carbohydrate intake on offsprings’ lipid metabolism. Commun Biol 4, 163 (2021). https://doi.org/10.1038/s42003-021-01686-1
- Furse, S.[^eq], Fernandez-Twinn, D.S.[^eq], et al. Lipid metabolism is dysregulated before, during and after pregnancy in a mouse model of gestational diabetes. Int. J. Mol. Sci. 22, 7452 (2021). https://doi.org/10.3390/ijms22147452
[^eq]: These authors contributed equally to this work.