
Lipid Traffic Analysis

MIT License PyPI Version Python Versions CI/CD codecov Documentation Status Project Status: Active Codestyle: Black Imports: isort

aka LTA, aka LipidTA

A python commandline interface for analysing lipidomics data.

The source code lives on GitHub.

The documentation lives at ReadTheDocs.

The project can be installed from PyPI.

Abstract

Lipid Traffic Analysis (LTA) is a tool for using lipidomics data to test hypotheses about how metabolism is controlled. Lipidomics data from several metabolically connected tissues from control and experimental groups can be used to plot the spatial or temporal distribution of lipids. These distributions identify where changes in lipid metabolism occur and in which lipid pathways, indicating the locus and biochemical alterations that occur in a given group. LTA was conceived in two parts. The first is an Abundance Analysis, in which the error-normalised fold change (ENFC) between the control group and a given experimental group is calculated. Because the ratio of the control and experimental values is scaled by the error, the ENFCs are easy to plot and compare between compartments. The second part is a Switch Analysis. This computes the presence of variables across the network. Current development is focused on developing the technique for complex networks and on the rate of lipid transport.

Using LTA from the command line

Installation

Installing from PyPI

This is the most straightforward way to set up the tool. When installing from PyPI, we strongly recommend using a virtual environment. There are many ways to do this! If you already have a preferred method - I use pipx for command line tools - feel free to use that. Otherwise, use the builtin Python module venv. The exact instructions are OS-specific and detailed at the above link. Instructions for installing the most recent version of LTA on macOS are given below:

# Make a directory for the project
mkdir lta && cd lta
# Create the virtual environment
python3 -m venv .venv
# Activate the environment
source .venv/bin/activate
# Install lta
pip install -U LipidTA

Our pip package is `LipidTA`. Unfortunately, `lta` was "too similar to existing package names", so PyPI wouldn't let us use it.

If you want to install a specific version, then change the last line in the previous code block to:

pip install LipidTA==0.12.1

replacing the version number with the version number you want. A list of all released versions can be found at our tags.

Installing from Source

Most users **will not need** these instructions.

If you need to customise the code in some manner, you'll need to install from source. To do that, either clone the repository from GitHub, or download one of our releases. For full instructions, please see our guide on contributing.

(data)=

The Data

The input should be a CSV file containing the lipidomics results. Though we strive to be as flexible as possible, we must make some assumptions about the data to be able to use it. Firstly, the first 3 columns must be the multiindex for the lipids, and include the lipid name, category, and m/z, respectively. Secondly, the values must be numeric.

The analysis depends on a number of key metadata variables, namely:

  • Mode (e.g. -ve vs +ve)
  • Sample ID
  • group (e.g. lean vs obese)
  • Tissue (e.g. heart)

Additionally, "group" should be binary - that is, there should only be two categories - and the order for fold change calculation must be specified with --order Cond1 Cond2. Fold Change will always be calculated as Cond1 / Cond2.

These rows should be in the first n rows of your data file, where n is specified with the option --n-rows-metadata. You can name these metadata rows whatever you want in the data file, and tell lta where to find them with the appropriate flags. Please see the section on customising your run. However, if these data are not present, the tool will not run, as the analysis only makes sense in the context of these variables.
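To make the expected layout concrete, here is a minimal sketch that splits a small in-memory file into its metadata rows and its lipid rows. It is an illustration only, not how lta reads files: the sample values, the lipid names, the helper `split_table`, and the choice to put each metadata label in the third column are all assumptions made for this example.

```python
import csv
import io

# Hypothetical file: 4 metadata rows, then one row per lipid.
# The first 3 columns hold the lipid name, category and m/z.
# Assumption: each metadata label sits in the third column.
RAW = """\
,,Mode,-ve,-ve,-ve,-ve
,,SampleID,m1,m2,m3,m4
,,group,lean,lean,obese,obese
,,Tissue,heart,heart,heart,heart
lipid_1,PC,700.5,10.5,12.0,0,8.2
lipid_2,TG,850.7,3.1,0,0,4.4
"""


def split_table(text, n_rows_metadata, n_index_cols=3):
    """Split raw CSV text into a metadata dict and a dict of numeric values."""
    rows = list(csv.reader(io.StringIO(text)))
    meta_rows = rows[:n_rows_metadata]
    data_rows = rows[n_rows_metadata:]
    # Metadata label -> one entry per sample column.
    metadata = {r[n_index_cols - 1]: r[n_index_cols:] for r in meta_rows}
    # (name, category, m/z) -> numeric values, one per sample column.
    values = {
        tuple(r[:n_index_cols]): [float(v) for v in r[n_index_cols:]]
        for r in data_rows
    }
    return metadata, values


metadata, values = split_table(RAW, n_rows_metadata=4)
print(metadata["group"])
print(values[("lipid_1", "PC", "700.5")])
```

With a file like this one, you would pass 4 to --n-rows-metadata.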

Should we make any changes to data format expectations,
they will be well documented and will only occur in a major/breaking release.

Running the analysis

Once you've installed the tool and activated your virtual environment, running the analysis can be as simple as:

lta data.csv results

The first argument is the path to the combined input file. If the file doesn't exist, is a directory, or doesn't contain any data, the command will error with an appropriate message. The second argument identifies a folder in which the results will be saved. It will be created if it doesn't exist.

While it can be that simple, you'll likely have to customise some options for your run. In that case, it will likely look a bit more like:

lta data.csv results \
--n-rows-metadata 11 \
--group Group \
--order obese lean \
--tissue Compartment \
--sample-id mouse

Don't worry if it looks intimidating! You can check out the section on customising your run for further details, and help can always be found at our documentation or from the command line with:

lta -h

Alternatively, you might prefer to use a configuration file to keep things simple. In that case, see the section on configuration for more information.

(customising)=

Customising

There are a few options that can be customised for any given run. The statistics are calculated using a bootstrapping approach, which (by definition) involves repeated replicates. To control the number of replicates, pass the -b/--boot-reps flag with a number. Generally, more reps improve the accuracy of the estimates, though I find little improvement beyond 20,000 reps. 1000 (the default number) seems to provide a good balance between speed and accuracy.
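If bootstrapping is new to you, the sketch below shows the general idea of resampling with replacement and how the number of replicates enters. It is a generic illustration, not the statistic lta computes: the function `bootstrap_mean`, the seed, and the sample values are made up for this example.

```python
import random
import statistics


def bootstrap_mean(values, boot_reps=1000, seed=0):
    """Estimate the mean and its spread by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(values)
    estimates = []
    for _ in range(boot_reps):
        resample = rng.choices(values, k=n)  # draw n values with replacement
        estimates.append(statistics.fmean(resample))
    # Mean of the replicate means, and their standard deviation.
    return statistics.fmean(estimates), statistics.stdev(estimates)


signal = [10.5, 12.0, 9.8, 11.1, 10.2]  # hypothetical measurements
for reps in (100, 1000):
    mean, spread = bootstrap_mean(signal, boot_reps=reps)
    print(reps, round(mean, 3), round(spread, 3))
```

More replicates make the estimate of the spread more stable, at the cost of a longer run.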

A critical step of the analysis is binarizing the lipid expression. A lipid is classed as 0 in a tissue/condition if the lipid is not detected in more than a particular fraction of samples. The default value is 0.2 (one-fifth of the samples). If you want to change it, pass the -t/--threshold flag with a decimal between 0 and 1. This value can have a significant impact on the analysis, so explore how it impacts your data!
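As a rough sketch of what the threshold does, the function below follows one reading of the rule above: a lipid is called 0 when the fraction of samples in which it is not detected is greater than the threshold. Treating a value of 0 as "not detected", and the helper name `binarize`, are assumptions for this example rather than a description of lta's internals.

```python
def binarize(samples, threshold=0.2):
    """Return 0 if the lipid is undetected in more than `threshold` of samples, else 1.

    Assumption: a measured value of 0 means "not detected".
    """
    undetected = sum(1 for s in samples if s == 0)
    return 0 if undetected / len(samples) > threshold else 1


# Hypothetical lipid, undetected in 1 of 5 samples (a fraction of 0.2).
heart_lean = [10.5, 0.0, 9.8, 11.1, 10.2]
print(binarize(heart_lean, threshold=0.2))  # 0.2 is not more than 0.2
print(binarize(heart_lean, threshold=0.1))  # 0.2 is more than 0.1
```

Trying a few thresholds on a handful of your own lipids is a quick way to see how sensitive the classification is.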

Many calculations depend on knowing where certain metadata is stored, namely the experimental conditions (specified with --group), the tissue of origin (specified with --tissue), the sample ID (specified with --sample-id), and the lipidomics mode (specified with --mode). If these are not passed, they default to "group", "Tissue", "SampleID", and "Mode" respectively. To find these rows, we also need to know the number of lines in your column metadata. This is specified with --n-rows-metadata. Please see the section on expected data file structure for more information.

For the fold-change calculation in ENFC to make any sense, we need to know which group in group is which. You can specify this using the --order option like so:

lta data results --order obese lean

The first word following --order will be treated as the experimental group, while the second word will be treated as the control group. In this example then, fold-change would be given as obese / lean. If you don't specify, this defaults to experimental / control.
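To show how the order decides the direction of the ratio, here is a small sketch. It computes a plain fold change of group means and leaves out the error normalisation that ENFC applies. Using the mean as the summary, the helper name `fold_change`, and the sample values are assumptions for this example. Returning NaN when either value is 0 mirrors the note in the output section.

```python
import math
import statistics


def fold_change(values_by_group, order):
    """Plain fold change order[0] / order[1] of group means (no error scaling)."""
    numerator = statistics.fmean(values_by_group[order[0]])
    denominator = statistics.fmean(values_by_group[order[1]])
    # Fold change is only meaningful if both values are non-zero.
    if numerator == 0 or denominator == 0:
        return math.nan
    return numerator / denominator


# Hypothetical measurements of one lipid in one tissue.
measurements = {"obese": [4.0, 6.0], "lean": [2.0, 3.0]}
print(fold_change(measurements, order=("obese", "lean")))
print(fold_change(measurements, order=("lean", "obese")))
```

Swapping the two words passed to --order inverts the ratio, so it is worth double-checking which group you mean to treat as the control.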

(configuration)=

Configuration files

If you find yourself regularly passing arguments via the CLI, you might want to try a configuration file! This is a simple text file that stores options in a simple format:

option=value

By default, LTA looks for lta_conf.txt in your current directory. However, you can name this file whatever you want, and let LTA know where to find it, by passing the config flag like so:

lta -c path/to/your/config.txt data results

If you specify an option in the configuration file, that will override LTA's defaults, and specifying an option at the command line will override the configuration file! The config file doesn't need to exist, however, and is just a bit of sugar.
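The precedence described above (defaults, then configuration file, then command line) can be sketched as a simple dictionary merge. This is an illustration of the ordering only, not LTA's actual option handling: the helpers `parse_config` and `resolve` are hypothetical, and the sketch assumes the option names in the file mirror the long flag names.

```python
def parse_config(text):
    """Parse lines of the form option=value into a dict, skipping blank lines."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        options[key.strip()] = value.strip()
    return options


def resolve(defaults, config, cli):
    """Later sources win: defaults < configuration file < command line."""
    return {**defaults, **config, **cli}


# Defaults taken from this README; key names are assumed for the example.
defaults = {"threshold": "0.2", "boot-reps": "1000", "group": "group"}
config = parse_config("threshold=0.3\ngroup=Group\n")
cli = {"threshold": "0.5"}  # as if -t 0.5 had been passed

settings = resolve(defaults, config, cli)
print(settings)
```

Here the threshold comes from the command line, the group name from the file, and the number of replicates from the defaults.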

The Output

Re-running the analysis overwrites existing results,
so be sure to either back up your data,
or pass a different output folder!

The output folder will contain 5 files. For each type of lipid, you should see the following:

  1. enfc_individual_lipids.csv - the ENFC results for each lipid.
  2. enfc_lipid_classes.csv - the mean and St.Dev. of ENFC, grouped by lipid class.
  3. switch_individual_lipid.csv - a table of lipids and their A/B/U/N classification.
  4. switch_lipid_classes.csv - a table counting the frequency of each lipid class within the A/B/U/N classification.
  5. jaccard_similarity.csv - the Jaccard similarity and p-value for each lipid class.

A few notes! Fold change will always be order[0] / order[1]. The Jaccard similarities are calculated between conditions specified in --group across both tissues and lipid classes. The p-values for these similarities are calculated using the method outlined by N. Chung et al. For ENFC, fold change is only meaningful if both values are non-0. Where this is not true, NaN has been substituted, leaving an empty cell in the output.
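For readers unfamiliar with the Jaccard similarity, it is the size of the intersection of two sets divided by the size of their union. The sketch below applies that definition to the sets of lipids called present in each condition. The lipid names are made up, returning NaN for two empty sets is a choice made for this example, and the p-value calculation is not covered here.

```python
import math


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption for this sketch: undefined when both sets are empty.
        return math.nan
    return len(a & b) / len(union)


# Hypothetical lipids called present (1) in one tissue and lipid class.
lean = {"lipid_a", "lipid_b", "lipid_c"}
obese = {"lipid_b", "lipid_c", "lipid_d"}
print(jaccard(lean, obese))  # 2 shared lipids out of 4 in total
```

A value of 1 means both conditions contain exactly the same lipids, while 0 means they share none.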

Contributing

Open-source software is only open-source because of the excellent community, so we welcome any and all contributions! If you think you have found a bug, please log a report in our issues. If you think you can fix a bug, or have an idea for a new feature, please see our guide on contributing for more information on how to get started! While here, we request that you follow our code of conduct to help maintain a welcoming, respectful environment.

Future Developments

  • Improve GitHub Actions to use caching for poetry and Nox
  • Increase test coverage
  • Automate plotting

Citations

If you use LTA in your work, please cite the following manuscripts:

  1. Furse, S., Watkins, A.J., Hojat, N. et al. Lipid Traffic Analysis reveals the impact of high paternal carbohydrate intake on offsprings’ lipid metabolism. Commun Biol 4, 163 (2021). https://doi.org/10.1038/s42003-021-01686-1
  2. Furse, S.[^eq], Fernandez-Twinn, D.S.[^eq], et al. Lipid metabolism is dysregulated before, during and after pregnancy in a mouse model of gestational diabetes. Int. J. Mol. Sci. 22, 7452 (2021). https://doi.org/10.3390/ijms22147452

[^eq]: These authors contributed equally to this work.
