xiRT · PyPI

xiRT: Multi-dimensional Retention Time Prediction for Linear and Crosslinked Peptides.



A Python package for multi-dimensional retention time prediction for linear and crosslinked peptides using a (Siamese) deep neural network architecture.

Overview
Description
Installation

Overview

xiRT is a deep learning tool to predict the separation behavior (i.e. retention times) of linear and crosslinked peptides in one or more fractionation dimensions, including RP (typically directly coupled to the mass spectrometer). xiRT was developed to predict retention times for multi-dimensional separations combining SCX / hSAX / RP chromatography. However, xiRT supports all available chromatographic and other peptide separation methods.

xiRT requires the columns shown in the table below. Importantly, the xiRT framework requires that CSMs are sorted such that, in each Peptide1 - Peptide2 pair, Peptide1 is the longer or lexicographically larger peptide for crosslinked RT predictions. The sorting is done internally and may result in swapped peptide sequences in the output tables.
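The ordering rule can be illustrated with a small sketch. This is not xiRT's internal code: the function names are hypothetical, and it assumes that "longer or lexicographically larger" means length is compared first and plain string comparison breaks ties.

```python
def order_peptides(peptide1, peptide2):
    """Return (first, second, swapped) with the longer peptide first.

    Assumption: ties in length are broken by plain string comparison.
    """
    if not peptide2:  # linear peptide: nothing to swap
        return peptide1, peptide2, False
    # Compare by (length, sequence); swap when peptide2 ranks higher.
    swapped = (len(peptide2), peptide2) > (len(peptide1), peptide1)
    if swapped:
        return peptide2, peptide1, True
    return peptide1, peptide2, False


def order_row(row):
    """Swap every '...1'/'...2' column pair of a CSM row if the peptides swap."""
    ordered = dict(row)
    _, _, swapped = order_peptides(row["Peptide1"], row.get("Peptide2", ""))
    if swapped:
        for name in row:
            partner = name[:-1] + "2"
            if name.endswith("1") and partner in row:
                ordered[name], ordered[partner] = row[partner], row[name]
    return ordered
```

For a row with Peptide1 = ELRVIS and Peptide2 = PEPRTIDER, this sketch would swap both the sequences and companion columns such as LinkPos1/LinkPos2, while leaving unpaired columns such as PSMID untouched.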

xiRT Architecture

Description

xiRT is meant to generate additional information about CSMs for machine learning-based rescoring frameworks, but its usage can be extended to spectral libraries, targeted acquisitions, etc. xiRT therefore offers several training / prediction modes that need to be configured depending on the use case. At the moment, the supported modes are training, prediction, and crossvalidation.

training: train xiRT on the input CSMs (using 10% for validation) and store the trained model
prediction: use a pretrained model to predict RTs for the input CSMs
crossvalidation: load/train a model and predict RTs for all data points without using them in the training process. This requires training several models during CV.

Note: all modes can be supplemented by using a pretrained model ("transfer learning") when not enough training data is available to achieve robust prediction performance.
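The idea behind the crossvalidation mode, that every data point is predicted by a model that never saw it during training, can be sketched as follows. This is only an illustration of the general k-fold scheme, not xiRT's implementation; the function name, the number of folds, and the reuse of a 10% validation split inside each fold are assumptions.

```python
import random


def kfold_indices(n_items, n_splits=3, seed=42):
    """Yield (train, validation, prediction) index lists, one per fold."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Every index lands in exactly one fold.
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for k, predict_idx in enumerate(folds):
        rest = [i for j, fold in enumerate(folds) if j != k for i in fold]
        n_val = max(1, len(rest) // 10)  # assumption: roughly 10% for validation
        yield rest[n_val:], rest[:n_val], predict_idx
```

One model would be trained per fold on the train indices, monitored on the validation indices, and used to predict only the held-out prediction indices, which is why crossvalidation requires training several models.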

This readme only gives a brief overview of xiRT's functions and parameters. Please refer to the documentation for more details and examples.

Installation and Usage

xiRT is a Python package that comes with an executable Python file. To run xiRT, follow the steps below.

Requirements

xiRT requires a running Python installation on Windows/Mac/Linux. All further requirements are managed during the installation process via pip or conda. xiRT was tested using Python >3.7 with TensorFlow 1.4 and Python >3.8 with TensorFlow >2.0. A GPU is not mandatory to run xiRT; however, it can greatly decrease runtime. Further system requirements depend on the data sets to be used.

Installation

To install xiRT, simply run the commands below. We recommend using an isolated Python environment, for example via pipenv or conda. Installation should finish within minutes.

Using pipenv:

pipenv shell

pip install xirt

Optional: To enable CUDA support, using a conda environment is the easiest solution.
Conda will take care of the CUDA libraries and other dependencies. Note that xiRT runs on either CPUs or GPUs. To use a GPU, specify CuDNNGRU/CuDNNLSTM as the type in the LSTM settings; to use a CPU, set the type to GRU/LSTM.

conda create --name xirt_env python=3.8

conda activate xirt_env

pip install xirt
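The choice between the CPU and GPU layer types mentioned above could be automated with a small helper before editing the config. This is a hedged sketch: the function name is hypothetical, it only returns the type string, and the GPU detection relies on `tf.config.list_physical_devices`, which is available in recent TensorFlow 2 releases. If TensorFlow is missing or too old, the sketch falls back to the CPU type.

```python
def recurrent_layer_type(cell="GRU", use_gpu=None):
    """Return the layer type string: CuDNN variant on GPU, plain one on CPU."""
    if cell not in ("GRU", "LSTM"):
        raise ValueError("cell must be 'GRU' or 'LSTM'")
    if use_gpu is None:
        try:
            import tensorflow as tf  # optional, only used for detection
            use_gpu = bool(tf.config.list_physical_devices("GPU"))
        except Exception:
            use_gpu = False  # no TensorFlow or no detection API: assume CPU
    return "CuDNN" + cell if use_gpu else cell
```

The returned string would then be entered by hand as the type in the LSTM settings of the xiRT config.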

Hint: The plotting functionality for the network is not enabled by default because pydot and graphviz sometimes cause trouble when they are installed via pip. On Linux, simply use sudo apt-get install graphviz. On Windows, download the latest graphviz package from here, unzip the content of the file, and add the bin directory path to the Windows PATH variable. These two packages allow the visualization of the neural network architecture. xiRT also works without this functionality.

Older versions of TensorFlow require the separate installation of tensorflow-gpu. We recommend installing TensorFlow via conda, especially if GPU usage is desired.

General Usage

This section explains the general usage of xiRT via the command line. A minimal working example in a quick-start guide fashion is available here.

The command line interface (CLI) requires three inputs:

input PSM/CSM file
a YAML file to configure the neural network architecture
another YAML file to configure the general training / prediction behaviour, called setup-config

Configs are available via GitHub. Alternatively, up-to-date configs can be generated from the xiRT package itself:

xirt -p learning_params.yaml

xirt -s xirt_params.yaml

To use these two parameter files in xiRT and store the results in a directory called out_dir,
run the following command:

xirt -i psms.csv -o out_dir -x xirt_params.yaml -l learning_params.yaml
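When xiRT is part of a larger pipeline, the same call can be issued from Python. The sketch below only wraps the command shown above using the standard library; the helper names are hypothetical, and it assumes the xirt executable is on the PATH of the active environment.

```python
import subprocess


def build_xirt_command(psms, out_dir, xirt_params, learning_params):
    """Assemble the xirt call with the -i, -o, -x and -l options."""
    return [
        "xirt",
        "-i", str(psms),
        "-o", str(out_dir),
        "-x", str(xirt_params),
        "-l", str(learning_params),
    ]


def run_xirt(psms, out_dir, xirt_params, learning_params):
    """Run xirt and raise CalledProcessError if it exits with an error."""
    cmd = build_xirt_command(psms, out_dir, xirt_params, learning_params)
    return subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file paths that contain spaces.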

To adapt the xiRT parameters to your needs, edits to the YAML config file are needed. The configuration file is used to determine the prediction task (rp, scx, hsax, ...) but also to set important network parameters (number of neurons, layers, regularization). While the default network configuration offers suitable parameters for most situations, the prediction tasks need further adjustments. The adjustments need to account for the type and number of prediction tasks. Please visit the documentation to get more information about viable configurations.

Once xiRT is running, the progress is logged to the terminal as well as to a dedicated log file. This log file summarizes the training steps and contains important information (settings, file paths, metrics). Further output files and quality control plots are then stored in the specified output (-o) directory. Find a description of the files here.

Input format

short name	explicit column name	description	Example
peptide sequence 1	Peptide1	First peptide sequence for crosslinks	PEPRTIDER
peptide sequence 2	Peptide2	Second peptide sequence for crosslinks, or empty	ELRVIS
fasta description 1	Fasta1	FASTA header / description of protein 1	SUCD_ECOLI Succinate--CoA ligase [ADP-forming]
fasta description 2	Fasta2	FASTA header / description of protein 2	SUCC_ECOLI Succinate--CoA ligase [ADP-forming]
PSMID	PSMID	A unique identifier for the identification	1
link site 1	LinkPos1	Crosslink position in the first peptide (0-based)	3
link site 2	LinkPos2	Crosslink position in the second peptide (0-based)	2
score	score	Single score from the search engine	17.12
unique id	PSMID	A unique index for each entry in the result table	0
TT	isTT	Binary column which is True for any TT	True
TD	isTD	Binary column which is True for any TD	True
DD	isDD	Binary column which is True for any DD	True
fdr	fdr	Estimated false discovery rate	0.01

The first four columns should be self-explanatory; if not, check the sample input. The fifth column ("PSMID") is a unique(!) integer that can be used to retrieve CSMs/PSMs. In addition, depending on the number of retention time domains that should be learned/predicted, the corresponding RT columns need to be present. The column names need to match the configuration in the network parameter YAML. Note that xiRT swaps the sequences such that Peptide1 is longer than Peptide2. To keep track of this process, all columns that follow the 1/2 naming convention are swapped as well. Make sure to only have such paired columns and no single columns ending with 1/2.
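Before starting a run, the input table can be checked against these rules. The following sketch uses only the standard library and is not part of xiRT; the function names are hypothetical, the required column list is copied from the table above, and the RT column names passed in (for example rp) are assumptions that must match your own config.

```python
import csv
import io

REQUIRED = ["Peptide1", "Peptide2", "Fasta1", "Fasta2", "PSMID",
            "LinkPos1", "LinkPos2", "score", "isTT", "isTD", "isDD", "fdr"]


def check_input(rows, columns, rt_columns=()):
    """Return a list of problems found in a CSM/PSM table."""
    problems = []
    # Required columns plus the RT columns named in the config.
    for name in list(REQUIRED) + list(rt_columns):
        if name not in columns:
            problems.append(f"missing column: {name}")
    # Columns ending in 1 or 2 must come in pairs, since pairs get swapped.
    for name in columns:
        if name[-1:] in ("1", "2"):
            partner = name[:-1] + ("2" if name.endswith("1") else "1")
            if partner not in columns:
                problems.append(f"unpaired column: {name}")
    # PSMID must be unique.
    ids = [row.get("PSMID") for row in rows]
    if len(ids) != len(set(ids)):
        problems.append("PSMID values are not unique")
    return problems


def check_csv(text, rt_columns=()):
    """Parse CSV text and run check_input on it."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return check_input(rows, reader.fieldnames or [], rt_columns)
```

An empty result list means none of the checked rules were violated. It does not guarantee that xiRT accepts the file.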

xiRT config

This file determines the network architecture and training behaviour used in xiRT. Please see the documentation for a detailed example. For crosslinks the most important parameter sections to adapt are the output and the predictions section. Here the parameters must be adapted for the used chromatography dimensions and modelling choices. See also the provided examples.

Setup config

This file determines the input data to be used and gives some training procedure options. Please see the documentation for a detailed example.

Contributors

Sven Giese
Ludwig Sinn

Citation

If you consider xiRT helpful for your work, please cite our manuscript, which is currently in preparation.

RappsilberLab

The Rappsilber lab applies and develops crosslinking chemistry methods, workflows and software. Visit the lab page to learn more about the developed software.

xiSUITE

xiVIEW: Graham, M. J.; Combe, C.; Kolbowski, L.; Rappsilber, J. bioRxiv 2019.
xiNET: Combe, C. W.; Fischer, L.; Rappsilber, J. Mol. Cell. Proteomics 2015.
xiSPEC: Kolbowski, L.; Combe, C.; Rappsilber, J. Nucleic Acids Res. 2018, 46 (W1), W473–W478.
xiSEARCH: Mendes, M. L.; Fischer, L.; Chen, Z. A.; Barbon, M.; O’Reilly, F. J.; Giese, S. H.; Bohlke‐Schneider, M.; Belsom, A.; Dau, T.; Combe, C. W.; Graham, M.; Eisele, M. R.; Baumeister, W.; Speck, C.; Rappsilber, J. Mol. Syst. Biol. 2019, 15 (9), e8994.


Release history

version	release date
1.2.41 (this version)	Apr 26, 2021
1.2.4	Mar 2, 2021
1.2.3	Feb 26, 2021
1.2.2	Feb 19, 2021
1.1.1	Dec 21, 2020
1.0.61	Aug 25, 2020
1.0.51	Aug 24, 2020
1.0.50	Aug 22, 2020
1.0.40	Aug 19, 2020
1.0.34	Aug 4, 2020
1.0.33	Aug 3, 2020
1.0.32	Jul 10, 2020
1.0.31	Jul 7, 2020
1.0.6	Aug 25, 2020

Download files


Source Distribution

xiRT-1.2.41.tar.gz (65.5 kB)

Uploaded Apr 26, 2021 Source

Built Distribution

xiRT-1.2.41-py3-none-any.whl (50.8 kB)

Uploaded Apr 26, 2021 Python 3

Hashes for xiRT-1.2.41.tar.gz
Algorithm	Hash digest
SHA256	`6d023ab9fcb9f2f610d993e7f04381f2f6b134aad7f375bf00cff0b39b661a48`
MD5	`4ad49db29cb8bca5d742cbca19ed404a`
BLAKE2b-256	`d263722924ec603646ae0e11e47b0d48b5d2bbdef6c8e3c8b2ee279131e485ae`

Hashes for xiRT-1.2.41-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fd7de9145219fa4dbbd82e8812c1461753768d17aecebfff7a9a3e18dd4dc732`
MD5	`f44c10e5b415bebe2518ffc61c324bf3`
BLAKE2b-256	`7a825346988fdac020427e2ed09f2428aea5d460e0e8dfbad151ab26715f8ce5`