xiRT: Multi-dimensional Retention Time Prediction for Linear and Cross-Linked Peptides.
Project description
A Python package for multi-dimensional retention time prediction for linear and crosslinked peptides using a (Siamese) deep neural network architecture.
overview
xiRT is a deep learning tool to predict the retention time (RT) of linear and cross-linked peptides from multiple fractionation dimensions, including reversed-phase (RP) chromatography (typically coupled to the mass spectrometer). xiRT requires the columns shown in the table below. Importantly, for crosslinked RT predictions the xiRT framework requires that crosslink spectrum matches (CSMs) are sorted such that, in each Peptide1 - Peptide2 pair, Peptide1 is the longer or lexicographically larger peptide.
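The ordering requirement above can be sketched in a few lines of Python. This is a minimal illustration, not xiRT's own code: the function name `order_peptide_pair` is hypothetical, and it assumes that "longer or lexicographically larger" means length is compared first and ties are broken lexicographically. The link positions are swapped together with the sequences so they stay attached to the right peptide.

```python
def order_peptide_pair(pep1, pep2, link1, link2):
    """Return (Peptide1, Peptide2, LinkPos1, LinkPos2) with the larger peptide first.

    Assumption (not confirmed by the xiRT docs): peptides are compared by
    length first, and equal-length peptides are compared lexicographically.
    """
    if not pep2:
        # Linear peptide: the second sequence is empty, nothing to reorder.
        return pep1, pep2, link1, link2
    if (len(pep2), pep2) > (len(pep1), pep1):
        # Swap the sequences and their link positions together.
        return pep2, pep1, link2, link1
    return pep1, pep2, link1, link2


# Example with the sequences from the input table; the shorter peptide is given first.
ordered = order_peptide_pair("ELRVIS", "PEPRTIDER", 2, 3)
```

Applying such a step to every row before training or prediction keeps the two peptide columns in a consistent order.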
Description
xiRT is meant to be used to generate additional information about CSMs for machine learning-based rescoring frameworks (similar to Percolator). However, xiRT also delivers RT predictions for various scenarios. Therefore, xiRT offers several training / prediction modes that need to be configured depending on the use case. At the moment, training, prediction and crossvalidation are the supported modes.
- training: trains xiRT on the input CSMs (using 10% for validation) and stores a trained model
- prediction: use a pretrained model and predict RTs for the input CSMs
- crossvalidation: load/train a model and predict RTs for all data points without using them in the training process. Requires the training of several models during CV
Note: all modes can be supplemented by using a pretrained model ("transfer learning").
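To illustrate why the crossvalidation mode needs several models, the sketch below splits CSM indices into folds so that every data point is predicted by a model that never saw it during training. It is a generic k-fold scheme written for illustration; the function name `kfold_indices`, the number of folds and the seed are assumptions and do not describe xiRT's internal implementation.

```python
import random


def kfold_indices(n_items, n_folds=3, seed=42):
    """Yield (train_indices, test_indices) pairs for a simple k-fold split."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    # Every n_folds-th shuffled index goes into the same fold.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k, test_idx in enumerate(folds):
        # Training data is everything outside the held-out fold.
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx


# One model would be trained per split; each CSM is predicted exactly once.
splits = list(kfold_indices(10, n_folds=3))
```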
Usage
xiRT is a Python package that comes with an executable Python file. To run xiRT, follow the steps below.
Installation
To install xiRT, a pip package is under development. Future releases can be installed via:
pip install xirt
To enable CUDA support, the required GPU libraries need to be installed manually. The Python dependencies are covered by the pip installation.
To use xiRT, a simple command line script gets installed with pip. To run the predictions ...
config file
To adapt the xiRT parameters, a YAML config file needs to be prepared. The configuration file determines the network parameters (number of neurons, layers, regularization) and also defines the prediction task (classification, regression, ordered regression). Depending on the encoding of the target variable, the output layers need to be adapted. For standard RP prediction, regression is essentially the only viable option. For SCX/hSAX (general classification), the prediction task can be formulated as classification, regression or ordered regression. When regression is used for fractionation, it is recommended to use the estimated salt concentrations as the target variable for the prediction (raw fraction numbers are possible too).
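The three task formulations differ mainly in how a fraction is turned into a target. The sketch below shows common encodings for each case. It is an illustration only: the helper names are hypothetical, fractions are assumed to be numbered from 0, the salt concentrations are made-up values, and the cumulative encoding for ordered regression is one widely used scheme, not necessarily the one xiRT applies.

```python
def one_hot(fraction, n_fractions):
    """Classification target: a single 1 at the index of the fraction."""
    return [1 if i == fraction else 0 for i in range(n_fractions)]


def ordinal(fraction, n_fractions):
    """Ordered-regression target: n_fractions - 1 binary 'fraction > i' flags."""
    return [1 if fraction > i else 0 for i in range(n_fractions - 1)]


def regression_target(fraction, salt_concentrations):
    """Regression target: the estimated salt concentration of the fraction."""
    return salt_concentrations[fraction]


# Hypothetical example: fraction 2 out of 5 SCX fractions.
salt = [10.0, 50.0, 100.0, 200.0, 500.0]  # assumed concentrations, for illustration
targets = (one_hot(2, 5), ordinal(2, 5), regression_target(2, salt))
```

The ordinal encoding keeps the information that neighbouring fractions are more similar than distant ones, which plain one-hot classification discards.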
input format
short name | explicit column name | description | Example |
---|---|---|---|
peptide sequence 1 | Peptide1 | First peptide sequence for crosslinks | PEPRTIDER |
peptide sequence 2 | Peptide2 | Second peptide sequence for crosslinks, or empty | ELRVIS |
link site 1 | LinkPos1 | Crosslink position in the first peptide (0-based) | 3 |
link site 2 | LinkPos2 | Crosslink position in the second peptide (0-based) | 2 |
precursor charge | Charge | Precursor charge of the crosslinked peptide | 3 |
score | Score | Single score from the search engine | 17.12 |
unique id | CSMID | A unique index for each entry in the result table | 0 |
decoy | isTT | Binary column which is True for any TT identification and False for TD, DD ids | TT |
fdr | fdr | Estimated false discovery rate | 0.01 |
fdr level | fdrGroup | String identifier for heteromeric and self links (split FDR) | heteromeric |
The first four columns should be self-explanatory; if not, check the sample input #TODO. The "CSMID" column is a unique(!) integer that can be used to retrieve CSMs. In addition, depending on the number of retention time domains that should be learned/predicted, the corresponding RT columns need to be present. The column names need to match the configuration in the network parameter YAML.
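Before running xiRT it can help to check an input file against the table above. The following sketch uses only the Python standard library. The function name `check_csm_table` and the RT column name `rp` in the example are hypothetical; the actual RT column names are whatever the YAML configuration specifies.

```python
import csv
import io

# Column names taken from the input format table.
REQUIRED_COLUMNS = [
    "Peptide1", "Peptide2", "LinkPos1", "LinkPos2", "Charge",
    "Score", "CSMID", "isTT", "fdr", "fdrGroup",
]


def check_csm_table(handle, rt_columns=()):
    """Return a list of problems found in a CSV file of CSMs (empty if none)."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    problems = []
    missing = [c for c in REQUIRED_COLUMNS + list(rt_columns) if c not in header]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    if "CSMID" in header:
        ids = [row["CSMID"] for row in reader]
        if len(ids) != len(set(ids)):
            problems.append("CSMID values are not unique")
    return problems


# Example: an in-memory table with one RT column ("rp" is an assumed name).
example = io.StringIO(
    "Peptide1,Peptide2,LinkPos1,LinkPos2,Charge,Score,CSMID,isTT,fdr,fdrGroup,rp\n"
    "PEPRTIDER,ELRVIS,3,2,3,17.12,0,True,0.01,heteromeric,35.2\n"
)
problems = check_csm_table(example, rt_columns=["rp"])
```

For a file on disk, the same function can be called with `open(path, newline="")` as the handle.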
CLI interface
The command line interface (CLI) requires three inputs:
- input CSM file
- a YAML file to configure the neural network architecture
- another YAML file to configure the general training / prediction behaviour, called setup-config
xiRT config
Setup config
Contributors
- Sven Giese
Citation
If you consider xiRT helpful for your work, please cite our manuscript. Currently, it is only available on bioRxiv.org: "xiRT: Retention Time Prediction using Neural Networks increases Identifications in Crosslinking Mass Spectrometry".
RappsilberLab
The Rappsilber lab applies and develops crosslinking chemistry methods, workflows and software. Visit the lab page to learn more about the developed software.
Hashes for xiRT-1.0.31-py2.py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 1e4ad92020db149552e2db6ad2f56c94f7973421c5db6123b53eec72c57217c6 |
MD5 | 3f75416b3e20d95a749e39d07195c787 |
BLAKE2b-256 | 3b76ad4ac10b8a15b04ad6210bc230c2d09d490d4d04da309c9327ceeb8e17cd |