DeepLC: Retention time prediction for (modified) peptides using Deep Learning.
DeepLC: Retention time prediction for peptides carrying any modification.
Introduction
DeepLC is a retention time predictor for peptides. Its strength is that it can accurately predict retention times for modified peptides, even if it hasn't seen the modification during training.
DeepLC can be used through the web application or as a Python package. In the latter case, DeepLC can be used from the command line, or as a Python module.
Citation
If you use DeepLC for your research, please use the following citation:
DeepLC can predict retention times for peptides that carry as-yet unseen modifications
Robbin Bouwmeester, Ralf Gabriels, Niels Hulstaert, Lennart Martens & Sven Degroeve
Nature Methods 18, 1363–1369 (2021) doi: 10.1038/s41592-021-01301-5
Usage
Web application
Just go to iomics.ugent.be/deeplc and get started!
Python package
Installation
Install with conda, using the bioconda and conda-forge channels:
conda install -c bioconda -c conda-forge deeplc
Or install with pip:
pip install deeplc
Command line interface
To use the DeepLC CLI, run:
deeplc --file_pred <path/to/peptide_file.csv>
We highly recommend adding a peptide file with known retention times for calibration:
deeplc --file_pred <path/to/peptide_file.csv> --file_cal <path/to/peptide_file_with_tr.csv>
For an overview of all CLI arguments, run deeplc --help.
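When DeepLC is driven from a script, the two command lines above can be assembled programmatically. The sketch below only builds the argument list from the `--file_pred` and `--file_cal` flags shown above; the helper name `build_deeplc_command` and the file paths are hypothetical, and actually running the command assumes `deeplc` is installed and on the PATH.

```python
from typing import List, Optional


def build_deeplc_command(file_pred: str, file_cal: Optional[str] = None) -> List[str]:
    """Assemble a DeepLC CLI call from the flags shown above (hypothetical helper)."""
    command = ["deeplc", "--file_pred", file_pred]
    if file_cal is not None:
        # Optional calibration file with known retention times (recommended).
        command += ["--file_cal", file_cal]
    return command


# Hypothetical file names, for illustration only.
cmd = build_deeplc_command("peptides.csv", "peptides_with_tr.csv")
print(" ".join(cmd))

# To run it for real (assumes DeepLC is installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than one shell string avoids quoting problems when paths contain spaces.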
Python module
Minimal example:
import pandas as pd
from deeplc import DeepLC
peptide_file = "datasets/test_pred.csv"
calibration_file = "datasets/test_train.csv"
pep_df = pd.read_csv(peptide_file, sep=",")
pep_df['modifications'] = pep_df['modifications'].fillna("")
cal_df = pd.read_csv(calibration_file, sep=",")
cal_df['modifications'] = cal_df['modifications'].fillna("")
dlc = DeepLC()
dlc.calibrate_preds(seq_df=cal_df)
preds = dlc.make_preds(seq_df=pep_df)
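To illustrate why the calibration step matters: retention times depend on the LC setup, so raw predictions generally need to be mapped onto the time scale of the experiment at hand. The sketch below shows that idea with a plain linear fit in NumPy on made-up numbers. It is not DeepLC's calibration routine, which `calibrate_preds` handles internally and which may work differently; the function name `fit_linear_calibration` is hypothetical.

```python
import numpy as np


def fit_linear_calibration(predicted, observed):
    """Fit observed ~= slope * predicted + intercept (illustrative only)."""
    slope, intercept = np.polyfit(predicted, observed, deg=1)
    return slope, intercept


# Made-up values: uncalibrated predictions and observed retention times
# for the same calibration peptides (here observed = 2 * predicted + 5).
predicted_cal = np.array([10.0, 20.0, 30.0, 40.0])
observed_cal = np.array([25.0, 45.0, 65.0, 85.0])

slope, intercept = fit_linear_calibration(predicted_cal, observed_cal)

# Apply the fitted mapping to predictions for new peptides.
new_predictions = np.array([15.0, 35.0])
calibrated = slope * new_predictions + intercept
print(calibrated)
```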
Minimal example with psm_utils:
import pandas as pd
from psm_utils.psm import PSM
from psm_utils.psm_list import PSMList
from psm_utils.io import write_file
from deeplc import DeepLC
infile = pd.read_csv("https://github.com/compomics/DeepLC/files/13298024/231108_DeepLC_input-peptides.csv")
psm_list = []
for idx, row in infile.iterrows():
    seq = row["modifications"].replace("(", "[").replace(")", "]")
    if seq.startswith("["):
        idx_nterm = seq.index("]")
        seq = seq[:idx_nterm+1] + "-" + seq[idx_nterm+1:]
    psm_list.append(PSM(peptidoform=seq, spectrum_id=idx))
psm_list = PSMList(psm_list=psm_list)
infile = pd.read_csv("https://github.com/compomics/DeepLC/files/13298022/231108_DeepLC_input-calibration-file.csv")
psm_list_calib = []
for idx, row in infile.iterrows():
    seq = row["seq"].replace("(", "[").replace(")", "]")
    if seq.startswith("["):
        idx_nterm = seq.index("]")
        seq = seq[:idx_nterm+1] + "-" + seq[idx_nterm+1:]
    psm_list_calib.append(PSM(peptidoform=seq, retention_time=row["tr"], spectrum_id=idx))
psm_list_calib = PSMList(psm_list=psm_list_calib)
dlc = DeepLC()
dlc.calibrate_preds(psm_list_calib)
preds = dlc.make_preds(seq_df=psm_list)
For a more elaborate example, see examples/deeplc_example.py .
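The string conversion used in both loops of the psm_utils example can be factored into one helper. This sketch repeats the same steps (parentheses to square brackets, then a hyphen after a leading N-terminal modification); the function name `to_proforma` and the example sequences are hypothetical, and it assumes parentheses occur only as modification delimiters.

```python
def to_proforma(seq: str) -> str:
    """Convert a parenthesis-style modified sequence to bracket notation."""
    # Modifications are written in square brackets.
    seq = seq.replace("(", "[").replace(")", "]")
    # A modification at the very start is N-terminal and must be
    # separated from the sequence by a hyphen.
    if seq.startswith("["):
        idx_nterm = seq.index("]")
        seq = seq[: idx_nterm + 1] + "-" + seq[idx_nterm + 1 :]
    return seq


# Hypothetical inputs, for illustration only.
print(to_proforma("(Acetyl)AAGPSLSHTSGGTQSK"))
print(to_proforma("AANDAGYFNDEM(Oxidation)APIEVK"))
```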
Input files
DeepLC accepts any PSM file format supported by
psm_utils,
including MaxQuant msms.txt, Sage, MSAmanda, Percolator, and many more. The file
format is automatically inferred from the file extension, or can be specified
explicitly with the --psm-filetype option.
At a minimum, a tab-separated file with a peptidoform and spectrum_id column
is accepted. Peptidoforms must be in
ProForma 2.0 notation.
For calibration or fine-tuning, a retention_time column is also required.
For example:
spectrum_id peptidoform retention_time
0 AAGPSLSHTSGGTQSK/2 12.16
1 AAINQK[Acetyl]LIETGER/2 34.10
2 AANDAGYFNDEM[Oxidation]APIEVK[Acetyl]TK/3 37.38
See examples/datasets for more examples.
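A file in this minimal format can be written with pandas. The sketch below builds the three example rows and saves them as tab-separated text; the file name `calibration_psms.tsv` and the use of a temporary directory are illustrative choices, and whether DeepLC infers this format from a `.tsv` extension is an assumption to check against the psm_utils documentation.

```python
import os
import tempfile

import pandas as pd

# The three example rows from the table above.
psms = pd.DataFrame(
    {
        "spectrum_id": [0, 1, 2],
        "peptidoform": [
            "AAGPSLSHTSGGTQSK/2",
            "AAINQK[Acetyl]LIETGER/2",
            "AANDAGYFNDEM[Oxidation]APIEVK[Acetyl]TK/3",
        ],
        "retention_time": [12.16, 34.10, 37.38],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "calibration_psms.tsv")  # hypothetical name
    # Write tab-separated text without the pandas index column.
    psms.to_csv(path, sep="\t", index=False)
    # Read the file back to confirm the columns round-trip.
    check = pd.read_csv(path, sep="\t")

print(check.columns.tolist())
```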
Prediction models
DeepLC comes with multiple CNN models trained on data from various experimental settings. By default, DeepLC selects the best model based on the calibration dataset. If no calibration is performed, the first default model is selected. Always record which models and which DeepLC version you used. The current version comes with:
| Model filename | Experimental settings | Publication |
|---|---|---|
| full_hc_PXD005573_mcp_8c22d89667368f2f02ad996469ba157e.hdf5 | Reverse phase | Bruderer et al. 2017 |
| full_hc_PXD005573_mcp_1fd8363d9af9dcad3be7553c39396960.hdf5 | Reverse phase | Bruderer et al. 2017 |
| full_hc_PXD005573_mcp_cb975cfdd4105f97efa0b3afffe075cc.hdf5 | Reverse phase | Bruderer et al. 2017 |
For all the full models that can be used in DeepLC (including some TMT models!) please see:
https://github.com/RobbinBouwmeester/DeepLCModels
Naming convention for the models is as follows:
[full_hc]_[dataset]_[fixed_mods]_[hash].hdf5
The different parts refer to:
- full_hc - flag to indicate a finished, trained, and fully optimized model
- dataset - name of the dataset used to fit the model (see the original publication, supplementary table 2)
- fixed_mods - flag to indicate that fixed modifications were added to peptides without explicit indication (e.g., carbamidomethyl of cysteine)
- hash - indicates the architecture, where "1fd8363d9af9dcad3be7553c39396960" indicates CNN filter lengths of 8, "cb975cfdd4105f97efa0b3afffe075cc" indicates CNN filter lengths of 4, and "8c22d89667368f2f02ad996469ba157e" indicates CNN filter lengths of 2
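When keeping track of which model produced a prediction, the naming convention can be decoded in code. The sketch below is a hypothetical helper (`parse_model_filename` is not part of DeepLC) that extracts the hash and looks up the filter length from the three hashes listed above. It assumes that dataset names may contain underscores, so it does not try to split the dataset from the fixed-mods flag.

```python
# Hash-to-filter-length mapping, taken from the list above.
FILTER_LENGTH_BY_HASH = {
    "1fd8363d9af9dcad3be7553c39396960": 8,
    "cb975cfdd4105f97efa0b3afffe075cc": 4,
    "8c22d89667368f2f02ad996469ba157e": 2,
}


def parse_model_filename(filename: str) -> dict:
    """Split a DeepLC model filename into its parts (hypothetical helper)."""
    prefix, suffix = "full_hc_", ".hdf5"
    if not (filename.startswith(prefix) and filename.endswith(suffix)):
        raise ValueError(f"Unexpected model filename: {filename}")
    core = filename[len(prefix) : -len(suffix)]
    # The hash is the last underscore-separated field. Everything before it
    # (dataset and, if present, the fixed-mods flag) is kept together, on the
    # assumption that dataset names may themselves contain underscores.
    middle, _, model_hash = core.rpartition("_")
    return {
        "dataset_and_fixed_mods": middle,
        "hash": model_hash,
        # None if the hash is not one of the three listed above.
        "cnn_filter_length": FILTER_LENGTH_BY_HASH.get(model_hash),
    }


info = parse_model_filename(
    "full_hc_PXD005573_mcp_8c22d89667368f2f02ad996469ba157e.hdf5"
)
print(info)
```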
Q&A
See the FAQ in the documentation.