DeepLC: Retention time prediction for (modified) peptides using Deep Learning.

DeepLC: Retention time prediction for peptides carrying any modification.



Introduction

DeepLC is a retention time predictor for peptides. Its key strength is that it can accurately predict retention times for modified peptides, even if it has never seen a given modification during training.

DeepLC can be used through the web application or as a Python package. In the latter case, DeepLC can be used from the command line, or as a Python module.

Citation

If you use DeepLC for your research, please use the following citation:

DeepLC can predict retention times for peptides that carry as-yet unseen modifications
Robbin Bouwmeester, Ralf Gabriels, Niels Hulstaert, Lennart Martens & Sven Degroeve
Nature Methods 18, 1363–1369 (2021) doi: 10.1038/s41592-021-01301-5

Usage

Web application


Just go to iomics.ugent.be/deeplc and get started!

Python package

Installation


Install with conda, using the bioconda and conda-forge channels: conda install -c bioconda -c conda-forge deeplc

Or install with pip: pip install deeplc

Command line interface

To use the DeepLC CLI, run:

deeplc --file_pred <path/to/peptide_file.csv>

We highly recommend adding a peptide file with known retention times for calibration:

deeplc --file_pred <path/to/peptide_file.csv> --file_cal <path/to/peptide_file_with_tr.csv>

For an overview of all CLI arguments, run deeplc --help.

Python module

Minimal example:

import pandas as pd
from deeplc import DeepLC

peptide_file = "datasets/test_pred.csv"
calibration_file = "datasets/test_train.csv"

pep_df = pd.read_csv(peptide_file, sep=",")
pep_df['modifications'] = pep_df['modifications'].fillna("")

cal_df = pd.read_csv(calibration_file, sep=",")
cal_df['modifications'] = cal_df['modifications'].fillna("")

dlc = DeepLC()
dlc.calibrate_preds(seq_df=cal_df)
preds = dlc.make_preds(seq_df=pep_df)
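The predictions come back in the same order as the input rows, so they can be joined onto the peptide table and saved. A minimal sketch of this post-processing step; the `preds` values below are stand-ins for real model output, and the `predicted_tr` column name is a choice made here, not a DeepLC convention:

```python
import pandas as pd

# Hypothetical input table mirroring the CSV layout used above
pep_df = pd.DataFrame({
    "seq": ["AAGPSLSHTSGGTQSK", "AAINQKLIETGER"],
    "modifications": ["", "6|Acetyl"],
})

# Stand-in for `dlc.make_preds(seq_df=pep_df)`: one retention time per row
preds = [12.16, 34.10]

# Attach predictions to the table and write them out
pep_df["predicted_tr"] = preds
pep_df.to_csv("predictions.csv", index=False)
```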

Minimal example with psm_utils:

import pandas as pd

from psm_utils.psm import PSM
from psm_utils.psm_list import PSMList

from deeplc import DeepLC

infile = pd.read_csv("https://github.com/compomics/DeepLC/files/13298024/231108_DeepLC_input-peptides.csv")
psm_list = []

for idx, row in infile.iterrows():
    # Convert inline modifications like "(Acetyl)" to ProForma brackets "[Acetyl]"
    seq = row["seq"].replace("(", "[").replace(")", "]")

    # ProForma requires a hyphen between an N-terminal modification and the sequence
    if seq.startswith("["):
        idx_nterm = seq.index("]")
        seq = seq[:idx_nterm + 1] + "-" + seq[idx_nterm + 1:]

    psm_list.append(PSM(peptidoform=seq, spectrum_id=idx))

psm_list = PSMList(psm_list=psm_list)

infile = pd.read_csv("https://github.com/compomics/DeepLC/files/13298022/231108_DeepLC_input-calibration-file.csv")
psm_list_calib = []

for idx, row in infile.iterrows():
    seq = row["seq"].replace("(", "[").replace(")", "]")

    if seq.startswith("["):
        idx_nterm = seq.index("]")
        seq = seq[:idx_nterm + 1] + "-" + seq[idx_nterm + 1:]

    psm_list_calib.append(PSM(peptidoform=seq, retention_time=row["tr"], spectrum_id=idx))

psm_list_calib = PSMList(psm_list=psm_list_calib)

dlc = DeepLC()
dlc.calibrate_preds(psm_list_calib)
preds = dlc.make_preds(psm_list)

For a more elaborate example, see examples/deeplc_example.py.

Input files

DeepLC accepts any PSM file format supported by psm_utils, including MaxQuant msms.txt, Sage, MSAmanda, Percolator, and many more. The file format is automatically inferred from the file extension, or can be specified explicitly with the --psm-filetype option.

At a minimum, DeepLC accepts a tab-separated file with peptidoform and spectrum_id columns. Peptidoforms must be written in ProForma 2.0 notation. For calibration or fine-tuning, a retention_time column is also required.

For example:

spectrum_id	peptidoform	retention_time
0	AAGPSLSHTSGGTQSK/2	12.16
1	AAINQK[Acetyl]LIETGER/2	34.10
2	AANDAGYFNDEM[Oxidation]APIEVK[Acetyl]TK/3	37.38

See examples/datasets for more examples.
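An input file like the one above can also be built programmatically. A minimal sketch using pandas; the example peptidoforms are taken from the table above, and the `psms.tsv` filename is just a placeholder:

```python
import pandas as pd

# Minimal DeepLC input table in ProForma 2.0 notation: modifications in
# square brackets, precursor charge after a slash.
psms = pd.DataFrame({
    "spectrum_id": [0, 1],
    "peptidoform": ["AAGPSLSHTSGGTQSK/2", "AAINQK[Acetyl]LIETGER/2"],
    "retention_time": [12.16, 34.10],  # needed only for calibration or fine-tuning
})

# DeepLC expects a tab-separated file for this minimal format
psms.to_csv("psms.tsv", sep="\t", index=False)
```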

Prediction models

DeepLC comes with multiple CNN models trained on data from various experimental settings. By default, DeepLC selects the best model based on the calibration dataset; if no calibration is performed, the first default model is used. Always record which models were used, along with the DeepLC version. The current version comes with:

Model filename Experimental settings Publication
full_hc_PXD005573_mcp_8c22d89667368f2f02ad996469ba157e.hdf5 Reverse phase Bruderer et al. 2017
full_hc_PXD005573_mcp_1fd8363d9af9dcad3be7553c39396960.hdf5 Reverse phase Bruderer et al. 2017
full_hc_PXD005573_mcp_cb975cfdd4105f97efa0b3afffe075cc.hdf5 Reverse phase Bruderer et al. 2017

For all full models that can be used with DeepLC (including some TMT models!), please see:

https://github.com/RobbinBouwmeester/DeepLCModels

Naming convention for the models is as follows:

[full_hc]_[dataset]_[fixed_mods]_[hash].hdf5

The different parts refer to:

full_hc - flag indicating a finished, trained, and fully optimized model

dataset - name of the dataset used to fit the model (see the original publication, supplementary table 2)

fixed_mods - flag indicating that fixed modifications were applied to peptides without explicit annotation (e.g., carbamidomethylation of cysteine)

hash - indicates different architectures, where "1fd8363d9af9dcad3be7553c39396960" indicates CNN filter lengths of 8, "cb975cfdd4105f97efa0b3afffe075cc" indicates CNN filter lengths of 4, and "8c22d89667368f2f02ad996469ba157e" indicates filter lengths of 2
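The parts of a model filename can be pulled apart programmatically. A hedged sketch: `parse_model_name` is a hypothetical helper, not part of the DeepLC API, and because the dataset name may itself contain underscores, only the `full_hc` prefix and the trailing hash can be split off unambiguously:

```python
# Map of known architecture hashes to CNN filter lengths (from the list above)
FILTER_LENGTHS = {
    "8c22d89667368f2f02ad996469ba157e": 2,
    "cb975cfdd4105f97efa0b3afffe075cc": 4,
    "1fd8363d9af9dcad3be7553c39396960": 8,
}

def parse_model_name(filename: str) -> dict:
    """Split a DeepLC model filename of the form full_hc_<dataset>_<hash>.hdf5."""
    stem = filename.removesuffix(".hdf5")
    parts = stem.split("_")
    model_hash = parts[-1]  # the hash is always the last underscore-separated token
    # Everything between the full_hc prefix and the hash (dataset plus any
    # fixed-mods flag; these cannot be separated reliably)
    dataset = "_".join(parts[2:-1])
    return {
        "dataset": dataset,
        "hash": model_hash,
        "cnn_filter_length": FILTER_LENGTHS.get(model_hash),
    }

info = parse_model_name("full_hc_PXD005573_mcp_8c22d89667368f2f02ad996469ba157e.hdf5")
```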

Q&A

See the FAQ in the documentation.
