Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry

Transformer_DIA

Transformer-DIA is a deep learning model for de novo peptide sequencing from data-independent acquisition (DIA) mass spectrometry data, built on the transformer architecture. This guide covers installation, dataset preparation, and the main workflows: model training, evaluation, and prediction.

Paper

For more details about the model and its implementation, refer to our paper: Transformer-DIA: Advanced De Novo Peptide Sequencing


Installation

To manage dependencies efficiently, we recommend using conda. Start by creating a dedicated conda environment:

conda create --name transdia_env python=3.10

Activate the environment:

conda activate transdia_env

Install Transformer_DIA and its dependencies via pip:

pip install transdia

To verify a successful installation, check the command-line interface:

transdia --help

Dataset

Data Preprocessing

To use Transformer-DIA, first preprocess your data into a feature file, which is a required input for the model. We provide a script that reads the spectrum and feature files and writes the required feature file in pickle format. The generated file is organized as follows:

  • Keys: Peptide sequences
  • Values: List containing the following attributes:
    • precursor_mz
    • precursor_charge
    • scan_list_middle
    • ms1
    • mz_list
    • int_list
    • neighbor_right_count
    • neighbor_size_half
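
As a minimal sketch of the layout described above, the following Python snippet builds a toy feature dictionary, writes it to a pickle file, and reads it back. The order of attributes inside each value list is assumed to follow the list above. The peptide, the numeric values, and the helper name `describe_entry` are invented for illustration, so the output of the real preprocessing script may differ.

```python
import pickle
import tempfile
from pathlib import Path

# Attribute names in the order listed above. The ordering inside the
# value list is an assumption, not something confirmed by the package.
FIELDS = [
    "precursor_mz",
    "precursor_charge",
    "scan_list_middle",
    "ms1",
    "mz_list",
    "int_list",
    "neighbor_right_count",
    "neighbor_size_half",
]


def describe_entry(values):
    """Pair each value in a feature entry with its assumed attribute name."""
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} attributes, got {len(values)}")
    return dict(zip(FIELDS, values))


# Toy entry with invented numbers, keyed by peptide sequence.
toy_features = {
    "PEPTIDEK": [464.25, 2, [101, 102, 103], [1.0e5, 2.0e5, 1.5e5],
                 [[175.1, 276.2]], [[300.0, 120.0]], 1, 1],
}

# Round-trip through a temporary pickle file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "precursor_feature.pkl"
    with open(path, "wb") as handle:
        pickle.dump(toy_features, handle)
    with open(path, "rb") as handle:
        loaded = pickle.load(handle)

for peptide, values in loaded.items():
    entry = describe_entry(values)
    print(peptide, entry["precursor_mz"], entry["precursor_charge"])
```

Inspecting a real feature file this way before training is a quick check that every entry carries all eight attributes. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.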

You can run the script by providing the paths to your spectrum and feature files as input. The script validates the inputs to ensure compatibility. Follow the instructions in the script prompts for seamless data preprocessing.

We used the feature and spectrum files released by the DeepNovo-DIA model, which are available here: MassIVE MSV000082368.

Download DIA Datasets

Annotated DIA datasets can be downloaded from the datasets page.


Download Pretrained Model Weights

Transformer_DIA requires pretrained model weights for predictions in denovo or eval modes. Compatible weights (in .ckpt format) can be found on the pretrained models page.

Specify the model file during execution using the --model parameter.


Usage

Predict Peptide Sequences

Transformer_DIA predicts peptide sequences from MS/MS spectra stored in MGF files. Predictions are saved as a CSV file:

transdia --mode=denovo --model=pretrained_checkpoint.ckpt --peak_path=path/to/spectra.mgf --peak_feature=path/to/precursor_feature.pkl
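
If you launch predictions from a Python pipeline, the command above can be assembled programmatically. The sketch below is illustrative only: it uses just the flags shown in this guide, and the helper name `build_transdia_command` is hypothetical rather than part of the transdia package.

```python
import shlex


def build_transdia_command(mode, peak_path, peak_feature, model=None):
    """Assemble a transdia argument list from the flags shown in this guide."""
    if mode not in {"denovo", "eval", "train"}:
        raise ValueError(f"unsupported mode: {mode}")
    # Pretrained weights are required for denovo and eval modes.
    if mode in {"denovo", "eval"} and model is None:
        raise ValueError(f"--model is required in {mode} mode")
    args = ["transdia", f"--mode={mode}"]
    if model is not None:
        args.append(f"--model={model}")
    args.append(f"--peak_path={peak_path}")
    args.append(f"--peak_feature={peak_feature}")
    return args


command = build_transdia_command(
    "denovo",
    peak_path="path/to/spectra.mgf",
    peak_feature="path/to/precursor_feature.pkl",
    model="pretrained_checkpoint.ckpt",
)
# Print the shell-quoted command; run it with subprocess.run(command, check=True).
print(shlex.join(command))
```

Passing the arguments as a list instead of a single string avoids shell quoting problems when paths contain spaces.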

Evaluate de novo Sequencing Performance

To assess the performance of de novo sequencing against known annotations:

transdia --mode=eval --model=pretrained_checkpoint.ckpt --peak_path=path/to/spectra.mgf --peak_feature=path/to/precursor_feature.pkl

Annotations in the MGF file must include peptide sequences in the SEQ field.
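
Before running an evaluation, it can help to confirm that every spectrum is annotated. The following is a minimal sketch of such a check, assuming the usual MGF layout of `BEGIN IONS` / `END IONS` blocks with `KEY=VALUE` header lines. The function name `spectra_missing_seq` and the example spectra are invented for illustration and are not part of transdia.

```python
def spectra_missing_seq(mgf_text):
    """Return the TITLE (or a positional label) of each spectrum without a SEQ field."""
    missing = []
    index = -1
    in_block = False
    title = None
    has_seq = False
    for raw_line in mgf_text.splitlines():
        line = raw_line.strip()
        if line == "BEGIN IONS":
            # Start of a new spectrum block: reset per-spectrum state.
            in_block, title, has_seq = True, None, False
            index += 1
        elif line == "END IONS" and in_block:
            if not has_seq:
                missing.append(title if title is not None else f"spectrum #{index}")
            in_block = False
        elif in_block and "=" in line:
            # Header lines look like KEY=VALUE; peak lines contain no "=".
            key, _, value = line.partition("=")
            if key == "TITLE":
                title = value
            elif key == "SEQ" and value.strip():
                has_seq = True
    return missing


# Two invented spectra: the second one lacks a SEQ annotation.
example = """BEGIN IONS
TITLE=scan_1
PEPMASS=464.25
CHARGE=2+
SEQ=PEPTIDEK
175.1 300.0
END IONS
BEGIN IONS
TITLE=scan_2
PEPMASS=512.30
CHARGE=2+
201.1 150.0
END IONS
"""

print(spectra_missing_seq(example))
```

An empty list means every spectrum block carries a non-empty SEQ field.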


Train a New Model

To train a new Transformer model from scratch, provide labeled training and validation datasets in MGF format:

transdia --mode=train --peak_path=path/to/train/annotated_spectra.mgf \
--peak_feature=path/to/train/precursor_feature.pkl \
--peak_path_val=path/to/validation/annotated_spectra.mgf \
--peak_feature_val=path/to/validation/precursor_feature.pkl

MGF files must include peptide sequences in the SEQ field.


Fine-Tune an Existing Model

To fine-tune a pretrained Transformer-DIA model, pass its checkpoint with --model and set the --train_from_scratch parameter to false:

transdia --mode=train --model=pretrained_checkpoint.ckpt \
--train_from_scratch=false \
--peak_path=path/to/train/annotated_spectra.mgf \
--peak_feature=path/to/train/precursor_feature.pkl \
--peak_path_val=path/to/validation/annotated_spectra.mgf \
--peak_feature_val=path/to/validation/precursor_feature.pkl
