DiffNovo: A Transformer-Diffusion Model for De Novo Peptide Sequencing

DiffNovo

DiffNovo is a transformer-diffusion model for de novo peptide sequencing from tandem mass spectrometry (MS/MS) spectra. This guide covers installation, dataset preparation, and the three main workflows: model training, evaluation, and prediction.


Installation

To manage dependencies efficiently, we recommend using conda. Start by creating a dedicated conda environment:

conda create --name diffnovo_env python=3.10

Activate the environment:

conda activate diffnovo_env

Install DiffNovo and its dependencies via pip:

pip install diffnovo

To verify a successful installation, check the command-line interface:

diffnovo --help
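
The same check can be made from Python, which may be convenient in scripts or notebooks. The sketch below uses only the standard library and assumes the distribution is registered under the name `diffnovo`, as in the pip command above. The helper name `installed_version` is illustrative and not part of DiffNovo.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "diffnovo") -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("diffnovo is not installed in this environment")
else:
    print(f"diffnovo {found} is installed")
```

If the script reports that the package is missing, the most likely cause is that the conda environment created above is not active.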

Dataset Preparation

Download DIA Datasets

Annotated DIA datasets can be downloaded from the datasets page. The train and eval modes described below require annotated spectra.


Download Pretrained Model Weights

DiffNovo requires pretrained model weights for predictions in denovo or eval modes. Compatible weights (in .ckpt format) can be found on the pretrained models page.

Specify the model file during execution using the --model parameter. For example:

diffnovo --mode=denovo --model pretrained_checkpoint.ckpt --peak_path=path/to/predict/spectra.mgf --output=path/to/output

If no model file is specified, DiffNovo will automatically download and use a compatible model.


Usage

Predict Peptide Sequences

DiffNovo predicts peptide sequences from MS/MS spectra stored in MGF files. Predictions are saved as a CSV file:

diffnovo --mode=denovo --model pretrained_checkpoint.ckpt --peak_path=path/to/spectra.mgf --output=path/to/output.csv
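
When several MGF files need to be processed, the command above can be assembled programmatically. The following is a minimal sketch, not part of DiffNovo: the helper names `build_denovo_command` and `plan_batch` are hypothetical, and the only flags used are the ones shown in this guide (`--mode`, `--model`, `--peak_path`, `--output`). It only builds the argument lists and does not run anything.

```python
from pathlib import Path


def build_denovo_command(model, peak_path, output):
    """Assemble the argument list for one denovo run, mirroring the command above."""
    return [
        "diffnovo",
        "--mode=denovo",
        "--model", str(model),
        f"--peak_path={peak_path}",
        f"--output={output}",
    ]


def plan_batch(model, mgf_paths, output_dir):
    """Return one command per MGF file, writing <file stem>.csv into output_dir."""
    commands = []
    for mgf in mgf_paths:
        output = Path(output_dir) / f"{Path(mgf).stem}.csv"
        commands.append(build_denovo_command(model, mgf, output))
    return commands


# Example: plan two runs and print them for inspection before executing.
for cmd in plan_batch("pretrained_checkpoint.ckpt", ["run1.mgf", "run2.mgf"], "results"):
    print(" ".join(cmd))
```

To execute a planned command, pass the list to `subprocess.run(cmd, check=True)`. Using a list rather than a single string avoids shell quoting problems with paths that contain spaces.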

Evaluate de novo Sequencing Performance

To assess the performance of de novo sequencing against known annotations:

diffnovo --mode=eval --peak_path=path/to/test/annotated_spectra.mgf

Each spectrum in the MGF file must include its peptide sequence in the SEQ field.
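
Before running an evaluation, it can help to confirm that every spectrum actually carries a SEQ entry. The sketch below is a standalone check written for this guide, not a DiffNovo function. It assumes the usual MGF layout, in which each spectrum sits between `BEGIN IONS` and `END IONS` and header lines have the form `KEY=VALUE`. The function name `spectra_missing_seq` and the sample spectra are illustrative.

```python
def spectra_missing_seq(lines):
    """Return 0-based indices of spectra that have no non-empty SEQ= header line."""
    missing = []
    index = -1          # index of the spectrum currently being read
    inside = False      # True between BEGIN IONS and END IONS
    has_seq = False
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            inside, has_seq = True, False
            index += 1
        elif line == "END IONS":
            if inside and not has_seq:
                missing.append(index)
            inside = False
        elif inside and line.upper().startswith("SEQ="):
            has_seq = bool(line[4:].strip())
    return missing


example = """BEGIN IONS
TITLE=spectrum_0
PEPMASS=500.25
CHARGE=2+
SEQ=PEPTIDEK
100.1 10.0
END IONS
BEGIN IONS
TITLE=spectrum_1
PEPMASS=612.30
CHARGE=2+
200.2 5.0
END IONS
""".splitlines()

print(spectra_missing_seq(example))  # → [1]
```

For a file on disk, pass the open file handle instead of a list, for example `with open(path) as fh: spectra_missing_seq(fh)`.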


Train a New Model

To train a new DiffNovo model from scratch, provide labeled training and validation datasets in MGF format:

diffnovo --mode=train --peak_path=path/to/train/annotated_spectra.mgf --peak_path_val=path/to/validation/annotated_spectra.mgf

MGF files must include peptide sequences in the SEQ field.
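
If only a single annotated MGF file is available, it has to be divided into training and validation files first. The sketch below shows one simple way to do this with a reproducible random split. It is an assumption of this guide, not DiffNovo's own tooling: the function names, the default 10% validation fraction, and the seed are all illustrative.

```python
import random


def read_spectra(text):
    """Split MGF text into a list of spectrum blocks (BEGIN IONS ... END IONS)."""
    spectra, current = [], None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "BEGIN IONS":
            current = [stripped]
        elif current is not None:
            current.append(stripped)
            if stripped == "END IONS":
                spectra.append("\n".join(current))
                current = None
    return spectra


def split_spectra(spectra, val_fraction=0.1, seed=0):
    """Shuffle reproducibly and return (train, validation) lists of spectrum blocks."""
    shuffled = list(spectra)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one validation spectrum whenever there is any data at all.
    n_val = max(1, round(len(shuffled) * val_fraction)) if shuffled else 0
    return shuffled[n_val:], shuffled[:n_val]
```

The two lists can then be written out with, for example, `Path("train.mgf").write_text("\n\n".join(train) + "\n")` and passed to --peak_path and --peak_path_val. Note that a random per-spectrum split can place spectra of the same peptide in both files; whether that is acceptable depends on what the validation score is meant to measure.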


Fine-Tune an Existing Model

To fine-tune a pretrained DiffNovo model, pass its checkpoint with --model and set the --train_from_scratch parameter to false. The command below shows the checkpoint and dataset arguments:

diffnovo --mode=train --model pretrained_checkpoint.ckpt \
 --peak_path=path/to/train/annotated_spectra.mgf \
 --peak_path_val=path/to/validation/annotated_spectra.mgf

For further details, refer to our documentation or raise an issue on our GitHub repository.

Happy sequencing!
