DiffNovo: A Transformer-Diffusion Model for De Novo Peptide Sequencing
DiffNovo
DiffNovo is a transformer-diffusion model for de novo peptide sequencing from MS/MS spectra. This guide covers installation, dataset preparation, and the three main modes: model training, evaluation, and prediction.
Installation
To manage dependencies efficiently, we recommend using conda. Start by creating a dedicated conda environment:
conda create --name diffnovo_env python=3.10
Activate the environment:
conda activate diffnovo_env
Install DiffNovo and its dependencies via pip:
pip install diffnovo==0.0.8
To verify a successful installation, check the command-line interface:
diffnovo --help
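The same check can be scripted, for example at the top of a pipeline. The sketch below uses only the Python standard library; it assumes the distribution and the console command are both named `diffnovo`, as in the commands above, and the helper names are illustrative.

```python
import shutil
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def cli_on_path(command: str) -> bool:
    """Return True if `command` resolves to an executable on PATH."""
    return shutil.which(command) is not None


if __name__ == "__main__":
    # Both names are taken from the install and help commands in this guide.
    print("diffnovo version:", installed_version("diffnovo"))
    print("diffnovo CLI found:", cli_on_path("diffnovo"))
```

If the version prints as `None`, the package is not installed in the active environment; confirm that `diffnovo_env` is activated.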
Dataset Preparation
Download DIA Datasets
Annotated DIA datasets can be downloaded from the datasets page. The train and eval modes require annotated spectra.
Download Pretrained Model Weights
The denovo and eval modes run a pretrained model. Compatible weights (in .ckpt format) can be found on the pretrained models page.
Specify the model file during execution using the --model parameter. For example:
diffnovo --mode=denovo --model pretrained_checkpoint.ckpt --peak_path=path/to/predict/spectra.mgf --output=path/to/output
If no model file is specified, DiffNovo will automatically download and use a compatible model.
Usage
Predict Peptide Sequences
DiffNovo predicts peptide sequences from MS/MS spectra stored in MGF files. Predictions are saved as a CSV file:
diffnovo --mode=denovo --model pretrained_checkpoint.ckpt --peak_path=path/to/spectra.mgf --output=path/to/output.csv
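To run prediction over several MGF files, the command above can be wrapped in a small script. This is a minimal sketch, not part of DiffNovo: the helper names are hypothetical, and it uses only the flags shown in this guide (`--mode`, `--model`, `--peak_path`, `--output`). With `dry_run=True` it only returns the commands it would run.

```python
import subprocess
from pathlib import Path
from typing import List


def build_denovo_command(model: str, peak_path: str, output: str) -> List[str]:
    """Assemble the denovo command from this guide as an argument list."""
    return [
        "diffnovo",
        "--mode=denovo",
        "--model", model,
        f"--peak_path={peak_path}",
        f"--output={output}",
    ]


def run_denovo_on_directory(model: str, mgf_dir: str, out_dir: str,
                            dry_run: bool = True) -> List[List[str]]:
    """Build (and optionally run) one denovo command per *.mgf file in mgf_dir."""
    out = Path(out_dir)
    commands = []
    for mgf in sorted(Path(mgf_dir).glob("*.mgf")):
        # One CSV per input file, named after the MGF file.
        cmd = build_denovo_command(model, str(mgf), str(out / f"{mgf.stem}.csv"))
        commands.append(cmd)
        if not dry_run:
            out.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # raises if diffnovo exits non-zero
    return commands
```

Calling `run_denovo_on_directory("pretrained_checkpoint.ckpt", "spectra/", "predictions/")` lists the planned commands; pass `dry_run=False` to execute them.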
Evaluate de novo Sequencing Performance
To assess the performance of de novo sequencing against known annotations:
diffnovo --mode=eval --peak_path=path/to/test/annotated_spectra.mgf
Annotations in the MGF file must include peptide sequences in the SEQ field.
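Before running evaluation, it can help to confirm that every spectrum carries an annotation. The sketch below is a plain-Python check, independent of DiffNovo, that assumes the standard MGF layout (`BEGIN IONS` / `END IONS` blocks with `KEY=VALUE` header lines); the function name is illustrative.

```python
from typing import Iterable, List


def spectra_missing_seq(lines: Iterable[str]) -> List[str]:
    """Return the TITLE (or a positional label) of each spectrum without a SEQ= value."""
    missing = []
    in_block = False
    has_seq = False
    title = ""
    index = 0
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            in_block, has_seq, title = True, False, ""
            index += 1
        elif line == "END IONS":
            if in_block and not has_seq:
                missing.append(title or f"spectrum #{index}")
            in_block = False
        elif in_block and line.upper().startswith("SEQ="):
            # An empty SEQ= line counts as missing.
            has_seq = bool(line[4:].strip())
        elif in_block and line.upper().startswith("TITLE="):
            title = line[6:].strip()
    return missing


example = """BEGIN IONS
TITLE=scan_1
PEPMASS=500.25
CHARGE=2+
SEQ=PEPTIDEK
100.0 1.0
END IONS
BEGIN IONS
TITLE=scan_2
PEPMASS=600.30
CHARGE=2+
200.0 2.0
END IONS
""".splitlines()

print(spectra_missing_seq(example))  # ['scan_2']
```

For a real file, pass an open file handle: `with open("annotated_spectra.mgf") as fh: spectra_missing_seq(fh)`.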
Train a New Model
To train a new DiffNovo model from scratch, provide labeled training and validation datasets in MGF format:
diffnovo --mode=train --peak_path=path/to/train/annotated_spectra.mgf --peak_path_val=path/to/validation/annotated_spectra.mgf
MGF files must include peptide sequences in the SEQ field.
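If only a single annotated MGF file is available, it has to be divided into training and validation files first. The following is one possible sketch in plain Python, not a DiffNovo utility; the function names, the 10% default, and the fixed seed are assumptions. It splits at the spectrum level, so the same peptide may appear in both sets; splitting by peptide sequence is stricter if that matters for your evaluation.

```python
import random
from typing import List, Sequence, Tuple


def read_mgf_blocks(text: str) -> List[str]:
    """Return each BEGIN IONS ... END IONS block of an MGF document as one string."""
    blocks: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "BEGIN IONS":
            current = [stripped]
        elif stripped == "END IONS" and current:
            current.append(stripped)
            blocks.append("\n".join(current))
            current = []
        elif current and stripped:
            current.append(stripped)
    return blocks


def split_blocks(blocks: Sequence[str], val_fraction: float = 0.1,
                 seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle the blocks reproducibly and return (train, validation)."""
    shuffled = list(blocks)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one validation spectrum when there is more than one block.
    n_val = max(1, round(len(shuffled) * val_fraction)) if len(shuffled) > 1 else 0
    return shuffled[n_val:], shuffled[:n_val]
```

Write each list back with `"\n\n".join(blocks)` and pass the two files to `--peak_path` and `--peak_path_val`.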
Fine-Tune an Existing Model
To fine-tune a pretrained DiffNovo model, pass its checkpoint with --model and set the --train_from_scratch parameter to false:
diffnovo --mode=train --model pretrained_checkpoint.ckpt \
--train_from_scratch=false \
--peak_path=path/to/train/annotated_spectra.mgf \
--peak_path_val=path/to/validation/annotated_spectra.mgf
For further details, refer to our documentation or raise an issue on our GitHub repository.
File details
Details for the file diffnovo-0.0.9.tar.gz.
File metadata
- Download URL: diffnovo-0.0.9.tar.gz
- Upload date:
- Size: 59.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e954fbee38a8184dd2bb1437da5e28a46d4f2a142dee77d746addddfd57521ac |
| MD5 | b0aa4d42367a5913fac4c169859d5480 |
| BLAKE2b-256 | 7d81abe20542f90aa434392ebe178e84bbc5bc6d06c98e11113841124bab3af6 |
File details
Details for the file diffnovo-0.0.9-py3-none-any.whl.
File metadata
- Download URL: diffnovo-0.0.9-py3-none-any.whl
- Upload date:
- Size: 70.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d77abbbbf058d0e0585952c448642d7a6b902ab6a2dc137d85a677cf6de78d3f |
| MD5 | d84637a7f831881a6754a772c5477236 |
| BLAKE2b-256 | 97e3df9f7c2ee6a67eebeb422e3b1ea34b371ec64ea08ed3625e5853d192614f |
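A manually downloaded file can be checked against the SHA256 digests listed above. The sketch below uses Python's standard `hashlib`; the helper names are illustrative, and the local file path is whatever location you saved the download to.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest("diffnovo-0.0.9.tar.gz", "e954fbee38a8184dd2bb1437da5e28a46d4f2a142dee77d746addddfd57521ac")` should return `True` for an intact copy of the source distribution.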