De novo sequencing with InstaNovo
De novo peptide sequencing with InstaNovo
This is the official code repository for InstaNovo. It contains the code for training and inference of InstaNovo and InstaNovo+. InstaNovo is a transformer neural network that translates fragment ion peaks into the sequence of amino acids making up the studied peptide(s). InstaNovo+, inspired by human intuition, is a multinomial diffusion model that further improves performance by iteratively refining the predicted sequences.
Usage
Installation
To use InstaNovo, install the module via pip:
pip install instanovo
We recommend installing InstaNovo in a fresh environment, for example one managed by Conda or pyenv. If you have Anaconda or Miniconda installed:
conda create -n instanovo python=3.8
conda activate instanovo
Note: InstaNovo is built for Python >= 3.8
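To confirm that an environment satisfies the note above, a small check along the following lines may help. It is a minimal sketch using only the Python standard library; the helper names `python_is_supported` and `installed_version` are illustrative and are not part of InstaNovo.

```python
import sys
from importlib import metadata

# Minimum version taken from the note above: InstaNovo is built for Python >= 3.8.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version meets the stated minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(package="instanovo"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("instanovo version:", installed_version())
```

If `installed_version()` returns `None`, the package is not visible in the active environment, which usually means the environment was not activated before running `pip install instanovo`.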
Training
To train auto-regressive InstaNovo:
usage: python -m instanovo.transformer.train train_path valid_path [-h] [--config CONFIG] [--n_gpu N_GPU] [--n_workers N_WORKERS]
required arguments:
train_path Training data path
valid_path Validation data path
optional arguments:
--config CONFIG file in configs folder
--n_workers N_WORKERS
Note: data is expected to be saved in Polars .ipc format. See the section "Converting datasets to Polars".
To update the InstaNovo model config, modify the config file under configs/instanovo/base.yaml
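When training is launched from a script rather than typed by hand, the usage line above can be assembled programmatically. The following is a minimal sketch, not part of InstaNovo: `build_train_command` is a hypothetical helper, the file names are placeholders, and only the flags shown in the usage line are used.

```python
import subprocess
import sys


def build_train_command(train_path, valid_path, config=None, n_gpu=None, n_workers=None):
    """Assemble the argument list for `python -m instanovo.transformer.train`.

    Only flags that appear in the usage line are emitted; options left as
    None are omitted so the CLI falls back to its own defaults.
    """
    cmd = [sys.executable, "-m", "instanovo.transformer.train", str(train_path), str(valid_path)]
    for flag, value in (("--config", config), ("--n_gpu", n_gpu), ("--n_workers", n_workers)):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    # Placeholder paths: replace with your own Polars .ipc files.
    command = build_train_command("train.ipc", "valid.ipc", n_workers=4)
    print(" ".join(command))
    # Uncomment to launch training (requires instanovo to be installed):
    # subprocess.run(command, check=True)
```

Using `sys.executable` keeps the call inside the currently active environment, which matters when InstaNovo is installed in a dedicated Conda environment.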
Prediction
To evaluate InstaNovo:
usage: python -m instanovo.transformer.predict data_path model_path [-h] [--denovo] [--config CONFIG] [--output_path OUTPUT_PATH] [--subset SUBSET] [--knapsack_path KNAPSACK_PATH] [--n_workers N_WORKERS]
required arguments:
data_path Evaluation data path
model_path Model checkpoint path
optional arguments:
--denovo evaluate in de novo mode, will not try to compute metrics
--output_path OUTPUT_PATH
Save predictions to a csv file (required in de novo mode)
--subset SUBSET
portion of set to evaluate
--knapsack_path KNAPSACK_PATH
path to pre-computed knapsack
--n_workers N_WORKERS
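The option list above states that `--output_path` is required in de novo mode. One way to catch a missing output path before starting a long run is to validate the options up front. The sketch below is an assumption-laden illustration: `PredictOptions` is a hypothetical wrapper, not InstaNovo's own API, and the paths in the example are placeholders.

```python
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PredictOptions:
    """Hypothetical container mirroring the documented prediction arguments."""

    data_path: str
    model_path: str
    denovo: bool = False
    config: Optional[str] = None
    output_path: Optional[str] = None
    subset: Optional[float] = None
    knapsack_path: Optional[str] = None
    n_workers: Optional[int] = None

    def to_command(self) -> List[str]:
        """Return the argument list for `python -m instanovo.transformer.predict`."""
        # Documented rule: predictions must be saved to a csv file in de novo mode.
        if self.denovo and self.output_path is None:
            raise ValueError("--output_path is required in de novo mode")
        cmd = [sys.executable, "-m", "instanovo.transformer.predict",
               self.data_path, self.model_path]
        if self.denovo:
            cmd.append("--denovo")
        optional = [
            ("--config", self.config),
            ("--output_path", self.output_path),
            ("--subset", self.subset),
            ("--knapsack_path", self.knapsack_path),
            ("--n_workers", self.n_workers),
        ]
        for flag, value in optional:
            if value is not None:
                cmd += [flag, str(value)]
        return cmd


if __name__ == "__main__":
    options = PredictOptions("spectra.ipc", "model.ckpt", denovo=True, output_path="preds.csv")
    print(" ".join(options.to_command()))
```

The resulting list can be passed to `subprocess.run(..., check=True)` once InstaNovo is installed and a model checkpoint is available.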
Converting datasets to Polars
To convert a dataset to Polars .ipc:
usage: python -m instanovo.utils.convert_ipc source target [-h] [--source_type SOURCE_TYPE] [--max_charge MAX_CHARGE]
required arguments:
source source data
target target ipc file
optional arguments:
--source_type SOURCE_TYPE
type of input data. currently supports [mgf, csv]
--max_charge MAX_CHARGE
maximum charge to filter
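Because only mgf and csv inputs are listed as supported, a conversion script can reject other inputs before invoking the converter. The following is a minimal sketch under stated assumptions: `build_convert_command` is a hypothetical helper, and guessing the source type from the file extension is a convenience of this sketch, not documented InstaNovo behaviour.

```python
import sys
from pathlib import Path

# Source types listed in the help text above.
SUPPORTED_SOURCE_TYPES = ("mgf", "csv")


def build_convert_command(source, target, source_type=None, max_charge=None):
    """Assemble the argument list for `python -m instanovo.utils.convert_ipc`.

    If `source_type` is not given, it is guessed from the file extension
    (an assumption of this sketch) and checked against the supported types.
    """
    if source_type is None:
        source_type = Path(source).suffix.lstrip(".").lower()
    if source_type not in SUPPORTED_SOURCE_TYPES:
        raise ValueError(
            f"unsupported source type {source_type!r}; expected one of {SUPPORTED_SOURCE_TYPES}"
        )
    cmd = [sys.executable, "-m", "instanovo.utils.convert_ipc",
           str(source), str(target), "--source_type", source_type]
    if max_charge is not None:
        cmd += ["--max_charge", str(max_charge)]
    return cmd


if __name__ == "__main__":
    # Placeholder file names for illustration only.
    print(" ".join(build_convert_command("run1.mgf", "run1.ipc", max_charge=3)))
```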
Roadmap
This code repo is currently under construction.
ToDo:
- Add diffusion model code
- Add data preprocessing pipeline
- Multi-GPU support
License
Code is licensed under the Apache License, Version 2.0 (see LICENSE)
The model checkpoints are licensed under Creative Commons Non-Commercial (CC BY-NC-SA 4.0)
BibTeX entry and citation info
@article{eloff_kalogeropoulos_2023_instanovo,
title = {De novo peptide sequencing with InstaNovo: Accurate, database-free peptide identification for large scale proteomics experiments},
author = {Kevin Eloff and Konstantinos Kalogeropoulos and Oliver Morell and Amandla Mabona and Jakob Berg Jespersen and Wesley Williams and Sam van Beljouw and Marcin Skwark and Andreas Hougaard Laustsen and Stan J. J. Brouns and Anne Ljungars and Erwin Marten Schoof and Jeroen Van Goey and Ulrich auf dem Keller and Karim Beguir and Nicolas Lopez Carranza and Timothy Patrick Jenkins},
year = {2023},
doi = {10.1101/2023.08.30.555055},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/10.1101/2023.08.30.555055v1},
journal = {bioRxiv}
}
Download files
File details
Details for the file instanovo-0.1.5.tar.gz.
File metadata
- Download URL: instanovo-0.1.5.tar.gz
- Upload date:
- Size: 46.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8a4eb3b07c865de08e2d77c4b6fc6ba06eabd13c490983fe206407a2d1822206
MD5 | 61a6f2758a518e0f0c0f638051369080
BLAKE2b-256 | 061fea1a77eacab9a296d3c0b03e80581152e59c2564fbaa069706e792855f53
File details
Details for the file instanovo-0.1.5-py3-none-any.whl.
File metadata
- Download URL: instanovo-0.1.5-py3-none-any.whl
- Upload date:
- Size: 53.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9419aa0b610249709cc6cc2e531642343705e6b2167acc1d44c5a7d49e4ac032
MD5 | 496d52ff9d9efd1d88a0ad4a181e457e
BLAKE2b-256 | 086de01335d5d8e056a1695803f975346be9fb1dbdfaa5dbe35f845814244d36
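The published digests can be used to check a manually downloaded file. The sketch below uses only the standard-library `hashlib` module; the helper names are illustrative, and the local file path is an assumption about where the download was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # SHA256 listed above for instanovo-0.1.5-py3-none-any.whl.
    expected = "9419aa0b610249709cc6cc2e531642343705e6b2167acc1d44c5a7d49e4ac032"
    # Assumed local path; adjust to wherever the wheel was saved.
    wheel = "instanovo-0.1.5-py3-none-any.whl"
    try:
        print("hash matches:", matches_published_hash(wheel, expected))
    except FileNotFoundError:
        print("file not found:", wheel)
```

The same function works for the source distribution by substituting the SHA256 digest listed for instanovo-0.1.5.tar.gz.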