
De novo sequencing with InstaNovo


De novo peptide sequencing with InstaNovo


This is the official code repository for InstaNovo. It contains the code for training and inference of InstaNovo and InstaNovo+. InstaNovo is a transformer neural network that translates fragment ion peaks into the sequence of amino acids that make up the studied peptide(s). InstaNovo+, inspired by human intuition, is a multinomial diffusion model that further improves performance by iteratively refining predicted sequences.
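In the multinomial diffusion formulation of Hoogeboom et al. (2021), a sequence of categorical tokens is corrupted by mixing each one-hot token with the uniform distribution over K categories: after t steps, q(x_t | x_0) = Cat(alpha_bar_t * x_0 + (1 - alpha_bar_t) / K), where alpha_bar_t is the product of (1 - beta_s) over steps s = 1..t. A denoising model trained to reverse this corruption is what makes iterative refinement possible. The sketch below shows only the forward (noising) step for a peptide string. It assumes a constant beta and a 20-letter amino acid vocabulary (both hypothetical); InstaNovo+'s actual noise schedule, vocabulary and denoising network are not described in this README.

```python
import random
from typing import List, Sequence

# Hypothetical vocabulary: the 20 standard amino acids (one-letter codes).
# InstaNovo+'s real vocabulary (modified residues, special tokens) is not shown here.
VOCAB: List[str] = list("ACDEFGHIKLMNPQRSTVWY")


def alpha_bar(t: int, beta: float) -> float:
    """Cumulative keep probability after t steps with a constant per-step noise rate beta."""
    return (1.0 - beta) ** t


def noise_sequence(
    x0: Sequence[str], abar: float, vocab: Sequence[str], rng: random.Random
) -> List[str]:
    """Sample x_t ~ Cat(abar * onehot(x_0) + (1 - abar) / K) independently at each position."""
    noisy = []
    for token in x0:
        if rng.random() < abar:
            noisy.append(token)  # keep the original residue with probability abar
        else:
            noisy.append(rng.choice(vocab))  # otherwise draw uniformly from all K tokens
    return noisy
```

For example, `noise_sequence(list("PEPTIDE"), alpha_bar(10, 0.05), VOCAB, random.Random(0))` returns a partially scrambled copy of the peptide; as t grows, alpha_bar_t shrinks and the sequence approaches uniform noise.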


Usage

Installation

To use InstaNovo, install the package via pip:

pip install instanovo

We recommend installing InstaNovo in a fresh environment, for example one managed by Conda or pyenv. If you have Anaconda or Miniconda installed:

conda create -n instanovo python=3.8
conda activate instanovo

Note: InstaNovo is built for Python >= 3.8
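After activating the environment, a quick standard-library check can confirm that the interpreter meets the Python >= 3.8 requirement and that the package is visible. This is a minimal sketch: the helper name is hypothetical, and it assumes the distribution name is instanovo, as in the pip command above. It relies on importlib.metadata, which is available from Python 3.8.

```python
import sys
from importlib import metadata
from typing import Dict, Optional


def environment_report(dist_name: str = "instanovo") -> Dict[str, object]:
    """Report the Python version, whether it satisfies >= 3.8, and the installed package version."""
    try:
        installed: Optional[str] = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        installed = None  # the package is not installed in the active environment
    return {
        "python": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info >= (3, 8),
        "installed_version": installed,
    }


if __name__ == "__main__":
    print(environment_report())
```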

Training

To train the autoregressive InstaNovo model:

usage: python -m instanovo.transformer.train train_path valid_path [-h] [--config CONFIG] [--n_gpu N_GPU] [--n_workers N_WORKERS]

required arguments:
  train_path        Training data path
  valid_path        Validation data path

optional arguments:
  --config CONFIG   config file in the configs folder
  --n_gpu N_GPU
  --n_workers N_WORKERS

Note: data is expected to be saved in Polars .ipc format. See the section on converting datasets to Polars.

To update the InstaNovo model configuration, modify the config file at configs/instanovo/base.yaml.
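If you prefer to launch training from a Python script (for example, inside a job scheduler), the command-line interface above can be called through subprocess. The helper below is a hypothetical convenience wrapper, not part of the instanovo package; it only forwards the arguments shown in the usage line, and the file paths in the usage example are placeholders.

```python
import subprocess
import sys
from typing import List, Optional


def build_train_command(
    train_path: str,
    valid_path: str,
    config: Optional[str] = None,
    n_gpu: Optional[int] = None,
    n_workers: Optional[int] = None,
) -> List[str]:
    """Assemble argv for `python -m instanovo.transformer.train` from the documented arguments."""
    cmd = [sys.executable, "-m", "instanovo.transformer.train", train_path, valid_path]
    if config is not None:
        cmd += ["--config", config]
    if n_gpu is not None:
        cmd += ["--n_gpu", str(n_gpu)]
    if n_workers is not None:
        cmd += ["--n_workers", str(n_workers)]
    return cmd


def run_training(cmd: List[str]) -> None:
    """Run the training CLI; check=True raises CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)
```

For example, `run_training(build_train_command("train.ipc", "valid.ipc", n_workers=8))` runs the same command you would type in the shell, using the Python interpreter of the current environment. Passing argv as a list avoids shell quoting issues with paths that contain spaces.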

Prediction

To evaluate InstaNovo:

usage: python -m instanovo.transformer.predict data_path model_path [-h] [--denovo] [--config CONFIG] [--output_path OUTPUT_PATH] [--subset SUBSET] [--knapsack_path KNAPSACK_PATH] [--n_workers N_WORKERS]

required arguments:
  data_path         Evaluation data path
  model_path        Model checkpoint path

optional arguments:
  --denovo          evaluate in de novo mode; metrics are not computed
  --config CONFIG   config file in the configs folder
  --output_path OUTPUT_PATH
                    save predictions to a CSV file (required in de novo mode)
  --subset SUBSET
                    portion of the dataset to evaluate
  --knapsack_path KNAPSACK_PATH
                    path to a pre-computed knapsack
  --n_workers N_WORKERS
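In de novo sequencing, a knapsack is typically a precomputed table recording which total masses can be assembled from amino acid residue masses; during decoding it lets a search discard partial sequences whose remaining mass can no longer be filled to match the precursor mass. The sketch below illustrates that general idea with dynamic programming over masses discretised into 0.01 Da bins. It is an illustration under stated assumptions, not InstaNovo's implementation: it uses unmodified residues only, an arbitrary resolution and tolerance, and it does not read or write the file expected by --knapsack_path.

```python
from typing import Dict, List

# Standard monoisotopic residue masses in daltons (unmodified residues only).
RESIDUE_MASSES: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def build_knapsack(masses: Dict[str, float], max_mass: float, resolution: int = 100) -> List[bool]:
    """Return a table where table[i] is True if some residue combination sums to about i / resolution Da."""
    size = int(round(max_mass * resolution)) + 1
    steps = sorted({int(round(m * resolution)) for m in masses.values()})
    table = [False] * size
    table[0] = True  # the empty combination has mass 0
    for i in range(size):
        if not table[i]:
            continue
        for step in steps:
            if i + step >= size:
                break  # steps are sorted, so every later step is too large as well
            table[i + step] = True
    return table


def is_reachable(table: List[bool], mass: float, resolution: int = 100, tolerance: int = 1) -> bool:
    """Check whether any reachable mass lies within +/- tolerance bins of the query mass."""
    center = int(round(mass * resolution))
    lo = max(center - tolerance, 0)
    hi = min(center + tolerance, len(table) - 1)
    return any(table[lo : hi + 1])
```

Storing one boolean per mass bin keeps each lookup cheap once the table is built; a finer resolution reduces false positives at the cost of a larger table.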

Converting datasets to Polars

To convert a dataset to Polars .ipc format:

usage: python -m instanovo.utils.convert_ipc source target [-h] [--source_type SOURCE_TYPE] [--max_charge MAX_CHARGE]

required arguments:
  source            source data
  target            target ipc file

optional arguments:
  --source_type SOURCE_TYPE
                    type of input data; currently supports [mgf, csv]
  --max_charge MAX_CHARGE
                    maximum charge to filter
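For reference, MGF (Mascot Generic Format) stores each spectrum between BEGIN IONS and END IONS lines, with KEY=value parameters (such as TITLE, PEPMASS and CHARGE) followed by one "m/z intensity" peak per line. The standard-library sketch below parses that layout and applies a charge cutoff. It is not the converter used by instanovo.utils.convert_ipc, and it assumes that --max_charge drops spectra whose charge exceeds the given value; the README does not spell out the exact semantics, nor which parameters (for example SEQ for annotated spectra) the converter reads.

```python
from typing import Any, Dict, List, Optional

Spectrum = Dict[str, Any]


def parse_charge(value: str) -> Optional[int]:
    """Parse an MGF CHARGE value such as '2+' or '3'; only the first listed charge is used."""
    tokens = value.split()
    if not tokens:
        return None
    token = tokens[0]
    digits = token.rstrip("+-")
    if not digits.isdigit():
        return None
    return -int(digits) if token.endswith("-") else int(digits)


def parse_mgf(text: str) -> List[Spectrum]:
    """Parse BEGIN IONS ... END IONS blocks into dicts of parameters and (m/z, intensity) peaks."""
    spectra: List[Spectrum] = []
    current: Optional[Spectrum] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;!/":
            continue  # skip blank lines and MGF comment lines
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is None:
            continue  # global parameters outside a spectrum are ignored in this sketch
        elif "=" in line:
            key, value = line.split("=", 1)
            current["params"][key.strip().upper()] = value.strip()
        else:
            parts = line.split()
            intensity = float(parts[1]) if len(parts) > 1 else 0.0
            current["peaks"].append((float(parts[0]), intensity))
    return spectra


def filter_by_charge(spectra: List[Spectrum], max_charge: int) -> List[Spectrum]:
    """Keep spectra whose parsed charge is known and does not exceed max_charge."""
    kept = []
    for spectrum in spectra:
        charge = parse_charge(spectrum["params"].get("CHARGE", ""))
        if charge is not None and charge <= max_charge:
            kept.append(spectrum)
    return kept
```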

Roadmap

This code repo is currently under construction.

ToDo:

  • Add diffusion model code
  • Add data preprocessing pipeline
  • Multi-GPU support

License

Code is licensed under the Apache License, Version 2.0 (see LICENSE).

The model checkpoints are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

BibTeX entry and citation info

@article{eloff_kalogeropoulos_2023_instanovo,
	title = {De novo peptide sequencing with InstaNovo: Accurate, database-free peptide identification for large scale proteomics experiments},
	author = {Kevin Eloff and Konstantinos Kalogeropoulos and Oliver Morell and Amandla Mabona and Jakob Berg Jespersen and Wesley Williams and Sam van Beljouw and Marcin Skwark and Andreas Hougaard Laustsen and Stan J. J. Brouns and Anne Ljungars and Erwin Marten Schoof and Jeroen Van Goey and Ulrich auf dem Keller and Karim Beguir and Nicolas Lopez Carranza and Timothy Patrick Jenkins},
	year = {2023},
	doi = {10.1101/2023.08.30.555055},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/10.1101/2023.08.30.555055v1},
	journal = {bioRxiv}
}



Download files


Source Distribution

instanovo-0.1.5.tar.gz (46.0 kB)

Built Distribution

instanovo-0.1.5-py3-none-any.whl (53.9 kB)

File details

Details for the file instanovo-0.1.5.tar.gz.

File metadata

  • Download URL: instanovo-0.1.5.tar.gz
  • Size: 46.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for instanovo-0.1.5.tar.gz
Algorithm     Hash digest
SHA256        8a4eb3b07c865de08e2d77c4b6fc6ba06eabd13c490983fe206407a2d1822206
MD5           61a6f2758a518e0f0c0f638051369080
BLAKE2b-256   061fea1a77eacab9a296d3c0b03e80581152e59c2564fbaa069706e792855f53
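To check that a downloaded file matches its published SHA256 digest, you can compute the hash locally. A minimal standard-library sketch; the helper names are hypothetical and not part of any package:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks so large files are never loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected: str) -> bool:
    """Compare against a published hex digest, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected.strip().lower()
```

For example, with the source distribution in the current directory, `verify_sha256(Path("instanovo-0.1.5.tar.gz"), "8a4eb3b07c865de08e2d77c4b6fc6ba06eabd13c490983fe206407a2d1822206")` should return True for an unmodified download.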


File details

Details for the file instanovo-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: instanovo-0.1.5-py3-none-any.whl
  • Size: 53.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for instanovo-0.1.5-py3-none-any.whl
Algorithm     Hash digest
SHA256        9419aa0b610249709cc6cc2e531642343705e6b2167acc1d44c5a7d49e4ac032
MD5           496d52ff9d9efd1d88a0ad4a181e457e
BLAKE2b-256   086de01335d5d8e056a1695803f975346be9fb1dbdfaa5dbe35f845814244d36

