
Generate reference database of variant peptides for peptide spectrum matching

Overview

varpepdb is a Python package for generating a FASTA database of genetically variant peptides, for use in database searching of LC/MS data. It takes a list of amino acid substitutions for a protein sequence and generates all possible variant peptides after enzymatic cleavage. It supports multiple digestion enzymes and up to 1 missed cleavage. It also accounts for substitutions that create or remove enzyme cleavage sites.
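To illustrate why a substitution can change the digestion products, here is a minimal sketch that uses neither varpepdb nor rpg. It assumes a simplified trypsin rule (cleave after K or R unless the next residue is P), which is far cruder than the rules rpg applies. The helper names `tryptic_sites`, `digest` and `substitute` are hypothetical and exist only for this example.

```python
def tryptic_sites(seq):
    """Return 0-based indices after which the simplified rule cleaves."""
    # Simplified assumption: cleave C-terminal to K or R, unless followed by P.
    return [i for i in range(len(seq) - 1)
            if seq[i] in "KR" and seq[i + 1] != "P"]


def digest(seq, missed=0):
    """Digest seq, allowing up to `missed` missed cleavages per peptide."""
    cuts = [0] + [i + 1 for i in tryptic_sites(seq)] + [len(seq)]
    peptides = []
    for start in range(len(cuts) - 1):
        for skip in range(missed + 1):
            end = start + 1 + skip
            if end < len(cuts):
                peptides.append(seq[cuts[start]:cuts[end]])
    return peptides


def substitute(seq, position, new_residue):
    """Apply a substitution at a 1-based position."""
    return seq[:position - 1] + new_residue + seq[position:]


# First 40 residues of the demo sequence from the Usage section.
wild_type = "MGMWASLDALWEMPAEKRIFGAVLLFSWTVYLWETFLAQR"
# p.Ala22Lys introduces a new cleavage site after position 22.
variant = substitute(wild_type, 22, "K")

print(digest(wild_type))
# ['MGMWASLDALWEMPAEK', 'R', 'IFGAVLLFSWTVYLWETFLAQR']
print(digest(variant))
# ['MGMWASLDALWEMPAEK', 'R', 'IFGK', 'VLLFSWTVYLWETFLAQR']
```

The substitution splits one wild-type peptide into two variant peptides, so a database built only by editing wild-type peptides in place would miss both.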

Installation

Requires:

  • Python version: >= 3.8
  • rpg

You can install varpepdb from PyPI:

pip install varpepdb

Usage

import varpepdb
import rpg

# Set enzymes to Asp-N and Trypsin from the rpg package
varpepdb.setenzyme([rpg.RapidPeptidesGenerator.ALL_ENZYMES[1], 
                    rpg.RapidPeptidesGenerator.ALL_ENZYMES[41]])
# Allow up to 1 missed cleavage
varpepdb.setmiscleave(True)
# Set peptide length limits. Default values are 6 and 30.
varpepdb.setpeptidelengths(min_length=6, max_length=30)

# Demo inputs. In practice, these will be generated programmatically.
variants = ['O75844:p.Trp11Trp', 
            'O75844:p.Ala22Lys', 
            'O75844:p.Glu34Thr', 
            'O75844:p.Gln41His']
sequence = 'MGMWASLDALWEMPAEKRIFGAVLLFSWTVYLWETFLAQRQRRIYKTTTH'
gene = 'ZMPSTE24'
identifier = 'B3KQI7'

# Generate variant peptides
peptides = varpepdb.generate_single(variants=variants,
                                    sequence=sequence,
                                    gene=gene,
                                    identifier=identifier)

# Keep only peptides that contain at least 1 amino acid substitution
var_peptides = varpepdb.variant_containing_peptides(peptides)

# Write variant peptides to a FASTA file
varpepdb.write(path='path/to/output.fasta',
               peptides=var_peptides,
               include_non_unique=True)
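
The variant strings in the demo follow an `accession:p.RefPosAlt` pattern with three-letter amino acid codes. Before passing such strings on, it can help to check that each reference residue matches the sequence. The sketch below does this with the standard library only. The names `parse_variant` and `check_variant` are hypothetical helpers, not part of varpepdb, and the pattern is inferred from the demo inputs rather than from a documented specification.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Assumed pattern, inferred from the demo inputs: accession:p.RefPosAlt
VARIANT_RE = re.compile(r"^([^:]+):p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_variant(variant):
    """Split a variant string into (accession, ref, position, alt)."""
    match = VARIANT_RE.match(variant)
    if match is None:
        raise ValueError(f"Unrecognised variant string: {variant!r}")
    accession, ref, pos, alt = match.groups()
    return accession, THREE_TO_ONE[ref], int(pos), THREE_TO_ONE[alt]


def check_variant(variant, sequence):
    """Return True if the reference residue matches the sequence (1-based)."""
    _, ref, pos, _ = parse_variant(variant)
    return 1 <= pos <= len(sequence) and sequence[pos - 1] == ref


variants = ['O75844:p.Trp11Trp',
            'O75844:p.Ala22Lys',
            'O75844:p.Glu34Thr',
            'O75844:p.Gln41His']
sequence = 'MGMWASLDALWEMPAEKRIFGAVLLFSWTVYLWETFLAQRQRRIYKTTTH'

for v in variants:
    print(v, check_variant(v, sequence))
```

All four demo variants pass this check against the demo sequence.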

Multiple proteins can be processed in parallel using varpepdb.generate.

peptides = varpepdb.generate(input_list=[(variant_list1, sequence1, gene1, identifier1),
                                         (variant_list2, sequence2, gene2, identifier2),
                                         (variant_list3, sequence3, gene3, identifier3)])

var_peptides = varpepdb.variant_containing_peptides(peptides)

varpepdb.write(path='path/to/output.fasta',
               peptides=var_peptides,
               include_non_unique=True)
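
One way to assemble the list of `(variant_list, sequence, gene, identifier)` tuples is to group a flat list of variant strings by accession. The sketch below uses only the standard library. `build_input_list`, the `proteins` lookup table, and all accessions, genes and sequences in it are made-up placeholders for illustration, not real data or part of varpepdb.

```python
from collections import defaultdict


def build_input_list(variants, proteins):
    """Group variant strings by accession and pair them with protein data.

    `proteins` maps accession -> (sequence, gene). Variants whose accession
    is missing from `proteins` are skipped.
    """
    grouped = defaultdict(list)
    for variant in variants:
        accession = variant.split(":", 1)[0]
        grouped[accession].append(variant)

    input_list = []
    for accession, variant_list in grouped.items():
        if accession not in proteins:
            continue
        sequence, gene = proteins[accession]
        # Tuple order follows the example above:
        # (variant_list, sequence, gene, identifier)
        input_list.append((variant_list, sequence, gene, accession))
    return input_list


# Placeholder inputs, purely illustrative.
proteins = {
    "ACC1": ("MAGKLLPR", "GENE1"),
    "ACC2": ("MSTEKWQR", "GENE2"),
}
variants = ["ACC1:p.Ala2Lys", "ACC2:p.Glu4Thr", "ACC1:p.Pro7Arg", "ACC3:p.Gly5Ala"]

input_list = build_input_list(variants, proteins)
for entry in input_list:
    print(entry)
```

The resulting list has the same tuple layout as the `input_list` argument shown above.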

Enzymes

In silico digestion is performed using the rpg package. Refer to rpg's documentation for the list of available enzymes and for how to define your own enzyme.

FASTA output

Example of an entry written to the FASTA file:

>A0A8I5KQE6-v1 RPSA2 129-143 (p.Pro143Arg),p.Thr135Met 0
ADHQPLMEASYVNLR

  • A0A8I5KQE6-v1 is the sequence identifier of the parent protein (in this case the UniProt accession number) with 'v{number}' appended to identify the entry as a peptide of that protein.
  • RPSA2 is the name of the gene for this protein.
  • 129-143 is the position range in the parent protein sequence from which this peptide is derived.
  • (p.Pro143Arg) is an amino acid substitution that affected an enzyme cleavage site. Substitutions that introduce or remove cleavage sites are enclosed in parentheses.
  • p.Thr135Met is an amino acid substitution that did not affect an enzyme cleavage site.
  • 0 is the number of missed cleavages.
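
For downstream processing, a header in this layout can be split back into its fields. The following is a minimal sketch based only on the single example entry above. It assumes five whitespace-separated fields and comma-separated substitutions, and `parse_header` is a hypothetical helper rather than a varpepdb function.

```python
def parse_header(header):
    """Split a header line of the form shown above into its fields."""
    identifier, gene, span, variants, missed = header.lstrip(">").split()
    start, end = (int(x) for x in span.split("-"))
    parent, _, variant_tag = identifier.rpartition("-")

    cleavage_affecting, other = [], []
    for item in variants.split(","):
        # Substitutions in parentheses changed a cleavage site.
        if item.startswith("(") and item.endswith(")"):
            cleavage_affecting.append(item[1:-1])
        else:
            other.append(item)

    return {
        "parent": parent,
        "variant_tag": variant_tag,
        "gene": gene,
        "start": start,
        "end": end,
        "cleavage_affecting": cleavage_affecting,
        "other": other,
        "missed_cleavages": int(missed),
    }


header = ">A0A8I5KQE6-v1 RPSA2 129-143 (p.Pro143Arg),p.Thr135Met 0"
peptide = "ADHQPLMEASYVNLR"

fields = parse_header(header)
print(fields)

# Sanity checks: the span length matches the peptide, and the substituted
# residues sit where the header says they do (Met at 135, Arg at 143).
print(fields["end"] - fields["start"] + 1 == len(peptide))
print(peptide[135 - fields["start"]], peptide[143 - fields["start"]])
```

In the example entry the span 129-143 covers 15 residues, matching the peptide length, and the residues at positions 135 and 143 are M and R as the two substitutions state.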

Contact

For further information please contact jaren_sia@htx.gov.sg
