Peptide matcher

peptide-matcher

peptide-matcher matches peptide sequences identified in proteomics experiments, using either a database-match or a de novo approach, against a sequence database. Its main purpose is to extract the sequence context of each match. It can also report structural context when the database includes structural information; see peptide-matcher-data.

There are three ways to use peptide-matcher:

  • the GUI peptide_matcher_gui
  • the CLI peptide_matcher
  • the python class peptide_matcher.PeptideMatcher

The GUI is written with wxWidgets. Other dependencies include biopython and pyahocorasick.

Installation

Install from PyPI: pip3 install peptide_matcher.

How to use the GUI

[Screenshot: the peptide-matcher GUI interface]

Two files are needed: the database in fasta format, optionally with structural annotations, and a plain list of peptide sequences.

The optional structural annotations must follow a custom format. Databases generated from AlphaFold models for a few popular model organisms are distributed at peptide-matcher-data.

The results of the peptide matching are returned to the GUI and can be saved as xlsx. For each peptide the following output is generated:

Field         Description                                     Example                        Values
Peptide       peptide sequence                                QVHAVSFYSK                     string of amino acid symbols
Length        peptide length                                  10                             integer
Protein       matching protein id                             A6NL46                         string
Start         start position (1-based)                        150                            integer
End           end position (1-based)                          159                            integer
C-term        distance to protein's C-terminus                182                            integer
N-flank       N-flanking residues in this protein             TDKA                           string
C-flank       C-flanking residues in this protein             GHGV                           string
N-flank*      weblogo for each position of the N-flank        2T|2D|2K|2A                    '|' separates positions
C-flank*      weblogo for each position of the C-flank        1G1D|2H|1G1E|2V
N-flank SS    secondary structure for the N-flank             HHHH                           string of DSSP codes
Peptide SS    same for the peptide itself                     HH------EE
C-flank SS    same for the C-flank region                     EEEE
N-flank TM    transmembrane region for the N-flank            TTTT                           string of: T - TM region, S - signal peptide
Peptide TM    same for the peptide itself                     TT--------
C-flank TM    same for the C-flank region                     ----
N-flank conf  AlphaFold's pLDDT score for the N-flank         43,46,40,49                    list of integers 0-100
Peptide conf  same for the peptide itself                     44,44,45,44,50,39,48,39,56,46
C-flank conf  same for the C-flank                            49,47,42,46
N-flank RSA   relative solvent accessibility for the N-flank  81,79,84,71                    list of integers 0-100
Peptide RSA   same for the peptide itself                     90,78,75,78,54,62,73,84,73,81
C-flank RSA   same for the C-flank                            67,78,71,80
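The coordinate, flank and weblogo fields can be illustrated with a small pure-Python sketch. This is not the package's implementation: it uses a naive substring search, the helper names find_matches and flank_logo and the two toy protein sequences are hypothetical, and it assumes that C-term is the number of residues after the peptide and that residues within one logo position are listed by decreasing count.

```python
from collections import Counter


def find_matches(peptide, proteins, flanks=4):
    """Return one dict per occurrence of `peptide` in `proteins` (id -> sequence)."""
    matches = []
    for record_id, seq in proteins.items():
        pos = seq.find(peptide)
        while pos != -1:
            end = pos + len(peptide)              # 1-based, inclusive end position
            matches.append({
                "record_id": record_id,
                "start": pos + 1,                 # 1-based start position
                "end": end,
                "c_term": len(seq) - end,         # assumption: residues after the peptide
                "n_flank": seq[max(0, pos - flanks):pos],
                "c_flank": seq[end:end + flanks],
            })
            pos = seq.find(peptide, pos + 1)      # continue, allowing overlapping hits
    return matches


def flank_logo(flank_strings):
    """Summarise equally long flanks position by position, e.g. '1G1D|2H'."""
    return "|".join(
        "".join(f"{count}{aa}" for aa, count in Counter(column).most_common())
        for column in zip(*flank_strings)
    )


# Toy database (hypothetical sequences, chosen to reproduce the flanks in the table)
proteins = {
    "P1": "MTDKAQVHAVSFYSKGHGVLL",
    "P2": "AATDKAQVHAVSFYSKDHEVW",
}
matches = find_matches("QVHAVSFYSK", proteins)
n_logo = flank_logo([m["n_flank"] for m in matches])
c_logo = flank_logo([m["c_flank"] for m in matches])
print(matches[0]["start"], matches[0]["end"], n_logo, c_logo)
```

Flanks that run into a protein terminus are shorter than requested, so a real implementation has to align them before building the logo; the sketch ignores that case.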

In the provided databases, the RSA values are calculated by dividing the absolute solvent accessibility (ASA) produced by DSSP (mkdssp v3.0.0) by the theoretical maximum ASA values from Tien et al. (2013).
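The calculation amounts to a per-residue division. The following is a minimal sketch, not the code used to build the databases: the function name relative_accessibility is hypothetical, rounding and clamping to the 0-100 range are assumptions based on the value range given in the table, and the two maxima are quoted from the theoretical scale of Tien et al. and should be checked against the paper before use.

```python
def relative_accessibility(residue, asa, max_asa):
    """Return RSA as an integer percentage of the residue's maximum ASA."""
    rsa = 100.0 * asa / max_asa[residue]
    # Assumption: values are rounded and clamped to the 0-100 range
    return max(0, min(100, round(rsa)))


# Theoretical maximum ASA in square angstroms (only two residues shown; verify before use)
max_asa = {"A": 129.0, "G": 104.0}

half_exposed = relative_accessibility("A", 64.5, max_asa)
over_exposed = relative_accessibility("G", 110.0, max_asa)
print(half_exposed, over_exposed)
```

An ASA above the theoretical maximum can occur for terminal or strained residues, which is why the sketch clamps the result.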

How to use the CLI

Check out peptide_matcher -h:

$ peptide_matcher -h
usage: peptide_matcher [-h] --peptides FILENAME --database FILENAME [--secstruct] [--flanks N] [--format {json,tsv,csv}] [--output OUTPUT]

Match peptides in a protein database.

optional arguments:
  -h, --help            show this help message and exit
  --peptides FILENAME, -p FILENAME
                        list of peptides to match
  --database FILENAME, -d FILENAME
                        protein database in fasta format
  --secstruct, -s       whether the database also contains structural information
  --flanks N, -f N      length of the flanks to report (default: 4)
  --format {json,tsv,csv}, -F {json,tsv,csv}
                        output format (default: json)
  --output OUTPUT, -o OUTPUT
                        output file (default: output to stdout)

The output is similar to that of the GUI. The header of the tabular output formats looks as follows:

[ 'peptide', 'peplen', 'record_id', 'start', 'end', 'c_term', 'n_flank', 'c_flank', 'n_logos', 'c_logos',
  'sst_n_term', 'sst_pept', 'sst_c_term', 'tm_n_term', 'tm_pept', 'tm_c_term',
  'conf_n_term', 'conf_pept', 'conf_c_term', 'acc_n_term', 'acc_pept', 'acc_c_term' ]

The json output is a list of dictionaries, each of the following format:

{"peptide": "IYGALAVGAP",
 "matches": [{"record_id": "P77549", "start": 157, "end": 166, "c_term": 227,
              "n_flank": "NGMA", "c_flank": "LGLL",
              "sst_n_term": "HHHH", "sst_pept": "HHHHHHHHHH", "sst_c_term": "HHHH",
              "tm_n_term": "----", "tm_pept": "----------", "tm_c_term": "----",
              "conf_n_term": [94, 89, 91, 94],
              "conf_pept": [93, 86, 88, 94, 89, 85, 90, 92, 86, 88],
              "conf_c_term": [93, 94, 91, 94],
              "acc_n_term": [3, 6, 25, 6],
              "acc_pept": [9, 24, 19, 0, 25, 50, 44, 0, 22, 45],
              "acc_c_term": [36, 0, 37, 47]}],
 "n_logos": [{"N": 1}, {"G": 1}, {"M": 1}, {"A": 1}],
 "c_logos": [{"L": 1}, {"G": 1}, {"L": 1}, {"L": 1}]}
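For downstream processing, the json output can be flattened into one row per match. The sketch below is only an illustration: the helper names flatten and logo_string are hypothetical, the embedded record is the example from this section trimmed to a few keys, and rendering a logo as a string such as 1N|1G|1M|1A is an assumption modelled on the GUI columns.

```python
import json

# The example record from this section, trimmed to a few keys for brevity
cli_output = '''
[{"peptide": "IYGALAVGAP",
  "matches": [{"record_id": "P77549", "start": 157, "end": 166, "c_term": 227,
               "n_flank": "NGMA", "c_flank": "LGLL"}],
  "n_logos": [{"N": 1}, {"G": 1}, {"M": 1}, {"A": 1}],
  "c_logos": [{"L": 1}, {"G": 1}, {"L": 1}, {"L": 1}]}]
'''


def flatten(records):
    """Turn the nested json records into one flat dict per peptide/protein match."""
    rows = []
    for record in records:
        for match in record["matches"]:
            rows.append({
                "peptide": record["peptide"],
                "peplen": len(record["peptide"]),
                **match,
            })
    return rows


def logo_string(logo):
    """Render a list of per-position residue counts as e.g. '1G1D|2H'."""
    return "|".join(
        "".join(f"{count}{aa}" for aa, count in position.items())
        for position in logo
    )


records = json.loads(cli_output)
rows = flatten(records)
n_logo = logo_string(records[0]["n_logos"])
print(rows[0]["record_id"], rows[0]["start"], rows[0]["end"], n_logo)
```

Note that the logos belong to the peptide as a whole, not to a single match, which is why they sit next to "matches" rather than inside it.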

How to use the API

from peptide_matcher import PeptideMatcher, wrap_logos, wrap_scores

# peptides to match: a list of sequences or a file handle
peptides = [ 'IYGALAVGAP', 'LTCDETPVFSGSVLN', 'KRFARESGMTLL', 'GAGFAELLSSLQTPEIK', 'RTGHKLV' ]
# protein database in fasta format: a file name or a file handle
database = 'UP000000625_83333_ECOLI.fasta'
flanks = 4        # length of the flanks to report
secstruct = True  # the database also contains structural information

pm = PeptideMatcher(peptides, database, secstruct, flanks)
for output in pm.run():
    print(output)

Download files

Source distribution: peptide_matcher-2.0.0.tar.gz (24.5 kB)
Built distribution: peptide_matcher-2.0.0-py3-none-any.whl (23.7 kB, Python 3)
