Peptide Matcher

PepAln: Simple Peptide Alignment Visualization

This Python package is designed to match short peptide sequences detected by mass spectrometry against the sequences in a FASTA file, and then to produce alignment outputs in several formats. An example input file:

Peptide     F145I/Dd2Dd2    Mass_Spec_Mode
VG;GV          3.493           POS
PA             2.454           POS
SP             4.701           NEG

Installation

pip install pepaln

Usage

python -m pepaln -m fragments.txt -r reference.fa

This generates the files output.gff, output.txt and output.pdf.

What does this package do?

A collaborator asked me to align short peptides from a mass spec experiment to a sequence, and then to produce an image showing, in an easy-to-read format, where each peptide aligns and which regions are not covered.

For example, given a series of short fragments such as:

VL LS LSP LSPAD PA NVKAA NVK VKA AA

and an origin sequence of:

VLSPADKTNVKAAWGK

they wanted to see the fragments aligned like so:

VLSPADKTNVKAAWGK
      **     ***
VL PA   NVKAA  
 LS     NVK    
 LSP     VKA     
 LSPAD     AA     

The * above marks residues that are not covered by any peptide. In addition, they wanted the different peptides to be displayed in different colors.
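A layout like this can be produced with a simple greedy row-packing scheme. The following is a minimal sketch, not pepaln's actual implementation. It assumes each fragment is placed at its first exact occurrence in the reference, that fragments sharing a row are separated by at least one blank column, and that rows are filled first-fit in order of start position. The function name `layout` is hypothetical.

```python
from __future__ import annotations


def layout(reference: str, fragments: list[str]) -> list[str]:
    """Return the reference, a coverage line and the stacked fragment rows."""
    # Place each fragment at its first exact occurrence (0-based start);
    # fragments that do not occur in the reference are skipped.
    placed = [(reference.find(f), f) for f in fragments if f in reference]
    # Stable sort by start position, so ties keep their input order.
    placed.sort(key=lambda p: p[0])

    rows: list[list[str]] = []   # each row is a list of characters
    row_ends: list[int] = []     # exclusive end of the last fragment in each row
    covered = [False] * len(reference)

    for start, frag in placed:
        end = start + len(frag)
        # First fit: the first row whose last fragment ends at least
        # one blank column before this fragment starts.
        for i, last_end in enumerate(row_ends):
            if start > last_end:
                break
        else:
            # No row has room, so open a new one.
            rows.append([" "] * len(reference))
            row_ends.append(-1)
            i = len(rows) - 1
        rows[i][start:end] = frag
        row_ends[i] = end
        for pos in range(start, end):
            covered[pos] = True

    # "*" under every residue that no fragment covers.
    marker = "".join(" " if c else "*" for c in covered)
    return [reference, marker.rstrip()] + ["".join(r).rstrip() for r in rows]


fragments = "VL LS LSP LSPAD PA NVKAA NVK VKA AA".split()
for line in layout("VLSPADKTNVKAAWGK", fragments):
    print(line)
```

With the example fragments, the four fragment rows this prints match the rows shown above. Processing fragments in order of start position and reusing the first free row is the classic greedy interval-partitioning strategy, which uses the minimum possible number of rows.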

I could not find a tool that does this, so I wrote this package.

Input data

The input is a tab-delimited file with at least three columns:

Peptide     F145I/Dd2Dd2    Mass_Spec_Mode
VG;GV          3.493           POS
PA             2.454           POS
SP             4.701           NEG

Where:

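  1. The first column lists the peptide sequence; multiple sequences may be listed, separated by a semicolon (;).
  2. The second column lists a numeric value (headed F145I/Dd2Dd2 in the example); this value is used to color the peptides in the PDF output.
  3. The third column indicates the ionization mode (POS or NEG).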
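Reading this table needs little more than Python's standard csv module. Below is a minimal sketch, assuming the first row is a header, the value column is numeric and any extra columns can be ignored. `read_peptides` is a hypothetical name, not part of pepaln's documented interface.

```python
import csv
import io  # only needed for the in-memory example below


def read_peptides(handle):
    """Yield (peptide, value, mode) tuples from a tab-delimited mass-spec table."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 3:
            continue  # ignore blank or incomplete lines
        value, mode = float(row[1]), row[2].strip()
        # A cell such as "VG;GV" lists several peptides that share one value.
        for peptide in row[0].split(";"):
            peptide = peptide.strip()
            if peptide:
                yield peptide, value, mode


sample = (
    "Peptide\tF145I/Dd2Dd2\tMass_Spec_Mode\n"
    "VG;GV\t3.493\tPOS\n"
    "PA\t2.454\tPOS\n"
    "SP\t4.701\tNEG\n"
)
for entry in read_peptides(io.StringIO(sample)):
    print(entry)
```

For a real file, open it with `open("fragments.txt", newline="")` and pass the handle in; the VG;GV line expands into two entries that share the value 3.493.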

The reference FASTA file may contain more than one target sequence, for example:

>ha
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPN
>hb
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHL
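Loading such a multi-record file is straightforward. The sketch below is an assumption about how the records could be read, not pepaln's code: it keys each sequence by the first word of its header line and joins sequence lines that wrap. Biopython's `SeqIO.parse(handle, "fasta")` does the same job if that dependency is acceptable. `read_fasta` is a hypothetical name.

```python
import io  # only needed for the in-memory example below


def read_fasta(handle):
    """Return a dict mapping each record name to its (upper-case) sequence."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the name, e.g. "ha".
            words = line[1:].split()
            name = words[0] if words else ""
            records[name] = []
        elif name is not None:
            # Sequences may be wrapped over several lines.
            records[name].append(line)
    return {key: "".join(parts).upper() for key, parts in records.items()}


example = ">ha\nVLSPADKTNV\nKAAWGK\n>hb\nVHLTPEEKSA\n"
print(read_fasta(io.StringIO(example)))
```

Each sequence in the resulting dict can then be aligned against the peptides independently, which matches the per-record output (>ha, >hb) shown below.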

Outputs

The tool generates output in three formats: TXT, GFF and PDF. The default filenames are:

  • output.txt, output.gff, output.pdf

Each of these can be overridden with the -t, -g and -p options, respectively (see Help below).

Text output:

>ha (Mode=POS)
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPN
             **                    *   *         *                            
VL PA   NVKAA  KVGA AGEYG  AL RMF   PTT TYF HFD   GSAQV   GKKV DAL  AV      PN
 LS  DKTNVK    KVGAHA EY    LE   LS      YFPH DL    AQVKG GKKVADA TNAVAHVDDM  
 LSP     VKA    VGA  GEYGA                FPH DLS    QV     KVA AL  AVAH      
 LSPAD     AA    GAHA   GAEA               PHF LS    QVK     VA ALTNA AHV     
   PADK           AHAG                     PHFD       VKGH       LT  VA       
                   HAGEYG                   HFDL       KGHGKKVA      VAH      

PDF output

The peptides are colored by their value field.
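The README does not say which color scale is used. As a hypothetical illustration only, the sketch below maps a value linearly onto a blue-to-red scale between the smallest and largest values in the input. The function `value_to_hex` and both end colors are assumptions, not pepaln's actual choices.

```python
def value_to_hex(value, vmin, vmax, low=(0, 0, 255), high=(255, 0, 0)):
    """Linearly interpolate between two RGB colors and return a hex string."""
    if vmax == vmin:
        t = 0.0  # all values equal: use the low color
    else:
        # Clamp to [0, 1] so values outside the range get the end colors.
        t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    rgb = (round(lo + t * (hi - lo)) for lo, hi in zip(low, high))
    return "#" + "".join(f"{c:02x}" for c in rgb)


values = [3.493, 2.454, 4.701]
vmin, vmax = min(values), max(values)
for v in values:
    print(v, value_to_hex(v, vmin, vmax))
```

Plotting libraries such as matplotlib accept hex strings like these directly as colors.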

GFF output:

ha	VL	.	1	2	.	2.433	.	Mode=POS
ha	LS	.	2	3	.	4.806	.	Mode=POS
ha	LSP	.	2	4	.	2.522	.	Mode=POS
ha	LSPAD	.	2	6	.	1.613	.	Mode=POS
ha	PA	.	4	5	.	2.2	.	Mode=POS
ha	PADK	.	4	7	.	1.548	.	Mode=POS
ha	DKTNVK	.	6	11	.	1.845	.	Mode=POS
ha	NVKAA	.	9	13	.	3.012	.	Mode=POS
ha	VKA	.	10	12	.	3.986	.	Mode=POS
...
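Each line corresponds to one exact match, with 1-based, inclusive start and end coordinates (VL at 1-2, LSP at 2-4). Below is a minimal sketch of producing such lines, assuming every occurrence of a peptide is reported, including overlapping ones; whether pepaln does the same is not stated. `gff_lines` is a hypothetical name. In the sample above the value appears in the seventh column, whereas GFF3 defines column 6 as the score and column 7 as the strand (+, -, . or ?), so this sketch writes the value into the score column and leaves the strand as ".".

```python
def gff_lines(seq_name, sequence, peptides, mode):
    """Yield one tab-separated GFF line per exact peptide match.

    peptides is an iterable of (peptide, value) pairs.
    """
    for pep, value in peptides:
        start = sequence.find(pep)
        while start != -1:
            # Convert the 0-based offset to 1-based inclusive coordinates.
            begin, end = start + 1, start + len(pep)
            cols = [seq_name, pep, ".", str(begin), str(end),
                    str(value), ".", ".", f"Mode={mode}"]
            yield "\t".join(cols)
            # Search again from the next position to catch overlapping repeats.
            start = sequence.find(pep, start + 1)


for line in gff_lines("ha", "VLSPADKTNVKAAWGK", [("VL", 2.433), ("LSP", 2.522)], "POS"):
    print(line)
```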

Help

$ python -m pepaln
usage: __main__.py [-h] [-m MASS] [-r REF] [-p output.pdf] [-t output.txt]
                   [-g output.gff]

optional arguments:
  -h, --help            show this help message and exit
  -m MASS, --mass MASS  Mass-spec result file containing peptide sequences.
  -r REF, --ref REF     Reference file to match the peptides against.
  -p output.pdf, --pdf output.pdf
                        Output file for pdf file
  -t output.txt, --txt output.txt
                        Output file for text alignments
  -g output.gff, --gff output.gff
                        Output file as GFF data
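These options map directly onto Python's argparse. The following reconstruction is inferred from the help text above and from the default filenames listed under Outputs; it is not pepaln's source, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Options as listed in the help text; defaults from the Outputs section."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-m", "--mass",
                        help="Mass-spec result file containing peptide sequences.")
    parser.add_argument("-r", "--ref",
                        help="Reference file to match the peptides against.")
    parser.add_argument("-p", "--pdf", metavar="output.pdf", default="output.pdf",
                        help="Output file for pdf file")
    parser.add_argument("-t", "--txt", metavar="output.txt", default="output.txt",
                        help="Output file for text alignments")
    parser.add_argument("-g", "--gff", metavar="output.gff", default="output.gff",
                        help="Output file as GFF data")
    return parser


args = build_parser().parse_args(
    ["-m", "fragments.txt", "-r", "reference.fa", "-p", "run1.pdf"])
print(args.pdf, args.txt, args.gff)  # run1.pdf output.txt output.gff
```

By default argparse takes the program name from sys.argv[0], which is why the help above reads __main__.py when the tool is run with python -m pepaln. Under this reading, passing only -p run1.pdf changes the PDF filename, while output.txt and output.gff keep their defaults.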

Download files

Source Distribution

pepaln-1.0.0.tar.gz (7.2 kB view details)

Built Distribution

pepaln-1.0.0-py3-none-any.whl (7.9 kB view details)

File details

Details for the file pepaln-1.0.0.tar.gz.

File metadata

  • Download URL: pepaln-1.0.0.tar.gz
  • Upload date:
  • Size: 7.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for pepaln-1.0.0.tar.gz
Algorithm Hash digest
SHA256 b9bfefc5bce8f1cacdbfb787b48bac039b665da4b3a8c188054722054ed69c42
MD5 aed2f29bddecef45c08a59e5d85d6510
BLAKE2b-256 525a9a7be0a10bfc659c2ab6e7ae65fdcf2843858579053f132d7cf13816c940
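To check a downloaded archive against the SHA256 digest listed above, the file can be hashed with Python's hashlib. Below is a minimal sketch; `sha256_of` is a hypothetical helper, and pip's own `pip hash <file>` command reports the same kind of digest.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, sha256_of("pepaln-1.0.0.tar.gz") should return the SHA256 value listed above for the source distribution; reading in chunks keeps memory use constant for large files.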

File details

Details for the file pepaln-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: pepaln-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 7.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for pepaln-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3d65f88a1cf44377a8d0d199406545baeb50da263700d60a7a81ce8d52d88a68
MD5 72001508d07ea292a778e83422ebe161
BLAKE2b-256 b499e78928bf480dc51f66429d781cfda7e8d79e133063daacf761b4199e2f90
