pepmatch·PyPI

Search tool for peptides and epitopes within a proteome, while considering potential residue substitutions.

Project description


Author: Daniel Marrama

Peptide search against a reference proteome, or sets of proteins, with residue substitutions.

Two-step process: preprocessing and matching.

Preprocessing only has to be performed once; the preprocessed data is stored in SQLite or pickle format.
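As a rough illustration of why preprocessing only needs to happen once, the sketch below stores every k-mer of a proteome in a SQLite table and then answers lookups from that table. The table layout, the function names `preprocess_to_sqlite` and `lookup`, and the toy sequence are assumptions made for this example; they are not PEPMatch's actual schema or API.

```python
import sqlite3

def preprocess_to_sqlite(conn, proteins, k):
    """Store every overlapping k-mer with its protein ID and 0-based position."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kmers "
        "(kmer TEXT, protein_id TEXT, position INTEGER)"
    )
    rows = [
        (seq[i:i + k], protein_id, i)
        for protein_id, seq in proteins.items()
        for i in range(len(seq) - k + 1)
    ]
    conn.executemany("INSERT INTO kmers VALUES (?, ?, ?)", rows)
    # An index on the k-mer column keeps later lookups fast.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_kmer ON kmers (kmer)")
    conn.commit()

def lookup(conn, kmer):
    """Return (protein_id, position) pairs where the k-mer occurs."""
    cursor = conn.execute(
        "SELECT protein_id, position FROM kmers "
        "WHERE kmer = ? ORDER BY protein_id, position",
        (kmer,),
    )
    return cursor.fetchall()

# In-memory database for the demo; a file path would persist the result.
conn = sqlite3.connect(":memory:")
preprocess_to_sqlite(conn, {"P1": "MKTAYIAK"}, k=5)
print(lookup(conn, "AYIAK"))
```

With a file-backed database instead of `:memory:`, the table would survive between runs, so only the lookup step would need to be repeated.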

As a competition to improve tool performance, we created a benchmarking framework with instructions here.

Requirements

Installation

pip install pepmatch

Inputs

Preprocessor

proteome - Path to proteome file to search against.
k - k-mer size to break up proteome into.
preprocessed_format - SQLite ("sqlite") or "pickle".
preprocessed_files_path - (optional) Directory where you want preprocessed files to go. Default is current directory.
gene_priority_proteome - (optional) Subset of proteome with prioritized protein IDs.
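To make the k parameter concrete, here is a minimal sketch of breaking protein sequences into overlapping k-mers and indexing them by position. The helper names `split_kmers` and `build_kmer_index` and the toy sequences are hypothetical and only illustrate the idea; PEPMatch's internal implementation may differ.

```python
from collections import defaultdict

def split_kmers(sequence, k):
    """Return every overlapping k-mer in `sequence` with its 0-based start."""
    return [(sequence[i:i + k], i) for i in range(len(sequence) - k + 1)]

def build_kmer_index(proteins, k):
    """Map each k-mer to the (protein_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for protein_id, sequence in proteins.items():
        for kmer, position in split_kmers(sequence, k):
            index[kmer].append((protein_id, position))
    return dict(index)

# Toy "proteome" with two short sequences (illustrative only).
proteins = {"P1": "MKTAYIAK", "P2": "AYIAKQR"}
index = build_kmer_index(proteins, k=5)
print(index["AYIAK"])
```

A smaller k produces more k-mers per protein and more hits per lookup, while a larger k produces fewer, more specific hits.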

Matcher

query - Peptides to search, given either as a .fasta file or as a Python list.
proteome_file - Name of preprocessed proteome to search against.
max_mismatches - Maximum number of mismatches (substitutions) for query.
k - (optional) k-mer size of the preprocessed proteome. If no k is given, a best k will be calculated and the proteome will be preprocessed.
preprocessed_files_path - (optional) Directory where preprocessed files are. Default is current directory.
best_match - (optional) Returns only the best match for each query peptide.
output_format - (optional) Outputs results into a file (CSV, XLSX, JSON, HTML) or just as a dataframe.
output_name - (optional) Specify name of file for output. Leaving blank will generate a name.
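The max_mismatches parameter counts residue substitutions between a query peptide and an equally long stretch of a protein. The sketch below shows that counting with a brute-force scan; it is not how PEPMatch searches (which relies on the preprocessed k-mers), and the function names and toy sequences are assumptions for illustration.

```python
def count_mismatches(peptide, window):
    """Number of positions where two equal-length sequences differ."""
    if len(peptide) != len(window):
        raise ValueError("peptide and window must have the same length")
    return sum(a != b for a, b in zip(peptide, window))

def find_matches(peptide, protein, max_mismatches):
    """Scan every window of the protein and keep those within the limit."""
    hits = []
    for start in range(len(protein) - len(peptide) + 1):
        window = protein[start:start + len(peptide)]
        mismatches = count_mismatches(peptide, window)
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits

# One substitution (I -> L) separates the query from the protein window.
hits = find_matches("YIAK", "MKTAYLAKQR", max_mismatches=1)
print(hits)
```

Setting max_mismatches to 0 in this sketch reduces it to exact matching.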

Note: For now, due to performance, SQLite is used for exact matching and pickle is used for mismatching.

Note: PEPMatch can also search for discontinuous epitopes in the residue:index format. Example:

"R377, Q408, Q432, H433, F436, V441, S442, S464, K467, K489, I491, S492, N497"
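For readers unfamiliar with this format, the following sketch parses such a string into (residue, index) pairs and counts how many of them disagree with a given protein sequence. It assumes 1-based positions, which is an assumption of this example rather than something stated above, and the function names are hypothetical.

```python
def parse_discontinuous_epitope(text):
    """Turn 'R377, Q408' into [('R', 377), ('Q', 408)]."""
    pairs = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue
        # First character is the residue, the rest is its position.
        pairs.append((token[0], int(token[1:])))
    return pairs

def count_residue_mismatches(epitope, protein):
    """Count epitope residues that differ from the protein (1-based positions assumed)."""
    mismatches = 0
    for residue, position in epitope:
        out_of_range = position < 1 or position > len(protein)
        if out_of_range or protein[position - 1] != residue:
            mismatches += 1
    return mismatches

epitope = parse_discontinuous_epitope("R377, Q408, Q432")
print(epitope)

# Toy check against a short sequence (illustrative only).
toy = parse_discontinuous_epitope("M1, T3, Y5")
print(count_residue_mismatches(toy, "MKTAYIAK"))
```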

Command Line Example

pepmatch-preprocess -p human.fasta -k 5 -f sql
pepmatch-match -q peptides.fasta -p human.fasta -m 0 -k 5

Exact Matching Example

from pepmatch import Preprocessor, Matcher

# proteome, preprocessed files directory, gene_priority_proteome (k is passed to sql_proteome)
Preprocessor(
  'proteomes/human.fasta', '.', 'proteomes/human_gp.fasta'
).sql_proteome(k = 5) # preprocessing only needs to be done once!

# query, proteome, max_mismatches, k, preprocessed files directory
Matcher(
  'queries/mhc_ligands_test.fasta', 'proteomes/human.fasta', 0, 5, '.'
).match()

Mismatching Example

from pepmatch import Preprocessor, Matcher

# proteome (k is passed to pickle_proteome)
Preprocessor('proteomes/human.fasta').pickle_proteome(k = 3)

# query, proteome, max_mismatches, k
Matcher(
  'queries/neoepitopes_test.fasta', 'proteomes/human.fasta', 3, 3
).match()

Best Match Example

from pepmatch import Matcher
Matcher(
  'queries/milk_peptides.fasta', 'proteomes/human.fasta', best_match=True
).match()

Using the best_match parameter without k or mismatch inputs returns the best match for each peptide in the query. The best match is chosen by the fewest mismatches, the best protein existence level, and whether the match exists in the gene priority proteome. No preprocessing beforehand is required, as the Matcher class will do this for you to find the best match.
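The ranking described above can be sketched as a sort key over candidate matches. This is only an illustration under assumptions: the `Candidate` fields, the `best_match` helper, and the tie-break order (mismatches first, then protein existence level, then gene priority membership) are chosen to mirror the sentence above, not taken from PEPMatch's code. Protein existence levels are assumed to follow the UniProt convention, where 1 is the strongest evidence and 5 the weakest.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    protein_id: str
    mismatches: int
    protein_existence_level: int  # assumed UniProt-style: 1 (best) to 5
    in_gene_priority: bool

def best_match(candidates):
    """Pick the candidate with the smallest (mismatches, PE level, not-priority) key."""
    return min(
        candidates,
        key=lambda c: (c.mismatches, c.protein_existence_level, not c.in_gene_priority),
    )

# Hypothetical candidates for a single query peptide.
candidates = [
    Candidate("A", mismatches=1, protein_existence_level=1, in_gene_priority=True),
    Candidate("B", mismatches=0, protein_existence_level=2, in_gene_priority=False),
    Candidate("C", mismatches=0, protein_existence_level=1, in_gene_priority=False),
    Candidate("D", mismatches=0, protein_existence_level=1, in_gene_priority=True),
]
print(best_match(candidates).protein_id)
```

Because Python compares tuples element by element, later criteria only matter when the earlier ones tie.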

Outputs

As mentioned above, outputs can be specified with the output_format parameter in the Matcher class. The following formats are allowed: dataframe, csv, xlsx, json, and html.

If dataframe is specified, the match() method returns a pandas DataFrame, which can be stored in a variable:

df = Matcher('queries/neoepitopes_test.fasta', 'human.fasta', 3, 3, output_format='dataframe').match()

TODO

Test other key-value stores (Redis, Memcached, LMDB, etc.)
Remove dependency on Levenshtein (this is not maintained very well)

Project details

This version: 0.9.3 (released Jul 3, 2023)

Download files


Source Distribution

pepmatch-0.9.3.tar.gz (21.5 kB view details)

Uploaded Jul 3, 2023 Source

Built Distribution

pepmatch-0.9.3-py3-none-any.whl (23.7 kB view details)

Uploaded Jul 3, 2023 Python 3

File details

Details for the file pepmatch-0.9.3.tar.gz.

File metadata

Download URL: pepmatch-0.9.3.tar.gz
Upload date: Jul 3, 2023
Size: 21.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.16

File hashes

Hashes for pepmatch-0.9.3.tar.gz
Algorithm	Hash digest
SHA256	`5bccc99a89e5d3298589492bc683f94fdf3f21fdbe8c952bbb23828b815ad52c`
MD5	`0c1afb285b8777afc9015abe98dda87d`
BLAKE2b-256	`f0a5d693fc5f0dba2cbd92a4b5e48ebc95e14196dbf3499387fa68841dc159eb`


File details

Details for the file pepmatch-0.9.3-py3-none-any.whl.

File metadata

Download URL: pepmatch-0.9.3-py3-none-any.whl
Upload date: Jul 3, 2023
Size: 23.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.16

File hashes

Hashes for pepmatch-0.9.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`11809eb47d2a88f206bbe291433cba209442e475e5a1845271570f959c056409`
MD5	`441565b72f40e9931e3f1074f447be46`
BLAKE2b-256	`3d8eb93d9d85de970befe31f386be0e14a2b067bcb832d0e9b0b0ba0a601f8fe`

