Search tool for peptides and epitopes within a proteome, while considering potential residue substitutions.
Author: Daniel Marrama
PEPMatch is a high-performance peptide search tool for finding short peptide sequences within a reference proteome. Powered by a Rust engine with Python bindings, it runs exact-match searches across an entire proteome in under a second and mismatch-tolerant searches in a few seconds, while maintaining a simple Python API.
Key Features
- Blazing Fast: Rust-powered search engine with automatic multi-core parallelization via Rayon. Search thousands of peptides against the entire human proteome in seconds.
- Unified Index Format: Single .pepidx binary format stores sequences, metadata, and k-mer index in one memory-mapped file. Preprocess once, search repeatedly.
- Versatile Searching: Exact matches, mismatch-tolerant searches, best match mode, and discontinuous epitope support.
- Simple API: Two classes, Preprocessor and Matcher, handle everything.
- Flexible I/O: Accepts queries from FASTA files, text files, or Python lists. Outputs to CSV, TSV, XLSX, JSON, or Polars DataFrame.
Requirements
Installation
pip install pepmatch
Quick Start
from pepmatch import Preprocessor, Matcher
# Preprocess a proteome (one-time step)
Preprocessor('human.fasta').preprocess(k=5)
# Search for exact matches
df = Matcher(
    query=['YLLDLHSYL', 'GLCTLVAML', 'FAKEPEPTIDE'],
    proteome_file='human.fasta',
    max_mismatches=0,
    k=5
).match()
print(df)
Preprocessing
Preprocessing builds a .pepidx index from your proteome FASTA file. This only needs to be done once per proteome and k-mer size. If a .pepidx file doesn't exist when you search, Matcher will create it automatically.
from pepmatch import Preprocessor
Preprocessor('human.fasta').preprocess(k=5)
CLI:
pepmatch-preprocess -p human.fasta -k 5
Flags
- -p, --proteome (Required): Path to the proteome FASTA file.
- -k, --kmer_size (Required): The k-mer size for indexing.
- -n, --proteome_name: Custom name for the proteome.
- -P, --preprocessed_files_path: Directory to save preprocessed files.
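To make the idea of a k-mer index concrete, here is a minimal pure-Python sketch. It is not the .pepidx format or the Rust implementation; the function name `build_kmer_index` and the toy protein IDs and sequences are made up for illustration. It simply maps each overlapping k-mer to the places where it occurs.

```python
from collections import defaultdict

def build_kmer_index(proteins, k):
    """Map every overlapping k-mer to the (protein_id, 0-based offset) pairs where it occurs."""
    index = defaultdict(list)
    for protein_id, sequence in proteins.items():
        for offset in range(len(sequence) - k + 1):
            index[sequence[offset:offset + k]].append((protein_id, offset))
    return dict(index)

# Toy "proteome" with hypothetical IDs and sequences.
proteins = {"P1": "MKTAYIAKQR", "P2": "AYIAKMK"}
index = build_kmer_index(proteins, k=5)
print(index["AYIAK"])
```

Because the index depends on k, a separate index is needed for each k-mer size, which is consistent with preprocessing once per proteome and k.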
Matching
Exact Matching
from pepmatch import Matcher
df = Matcher(
    query='peptides.fasta',
    proteome_file='human.fasta',
    max_mismatches=0,
    k=5
).match()
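As a rough illustration of how a k-mer index can answer exact-match queries, the sketch below looks up the first k-mer of the peptide and then confirms the full peptide at each candidate site. This is an assumption about the general technique, not a description of PEPMatch internals; the helper names, the toy sequences, and the 1-based start positions are choices made for this example.

```python
from collections import defaultdict

def build_kmer_index(proteins, k):
    """Map each overlapping k-mer to its (protein_id, 0-based offset) occurrences."""
    index = defaultdict(list)
    for pid, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((pid, i))
    return index

def exact_matches(peptide, proteins, index, k):
    """Return sorted (protein_id, 1-based start) pairs where the peptide occurs exactly."""
    if len(peptide) < k:
        return []
    hits = set()
    # Seed with the first k-mer, then confirm the whole peptide at each candidate site.
    for pid, offset in index.get(peptide[:k], []):
        if proteins[pid].startswith(peptide, offset):
            hits.add((pid, offset + 1))
    return sorted(hits)

# Hypothetical toy proteome.
proteins = {"P1": "MKTAYIAKQRQISFVK", "P2": "GGAYIAKQRGG"}
index = build_kmer_index(proteins, k=5)
print(exact_matches("AYIAKQR", proteins, index, k=5))
print(exact_matches("FAKEPEPTIDE", proteins, index, k=5))
```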
Mismatch Searching
df = Matcher(
    query='neoepitopes.fasta',
    proteome_file='human.fasta',
    max_mismatches=3,
    k=3
).match()
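One common way to search with mismatches is seed-and-verify: use exact k-mer hits as seeds, then compare the full peptide to the aligned window by Hamming distance. The sketch below shows that general idea under the assumption that only substitutions are allowed; it is not PEPMatch's actual algorithm, and all names and sequences are hypothetical. A seed approach can only find a match if at least one k-mer of the peptide is mismatch-free, which is guaranteed when the peptide contains more than max_mismatches non-overlapping k-mers. That is why smaller k values tolerate more mismatches.

```python
from collections import defaultdict

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def mismatch_matches(peptide, proteins, k, max_mismatches):
    """Return sorted (protein_id, 1-based start, matched window, mismatches) tuples."""
    index = defaultdict(list)
    for pid, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((pid, i))

    n = len(peptide)
    hits = {}
    for q in range(n - k + 1):                      # every k-mer of the query is a seed
        for pid, i in index.get(peptide[q:q + k], []):
            start = i - q                           # where the whole peptide would begin
            seq = proteins[pid]
            if start < 0 or start + n > len(seq):
                continue                            # window would run off the protein
            window = seq[start:start + n]
            distance = hamming(peptide, window)
            if distance <= max_mismatches:
                hits[(pid, start)] = (window, distance)
    return sorted((pid, s + 1, w, d) for (pid, s), (w, d) in hits.items())

# Hypothetical toy proteome; the query differs from P1 at one position (I -> L).
proteins = {"P1": "MKTAYIAKQRQ"}
print(mismatch_matches("AYLAKQR", proteins, k=3, max_mismatches=1))
```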
Best Match
Automatically finds the optimal match for each peptide by trying different k-mer sizes and mismatch thresholds. No manual preprocessing required.
df = Matcher(
    query='peptides.fasta',
    proteome_file='human.fasta',
    best_match=True
).match()
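The sketch below illustrates what "best match" means as a result: the hits with the fewest mismatches, found by raising the mismatch threshold until something matches. It uses a brute-force window scan rather than k-mer indexes, so it is only a conceptual model and not how PEPMatch chooses k-mer sizes or thresholds; the function names and sequences are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def find_with_threshold(peptide, proteins, max_mismatches):
    """Brute-force scan: (protein_id, 1-based start, mismatches) for every qualifying window."""
    n = len(peptide)
    hits = []
    for pid, seq in proteins.items():
        for start in range(len(seq) - n + 1):
            distance = hamming(peptide, seq[start:start + n])
            if distance <= max_mismatches:
                hits.append((pid, start + 1, distance))
    return hits

def best_match(peptide, proteins):
    """Raise the mismatch threshold from 0 until at least one hit appears."""
    for threshold in range(len(peptide) + 1):
        hits = find_with_threshold(peptide, proteins, threshold)
        if hits:
            return threshold, hits
    return None, []  # peptide is longer than every protein

# Hypothetical toy proteome.
proteins = {"P1": "MKTAYIAKQRQ", "P2": "AYLAKQAAA"}
print(best_match("AYLAKQR", proteins))
```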
Discontinuous Epitope Searching
Search for non-contiguous residues defined by their positions.
df = Matcher(
    query=[
        'R377, Q408, Q432, H433, F436',
        'S2760, V2763, E2773, D2805, T2819'
    ],
    proteome_file='sars-cov-2.fasta',
    max_mismatches=1
).match()
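To show how a residue/position string such as 'R377, Q408' can be interpreted, here is a small sketch that parses the string into (residue, position) pairs and counts how many of them disagree with a given sequence. It assumes positions are 1-based and checked only at the stated positions; the function names and the toy sequence are hypothetical, and the real matching logic may differ.

```python
import re

RESIDUE_RE = re.compile(r"([A-Z])(\d+)")

def parse_discontinuous(epitope):
    """'R377, Q408' -> [('R', 377), ('Q', 408)], positions taken as 1-based."""
    residues = []
    for token in epitope.split(","):
        match = RESIDUE_RE.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"not a residue/position token: {token!r}")
        residues.append((match.group(1), int(match.group(2))))
    return residues

def count_mismatches(residues, sequence):
    """Count listed positions whose residue differs or lies outside the sequence."""
    return sum(
        1
        for aa, pos in residues
        if pos < 1 or pos > len(sequence) or sequence[pos - 1] != aa
    )

# Hypothetical toy protein sequence.
sequence = "MKTAYIAKQR"
print(parse_discontinuous("K2, A4, K8"))
print(count_mismatches(parse_discontinuous("K2, A4, R8"), sequence))
```

An epitope would then count as a hit for a protein when `count_mismatches` is at most `max_mismatches`.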
Mixed Queries
Linear peptides and discontinuous epitopes can be searched together.
df = Matcher(
    query=[
        'YLLDLHSYL',
        'R377, Q408, Q432, H433, F436',
        'GLCTLVAML',
    ],
    proteome_file='sars-cov-2.fasta',
    max_mismatches=0
).match()
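A mixed query list has to be separated into the two kinds of query before searching. The sketch below shows one plausible way to do that with regular expressions; the patterns and the `split_queries` helper are assumptions for illustration, not the rules PEPMatch applies.

```python
import re

# One or more residue+position tokens separated by commas, e.g. "R377, Q408".
DISCONTINUOUS_RE = re.compile(r"[A-Z]\d+(\s*,\s*[A-Z]\d+)*")
# A plain run of uppercase residue letters, e.g. "YLLDLHSYL".
LINEAR_RE = re.compile(r"[A-Z]+")

def split_queries(queries):
    """Separate linear peptides from discontinuous epitopes, preserving order."""
    linear, discontinuous = [], []
    for query in queries:
        text = query.strip()
        if DISCONTINUOUS_RE.fullmatch(text):
            discontinuous.append(text)
        elif LINEAR_RE.fullmatch(text):
            linear.append(text)
        else:
            raise ValueError(f"unrecognized query: {query!r}")
    return linear, discontinuous

linear, discontinuous = split_queries([
    "YLLDLHSYL",
    "R377, Q408, Q432, H433, F436",
    "GLCTLVAML",
])
print(linear)
print(discontinuous)
```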
Query Input Formats
- Python list: ['YLLDLHSYL', 'GLCTLVAML']
- FASTA file: .fasta, .fas, .fa, .fna, .ffn, .faa, .mpfa, .frn
- Text file: .txt with one peptide per line
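For reference, the three input forms above can be normalized to a plain list of peptide strings along these lines. This is a standard-library sketch only; `parse_fasta`, `parse_text`, and `load_queries` are hypothetical helpers, not part of the pepmatch API.

```python
from pathlib import Path

FASTA_EXTENSIONS = {".fasta", ".fas", ".fa", ".fna", ".ffn", ".faa", ".mpfa", ".frn"}

def parse_fasta(text):
    """Return the sequences in FASTA text, joining wrapped sequence lines."""
    sequences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):          # header line starts a new record
            if current:
                sequences.append("".join(current))
                current = []
        else:
            current.append(line)
    if current:
        sequences.append("".join(current))
    return sequences

def parse_text(text):
    """Return one peptide per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def load_queries(query):
    """Accept a list of peptides, or a path to a FASTA or .txt file."""
    if isinstance(query, (list, tuple)):
        return list(query)
    path = Path(query)
    suffix = path.suffix.lower()
    if suffix in FASTA_EXTENSIONS:
        return parse_fasta(path.read_text())
    if suffix == ".txt":
        return parse_text(path.read_text())
    raise ValueError(f"unsupported query file type: {suffix!r}")

print(parse_fasta(">pep1\nYLLDL\nHSYL\n>pep2\nGLCTLVAML\n"))
```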
CLI:
pepmatch-match -q peptides.fasta -p human.fasta -m 0 -k 5
Flags
- -q, --query (Required): Path to the query file.
- -p, --proteome_file (Required): Path to the proteome FASTA file.
- -m, --max_mismatches: Maximum mismatches allowed (default: 0).
- -k, --kmer_size: K-mer size (default: 5).
- -P, --preprocessed_files_path: Directory containing preprocessed files.
- -b, --best_match: Enable best match mode.
- -f, --output_format: Output format: csv, tsv, xlsx, or json (default: csv).
- -o, --output_name: Output file name (without extension).
- -v, --sequence_version: Disable sequence versioning on protein IDs.
Output Formats
- dataframe (default for API): Returns a Polars DataFrame.
- csv (default for CLI): CSV file.
- tsv: Tab-separated file.
- xlsx: Excel file.
- json: JSON file.
Performance
Benchmarked searching ~2,000 peptides against the human proteome (~200,000 proteins):
| Mode | Time |
|---|---|
| Exact match (k=5) | ~0.06s |
| 1 mismatch (k=3) | ~1.5s |
| 2 mismatches (k=3) | ~1.9s |
| 3 mismatches (k=3) | ~3.7s |
Citation
If you use PEPMatch in your research, please cite:
Marrama D, Chronister WD, Westernberg L, et al. PEPMatch: a tool to identify short peptide sequence matches in large sets of proteins. BMC Bioinformatics. 2023;24(1):485. Published 2023 Dec 18. doi:10.1186/s12859-023-05606-4