Search tool for peptides and epitopes within a proteome, while considering potential residue substitutions.

Author: Daniel Marrama

PEPMatch is a high-performance peptide search tool for finding short peptide sequences within a reference proteome. Powered by a Rust engine with Python bindings, it runs exact-match searches across an entire proteome in well under a second, and mismatch-tolerant searches in a few seconds (see Performance), while keeping a simple Python API.

Key Features

  • Blazing Fast: Rust-powered search engine with automatic multi-core parallelization via Rayon. Search thousands of peptides against the entire human proteome in seconds.
  • Unified Index Format: Single .pepidx binary format stores sequences, metadata, and k-mer index in one memory-mapped file. Preprocess once, search repeatedly.
  • Versatile Searching: Exact matches, mismatch-tolerant searches, best match mode, and discontinuous epitope support.
  • Simple API: Two classes — Preprocessor and Matcher — handle everything.
  • Flexible I/O: Accepts queries from FASTA files, text files, or Python lists. Outputs to CSV, TSV, XLSX, JSON, or Polars DataFrame.

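Requirements

  • Prebuilt wheels for version 1.16.1 target CPython 3.10 through 3.13 on Windows x86-64, Linux x86-64 (glibc 2.34+), and macOS 11.0+ ARM64. A source distribution is also published.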

Installation

pip install pepmatch

Quick Start

from pepmatch import Preprocessor, Matcher

# Preprocess a proteome (one-time step)
Preprocessor('human.fasta').preprocess(k=5)

# Search for exact matches
df = Matcher(
  query=['YLLDLHSYL', 'GLCTLVAML', 'FAKEPEPTIDE'],
  proteome_file='human.fasta',
  max_mismatches=0,
  k=5
).match()

print(df)

Preprocessing

Preprocessing builds a .pepidx index from your proteome FASTA file. This only needs to be done once per proteome and k-mer size. If a .pepidx file doesn't exist when you search, Matcher will create it automatically.

from pepmatch import Preprocessor

Preprocessor('human.fasta').preprocess(k=5)

CLI:

pepmatch-preprocess -p human.fasta -k 5

Flags

  • -p, --proteome (Required): Path to the proteome FASTA file.
  • -k, --kmer_size (Required): The k-mer size for indexing.
  • -n, --proteome_name: Custom name for the proteome.
  • -P, --preprocessed_files_path: Directory to save preprocessed files.
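
To make the idea of a k-mer index concrete, here is a minimal pure-Python sketch. It is not PEPMatch's Rust implementation or its .pepidx format; the function name build_kmer_index and the toy sequences are assumptions made for illustration only.

```python
from collections import defaultdict


def build_kmer_index(proteins, k):
    """Map every k-mer to the (protein_id, 0-based offset) pairs where it occurs."""
    index = defaultdict(list)
    for protein_id, sequence in proteins.items():
        for offset in range(len(sequence) - k + 1):
            index[sequence[offset:offset + k]].append((protein_id, offset))
    return dict(index)


# Toy proteome with made-up sequences (not real proteins).
proteins = {"P1": "MKTAYLLDLHSYLQR", "P2": "GGLCTLVAMLKK"}
index = build_kmer_index(proteins, k=5)
print(index["YLLDL"])
```

Because the index depends on k, a different k-mer size needs its own index, which is why preprocessing is done once per proteome and k-mer size.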

Matching

Exact Matching

from pepmatch import Matcher

df = Matcher(
  query='peptides.fasta',
  proteome_file='human.fasta',
  max_mismatches=0,
  k=5
).match()
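
As a rough illustration of how a k-mer index can drive exact matching, the sketch below seeds each search with the peptide's first k-mer and then verifies the full peptide. It is a conceptual stand-in, not PEPMatch's algorithm; exact_matches and the 1-based start positions are choices of this sketch.

```python
from collections import defaultdict


def exact_matches(peptide, proteins, k):
    """Return (protein_id, 1-based start) for every exact occurrence of peptide."""
    if len(peptide) < k:
        raise ValueError("peptide is shorter than k")
    # Index every k-mer position (in practice this is the one-time preprocessing step).
    index = defaultdict(list)
    for protein_id, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((protein_id, i))
    # Seed with the first k-mer, then confirm the whole peptide at that offset.
    hits = []
    for protein_id, i in index.get(peptide[:k], []):
        if proteins[protein_id].startswith(peptide, i):
            hits.append((protein_id, i + 1))
    return hits


# Toy proteome with made-up sequences (not real proteins).
proteins = {"P1": "MKTAYLLDLHSYLQR", "P2": "GGLCTLVAMLKK"}
print(exact_matches("YLLDLHSYL", proteins, k=5))
print(exact_matches("FAKEPEPTIDE", proteins, k=5))
```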

Mismatch Searching

df = Matcher(
  query='neoepitopes.fasta',
  proteome_file='human.fasta',
  max_mismatches=3,
  k=3
).match()
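
One common way to tolerate mismatches with a k-mer approach is seed-and-verify: split the peptide into non-overlapping k-mers, locate each one exactly, and check the Hamming distance of the full window. If the peptide yields more non-overlapping k-mers than the number of allowed mismatches, at least one k-mer must be intact, which is why smaller k values suit mismatch searches. The sketch below assumes this strategy for illustration; it is not taken from PEPMatch's source, and mismatch_matches is a hypothetical name.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mismatch_matches(peptide, proteins, k, max_mismatches):
    """Seed-and-verify search; returns sorted (protein_id, 1-based start, mismatches)."""
    n = len(peptide)
    # Non-overlapping k-mer seeds and their offsets within the peptide.
    seeds = [(off, peptide[off:off + k]) for off in range(0, n - k + 1, k)]
    hits = set()
    for protein_id, seq in proteins.items():
        for seed_offset, seed in seeds:
            pos = seq.find(seed)
            while pos != -1:
                start = pos - seed_offset  # where the whole peptide would begin
                if 0 <= start <= len(seq) - n:
                    distance = hamming(peptide, seq[start:start + n])
                    if distance <= max_mismatches:
                        hits.add((protein_id, start + 1, distance))
                pos = seq.find(seed, pos + 1)
    return sorted(hits)


# Toy proteome with a made-up sequence; the query differs at one residue.
proteins = {"P1": "MKTAYLLDLHSYLQR"}
print(mismatch_matches("YLLDAHSYL", proteins, k=3, max_mismatches=1))
```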

Best Match

Automatically finds the optimal match for each peptide by trying different k-mer sizes and mismatch thresholds. No manual preprocessing required.

df = Matcher(
  query='peptides.fasta',
  proteome_file='human.fasta',
  best_match=True
).match()
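
The general idea of a best-match search can be sketched as relaxing the mismatch limit until something is found. The version below uses a brute-force window scan instead of k-mer indexes, so it only illustrates the stopping rule, not how PEPMatch chooses k-mer sizes; both function names are hypothetical.

```python
def windows_within(peptide, proteins, max_mismatches):
    """Brute-force scan: every window whose Hamming distance is within the limit."""
    n = len(peptide)
    hits = []
    for protein_id, seq in proteins.items():
        for start in range(len(seq) - n + 1):
            distance = sum(a != b for a, b in zip(peptide, seq[start:start + n]))
            if distance <= max_mismatches:
                hits.append((protein_id, start + 1, distance))
    return hits


def best_match(peptide, proteins):
    """Relax the mismatch limit step by step and return the first hits found."""
    for limit in range(len(peptide) + 1):
        hits = windows_within(peptide, proteins, limit)
        if hits:
            return hits
    return []  # only reached when every protein is shorter than the peptide


# Toy proteome with made-up sequences (not real proteins).
proteins = {"P1": "MKTAYLLDLHSYLQR", "P2": "GGLCTLVAMLKK"}
print(best_match("GLCTLVAML", proteins))
print(best_match("YLLDAHSYL", proteins))
```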

Discontinuous Epitope Searching

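Search for non-contiguous residues defined by their positions. Each query is a comma-separated list of residue and position pairs, such as R377 for an arginine at position 377.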

df = Matcher(
  query=[
    "R377, Q408, Q432, H433, F436",
    "S2760, V2763, E2773, D2805, T2819"
  ],
  proteome_file='sars-cov-2.fasta',
  max_mismatches=1
).match()
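
To show how a query such as "R377, Q408" can be interpreted, the sketch below parses it into (residue, position) pairs and counts disagreements against a sequence. It assumes positions are 1-based and checked in place; the helper names are hypothetical and the behavior is not taken from PEPMatch's source.

```python
def parse_discontinuous(query):
    """'R377, Q408' -> [('R', 377), ('Q', 408)] as (residue, 1-based position)."""
    pairs = []
    for token in query.split(","):
        token = token.strip()
        if not token:
            continue
        pairs.append((token[0], int(token[1:])))
    return pairs


def count_mismatches(epitope, sequence):
    """Count epitope residues that differ from the sequence at their stated positions.

    Positions outside the sequence are counted as mismatches.
    """
    mismatches = 0
    for residue, position in epitope:
        out_of_range = position < 1 or position > len(sequence)
        if out_of_range or sequence[position - 1] != residue:
            mismatches += 1
    return mismatches


# Made-up sequence: K is at position 2, A at position 4, L at position 9.
sequence = "MKTAYLLDLH"
epitope = parse_discontinuous("K2, A4, L9")
print(epitope, count_mismatches(epitope, sequence))
```

With a limit like max_mismatches=1, a protein would count as a hit in this sketch when count_mismatches returns 0 or 1.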

Mixed Queries

Linear peptides and discontinuous epitopes can be searched together.

df = Matcher(
  query=[
    'YLLDLHSYL',
    'R377, Q408, Q432, H433, F436',
    'GLCTLVAML',
  ],
  proteome_file='sars-cov-2.fasta',
  max_mismatches=0
).match()
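
If you build mixed query lists programmatically, it can help to check how each string would be read before searching. The classifier below is a hypothetical pre-check based only on the two query shapes shown in this README; PEPMatch's own parsing rules may differ.

```python
import re

# Linear peptide: letters only. Discontinuous epitope: residue+position pairs.
LINEAR = re.compile(r"[A-Z]+")
DISCONTINUOUS = re.compile(r"[A-Z]\d+(\s*,\s*[A-Z]\d+)*")


def classify(query):
    """Return 'linear' or 'discontinuous'; raise ValueError for anything else."""
    q = query.strip().upper()
    if LINEAR.fullmatch(q):
        return "linear"
    if DISCONTINUOUS.fullmatch(q):
        return "discontinuous"
    raise ValueError(f"unrecognized query: {query!r}")


queries = ["YLLDLHSYL", "R377, Q408, Q432, H433, F436", "GLCTLVAML"]
print([classify(q) for q in queries])
```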

Query Input Formats

  • Python list: ['YLLDLHSYL', 'GLCTLVAML']
  • FASTA file: .fasta, .fas, .fa, .fna, .ffn, .faa, .mpfa, .frn
  • Text file: .txt with one peptide per line
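
The three input forms can be normalized to a plain list of strings. The loader below is a hypothetical sketch of that normalization using only the extensions listed above; it is not PEPMatch's reader, and it drops FASTA header lines instead of keeping them as identifiers.

```python
from pathlib import Path

FASTA_SUFFIXES = {".fasta", ".fas", ".fa", ".fna", ".ffn", ".faa", ".mpfa", ".frn"}


def load_queries(source):
    """Return a list of query strings from a list, a .txt file, or a FASTA file."""
    if isinstance(source, (list, tuple)):
        return [str(q).strip() for q in source if str(q).strip()]
    path = Path(source)
    suffix = path.suffix.lower()
    lines = path.read_text().splitlines()
    if suffix == ".txt":
        # One peptide per line; blank lines are skipped.
        return [line.strip() for line in lines if line.strip()]
    if suffix in FASTA_SUFFIXES:
        records, current = [], []
        for line in lines:
            line = line.strip()
            if line.startswith(">"):
                # A header starts a new record; flush the previous one.
                if current:
                    records.append("".join(current))
                current = []
            elif line:
                current.append(line)
        if current:
            records.append("".join(current))
        return records
    raise ValueError(f"unsupported query file extension: {suffix!r}")


print(load_queries(["YLLDLHSYL", " GLCTLVAML "]))
```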

CLI:

pepmatch-match -q peptides.fasta -p human.fasta -m 0 -k 5

Flags

  • -q, --query (Required): Path to the query file.
  • -p, --proteome_file (Required): Path to the proteome FASTA file.
  • -m, --max_mismatches: Maximum mismatches allowed (default: 0).
  • -k, --kmer_size: K-mer size (default: 5).
  • -P, --preprocessed_files_path: Directory containing preprocessed files.
  • -b, --best_match: Enable best match mode.
  • -f, --output_format: Output format — csv, tsv, xlsx, json (default: csv).
  • -o, --output_name: Output file name (without extension).
  • -v, --sequence_version: Disable sequence versioning on protein IDs.
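
When the CLI is driven from a script, assembling the argument list in one place helps avoid quoting mistakes. The helper below is a hypothetical convenience wrapper that uses only the flags listed above; it assumes -f and -o each take a single value, as their descriptions suggest.

```python
def build_match_command(query, proteome, max_mismatches=0, k=5,
                        output_format="csv", output_name=None):
    """Build the argument list for pepmatch-match from the documented flags."""
    allowed = {"csv", "tsv", "xlsx", "json"}
    if output_format not in allowed:
        raise ValueError(f"output_format must be one of {sorted(allowed)}")
    cmd = [
        "pepmatch-match",
        "-q", str(query),
        "-p", str(proteome),
        "-m", str(max_mismatches),
        "-k", str(k),
        "-f", output_format,
    ]
    if output_name is not None:
        cmd += ["-o", str(output_name)]
    return cmd


cmd = build_match_command("peptides.fasta", "human.fasta",
                          output_format="tsv", output_name="hits")
print(" ".join(cmd))
# To execute it (requires pepmatch to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```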

Output Formats

  • dataframe (default for API): Returns a Polars DataFrame.
  • csv (default for CLI): CSV file.
  • tsv: Tab-separated file.
  • xlsx: Excel file.
  • json: JSON file.

Performance

Benchmarked searching ~2,000 peptides against the human proteome (~200,000 proteins):

Mode                  Time
Exact match (k=5)     ~0.06s
1 mismatch (k=3)      ~1.5s
2 mismatches (k=3)    ~1.9s
3 mismatches (k=3)    ~3.7s
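
Timings like these depend on hardware, so it is worth measuring on your own machine. The harness below is a generic sketch, not the script used for the numbers above; best_of is a hypothetical helper, and the workload is a stand-in so the example runs without a proteome file.

```python
import time


def best_of(fn, repeats=3):
    """Run fn() several times; return (fastest wall-clock seconds, last result)."""
    best, result = float("inf"), None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result


# Stand-in workload. To time PEPMatch itself, pass a function that builds
# a Matcher and calls .match() instead.
seconds, total = best_of(lambda: sum(range(100_000)))
print(f"best of 3: {seconds:.4f}s")
```

Since Matcher creates a missing .pepidx file automatically, run preprocessing before timing so that index construction is not counted in the first measurement.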

Citation

If you use PEPMatch in your research, please cite:

Marrama D, Chronister WD, Westernberg L, et al. PEPMatch: a tool to identify short peptide sequence matches in large sets of proteins. BMC Bioinformatics. 2023;24(1):485. Published 2023 Dec 18. doi:10.1186/s12859-023-05606-4
