
Search tool for peptides and epitopes within a proteome, while considering potential residue substitutions.


Author: Daniel Marrama

PEPMatch is a high-performance peptide search tool for finding short peptide sequences within a reference proteome. Powered by a Rust engine with Python bindings, it completes exact-match searches across an entire proteome in well under a second (see Performance) while keeping a simple Python API.

Key Features

  • Blazing Fast: Rust-powered search engine with automatic multi-core parallelization via Rayon. Search thousands of peptides against the entire human proteome in seconds.
  • Unified Index Format: Single .pepidx binary format stores sequences, metadata, and k-mer index in one memory-mapped file. Preprocess once, search repeatedly.
  • Versatile Searching: Exact matches, mismatch-tolerant searches, best match mode, and discontinuous epitope support.
  • Simple API: Two classes, Preprocessor and Matcher, handle everything.
  • Flexible I/O: Accepts queries from FASTA files, text files, or Python lists. Outputs to CSV, TSV, XLSX, JSON, or Polars DataFrame.

Requirements

Installation

pip install pepmatch

Quick Start

from pepmatch import Preprocessor, Matcher

# Preprocess a proteome (one-time step)
Preprocessor('human.fasta').preprocess(k=5)

# Search for exact matches
df = Matcher(
  query=['YLLDLHSYL', 'GLCTLVAML', 'FAKEPEPTIDE'],
  proteome_file='human.fasta',
  max_mismatches=0,
  k=5
).match()

print(df)

Preprocessing

Preprocessing builds a .pepidx index from your proteome FASTA file. This only needs to be done once per proteome and k-mer size. If a .pepidx file doesn't exist when you search, Matcher will create it automatically.

from pepmatch import Preprocessor

Preprocessor('human.fasta').preprocess(k=5)

CLI:

pepmatch-preprocess -p human.fasta -k 5

Flags

  • -p, --proteome (Required): Path to the proteome FASTA file.
  • -k, --kmer_size (Required): The k-mer size for indexing.
  • -n, --proteome_name: Custom name for the proteome.
  • -P, --preprocessed_files_path: Directory to save preprocessed files.
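Conceptually, a k-mer index maps every length-k substring of the proteome to the places where it occurs, which is what makes repeated lookups fast. The following is a minimal pure-Python sketch of that idea. `build_kmer_index` is a hypothetical name, and the in-memory dict says nothing about how the .pepidx file is actually laid out.

```python
from collections import defaultdict


def build_kmer_index(proteins: dict[str, str], k: int) -> dict[str, list[tuple[str, int]]]:
    """Map each k-mer to the (protein_id, 0-based offset) pairs where it occurs.

    Hypothetical illustration only; this is not the .pepidx format.
    """
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for protein_id, sequence in proteins.items():
        # Slide a window of length k across the sequence (overlapping k-mers).
        for offset in range(len(sequence) - k + 1):
            index[sequence[offset:offset + k]].append((protein_id, offset))
    return dict(index)


proteome = {"P1": "MKTAYIAKQR", "P2": "AYIAKM"}
index = build_kmer_index(proteome, k=5)
print(index["AYIAK"])  # [('P1', 3), ('P2', 0)]
```

The index is built once and reused for every query, which mirrors the "preprocess once, search repeatedly" workflow described above.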

Matching

Exact Matching

from pepmatch import Matcher

df = Matcher(
  query='peptides.fasta',
  proteome_file='human.fasta',
  max_mismatches=0,
  k=5
).match()

Mismatch Searching

df = Matcher(
  query='neoepitopes.fasta',
  proteome_file='human.fasta',
  max_mismatches=3,
  k=3
).match()
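Mismatch searches use a smaller k. A common way to make k-mer lookups tolerate mismatches is the pigeonhole principle: if a peptide is split into max_mismatches + 1 non-overlapping k-mers, at least one of them must match the proteome exactly, so exact lookups can seed candidate positions that are then verified by counting mismatches. The sketch below illustrates that general seed-and-verify technique in pure Python. It is an assumption about the approach, not PEPMatch's Rust implementation, and `mismatch_search` is a hypothetical name.

```python
from collections import defaultdict


def mismatch_search(peptide, proteins, max_mismatches, k):
    """Return sorted (protein_id, 0-based start, mismatches) hits within max_mismatches.

    Seed-and-verify sketch (substitutions only). Requires
    len(peptide) >= k * (max_mismatches + 1) so that, by the pigeonhole
    principle, at least one non-overlapping seed matches exactly.
    """
    if len(peptide) < k * (max_mismatches + 1):
        raise ValueError("k too large to guarantee all matches for this peptide length")

    # Index every overlapping k-mer of the proteome.
    index = defaultdict(list)
    for pid, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((pid, i))

    hits = set()
    # Only the first max_mismatches + 1 non-overlapping seeds are needed.
    for s in range(max_mismatches + 1):
        seed_start = s * k
        seed = peptide[seed_start:seed_start + k]
        for pid, pos in index.get(seed, []):
            start = pos - seed_start  # align the seed back to the full peptide
            seq = proteins[pid]
            if start < 0 or start + len(peptide) > len(seq):
                continue
            window = seq[start:start + len(peptide)]
            mismatches = sum(a != b for a, b in zip(peptide, window))
            if mismatches <= max_mismatches:
                hits.add((pid, start, mismatches))
    return sorted(hits)


proteome = {"P1": "MKTAYIAKQRQISFVKSHFSRQ"}
hits = mismatch_search("TAYLAKQRQ", proteome, max_mismatches=2, k=3)
print(hits)  # [('P1', 2, 1)]
```

In this sketch, completeness depends on query length: k=3 with max_mismatches=3 needs queries of at least 12 residues, which is why the function raises instead of silently missing hits.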

Best Match

Automatically finds the optimal match for each peptide by trying different k-mer sizes and mismatch thresholds. No manual preprocessing required.

df = Matcher(
  query='peptides.fasta',
  proteome_file='human.fasta',
  best_match=True
).match()
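When checking results on a small test proteome, an exhaustive reference can help: scan every window and keep the ones with the fewest mismatches. The helper below, `best_matches`, is hypothetical and is not how best_match=True works internally. It only considers substitutions (equal-length windows), not insertions or deletions.

```python
def best_matches(peptide: str, proteins: dict[str, str]) -> tuple[int, list[tuple[str, int]]]:
    """Return (fewest mismatches, [(protein_id, 0-based start), ...]) by exhaustive scan.

    If no protein is long enough, returns (len(peptide) + 1, []).
    """
    n = len(peptide)
    best = n + 1
    hits: list[tuple[str, int]] = []
    for pid, seq in proteins.items():
        for start in range(len(seq) - n + 1):
            mismatches = sum(a != b for a, b in zip(peptide, seq[start:start + n]))
            if mismatches < best:
                # A strictly better window replaces all previous hits.
                best, hits = mismatches, [(pid, start)]
            elif mismatches == best:
                hits.append((pid, start))
    return best, hits


result = best_matches("GLCTLVAMA", {"P1": "XXGLCTLVAMLXX", "P2": "GLCTLVAML"})
print(result)  # (1, [('P1', 2), ('P2', 0)])
```

This runs in time proportional to proteome size times peptide length per query, so it is only practical as a cross-check on small inputs.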

Discontinuous Epitope Searching

Search for non-contiguous residues, each written as a one-letter amino acid code followed by its position in the protein (e.g. R377).

df = Matcher(
  query=[
    "R377, Q408, Q432, H433, F436",
    "S2760, V2763, E2773, D2805, T2819"
  ],
  proteome_file='sars-cov-2.fasta',
  max_mismatches=1
).match()
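The residue notation can be parsed and scored against a single protein sequence as sketched below. This assumes positions are 1-based (standard protein numbering). `parse_discontinuous` and `count_mismatches` are hypothetical helpers, not part of PEPMatch.

```python
import re
from typing import Optional

# One uppercase amino acid letter followed by a position, e.g. "R377".
RESIDUE = re.compile(r"^([A-Z])(\d+)$")


def parse_discontinuous(query: str) -> list[tuple[str, int]]:
    """Turn 'R377, Q408' into [('R', 377), ('Q', 408)]."""
    residues = []
    for token in query.split(","):
        match = RESIDUE.match(token.strip())
        if match is None:
            raise ValueError(f"not a residue like 'R377': {token.strip()!r}")
        residues.append((match.group(1), int(match.group(2))))
    return residues


def count_mismatches(residues: list[tuple[str, int]], sequence: str) -> Optional[int]:
    """Mismatches at the given 1-based positions, or None if any position is out of range."""
    if any(pos < 1 or pos > len(sequence) for _, pos in residues):
        return None
    return sum(sequence[pos - 1] != aa for aa, pos in residues)


epitope = parse_discontinuous("R377, Q408, Q432, H433, F436")
print(epitope)  # [('R', 377), ('Q', 408), ('Q', 432), ('H', 433), ('F', 436)]
print(count_mismatches([("M", 1), ("A", 3)], "MAKE"))  # 1
```

A protein then matches the epitope when `count_mismatches` returns a value no greater than max_mismatches.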

Mixed Queries

Linear peptides and discontinuous epitopes can be searched together.

df = Matcher(
  query=[
    'YLLDLHSYL',
    'R377, Q408, Q432, H433, F436',
    'GLCTLVAML',
  ],
  proteome_file='sars-cov-2.fasta',
  max_mismatches=0
).match()
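In a mixed list, each entry has to be routed to the right kind of search. One plausible heuristic is that discontinuous queries are comma-separated residue/position tokens, while linear peptides are plain letters. This is an assumption, not PEPMatch's documented rule, and `split_queries` is a hypothetical name.

```python
import re

# One or more "letter + digits" tokens separated by commas, e.g. "R377, Q408".
DISCONTINUOUS = re.compile(r"^\s*[A-Z]\d+(\s*,\s*[A-Z]\d+)*\s*$")


def split_queries(queries: list[str]) -> tuple[list[str], list[str]]:
    """Split into (linear peptides, discontinuous epitopes) using a pattern check."""
    linear, discontinuous = [], []
    for q in queries:
        if DISCONTINUOUS.match(q):
            discontinuous.append(q)
        else:
            linear.append(q)
    return linear, discontinuous


linear, discontinuous = split_queries(
    ["YLLDLHSYL", "R377, Q408, Q432, H433, F436", "GLCTLVAML"]
)
print(linear)         # ['YLLDLHSYL', 'GLCTLVAML']
print(discontinuous)  # ['R377, Q408, Q432, H433, F436']
```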

Query Input Formats

  • Python list: ['YLLDLHSYL', 'GLCTLVAML']
  • FASTA file: .fasta, .fas, .fa, .fna, .ffn, .faa, .mpfa, .frn
  • Text file: .txt with one peptide per line
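Matcher reads these files itself. Because a Python list is also accepted, though, queries can be loaded and filtered before searching. The sketch below is a minimal reader that assumes FASTA records start with '>' header lines and that .txt files hold one peptide per line. `read_queries` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

FASTA_EXTENSIONS = {".fasta", ".fas", ".fa", ".fna", ".ffn", ".faa", ".mpfa", ".frn"}


def read_queries(path: str) -> list[str]:
    """Return query sequences from a FASTA file or a one-per-line .txt file."""
    p = Path(path)
    lines = [line.strip() for line in p.read_text().splitlines()]
    suffix = p.suffix.lower()
    if suffix in FASTA_EXTENSIONS:
        sequences, current = [], []
        for line in lines:
            if line.startswith(">"):
                # A new header closes the previous record (sequences may span lines).
                if current:
                    sequences.append("".join(current))
                current = []
            elif line:
                current.append(line)
        if current:
            sequences.append("".join(current))
        return sequences
    if suffix == ".txt":
        return [line for line in lines if line]
    raise ValueError(f"unsupported query file type: {p.suffix}")


with tempfile.TemporaryDirectory() as tmp:
    fasta = Path(tmp) / "peptides.fasta"
    fasta.write_text(">pep1\nYLLDLHSYL\n>pep2\nGLCTLV\nAML\n")
    fasta_queries = read_queries(str(fasta))

    txt = Path(tmp) / "peptides.txt"
    txt.write_text("YLLDLHSYL\n\nGLCTLVAML\n")
    txt_queries = read_queries(str(txt))

print(fasta_queries)  # ['YLLDLHSYL', 'GLCTLVAML']
```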

CLI:

pepmatch-match -q peptides.fasta -p human.fasta -m 0 -k 5

Flags

  • -q, --query (Required): Path to the query file.
  • -p, --proteome_file (Required): Path to the proteome FASTA file.
  • -m, --max_mismatches: Maximum mismatches allowed (default: 0).
  • -k, --kmer_size: K-mer size (default: 5).
  • -P, --preprocessed_files_path: Directory containing preprocessed files.
  • -b, --best_match: Enable best match mode.
  • -f, --output_format: Output format: csv, tsv, xlsx, or json (default: csv).
  • -o, --output_name: Output file name (without extension).
  • -v, --sequence_version: Disable sequence versioning on protein IDs.

Output Formats

  • dataframe (default for API): Returns a Polars DataFrame.
  • csv (default for CLI): CSV file.
  • tsv: Tab-separated file.
  • xlsx: Excel file.
  • json: JSON file.

Performance

Benchmark: searching ~2,000 peptides against the human proteome (~200,000 proteins):

Mode                  Time
Exact match (k=5)     ~0.06 s
1 mismatch (k=3)      ~1.5 s
2 mismatches (k=3)    ~1.9 s
3 mismatches (k=3)    ~3.7 s
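These timings can be turned into a rough per-peptide throughput. The arithmetic below uses only the approximate figures from the table and the ~2,000 peptide count, so the results are equally approximate and tied to the (unstated) benchmark hardware.

```python
N_PEPTIDES = 2_000  # approximate query count from the benchmark description

# Approximate wall-clock times in seconds, copied from the table above.
timings_s = {
    "exact (k=5)": 0.06,
    "1 mismatch (k=3)": 1.5,
    "2 mismatches (k=3)": 1.9,
    "3 mismatches (k=3)": 3.7,
}

throughput = {mode: N_PEPTIDES / seconds for mode, seconds in timings_s.items()}

for mode, per_second in throughput.items():
    print(f"{mode:<20} ~{per_second:,.0f} peptides/s")
```

Roughly 33,000 peptides per second for exact matching versus about 540 per second at three mismatches shows how much the mismatch tolerance dominates the cost.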

Citation

If you use PEPMatch in your research, please cite:

Marrama D, Chronister WD, Westernberg L, et al. PEPMatch: a tool to identify short peptide sequence matches in large sets of proteins. BMC Bioinformatics. 2023;24(1):485. Published 2023 Dec 18. doi:10.1186/s12859-023-05606-4

