Search tool for peptides and epitopes within a proteome, while considering potential residue substitutions.

Author: Daniel Marrama

PEPMatch is a high-performance peptide search tool for finding short peptide sequences within a reference proteome. Powered by a Rust engine with Python bindings, it runs exact-match searches across an entire proteome in well under a second and mismatch-tolerant searches in a few seconds (see Performance), while keeping a simple Python API.

Key Features

  • Blazing Fast: Rust-powered search engine with automatic multi-core parallelization via Rayon. Search thousands of peptides against the entire human proteome in seconds.
  • Unified Index Format: Single .pepidx binary format stores sequences, metadata, and k-mer index in one memory-mapped file. Preprocess once, search repeatedly.
  • Versatile Searching: Exact matches, mismatch-tolerant searches, best match mode, and discontinuous epitope support.
  • Simple API: Two classes — Preprocessor and Matcher — handle everything.
  • Flexible I/O: Accepts queries from FASTA files, text files, or Python lists. Outputs to CSV, TSV, XLSX, JSON, or Polars DataFrame.

Requirements

Installation

pip install pepmatch

Quick Start

from pepmatch import Preprocessor, Matcher

# Preprocess a proteome (one-time step)
Preprocessor('human.fasta').preprocess(k=5)

# Search for exact matches
df = Matcher(
  query=['YLLDLHSYL', 'GLCTLVAML', 'FAKEPEPTIDE'],
  proteome_file='human.fasta',
  max_mismatches=0,
  k=5
).match()

print(df)

Preprocessing

Preprocessing builds a .pepidx index from your proteome FASTA file. This only needs to be done once per proteome and k-mer size. If a .pepidx file doesn't exist when you search, Matcher will create it automatically.

from pepmatch import Preprocessor

Preprocessor('human.fasta').preprocess(k=5)

CLI:

pepmatch-preprocess -p human.fasta -k 5

Flags

  • -p, --proteome (Required): Path to the proteome FASTA file.
  • -k, --kmer_size (Required): The k-mer size for indexing.
  • -n, --proteome_name: Custom name for the proteome.
  • -P, --preprocessed_files_path: Directory to save preprocessed files.
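
To make the idea of a k-mer index concrete, here is a minimal pure-Python sketch. It is not the .pepidx format or the Rust implementation; the helper names read_fasta and build_kmer_index are hypothetical and exist only for illustration. It simply records, for every k-mer, the proteins and offsets where that k-mer occurs.

```python
from collections import defaultdict


def read_fasta(text):
    """Parse FASTA-formatted text into a {protein_id: sequence} dict."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the protein ID.
            header = line[1:].split()[0]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def build_kmer_index(proteins, k):
    """Map each k-mer to the (protein_id, 0-based offset) pairs where it occurs."""
    index = defaultdict(list)
    for protein_id, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((protein_id, i))
    return index


# Toy proteome, invented for this example.
fasta = ">P1 toy protein\nMKTAYIAKQR\n>P2 another toy protein\nAYIAKQ\n"
proteins = read_fasta(fasta)
index = build_kmer_index(proteins, k=5)
print(index["AYIAK"])
```

Building such an index costs one pass over the proteome, which is why it pays to do it once and reuse it for every later search.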

Matching

Exact Matching

from pepmatch import Matcher

df = Matcher(
  query='peptides.fasta',
  proteome_file='human.fasta',
  max_mismatches=0,
  k=5
).match()
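
Conceptually, an exact search against a k-mer index can be done by looking up the first k residues of the peptide and verifying the remainder at each candidate position. The sketch below illustrates that idea in plain Python under that assumption; exact_matches and build_kmer_index are hypothetical helpers, not part of the PEPMatch API, and the real engine may work differently.

```python
from collections import defaultdict


def build_kmer_index(proteins, k):
    """Map each k-mer to the (protein_id, 0-based offset) pairs where it occurs."""
    index = defaultdict(list)
    for pid, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((pid, i))
    return index


def exact_matches(peptide, proteins, index, k):
    """Return (protein_id, 1-based start) for every exact occurrence of peptide."""
    if len(peptide) < k:
        return []  # peptides shorter than k cannot be seeded from this index
    hits = []
    # Only positions that share the peptide's first k-mer can be matches.
    for pid, pos in index.get(peptide[:k], []):
        if proteins[pid].startswith(peptide, pos):
            hits.append((pid, pos + 1))
    return hits


# Toy sequences, invented for this example.
proteins = {"P1": "MKTAYIAKQRQISFVKSHFSRQ", "P2": "GGAYIAKQRGG"}
index = build_kmer_index(proteins, k=5)
print(exact_matches("AYIAKQR", proteins, index, k=5))
print(exact_matches("FAKEPEPTIDE", proteins, index, k=5))
```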

Mismatch Searching

df = Matcher(
  query='neoepitopes.fasta',
  proteome_file='human.fasta',
  max_mismatches=3,
  k=3
).match()
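
A smaller k is typically used for mismatch searches because of the pigeonhole principle: if a peptide of length L is cut into m + 1 non-overlapping k-mers, any match with at most m mismatches must leave at least one of those k-mers intact. That requires k <= L // (m + 1). The sketch below shows a generic seed-and-verify search built on that fact. It is an illustration only, not PEPMatch's actual algorithm, and all function names are hypothetical.

```python
from collections import defaultdict


def max_seed_k(peptide_length, max_mismatches):
    """Largest k for which one of (max_mismatches + 1) non-overlapping k-mers must match."""
    return peptide_length // (max_mismatches + 1)


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mismatch_search(peptide, protein, k, max_mismatches):
    """Return sorted (1-based start, matched sequence, mismatches) tuples."""
    index = defaultdict(list)
    for i in range(len(protein) - k + 1):
        index[protein[i:i + k]].append(i)

    n = len(peptide)
    hits = set()
    # Seed with the non-overlapping k-mers of the query peptide.
    for offset in range(0, n - k + 1, k):
        for pos in index.get(peptide[offset:offset + k], []):
            start = pos - offset
            if start < 0 or start + n > len(protein):
                continue  # the aligned window would run off the protein
            window = protein[start:start + n]
            d = hamming(peptide, window)
            if d <= max_mismatches:
                hits.add((start + 1, window, d))
    return sorted(hits)


protein = "MKTAYIAKQRQISFVKSHFSRQ"  # toy sequence, invented for this example
k = max_seed_k(9, 2)  # a 9-mer with up to 2 mismatches allows k = 3
print(mismatch_search("AYIGKQRQW", protein, k, max_mismatches=2))
```

For example, a 15-residue peptide searched with up to 3 mismatches is guaranteed an intact seed when k is at most 15 // 4 = 3.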

Best Match

Automatically finds the optimal match for each peptide by trying different k-mer sizes and mismatch thresholds. No manual preprocessing required.

df = Matcher(
  query='peptides.fasta',
  proteome_file='human.fasta',
  best_match=True
).match()
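
One plausible way to picture "trying different k-mer sizes and mismatch thresholds" is to raise the mismatch threshold step by step, shrink k to match it, and stop at the first threshold that yields a hit. The sketch below does this for a single protein in pure Python. It is an assumption about the general strategy, not a description of PEPMatch internals, and find_all, seeded_hits, and best_match are hypothetical names.

```python
def find_all(text, sub):
    """Yield every 0-based index at which sub occurs in text (overlaps included)."""
    i = text.find(sub)
    while i != -1:
        yield i
        i = text.find(sub, i + 1)


def seeded_hits(peptide, protein, k, max_mismatches):
    """Return sorted (mismatches, 1-based start, matched sequence) tuples."""
    n = len(peptide)
    hits = set()
    for offset in range(0, n - k + 1, k):
        for pos in find_all(protein, peptide[offset:offset + k]):
            start = pos - offset
            if 0 <= start <= len(protein) - n:
                window = protein[start:start + n]
                d = sum(a != b for a, b in zip(peptide, window))
                if d <= max_mismatches:
                    hits.add((d, start + 1, window))
    return sorted(hits)


def best_match(peptide, protein):
    """Return the hit with the fewest mismatches, or None if no seed ever matches."""
    n = len(peptide)
    for m in range(n):
        # Pigeonhole bound: with m mismatches, one of m + 1 k-mers stays intact.
        k = n // (m + 1)
        hits = seeded_hits(peptide, protein, k, m)
        if hits:
            return hits[0]
    return None


protein = "MKTAYIAKQRQISFVKSHFSRQ"  # toy sequence, invented for this example
print(best_match("AYIAKQRQI", protein))
print(best_match("AYIGKQRQW", protein))
```

Because each threshold is searched exhaustively before moving to the next, the first hit found is also the one with the fewest mismatches.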

Discontinuous Epitope Searching

Search for non-contiguous residues defined by their positions.

df = Matcher(
  query=[
    "R377, Q408, Q432, H433, F436",
    "S2760, V2763, E2773, D2805, T2819"
  ],
  proteome_file='sars-cov-2.fasta',
  max_mismatches=1
).match()
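
Each discontinuous query is a comma-separated list of residue and position pairs such as R377. The sketch below shows how such a string could be parsed and checked against one protein sequence. It assumes positions are 1-based, which is the usual convention but is not stated above, and the helpers parse_discontinuous and count_mismatches are hypothetical, not PEPMatch functions.

```python
def parse_discontinuous(epitope):
    """Turn 'R377, Q408' into [('R', 377), ('Q', 408)]."""
    residues = []
    for token in epitope.split(","):
        token = token.strip()
        residues.append((token[0], int(token[1:])))
    return residues


def count_mismatches(residues, protein):
    """Count residues that differ from the protein at their stated positions.

    Positions are assumed to be 1-based. Returns None when any position
    lies outside the protein sequence.
    """
    mismatches = 0
    for amino_acid, position in residues:
        if position < 1 or position > len(protein):
            return None
        if protein[position - 1] != amino_acid:
            mismatches += 1
    return mismatches


print(parse_discontinuous("R377, Q408, Q432, H433, F436"))

protein = "MKTAYIAKQR"  # toy sequence, invented for this example
print(count_mismatches(parse_discontinuous("K2, A4, Q9"), protein))
print(count_mismatches(parse_discontinuous("K2, G4, Q9"), protein))
```

An epitope would then count as a match for a protein when the returned number is not None and does not exceed max_mismatches.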

Mixed Queries

Linear peptides and discontinuous epitopes can be searched together.

df = Matcher(
  query=[
    'YLLDLHSYL',
    'R377, Q408, Q432, H433, F436',
    'GLCTLVAML',
  ],
  proteome_file='sars-cov-2.fasta',
  max_mismatches=0
).match()
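
If you want to inspect a mixed query list before searching, the two query types can be told apart by their shape: linear peptides are plain letter strings, while discontinuous epitopes are comma-separated residue and position pairs. The following is a small sketch of such a pre-check. split_queries and both regular expressions are assumptions made for illustration and may not match how PEPMatch classifies queries internally.

```python
import re

# Letters only, e.g. 'YLLDLHSYL'.
LINEAR = re.compile(r"^[A-Za-z]+$")
# One letter followed by digits, repeated with commas, e.g. 'R377, Q408'.
DISCONTINUOUS = re.compile(r"^[A-Za-z]\d+(\s*,\s*[A-Za-z]\d+)*$")


def split_queries(queries):
    """Split a mixed query list into (linear, discontinuous) lists."""
    linear, discontinuous = [], []
    for query in queries:
        query = query.strip()
        if DISCONTINUOUS.match(query):
            discontinuous.append(query)
        elif LINEAR.match(query):
            linear.append(query)
        else:
            raise ValueError(f"Unrecognized query: {query!r}")
    return linear, discontinuous


linear, discontinuous = split_queries([
    "YLLDLHSYL",
    "R377, Q408, Q432, H433, F436",
    "GLCTLVAML",
])
print(linear)
print(discontinuous)
```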

Query Input Formats

  • Python list: ['YLLDLHSYL', 'GLCTLVAML']
  • FASTA file: .fasta, .fas, .fa, .fna, .ffn, .faa, .mpfa, .frn
  • Text file: .txt with one peptide per line
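
The three input formats above can be normalized to a plain list of peptides before any other processing. The sketch below shows one way to do that with the standard library only. load_queries is a hypothetical helper and not part of the PEPMatch API (Matcher accepts these inputs directly), so treat it as an illustration of the dispatch-by-extension idea.

```python
import tempfile
from pathlib import Path

FASTA_EXTENSIONS = {".fasta", ".fas", ".fa", ".fna", ".ffn", ".faa", ".mpfa", ".frn"}


def load_queries(query):
    """Return a list of peptide strings from a list, a FASTA file, or a .txt file."""
    if isinstance(query, (list, tuple)):
        return [str(q).strip() for q in query]

    path = Path(query)
    suffix = path.suffix.lower()
    lines = [ln.strip() for ln in path.read_text().splitlines() if ln.strip()]

    if suffix in FASTA_EXTENSIONS:
        peptides, current = [], []
        for ln in lines:
            if ln.startswith(">"):
                # A new header closes the previous record.
                if current:
                    peptides.append("".join(current))
                current = []
            else:
                current.append(ln)
        if current:
            peptides.append("".join(current))
        return peptides
    if suffix == ".txt":
        return lines  # one peptide per line
    raise ValueError(f"Unsupported query file extension: {suffix!r}")


# Demonstration with throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    fasta_path = Path(tmp) / "peptides.fasta"
    fasta_path.write_text(">pep1\nYLLDLHSYL\n>pep2\nGLCTLVAML\n")
    txt_path = Path(tmp) / "peptides.txt"
    txt_path.write_text("YLLDLHSYL\nGLCTLVAML\n")
    from_fasta = load_queries(fasta_path)
    from_txt = load_queries(txt_path)

from_list = load_queries(["YLLDLHSYL", "GLCTLVAML"])
print(from_fasta, from_txt, from_list)
```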

CLI:

pepmatch-match -q peptides.fasta -p human.fasta -m 0 -k 5

Flags

  • -q, --query (Required): Path to the query file.
  • -p, --proteome_file (Required): Path to the proteome FASTA file.
  • -m, --max_mismatches: Maximum mismatches allowed (default: 0).
  • -k, --kmer_size: K-mer size (default: 5).
  • -P, --preprocessed_files_path: Directory containing preprocessed files.
  • -b, --best_match: Enable best match mode.
  • -f, --output_format: Output format — csv, tsv, xlsx, json (default: csv).
  • -o, --output_name: Output file name (without extension).
  • -v, --sequence_version: Disable sequence versioning on protein IDs.
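
When driving the CLI from a script, it can help to assemble the command as a list of arguments rather than a single string. The sketch below builds a pepmatch-match command from the flags listed above. build_match_command is a hypothetical helper, and only flags that clearly take a value are included. The command is printed rather than executed, since running it requires PEPMatch to be installed.

```python
import shlex

OUTPUT_FORMATS = {"csv", "tsv", "xlsx", "json"}


def build_match_command(query, proteome, max_mismatches=0, k=5,
                        output_format="csv", output_name=None):
    """Assemble a pepmatch-match argument list from the documented flags."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"Unsupported output format: {output_format!r}")
    cmd = [
        "pepmatch-match",
        "-q", str(query),
        "-p", str(proteome),
        "-m", str(max_mismatches),
        "-k", str(k),
        "-f", output_format,
    ]
    if output_name is not None:
        cmd += ["-o", str(output_name)]
    return cmd


cmd = build_match_command(
    "peptides.fasta", "human.fasta",
    max_mismatches=3, k=3,
    output_format="tsv", output_name="neoepitope_hits",
)
print(shlex.join(cmd))

# To execute the command (requires PEPMatch to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```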

Output Formats

  • dataframe (default for API): Returns a Polars DataFrame.
  • csv (default for CLI): CSV file.
  • tsv: Tab-separated file.
  • xlsx: Excel file.
  • json: JSON file.

Performance

Benchmarked searching ~2,000 peptides against the human proteome (~200,000 proteins):

Mode                  Time
Exact match (k=5)     ~0.06s
1 mismatch (k=3)      ~1.5s
2 mismatches (k=3)    ~1.9s
3 mismatches (k=3)    ~3.7s

Citation

If you use PEPMatch in your research, please cite:

Marrama D, Chronister WD, Westernberg L, et al. PEPMatch: a tool to identify short peptide sequence matches in large sets of proteins. BMC Bioinformatics. 2023;24(1):485. Published 2023 Dec 18. doi:10.1186/s12859-023-05606-4
