Search tool for peptides and epitopes within a proteome, while considering potential residue substitutions.

Author: Daniel Marrama

PEPMatch is a high-performance peptide search tool for finding short peptide sequences within a reference proteome. Powered by a Rust engine with Python bindings, it runs exact-match searches across an entire proteome in under a second and mismatch-tolerant searches in a few seconds, while keeping a simple Python API.

Key Features

  • Blazing Fast: Rust-powered search engine with automatic multi-core parallelization via Rayon. Search thousands of peptides against the entire human proteome in seconds.
  • Unified Index Format: Single .pepidx binary format stores sequences, metadata, and k-mer index in one memory-mapped file. Preprocess once, search repeatedly.
  • Versatile Searching: Exact matches, mismatch-tolerant searches, best match mode, and discontinuous epitope support.
  • Simple API: Two classes — Preprocessor and Matcher — handle everything.
  • Flexible I/O: Accepts queries from FASTA files, text files, or Python lists. Outputs to CSV, TSV, XLSX, JSON, or Polars DataFrame.

Requirements

Installation

pip install pepmatch

Quick Start

from pepmatch import Preprocessor, Matcher

# Preprocess a proteome (one-time step)
Preprocessor('human.fasta').preprocess(k=5)

# Search for exact matches
df = Matcher(
  query=['YLLDLHSYL', 'GLCTLVAML', 'FAKEPEPTIDE'],
  proteome_file='human.fasta',
  max_mismatches=0,
  k=5
).match()

print(df)

Preprocessing

Preprocessing builds a .pepidx index from your proteome FASTA file. This only needs to be done once per proteome and k-mer size. If a .pepidx file doesn't exist when you search, Matcher will create it automatically.

from pepmatch import Preprocessor

Preprocessor('human.fasta').preprocess(k=5)

CLI:

pepmatch-preprocess -p human.fasta -k 5

Flags

  • -p, --proteome (Required): Path to the proteome FASTA file.
  • -k, --kmer_size (Required): The k-mer size for indexing.
  • -n, --proteome_name: Custom name for the proteome.
  • -P, --preprocessed_files_path: Directory to save preprocessed files.
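
As a rough picture of what a k-mer index contains, the following pure-Python sketch maps every k-mer in a toy proteome to the places where it occurs. It is only an illustration of the idea: the function name, the protein IDs, and the sequences are made up, and the actual .pepidx format is a binary, memory-mapped file built by the Rust engine.

```python
from collections import defaultdict


def build_kmer_index(proteins: dict[str, str], k: int) -> dict[str, list[tuple[str, int]]]:
    """Map each k-mer to the (protein_id, 0-based offset) pairs where it occurs."""
    index = defaultdict(list)
    for protein_id, sequence in proteins.items():
        # Slide a window of width k across the sequence.
        for start in range(len(sequence) - k + 1):
            index[sequence[start:start + k]].append((protein_id, start))
    return dict(index)


# Toy "proteome" with made-up IDs and sequences (not real proteins).
proteome = {
    "P1": "MKTAYIAKQR",
    "P2": "GGAYIAKLLM",
}

index = build_kmer_index(proteome, k=5)
print(index["AYIAK"])  # every location of the 5-mer AYIAK
```

Because the index depends on k, a separate index is needed for each k-mer size, which is why preprocessing is done once per proteome and k.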

Matching

Exact Matching

from pepmatch import Matcher

df = Matcher(
  query='peptides.fasta',
  proteome_file='human.fasta',
  max_mismatches=0,
  k=5
).match()

Mismatch Searching

df = Matcher(
  query='neoepitopes.fasta',
  proteome_file='human.fasta',
  max_mismatches=3,
  k=3
).match()
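
To make the meaning of max_mismatches concrete, here is a brute-force sketch that counts residue substitutions (Hamming distance) between a peptide and every same-length window of a sequence. It assumes a mismatch is a single-residue substitution; the function names and the sequence are hypothetical, and this is not how PEPMatch searches internally (it uses the k-mer index rather than scanning every window).

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def mismatch_search(peptide: str, sequence: str, max_mismatches: int) -> list[tuple[int, str, int]]:
    """Return (0-based start, matched window, mismatch count) for each window within the limit."""
    hits = []
    n = len(peptide)
    for start in range(len(sequence) - n + 1):
        window = sequence[start:start + n]
        distance = hamming(peptide, window)
        if distance <= max_mismatches:
            hits.append((start, window, distance))
    return hits


sequence = "MKTAYIAKQRGGAYLAKQW"  # made-up sequence
hits = mismatch_search("AYIAKQ", sequence, max_mismatches=1)
print(hits)
```

With max_mismatches=1 the sketch reports both the exact occurrence and the window that differs by one residue.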

Best Match

Automatically finds the optimal match for each peptide by trying different k-mer sizes and mismatch thresholds. No manual preprocessing required.

df = Matcher(
  query='peptides.fasta',
  proteome_file='human.fasta',
  best_match=True
).match()
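
One way to picture a "best match" is the window with the fewest substitutions relative to the peptide. The sketch below finds it by brute force, assuming that "optimal" means "fewest mismatches"; that reading, the function name, and the sequence are assumptions for illustration. PEPMatch itself arrives at its result by trying different k-mer sizes and mismatch thresholds, not by scanning every window.

```python
def best_match(peptide: str, sequence: str):
    """Return (mismatches, 0-based start, window) for the closest window, or None."""
    n = len(peptide)
    best = None
    for start in range(len(sequence) - n + 1):
        window = sequence[start:start + n]
        distance = sum(a != b for a, b in zip(peptide, window))
        # Strict comparison keeps the first window on ties.
        if best is None or distance < best[0]:
            best = (distance, start, window)
    return best


sequence = "MKTAYIAKQRGGAYLAKQW"  # made-up sequence
result = best_match("GAYLCKQ", sequence)
print(result)
```

The function returns None when the peptide is longer than the sequence, since no window can be compared.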

Discontinuous Epitope Searching

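Search for non-contiguous residues defined by their positions. Each residue is written as a one-letter amino acid code followed by its position in the protein (for example, R377 is an arginine at position 377), and residues are separated by commas.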

df = Matcher(
  query=[
    "R377, Q408, Q432, H433, F436",
    "S2760, V2763, E2773, D2805, T2819"
  ],
  proteome_file='sars-cov-2.fasta',
  max_mismatches=1
).match()
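
The query strings above can be read as lists of (residue, position) pairs. The following sketch parses that notation and counts how many residues disagree with a given sequence. The helper names are hypothetical, the sequence is made up, and treating positions as 1-based is an assumption rather than something this README states.

```python
import re

# One uppercase letter followed by digits, e.g. "R377".
RESIDUE = re.compile(r"^([A-Z])(\d+)$")


def parse_discontinuous(query: str) -> list[tuple[str, int]]:
    """Split 'R377, Q408' into [('R', 377), ('Q', 408)]."""
    residues = []
    for token in query.split(","):
        match = RESIDUE.match(token.strip())
        if match is None:
            raise ValueError(f"not a residue/position token: {token!r}")
        residues.append((match.group(1), int(match.group(2))))
    return residues


def count_mismatches(residues: list[tuple[str, int]], sequence: str) -> int:
    """Count residues that differ from the sequence (positions assumed 1-based)."""
    mismatches = 0
    for amino_acid, position in residues:
        out_of_range = position < 1 or position > len(sequence)
        if out_of_range or sequence[position - 1] != amino_acid:
            mismatches += 1
    return mismatches


sequence = "MKTAYIAKQR"  # made-up sequence
epitope = parse_discontinuous("K2, A4, Q9")
print(epitope, count_mismatches(epitope, sequence))
```

Under this reading, max_mismatches=1 would accept a protein in which all but one of the listed positions carry the expected residue.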

Mixed Queries

Linear peptides and discontinuous epitopes can be searched together.

df = Matcher(
  query=[
    'YLLDLHSYL',
    'R377, Q408, Q432, H433, F436',
    'GLCTLVAML',
  ],
  proteome_file='sars-cov-2.fasta',
  max_mismatches=0
).match()

Query Input Formats

  • Python list: ['YLLDLHSYL', 'GLCTLVAML']
  • FASTA file: .fasta, .fas, .fa, .fna, .ffn, .faa, .mpfa, .frn
  • Text file: .txt with one peptide per line
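
To show what the two file formats look like, the sketch below writes the same two peptides as a text file and as a FASTA file, then reads both back into a Python list. load_queries is a hypothetical helper written for this example, not part of the PEPMatch API; Matcher accepts the file paths directly.

```python
from pathlib import Path
import tempfile

FASTA_SUFFIXES = {".fasta", ".fas", ".fa", ".fna", ".ffn", ".faa", ".mpfa", ".frn"}


def load_queries(path) -> list[str]:
    """Hypothetical helper: read peptides from a FASTA or one-per-line text file."""
    path = Path(path)
    lines = [line.strip() for line in path.read_text().splitlines()]
    lines = [line for line in lines if line]  # drop blank lines
    if path.suffix.lower() not in FASTA_SUFFIXES:
        return lines  # text file: one peptide per line
    peptides, current = [], []
    for line in lines:
        if line.startswith(">"):  # header line starts a new record
            if current:
                peptides.append("".join(current))
            current = []
        else:
            current.append(line)
    if current:
        peptides.append("".join(current))
    return peptides


with tempfile.TemporaryDirectory() as tmp:
    txt = Path(tmp) / "peptides.txt"
    txt.write_text("YLLDLHSYL\nGLCTLVAML\n")
    fasta = Path(tmp) / "peptides.fasta"
    fasta.write_text(">pep1\nYLLDLHSYL\n>pep2\nGLCTLVAML\n")
    from_txt = load_queries(txt)
    from_fasta = load_queries(fasta)

print(from_txt, from_fasta)
```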

CLI:

pepmatch-match -q peptides.fasta -p human.fasta -m 0 -k 5

Flags

  • -q, --query (Required): Path to the query file.
  • -p, --proteome_file (Required): Path to the proteome FASTA file.
  • -m, --max_mismatches: Maximum mismatches allowed (default: 0).
  • -k, --kmer_size: K-mer size (default: 5).
  • -P, --preprocessed_files_path: Directory containing preprocessed files.
  • -b, --best_match: Enable best match mode.
  • -f, --output_format: Output format — csv, tsv, xlsx, json (default: csv).
  • -o, --output_name: Output file name (without extension).
  • -v, --sequence_version: Disable sequence versioning on protein IDs.
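
When calling the CLI from a script, the flags above can be assembled into an argument list. The sketch below only builds and prints the command; the file names and the output name neoepitope_hits are placeholders, and actually running it requires PEPMatch to be installed and the input files to exist.

```python
import shlex

# Mismatch search with TSV output, using only the flags listed above.
command = [
    "pepmatch-match",
    "-q", "neoepitopes.fasta",
    "-p", "human.fasta",
    "-m", "3",
    "-k", "3",
    "-f", "tsv",
    "-o", "neoepitope_hits",
]

command_line = shlex.join(command)
print(command_line)

# To execute it for real:
# import subprocess
# subprocess.run(command, check=True)
```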

Output Formats

  • dataframe (default for API): Returns a Polars DataFrame.
  • csv (default for CLI): CSV file.
  • tsv: Tab-separated file.
  • xlsx: Excel file.
  • json: JSON file.

Performance

Benchmarked searching ~2,000 peptides against the human proteome (~200,000 proteins):

Mode                  Time
Exact match (k=5)     ~0.06s
1 mismatch (k=3)      ~1.5s
2 mismatches (k=3)    ~1.9s
3 mismatches (k=3)    ~3.7s

Citation

If you use PEPMatch in your research, please cite:

Marrama D, Chronister WD, Westernberg L, et al. PEPMatch: a tool to identify short peptide sequence matches in large sets of proteins. BMC Bioinformatics. 2023;24(1):485. Published 2023 Dec 18. doi:10.1186/s12859-023-05606-4
