PEFF-aware in-silico protein digestion with PTM enumeration, sequence variants, and ProForma output.

peff_digest

A PEFF-aware protein digest tool. Given a PEFF file and enzymatic digestion parameters, it produces a CSV of peptides with their protein ID, sequence (ProForma notation), variant annotation, length, and monoisotopic mass.

Each PEFF VariantSimple and VariantComplex annotation is applied independently: every variant yields its own variant sequence, and variants are never combined with one another. PEFF PTMs (ModResPsi / ModResUnimod) are applied combinatorially, up to a configurable limit per peptide (max_ptm_per_peptide). Fixed and variable user-defined modifications, including terminal modifications, are also supported.
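
As a rough illustration of what "combinatorially up to a limit" means, the sketch below enumerates every subset of candidate modification sites containing at most max_mods sites. It is not the package's implementation: the helper name enumerate_mod_sites and the site positions are made up for this example, and it assumes each site can carry only one possible modification.

```python
from itertools import combinations


def enumerate_mod_sites(sites, max_mods):
    """Yield every subset of candidate sites with at most max_mods members."""
    # k = 0 gives the unmodified peptide; larger k adds one more modified site.
    for k in range(min(max_mods, len(sites)) + 1):
        yield from combinations(sites, k)


# Three hypothetical modifiable positions (0-based) within one peptide.
site_sets = list(enumerate_mod_sites([2, 5, 9], max_mods=2))
for subset in site_sets:
    print(subset)
```

With three candidate sites and a limit of 2, this gives 1 + 3 + 3 = 7 forms of the peptide. The count grows quickly with the limit, which is why the limit is configurable.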

Installation

uv sync
# or
just install

Requires Python 3.12+.

Usage

Via config file

peff-digest --config config.toml

Via flags

peff-digest --peff-file human.peff --output-file peptides.csv

Flags override any values set in --config. Run peff-digest --help for the full flag reference.
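
The precedence rule can be pictured as a dictionary merge in which only the flags the user actually passed replace config-file values. This is a sketch of the idea, not the tool's own code: merge_settings is a hypothetical helper, and it assumes unset flags are represented as None.

```python
def merge_settings(config_values, flag_values):
    """Return config values overridden by every flag that was explicitly set."""
    merged = dict(config_values)
    # Assumption: a flag the user did not pass is represented as None.
    merged.update({k: v for k, v in flag_values.items() if v is not None})
    return merged


from_config = {"peff_file": "human.peff", "missed_cleavages": 2}
from_flags = {"peff_file": None, "missed_cleavages": 1, "output_file": "out.csv"}

settings = merge_settings(from_config, from_flags)
print(settings)
```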

Output

The output CSV has five columns:

Column      Description
protein_id  db_unique_id from the PEFF header
sequence    ProForma-annotated peptide sequence (includes mods)
variant     PEFF variant notation, e.g. (42|R), or empty for canonical
length      Peptide length in residues
mass        Monoisotopic mass in Da, or empty if not computable
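
A minimal sketch of reading the output with Python's standard csv module, turning empty variant and mass cells into None. The sample rows are invented for illustration, and the sketch assumes the file starts with a header row using the column names listed above.

```python
import csv
import io

# Made-up rows for illustration; a real file comes from peff-digest.
SAMPLE = """protein_id,sequence,variant,length,mass
P00001,PEPTIDEK,,8,927.4549
P00001,PEPTIDER,(8|R),8,
"""


def read_peptides(handle):
    """Parse output rows, mapping empty variant/mass cells to None."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append(
            {
                "protein_id": row["protein_id"],
                "sequence": row["sequence"],
                "variant": row["variant"] or None,  # empty = canonical
                "length": int(row["length"]),
                "mass": float(row["mass"]) if row["mass"] else None,
            }
        )
    return rows


peptides = read_peptides(io.StringIO(SAMPLE))
for peptide in peptides:
    print(peptide)
```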

Config reference

All options can be set in a TOML or JSON config file. TOML example:

peff_file = "human.peff"
output_file = "peptides.csv"

cleave_on = "KR"
missed_cleavages = 2
semi_enzymatic = false
max_ptm_per_peptide = 2
min_length = 7
max_length = 40
restrict_after = "P"
restrict_before = ""
cterminal = true
min_mass = 400.0
max_mass = 10000.0
drop_invalid_mass = false
annotate_variants = true

# Internal modifications — one [[internal_mods]] block per mod:
# [[internal_mods]]
# modification = "Carbamidomethyl"
# residue = "C"
# mod_type = "fixed"
#
# [[internal_mods]]
# modification = "Oxidation"
# residue = "M"
# mod_type = "variable"

# Terminal modifications — one [[terminal_mods]] block per mod:
# [[terminal_mods]]
# modification = "Acetyl"
# position = "nterm"
# mod_type = "variable"
# protein_terminus = true   # only the first peptide of each protein
#
# [[terminal_mods]]
# modification = "UNIMOD:737"
# position = "nterm"
# mod_type = "fixed"
# residue = "M"             # only if the terminal residue is M
#
# [[terminal_mods]]
# modification = "Amidated"
# position = "cterm"
# mod_type = "variable"

DigestConfig fields

Field                Type               Default         Description
peff_file            str                required        Path to the input PEFF file. Must exist.
output_file          str                "peptides.csv"  Path for the output CSV.
cleave_on            str                "KR"            Amino acids at which to cleave (e.g. "KR" for trypsin).
missed_cleavages     int                2               Maximum number of missed cleavage sites per peptide. Min 0.
semi_enzymatic       bool               false           Include semi-enzymatic peptides (one non-enzymatic terminus).
max_ptm_per_peptide  int                2               Maximum number of variable mods (PEFF + user) applied simultaneously per peptide. 0 disables all variable mods. Min 0.
min_length           int                7               Minimum peptide length in residues (inclusive). Min 1.
max_length           int                40              Maximum peptide length in residues (inclusive). Min 1.
restrict_after       str                "P"             Skip cleavage when the following residue is in this set (e.g. "P" for trypsin/Pro rule).
restrict_before      str                ""              Skip cleavage when the preceding residue is in this set.
cterminal            bool               true            true = C-terminal cleavage (standard); false = N-terminal.
internal_mods        list[InternalMod]  []              Per-residue modifications. See InternalMod fields below.
terminal_mods        list[TerminalMod]  []              Terminal modifications. See TerminalMod fields below.
min_mass             float | None       None            Minimum peptide mass in Da. Ignored if None.
max_mass             float | None       None            Maximum peptide mass in Da. Ignored if None.
drop_invalid_mass    bool               false           If true, exclude peptides whose mass cannot be computed.
annotate_variants    bool               true            If false, do not set peptide_name on variant peptides.
workers              int | None         None            Number of worker processes. Defaults to all available CPUs. Min 1.
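
The interplay of cleave_on, restrict_after, and missed_cleavages can be sketched in a few lines for the C-terminal case (cterminal = true). This is a simplified illustration, not the package's digest routine: simple_digest and cleavage_points are hypothetical names, and variants, modifications, restrict_before, and semi-enzymatic peptides are ignored.

```python
def cleavage_points(seq, cleave_on="KR", restrict_after="P"):
    """Return cut indices; a cut at i separates seq[:i] from seq[i:]."""
    points = [0]
    for i, residue in enumerate(seq[:-1]):
        # Cut after a cleavage residue unless the next residue blocks it.
        if residue in cleave_on and seq[i + 1] not in restrict_after:
            points.append(i + 1)
    points.append(len(seq))
    return points


def simple_digest(seq, missed_cleavages=0, min_length=1, max_length=40, **rules):
    """List peptides spanning up to missed_cleavages internal cut sites."""
    points = cleavage_points(seq, **rules)
    peptides = []
    for start in range(len(points) - 1):
        last = min(start + missed_cleavages + 1, len(points) - 1)
        for end in range(start + 1, last + 1):
            peptide = seq[points[start]:points[end]]
            if min_length <= len(peptide) <= max_length:
                peptides.append(peptide)
    return peptides


# "KR" is followed by "P", so the R-P bond is not cut.
print(simple_digest("AKRPGKDE", missed_cleavages=1))
```

In this toy sequence the cut after K at position 2 is kept, the cut after R is suppressed by the following P, and one missed cleavage adds the two joined peptides AKRPGK and RPGKDE.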

InternalMod fields

Field         Type                  Default   Description
modification  str                   required  Modification name (e.g. "Carbamidomethyl", "UNIMOD:21").
residue       str                   required  One or more amino acids the mod applies to (e.g. "C" or "KR").
mod_type      "fixed" | "variable"  required  "fixed" = always applied; "variable" = enumerated combinatorially (counts against max_ptm_per_peptide).

TerminalMod fields

Field             Type                  Default   Description
modification      str                   required  Modification name (e.g. "Acetyl", "UNIMOD:737").
position          "nterm" | "cterm"     required  Which terminus to apply the mod to.
mod_type          "fixed" | "variable"  required  "fixed" = always applied; "variable" = enumerated combinatorially (counts against max_ptm_per_peptide).
residue           str | None            None      If set, the mod is only applied when the terminal residue is in this string (e.g. "M" or "KR").
protein_terminus  bool                  false     If true, only apply to the protein-level terminus (first peptide for N-term, last for C-term).

Python API

Full digest → Polars DataFrame

from peff_digest import DigestConfig, InternalMod, TerminalMod, digest

config = DigestConfig(
    peff_file="human.peff",
    missed_cleavages=2,
    min_length=7,
    max_length=40,
    min_mass=400.0,
    max_mass=10000.0,
    drop_invalid_mass=True,
    internal_mods=[
        InternalMod(modification="Carbamidomethyl", residue="C", mod_type="fixed"),
        InternalMod(modification="Oxidation", residue="M", mod_type="variable"),
    ],
    terminal_mods=[
        TerminalMod(modification="Acetyl", position="nterm", mod_type="variable", protein_terminus=True),
    ],
)

df = digest(config)
print(df)

Returns a polars.DataFrame with columns protein_id, sequence, variant, length, mass. All filtering from the config (mass bounds, drop_invalid_mass) is applied before returning.
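
For intuition about the mass column and the mass filters, the sketch below computes the monoisotopic mass of an unmodified peptide (sum of residue masses plus one water) and applies min_mass, max_mass, and drop_invalid_mass. It is an assumption-laden illustration, not the package's code: both function names are made up, any residue outside the 20 standard amino acids is treated as "not computable", and the bounds are taken to be inclusive.

```python
# Monoisotopic residue masses in Da for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def monoisotopic_mass(sequence):
    """Return the unmodified peptide mass, or None if a residue is unknown."""
    try:
        return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    except KeyError:
        return None  # e.g. an ambiguous residue such as X


def keep_peptide(mass, min_mass=None, max_mass=None, drop_invalid_mass=False):
    """Apply the mass filters; a bound of None is ignored."""
    if mass is None:
        return not drop_invalid_mass
    if min_mass is not None and mass < min_mass:
        return False
    if max_mass is not None and mass > max_mass:
        return False
    return True


for seq in ["PEPTIDEK", "GG", "PEPXK"]:
    mass = monoisotopic_mass(seq)
    print(seq, mass, keep_peptide(mass, 400.0, 10000.0, drop_invalid_mass=True))
```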

Single-entry digest

import pefftacular as pf
from peff_digest import InternalMod, TerminalMod, digest_peff_sequence

# Take only the first entry from the PEFF file.
entry = next(iter(pf.PeffReader("human.peff")))

peptides = digest_peff_sequence(
    entry,
    cleave_on="KR",
    missed_cleavages=2,
    min_length=7,
    max_length=40,
    restrict_after="P",
    internal_mods=[
        InternalMod(modification="Carbamidomethyl", residue="C", mod_type="fixed"),
        InternalMod(modification="Oxidation", residue="M", mod_type="variable"),
    ],
    max_ptm_per_peptide=2,
    terminal_mods=[
        TerminalMod(modification="Acetyl", position="nterm", mod_type="variable", protein_terminus=True),
        TerminalMod(modification="Amidated", position="cterm", mod_type="variable"),
    ],
)

for peptide in peptides:
    print(str(peptide), len(peptide), peptide.mass())

Returns a set[peptacular.ProFormaAnnotation]. Each element supports len(), .mass(), str(), and .peptide_name (PEFF variant notation, or None for canonical).

Development

just lint      # ruff check
just format    # ruff format + import sort
just test      # pytest
just check     # lint + type check (ty) + test
