PEFF-aware in-silico protein digestion with PTM enumeration, sequence variants, and ProForma output.
peff_digest
A PEFF-aware protein digest tool. Given a PEFF file and enzymatic digestion parameters, produces a CSV of peptides with their sequence (ProForma notation), variant annotation, length, and monoisotopic mass.
Each PEFF VariantSimple and VariantComplex annotation is applied independently (not combined). PEFF PTMs (ModResPsi / ModResUnimod) are applied combinatorially up to a configurable limit per peptide. Fixed and variable user-defined modifications — including terminal modifications — are also supported.
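The difference between the two strategies can be sketched in plain Python. This is an illustration only, not the package's implementation: `apply_simple_variants` and `enumerate_mod_sets` are hypothetical helpers, variant positions are assumed to be 1-based as in the `(42|R)` notation, and each candidate site is assumed to carry a single modification.

```python
from itertools import combinations


def apply_simple_variants(sequence, variants):
    """Return one sequence per (1-based position, residue) pair.

    Each variant is applied on its own; variants are never combined.
    """
    return [sequence[: pos - 1] + residue + sequence[pos:] for pos, residue in variants]


def enumerate_mod_sets(sites, max_mods):
    """Yield every subset of candidate PTM sites with at most max_mods members."""
    for k in range(max_mods + 1):
        yield from combinations(sites, k)


# Two variants give two separate variant sequences, not one doubly-substituted one.
variant_sequences = apply_simple_variants("PEPTIDEK", [(2, "A"), (8, "R")])
print(variant_sequences)

# Hypothetical PTM sites: (0-based position, modification name).
sites = [(2, "Oxidation"), (5, "Phospho"), (9, "Phospho")]
mod_sets = list(enumerate_mod_sets(sites, max_mods=2))
print(len(mod_sets))  # 1 unmodified + 3 single + 3 double = 7
```

The number of PTM combinations grows quickly with the number of candidate sites, which is why a per-peptide limit is useful.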
Installation
uv sync
# or
just install
Requires Python 3.12+.
Usage
Via config file
peff-digest --config config.toml
Via flags
peff-digest --peff-file human.peff --output-file peptides.csv
Flags override any values set in --config. Run peff-digest --help for the full flag reference.
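The precedence rule can be pictured as a simple dictionary merge. The sketch below is a common way to implement "flags win over config", not the CLI's actual code; `merge_settings` is a hypothetical helper, and it assumes that flags the user did not pass are represented as `None`.

```python
def merge_settings(config_values, flag_values):
    """Overlay explicitly set flags on top of values loaded from a config file."""
    merged = dict(config_values)
    # Assumption: a flag left unset on the command line arrives as None.
    merged.update({key: value for key, value in flag_values.items() if value is not None})
    return merged


from_config = {"peff_file": "human.peff", "output_file": "peptides.csv", "missed_cleavages": 2}
from_flags = {"output_file": "run2.csv", "missed_cleavages": None}

settings = merge_settings(from_config, from_flags)
print(settings)
```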
Output
The output CSV has five columns:
| Column | Description |
|---|---|
| `protein_id` | `db_unique_id` from the PEFF header |
| `sequence` | ProForma-annotated peptide sequence (includes mods) |
| `variant` | PEFF variant notation, e.g. `(42\|R)`, or empty for canonical |
| `length` | Peptide length in residues |
| `mass` | Monoisotopic mass in Da, or empty if not computable |
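A minimal sketch of reading this layout with the standard library, converting empty `variant` and `mass` cells to `None`. The sample rows and the `read_peptides` helper are hypothetical; the protein ID and sequences are placeholders, not real tool output.

```python
import csv
import io

# Hypothetical rows in the documented five-column layout (mass left empty).
sample = io.StringIO(
    "protein_id,sequence,variant,length,mass\n"
    "PROT1,PEPTIDEK,,8,\n"
    "PROT1,PEPTIDER,(8|R),8,\n"
)


def read_peptides(handle):
    """Parse the peptide CSV into dicts with typed length and mass fields."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append(
            {
                "protein_id": row["protein_id"],
                "sequence": row["sequence"],
                "variant": row["variant"] or None,  # empty means canonical
                "length": int(row["length"]),
                "mass": float(row["mass"]) if row["mass"] else None,
            }
        )
    return rows


peptides = read_peptides(sample)
for peptide in peptides:
    print(peptide)
```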
Config reference
All options can be set in a TOML or JSON config file. TOML example:
peff_file = "human.peff"
output_file = "peptides.csv"
cleave_on = "KR"
missed_cleavages = 2
semi_enzymatic = false
max_ptm_per_peptide = 2
min_length = 7
max_length = 40
restrict_after = "P"
restrict_before = ""
cterminal = true
min_mass = 400.0
max_mass = 10000.0
drop_invalid_mass = false
annotate_variants = true
# Internal modifications — one [[internal_mods]] block per mod:
# [[internal_mods]]
# modification = "Carbamidomethyl"
# residue = "C"
# mod_type = "fixed"
#
# [[internal_mods]]
# modification = "Oxidation"
# residue = "M"
# mod_type = "variable"
# Terminal modifications — one [[terminal_mods]] block per mod:
# [[terminal_mods]]
# modification = "Acetyl"
# position = "nterm"
# mod_type = "variable"
# protein_terminus = true # only the first peptide of each protein
#
# [[terminal_mods]]
# modification = "UNIMOD:737"
# position = "nterm"
# mod_type = "fixed"
# residue = "M" # only if the terminal residue is M
#
# [[terminal_mods]]
# modification = "Amidated"
# position = "cterm"
# mod_type = "variable"
DigestConfig fields
| Field | Type | Default | Description |
|---|---|---|---|
| `peff_file` | `str` | required | Path to the input PEFF file. Must exist. |
| `output_file` | `str` | `"peptides.csv"` | Path for the output CSV. |
| `cleave_on` | `str` | `"KR"` | Amino acids at which to cleave (e.g. `"KR"` for trypsin). |
| `missed_cleavages` | `int` | `2` | Maximum number of missed cleavage sites per peptide. Min 0. |
| `semi_enzymatic` | `bool` | `false` | Include semi-enzymatic peptides (one non-enzymatic terminus). |
| `max_ptm_per_peptide` | `int` | `2` | Maximum number of variable mods (PEFF + user) applied simultaneously per peptide. `0` disables all variable mods. Min 0. |
| `min_length` | `int` | `7` | Minimum peptide length in residues (inclusive). Min 1. |
| `max_length` | `int` | `40` | Maximum peptide length in residues (inclusive). Min 1. |
| `restrict_after` | `str` | `"P"` | Skip cleavage when the following residue is in this set (e.g. `"P"` for trypsin/Pro rule). |
| `restrict_before` | `str` | `""` | Skip cleavage when the preceding residue is in this set. |
| `cterminal` | `bool` | `true` | `true` = C-terminal cleavage (standard); `false` = N-terminal. |
| `internal_mods` | `list[InternalMod]` | `[]` | Per-residue modifications. See InternalMod fields below. |
| `terminal_mods` | `list[TerminalMod]` | `[]` | Terminal modifications. See TerminalMod fields below. |
| `min_mass` | `float \| None` | `None` | Minimum peptide mass in Da. Ignored if `None`. |
| `max_mass` | `float \| None` | `None` | Maximum peptide mass in Da. Ignored if `None`. |
| `drop_invalid_mass` | `bool` | `false` | If `true`, exclude peptides whose mass cannot be computed. |
| `annotate_variants` | `bool` | `true` | If `false`, do not set `peptide_name` on variant peptides. |
| `workers` | `int \| None` | `None` | Number of worker processes. Defaults to all available CPUs. Min 1. |
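How `cleave_on`, `restrict_after`, `missed_cleavages`, and the length bounds interact can be sketched in a few lines. This is a simplified illustration, not the package's algorithm: `cleavage_sites` and `sketch_digest` are hypothetical names, only C-terminal cleavage is handled, and `restrict_before`, semi-enzymatic peptides, variants, and modifications are ignored.

```python
def cleavage_sites(seq, cleave_on="KR", restrict_after="P"):
    """Return indices i where the chain is cut between seq[i - 1] and seq[i]."""
    return [
        i
        for i in range(1, len(seq))
        if seq[i - 1] in cleave_on and seq[i] not in restrict_after
    ]


def sketch_digest(seq, cleave_on="KR", restrict_after="P",
                  missed_cleavages=0, min_length=1, max_length=40):
    """Enumerate peptides spanning up to missed_cleavages skipped sites."""
    bounds = [0, *cleavage_sites(seq, cleave_on, restrict_after), len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        # j - i - 1 is the number of cleavage sites left uncut inside the peptide.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptide = seq[bounds[i]:bounds[j]]
            if min_length <= len(peptide) <= max_length:
                peptides.append(peptide)
    return peptides


# The K at position 3 is followed by P, so no cut happens there.
print(sketch_digest("MAKPLLRGGKDE"))
print(sketch_digest("MAKPLLRGGKDE", missed_cleavages=1))
```

With the default `min_length = 7` from the table, only peptides of seven or more residues from this toy sequence would be kept.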
InternalMod fields
| Field | Type | Default | Description |
|---|---|---|---|
| `modification` | `str` | required | Modification name (e.g. `"Carbamidomethyl"`, `"UNIMOD:21"`). |
| `residue` | `str` | required | One or more amino acids the mod applies to (e.g. `"C"` or `"KR"`). |
| `mod_type` | `"fixed" \| "variable"` | required | `"fixed"` = always applied; `"variable"` = enumerated combinatorially (counts against `max_ptm_per_peptide`). |
TerminalMod fields
| Field | Type | Default | Description |
|---|---|---|---|
| `modification` | `str` | required | Modification name (e.g. `"Acetyl"`, `"UNIMOD:737"`). |
| `position` | `"nterm" \| "cterm"` | required | Which terminus to apply the mod to. |
| `mod_type` | `"fixed" \| "variable"` | required | `"fixed"` = always applied; `"variable"` = enumerated combinatorially (counts against `max_ptm_per_peptide`). |
| `residue` | `str \| None` | `None` | If set, the mod is only applied when the terminal residue is in this string (e.g. `"M"` or `"KR"`). |
| `protein_terminus` | `bool` | `false` | If `true`, only apply to the protein-level terminus (first peptide for N-term, last for C-term). |
Python API
Full digest → Polars DataFrame
from peff_digest import DigestConfig, InternalMod, TerminalMod, digest

config = DigestConfig(
    peff_file="human.peff",
    missed_cleavages=2,
    min_length=7,
    max_length=40,
    min_mass=400.0,
    max_mass=10000.0,
    drop_invalid_mass=True,
    internal_mods=[
        InternalMod(modification="Carbamidomethyl", residue="C", mod_type="fixed"),
        InternalMod(modification="Oxidation", residue="M", mod_type="variable"),
    ],
    terminal_mods=[
        TerminalMod(modification="Acetyl", position="nterm", mod_type="variable", protein_terminus=True),
    ],
)

df = digest(config)
print(df)
Returns a polars.DataFrame with columns protein_id, sequence, variant, length, mass. All filtering from the config (mass bounds, drop_invalid_mass) is applied before returning.
Single-entry digest
import pefftacular as pf
from peff_digest import InternalMod, TerminalMod, digest_peff_sequence

entry = next(iter(pf.PeffReader("human.peff")))
peptides = digest_peff_sequence(
    entry,
    cleave_on="KR",
    missed_cleavages=2,
    min_length=7,
    max_length=40,
    restrict_after="P",
    internal_mods=[
        InternalMod(modification="Carbamidomethyl", residue="C", mod_type="fixed"),
        InternalMod(modification="Oxidation", residue="M", mod_type="variable"),
    ],
    max_ptm_per_peptide=2,
    terminal_mods=[
        TerminalMod(modification="Acetyl", position="nterm", mod_type="variable", protein_terminus=True),
        TerminalMod(modification="Amidated", position="cterm", mod_type="variable"),
    ],
)

for peptide in peptides:
    print(str(peptide), len(peptide), peptide.mass())
Returns a set[peptacular.ProFormaAnnotation]. Each element supports len(), .mass(), str(), and .peptide_name (PEFF variant notation, or None for canonical).
Development
just lint # ruff check
just format # ruff format + import sort
just test # pytest
just check # lint + type check (ty) + test