A pure-Python library for reading and writing PEFF (PSI Extended FASTA Format) files.
pefftacular
Python library for reading and writing PEFF (PSI Extended FASTA Format) files. PEFF is a superset of FASTA used in proteomics that carries rich per-entry annotations — PTMs, variants, processed forms, and more — encoded directly in the sequence header.
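The sketch below shows what "encoded directly in the sequence header" looks like in practice. It splits a PEFF-style header line into its identifier and its `\Key=value` pairs. It is not pefftacular's parser. The header line is made up, and values with internal structure (such as the parenthesized tuples used for variants and modifications) come back only as raw strings.

```python
from __future__ import annotations

import re

# Made-up example header in PEFF style: ">prefix:accession \Key=value ..."
HEADER = r">sp:P12345 \PName=Example protein \GName=EXMP \NcbiTaxId=9606"

# A key token is a backslash, a run of word characters, then "=".
KEY_RE = re.compile(r"\\(\w+)=")


def split_peff_header(line: str) -> tuple[str, str, dict[str, str]]:
    """Split a header into (prefix, accession, {key: raw string value})."""
    # The identifier is everything before the first space, minus the ">".
    ident, _, rest = line.lstrip(">").partition(" ")
    prefix, _, accession = ident.partition(":")
    matches = list(KEY_RE.finditer(rest))
    pairs: dict[str, str] = {}
    for i, match in enumerate(matches):
        # Each value runs up to the next key token, or to the end of the line.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(rest)
        pairs[match.group(1)] = rest[match.end():end].strip()
    return prefix, accession, pairs
```

For the sample line, `split_peff_header(HEADER)` returns `("sp", "P12345", {"PName": "Example protein", "GName": "EXMP", "NcbiTaxId": "9606"})`. Every value here is a plain string. The library's data model exposes typed fields instead, such as `ncbi_tax_id` as `int | None`.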
Install
```bash
pip install pefftacular
```

Dev install (uses the `just` command runner):

```bash
just install
```
Quick start
`read_peff` loads the whole file into memory at once:

```python
from pefftacular import read_peff

header, entries = read_peff("proteins.peff")
for entry in entries:
    print(entry.db_unique_id, entry.pname, len(entry.sequence))
```
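`read_peff` returns the entries as an in-memory list, so a natural next step is a lookup table keyed by accession. The sketch below is illustrative and not part of pefftacular. `index_by_accession` and the `Entry` stand-in are hypothetical; `Entry` mirrors only the `prefix` and `db_unique_id` fields. Combining prefix and accession into one key is an assumption, meant to keep entries from different databases apart.

```python
from dataclasses import dataclass


# Hypothetical stand-in for SequenceEntry, with just the two fields used here.
@dataclass
class Entry:
    prefix: str
    db_unique_id: str


def index_by_accession(entries):
    """Hypothetical helper: map "prefix:db_unique_id" -> entry, rejecting duplicates."""
    index = {}
    for entry in entries:
        key = f"{entry.prefix}:{entry.db_unique_id}"
        if key in index:
            raise ValueError(f"duplicate entry {key}")
        index[key] = entry
    return index
```

With real data, `index_by_accession(entries)["sp:P12345"]` would return that entry. Raising on duplicates rather than silently overwriting is a design choice. It surfaces files that repeat an accession.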
`PeffReader` iterates lazily, without loading the full file:

```python
from pefftacular import PeffReader

with PeffReader("proteins.peff") as reader:
    file_header = reader.header
    for entry in reader:
        process(entry)  # process() stands in for your own handling code
```
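`PeffReader` hands out one entry at a time, so filters written as generators keep memory use flat. The sketch below assumes only that entries expose `ncbi_tax_id` as listed in the data model. `entries_for_taxon` and the `Entry` stand-in are hypothetical helpers for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


# Hypothetical stand-in for SequenceEntry, with just the fields used here.
@dataclass
class Entry:
    db_unique_id: str
    ncbi_tax_id: Optional[int] = None


def entries_for_taxon(entries: Iterable[Entry], tax_id: int) -> Iterator[Entry]:
    """Hypothetical helper: lazily yield entries whose NcbiTaxId matches tax_id."""
    for entry in entries:
        # Entries without a taxonomy ID (None) never match.
        if entry.ncbi_tax_id == tax_id:
            yield entry
```

With a real file, write `for entry in entries_for_taxon(reader, 9606):` inside the `with PeffReader(...) as reader:` block (9606 is the NCBI taxonomy ID for human). Only matching entries are then processed, and nothing is buffered.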
Data model
`read_peff` returns, and `PeffReader` yields, `SequenceEntry` objects with these fields:
| Field | Type | Description |
|---|---|---|
| `prefix` | `str` | Database prefix (e.g. `sp`, `tr`) |
| `db_unique_id` | `str` | Accession (e.g. `P12345`) |
| `sequence` | `str` | Amino acid sequence |
| `pname` | `str \| None` | Protein name (`\PName=`) |
| `gname` | `str \| None` | Gene name (`\GName=`) |
| `ncbi_tax_id` | `int \| None` | NCBI taxonomy ID (`\NcbiTaxId=`) |
| `length` | `int \| None` | Sequence length (`\Length=`) |
| `sv` | `int \| None` | Sequence version (`\SV=`) |
| `ev` | `int \| None` | Entry version (`\EV=`) |
| `pe` | `int \| None` | Protein existence level (`\PE=`) |
| `variant_simple` | `list[VariantSimple]` | Simple sequence variants |
| `variant_complex` | `tuple[VariantComplex, ...]` | Multi-residue variants (start, end, new sequence, optional tag) |
| `mod_res_unimod` | `list[ModResUnimod]` | UniMod modification sites |
| `mod_res_psi` | `list[ModResPsi]` | PSI-MOD modification sites |
| `mod_res` | `list[ModRes]` | Other named modification sites |
| `processed` | `list[Processed]` | Processed sequence forms |
| `extra` | `dict[str, str]` | Non-standard key/value pairs |
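The `length` field holds the declared `\Length=` value, and `sequence` holds the residues themselves. This README does not say whether pefftacular checks that the two agree, so a small consistency check can be useful. The sketch below assumes `length` is `None` whenever the header omits `\Length=`, per the `int | None` type above. `length_mismatches` and the `Entry` stand-in are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for SequenceEntry, with just the fields used here.
@dataclass
class Entry:
    db_unique_id: str
    sequence: str
    length: Optional[int] = None


def length_mismatches(entries):
    """Hypothetical helper: accessions whose declared Length disagrees with the sequence."""
    return [
        e.db_unique_id
        for e in entries
        # Skip entries that declare no length at all.
        if e.length is not None and e.length != len(e.sequence)
    ]
```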
Annotations
Variants:
```python
from pefftacular import read_peff

_, entries = read_peff("proteins.peff")
entry = entries[0]

for v in entry.variant_simple:
    print(v.position, v.new_amino_acid, v.tag)
    # e.g. 42, "K", "rs12345"
```
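To build the variant sequence itself, substitute the new residue at the variant's position. The sketch below uses a hypothetical helper, `apply_simple_variant`. It assumes `position` is 1-based, so residue 1 is the first amino acid. Confirm that convention against the library's documentation before relying on it.

```python
def apply_simple_variant(sequence: str, position: int, new_amino_acid: str) -> str:
    """Hypothetical helper: return a copy of sequence with one residue substituted.

    Assumes position is 1-based (an assumption, not stated in this README).
    """
    if not 1 <= position <= len(sequence):
        raise IndexError(f"position {position} outside 1..{len(sequence)}")
    i = position - 1  # convert the 1-based position to a 0-based string index
    return sequence[:i] + new_amino_acid + sequence[i + 1:]
```

For an entry, this would be `apply_simple_variant(entry.sequence, v.position, v.new_amino_acid)` for each `v` in `entry.variant_simple`. Each call yields one single-substitution variant of the original sequence.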
Modifications (UniMod):
```python
for mod in entry.mod_res_unimod:
    print(mod.position, mod.accession, mod.name)
    # e.g. 17, "21", "Phospho"
```
Modifications (PSI-MOD):
```python
for mod in entry.mod_res_psi:
    print(mod.position, mod.accession, mod.name)
    # e.g. 17, "MOD:00696", "phosphorylated residue"
```
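UniMod and PSI-MOD annotations can describe the same site; both examples above point at position 17. Merging them into one per-position view can therefore help. The sketch below assumes only the `position` and `name` attributes shown above. `sites_by_position` and the `Mod` stand-in are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


# Hypothetical stand-in for ModResUnimod / ModResPsi, with the attributes shown above.
class Mod(NamedTuple):
    position: int
    accession: str
    name: str


def sites_by_position(*mod_lists):
    """Hypothetical helper: merge modification lists into {position: [name, ...]}."""
    sites = defaultdict(list)
    for mods in mod_lists:
        for mod in mods:
            sites[mod.position].append(mod.name)
    # Return a plain dict ordered by residue position.
    return dict(sorted(sites.items()))
```

On a real entry, this would be called as `sites_by_position(entry.mod_res_unimod, entry.mod_res_psi)`.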
Processed forms:
```python
for proc in entry.processed:
    print(proc.start_pos, proc.end_pos, proc.accession, proc.name)
    # e.g. 1, 24, "PRO_0000012345", "Signal peptide"
```
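`start_pos` and `end_pos` delimit the processed form within the parent sequence. The sketch below extracts that form, assuming 1-based, inclusive coordinates; under that assumption, the signal peptide example above covers residues 1 through 24. `processed_subsequence` is a hypothetical helper.

```python
def processed_subsequence(sequence: str, start_pos: int, end_pos: int) -> str:
    """Hypothetical helper: slice out a processed form (1-based, inclusive coordinates assumed)."""
    if not 1 <= start_pos <= end_pos <= len(sequence):
        raise ValueError(f"bad range {start_pos}..{end_pos} for length {len(sequence)}")
    # 1-based inclusive [start, end] maps to the 0-based slice [start - 1:end].
    return sequence[start_pos - 1:end_pos]
```

For an entry, `processed_subsequence(entry.sequence, proc.start_pos, proc.end_pos)` would return the residues of each processed form.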
Non-standard keys:
```python
value = entry.extra.get("MyCustomKey")
```
Writing
Build a header and entries, then write:
```python
from pefftacular import write_peff
from pefftacular.models import FileHeader, DatabaseHeader, SequenceEntry

db_header = DatabaseHeader(
    prefix="sp",
    db_name="SwissProt",
    db_version="2024_01",
    number_of_entries=1,
)

file_header = FileHeader(
    version="1.0",
    databases=[db_header],
)

entry = SequenceEntry(
    prefix="sp",
    db_unique_id="P12345",
    sequence="MKTIIALSYIFCLVFA",
    pname="Example protein",
    gname="EXMP",
)

write_peff(file_header, [entry], "output.peff")
```
The destination argument, `dest` (`"output.peff"` above), can be a file path string or a `pathlib.Path`. To write to an existing stream, pass a file object opened in binary mode instead.
Error handling
Parse errors raise PeffParseError:
```python
from pefftacular import read_peff
from pefftacular.exceptions import PeffParseError

try:
    header, entries = read_peff("malformed.peff")
except PeffParseError as e:
    print(e.line)     # the offending line number
    print(e.context)  # surrounding context string
```
Write errors raise PeffWriteError:
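```python
from pefftacular import write_peff
from pefftacular.exceptions import PeffWriteError

# file_header and entries as in the examples above
try:
    write_peff(file_header, entries, "/read-only/output.peff")
except PeffWriteError as e:
    print(e)
```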
Development
```bash
just install                          # install dependencies
just test                             # run tests
just test-v                           # run tests (verbose)
just test-file tests/test_reader.py   # run a single test file
just cov                              # run tests with coverage
just lint                             # ruff lint
just format                           # ruff format
just check                            # lint + type check + test
just build                            # build the package
just clean                            # remove cache files
just docs                             # serve docs locally
just docs-deploy                      # deploy docs to GitHub Pages
```
License
File details
Details for the file pefftacular-0.2.0.tar.gz.
File metadata
- Download URL: pefftacular-0.2.0.tar.gz
- Upload date:
- Size: 741.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `0123392e5f031b92b541a66e2e39cba3973aad5ae7358dcf83f080e9f156fa9a` |
| MD5 | `a9a86bbe4850f06b2a873777c273ec54` |
| BLAKE2b-256 | `f0d88376ff04b02eab3d651b2b657f1195fa7a4953cd0bc004ca743779052dbc` |
File details
Details for the file pefftacular-0.2.0-py3-none-any.whl.
File metadata
- Download URL: pefftacular-0.2.0-py3-none-any.whl
- Upload date:
- Size: 15.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `05d27991c7748ca66a52f894bf97f1871cb12aaf689b83bbd843656312c65559` |
| MD5 | `d9e4da99f2d83e926fbe928dc9b0b25c` |
| BLAKE2b-256 | `65f8fcf92bc3b705e843f66c8268f503a27128e6133d62faea5d1b547e4f606c` |