A pure-Python library for reading and writing PEFF (PSI Extended FASTA Format) files.
pefftacular
Python library for reading and writing PEFF (PSI Extended FASTA Format) files. PEFF is a superset of FASTA used in proteomics that carries rich per-entry annotations — PTMs, variants, processed forms, and more — encoded directly in the sequence header.
Install
pip install pefftacular
Dev install:
just install
Quick start
read_peff — load everything into memory at once:
from pefftacular import read_peff
header, entries = read_peff("proteins.peff")
for entry in entries:
    print(entry.db_unique_id, entry.pname, len(entry.sequence))
PeffReader — iterate lazily without loading the full file:
from pefftacular import PeffReader
with PeffReader("proteins.peff") as reader:
    file_header = reader.header
    for entry in reader:
        # process() stands in for your own per-entry handler
        process(entry)
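Because entries arrive one at a time, filters can be written as generators that never hold the whole file. The following is a minimal sketch, assuming only that each entry exposes ncbi_tax_id as listed in the data model. entries_for_taxon is a hypothetical helper, not part of pefftacular, and the SimpleNamespace objects are stand-ins so the block runs without a PEFF file.

```python
from itertools import islice
from types import SimpleNamespace


def entries_for_taxon(entries, tax_id):
    """Lazily yield entries whose ncbi_tax_id equals tax_id (hypothetical helper)."""
    for entry in entries:
        # ncbi_tax_id is None when the entry carries no taxonomy key,
        # so such entries never match an integer tax_id.
        if entry.ncbi_tax_id == tax_id:
            yield entry


# Stand-in entries; with the library, an open PeffReader would be passed instead.
demo = (
    SimpleNamespace(db_unique_id="P1", ncbi_tax_id=9606),
    SimpleNamespace(db_unique_id="P2", ncbi_tax_id=None),
    SimpleNamespace(db_unique_id="P3", ncbi_tax_id=9606),
)

# islice stops after the first match, so later entries are never inspected.
first_human = [e.db_unique_id for e in islice(entries_for_taxon(demo, 9606), 1)]
```

Any iterable of entries works here, including the list returned by read_peff.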
Data model
read_peff and PeffReader yield SequenceEntry objects with these fields:
| Field | Type | Description |
|---|---|---|
| prefix | str | Database prefix (e.g. sp, tr) |
| db_unique_id | str | Accession (e.g. P12345) |
| sequence | str | Amino acid sequence |
| pname | str \| None | Protein name (\\PName=) |
| gname | str \| None | Gene name (\\GName=) |
| ncbi_tax_id | int \| None | NCBI taxonomy ID (\\NcbiTaxId=) |
| length | int \| None | Sequence length (\\Length=) |
| sv | int \| None | Sequence version (\\SV=) |
| ev | int \| None | Entry version (\\EV=) |
| pe | int \| None | Protein existence level (\\PE=) |
| variant_simple | tuple[VariantSimple, ...] | Simple sequence variants |
| variant_complex | tuple[VariantComplex, ...] | Multi-residue variants (start, end, new sequence, optional tag) |
| mod_res_unimod | tuple[ModResUnimod, ...] | UniMod modification sites |
| mod_res_psi | tuple[ModResPsi, ...] | PSI-MOD modification sites |
| mod_res | tuple[ModRes, ...] | Other named modification sites |
| processed | tuple[Processed, ...] | Processed sequence forms |
| custom_values | dict[str, tuple[CustomKeyValue, ...]] | Header-declared custom keys, parsed by their CustomKeyDef |
| extra | dict[str, str] | Non-standard keys with no CustomKeyDef |
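Since length is read from the header while sequence holds the residues themselves, the two can be cross-checked. The sketch below assumes only the field names from the table. length_mismatches is a hypothetical helper, not a pefftacular function, and the SimpleNamespace objects stand in for real entries.

```python
from types import SimpleNamespace


def length_mismatches(entries):
    """Yield (db_unique_id, declared, actual) when the declared length is wrong."""
    for entry in entries:
        actual = len(entry.sequence)
        # Entries without a declared length (None) cannot disagree, so skip them.
        if entry.length is not None and entry.length != actual:
            yield entry.db_unique_id, entry.length, actual


# Stand-in entries for illustration only.
demo = [
    SimpleNamespace(db_unique_id="P1", sequence="MKT", length=3),
    SimpleNamespace(db_unique_id="P2", sequence="MKTI", length=5),
    SimpleNamespace(db_unique_id="P3", sequence="MK", length=None),
]
problems = list(length_mismatches(demo))
```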
Annotations
Variants:
from pefftacular import read_peff
_, entries = read_peff("proteins.peff")
entry = entries[0]
for v in entry.variant_simple:
    print(v.position, v.new_amino_acid, v.tag)
    # e.g. 42, "K", "rs12345"
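A simple variant can be turned into a variant sequence with plain string slicing. This is a sketch under the assumption that position is 1-based, as is usual for PEFF annotations. apply_simple_variant is a hypothetical helper that takes the values of v.position and v.new_amino_acid directly; it is not part of pefftacular.

```python
def apply_simple_variant(sequence, position, new_amino_acid):
    """Return sequence with the residue at a 1-based position replaced."""
    # Assumption: positions count from 1, so valid values are 1..len(sequence).
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} outside sequence of length {len(sequence)}")
    index = position - 1  # convert to Python's 0-based indexing
    return sequence[:index] + new_amino_acid + sequence[index + 1:]


variant_seq = apply_simple_variant("MKTIIALSYIFCLVFA", 2, "R")
```

With a real entry, the call would look like apply_simple_variant(entry.sequence, v.position, v.new_amino_acid).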
Modifications (UniMod):
for mod in entry.mod_res_unimod:
    print(mod.position, mod.accession, mod.name)
    # e.g. 17, "21", "Phospho"
Modifications (PSI-MOD):
for mod in entry.mod_res_psi:
    print(mod.position, mod.accession, mod.name)
    # e.g. 17, "MOD:00696", "phosphorylated residue"
Processed forms:
for proc in entry.processed:
    print(proc.start_pos, proc.end_pos, proc.accession, proc.name)
    # e.g. 1, 24, "PRO_0000012345", "Signal peptide"
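To recover the residues of a processed form, the start and end positions can be mapped onto the sequence. The sketch assumes that start_pos and end_pos are 1-based and inclusive, which the README does not state explicitly. processed_sequence is a hypothetical helper, not a pefftacular function.

```python
def processed_sequence(sequence, start_pos, end_pos):
    """Return the residues covered by a processed form (1-based, inclusive)."""
    # Assumption: both bounds are 1-based and the end position is included.
    if not 1 <= start_pos <= end_pos <= len(sequence):
        raise ValueError(f"range {start_pos}..{end_pos} does not fit the sequence")
    # Shifting only the start converts an inclusive 1-based range to a Python slice.
    return sequence[start_pos - 1:end_pos]


first_four = processed_sequence("MKTIIALSYIFCLVFA", 1, 4)
```

With a real entry, the call would look like processed_sequence(entry.sequence, proc.start_pos, proc.end_pos).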
Custom keys (declared via # CustomKeyDef= in the header):
When the database header declares a custom key, entry values for that key are
parsed using its RegExp / FieldNames / FieldTypes and exposed as typed
fields on entry.custom_values. The original item text is preserved in raw
for lossless round-trips.
Header excerpt:
# CustomKeyDef=(KeyName=SecondaryStructure|Description="..."|ConceptCURIE=BAO:0000014|RegExp="([0-9]+)\|([0-9]+)\|([A-Za-z]+:[A-Za-z0-9]+)?\|(.+)"|FieldNames=StartPosition,EndPosition,CURIE,Description|FieldTypes=integer,integer,string,string)
Entry usage:
>cu:P00001 \SecondaryStructure=(10|20|ncithesaurus:C47937|Helix)
Access:
ss = entry.custom_values["SecondaryStructure"]
ss[0].fields["StartPosition"] # 10 (int)
ss[0].fields["Description"] # "Helix"
Supported FieldTypes are XSD basic types (string, integer, decimal,
boolean, date, time) plus enumeration(a|b|c). Coercion failures and
enumeration mismatches emit UserWarning and fall back to the raw string.
If no RegExp is declared, the value is split on | and zipped with
FieldNames.
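The parsing rule described above can be sketched in plain Python. This is not pefftacular's implementation: parse_custom_item and coerce are hypothetical helpers, only the integer and decimal types are handled, and the pattern in the example accepts alphanumeric CURIE identifiers such as C47937.

```python
import re
import warnings
from decimal import Decimal, InvalidOperation


def coerce(value, field_type):
    """Convert value to field_type, falling back to the raw string on failure."""
    if value is None:  # an optional regex group that did not match
        return None
    try:
        if field_type == "integer":
            return int(value)
        if field_type == "decimal":
            return Decimal(value)
    except (ValueError, InvalidOperation):
        warnings.warn(f"cannot coerce {value!r} to {field_type}", UserWarning)
    return value  # strings, unhandled types, and failed coercions stay raw


def parse_custom_item(item, field_names, field_types, regexp=None):
    """Split one item into typed fields, by regex groups or by '|'."""
    if regexp is not None:
        match = re.fullmatch(regexp, item)
        if match is None:
            raise ValueError(f"{item!r} does not match {regexp!r}")
        parts = match.groups()
    else:
        parts = item.split("|")
    return {
        name: coerce(part, ftype)
        for name, part, ftype in zip(field_names, parts, field_types)
    }


# Illustrative pattern and item, modeled on the SecondaryStructure example.
PATTERN = r"([0-9]+)\|([0-9]+)\|([A-Za-z]+:[A-Za-z0-9]+)?\|(.+)"
fields = parse_custom_item(
    "10|20|ncithesaurus:C47937|Helix",
    ["StartPosition", "EndPosition", "CURIE", "Description"],
    ["integer", "integer", "string", "string"],
    regexp=PATTERN,
)
```

Here fields["StartPosition"] is the integer 10, while a value that cannot be coerced would trigger a UserWarning and stay a string.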
Other non-standard keys (no CustomKeyDef registered) still land in
entry.extra as raw strings:
value = entry.extra.get("MyCustomKey")
Writing
Build a header and entries, then write:
from pefftacular import DatabaseHeader, FileHeader, SequenceEntry, write_peff
db_header = DatabaseHeader(
    prefix="sp",
    db_name="SwissProt",
    db_version="2024_01",
    number_of_entries=1,
)
file_header = FileHeader(
    peff_version="1.0",
    databases=(db_header,),
)
entry = SequenceEntry(
    prefix="sp",
    db_unique_id="P12345",
    sequence="MKTIIALSYIFCLVFA",
    pname="Example protein",
    gname="EXMP",
)
write_peff(file_header, [entry], "output.peff")
The destination (the third argument above) can be a file path string, a pathlib.Path, or a text-mode file object.
Error handling
Parse errors raise PeffParseError:
from pefftacular import PeffParseError, read_peff
try:
    header, entries = read_peff("malformed.peff")
except PeffParseError as e:
    print(e.line)     # the offending line number
    print(e.context)  # surrounding context string
Write errors raise PeffWriteError:
from pefftacular import PeffWriteError, write_peff
try:
    # file_header and entries are assumed to exist already (see Writing)
    write_peff(file_header, entries, "/read-only/output.peff")
except PeffWriteError as e:
    print(e)
Development
just install # install dependencies
just test # run tests
just test-v # run tests (verbose)
just test-file tests/test_reader.py # run a single test file
just cov # run tests with coverage
just lint # ruff lint
just format # ruff format
just check # lint + type check + test
just build # build the package
just clean # remove cache files
just docs # serve docs locally
just docs-deploy # deploy docs to GitHub Pages
License