pefftacular

Python library for reading and writing PEFF (PSI Extended FASTA Format) files. PEFF is a superset of FASTA used in proteomics that carries rich per-entry annotations — PTMs, variants, processed forms, and more — encoded directly in the sequence header.
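
To make "encoded directly in the sequence header" concrete, here is a minimal sketch of how a PEFF description line breaks down into a prefix, an accession, and backslash-prefixed key/value pairs. The function name split_peff_header and the regular expression are illustrative assumptions, not part of pefftacular, and the pattern ignores many details of the real format.

```python
import re

# Simplified pattern: a backslash, a key name, "=", then everything up to the
# next " \Key=" or the end of the line. Real PEFF headers have more rules.
_PAIR = re.compile(r"\\(\w+)=(.*?)(?=\s+\\\w+=|$)")


def split_peff_header(line: str) -> tuple[str, str, dict[str, str]]:
    """Split '>prefix:id \\Key=Value ...' into (prefix, id, {key: value})."""
    if not line.startswith(">"):
        raise ValueError("a PEFF description line must start with '>'")
    identifier, _, rest = line[1:].strip().partition(" ")
    prefix, _, db_unique_id = identifier.partition(":")
    pairs = {key: value.strip() for key, value in _PAIR.findall(rest)}
    return prefix, db_unique_id, pairs


example = r">sp:P12345 \PName=Example protein \GName=EXMP \NcbiTaxId=9606"
prefix, db_unique_id, pairs = split_peff_header(example)
print(prefix, db_unique_id, pairs)
```

For real files, use read_peff or PeffReader; this sketch only shows the shape of the data they work from.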

Install

pip install pefftacular

Dev install (requires the just command runner):

just install

Quick start

read_peff — load everything into memory at once:

from pefftacular import read_peff

header, entries = read_peff("proteins.peff")

for entry in entries:
    print(entry.db_unique_id, entry.pname, len(entry.sequence))

PeffReader — iterate lazily without loading the full file:

from pefftacular import PeffReader

with PeffReader("proteins.peff") as reader:
    file_header = reader.header
    for entry in reader:
        process(entry)  # process() is a placeholder for your own per-entry function
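
The difference between loading everything and iterating lazily can be illustrated without the library. The sketch below is an assumption-laden stand-in, not PeffReader's implementation: iter_records is a hypothetical generator that yields one (description, sequence) pair at a time from any text stream, skipping the "#" lines that make up the file header.

```python
import io
from typing import Iterator, TextIO


def iter_records(stream: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (description_line, sequence) pairs one record at a time."""
    description = None
    chunks: list[str] = []
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' file-header lines
        if line.startswith(">"):
            if description is not None:
                yield description, "".join(chunks)
            description, chunks = line, []
        elif description is not None:
            chunks.append(line)
    if description is not None:
        yield description, "".join(chunks)


text = io.StringIO(
    "# PEFF 1.0\n"
    ">sp:P1 \\PName=First\n"
    "MKTII\n"
    "ALSYI\n"
    ">sp:P2 \\PName=Second\n"
    "MAAA\n"
)
records = iter_records(text)
first = next(records)      # reads only as far as needed to complete record one
remaining = list(records)  # consumes the rest of the stream
```

Because only one record is held at a time, memory use stays roughly constant regardless of file size, which is the reason to prefer the iterator form for large databases.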

Data model

read_peff returns a list of SequenceEntry objects alongside the file header, and PeffReader yields them one at a time. Each SequenceEntry has these fields:

Field            Type                        Description
prefix           str                         Database prefix (e.g. sp, tr)
db_unique_id     str                         Accession (e.g. P12345)
sequence         str                         Amino acid sequence
pname            str | None                  Protein name (\PName=)
gname            str | None                  Gene name (\GName=)
ncbi_tax_id      int | None                  NCBI taxonomy ID (\NcbiTaxId=)
length           int | None                  Sequence length (\Length=)
sv               int | None                  Sequence version (\SV=)
ev               int | None                  Entry version (\EV=)
pe               int | None                  Protein existence level (\PE=)
variant_simple   list[VariantSimple]         Simple sequence variants
variant_complex  tuple[VariantComplex, ...]  Multi-residue variants (start, end, new sequence, optional tag)
mod_res_unimod   list[ModResUnimod]          UniMod modification sites
mod_res_psi      list[ModResPsi]             PSI-MOD modification sites
mod_res          list[ModRes]                Other named modification sites
processed        list[Processed]             Processed sequence forms
extra            dict[str, str]              Non-standard key/value pairs
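
Several of these fields are optional, so code that consumes entries should be ready for None. The following sketch shows one way to handle that; describe is a hypothetical helper, and the SimpleNamespace object is a stand-in carrying a few of the documented field names, not a real SequenceEntry.

```python
from types import SimpleNamespace


def describe(entry) -> str:
    """Build a one-line summary, tolerating optional fields that are None."""
    name = entry.pname if entry.pname is not None else "(unnamed)"
    declared = entry.length
    actual = len(entry.sequence)
    # Flag a declared \Length= value that disagrees with the actual sequence.
    note = "" if declared in (None, actual) else f" [declared length {declared}]"
    return f"{entry.prefix}:{entry.db_unique_id} {name} ({actual} aa){note}"


# Stand-in object with a few of the documented fields; not a real SequenceEntry.
stub = SimpleNamespace(
    prefix="sp",
    db_unique_id="P12345",
    sequence="MKTIIALSYIFCLVFA",
    pname=None,
    length=20,
)
summary = describe(stub)
print(summary)
```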

Annotations

Variants:

from pefftacular import read_peff

_, entries = read_peff("proteins.peff")
entry = entries[0]

for v in entry.variant_simple:
    print(v.position, v.new_amino_acid, v.tag)
    # e.g. 42, "K", "rs12345"
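
A common next step is to apply a variant to the reference sequence. The sketch below assumes that position is 1-based and that a simple variant replaces exactly one residue; apply_simple_variant is a hypothetical helper, and the SimpleNamespace object merely imitates the attributes printed above.

```python
from types import SimpleNamespace


def apply_simple_variant(sequence: str, position: int, new_amino_acid: str) -> str:
    """Return a copy of sequence with one residue replaced (1-based position)."""
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside 1..{len(sequence)}")
    index = position - 1  # convert to Python's 0-based indexing
    return sequence[:index] + new_amino_acid + sequence[index + 1:]


# Stand-in for a VariantSimple; only the attributes shown above are used.
variant = SimpleNamespace(position=3, new_amino_acid="K", tag="rs12345")
mutated = apply_simple_variant(
    "MKTIIALSYIFCLVFA", variant.position, variant.new_amino_acid
)
print(mutated)
```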

Modifications (UniMod):

for mod in entry.mod_res_unimod:
    print(mod.position, mod.accession, mod.name)
    # e.g. 17, "21", "Phospho"

Modifications (PSI-MOD):

for mod in entry.mod_res_psi:
    print(mod.position, mod.accession, mod.name)
    # e.g. 17, "MOD:00696", "phosphorylated residue"
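
Since UniMod and PSI-MOD annotations live in separate lists, it can help to merge them into a single per-position view. This is a sketch under the assumption that every modification object exposes position and name as in the loops above; mods_by_position is a hypothetical helper, and the sample objects are stand-ins.

```python
from collections import defaultdict
from types import SimpleNamespace


def mods_by_position(*mod_lists):
    """Merge several modification lists into {position: [names...]}."""
    grouped = defaultdict(list)
    for mods in mod_lists:
        for mod in mods:
            grouped[mod.position].append(mod.name)
    return dict(grouped)


# Stand-ins for entry.mod_res_unimod and entry.mod_res_psi.
unimod = [SimpleNamespace(position=17, accession="21", name="Phospho")]
psi = [
    SimpleNamespace(position=17, accession="MOD:00696", name="phosphorylated residue"),
    SimpleNamespace(position=42, accession="MOD:00696", name="phosphorylated residue"),
]
sites = mods_by_position(unimod, psi)
print(sites)
```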

Processed forms:

for proc in entry.processed:
    print(proc.start_pos, proc.end_pos, proc.accession, proc.name)
    # e.g. 1, 24, "PRO_0000012345", "Signal peptide"
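
To extract the residues that a processed form covers, slice the sequence with its bounds. The sketch assumes start_pos and end_pos are 1-based and inclusive; processed_sequence is a hypothetical helper, and the SimpleNamespace object is a stand-in for a Processed item.

```python
from types import SimpleNamespace


def processed_sequence(sequence: str, start_pos: int, end_pos: int) -> str:
    """Return residues start_pos..end_pos, assuming 1-based inclusive bounds."""
    if not 1 <= start_pos <= end_pos <= len(sequence):
        raise ValueError("processed range falls outside the sequence")
    return sequence[start_pos - 1:end_pos]


# Stand-in for one item of entry.processed.
proc = SimpleNamespace(
    start_pos=1, end_pos=5, accession="PRO_0000012345", name="Signal peptide"
)
fragment = processed_sequence("MKTIIALSYIFCLVFA", proc.start_pos, proc.end_pos)
print(proc.name, fragment)
```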

Non-standard keys:

value = entry.extra.get("MyCustomKey")

Writing

Build a header and entries, then write:

from pefftacular import write_peff
from pefftacular.models import FileHeader, DatabaseHeader, SequenceEntry

db_header = DatabaseHeader(
    prefix="sp",
    db_name="SwissProt",
    db_version="2024_01",
    number_of_entries=1,
)

file_header = FileHeader(
    version="1.0",
    databases=[db_header],
)

entry = SequenceEntry(
    prefix="sp",
    db_unique_id="P12345",
    sequence="MKTIIALSYIFCLVFA",
    pname="Example protein",
    gname="EXMP",
)

write_peff(file_header, [entry], "output.peff")

The destination argument, dest, can be a file path string or a pathlib.Path. To write to an existing stream, pass an open binary file object instead.
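
Accepting either a path or an open binary stream is a common pattern. The sketch below shows one way such dispatch can be written in general; it is not taken from pefftacular's source, and open_dest and write_bytes are hypothetical names.

```python
import io
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def open_dest(dest):
    """Yield a writable binary stream for a path or an already-open stream."""
    if isinstance(dest, (str, os.PathLike)):
        with open(dest, "wb") as handle:
            yield handle  # we opened it, so we close it
    else:
        yield dest  # caller owns the stream; leave it open


def write_bytes(data: bytes, dest) -> None:
    with open_dest(dest) as out:
        out.write(data)


payload = b">sp:P12345 \\PName=Example protein\nMKTIIALSYIFCLVFA\n"

# Writing to an in-memory binary stream.
buffer = io.BytesIO()
write_bytes(payload, buffer)

# Writing to a path on disk.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "output.peff"
    write_bytes(payload, target)
    on_disk = target.read_bytes()
```

Leaving a caller-supplied stream open lets the caller keep writing to it or inspect it afterwards, as with the BytesIO buffer here.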

Error handling

Parse errors raise PeffParseError:

from pefftacular import read_peff
from pefftacular.exceptions import PeffParseError

try:
    header, entries = read_peff("malformed.peff")
except PeffParseError as e:
    print(e.line)     # the offending line number
    print(e.context)  # surrounding context string
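
When processing many files, it can be useful to turn the line and context attributes into a compact report. The sketch below uses StubParseError, a hypothetical stand-in that only mimics those two attributes so the example runs without the library; format_parse_error is likewise an illustrative helper.

```python
class StubParseError(Exception):
    """Hypothetical stand-in exposing .line and .context like the error above."""

    def __init__(self, message: str, line: int, context: str) -> None:
        super().__init__(message)
        self.line = line
        self.context = context


def format_parse_error(path: str, err: Exception) -> str:
    """Render 'path:line: message (near context)' for logs or CLI output."""
    return f"{path}:{err.line}: {err} (near {err.context!r})"


try:
    raise StubParseError("unexpected token", line=7, context=">sp:P12345 PName")
except StubParseError as e:
    report = format_parse_error("malformed.peff", e)

print(report)
```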

Write errors raise PeffWriteError:

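from pefftacular import write_peff
from pefftacular.exceptions import PeffWriteError

# file_header and entries are built as in the Writing section above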

try:
    write_peff(file_header, entries, "/read-only/output.peff")
except PeffWriteError as e:
    print(e)

Development

just install      # install dependencies
just test         # run tests
just test-v       # run tests (verbose)
just test-file tests/test_reader.py   # run a single test file
just cov          # run tests with coverage
just lint         # ruff lint
just format       # ruff format
just check        # lint + type check + test
just build        # build the package
just clean        # remove cache files
just docs         # serve docs locally
just docs-deploy  # deploy docs to GitHub Pages

License

MIT
