Fetch UniProt protein data and generate PEFF (PSI Extended FASTA Format) files.
peff_uniprot_fetcher
Generate PEFF (PSI Extended FASTA Format) files from the UniProt REST API. Fetches protein sequences and GFF annotations (variants, PTMs, processed forms) and writes them as annotated PEFF using pefftacular.
Installation
# From source
git clone https://github.com/pgarrett-scripps/peff_uniprot_fetcher
cd peff_uniprot_fetcher
just install
CLI Usage
Fetch PEFF by organism
# Human Swiss-Prot proteome (reviewed only)
fetch-peff human.peff --organism-id 9606
# E. coli K-12
fetch-peff ecoli.peff --organism-id 83333
# Include unreviewed (TrEMBL) entries
fetch-peff human_full.peff --organism-id 9606 --unreviewed
# Custom UniProt query
fetch-peff kinases.peff --query "organism_id:9606 AND keyword:KW-0418"
# Specific accessions
fetch-peff selected.peff --accessions P12345 Q99999 O75807
# Sequences only, no annotations
fetch-peff seqs.peff --organism-id 9606 --no-variants --no-modifications --no-processed
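The examples above suggest that `--organism-id` and `--unreviewed` are shorthand for a UniProt search query. The sketch below shows one way such a translation could look. `build_query` is a hypothetical helper and not part of this package; the field names `organism_id` and `reviewed` are taken from the query strings shown elsewhere in this README.

```python
def build_query(organism_id: int, include_unreviewed: bool = False) -> str:
    """Build a UniProt search query for one organism (hypothetical helper)."""
    clauses = [f"organism_id:{organism_id}"]
    if not include_unreviewed:
        # Assumed default, mirroring the CLI examples: Swiss-Prot (reviewed) only.
        clauses.append("reviewed:true")
    return " AND ".join(clauses)


print(build_query(9606))
print(build_query(9606, include_unreviewed=True))
```

A string built this way could be passed to `--query` directly, which makes it easy to check what a shorthand flag would select before running a large fetch.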
Convert a local FASTA to PEFF
Sequences come from the local file; GFF annotations are fetched from UniProt per accession.
fasta-to-peff input.fasta output.peff
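Because annotations are looked up per accession, each FASTA header has to yield an accession. The following is a minimal sketch of that step, assuming UniProt-style headers (`>sp|ACCESSION|NAME ...`) with a fallback to the first token. `iter_accessions` is a hypothetical name, and the package's own parsing may differ.

```python
import re
from typing import Iterable, Iterator

# UniProt-style headers look like ">sp|P04637|P53_HUMAN Cellular tumor antigen p53".
_UNIPROT_HEADER = re.compile(r"^>(?:sp|tr)\|([^|]+)\|")


def iter_accessions(lines: Iterable[str]) -> Iterator[str]:
    """Yield one accession per FASTA header line (hypothetical helper)."""
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        match = _UNIPROT_HEADER.match(line)
        if match:
            yield match.group(1)
            continue
        # Assumed fallback: treat the first whitespace-delimited token as the accession.
        tokens = line[1:].split()
        if tokens:
            yield tokens[0]


example = [
    ">sp|P04637|P53_HUMAN Cellular tumor antigen p53",
    "MEEPQSDPSV",
    ">Q99999 custom header",
    "MKT",
]
print(list(iter_accessions(example)))
```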
Download raw UniProt files
Download FASTA and/or GFF files for local inspection.
# Single accession
download-uniprot --accession P04637
# Full organism (both formats)
download-uniprot --organism-id 9606 --output-dir data/human
# GFF only
download-uniprot --organism-id 9606 --formats gff --output-dir data/human
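For reference, the raw files can also be requested from the UniProt REST API directly. The sketch below only builds the URLs; it assumes the public `rest.uniprot.org/uniprotkb` endpoints (per-entry files and the `stream` endpoint for query results). The helper names are hypothetical, and this is not necessarily how `download-uniprot` forms its requests.

```python
from urllib.parse import urlencode

BASE_URL = "https://rest.uniprot.org/uniprotkb"
SUPPORTED_FORMATS = {"fasta", "gff"}


def entry_url(accession: str, fmt: str) -> str:
    """URL of a single entry in the given format (hypothetical helper)."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{BASE_URL}/{accession}.{fmt}"


def stream_url(query: str, fmt: str) -> str:
    """URL that streams every entry matching a query (hypothetical helper)."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{BASE_URL}/stream?" + urlencode({"query": query, "format": fmt})


print(entry_url("P04637", "fasta"))
print(stream_url("organism_id:9606", "gff"))
```

The resulting URLs can be opened with any HTTP client, such as `urllib.request.urlopen`, to inspect the same FASTA and GFF content the tool consumes.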
Annotation flags
All fetch-peff and fasta-to-peff commands accept:
| Flag | Effect |
|---|---|
| --no-variants | Exclude sequence variants (VariantSimple, VariantComplex) |
| --no-modifications | Exclude PTMs (ModResPsi, ModRes) |
| --no-processed | Exclude processed forms (Signal peptide, Chain, etc.) |
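To make the effect of these flags concrete, the sketch below models each flag as switching off one group of PEFF keys. The grouping is copied from the table above; `enabled_keys` and its parameter names are illustrative assumptions rather than the package's API.

```python
# PEFF keys grouped by the flag that disables them (taken from the table above).
ANNOTATION_GROUPS = {
    "variants": {"VariantSimple", "VariantComplex"},
    "modifications": {"ModResPsi", "ModRes"},
    "processed": {"Processed"},
}


def enabled_keys(
    no_variants: bool = False,
    no_modifications: bool = False,
    no_processed: bool = False,
) -> set:
    """Return the PEFF keys that remain after applying the flags (hypothetical helper)."""
    disabled = {
        "variants": no_variants,
        "modifications": no_modifications,
        "processed": no_processed,
    }
    keys = set()
    for group, members in ANNOTATION_GROUPS.items():
        if not disabled[group]:
            keys |= members
    return keys


print(sorted(enabled_keys(no_variants=True)))
```

With all three flags set, the function returns an empty set, which corresponds to the "sequences only" invocation shown in the CLI examples.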
Python API
from peff_uniprot_fetcher import fetch_peff, fetch_peff_to_file, fasta_to_peff, fasta_to_peff_file
from pefftacular import write_peff
# Fetch and write in one call
fetch_peff_to_file("human.peff", query="organism_id:9606 AND reviewed:true")
# Or get the data back
header, entries = fetch_peff(accessions=["P12345", "Q99999"])
write_peff(header, entries, "output.peff")
# From a local FASTA file
fasta_to_peff_file("input.fasta", "output.peff")
header, entries = fasta_to_peff("input.fasta")
PEFF annotations
The following UniProt GFF feature types are mapped to PEFF annotations:
| UniProt feature | PEFF key |
|---|---|
| Natural variant, Mutagenesis, Sequence conflict | VariantSimple / VariantComplex |
| Alternative sequence (isoform) | VariantComplex |
| Modified residue | ModResPsi (with PSI-MOD accession) or ModRes |
| Glycosylation, Lipidation, Cross-link | ModRes |
| Signal peptide | Processed (PEFF:0001001) |
| Transit peptide | Processed (PEFF:0001002) |
| Propeptide | Processed (PEFF:0001003) |
| Chain (mature protein) | Processed (PEFF:0001004) |
| Peptide | Processed (PEFF:0001005) |
PTM names are mapped to PSI-MOD accessions using the UniProt PTM list.
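As an illustration of the mapping, the sketch below parses one tab-separated GFF line and classifies its feature type into the families listed in the table. The feature names and PEFF accessions are copied from the table above. `Feature`, `parse_gff_line`, and `classify` are hypothetical names, the example line is made up, and the rule for choosing between VariantSimple and VariantComplex is deliberately left out because this README does not specify it.

```python
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    accession: str
    feature_type: str
    start: int
    end: int


VARIANT_TYPES = {"Natural variant", "Mutagenesis", "Sequence conflict", "Alternative sequence"}
MODIFICATION_TYPES = {"Modified residue", "Glycosylation", "Lipidation", "Cross-link"}
# Processed-form accessions as listed in the table above.
PROCESSED_CV = {
    "Signal peptide": "PEFF:0001001",
    "Transit peptide": "PEFF:0001002",
    "Propeptide": "PEFF:0001003",
    "Chain": "PEFF:0001004",
    "Peptide": "PEFF:0001005",
}


def parse_gff_line(line: str) -> Feature:
    """Read seqid, type, start, and end from a GFF line (columns 1, 3, 4, 5)."""
    cols = line.rstrip("\n").split("\t")
    return Feature(cols[0], cols[2], int(cols[3]), int(cols[4]))


def classify(feature_type: str) -> Optional[str]:
    """Return the annotation family for a feature type, or None if unmapped."""
    if feature_type in VARIANT_TYPES:
        return "variant"
    if feature_type in MODIFICATION_TYPES:
        return "modification"
    if feature_type in PROCESSED_CV:
        return "processed"
    return None


# Made-up example line, for illustration only.
line = "\t".join(["P12345", "UniProtKB", "Signal peptide", "1", "22", ".", ".", ".", "Note=example"])
feature = parse_gff_line(line)
print(feature, classify(feature.feature_type), PROCESSED_CV.get(feature.feature_type))
```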
Just recipes
just download-ecoli # download raw E. coli K-12 FASTA + GFF to data/ecoli/
just fetch-ecoli # generate PEFF for E. coli K-12
just fasta-to-peff-ecoli # convert downloaded E. coli FASTA to PEFF
Development
just lint # ruff check
just format # ruff format
just check # lint + type check + test
just test # pytest