Fetch UniProt protein data and generate PEFF (PSI Extended FASTA Format) files.

peff_uniprot_fetcher

Generate PEFF (PSI Extended FASTA Format) files from the UniProt REST API. Fetches protein sequences and GFF annotations (variants, PTMs, processed forms) and writes them as annotated PEFF using pefftacular.

Installation

# From source
git clone https://github.com/pgarrett-scripps/peff_uniprot_fetcher
cd peff_uniprot_fetcher
just install

CLI Usage

Fetch PEFF by organism

# Human Swiss-Prot proteome (reviewed only)
fetch-peff human.peff --organism-id 9606

# E. coli K-12
fetch-peff ecoli.peff --organism-id 83333

# Include unreviewed (TrEMBL) entries
fetch-peff human_full.peff --organism-id 9606 --unreviewed

# Custom UniProt query
fetch-peff kinases.peff --query "organism_id:9606 AND keyword:KW-0418"

# Specific accessions
fetch-peff selected.peff --accessions P12345 Q99999 O75807

# Sequences only, no annotations
fetch-peff seqs.peff --organism-id 9606 --no-variants --no-modifications --no-processed
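The organism options above presumably boil down to a UniProt query string like the one passed to --query. A minimal sketch of that translation, assuming the reviewed-only default adds a reviewed:true clause; build_query is a hypothetical helper for illustration, not part of this package:

```python
def build_query(organism_id: int, include_unreviewed: bool = False) -> str:
    """Compose a UniProtKB search query for one organism (hypothetical helper)."""
    clauses = [f"organism_id:{organism_id}"]
    if not include_unreviewed:
        # Assumption: the reviewed-only default restricts results to Swiss-Prot
        clauses.append("reviewed:true")
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(build_query(9606))                            # reviewed human entries
    print(build_query(9606, include_unreviewed=True))   # Swiss-Prot + TrEMBL
```

A string built this way can be passed to --query directly, which is handy when further clauses (for example a keyword filter) need to be appended.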

Convert a local FASTA to PEFF

Sequences come from the local file; GFF annotations are fetched from UniProt per accession.

fasta-to-peff input.fasta output.peff
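Fetching annotations per accession requires an accession for each FASTA record. How fasta-to-peff extracts it is not documented here; the sketch below assumes standard UniProt headers of the form >sp|ACCESSION|ENTRY_NAME and uses hypothetical helper names:

```python
import re
from typing import Iterable, List, Optional

# Standard UniProt FASTA header: ">sp|P04637|P53_HUMAN Cellular tumor antigen p53 OS=..."
# "sp" marks Swiss-Prot (reviewed), "tr" marks TrEMBL (unreviewed).
_HEADER = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)")


def accession_from_header(line: str) -> Optional[str]:
    """Return the accession from a UniProt-style header, or None if it does not match."""
    match = _HEADER.match(line)
    return match.group(2) if match else None


def accessions_in_fasta(lines: Iterable[str]) -> List[str]:
    """Collect unique accessions in file order, skipping non-UniProt headers."""
    found: List[str] = []
    for line in lines:
        if line.startswith(">"):
            acc = accession_from_header(line)
            if acc is not None and acc not in found:
                found.append(acc)
    return found
```

Records whose headers do not follow this pattern yield no accession in this sketch, so no annotations could be looked up for them.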

Download raw UniProt files

Download FASTA and/or GFF files for local inspection.

# Single accession
download-uniprot --accession P04637

# Full organism (both formats)
download-uniprot --organism-id 9606 --output-dir data/human

# GFF only
download-uniprot --organism-id 9606 --formats gff --output-dir data/human
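For orientation, the UniProt REST API serves single entries at rest.uniprot.org/uniprotkb/<accession>.<format> and query results through a stream endpoint. The sketch below only builds such URLs; whether download-uniprot uses exactly these endpoints is an assumption, and the function names are illustrative:

```python
from urllib.parse import urlencode

BASE = "https://rest.uniprot.org/uniprotkb"


def entry_url(accession: str, fmt: str) -> str:
    """URL for one entry in the given format, e.g. 'fasta' or 'gff'."""
    return f"{BASE}/{accession}.{fmt}"


def stream_url(query: str, fmt: str) -> str:
    """URL that streams every entry matching a query in the given format."""
    return f"{BASE}/stream?{urlencode({'query': query, 'format': fmt})}"


if __name__ == "__main__":
    print(entry_url("P04637", "fasta"))
    print(stream_url("organism_id:9606 AND reviewed:true", "gff"))
```

Opening either URL in a browser or with curl is a quick way to compare the raw files with what the tool writes to --output-dir.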

Annotation flags

Both fetch-peff and fasta-to-peff accept the following flags:

Flag                 Effect
--no-variants        Exclude sequence variants (VariantSimple, VariantComplex)
--no-modifications   Exclude PTMs (ModResPsi, ModRes)
--no-processed       Exclude processed forms (Signal peptide, Chain, etc.)
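To make the effect of these flags concrete, the sketch below lists which PEFF keys would remain for a given combination. It mirrors the table above only; enabled_peff_keys is a hypothetical function, not the package's implementation:

```python
from typing import List


def enabled_peff_keys(
    no_variants: bool = False,
    no_modifications: bool = False,
    no_processed: bool = False,
) -> List[str]:
    """Return the PEFF annotation keys left after applying the exclusion flags."""
    keys: List[str] = []
    if not no_variants:
        keys += ["VariantSimple", "VariantComplex"]
    if not no_modifications:
        keys += ["ModResPsi", "ModRes"]
    if not no_processed:
        keys.append("Processed")
    return keys
```

With all three flags set, no annotation keys remain, which corresponds to the sequences-only invocation shown under CLI Usage.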

Python API

from peff_uniprot_fetcher import fetch_peff, fetch_peff_to_file, fasta_to_peff, fasta_to_peff_file
from pefftacular import write_peff

# Fetch and write in one call
fetch_peff_to_file("human.peff", query="organism_id:9606 AND reviewed:true")

# Or get the data back
header, entries = fetch_peff(accessions=["P12345", "Q99999"])
write_peff(header, entries, "output.peff")

# From a local FASTA file
fasta_to_peff_file("input.fasta", "output.peff")
header, entries = fasta_to_peff("input.fasta")

PEFF annotations

The following UniProt GFF feature types are mapped to PEFF annotations:

UniProt feature                                    PEFF key
Natural variant, Mutagenesis, Sequence conflict    VariantSimple / VariantComplex
Alternative sequence (isoform)                     VariantComplex
Modified residue                                   ModResPsi (with PSI-MOD accession) or ModRes
Glycosylation, Lipidation, Cross-link              ModRes
Signal peptide                                     Processed (PEFF:0001001)
Transit peptide                                    Processed (PEFF:0001002)
Propeptide                                         Processed (PEFF:0001003)
Chain (mature protein)                             Processed (PEFF:0001004)
Peptide                                            Processed (PEFF:0001005)

PTM names are mapped to PSI-MOD accessions using the UniProt PTM list.
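The mapping above can be read as a lookup on the feature-type column of each GFF line. A minimal sketch under that assumption; the category names and the classify function are illustrative, the example line in the comment is made up rather than copied from UniProt, and the choice between VariantSimple and VariantComplex or between ModResPsi and ModRes is left out:

```python
from typing import Optional, Tuple

# Processed-form features and their PEFF terms, taken from the table above
PROCESSED_TERMS = {
    "Signal peptide": "PEFF:0001001",
    "Transit peptide": "PEFF:0001002",
    "Propeptide": "PEFF:0001003",
    "Chain": "PEFF:0001004",
    "Peptide": "PEFF:0001005",
}
VARIANT_TYPES = {"Natural variant", "Mutagenesis", "Sequence conflict", "Alternative sequence"}
MODIFICATION_TYPES = {"Modified residue", "Glycosylation", "Lipidation", "Cross-link"}


def classify(gff_line: str) -> Optional[Tuple[str, int, int, Optional[str]]]:
    """Map one GFF line to (category, start, end, PEFF term), or None if unmapped.

    GFF columns are tab-separated: seqid, source, type, start, end, ...
    Illustrative input: "P12345\tUniProtKB\tSignal peptide\t1\t29\t.\t.\t.\tNote=example"
    """
    cols = gff_line.rstrip("\n").split("\t")
    if gff_line.startswith("#") or len(cols) < 5:
        return None  # comment, directive, or malformed line
    feature, start, end = cols[2], int(cols[3]), int(cols[4])
    if feature in PROCESSED_TERMS:
        return ("Processed", start, end, PROCESSED_TERMS[feature])
    if feature in VARIANT_TYPES:
        return ("Variant", start, end, None)
    if feature in MODIFICATION_TYPES:
        return ("Modification", start, end, None)
    return None  # feature types outside the table are ignored
```

Feature types that are not listed in the table, such as secondary-structure annotations, fall through and return None in this sketch.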

Just recipes

just download-ecoli        # download raw E. coli K-12 FASTA + GFF to data/ecoli/
just fetch-ecoli           # generate PEFF for E. coli K-12
just fasta-to-peff-ecoli   # convert downloaded E. coli FASTA to PEFF

Development

just lint      # ruff check
just format    # ruff format
just check     # lint + type check + test
just test      # pytest
