A Python library for working with protein-containing FASTA files.

ProFASTA

Introduction

ProFASTA is a Python library for working with FASTA files containing protein records. It prioritizes simplicity while providing a practical set of features for mass spectrometry-based proteomics workflows.

Core functionality includes:

Parsing and writing FASTA files via profasta.io
Structured header parsing via a registry of built-in and user-defined parsers
A protein database (ProteinDatabase) for managing entries loaded from one or more FASTA files
Decoy database generation by sequence reversal
Header validation for non-ASCII characters

ProFASTA is developed as part of the computational toolbox for the Mass Spectrometry Facility at the Max Perutz Labs (University of Vienna).

Similar projects

If ProFASTA doesn't meet your requirements, consider exploring these alternative Python packages with a focus on protein-containing FASTA files:

fastapy is a lightweight package with no dependencies that offers FASTA reading functionality.
protfasta is another dependency-free library that provides reading along with basic validation (e.g., detecting duplicate headers, converting non-canonical amino acids). It can also write FASTA files with a configurable sequence line length.
pyteomics is a feature-rich package with tools for many kinds of proteomics data. Its FASTA support covers reading, automatic header parsing (for the formats defined at uniprot.org), writing, and decoy generation. Note that pyteomics is a large package with many dependencies.

Requirements

Python >= 3.11

ProFASTA has no dependencies beyond the Python standard library.

Installation

Install from PyPI:

pip install profasta

Key concepts

FASTA parsing

The profasta.io.parse_fasta function reads a FASTA file and yields FastaRecord objects. Sequences are automatically normalized: letters are converted to uppercase, spaces are removed, and trailing * characters are stripped.

import profasta.io

with open("proteins.fasta", "r") as f:
    for record in profasta.io.parse_fasta(f):
        print(record.header, record.sequence)
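The normalization rules above can be illustrated with a small, self-contained reader. This is a sketch, not profasta's implementation: `Record`, `normalize_sequence` and `read_fasta` are hypothetical names, and the sketch assumes that the stored header excludes the leading `>` and that any text before the first header is ignored.

```python
import io
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Record:
    """Hypothetical stand-in for profasta.io's FastaRecord."""
    header: str
    sequence: str


def normalize_sequence(raw: str) -> str:
    """Apply the rules described above: uppercase, drop spaces, strip trailing '*'."""
    return raw.upper().replace(" ", "").rstrip("*")


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one Record per '>' line; the following sequence lines are concatenated."""
    header: str | None = None
    chunks: list[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield Record(header, normalize_sequence("".join(chunks)))
            header = line[1:]  # assumption: the header text excludes the '>'
            chunks = []
        elif header is not None:
            chunks.append(line)  # lines before the first header are ignored
    if header is not None:
        yield Record(header, normalize_sequence("".join(chunks)))


# Usage with an in-memory file; an open() file handle works the same way.
example = io.StringIO(">sp|P1|TEST Example\nmkv ll\nAAG*\n>sp|P2|OTHER\nPEP\n")
for record in read_fasta(example):
    print(record.header, record.sequence)
```

Joining the sequence lines before normalizing means only a `*` at the very end of the record is stripped, matching the "trailing" wording above.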

Header parsers and the registry

ProFASTA uses a registry system for header parsers and writers. Built-in parsers are registered under the following names:

Name	Description
`"default"`	Splits on the first whitespace; never fails
`"uniprot"`	Strict UniProt format parser
`"uniprot_like"`	Tolerant UniProt-like format parser

Built-in writers follow the same naming convention and include an additional "decoy" writer that prepends a rev_ tag to the header.

Custom parsers and writers can be registered via:

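import profasta.parser
# MyParser and MyWriter are user-defined classes (see the requirements below)
profasta.parser.register_parser("my_parser", MyParser)
profasta.parser.register_writer("my_writer", MyWriter)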

A parser must implement a parse(header: str) -> ParsedHeader classmethod, and a writer must implement a write(parsed_header: ParsedHeader) -> str classmethod.
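The parse/write contract and the registry idea can be sketched without the library. The example mimics the documented "default" behaviour (split on the first whitespace, never fail). `ParsedHeaderSketch`, `FirstTokenParser`, `FirstTokenWriter` and `Registry` are hypothetical; profasta's real ParsedHeader may have different fields, and in practice you would pass your classes to profasta.parser.register_parser and register_writer rather than to a local registry.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedHeaderSketch:
    """Hypothetical stand-in for profasta's ParsedHeader; real fields may differ."""
    identifier: str
    header: str
    header_fields: dict[str, str] = field(default_factory=dict)


class FirstTokenParser:
    """Mimics the documented "default" rule: split on the first whitespace."""

    @classmethod
    def parse(cls, header: str) -> ParsedHeaderSketch:
        parts = header.split(maxsplit=1)  # never raises, even for ""
        identifier = parts[0] if parts else ""
        description = parts[1] if len(parts) > 1 else ""
        return ParsedHeaderSketch(identifier, header, {"description": description})


class FirstTokenWriter:
    """Writes the original header back unchanged."""

    @classmethod
    def write(cls, parsed_header: ParsedHeaderSketch) -> str:
        return parsed_header.header


class Registry:
    """Minimal name -> class lookup, similar in spirit to a parser/writer registry."""

    def __init__(self) -> None:
        self._classes: dict[str, type] = {}

    def register(self, name: str, cls: type) -> None:
        self._classes[name] = cls

    def get(self, name: str) -> type:
        if name not in self._classes:
            raise KeyError(f"nothing registered under {name!r}")
        return self._classes[name]


parsers, writers = Registry(), Registry()
parsers.register("my_parser", FirstTokenParser)
writers.register("my_writer", FirstTokenWriter)

parsed = parsers.get("my_parser").parse("PROT1 Example protein")
print(parsed.identifier, writers.get("my_writer").write(parsed))
```

Because parse and write are classmethods, a registry can store the classes themselves; no instance is needed to parse or write a header.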

ProteinDatabase

The ProteinDatabase class provides a dict-like interface for managing protein entries loaded from FASTA files:

import profasta

db = profasta.ProteinDatabase()
db.add_fasta("proteins.fasta", header_parser="uniprot")

entry = db["O75385"]
print(entry.header_fields["gene_name"])  # ULK1

Multiple FASTA files can be added to the same database. Entries with unparseable headers can be skipped using skip_invalid=True.

A ProteinDatabase can also be created directly from one or more FASTA files using the from_fasta convenience constructor:

fasta_paths = ["proteome1.fasta", "proteome2.fasta"]
db = profasta.ProteinDatabase.from_fasta(*fasta_paths, header_parser="uniprot")

Entries can be filtered by a condition using the filter method, which returns a new ProteinDatabase:

human_db = db.filter(lambda e: e.header_fields.get("organism_identifier") == "9606")

Header validation

The profasta.validation module provides a function for checking FASTA records for non-ASCII characters in their headers, which can cause issues in downstream processing:

import profasta.io
import profasta.validation

with open("proteins.fasta", "r") as f:
    records = list(profasta.io.parse_fasta(f))

issues = profasta.validation.find_header_ascii_issues(records)
for issue in issues:
    print(issue.header, issue.non_ascii_characters)
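The check itself is simple to express in plain Python. This sketch works on header strings rather than FastaRecord objects, and `AsciiIssueSketch` and `find_non_ascii_headers` are hypothetical names that only mirror the `header` and `non_ascii_characters` attributes used above; reporting each offending character once is a choice made here, not documented library behaviour.

```python
from dataclasses import dataclass


@dataclass
class AsciiIssueSketch:
    """Hypothetical result type mirroring the attributes used above."""
    header: str
    non_ascii_characters: list[str]


def find_non_ascii_headers(headers: list[str]) -> list[AsciiIssueSketch]:
    """Report each header containing characters outside ASCII (U+0000 to U+007F)."""
    issues = []
    for header in headers:
        # dict.fromkeys de-duplicates while keeping first-seen order
        offending = list(dict.fromkeys(ch for ch in header if not ch.isascii()))
        if offending:
            issues.append(AsciiIssueSketch(header, offending))
    return issues


headers = ["sp|P1|A_HUMAN Kinase", "sp|P2|B_HUMAN Prot\u00e9ine \u03b2-like \u00e9"]
for issue in find_non_ascii_headers(headers):
    print(issue.header, issue.non_ascii_characters)
```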

Usage examples

Load a UniProt FASTA file and access a protein entry

import profasta

db = profasta.ProteinDatabase()
db.add_fasta("./examples/uniprot_hsapiens_10entries.fasta", header_parser="uniprot")

entry = db["O75385"]
print(entry.header_fields["gene_name"])  # ULK1

Combine multiple FASTA files and add decoy entries

A common proteomics workflow is to combine one or more FASTA files and append reversed decoy sequences. Use profasta.decoy.write_decoy_fasta to write decoy entries directly to a FASTA file:

import profasta
import profasta.decoy

# Load one or more forward databases
db = profasta.ProteinDatabase()
db.add_fasta("proteome.fasta", header_parser="uniprot")
db.add_fasta("additional.fasta", header_parser="uniprot")

# Write the forward entries, then append decoy entries with reversed sequences
output_path = "combined_with_decoys.fasta"
db.write_fasta(output_path, header_writer="default")
profasta.decoy.write_decoy_fasta(db, output_path, append=True)

Decoy headers are automatically prefixed with rev_. A custom prefix can be set via the decoy_tag argument:

profasta.decoy.write_decoy_fasta(db, output_path, append=True, decoy_tag="decoy_")
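Decoy generation by reversal can be sketched with plain tuples: the header receives the decoy tag, the whole sequence is reversed, and decoy records are written after the forward ones. `make_decoy` and `format_fasta` are hypothetical helpers, not part of profasta; whether the library tags the full header or only the identifier, and which line length it writes, are assumptions here.

```python
def make_decoy(header: str, sequence: str, decoy_tag: str = "rev_") -> tuple[str, str]:
    """Prefix the header with the decoy tag and reverse the full sequence."""
    return decoy_tag + header, sequence[::-1]


def format_fasta(records: list[tuple[str, str]], line_length: int = 60) -> str:
    """Format (header, sequence) pairs as FASTA text with wrapped sequence lines."""
    lines: list[str] = []
    for header, sequence in records:
        lines.append(f">{header}")
        lines.extend(sequence[i:i + line_length] for i in range(0, len(sequence), line_length))
    return "\n".join(lines) + "\n"


forward = [("sp|P1|A_HUMAN", "MKVLAG"), ("sp|P2|B_HUMAN", "PEPTIDEK")]
decoys = [make_decoy(h, s) for h, s in forward]
print(format_fasta(forward + decoys), end="")
```

Full reversal keeps the amino acid composition and length of every protein identical between the target and decoy halves of the database.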

Contributors

Juraj Ahel - @xeniorn

Release history

0.1.1 (Mar 21, 2026)
0.1.0 (Mar 20, 2026)
0.0.5 (Apr 19, 2024)
0.0.4 (Feb 16, 2024)
0.0.3 (Feb 16, 2024)
0.0.2 (Feb 14, 2024)
0.0.1 (Feb 1, 2024)

Download files

Source distribution: profasta-0.1.0.tar.gz (39.4 kB, uploaded Mar 20, 2026)

Hashes for profasta-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`aa8c2e49a441c90b450bf55860c4cc0dc345e8441e5292e201570a91f4cc7d90`
MD5	`b88f2d28d469e438041081d37d5adc94`
BLAKE2b-256	`e0a164030ef34629e3bbaed3d5a378b31cbcdf629d3408d8661f3af443e1466b`

Built distribution: profasta-0.1.0-py3-none-any.whl (Python 3, 17.1 kB, uploaded Mar 20, 2026)

Hashes for profasta-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d09c3ed6e5e395b3ac581b45f7a7a359e457cdf191d7c1d8c9ec7fc0af39db70`
MD5	`ff04194be32807c9b7facc99c5c160fc`
BLAKE2b-256	`6cc5cc9bfee96ef4814988bcf03e30364e9e44ba3192d024502cb55e25e2ed52`

Both files were uploaded via uv 0.9.18, without Trusted Publishing.
