A Python library for working with protein-containing FASTA files.

ProFASTA

Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

Introduction

ProFASTA is a Python library for working with FASTA files containing protein records. Unlike other packages, ProFASTA prioritizes simplicity while aiming to provide the features needed in mass spectrometry-based proteomics.

The library is still in early development and the interface might change over time. At the current stage, ProFASTA can parse and write FASTA files and provides access to protein records imported from them.

ProFASTA is developed as part of the computational toolbox for the Mass Spectrometry Facility at the Max Perutz Labs (University of Vienna).

Similar projects

If ProFASTA doesn't meet your requirements, consider exploring these alternative Python packages with a focus on protein-containing FASTA files:

  • fastapy is a lightweight package with no dependencies that offers FASTA reading functionality.
  • protfasta is another library with no dependencies that provides reading functionality along with basic validation (e.g., duplicate headers, conversion of non-canonical amino acids). The library also allows writing FASTA files with the ability to specify the sequence line length.
  • pyteomics is a feature-rich package that provides tools to handle various sorts of proteomics data. It provides functions for FASTA reading, automatic parsing of headers (in various formats defined at uniprot.org), writing, and generation of decoy entries. Note that pyteomics is a large package with many dependencies.

Usage example

The following code snippet shows how to import a FASTA file containing UniProt protein entries, retrieve a protein record by its UniProt accession number, and print its gene name:

>>> import profasta
>>> 
>>> fasta_path = "./examples/uniprot_hsapiens_10entries.fasta"
>>> db = profasta.db.ProteinDatabase()
>>> db.add_fasta(fasta_path, header_parser="uniprot")
>>> protein_record = db["O75385"]
>>> print(protein_record.header_fields["gene_name"])
ULK1

For more examples of how to use the ProFASTA library, please refer to the code snippets Jupyter notebook.

Requirements

Python >= 3.9

Installation

The following command installs the latest version of ProFASTA and its dependencies from PyPI, the Python Package Index:

pip install profasta

To uninstall the ProFASTA library, use:

pip uninstall profasta
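
Whether ProFASTA is present in the current environment, and in which version, can be checked from Python with the standard library. The snippet below is a minimal sketch; the helper name `installed_version` is made up for illustration and is not part of ProFASTA.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string after installation, or None after uninstalling.
print(installed_version("profasta"))
```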

Planned features

Main requirements

  • parse FASTA file
  • parse FASTA header
    • built-in parser that never fails
    • built-in parser for uniprot format
    • allow user defined parser
  • write FASTA file
    • allow custom FASTA header generation
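
To illustrate the parsing requirements listed above, here is a minimal, self-contained sketch in plain Python. It is not ProFASTA's implementation or API: the function names, the returned field names, and the example record (accession `P00000`, gene `TEST1`) are assumptions made up for this illustration. It shows a FASTA reader, a fallback header parser that accepts any header, and a stricter parser for UniProt-style headers.

```python
import re
from typing import Iterable, Iterator

# UniProt-style header: db|accession|entry_name, followed by a description
# that may contain a gene name as "GN=...".
UNIPROT_PATTERN = re.compile(
    r"^(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+)"
)
GENE_PATTERN = re.compile(r"\bGN=(\S+)")


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_default(header: str) -> dict[str, str]:
    """Fallback parser: use the first whitespace-delimited token as identifier."""
    parts = header.split(maxsplit=1)
    return {"identifier": parts[0] if parts else ""}


def parse_uniprot(header: str) -> dict[str, str]:
    """Parse a UniProt-style header; raise ValueError if it does not match."""
    match = UNIPROT_PATTERN.match(header)
    if match is None:
        raise ValueError(f"not a UniProt-style header: {header!r}")
    fields = {"identifier": match["accession"], **match.groupdict()}
    gene = GENE_PATTERN.search(header)
    if gene:
        fields["gene_name"] = gene.group(1)
    return fields


# Made-up record, used only to exercise the functions above.
fasta_text = """>sp|P00000|TEST_HUMAN Example protein OS=Homo sapiens GN=TEST1
MKTAY
IAKQR
"""
records = list(read_fasta(fasta_text.splitlines()))
header, sequence = records[0]
fields = parse_uniprot(header)
```

A user-defined parser in this sketch is simply any function that takes a header string and returns a dictionary of fields, so the two parsers are interchangeable from the caller's point of view.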

Additional features

  • read multiple FASTA files and write a combined file
  • add protein records to an existing FASTA file
  • generate decoy protein records by reversing the sequence
    • add decoy protein records to an existing FASTA file
  • validate FASTA file / FASTA records
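
The decoy and combined-file features above can be sketched in a few lines of plain Python. This is an illustration only, not ProFASTA's API: the function names, the `rev_` decoy tag, the default line width of 60, and the toy records are all assumptions chosen for the example.

```python
import io
from typing import Iterable, TextIO


def make_decoy(header: str, sequence: str, tag: str = "rev_") -> tuple[str, str]:
    """Return a decoy record: the tagged header and the reversed sequence."""
    return tag + header, sequence[::-1]


def write_fasta(
    records: Iterable[tuple[str, str]], handle: TextIO, line_width: int = 60
) -> None:
    """Write (header, sequence) records, wrapping sequences at line_width."""
    for header, sequence in records:
        handle.write(f">{header}\n")
        for start in range(0, len(sequence), line_width):
            handle.write(sequence[start:start + line_width] + "\n")


# Toy target records; in practice these could come from several FASTA files.
targets = [("protein_A", "MKTAYIAKQR"), ("protein_B", "MSEQ")]
decoys = [make_decoy(header, sequence) for header, sequence in targets]

# Write targets followed by their decoys into one combined in-memory "file".
buffer = io.StringIO()
write_fasta(targets + decoys, buffer, line_width=5)
combined = buffer.getvalue()
```

Writing to any text handle keeps the sketch independent of the file system; passing a handle opened in append mode would add the decoy records to an existing FASTA file.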
