A very simple FASTA file parser.
FastaFrames
FastaFrames is a Python package for converting between FASTA files and pandas DataFrames.
Usage
To install fastaframes, use pip:
pip install fastaframes
Reading a FASTA file
from fastaframes import to_df
fasta_df = to_df(data='example.fasta')
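To make the reading step concrete, here is a minimal sketch of how FASTA text can be split into records before any header parsing. It is independent of fastaframes and does not claim to mirror how `to_df` works; the function name `read_fasta` is hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Hypothetical helper for illustration; not part of fastaframes.
    """
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped; collect them for joining.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    text = ">first\nMGLE\nALVP\n>second\nLAMI\n"
    for header, sequence in read_fasta(text.splitlines()):
        print(header, sequence)
```

Each record becomes one row; wrapped sequence lines are joined into a single string.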
Writing a FASTA file
from fastaframes import to_fasta
to_fasta(data=fasta_df, output_file='output.fasta')
Columns:
- db: Database from which the sequence was retrieved. db is 'sp' for UniProtKB/Swiss-Prot and 'tr' for UniProtKB/TrEMBL.
- unique_identifier: The primary accession number of the UniProtKB entry.
- entry_name: The entry name of the UniProtKB entry.
- protein_name: The recommended name of the UniProtKB entry as annotated in the RecName field. For UniProtKB/TrEMBL entries without a RecName field, the SubName field is used; if there are multiple SubNames, the first one is used. The 'precursor' attribute is excluded; 'Fragment' is included with the name if applicable.
- organism_name: The scientific name of the organism of the UniProtKB entry.
- organism_identifier: The unique identifier of the source organism, assigned by the NCBI.
- gene_name: The first gene name of the UniProtKB entry. If there is no gene name, OrderedLocusName or ORFname, the GN field is not listed.
- protein_existence: The numerical value describing the evidence for the existence of the protein.
- sequence_version: The version number of the sequence.
- protein_sequence: The protein amino acid sequence.
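The columns above correspond to the fields of a UniProtKB-style header line. As a rough illustration, the following sketch extracts them with a regular expression. It is an assumption about how such a header could be parsed, not the parser fastaframes uses; `HEADER_RE` and `parse_header` are hypothetical names.

```python
import re

# Hypothetical pattern for UniProtKB-style headers (not taken from fastaframes).
# GN is optional because the GN field is not listed when there is no gene name.
HEADER_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<unique_identifier>[^|]+)\|(?P<entry_name>\S+)\s+"
    r"(?P<protein_name>.*?)\s+OS=(?P<organism_name>.*?)"
    r"\s+OX=(?P<organism_identifier>\d+)"
    r"(?:\s+GN=(?P<gene_name>\S+))?"
    r"\s+PE=(?P<protein_existence>\d+)\s+SV=(?P<sequence_version>\d+)\s*$"
)

NUMERIC_FIELDS = ("organism_identifier", "protein_existence", "sequence_version")


def parse_header(line):
    """Return a dict keyed by the column names listed above."""
    match = HEADER_RE.match(line)
    if match is None:
        raise ValueError(f"not a UniProtKB-style header: {line!r}")
    record = match.groupdict()  # gene_name is None when GN is absent
    for key in NUMERIC_FIELDS:
        record[key] = int(record[key])
    return record


if __name__ == "__main__":
    header = (
        ">sp|A0A087X1C5|CP2D7_HUMAN Putative cytochrome P450 2D7 "
        "OS=Homo sapiens OX=9606 GN=CYP2D7 PE=5 SV=1"
    )
    print(parse_header(header))
```

Headers that do not follow the UniProtKB layout raise a `ValueError` in this sketch; a real parser would need a policy for such lines.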
Example FASTA file:
>sp|A0A087X1C5|CP2D7_HUMAN Putative cytochrome P450 2D7 OS=Homo sapiens OX=9606 GN=CYP2D7 PE=5 SV=1
MGLEALVPLAMIVAIFLLLVDLMHRHQRWAARYPPGPLPLPGLGNLLHVDFQNTPYCFDQ
This produces the following DataFrame:
| | db | unique_identifier | entry_name | protein_name | organism_name | organism_identifier | gene_name | protein_existence | sequence_version | protein_sequence |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | sp | A0A087X1C5 | CP2D7_HUMAN | Putative cytochrome P450 2D7 | Homo sapiens | 9606.0 | CYP2D7 | 5.0 | 1.0 | MGLEALVPLAMIVAIFLLLVDLMHRHQRWAARYPPGPLPLPGLGNLLHVDFQNTPYCFDQ |
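The numeric columns in the table appear as floats (9606.0, 5.0, 1.0), so they need to become integers again when a row is written back out as a header. Below is a minimal sketch of that inverse step, using a plain dictionary rather than fastaframes. `format_record` and its `width` parameter are hypothetical, and the 60-character line wrap is an assumption, not documented behavior of `to_fasta`.

```python
def format_record(record, width=60):
    """Render one row (a dict keyed by column name) as FASTA text.

    Hypothetical helper for illustration; not part of fastaframes.
    """
    parts = [
        f">{record['db']}|{record['unique_identifier']}|{record['entry_name']}",
        record["protein_name"],
        f"OS={record['organism_name']}",
        f"OX={int(record['organism_identifier'])}",  # 9606.0 -> 9606
    ]
    if record.get("gene_name"):
        # GN is only listed when a gene name is present.
        parts.append(f"GN={record['gene_name']}")
    parts.append(f"PE={int(record['protein_existence'])}")
    parts.append(f"SV={int(record['sequence_version'])}")
    header = " ".join(parts)

    # Wrap the sequence into fixed-width lines (assumed width of 60).
    sequence = record["protein_sequence"]
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header, *lines])


if __name__ == "__main__":
    row = {
        "db": "sp",
        "unique_identifier": "A0A087X1C5",
        "entry_name": "CP2D7_HUMAN",
        "protein_name": "Putative cytochrome P450 2D7",
        "organism_name": "Homo sapiens",
        "organism_identifier": 9606.0,
        "gene_name": "CYP2D7",
        "protein_existence": 5.0,
        "sequence_version": 1.0,
        "protein_sequence": "MGLEALVPLAMIVAIFLLLVDLMHRHQRWAARYPPGPLPLPGLGNLLHVDFQNTPYCFDQ",
    }
    print(format_record(row))
```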