A very simple FASTA file parser.
FastaFrames
This Python module provides a set of functions to work with FASTA files. It allows you to read FASTA files, convert them to pandas dataframes, manipulate data, and write data back to FASTA files. It also supports converting FASTA files to a list of FastaEntry dataclass objects.
Features
- Read FASTA files into pandas DataFrames
- Write FASTA files from pandas DataFrames
Usage
To install fastaframes, use pip:
pip install fastaframes
Reading FASTA files
To read a FASTA file and convert it to a pandas DataFrame:
from fastaframes import to_df
# IO input
with open('example.fasta', 'r') as fasta_io:
    fasta_df = to_df(fasta_data=fasta_io)
# or
# File input
fasta_df = to_df(fasta_data='example.fasta')
print(fasta_df.head())
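The two call styles above (an open file handle or a file path) follow a common input pattern. The sketch below uses only the standard library to show one way such dual input could be handled. `read_fasta_records` is a hypothetical helper written for illustration; it is not part of fastaframes, and the library's internals may differ.

```python
import io
from typing import Iterator, TextIO, Tuple, Union


def read_fasta_records(fasta_data: Union[str, TextIO]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a path or an open text handle.

    Hypothetical helper for illustration; not the fastaframes implementation.
    """
    if isinstance(fasta_data, str):
        # Treat a string as a file path and reuse the handle-based branch.
        with open(fasta_data, "r") as handle:
            yield from read_fasta_records(handle)
        return

    header = None
    chunks = []
    for line in fasta_data:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)


example = io.StringIO(">seq1 demo\nMRWQ\nEMGY\n>seq2\nIFYP\n")
records = list(read_fasta_records(example))
print(records)
```

Sequence lines that belong to one record are joined into a single string, so a sequence wrapped over several lines comes back whole.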
Writing FASTA files
To write a pandas DataFrame to a FASTA file:
from fastaframes import to_fasta
# Write StringIO to file
fasta_io = to_fasta(fasta_data=fasta_df) # outputs StringIO if file=None
with open('output.fasta', 'w') as output_file:
    output_file.write(fasta_io.getvalue())
# or
# Write directly to file
to_fasta(fasta_data=fasta_df, file='output.fasta')
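The "return a StringIO when no file is given" behaviour described above can be sketched with the standard library alone. `write_fasta`, its `width` parameter, and the 60-character default wrap are assumptions made for this illustration; they do not describe how `to_fasta` is implemented or what it returns when a file is given.

```python
import io
from typing import Iterable, Optional, Tuple


def write_fasta(records: Iterable[Tuple[str, str]],
                file: Optional[str] = None,
                width: int = 60) -> Optional[io.StringIO]:
    """Hypothetical writer: return a StringIO when `file` is None, else write to disk."""
    buffer = io.StringIO()
    for header, sequence in records:
        buffer.write(f">{header}\n")
        # Wrap the sequence at `width` characters per line (an assumed convention).
        for start in range(0, len(sequence), width):
            buffer.write(sequence[start:start + width] + "\n")
    if file is None:
        buffer.seek(0)  # rewind so callers can read from the start
        return buffer
    with open(file, "w") as handle:
        handle.write(buffer.getvalue())
    return None


fasta_text = write_fasta([("seq1", "MRWQEMGYIF")], width=4).getvalue()
print(fasta_text)
```

Building the text in memory first keeps both output paths on the same formatting code; only the final step differs.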
Example DataFrame:
| | db | unique_identifier | entry_name | protein_name | organism_name | organism_identifier | gene_name | protein_existence | sequence_version | protein_sequence |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | sp | A0A087X1C5 | CP2D7_HUMAN | Putative cytochrome P450 2D7 | Homo sapiens | 9606.0 | CYP2D7 | 5.0 | 1.0 | MGLEALVPLAMIVAIFLLLVDLMHRHQRWAARYPPGPLPLPGLGNLLHVDFQNTPYCFDQ |
| 1 | sp | A0A0B4J2F2 | SIK1B_HUMAN | Putative serine/threonine-protein kinase SIK1B | Homo sapiens | 9606.0 | SIK1B | 5.0 | 1.0 | MVIMSEFSADPAGQGQGQQKPLRVGFYDIERTLGKGNFAVVKLARHRVTKTQVAIKIIDKLVQ |
| 2 | sp | A0A0C5B5G6 | MOTSC_HUMAN | Mitochondrial-derived peptide MOTS-c | Homo sapiens | 9606.0 | MT-RNR1 | 1.0 | 1.0 | MRWQEMGYIFYPRKLR |
| 3 | sp | A0A0K2S4Q6 | CD3CH_HUMAN | Protein CD300H | Homo sapiens | 9606.0 | CD300H | 1.0 | 1.0 | MTQRAGAAMLPSALLLLCVPGCLTVSGPSTVMGAVGESLSVQCRYEEKYKTFNKYWCRQP |
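The column names in this table correspond to the fields of a UniProt-style FASTA header (`db|UniqueIdentifier|EntryName ProteinName OS=... OX=... GN=... PE=... SV=...`). As a rough illustration of that mapping, the sketch below parses one such header with a regular expression. `ParsedHeader`, `HEADER_RE`, and `parse_header` are hypothetical names, not fastaframes APIs, and the sample header is reconstructed from row 2 of the table rather than copied from a real file.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed UniProt-style layout; GN, PE and SV are treated as optional.
HEADER_RE = re.compile(
    r"^>?(?P<db>[^|]+)\|(?P<unique_identifier>[^|]+)\|(?P<entry_name>\S+)\s+"
    r"(?P<protein_name>.+?)\s+OS=(?P<organism_name>.+?)\s+OX=(?P<organism_identifier>\d+)"
    r"(?:\s+GN=(?P<gene_name>\S+))?"
    r"(?:\s+PE=(?P<protein_existence>\d+))?"
    r"(?:\s+SV=(?P<sequence_version>\d+))?\s*$"
)


@dataclass
class ParsedHeader:
    """Hypothetical container mirroring the table's header-derived columns."""
    db: str
    unique_identifier: str
    entry_name: str
    protein_name: str
    organism_name: str
    organism_identifier: int
    gene_name: Optional[str]
    protein_existence: Optional[int]
    sequence_version: Optional[int]


def parse_header(header: str) -> ParsedHeader:
    """Split a UniProt-style header line into named fields."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"Not a UniProt-style header: {header!r}")
    fields = match.groupdict()
    # Convert the numeric fields, leaving missing optional ones as None.
    for key in ("organism_identifier", "protein_existence", "sequence_version"):
        if fields[key] is not None:
            fields[key] = int(fields[key])
    return ParsedHeader(**fields)


parsed = parse_header(
    ">sp|A0A0C5B5G6|MOTSC_HUMAN Mitochondrial-derived peptide MOTS-c "
    "OS=Homo sapiens OX=9606 GN=MT-RNR1 PE=1 SV=1"
)
print(parsed)
```

Headers that do not follow this layout raise a `ValueError` in the sketch; a real parser would likely need a more forgiving fallback.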