Sanitise protein FASTA files / data
tidyfasta
A Python program to tidy and sanitise FASTA sequence files.
It can be imported as a package or used directly from the command line.
Problems and fixes
| Problem | Fix |
|---|---|
| Sequence without ID | ID name added |
| ID without sequence | Exception raised |
| Multiline sequence | One line per sequence |
| Non-canonical amino acid (AA) | Exception raised |
| Dangerous characters in ID | Exception raised |
| Lowercase AA | Converted to uppercase AA |
| Excessive whitespace | Excessive whitespace removed |
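To make the rules in the table concrete, here is a minimal standalone sketch of the same checks in plain Python. It is not tidyfasta's implementation: the names `tidy_fasta_text` and `to_fasta`, the use of `ValueError`, the generated ID format, and the set of characters treated as safe in an ID are all assumptions made for illustration. The canonical set is taken to be the 20 standard amino acid letters.

```python
import re

# The 20 standard amino acid one-letter codes.
CANONICAL_AA = set("ACDEFGHIKLMNPQRSTVWY")

# Assumption: an ID is "safe" if it only uses these characters.
SAFE_ID = re.compile(r"[A-Za-z0-9_.:|\- ]+")


def tidy_fasta_text(text, default_id="sequence"):
    """Return a list of (id, sequence) tuples from raw protein FASTA text."""
    records = []  # each entry is [id, [sequence chunks]]
    for raw_line in text.splitlines():
        # Trim the line and collapse runs of whitespace.
        line = " ".join(raw_line.split())
        if not line:
            continue
        if line.startswith(">"):
            seq_id = line[1:].strip()
            if not seq_id:
                # Header with no ID: add a generated name.
                seq_id = f"{default_id}_{len(records) + 1}"
            if not SAFE_ID.fullmatch(seq_id):
                raise ValueError(f"Dangerous characters in ID: {seq_id!r}")
            records.append([seq_id, []])
        else:
            if not records:
                # Sequence before any header: add a generated name.
                records.append([f"{default_id}_1", []])
            # Drop inner spaces and convert to uppercase.
            records[-1][1].append(line.replace(" ", "").upper())

    tidy = []
    for seq_id, chunks in records:
        sequence = "".join(chunks)  # multiline sequence becomes one line
        if not sequence:
            raise ValueError(f"ID without sequence: {seq_id!r}")
        bad = sorted(set(sequence) - CANONICAL_AA)
        if bad:
            raise ValueError(f"Non-canonical AA in {seq_id!r}: {bad}")
        tidy.append((seq_id, sequence))
    return tidy


def to_fasta(records):
    """Format (id, sequence) tuples with one line per sequence."""
    return "".join(f">{seq_id}\n{sequence}\n" for seq_id, sequence in records)
```

For example, `to_fasta(tidy_fasta_text("  mkv  lA\nggt\n>prot 1\nacde\nFGH\n"))` yields two records, each with its sequence on a single uppercase line, and the first record receives a generated ID.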
Install
pip install tidyfasta
Usage
Command line interface
$ tidyfasta --input file.txt
$ tidyfasta --input file.txt --strict --single
Script
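from tidyfasta.common.process import ProcessFasta

input_file = "sample.txt"

# Process the input file with the strict and single options set explicitly
processor = ProcessFasta(input_file, strict=True, single=False)

# Retrieve the processed FASTA data
fasta_array = processor.get_fasta()
print(fasta_array)

# Each validated record exposes its ID and sequence
for record in processor.validated_array:
    print(record.id + "\n")
    print(record.sequence + "\n")

# Write the processed records out as FASTA
processor.write_fasta()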
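To try the script, an input file with a few of the problems listed earlier is useful. The sketch below writes one using only the standard library. The helper name `write_sample` and the record contents are made up for illustration. The default file name `sample.txt` matches the script.

```python
from pathlib import Path

# A small, deliberately messy protein FASTA: lowercase residues,
# a sequence split over two lines, a blank line, and stray whitespace.
MESSY_FASTA = (
    ">protein_1\n"
    "mkvla\n"
    "GGT\n"
    "\n"
    ">protein_2   \n"
    "  ACDEFGH  \n"
)


def write_sample(path="sample.txt"):
    """Write the messy example to `path` and return it as a Path."""
    target = Path(path)
    target.write_text(MESSY_FASTA, encoding="utf-8")
    return target
```

Calling `write_sample()` creates `sample.txt` in the current directory, which can then be passed to the script or to the command line interface via `--input`.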