Sanitise protein FASTA files / data
tidyfasta
A Python program to tidy and sanitise FASTA sequence files.
It can be imported as a package or used directly from the command line.
Problems and fixes
Problem | Fix |
---|---|
Sequence without ID | ID name added |
ID without sequence | Exception raised |
Multiline sequence | One line per sequence |
Non-canonical AA | Exception raised |
Dangerous characters in ID | Exception raised |
Lowercase AA | Converted to uppercase AA |
Excessive whitespace | Excessive whitespace removed |
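To make the rules in the table concrete, here is a minimal, self-contained sketch of how such tidying could work. It is not tidyfasta's actual implementation: the function name `tidy_fasta`, the default ID scheme, the 20-letter canonical alphabet, and the definition of "dangerous" ID characters (anything outside a small safe set) are all assumptions made for illustration.

```python
import re

# Assumption: "canonical" means the 20 standard amino acid letters.
CANONICAL_AA = set("ACDEFGHIKLMNPQRSTVWY")
# Assumption: an ID is "safe" if it only uses these characters.
SAFE_ID = re.compile(r"^[A-Za-z0-9_.|:\- ]+$")


def tidy_fasta(text, default_id="sequence"):
    """Hypothetical helper: return tidied (id, sequence) pairs."""
    records = []  # each entry is [id, [sequence chunks]]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines (excessive whitespace)
        if line.startswith(">"):
            # Collapse runs of whitespace inside the ID.
            seq_id = " ".join(line[1:].split())
            if not seq_id:
                # Header with no name: add an ID.
                seq_id = f"{default_id}_{len(records) + 1}"
            if not SAFE_ID.match(seq_id):
                raise ValueError(f"Dangerous characters in ID: {seq_id!r}")
            records.append([seq_id, []])
        else:
            if not records:
                # Sequence without ID: add an ID.
                records.append([f"{default_id}_{len(records) + 1}", []])
            # Remove inner whitespace and convert to uppercase.
            records[-1][1].append("".join(line.split()).upper())

    tidied = []
    for seq_id, chunks in records:
        sequence = "".join(chunks)  # multiline sequence -> one line
        if not sequence:
            raise ValueError(f"ID without sequence: {seq_id!r}")
        unknown = set(sequence) - CANONICAL_AA
        if unknown:
            raise ValueError(
                f"Non-canonical AA in {seq_id!r}: {sorted(unknown)}"
            )
        tidied.append((seq_id, sequence))
    return tidied
```

Raising `ValueError` is a choice made for this sketch only; the exception types that tidyfasta itself raises are not documented here.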
Usage
Command line interface
$ tidyfasta --input file.txt
$ tidyfasta --input file.txt --strict --single
Script
from tidyfasta.common.process import ProcessFasta
input_file = "sample.txt"
processor = ProcessFasta(input_file, strict=True, single=False)
fasta_array = processor.get_fasta()
print(fasta_array)
# Print the ID and sequence of each validated record
for record in processor.validated_array:
    print(record.id + "\n")
    print(record.sequence + "\n")
# Write the tidied FASTA output
processor.write_fasta()
Download files
Source Distribution
tidyfasta-1.0.2.tar.gz (3.7 kB)
Built Distribution
Hashes for tidyfasta-1.0.2-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 86edf1ab11ca2277731cdabbbbf56a292a92aa72a4ac4746c1f35691b0c2f74c |
MD5 | 72cec3bf00336cee0cc177042c1d26ec |
BLAKE2b-256 | 1280d6a5360d607f91aafd066bf0df30ad92a0b3e93c4b47515364afd7a6776e |
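A downloaded file can be checked against the digests listed above. The following is a small sketch using only Python's standard `hashlib` module; the helper names `sha256_of` and `matches_published` are hypothetical and not part of tidyfasta.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Calling `matches_published` with the local path of the downloaded wheel and the SHA256 value from the table should return `True` for an unmodified file.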