Sanitise protein FASTA files / data
Project description
tidyfasta
A Python program to tidy and sanitise FASTA sequence files.
It can be imported as a package or used directly from the command line.
Problems and fixes
Problem | Fix |
---|---|
Sequence without ID | ID name added |
ID without sequence | Exception raised |
Multiline sequence | One line per sequence |
Non-canonical AA | Exception raised |
Dangerous characters in ID | Exception raised |
Lowercase AA | Converted to uppercase AA |
Excessive whitespace | Excessive whitespace removed |
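The rules in the table can be illustrated with a small standalone sketch. This is not tidyfasta's own code: the function name `tidy_fasta_text`, the generated ID format `Sequence_N`, and the set of characters treated as safe in an ID are all assumptions made for illustration, and the package may decide each case differently.

```python
import re

# The 20 canonical amino acid one-letter codes.
CANONICAL_AA = set("ACDEFGHIKLMNPQRSTVWY")

# Assumption: an ID is "safe" if it only contains letters, digits,
# spaces and a few common separators. tidyfasta may use another rule.
SAFE_ID = re.compile(r"[A-Za-z0-9_.|:\- ]+")


def tidy_fasta_text(text):
    """Return a list of (id, sequence) tuples from raw FASTA text."""
    records = []  # each entry is [id or None, list of sequence chunks]
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Collapse runs of whitespace inside the ID line.
            records.append([" ".join(line[1:].split()), []])
        else:
            if not records:
                # Sequence data before any header: ID is filled in later.
                records.append([None, []])
            # Drop inner whitespace and convert to uppercase.
            records[-1][1].append("".join(line.split()).upper())

    tidy = []
    for index, (seq_id, chunks) in enumerate(records, start=1):
        sequence = "".join(chunks)  # multiline sequence becomes one line
        if not seq_id:
            seq_id = f"Sequence_{index}"  # hypothetical generated ID
        if not SAFE_ID.fullmatch(seq_id):
            raise ValueError(f"Dangerous characters in ID: {seq_id!r}")
        if not sequence:
            raise ValueError(f"ID without sequence: {seq_id!r}")
        invalid = set(sequence) - CANONICAL_AA
        if invalid:
            raise ValueError(f"Non-canonical AA in {seq_id!r}: {sorted(invalid)}")
        tidy.append((seq_id, sequence))
    return tidy
```

For example, `tidy_fasta_text("mkt ay\nIAKQR\n>seq2   human\nacd\nEFG\n")` returns `[("Sequence_1", "MKTAYIAKQR"), ("seq2 human", "ACDEFG")]`, while an input such as `">only_id\n"` raises `ValueError`.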
Install
pip install tidyfasta
Usage
Command line interface
$ tidyfasta --input file.txt
$ tidyfasta --input file.txt --strict --single
Script
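from tidyfasta.common.process import ProcessFasta

input_file = "sample.txt"
np = ProcessFasta(input_file, strict=True, single=False)

# Retrieve the processed FASTA data and print it
fasta_array = np.get_fasta()
print(fasta_array)

# Print the ID and sequence of each validated record
for i in np.validated_array:
    print(i.id + "\n")
    print(i.sequence + "\n")

# Write the result out as FASTA
np.write_fasta()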
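To try the script, a small untidy input file is needed. The sketch below writes one to `sample.txt`. The records are made-up illustration data, not taken from the tidyfasta project, and how the package treats each line depends on the options passed to `ProcessFasta`.

```python
from pathlib import Path

# Made-up example records with mixed case, extra spaces,
# a blank line, and a sequence split over two lines.
messy_fasta = (
    ">seq1   example protein\n"
    "mktay iakqr\n"
    "QISFVK\n"
    "\n"
    ">seq2\n"
    "acdefghik\n"
)

# Write the file that the script reads as its input.
sample_path = Path("sample.txt")
sample_path.write_text(messy_fasta)
```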
Download files
Source Distribution
tidyfasta-1.0.4.tar.gz (3.8 kB)
Built Distribution
Hashes for tidyfasta-1.0.4-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 90328ba188deb60127133c16019b8238ce099e173d34bb12e18c50c659f56f3f |
MD5 | a5d287698a72272c726635cdc57270f9 |
BLAKE2b-256 | d6b5bb0fb5fb11f074c331717a133bc3bf9d9fe71b87e6f5bd384edc216373d9 |