Sanitise protein FASTA files / data

tidyfasta

A Python program to tidy and sanitise FASTA sequence files.

It can be imported as a package or used directly from the command line.

Problems and fixes

Problem                      Fix
Sequence without ID          ID name added
ID without sequence          Exception raised
Multiline sequence           Joined into one line per sequence
Non-canonical amino acid     Exception raised
Dangerous characters in ID   Exception raised
Lowercase amino acids        Converted to uppercase
Excessive whitespace         Removed
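
The rules listed above can be illustrated with a small standalone sketch. This is not tidyfasta's implementation: the function name `tidy_fasta`, the generated ID pattern `sequence_N`, the use of `ValueError`, and the set of characters treated as safe in an ID are all assumptions made for illustration only.

```python
import re

# The 20 canonical amino acid one-letter codes.
CANONICAL_AA = set("ACDEFGHIKLMNPQRSTVWY")

# Assumed definition of a "safe" ID; the package's real rule is not documented here.
SAFE_ID = re.compile(r"[A-Za-z0-9_.|\- ]+")


def tidy_fasta(text):
    """Return a list of (id, sequence) tuples from raw FASTA text."""
    records = []
    current_id = None
    chunks = []
    unnamed = 0

    def flush():
        # Validate and store the record collected so far, then reset the state.
        nonlocal current_id, chunks
        if current_id is None and not chunks:
            return
        sequence = "".join(chunks).upper()  # lowercase -> uppercase
        if not sequence:
            raise ValueError(f"ID without sequence: {current_id!r}")
        bad = set(sequence) - CANONICAL_AA
        if bad:
            raise ValueError(f"Non-canonical residues: {sorted(bad)}")
        records.append((current_id, sequence))
        current_id, chunks = None, []

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # drop blank lines
        if line.startswith(">"):
            flush()
            # Collapse runs of whitespace inside the header.
            current_id = " ".join(line[1:].split())
            if not SAFE_ID.fullmatch(current_id):
                raise ValueError(f"Dangerous characters in ID: {current_id!r}")
        else:
            if current_id is None:
                # Sequence without ID: generate a placeholder name.
                unnamed += 1
                current_id = f"sequence_{unnamed}"
            # Remove whitespace inside the sequence; lines are joined in flush().
            chunks.append("".join(line.split()))
    flush()
    return records


raw = "mkv\nLLA\n>seq2  extra\n  acd efg \n"
for name, seq in tidy_fasta(raw):
    print(f">{name}\n{seq}")
```

Raising an exception for unrecoverable problems (missing sequence, unknown residues, unsafe IDs) while silently repairing cosmetic ones (case, whitespace, line wrapping) mirrors the split between the "Exception raised" rows and the remaining rows of the table.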

Usage

Command line interface

$ tidyfasta --input file.txt

$ tidyfasta --input file.txt --strict --single

Script

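# Import the FASTA processor from the tidyfasta package
from tidyfasta.common.process import ProcessFasta

input_file = "sample.txt"

processor = ProcessFasta(input_file, strict=True, single=False)

# Retrieve the processed records and print them
fasta_array = processor.get_fasta()
print(fasta_array)

# Each validated record exposes an id and a sequence
for record in processor.validated_array:
    print(record.id + "\n")
    print(record.sequence + "\n")

# Write the processed records out as FASTA
processor.write_fasta()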

Download files

Source distribution: tidyfasta-1.0.2.tar.gz (3.7 kB)
Built distribution: tidyfasta-1.0.2-py3-none-any.whl (4.6 kB, Python 3)