Parses fasta files using templates and creates formatted fasta.

Introduction

The tfasta Python package simplifies working with fasta files, providing functionality for both reading and writing them. The “t” in “tfasta” stands for “templated”, which means that fasta parsing is performed according to pre-defined or user-defined templates:

>>> from tfasta import fasta_parser, T_NR
>>> fast = fasta_parser("cytb.fas", template=T_NR)
>>> f = fast.next()
>>> print f['gi']
5524211
>>> print f['accession']
AAD44166.1
>>> print f['sequence'][:60]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFS

This example parses records that follow the conventions of the NCBI non-redundant database (nr).
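To make the idea of a template more concrete, the following is a minimal sketch of how an nr-style header could be split into named fields. It is an illustration only and is not the T_NR template shipped with tfasta: the pattern NR_HEADER and the helper split_nr_header() are hypothetical names, and the layout "gi|<gi>|<db>|<accession>|" is assumed from the headers shown in this description.

```python
import re

# Hypothetical pattern for headers such as "gi|<gi>|<db>|<accession>|<text>".
# Illustration only; this is NOT the T_NR template from tfasta.
NR_HEADER = re.compile(
    r"^gi\|(?P<gi>\d+)\|(?P<db>[^|]+)\|(?P<accession>[^|]+)\|(?P<rest>.*)$"
)


def split_nr_header(header):
    """Return a dict of named fields, or None if the header does not match."""
    match = NR_HEADER.match(header.lstrip(">").strip())
    if match is None:
        return None
    return match.groupdict()


fields = split_nr_header(">gi|32033604|ref|ZP_00133915.1|")
print(fields["gi"])         # 32033604
print(fields["accession"])  # ZP_00133915.1
```

Headers that do not follow the assumed layout return None rather than raising, so a caller can fall back to treating the whole header as a plain name.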

More examples are given in the tfasta full documentation.

Installation

Install tfasta with pip (recommended) or easy_install:

sudo pip install tfasta

Alternatively, download the source files from http://pypi.python.org/pypi/tfasta/ and run the following commands in the source directory:

python setup.py build
sudo python setup.py install

Basic Usage

Reading Fasta Files

Fasta files are read with the fasta_parser() function. The following text shows the first two records from a file called “short-extended.fas”:

>gi|32033604|ref|ZP_00133915.1|
ATGQVIGTFTVRNDNGLHARPSAVLVQTLKPFAAKVTVENLDRGTAPANAKSTMKVVALG
ASQAHRLRFVAEGEDAQQAIEALAKAFVEGLGESVSFVPAVEDTIEGAAQPQAVESAKNF
ANPTASEPTVEGQVEGTFVIQNEHGLHARPSAVLVNEVKKYNATIVVQNLDRNTQLVSAK
SLMKIVALGVVKGHRLHFVATGDDAQKAIDGIGEAIAAGLGE
>gi|1573424|gb|AAC22107.1|
VEGAVVGTFTIRNEHGLHARPSANLVNEVKKFTSKITMQNLTRESEVVSAKSLMKIVALG
VTQGHRLRFVAEGEDAKQAIESLGKAIANGLGENVSAVPPSEPDTIEIMGDQIHTPAVTE
DDNLPANAIEAVFVIKNEQGLHARPSAILVNEVKKYNASVAVQNLDRNSQLVSAKSLMKI
VALGVVKGTRLRFVATGEEAQQAIDGIGAVIESGLGE

Like any other fasta file, short-extended.fas may be parsed with a single command:

fast = fasta_parser(file_name)

For example:

>>> from tfasta import fasta_parser
>>> fast = fasta_parser("short-extended.fas")
>>> f = fast.next()
>>> print f['name']
gi|32033604|ref|ZP_00133915.1|
>>> print f['sequence'][:60]
ATGQVIGTFTVRNDNGLHARPSAVLVQTLKPFAAKVTVENLDRGTAPANAKSTMKVVALG
>>> f = fast.next()
>>> print f['name']
gi|1573424|gb|AAC22107.1|

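In this example, the fasta_parser() function returns an iterator (“fast”) that yields one dictionary per record. When no template is given, each dictionary has two keys: name and sequence. The name key holds all of the plain text that follows the fasta format marker “>”, which marks the start of a new record; the sequence key holds the sequence itself.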
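The behavior described above can be sketched in plain Python. This is not the tfasta implementation; iter_fasta() is a hypothetical stand-in that only mirrors the name and sequence keys, assuming that sequence lines are joined without separators. The sample data reuses the first sequence line of each record from short-extended.fas, so the sequences are truncated.

```python
def iter_fasta(lines):
    """Yield one {'name', 'sequence'} dict per record from an iterable of lines.

    Hypothetical helper for illustration; not part of tfasta.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield {"name": name, "sequence": "".join(chunks)}
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if name is not None:
        yield {"name": name, "sequence": "".join(chunks)}


# Truncated sample: only the first sequence line of each record.
sample_lines = [
    ">gi|32033604|ref|ZP_00133915.1|",
    "ATGQVIGTFTVRNDNGLHARPSAVLVQTLKPFAAKVTVENLDRGTAPANAKSTMKVVALG",
    ">gi|1573424|gb|AAC22107.1|",
    "VEGAVVGTFTIRNEHGLHARPSANLVNEVKKFTSKITMQNLTRESEVVSAKSLMKIVALG",
]

for record in iter_fasta(sample_lines):
    print(record["name"])
```

Because the helper is a generator, records are produced one at a time, which matches the iterator style of the examples above and avoids holding a whole file in memory.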

Other Functionality

See the tfasta full documentation for more sophisticated reading and writing of fasta.

Release history

0.3.4
0.3.2
0.3.1 (this version)
0.2.2

Download files

tfasta-0.3.1.tar.gz (122.5 kB), source distribution, uploaded Apr 11, 2014
