Parses fasta files using templates and creates formatted fasta.

Project description

Introduction

The tfasta Python package simplifies working with fasta files, providing functionality for both reading and writing them. The “t” in “tfasta” stands for “templated”: fasta parsing is performed according to predefined or user-defined templates:

>>> from tfasta import fasta_parser, T_NR
>>> fast = fasta_parser("cytb.fas", template=T_NR)
>>> f = fast.next()
>>> print f['gi']
5524211
>>> print f['accession']
AAD44166.1
>>> print f['sequence'][:60]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFS

This example parses records that follow the conventions of the NCBI non-redundant database (nr).
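To make the idea of a template concrete, the following is a minimal sketch of how named fields can be pulled out of an nr-style defline. It does not use tfasta and is not the definition of T_NR; the names NR_LIKE and parse_defline, and the assumed defline shape "gi|<number>|<db>|<accession>|<optional text>", are hypothetical and chosen only for illustration. The defline is taken from the sample file shown under Basic Usage.

```python
import re

# Hypothetical template for illustration only; this is NOT tfasta's T_NR.
# Assumes deflines shaped like "gi|<number>|<db>|<accession>|<optional text>".
NR_LIKE = re.compile(
    r"^gi\|(?P<gi>\d+)\|(?P<db>[a-z]+)\|(?P<accession>[^|]+)\|(?P<description>.*)$"
)


def parse_defline(defline):
    """Return the named fields of a defline, or None if it does not match."""
    match = NR_LIKE.match(defline.lstrip(">").strip())
    if match is None:
        return None
    return match.groupdict()


fields = parse_defline(">gi|32033604|ref|ZP_00133915.1|")
print(fields["gi"])         # 32033604
print(fields["accession"])  # ZP_00133915.1
```

A defline that does not follow the assumed shape makes parse_defline return None rather than raise, which keeps the sketch easy to use in a loop over mixed records.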

More examples are given in the tfasta full documentation.

Installation

Install tfasta with pip (recommended) or easy_install:

sudo pip install tfasta

Alternatively, download the source files from http://pypi.python.org/pypi/tfasta/ and run the following commands in the source directory:

python setup.py build
sudo python setup.py install

Basic Usage

Reading Fasta Files

Reading fasta files is performed with the fasta_parser() function. The following text shows the first two records from a file called “short-extended.fas”:

>gi|32033604|ref|ZP_00133915.1|
ATGQVIGTFTVRNDNGLHARPSAVLVQTLKPFAAKVTVENLDRGTAPANAKSTMKVVALG
ASQAHRLRFVAEGEDAQQAIEALAKAFVEGLGESVSFVPAVEDTIEGAAQPQAVESAKNF
ANPTASEPTVEGQVEGTFVIQNEHGLHARPSAVLVNEVKKYNATIVVQNLDRNTQLVSAK
SLMKIVALGVVKGHRLHFVATGDDAQKAIDGIGEAIAAGLGE
>gi|1573424|gb|AAC22107.1|
VEGAVVGTFTIRNEHGLHARPSANLVNEVKKFTSKITMQNLTRESEVVSAKSLMKIVALG
VTQGHRLRFVAEGEDAKQAIESLGKAIANGLGENVSAVPPSEPDTIEIMGDQIHTPAVTE
DDNLPANAIEAVFVIKNEQGLHARPSAILVNEVKKYNASVAVQNLDRNSQLVSAKSLMKI
VALGVVKGTRLRFVATGEEAQQAIDGIGAVIESGLGE

Like any other fasta file, short-extended.fas may be parsed with a single command:

fast = fasta_parser(file_name)

For example:

>>> from tfasta import fasta_parser
>>> fast = fasta_parser("short-extended.fas")
>>> f = fast.next()
>>> print f['name']
gi|32033604|ref|ZP_00133915.1|
>>> print f['sequence'][:60]
ATGQVIGTFTVRNDNGLHARPSAVLVQTLKPFAAKVTVENLDRGTAPANAKSTMKVVALG
>>> f = fast.next()
>>> print f['name']
gi|1573424|gb|AAC22107.1|

In this example, the fasta_parser() function returns an iterator (“fast”) of dictionaries, each with two keys: name and sequence. The name key holds all of the plain text that follows the fasta format marker “>” on the line that begins a new record.
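The record structure described above can be sketched without tfasta. The generator below is a simplified stand-in, not the library's implementation: read_fasta is a hypothetical name, and the sample text is abbreviated from the two records of short-extended.fas. It yields one dictionary per record with the same two keys, name and sequence.

```python
import io


def read_fasta(handle):
    """Yield one {'name', 'sequence'} dict per record in an open text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new defline closes the previous record, if there was one.
            if name is not None:
                yield {"name": name, "sequence": "".join(chunks)}
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if name is not None:
        yield {"name": name, "sequence": "".join(chunks)}


# Two records abbreviated from short-extended.fas.
sample = io.StringIO(
    u">gi|32033604|ref|ZP_00133915.1|\n"
    u"ATGQVIGTFTVRNDNGLHARPSAVLVQTLK\n"
    u"PFAAKVTVENLDRGTAPANAKSTMKVVALG\n"
    u">gi|1573424|gb|AAC22107.1|\n"
    u"VEGAVVGTFTIRNEHGLHARPSANLVNEVK\n"
)
records = list(read_fasta(sample))
print(records[0]["name"])           # gi|32033604|ref|ZP_00133915.1|
print(len(records[0]["sequence"]))  # 60
```

Sequence lines are joined without separators, so a sequence wrapped over several lines comes back as one string.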

Iteration

The iterator returned by the fasta_parser() function may be used in for loops:

>>> from tfasta import fasta_parser
>>> for f in fasta_parser("short-extended.fas"):
...   print f['name']
gi|32033604|ref|ZP_00133915.1|
gi|1573424|gb|AAC22107.1|
[...]
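One consequence of working with an iterator is that the records can be walked only once. The sketch below illustrates this under the assumption that the object returned by fasta_parser() is a plain single-pass iterator, as the next() calls above suggest; fake_parser is a hypothetical stand-in that yields dictionaries of the same shape, with shortened sequences.

```python
# Hypothetical stand-in for fasta_parser(...): an iterator of
# {'name', 'sequence'} dicts with shortened sequences.
def fake_parser():
    yield {"name": "gi|32033604|ref|ZP_00133915.1|", "sequence": "ATGQVIGTFT"}
    yield {"name": "gi|1573424|gb|AAC22107.1|", "sequence": "VEGAVVGTFT"}


records = fake_parser()

# Collect the records into a dict keyed by name while iterating once.
by_name = {}
for f in records:
    by_name[f["name"]] = f["sequence"]

print(len(by_name))   # 2
print(list(records))  # [] because the iterator is already exhausted
```

To revisit records later, keep them in a list or dictionary as above, or call the parser again on the same file.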

Other Usage

See the tfasta full documentation for more sophisticated reading and writing of fasta.

Download files

Source Distribution

tfasta-0.3.4.tar.gz (122.9 kB)

File details

Details for the file tfasta-0.3.4.tar.gz.

File metadata

  • Download URL: tfasta-0.3.4.tar.gz
  • Size: 122.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for tfasta-0.3.4.tar.gz

Algorithm    Hash digest
SHA256       30d9de9ccbd028cfbd8ebadc0487ecd40d8e535f4287bf5829cc0eeba90b79b1
MD5          5dfad15e4c9e0a8308cab6b0cf914529
BLAKE2b-256  ba5eb1d8abdac8bff59e0db7ad0c4f8cfd0edf57bd812e98aa26869adab00265
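
A downloaded archive can be checked against the SHA256 digest listed above with the standard library's hashlib module. This is a minimal sketch: the function names sha256_of and matches_published_hash are hypothetical, and the path passed in is assumed to point at a local copy of tfasta-0.3.4.tar.gz.

```python
import hashlib

# SHA256 digest published above for tfasta-0.3.4.tar.gz.
EXPECTED_SHA256 = "30d9de9ccbd028cfbd8ebadc0487ecd40d8e535f4287bf5829cc0eeba90b79b1"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path):
    """Return True if the file's SHA256 equals the published digest."""
    return sha256_of(path) == EXPECTED_SHA256
```

For example, matches_published_hash("tfasta-0.3.4.tar.gz") should return True for an unmodified download in the current directory.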
