Parses fasta files using templates and creates formatted fasta.
Project description
Introduction
The tfasta Python package simplifies working with fasta files, providing functionality for both reading and writing them. The “t” in “tfasta” stands for “templated”, meaning that fasta records are parsed according to predefined or user-defined templates. For example:
    >>> from tfasta import fasta_parser, T_NR
    >>> fast = fasta_parser("cytb.fas", template=T_NR)
    >>> f = next(fast)
    >>> print(f['gi'])
    5524211
    >>> print(f['accession'])
    AAD44166.1
    >>> print(f['sequence'][:60])
    LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFS
This example parses records that follow the conventions of the NCBI non-redundant database (nr).
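To make the idea of a template concrete, here is a minimal sketch that treats a template as a regular expression whose named groups become dictionary keys. It is not tfasta's implementation and not the actual definition of T_NR; the names NR_TEMPLATE and parse_header are hypothetical. The sample headers are the two records from short-extended.fas shown under Basic Usage.

```python
import re

# Hypothetical template: a regex whose named groups become dictionary keys.
# This is an illustration only, not tfasta's T_NR.
NR_TEMPLATE = re.compile(
    r"^gi\|(?P<gi>\d+)"          # gi number
    r"\|(?P<db>[^|]+)"           # database tag, e.g. "gb" or "ref"
    r"\|(?P<accession>[^|]+)\|"  # accession.version
    r"(?P<description>.*)$"      # any free text after the last bar
)


def parse_header(header, template=NR_TEMPLATE):
    """Split a fasta header (the text after '>') into named fields."""
    match = template.match(header)
    if match is None:
        raise ValueError("header does not fit the template: %r" % header)
    fields = match.groupdict()
    fields["gi"] = int(fields["gi"])  # this sketch chooses to store gi as an int
    fields["description"] = fields["description"].strip()
    return fields


for header in ["gi|32033604|ref|ZP_00133915.1|", "gi|1573424|gb|AAC22107.1|"]:
    fields = parse_header(header)
    # prints "32033604 ref ZP_00133915.1", then "1573424 gb AAC22107.1"
    print(fields["gi"], fields["db"], fields["accession"])
```

A regex keeps the template declarative: supporting a different header convention means supplying a different pattern rather than new parsing code.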
More examples are given in the full tfasta documentation.
Installation
Install tfasta with pip (recommended) or easy_install:
sudo pip install tfasta
Alternatively, download the source files from http://pypi.python.org/pypi/tfasta/ and run the following commands in the source directory:
    python setup.py build
    sudo python setup.py install
Home Page & Repository
Home Page: http://pypi.python.org/pypi/tfasta/
Documentation: http://pythonhosted.org/tfasta/
Repository: https://github.com/jcstroud/tfasta/
Basic Usage
Reading Fasta Files
Fasta files are read with the fasta_parser() function. The following text shows the first two records of a file called “short-extended.fas”:
    >gi|32033604|ref|ZP_00133915.1|
    ATGQVIGTFTVRNDNGLHARPSAVLVQTLKPFAAKVTVENLDRGTAPANAKSTMKVVALG
    ASQAHRLRFVAEGEDAQQAIEALAKAFVEGLGESVSFVPAVEDTIEGAAQPQAVESAKNF
    ANPTASEPTVEGQVEGTFVIQNEHGLHARPSAVLVNEVKKYNATIVVQNLDRNTQLVSAK
    SLMKIVALGVVKGHRLHFVATGDDAQKAIDGIGEAIAAGLGE
    >gi|1573424|gb|AAC22107.1|
    VEGAVVGTFTIRNEHGLHARPSANLVNEVKKFTSKITMQNLTRESEVVSAKSLMKIVALG
    VTQGHRLRFVAEGEDAKQAIESLGKAIANGLGENVSAVPPSEPDTIEIMGDQIHTPAVTE
    DDNLPANAIEAVFVIKNEQGLHARPSAILVNEVKKYNASVAVQNLDRNSQLVSAKSLMKI
    VALGVVKGTRLRFVATGEEAQQAIDGIGAVIESGLGE
Like any other fasta file, short-extended.fas may be parsed with a single command:
fast = fasta_parser(file_name)
For example:
    >>> from tfasta import fasta_parser
    >>> fast = fasta_parser("short-extended.fas")
    >>> f = next(fast)
    >>> print(f['name'])
    gi|32033604|ref|ZP_00133915.1|
    >>> print(f['sequence'][:60])
    ATGQVIGTFTVRNDNGLHARPSAVLVQTLKPFAAKVTVENLDRGTAPANAKSTMKVVALG
    >>> f = next(fast)
    >>> print(f['name'])
    gi|1573424|gb|AAC22107.1|
In this example, fasta_parser() returns an iterator (“fast”) that yields one dictionary per record. Each dictionary has two keys, name and sequence: name holds all of the text after the “>” marker that begins a new record, and sequence holds that record's sequence.
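The record-splitting behaviour described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not tfasta's code: read_fasta is a hypothetical name, it accepts any iterable of text lines, and it joins a record's sequence lines into one string (how tfasta handles multi-line sequences is not described here). The sample is a truncated excerpt of short-extended.fas.

```python
import io


def read_fasta(lines):
    """Yield one {'name': ..., 'sequence': ...} dict per fasta record.

    Hypothetical helper: 'name' is the text after '>', and the sequence
    lines that follow are stripped of surrounding whitespace and joined.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:  # a new header closes the previous record
                yield {"name": name, "sequence": "".join(chunks)}
            name, chunks = line[1:], []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:  # emit the final record at end of input
        yield {"name": name, "sequence": "".join(chunks)}


sample = io.StringIO(
    ">gi|32033604|ref|ZP_00133915.1|\n"
    "ATGQVIGTFTVRNDNGLHARPSAVLVQTLKPFAAKVTVENLDRGTAPANAKSTMKVVALG\n"
    "ASQAHRLRFVAEGEDAQQAIEALAKAFVEGLGESVSFVPAVEDTIEGAAQPQAVESAKNF\n"
    ">gi|1573424|gb|AAC22107.1|\n"
    "VEGAVVGTFTIRNEHGLHARPSANLVNEVKKFTSKITMQNLTRESEVVSAKSLMKIVALG\n"
)

records = read_fasta(sample)
first = next(records)
print(first["name"])           # gi|32033604|ref|ZP_00133915.1|
print(len(first["sequence"]))  # 120: two 60-character lines joined
print(next(records)["name"])   # gi|1573424|gb|AAC22107.1|
```

Because read_fasta is a generator, it reads only as far as it needs to produce each record, which keeps memory use low on large files.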
Iteration
The iterator returned by fasta_parser() can be used directly in a for loop:
    >>> from tfasta import fasta_parser
    >>> for f in fasta_parser("short-extended.fas"):
    ...     print(f['name'])
    gi|32033604|ref|ZP_00133915.1|
    gi|1573424|gb|AAC22107.1|
    [...]
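Because an iterator is consumed as it is read, a second loop over the same object yields nothing, and itertools.islice is a convenient way to look at only the first few records. The sketch below shows both points using a hypothetical stand-in generator, fake_parser, in place of fasta_parser(), so it runs without tfasta or a data file; the same rules apply to any Python iterator.

```python
from itertools import islice


def fake_parser():
    """Hypothetical stand-in for fasta_parser(): yields record dicts."""
    for name in ["gi|32033604|ref|ZP_00133915.1|", "gi|1573424|gb|AAC22107.1|"]:
        yield {"name": name, "sequence": ""}


records = fake_parser()

# Take only the first record; islice stops without reading further.
for f in islice(records, 1):
    print(f["name"])  # gi|32033604|ref|ZP_00133915.1|

# The iterator resumes where islice stopped...
remaining = [f["name"] for f in records]
print(remaining)  # ['gi|1573424|gb|AAC22107.1|']

# ...and is now exhausted, so another pass produces nothing.
print(list(records))  # []

# To iterate again, create a fresh iterator (or keep the records in a list).
print(len(list(fake_parser())))  # 2
```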
Other Usage
See the full tfasta documentation for more advanced reading and writing of fasta files.
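tfasta's own writing functions are covered in its full documentation and are not reproduced here. As a rough illustration of what formatted fasta output involves, the standard-library sketch below writes each record as a “>” header line followed by the sequence wrapped at 60 characters, the line width used in short-extended.fas. The helper name write_fasta and its width parameter are hypothetical.

```python
import io


def write_fasta(records, out, width=60):
    """Write {'name', 'sequence'} dicts to a text stream as fasta.

    Hypothetical helper; 'width' is the maximum number of residues per line.
    """
    if width < 1:
        raise ValueError("width must be a positive integer")
    for record in records:
        out.write(">" + record["name"] + "\n")
        seq = record["sequence"]
        for start in range(0, len(seq), width):
            out.write(seq[start:start + width] + "\n")


record = {
    "name": "gi|32033604|ref|ZP_00133915.1|",
    # first two sequence lines of this record in short-extended.fas, joined
    "sequence": (
        "ATGQVIGTFTVRNDNGLHARPSAVLVQTLKPFAAKVTVENLDRGTAPANAKSTMKVVALG"
        "ASQAHRLRFVAEGEDAQQAIEALAKAFVEGLGESVSFVPAVEDTIEGAAQPQAVESAKNF"
    ),
}

buffer = io.StringIO()
write_fasta([record], buffer)
# The header is followed by the two original 60-character lines.
print(buffer.getvalue(), end="")
```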
Project details
Download files
Source Distribution
File details
Details for the file tfasta-0.3.4.tar.gz
File metadata
- Download URL: tfasta-0.3.4.tar.gz
- Size: 122.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 30d9de9ccbd028cfbd8ebadc0487ecd40d8e535f4287bf5829cc0eeba90b79b1
MD5 | 5dfad15e4c9e0a8308cab6b0cf914529
BLAKE2b-256 | ba5eb1d8abdac8bff59e0db7ad0c4f8cfd0edf57bd812e98aa26869adab00265
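A downloaded copy of tfasta-0.3.4.tar.gz can be checked against the SHA256 and BLAKE2b-256 rows of the table above. A minimal sketch using Python's standard hashlib module follows; file_digest_hex is a hypothetical helper name, and the script assumes the archive was saved in the current directory (it does nothing if the file is absent).

```python
import hashlib
import os

# Digests copied from the "File hashes" table for tfasta-0.3.4.tar.gz.
EXPECTED = {
    "sha256": "30d9de9ccbd028cfbd8ebadc0487ecd40d8e535f4287bf5829cc0eeba90b79b1",
    "blake2b-256": "ba5eb1d8abdac8bff59e0db7ad0c4f8cfd0edf57bd812e98aa26869adab00265",
}

# Map table names to hashlib constructors; BLAKE2b-256 is blake2b with a
# 32-byte (256-bit) digest.
ALGORITHMS = {
    "sha256": hashlib.sha256,
    "blake2b-256": lambda: hashlib.blake2b(digest_size=32),
}


def file_digest_hex(path, algorithm, chunk_size=1 << 16):
    """Return the hex digest of a file, reading it in chunks to bound memory."""
    h = ALGORITHMS[algorithm]()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


archive = "tfasta-0.3.4.tar.gz"  # assumed download location
if os.path.exists(archive):
    for name, expected in EXPECTED.items():
        ok = file_digest_hex(archive, name) == expected
        print(name, "OK" if ok else "MISMATCH")
```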