
Simplest possible fasta parser

The simplest possible fasta and fastq parsers I could come up with. Useful for simple manipulations of sequence files without creating complex dependencies.

The fastalite and fastqlite functions each return an iterator of namedtuples with the attributes id (the header line up to the first whitespace), description (the entire header line), and seq (the sequence as a string). fastqlite output also has an attribute qual containing the quality scores. For example:

from fastalite import fastalite

with open('inseqs.fasta') as infile, open('outseqs.fasta', 'w') as outfile:
    for seq in fastalite(infile):
        outfile.write('>{}\n{}\n'.format(seq.id, seq.seq))
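
To make the record layout described above concrete, here is a minimal sketch of how a header line maps onto the id and description attributes. The Seq namedtuple and the record_from_header helper are hypothetical stand-ins written for illustration; they are not fastalite's own code, and the package's internals may differ.

```python
from collections import namedtuple

# Hypothetical stand-in for the records described above (not fastalite's class).
Seq = namedtuple('Seq', ['id', 'description', 'seq'])


def record_from_header(header_line, sequence):
    """Build a record from a '>' header line and its sequence string."""
    description = header_line[1:].strip()   # drop the leading '>'
    seq_id = description.split()[0]         # text before the first whitespace
    return Seq(seq_id, description, sequence)


rec = record_from_header('>seq1 Escherichia coli 16S', 'ACGT')
print(rec.id)           # seq1
print(rec.description)  # seq1 Escherichia coli 16S
print(rec.seq)          # ACGT
```

Because the records are namedtuples, they can also be unpacked positionally or converted with rec._asdict() when a dictionary is more convenient.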

The fastqlite parser also performs some limited error checking and raises ValueError when it encounters a malformed record.
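
The text above does not say which checks fastqlite applies, so the following is only a sketch of the kind of validation a four-line FASTQ parser might perform before raising ValueError. The check_fastq_record function and its specific checks are assumptions for illustration, not the library's documented behavior.

```python
def check_fastq_record(lines):
    """Validate one four-line FASTQ record; raise ValueError if malformed.

    The checks here are illustrative assumptions, not fastqlite's own.
    """
    if len(lines) != 4:
        raise ValueError('expected 4 lines, got {}'.format(len(lines)))
    header, seq, plus, qual = (line.strip() for line in lines)
    if not header.startswith('@'):
        raise ValueError('header does not start with @')
    if not plus.startswith('+'):
        raise ValueError('separator line does not start with +')
    if len(seq) != len(qual):
        raise ValueError('sequence and quality lengths differ')
    return header[1:], seq, qual


good = ['@read1 sample\n', 'ACGT\n', '+\n', 'IIII\n']
bad = ['@read2\n', 'ACGT\n', '+\n', 'II\n']  # quality string too short

print(check_fastq_record(good))
try:
    check_fastq_record(bad)
except ValueError as err:
    print('malformed:', err)
```

Callers that read untrusted files would typically wrap the iteration over fastqlite in a similar try/except ValueError block.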

The Opener class may be used in place of argparse.FileType to support transparent reading and writing of compressed files (inferred from a .gz or .bz2 suffix), for example:

import argparse
from fastalite import Opener, fastalite

parser = argparse.ArgumentParser()
parser.add_argument('infile', type=Opener())
args = parser.parse_args()  # parses sys.argv[1:]; pass a list of strings to parse other arguments
seqs = fastalite(args.infile)
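
For readers curious how suffix-based handling of compressed files can work, here is a minimal sketch using only the standard library. The open_by_suffix function is a hypothetical illustration of the idea; it is not Opener's implementation, and Opener may accept different arguments.

```python
import bz2
import gzip
import os
import tempfile


def open_by_suffix(path, mode='r'):
    """Open path in text mode, picking a (de)compressor from its suffix.

    Hypothetical helper; mode is expected to be 'r' or 'w'.
    """
    if path.endswith('.gz'):
        return gzip.open(path, mode + 't')
    if path.endswith('.bz2'):
        return bz2.open(path, mode + 't')
    return open(path, mode)


# Round trip: write a gzip-compressed FASTA file, then read it back as text.
path = os.path.join(tempfile.mkdtemp(), 'seqs.fasta.gz')
with open_by_suffix(path, 'w') as handle:
    handle.write('>seq1\nACGT\n')
with open_by_suffix(path) as handle:
    content = handle.read()
print(content)
```

The handle returned for a compressed file yields text lines just like a plain file, which is why a line-oriented parser can consume either one without changes.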

You can perform a few actions on input files using the command line interface. For a list of available commands:

python -m fastalite -h

Installation

Compatible with Python versions 2.7 and 3.4+

Install from PyPI using pip:

pip install fastalite

Or install directly from the git repository:

pip install git+https://github.com/nhoffman/fastalite.git

Testing

Run all tests like this:

python setup.py test

