Skip to main content

lorax -- speaks for the (phylogenetic) trees.

Project description

lorax “speaks for the trees” by calculating multiple sequence alignments and trees.

How does lorax calculate multiple sequence alignments?

lorax uses a pre-calculated HMM and hmmalign to calculate multiple sequence alignments in either peptide or DNA space.

How does lorax calculate phylogenetic trees?

Phylogenetic trees are built from either DNA or protein sequences by external processes in a computationally-intensive step that can take from minutes to days depending on the number of sequences and their sizes. Generally the computational time increases linearly with the average length of sequences and as the square of the number of input sequences. The tree-building software that lorax knows about the following tree builders:

Tree Builder

Description

FastTree

FastTree is the fastest tree-builder and is the default tree-building algorithm.

RAxML

RAxML is believed to be the more accurate tree-building algorithm, but at its fastest is probably 100x slower than FastTree. It is possible to the RaxML EPA algorithm to do placement of test sequences on existing RaxML trees.

URLs

lorax interacts through standard http GET and POST commands. POST of sequence data and calculation of trees results in permanent URLs that lorax will serve up until such time as the corresponding disk entries are deleted. POSTing to existing sequence data will result in over-writing.

URL

Interpretation

/log.txt

A GET of this URL returns the log file of the current run.

/rq

A dashboard of the queues used by RQ, if asynchronous queueing is in use.

/trees/families.json

A GET of this URL returns a JSON list of defined families.

/trees/<family>/sequences

POST a FASTA-formatted set of aligned sequences. ID fields must be unique and use UTF-8 encoding. If the multipart key is peptide, the sequences are assumed to be protein sequences; if the key is DNA, DNA sequences are assumed. <family> is used as a directory name and must not include special characters such as /. See the ‘’post_to_lorax.sh`` script in the test/ directory for an example of how to post. Returns a JSON object that gives information about the number and length of submitted sequences. Throws a 400 error if a proper multipart key is not found. Throws a 406 error if empty or improperly-formatted FASTA file is sent.

/trees/<family>/alignment

POST a FASTA-formatted set of aligned sequences. Same as for sequences above, except dash (-) characters are used as spacings in alignments.

/trees/<family>/HMM

PUT a family HMM for use with hmmalign. Throws a 400 error if family has not been previously created. Returns a JSON dictionary of HMM stats. Throws a 406 error if the HMM definition is not valid.

/trees/<family>/hmmalign

A GET of this URL will cause an HMM alignment to be calculated. This step is not needed if an alignment is supplied. Throws a 400 error if a sequence file is not found. Throws a 417 error if hmmalign returned a non-zero process code.

/trees/<family>/FastTree

A GET of this URL will cause a FastTree tree to be calculated and a Newick tree to be returned. This operation may take a long time and result in a timeout, which is why polling methods are provided. Throws a 417 error if an improper family name was provided. Throws a 404 error if family was not previously created. Throws a 405 error if an alignment is not found. Throws a 417 error if

/trees/<family>/RAxML

Same as above, execpt using RAxML as the tree builder.

/trees/<fam>/hmmalign_FastTree

A GET of this URL will cause the alignment and tree- building steps to be chained.

/trees/<fam>/<meth>/status

Returns -1 if tree calculation is ongoing, and the exit code of the tree-builder <meth> if calculation is complete.

/trees/<fam>/<meth>/tree.nwk

Returns already-calculated tree in Newick format.

/trees/<fam>/<meth>/tree.xml

Returns already-calculated tree in phyloXML format.

/trees/<fam>/<meth>/run_log.txt

Returns the log file, including timings, of the tree calculation.

/trees/<fam>.<super>/sequences

POST additional sequences to be considered a superfamily of existing family <fam>. <super> cannot be a reserved name such as FastTree. These sequences will be concatenated to the existing family sequences, with <super> prepending ID strings.

/trees/<family>.<superfamily>/

DELETE a superfamily.

/trees/<f>.<s>/<meth>/status

Returns status of a superfamily tree calculation.

/trees/<f>.<s>/<meth>/tree.nwk

Returns tree of a superfamily in Newick format.

/trees/<f>.<s>/<meth>/tree.xml

Returns tree of a superfamily in phyloXML format.

/trees/<f>.<s>/<m>/run_log.txt

Returns the log file of a superfamily tree calculation.

Configuration

lorax is complex and configured in multiple layers. The lowest layer is from variables in the config.py file in the distribution, which defines multiple classes of configuration variable settings. These configuration modes are selected among at run time via the value of the environmental variable LORAX_CONFIGURATION:

LORAX_CONFIGURATION

Interpretation

base

A vanilla mode suitable for manual testing without debugging on.

development

A full debugging mode, not secure, highly verbose, and with synchronous operation (no queues).

testing

A mode suitable for running test suites.

deployment

Production mode, with logging to file.

The next level of configuration is via the pyfile config.cfg. Values placed in this file overlay values from config.py. The location of this file is instance-relative.

The lowest level of configuration is via environmental variables which begin with LORAX_ and which follow the values specified in the configuration pyfile.

Running lorax

lorax can be run via command line with no arguments.

If asynchronouse queues are switched on (modes other than development), then redis will need to be started at the address specified by the configuration variable RQ_REDIS_URL. After that, two RQ work queues will need to be started as well:

rqworker treebuilding rqworker alignment

It is also useful to run a dashboard (via rq-dashboard).

lorax is intended to be run in a trusted environment and contains no authentication. It should be run on ports that are accessible only to trusted hosts. Running lorax on a public port opens the possibility of denial-of-service attacks.

We recommend that lorax be run in a virtual environment if on a shared server. However, the shell scripts willwork for real environments as well.

Files

Log files with time-stamped names will be created in the directory specified by PATHS['log'], by default an instance-relative log/ directory.

Data files are created in the directory specified by PATHS['data'], by default an instance-relative data/ directory. In deployment, this will usually be an absolute path.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lorax-0.94.9.tar.gz (299.5 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page