Automated Multi-Locus Sequence Analysis tool

Installation

automlsa2 is distributed on PyPI as a universal wheel. It runs on Linux and macOS, is untested on Windows, and supports Python 3.7+ and PyPy.

$ pip install --upgrade automlsa2

Dependencies

Python modules:

  1. pandas

  2. numpy

  3. biopython

  4. tqdm

See requirements.txt for more info.

External programs:

  1. NCBI BLAST+ >= 2.10.1

  2. mafft >= 7.471

  3. IQ-TREE COVID-19 release >= 2.1.1

You can install external programs using the automlsa2 --install_deps command. These will be installed to ${HOME}/.local/external unless otherwise specified.
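Before a long run it can help to confirm that the external programs are reachable. The following is a minimal sketch, not part of automlsa2: the executable names (blastn, makeblastdb, mafft, iqtree2) are assumptions and may differ on your system, and the check only looks at PATH, so programs installed elsewhere will be reported as missing.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_PROGRAMS = ["blastn", "makeblastdb", "mafft", "iqtree2"]


def find_missing(programs, which=shutil.which):
    """Return the programs that cannot be located on PATH."""
    return [name for name in programs if which(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_PROGRAMS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external programs were found.")
```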

Just tell me how to run it

$ automlsa2 --files Genus_species_1.fna Genus_species_2.fna ... \
  Genus_species_N.fna --query queries.fasta -t THREADS -- runID

Alternatively:

$ automlsa2 --dir path/to/genomes --query queries.fasta -t THREADS \
  -- runID
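If you launch runs from a script, the same command can be assembled in Python. This is a sketch under stated assumptions: build_command is a hypothetical helper, and it assumes the genome files end in .fna as in the example above. Only the flags shown in this section are used.

```python
from pathlib import Path


def build_command(genome_dir, query, threads, run_id):
    """Assemble an automlsa2 argument list using the --files form."""
    # Assumption: genomes are the *.fna files directly inside genome_dir.
    genomes = sorted(str(p) for p in Path(genome_dir).glob("*.fna"))
    return ["automlsa2", "--files", *genomes,
            "--query", str(query), "-t", str(threads), "--", run_id]


# Example use (not executed here):
#   import subprocess
#   subprocess.run(build_command("path/to/genomes", "queries.fasta", 4, "runID"),
#                  check=True)
```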

Overview

automlsa2 is a re-imagination of autoMLSA.pl.

The entire codebase has been rewritten in Python. The general algorithm produces similar output and several steps are shared, but there are many updates and differences between the two programs, which will be covered later.

The workflow can be summarized as follows:

  1. Input is a set of marker genes as queries, and a set of target genome FASTA files.

  2. BLAST databases are generated for each target genome, and each query gene is extracted from the input query FASTA files.

  3. BLAST searches are run with the extracted query sequences against the genome databases.

  4. Per-genome hits are evaluated against the cut-offs, and genomes are filtered from the analysis accordingly.

  5. Sequences are extracted from the BLAST results as unaligned multi-FASTAs.

  6. Unaligned sequences are aligned using mafft.

  7. A nexus file is generated pointing to all aligned sequences.

  8. A phylogenetic tree is generated using the nexus file as input.
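Step 7 can be illustrated with a small sketch. It assumes the nexus file follows the partition-file layout that IQ-TREE accepts, with one charset per alignment file and `*` meaning all sites. The exact file automlsa2 writes may differ, and nexus_partition_text and the example paths are hypothetical.

```python
from pathlib import Path


def nexus_partition_text(alignment_paths):
    """Build a nexus 'sets' block with one charset per aligned FASTA file."""
    lines = ["#nexus", "begin sets;"]
    for path in alignment_paths:
        # Use the file name without its extension as the charset name.
        name = Path(path).stem
        lines.append(f"    charset {name} = {path}: *;")
    lines.append("end;")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(nexus_partition_text(["aligned/gyrB.aln", "aligned/rpoB.aln"]), end="")
```

A file like this could then be handed to IQ-TREE, for example with its -p option, to build the tree in step 8.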

BLAST searches are threaded, or, optionally, written to a file to be submitted to a compute cluster. mafft alignment commands can also be written to a file for submission to a compute cluster.
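The two execution modes can be sketched as follows. This is an illustration, not automlsa2's code: the BLAST program (blastn), the tabular output format, and the output file naming are assumptions, and "threaded" is read here as running several searches concurrently.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def blast_commands(query_files, databases, program="blastn"):
    """Build one BLAST command per (query file, database) pair."""
    commands = []
    for query in query_files:
        for db in databases:
            out = f"{Path(query).stem}_vs_{Path(db).stem}.tsv"
            commands.append([program, "-query", query, "-db", db,
                             "-outfmt", "6", "-out", out])
    return commands


def write_commands(commands, path):
    """Write shell-quoted commands, one per line, for cluster submission."""
    with open(path, "w") as handle:
        for cmd in commands:
            handle.write(" ".join(shlex.quote(arg) for arg in cmd) + "\n")


def run_commands(commands, threads):
    """Run the commands locally, at most `threads` at a time."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() forces evaluation so a failed search raises here.
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))
```

Keeping command construction separate from execution means the same list can either be run locally or written to a file for a scheduler.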

Input query files and genome directories are scanned for updates. If sequences are added, removed, or changed, the analysis is redone.
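One common way to detect such updates is to compare content hashes between runs. The sketch below assumes file-level SHA-256 digests. How automlsa2 actually tracks changes is not stated here, and scan and diff are hypothetical names.

```python
import hashlib


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(paths):
    """Map each path to the digest of its current contents."""
    return {str(p): file_digest(p) for p in paths}


def diff(old, new):
    """Compare two scans and return (added, removed, changed) path lists."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return added, removed, changed
```

If any of the three lists is non-empty, the affected steps would need to be rerun.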

Multiple queries targeting the same gene can be used to improve coverage of divergent gene sequences, e.g. when covering an entire phylum using several reference genomes.
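One way to combine several queries for the same gene is to keep, for each genome, the best-scoring hit among them. The sketch below assumes hits are dictionaries with genome, query, and bitscore keys and that a mapping from query name to gene is available. These structures are hypothetical and are not automlsa2's internal format.

```python
def best_hits(hits, query_to_gene):
    """Keep the highest-bitscore hit for each (genome, gene) pair."""
    best = {}
    for hit in hits:
        key = (hit["genome"], query_to_gene[hit["query"]])
        if key not in best or hit["bitscore"] > best[key]["bitscore"]:
            best[key] = hit
    return best


if __name__ == "__main__":
    # Two reference queries (hypothetical names) that target the same gene.
    query_to_gene = {"gyrB_refA": "gyrB", "gyrB_refB": "gyrB"}
    hits = [
        {"genome": "g1", "query": "gyrB_refA", "bitscore": 310.0},
        {"genome": "g1", "query": "gyrB_refB", "bitscore": 950.0},
    ]
    for (genome, gene), hit in best_hits(hits, query_to_gene).items():
        print(genome, gene, hit["query"], hit["bitscore"])
```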

Author Contact

Ed Davis

License

automlsa2 is distributed under the terms listed in the LICENSE file. The software is free for non-commercial use.

Copyrights

Copyright (c) 2020 Oregon State University. All Rights Reserved.
