
Accurate amplicon alignment to gene consensus

Project description

Amplicons to Global Gene (A2G2)

This program implements a progressive algorithm to align a large set of amplicons to a reference gene consensus, or a large set of sequences to an amplicon consensus, based on a reference consensus. It relies on traditional multiple sequence aligners such as MAFFT (the default) and MUSCLE, and can be extended to other aligners.
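The global-to-local idea can be illustrated with a toy example. The sketch below is not A2G2's implementation. It assumes that the local consensus occurs verbatim (ungapped) inside the global consensus, and that each amplicon row has already been aligned to the local consensus with no insertions relative to it. The helper names `anchor_offset` and `pad_to_global` are hypothetical.

```python
def anchor_offset(global_consensus: str, local_consensus: str) -> int:
    """Return the 0-based start of the local consensus inside the global one.

    Assumption (not from A2G2): the local region occurs verbatim, with no gaps.
    """
    offset = global_consensus.upper().find(local_consensus.upper())
    if offset < 0:
        raise ValueError("local consensus not found verbatim in the global consensus")
    return offset


def pad_to_global(aligned_amplicon: str, offset: int, global_len: int) -> str:
    """Embed an amplicon row (aligned to the local consensus) in global coordinates.

    The amplicon keeps its own local alignment; only flanking gaps are added.
    """
    right = global_len - offset - len(aligned_amplicon)
    if right < 0:
        raise ValueError("amplicon row extends past the end of the global consensus")
    return "-" * offset + aligned_amplicon + "-" * right


if __name__ == "__main__":
    global_cons = "ATGCCGTTAGCATTGA"
    local_cons = "GTTAGC"
    start = anchor_offset(global_cons, local_cons)  # 5
    for row in ["GTTAGC", "GTT-GC"]:
        print(pad_to_global(row, start, len(global_cons)))
```

Because the amplicon rows are only padded and never realigned against the full gene, no new internal gaps are introduced into them, which mirrors the stated goal of keeping the amplicon alignment ungapped.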

Problem

Some taxonomic assignment software requires aligned sequences for both the query and the reference. Projects using environmental DNA (eDNA), or trying to assess wide diversity with metagenomics, often struggle to build such alignments because of memory and computational constraints. Moreover, massive alignments tend to introduce more gaps into the sequences and to force the alignment of segments that should not align in that region. A2G2 avoids these issues by using a global-to-local alignment, which retains the ungapped alignment of the amplicons.

Basic usage

To display the A2G2 help message, run:

A2G -h

This should print something like the following:

A2G version: 2020.0.1
Copyright 2020 Jose Sergio Hleap

usage: A2G [-h] [--cpus CPUS] [--nowrite] [--out_prefix OUT_PREFIX]
           [--remove_duplicates]
           global_consensus local_consensus fasta

positional arguments:
  global_consensus      Sequence consensus of the global region, e.g. full COI
  local_consensus       Sequence consensus of the local region, e.g. Leray
                        fragment
  fasta                 fasta file with the focal sequences

optional arguments:
  -h, --help            show this help message and exit
  --cpus CPUS           number of cpus to use
  --nowrite             return string instead of writing
  --out_prefix OUT_PREFIX
                        Prefix of outputs
  --remove_duplicates   Keep or remove duplicated sequences

Then to run it, you can simply type:

A2G global_target local_target query_file --cpus 10 --out_prefix prefix --remove_duplicates

With this command, global_target is used as the overall (global) region, local_target as the amplicon reference sequence that anchors the query sequences, and query_file holds your query sequences. These three are the required arguments. The optional arguments control the execution: --cpus sets the number of CPUs to use (up to 10 in this example), --out_prefix changes the prefix of the generated outputs, and --remove_duplicates retains only unique sequences.
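If you drive A2G2 from a Python pipeline, one option is to build the call from the arguments listed in the help output and run it with the standard `subprocess` module. This is a hypothetical wrapper, not part of the A2G package: `build_a2g_command` and `run_a2g` are illustrative names, and it assumes the `A2G` executable is on your `PATH`.

```python
import shutil
import subprocess
from typing import List, Optional


def build_a2g_command(global_consensus: str, local_consensus: str, fasta: str,
                      cpus: int = 1, out_prefix: Optional[str] = None,
                      remove_duplicates: bool = False) -> List[str]:
    """Assemble an A2G argument list using only the flags shown by `A2G -h`."""
    cmd = ["A2G", global_consensus, local_consensus, fasta, "--cpus", str(cpus)]
    if out_prefix is not None:
        cmd += ["--out_prefix", out_prefix]
    if remove_duplicates:
        cmd.append("--remove_duplicates")
    return cmd


def run_a2g(cmd: List[str]) -> None:
    """Run A2G; raises CalledProcessError if it exits with a non-zero status."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("A2G executable not found on PATH")
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Mirrors the example invocation: 10 CPUs, custom prefix, duplicates removed.
    command = build_a2g_command("global_target", "local_target", "query_file",
                                cpus=10, out_prefix="prefix", remove_duplicates=True)
    print(" ".join(command))
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems when file names contain spaces.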

If the --nowrite option is used, A2G2 writes the alignment to standard output and other information to standard error. To pipe only the alignment, redirect standard error to a null device:

A2G global_target local_target query_file --cpus 10 --out_prefix prefix --nowrite 2> /dev/null
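The same redirection can be done from Python by capturing standard output and discarding standard error. This sketch uses the `--nowrite` flag as spelled in the help output, and it assumes (the help text does not say so) that the alignment written to standard output is in FASTA format. `parse_fasta` and `a2g_alignment` are hypothetical helpers, not part of A2G.

```python
import subprocess
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines.

    Note: a repeated header overwrites the earlier record.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def a2g_alignment(global_consensus: str, local_consensus: str, fasta: str,
                  cpus: int = 1) -> Dict[str, str]:
    """Run A2G with --nowrite, keep stdout and discard stderr (like 2> /dev/null)."""
    result = subprocess.run(
        ["A2G", global_consensus, local_consensus, fasta, "--cpus", str(cpus), "--nowrite"],
        stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
        universal_newlines=True, check=True)
    return parse_fasta(result.stdout)
```

`universal_newlines=True` returns text rather than bytes and also works on Python 3.6, the version the distributions below were built with.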


Download files


Source Distribution

A2G-2020.0.1.tar.gz (8.5 kB)

Uploaded Source

Built Distributions

A2G-2020.0.1-py3.6.egg (18.6 kB)

Uploaded Source

A2G-2020.0.1-py3-none-any.whl (10.8 kB)

Uploaded Python 3

File details

Details for the file A2G-2020.0.1.tar.gz.

File metadata

  • Download URL: A2G-2020.0.1.tar.gz
  • Upload date:
  • Size: 8.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.6.9

File hashes

Hashes for A2G-2020.0.1.tar.gz
Algorithm Hash digest
SHA256 b93434203f77c55549f675f7107309150b3d9c02b8dc9f4152e7e90327057018
MD5 68eba5a18a42379db956666a7bfaa421
BLAKE2b-256 c3952626d2eb8f9fc13071a47903347b183ee516948266fcd73100b5b810dec2
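To check that a downloaded file matches the SHA256 digest listed for it, you can hash it locally. This is a minimal sketch using Python's standard `hashlib`; `sha256_of_file` and `verify_download` are illustrative names, and the file is assumed to be in the current directory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Compare a file's SHA256 digest with the published one (case-insensitive)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    ok = verify_download(
        "A2G-2020.0.1.tar.gz",
        "b93434203f77c55549f675f7107309150b3d9c02b8dc9f4152e7e90327057018")
    print("hash OK" if ok else "hash MISMATCH")
```

Reading in fixed-size chunks keeps memory use constant regardless of file size.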


File details

Details for the file A2G-2020.0.1-py3.6.egg.

File metadata

  • Download URL: A2G-2020.0.1-py3.6.egg
  • Upload date:
  • Size: 18.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.6.9

File hashes

Hashes for A2G-2020.0.1-py3.6.egg
Algorithm Hash digest
SHA256 1800edc41f01c29d29383eee387314d2b3b1b9b64301c7462d4176f7e79928a6
MD5 54af9488487c71615cf0871adf80b891
BLAKE2b-256 2a7b08b46000830082bdb27516bcf466394b9bb04c83654ae28a6405c2122329


File details

Details for the file A2G-2020.0.1-py3-none-any.whl.

File metadata

  • Download URL: A2G-2020.0.1-py3-none-any.whl
  • Upload date:
  • Size: 10.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.6.9

File hashes

Hashes for A2G-2020.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 7027171b68699459460f4a5097c2d2c9363bf982142e773cbd7c7083be2d9752
MD5 13c33c7b87c7d4fc9b47816bee65adc5
BLAKE2b-256 d6a9a0f14c64c81ecb1694a3dc09a797739d14901e528cd4d2722f2cb946b336

