metaerg

Annotation of genomes and contigs

Project description

metaerg.py, version 2.2.X

Metaerg.py annotates genomes or sets of metagenome-assembled genomes (MAGs) or bins from microbial ecosystems (bacteria, archaea, viruses). Input data consists of nucleotide FASTA files, one per genome or MAG, each with one or more contigs. Output files are written in common formats such as .gff, .gbk, .fasta and .html and contain the predicted genes, their functions and taxonomic classifications.
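Because the .gff output follows a common format, a quick sanity check on an annotation can be done without any special tooling. The Python sketch below counts features by type (column 3 of a tab-separated GFF line). It assumes only the generic nine-column GFF layout; the feature types and attributes metaerg actually writes are not shown here, so the example lines are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(gff_lines: Iterable[str]) -> Counter:
    """Count GFF features by the type given in column 3."""
    counts: Counter = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # anything after this directive is sequence, not features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and other directives
        columns = line.split("\t")
        if len(columns) < 9:
            continue  # not a well-formed feature line
        counts[columns[2]] += 1
    return counts


# Hypothetical two-feature example, NOT real metaerg output.
example = [
    "##gff-version 3",
    "contig_1\texample\tCDS\t1\t300\t.\t+\t0\tID=gene_1",
    "contig_1\texample\ttRNA\t400\t475\t.\t-\t.\tID=gene_2",
]
print(count_feature_types(example))
```

To use it on a real file, pass an open file handle instead of the list, since a file handle also yields lines.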

You can interact with a sample visualization here and here. These visualizations show the annotation of a cyanobacterial genome, Candidatus Phormidium alkaliphilum. Unfortunately, the interactive search box does not work with the GitHub HTML visualization, so you need to download the HTML files to your computer (e.g. using "git clone ...") to try out the interactive part.

Metaerg was originally developed in Perl. It was relatively challenging to install and came with complex database dependencies. This new Python version 2.2 overcomes some of those issues. The annotation pipeline has also evolved further and become more refined.

By using gtdbtk for taxonomic classification of genes and transferring functional annotations from the NCBI, metaerg.py uses a controlled vocabulary for taxonomy and a relatively clean vocabulary for functions. This makes annotations much more concise than the original version of metaerg and many other annotation tools. In addition, metaerg uses NCBI's conserved domain database and RPSBlast to assign genes to subsystems for effective data exploration. Subsystems are a work in progress, and can be expanded and customized as needed.

The Metaerg 2.2 pipeline ...

predicts CRISPR regions using Minced.
predicts tRNAs using Aragorn.
predicts RNA genes and other non-coding features using Infernal - cmscan and RFAM.
predicts retrotransposons with LTRHarvest.
predicts tandem repeats with Tandem Repeats Finder.
predicts other repeat regions with RepeatScout and RepeatMasker.
predicts coding genes with Prodigal.
annotates taxonomy and functions of RNA and protein genes using Diamond, NCBI blastn and a database of 62,296 bacterial, 3,406 archaeal, 11,569 viral and 139 eukaryotic genomes.
annotates gene functions using RPSBlast and NCBI's Conserved Domain Database (CDD).
annotates genes involved in production of secondary metabolites using Antismash.
annotates membrane and translocated proteins using TMHMM and SignalP.
assigns genes to a built-in set of functions using HMMER and HMM profiles from MetaScan, HydDB and CANT-HYD.
presents annotations in intuitive, searchable, colorful HTML (based on DataTables/jQuery) that can be explored in a web browser and copy/pasted into Excel.
saves annotations in Apache Feather format for effective exploration, statistics and visualization with Jupyter or R.
enables the user to add custom HMMs and expand the set of functional genes as needed.

Usage:

metaerg --contig_file contig-file.fna --database_dir /path/to/metaerg-databases/

To annotate a set of genomes in a given dir (each file should contain the contigs of a single genome):

metaerg --contig_file dir-with-contig-files --database_dir /path/to/metaerg-databases/ --file_extension .fa
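When annotation runs are launched from a script rather than typed by hand, the two command lines above can be assembled programmatically. The following Python sketch uses only the three flags shown in this section (--contig_file, --database_dir, --file_extension); the function names are hypothetical helpers, not part of metaerg.

```python
import shlex
import subprocess
from typing import List, Optional


def build_metaerg_command(contigs: str, database_dir: str,
                          file_extension: Optional[str] = None) -> List[str]:
    """Assemble a metaerg command line from the flags shown above."""
    cmd = ["metaerg", "--contig_file", contigs, "--database_dir", database_dir]
    if file_extension is not None:
        # Only needed when annotating a directory of genome files.
        cmd += ["--file_extension", file_extension]
    return cmd


def run_metaerg(contigs: str, database_dir: str,
                file_extension: Optional[str] = None) -> None:
    """Run metaerg and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_metaerg_command(contigs, database_dir, file_extension),
                   check=True)


# Print the command for the directory example instead of running it.
cmd = build_metaerg_command("dir-with-contig-files",
                            "/path/to/metaerg-databases/", ".fa")
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.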

Metaerg needs ~40 min to annotate a 4 Mb genome on a desktop computer. There are a few more optional arguments; for a complete list, run:

metaerg -h
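The timing figure quoted above (~40 min for a 4 Mb genome) can be turned into a rough estimate for a batch of genomes. The Python sketch below assumes that run time scales linearly with genome size and that genomes are annotated one after another; both are simplifying assumptions, and real run times depend on hardware and genome content.

```python
from typing import Iterable

# ~40 min for a 4 Mb genome, i.e. roughly 10 min per Mb (assumed linear).
MINUTES_PER_MB = 40 / 4


def estimate_minutes(genome_sizes_mb: Iterable[float]) -> float:
    """Rough total run time in minutes for genomes annotated sequentially."""
    return sum(size * MINUTES_PER_MB for size in genome_sizes_mb)


# Hypothetical batch of three genomes (sizes in Mb).
sizes = [4.0, 2.5, 6.0]
total = estimate_minutes(sizes)
print(f"estimated run time: {total:.0f} min (~{total / 60:.1f} h)")
```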

Installation

To install metaerg, its 18 helper programs (diamond, prodigal, etc.) and databases, run the commands below. First, you need to manually download the signalp and tmhmm programs from here. Then:

python -m virtualenv metaerg-env
source metaerg-env/bin/activate
pip install --upgrade metaerg
metaerg --install_deps /path/to/bin_dir --database_dir /path/to/database_dir --path_to_signalp path/to/signalp.tar.gz \
  --path_to_tmhmm path/to/tmhmm.tar.gz
source /path/to/bin_dir/profile
metaerg --download_database --database_dir /path/to/metaerg-databases/
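After sourcing the profile, it can be useful to confirm that the helper programs are actually reachable on the PATH. The Python sketch below does this with the standard library. It checks only the two helpers named in this section, and the executable names "diamond" and "prodigal" are assumed to match the program names; extend the list as needed.

```python
import shutil
from typing import Dict, Iterable, Optional


def find_helpers(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each program name to its full path, or None if not on the PATH."""
    return {name: shutil.which(name) for name in names}


# Only the helpers named in the text; executable names are assumptions.
helpers = ["diamond", "prodigal"]
for name, path in find_helpers(helpers).items():
    print(f"{name}: {path or 'NOT FOUND on PATH'}")
```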

The database was created from the following sources:

gtdbtk is used for its taxonomy
NCBI annotations of >40K representative archaeal and bacterial genomes present in GTDB are sourced directly from the NCBI FTP server.
NCBI (refseq) annotations of viral genes are obtained from viral refseq.
For eukaryotes, one genome is added to the database using ncbi-datasets for each taxon within Amoebozoa, Ancyromonadida, Apusozoa, Breviatea, CRuMs, Cryptophyceae, Discoba, Glaucocystophyceae, Haptista, Hemimastigophora, Malawimonadida, Metamonada, Rhodelphea, Rhodophyta, Sar, Aphelida, Choanoflagellata, Filasterea, Fungi, Ichthyosporea and Rotosphaerida.
RFAM and CDD databases are also used.
Specialized function databases - CANT-HYD and MetaScan.

If for some reason you need to build this database yourself (this is usually not needed, as the metaerg database can be downloaded as shown above):

metaerg --create_database --database_dir /path/to/metaerg-databases/ --gtdbtk_dir /path/to/gtdbtk-database/ [--tasks [PVEBRCSA]]

with tasks:

P - build prokaryotes
V - build viruses
E - build eukaryotes
B - build PVE blast databases
R - build RFAM
C - build CDD
S - build specialized functional databases
A - build antismash database
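The task letters listed above can be checked before starting a long database build. The following Python sketch maps a task string such as "PV" to the descriptions in the list and rejects unknown letters. How metaerg itself parses --tasks is not shown here, so this is only an illustrative pre-check with a hypothetical function name.

```python
from typing import List

# Task letters and descriptions, copied from the list above.
TASKS = {
    "P": "build prokaryotes",
    "V": "build viruses",
    "E": "build eukaryotes",
    "B": "build PVE blast databases",
    "R": "build RFAM",
    "C": "build CDD",
    "S": "build specialized functional databases",
    "A": "build antismash database",
}


def parse_tasks(tasks: str) -> List[str]:
    """Return the description of each task letter, in the order given."""
    unknown = sorted(set(tasks) - set(TASKS))
    if unknown:
        raise ValueError(f"unknown task letter(s): {', '.join(unknown)}")
    return [TASKS[letter] for letter in tasks]


for description in parse_tasks("PVEB"):
    print(description)
```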

Release history

2.5.10

Jan 31, 2025

2.5.9

Jun 12, 2024

2.5.8

May 21, 2024

2.5.7

May 21, 2024

2.5.6

Apr 17, 2024

2.5.5

Apr 17, 2024

2.5.4

Mar 14, 2024

2.5.2

Mar 5, 2024

2.5.1

Feb 21, 2024

2.5.0

Feb 20, 2024

2.4.0

Dec 18, 2023

2.3.41

Jun 2, 2023

2.3.40

Jun 2, 2023

2.3.39

Apr 20, 2023

2.3.38

Mar 24, 2023

2.3.37

Mar 21, 2023

2.3.36

Mar 14, 2023

2.3.35

Mar 13, 2023

2.3.34

Mar 13, 2023

2.3.33

Mar 10, 2023

2.3.32

Mar 10, 2023

2.3.31

Mar 9, 2023

2.3.30

Mar 9, 2023

2.3.29

Mar 9, 2023

2.3.28

Mar 9, 2023

2.3.27

Mar 8, 2023

2.3.26

Mar 7, 2023

2.3.25

Mar 7, 2023

2.3.24

Mar 7, 2023

2.3.23

Mar 6, 2023

2.3.22

Feb 21, 2023

2.3.20

Feb 17, 2023

2.3.19

Feb 13, 2023

2.3.18

Feb 4, 2023

2.3.17

Feb 3, 2023

2.3.16

Jan 31, 2023

2.3.15

Jan 30, 2023

2.3.14

Jan 30, 2023

2.3.13

Jan 29, 2023

2.3.12

Jan 27, 2023

2.3.11

Jan 27, 2023

2.3.10

Jan 20, 2023

2.3.9

Jan 16, 2023

2.3.8

Jan 9, 2023

2.3.7

Jan 9, 2023

2.3.6

Jan 5, 2023

2.3.5

Jan 5, 2023

2.3.4

Dec 19, 2022

2.3.3

Dec 16, 2022

2.3.2

Dec 7, 2022

2.3.1

Dec 7, 2022

2.3.0

Dec 7, 2022

2.2.37

Nov 28, 2022

2.2.36

Nov 24, 2022

This version

2.2.35

Nov 11, 2022

2.2.34

Nov 10, 2022

2.2.33

Nov 10, 2022

2.2.32

Nov 8, 2022

2.2.31

Nov 5, 2022

2.2.30

Nov 3, 2022

2.2.29

Nov 3, 2022

2.2.28

Nov 1, 2022

2.2.27

Nov 1, 2022

2.2.25

Oct 21, 2022

2.2.24

Oct 20, 2022

2.2.23

Oct 7, 2022

2.2.22

Oct 1, 2022

2.2.21

Sep 22, 2022

2.2.20

Sep 12, 2022

2.2.19

Sep 9, 2022

2.2.18

Aug 3, 2022

2.2.17

Aug 3, 2022

2.2.16

Jul 28, 2022

2.2.15

Jul 18, 2022

2.2.12

Jul 5, 2022

Download files

Source Distribution

metaerg-2.2.35.tar.gz (64.1 kB view details)

Uploaded Nov 11, 2022 Source

Built Distribution

metaerg-2.2.35-py3-none-any.whl (74.8 kB view details)

Uploaded Nov 11, 2022 Python 3

File details

Details for the file metaerg-2.2.35.tar.gz.

File metadata

Download URL: metaerg-2.2.35.tar.gz
Upload date: Nov 11, 2022
Size: 64.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.8

File hashes

Hashes for metaerg-2.2.35.tar.gz
Algorithm	Hash digest
SHA256	`890b2497b1759acf1c4a46cfd430900dfe326ca49539225d10ea0b40674d1711`
MD5	`9c16f4ddf3aaabd7568d6b60ca48bda8`
BLAKE2b-256	`08539dcaed06d96ae043a5dfeb7bf0089eb4f1f519eef60a84460a82597b8e53`


File details

Details for the file metaerg-2.2.35-py3-none-any.whl.

File metadata

Download URL: metaerg-2.2.35-py3-none-any.whl
Upload date: Nov 11, 2022
Size: 74.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.8

File hashes

Hashes for metaerg-2.2.35-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5724537e86064bb095a962b97bd739f97275d5d336e905c49f3fb0f34203e2c3`
MD5	`61e5f2ee332997be9aa63aac93cc8c4d`
BLAKE2b-256	`054b2ad414bbeb42fb072e9f7471a70d8c1051ce46f60a553f9c3987a62220e9`
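The SHA256 digests listed above can be used to check a downloaded file before installing it. The Python sketch below computes the digest with the standard library and compares it with the published value. It assumes the files were saved under their original names in the current directory and skips any file that is not present.

```python
import hashlib
import os

# SHA256 digests copied from the hash tables above.
EXPECTED_SHA256 = {
    "metaerg-2.2.35.tar.gz":
        "890b2497b1759acf1c4a46cfd430900dfe326ca49539225d10ea0b40674d1711",
    "metaerg-2.2.35-py3-none-any.whl":
        "5724537e86064bb095a962b97bd739f97275d5d336e905c49f3fb0f34203e2c3",
}


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """True if the file's SHA256 digest matches the expected value."""
    return sha256_of_file(path) == expected_hex.lower()


for filename, expected in EXPECTED_SHA256.items():
    if os.path.exists(filename):  # only check files that were downloaded
        print(filename, "OK" if verify(filename, expected) else "MISMATCH")
```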

