
FetchMGs extracts the 40 marker genes from genomes and metagenomes in an easy and accurate manner.


fetchMGs (1.0 - 1.2) is copyright (c) 2019 Shinichi Sunagawa and Daniel R Mende.

fetchMGs (1.3) - written by Chris Field

fetchMGs (>=2.0) - written by Hans-Joachim Ruscheweyh

Introduction

Phylogenetic markers are genes (and proteins) that can be used to reconstruct the phylogenetic history of different organisms. One classical phylogenetic marker is the 16S ribosomal RNA gene, which is widely used but is known to be a suboptimal phylogenetic marker for some organisms. Efforts to find a good set of protein-coding phylogenetic marker genes (Ciccarelli et al., Science, 2006; Sorek et al., Science, 2007) led to the identification of 40 universal single-copy marker genes (MGs). These 40 marker genes occur in single copy in the vast majority of known organisms, and they were used to successfully reconstruct a three-domain phylogenetic tree (Ciccarelli et al., Science, 2006).

What the software does

The program fetchMGs was written to extract the 40 MGs from genomes and metagenomes in an easy and accurate manner. It does this using Hidden Markov Models (HMMs) trained on protein alignments of known members of the 40 MGs, together with calibrated cutoffs for each of the 40 MGs. Please note that these cutoffs are only accurate when complete protein sequences are used as input. The program outputs the protein sequences of the identified marker genes and, if the nucleotide sequences of all complete genes are given as an additional input, their nucleotide sequences as well.

Installation

FetchMGs and all its dependencies can be installed via pip and have been tested with Python 3.12.

$ pip install fetchMGs

Input

Users can submit genes in protein space or (from v2.0 on) longer nucleotide sequences from assembled genomes/metagenomes.
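When many samples are processed at once, the program help describes passing a single text file with one line per input file instead of listing every file on the command line. The following is a minimal sketch of how such a list file could be written from Python. The function name `write_input_list` and the file name `inputs.txt` are hypothetical and not part of fetchMGs; whether fetchMGs expects relative or absolute paths in this file is not stated here, so the paths are written exactly as given.

```python
from pathlib import Path


def write_input_list(paths, list_file):
    """Write one input path per line (hypothetical helper, not part of fetchMGs).

    The one-path-per-line layout follows the program help; everything else
    (trailing newline, unchanged paths) is an assumption of this sketch.
    """
    lines = [str(Path(p)) for p in paths]
    list_path = Path(list_file)
    list_path.write_text("\n".join(lines) + "\n")
    return list_path


if __name__ == "__main__":
    # Example: two protein files collected into inputs.txt
    write_input_list(["sample1.faa", "sample2.faa"], "inputs.txt")
```

In gene mode, a second list file with the matching nucleotide files in the same order could be written the same way and passed with the -d parameter.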

Output

Per input sample (SAMPLE), fetchMGs will produce three output files:

  1. SAMPLE.fetchMGs.faa --> the marker genes in protein space
  2. SAMPLE.fetchMGs.fna --> the marker genes in nucleotide space
  3. SAMPLE.fetchMGs.scores --> the marker genes and their bit scores
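The output files listed above can be picked up in a downstream script. The sketch below builds the three expected file names for a sample and reads the sequence files. It assumes that the .faa and .fna files are plain-text FASTA and that they sit directly in the output folder; neither is confirmed here. The layout of the .scores file is not described in this document, so it is not parsed. The names `sample_outputs` and `read_fasta` are hypothetical helpers, not part of fetchMGs.

```python
from pathlib import Path


def sample_outputs(out_dir, sample):
    """Return the three expected output paths for one sample.

    File names follow the SAMPLE.fetchMGs.<ext> pattern listed above; their
    location directly inside out_dir is an assumption of this sketch.
    """
    out = Path(out_dir)
    return {ext: out / f"{sample}.fetchMGs.{ext}" for ext in ("faa", "fna", "scores")}


def read_fasta(path):
    """Read a plain-text FASTA file into a {header: sequence} dict.

    Assumes standard FASTA: header lines start with '>' and sequences may
    span several lines.
    """
    parts = {}
    header = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:]
                parts[header] = []
            elif header is not None:
                parts[header].append(line)
    return {h: "".join(chunks) for h, chunks in parts.items()}
```

For example, `len(read_fasta(sample_outputs("out", "SAMPLE")["faa"]))` would give the number of marker gene protein sequences found for that sample, under the assumptions stated above.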

Full program help

$ fetchMGs

Program: FetchMGs extracts the 40
    single copy universal marker genes (decribed in Ciccarelli et al.,
    Science, 2006 and Sorek et al., Science, 2007) from genomes and metagenomes
    in an easy and accurate manner.

    fetchMGs <command> [options]

      extraction     extract marker genes from sequences

    Type fetchMGs <command> to print the help menu for a specific command


Extraction

$ fetchMGs extraction

Program: FetchMGs extracts the 40
    single copy universal marker genes (decribed in Ciccarelli et al.,
    Science, 2006 and Sorek et al., Science, 2007) from genomes and metagenomes
    in an easy and accurate manner.

    fetchMGs extraction [options]

    Positional arguments:
         FILE[ FILE]  Input file(s) - plain or gzipped. Can be either:
                            - 1-n genome assembly file(s), requires -m genome. Will
                                call genes before marker gene extraction.
                            - 1-n metagenome assembly file(s), requires -m metagenome. Will
                                call genes before marker gene extraction.
                            - 1-n gene file(s) in protein space, requires -m gene. nucleotide
                                sequences can be provided with -d parameter
                            - 1 text file with one line per input file. Requires 
                                -m parameter to enable "metagenome", "genome" or "gene" mode.
                                In "gene" mode another text file with samples in the
                                same order can be provided with -d parameter. 
    Input options:
       -d FILE[ FILE] Nucleotide files associated with protein files in -i. Same order as
                        files in -i required. Enabled only in -m gene mode. Can be either a 
                        list of files or a text file with one line per input file. 

    Output options:
       -o   FOLDER    Output folder for marker genes

    Algorithm options:
       -m STR         Mode of extraction Values: [gene, genome, metagenome]

       -t INT         Number of threads. Default=[1]
       -v             Report only the very best hit per COG and input file. Only useful
                        if input files contain genes from genomes or are genomes.

Changelog

2.0.1

  • Changed automatic detection of input files to amino for pyhmmer
  • Users can now submit a text file listing the input files for the positional and -d parameters

2.0.0

  • Calibration mode was removed
  • hmmer and prodigal were replaced with pyhmmer and pyrodigal
  • Input is more flexible. Users can now submit multiple files and use different input formats:
    • Genes (-m gene)
    • Genomes (-m genome)
    • Metagenomes (-m metagenome)
  • Output folder was cleaned up. Only one nucleotide file and one protein file are generated, compared with 40 in previous versions

1.3.0

  • FetchMGs was ported from Perl to Python 3
