select gene families as markers for microbial phylogenomics

TMarSel

TMarSel is a tool for a Tailored Selection of gene families as Markers for microbial phylogenomics.

Flags

  • -h Display help message.
  • -i [required] Either a single file annotating ORFs into gene families OR a directory containing multiple annotation files. Each file contains three columns: orf|bit_score|gene_family.
  • -o [required] Output directory to save the ORFs and statistics of each marker.
  • -raw [required if the input file(s) contain raw annotations] Indicates that the input comes directly from a functional database, whose output has multiple columns that depend on the database.
  • -db [required if -raw is set] Name of the database used for genome annotation (eggnog or kegg).
  • -k Number of markers to select (default is 50).
  • -min_markers Minimum number of markers per genome. Can be a percentage or a number (default is 1). Genomes with fewer markers than the indicated value are discarded.
  • -th Threshold for filtering copies of each gene family per genome (default is 1.0). Retains the ORFs within -th of the maximum bit score for each gene family and genome. Lower values (e.g. 0.0) retain all ORFs, whereas higher values (e.g. 1.0) retain only the ORF with the highest bit score.
  • -p Exponent of the power mean (cost function) used to select markers (default is 0.0). We recommend not changing this value unless you are familiar with the method. Default value yields the optimal combination of markers.
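
The -th rule can be illustrated with a short sketch. It assumes that an ORF is kept when its bit score is at least -th times the best bit score for the same genome and gene family; this is one reading of the description above, not TMarSel's actual code. The function name filter_copies and the tuple layout are hypothetical.

```python
from collections import defaultdict


def filter_copies(rows, th=1.0):
    """Keep ORFs scoring at least th * (best score per genome and gene family).

    rows: iterable of (genome, orf, bit_score, gene_family) tuples.
    This is an illustrative assumption, not TMarSel's implementation.
    """
    rows = list(rows)
    best = defaultdict(float)
    # First pass: record the highest bit score for every (genome, family) pair.
    for genome, _orf, score, family in rows:
        key = (genome, family)
        best[key] = max(best[key], score)
    # Second pass: keep ORFs whose score reaches the scaled maximum.
    return [r for r in rows if r[2] >= th * best[(r[0], r[3])]]


rows = [
    ("G1", "G1_1", 100.0, "K1"),
    ("G1", "G1_2", 80.0, "K1"),
    ("G1", "G1_3", 50.0, "K2"),
]
print(filter_copies(rows, th=1.0))  # only the top-scoring ORF per family
print(filter_copies(rows, th=0.0))  # every ORF
```

Under this reading, ORFs tied for the highest bit score are all retained at -th 1.0.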

Outputs

  1. k files containing the ORFs, genome and file of origin for each marker (see below). Files are saved to ./output_dir/orfs.
orf genome file
G000006605_1748 G000006605 kofamscan_wol2_example.tsv
G000006725_378 G000006725 kofamscan_wol2_example.tsv
... ... ...
  2. Statistics. Files are saved to ./output_dir/statistics.
  • Number of markers per genome (see below). A given marker can contain more than one ORF per genome, so we provide both the number of different markers (at most k) and the total number of markers, counting every copy. The details column is semicolon-separated. Each item indicates the marker name and the number of ORFs (i.e. copies) in the genome.
genome number_of_different_markers total_number_of_markers details
G000006605 10 10 K01889:1;K01866:1; ...
G000006725 9 10 K02358:2;K01872:1; ...
... ... ... ...
  • Number of genomes per marker (see below). We provide the number of genomes containing the marker. The details column contains the genome name and the number of ORFs of the marker.
marker number_of_genomes details
K01409 1509 G000093065:2;G900097235:2;G002074035:2; ...
K01866 1508 G900097235:2;G001941465:2;G000006605:1; ...
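
As a minimal sketch of how the details column might be consumed downstream, the helper below splits one details string into a dictionary of copy counts. The name parse_details is hypothetical and not part of TMarSel; the format is assumed to be exactly the name:count items separated by ; shown above.

```python
def parse_details(details):
    """Turn 'K02358:2;K01872:1;' into {'K02358': 2, 'K01872': 1}."""
    counts = {}
    for item in details.split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece left by a trailing ';'
        # Split on the last ':' so names containing ':' would still parse.
        name, _, n = item.rpartition(":")
        counts[name] = int(n)
    return counts


copies = parse_details("K02358:2;K01872:1;")
print(len(copies), sum(copies.values()))
```

The two printed values correspond to the number of different markers and the total number of markers for that genome.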

Installation

  • pip
pip install TMarSel

Basic usage

 tmarsel -i input_file_or_dir -o output_dir

After installation, type tmarsel -h to learn all the options.

Examples

We provide multiple examples to showcase the usage of TMarSel. Data can be downloaded as explained in files.

1. Annotations of 1,510 genomes from the Web of Life 2 database

  • EggNOG annotations contained in a single file with three columns orf|bit_score|gene_family. See annotation for formatting the raw annotation files.
tmarsel \
    -i    data/wol2/emapper_wol2_example.tsv \
    -o    out/wol2 
  • KEGG annotations contained in a single file with three columns orf|bit_score|gene_family. See annotation for formatting the raw annotation files.
tmarsel \
    -i    data/wol2/kofamscan_wol2_example.tsv \
    -o    out/wol2 

2. Annotations of 793 metagenome-assembled genomes (MAGs) from the Earth Microbiome Project

  • EggNOG annotations contained in multiple files with three columns orf|bit_score|gene_family. See annotation for formatting the raw annotation files.
tmarsel \
    -i    data/emp/eggnog_format \
    -o    out/emp
  • EggNOG annotations contained in multiple files with raw annotations.
tmarsel \
    -i    data/emp/eggnog \
    -o    out/emp \
    -db   eggnog \
    -raw
  • KEGG annotations contained in multiple files with three columns orf|bit_score|gene_family. See annotation for formatting the raw annotation files.
tmarsel \
    -i    data/emp/kegg_format \
    -o    out/emp
  • KEGG annotations contained in multiple files with raw annotations.
tmarsel \
    -i    data/emp/kegg \
    -o    out/emp \
    -db   kegg \
    -raw
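
When running many of these commands from a script, it can help to assemble the argument list programmatically. The sketch below uses only the flags documented above; the helper name build_tmarsel_cmd is hypothetical, and the check that -raw is paired with -db mirrors the flag descriptions rather than any behavior confirmed in TMarSel's code.

```python
import shlex


def build_tmarsel_cmd(input_path, output_dir, k=None, th=None, db=None, raw=False):
    """Build a tmarsel argument list from the documented flags."""
    cmd = ["tmarsel", "-i", str(input_path), "-o", str(output_dir)]
    if k is not None:
        cmd += ["-k", str(k)]
    if th is not None:
        cmd += ["-th", str(th)]
    if raw:
        # The flag list states that -db is needed whenever -raw is set.
        if db not in ("eggnog", "kegg"):
            raise ValueError("-raw requires -db eggnog or -db kegg")
        cmd += ["-db", db, "-raw"]
    return cmd


cmd = build_tmarsel_cmd("data/emp/kegg", "out/emp", db="kegg", raw=True)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once TMarSel is installed.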

Citation

The current version of TMarSel is described in

  • x.x
