Skip to main content

select gene families as markers for microbial phylogenomics

Project description

TMarSel

TMarSel is a tool for a Tailored Selection of gene families as Markers for microbial phylogenomics.

Table of contents

Flags

  • -h Display help message.
  • -i [required] Either a single annotation of ORFs into gene families file OR a direcotry containing multiple annotation files. File(s) contains three columns: orf|bit_score|gene_family.
  • -o [required] Output directory to save the ORFs and statistics of each marker.
  • -raw [required] IF input file(s) contain raw annotations from functional databases, which contains multiple columns depending on the database.
  • -db [required] IF input contains the raw annotations set in -raw. Name of the database used for genome annotation (eggnog or kegg).
  • -k Number of markers to select (default is 50).
  • -min_markers Minimum number of markers per genome. Can be a percentage or a number (default is 1). Genomes with fewer markers than the indicated value are discarded.
  • -th Threshold for filtering copies of each gene family per genome (default is 1.0) Retain the ORFs within -th of the maximum bit score for each gene family and genome. Lower values (e.g. 0.0) retains all ORFs, whereas higher values (e.g. 1.0) retains only the ORF with the highest bit score.
  • -p Exponent of the power mean (cost function) used to select markers (default is 0.0). We recommend not changing this value unless you are familiar with the method. Default value yields the optimal combination of markers.

Outputs

  1. k files containing the ORFs, genome and file of origin for each marker (see below). Files are saved to ./output_dir/orfs.
orf genome file
G000006605_1748 G000006605 kofamscan_wol2_example.tsv
G000006725_378 G000006725 kofamscan_wol2_example.tsv
... ... ...
  1. Statistics. Files are saved to ./output_dir/statistics
  • Number of markers per genome (see below). A given marker can contain more than one ORF per genome, therefore we provide the number of different markers (k) and the total number of markers. The details column is ; separated. Each item indicates the marker name and the number of ORFs (i.e. copies) in the genome.
genome number_of_different_markers total_number of markers details
G000006605 10 10 K01889:1;K01866:1; ...
G000006725 9 10 K02358:2;K01872:1; ...
... ... ... ...
  • Number of genomes per marker (see below). We provide the number of genomes containing the marker. The details column contains the genome name and the number of ORFs of the marker.
marker number_of_genomes details
K01409 1509 G000093065:2;G900097235:2;G002074035:2; ...
K01866 1508 G900097235:2;G001941465:2;G000006605:1; ...

Installation

  • pip

Basic usage

 python TMarSel.py -i input_file_or_dir -o output_dir

After installation, type python TMarSel.py -h to learn all the options.

Examples

We provide multiple examples to showcase the usage of TMarSel. Data can be downloaded as explained in files.

1. Annotations of 1,510 genomes from the Web of Life 2 database

  • EggNOG annotations contained in a single file with three columns orf|bit_score|gene_family. See annotation for formating the raw annotation files.
python TMarSel/TMarSel.py \
    -i    data/wol2/emapper_wol2_example.tsv \
    -o    out/wol2 
  • KEGG annotations contained in a single file with three columns orf|bit_score|gene_family. See annotation for formating the raw annotation files.
python TMarSel/TMarSel.py \
    -i    data/wol2/kofamscan_wol2_example.tsv \
    -o    out/wol2 

2. Annotations of 793 metagenome-assembled genomes (MAGs) from the Earth Microbiome Project

  • EggNOG annotations contained multiple files with three columns orf|bit_score|gene_family. See annotation for formating the raw annotation files.
python TMarSel/TMarSel.py \
    -i data/emp/eggnog_format \
    -o        out/emp
  • EggNOG annotations contained in multiple files with raw annotations.
python TMarSel/TMarSel.py \
    -i data/emp/eggnog \
    -o        out/emp \
    -db          eggnog \
    -raw
  • KEGG annotations contained in multiple files with three columns orf|bit_score|gene_family. See annotation for formating the raw annotation files.
python TMarSel/TMarSel.py \
    -i data/emp/kegg_format \
    -o        out/emp
  • KEGG annotations contained in multiple files with raw annotations.
python TMarSel/TMarSel.py \
    -i data/emp/kegg \
    -o        out/emp \
    -db          kegg \
    -raw

Citation

The current version of TMarSel is described in

  • x.x

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tmarsel-0.1.0.tar.gz (9.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tmarsel-0.1.0-py3-none-any.whl (10.6 kB view details)

Uploaded Python 3

File details

Details for the file tmarsel-0.1.0.tar.gz.

File metadata

  • Download URL: tmarsel-0.1.0.tar.gz
  • Upload date:
  • Size: 9.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for tmarsel-0.1.0.tar.gz
Algorithm Hash digest
SHA256 9c4d0d25b1b1189cdeffaaecb4f6a7a41be440b0925a709c1f8053ca2e48382b
MD5 15b7e6b6b58c4f7e9d7bca93ebdf2b1c
BLAKE2b-256 af87c2ab38e174bc817d317c6b2d0972fe20e8cc77585d6ec3b2cde7e56ffd97

See more details on using hashes here.

File details

Details for the file tmarsel-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: tmarsel-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 10.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for tmarsel-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4938cb66c502e8509d1a387ff5ea046fbf2fb141c74c8b1274eb45a4fc2d43df
MD5 9a79724dca4de9fe83ce749f3759a95c
BLAKE2b-256 9677ba78aa8afef4da76bada016bfa0f8abacdd005222f3867a11c1f34cd3d13

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page