TMarSel
TMarSel is a tool for a Tailored Selection of gene families as Markers for microbial phylogenomics.
Flags
- -h: Display the help message.
- -i [required]: Either a single file annotating ORFs into gene families or a directory containing multiple annotation files. Unless -raw is set, each file contains three columns: orf|bit_score|gene_family.
- -o [required]: Output directory where the ORFs and statistics of each marker are saved.
- -raw [required if the input is raw]: Indicates that the input file(s) contain raw annotations from functional databases, whose columns depend on the database.
- -db [required with -raw]: Name of the database used for genome annotation (eggnog or kegg).
- -k: Number of markers to select (default is 50).
- -min_markers: Minimum number of markers per genome, given as a percentage or a number (default is 1). Genomes with fewer markers than this value are discarded.
- -th: Threshold for filtering copies of each gene family per genome (default is 1.0). Retains the ORFs whose bit score is within -th of the maximum bit score for each gene family and genome. Lower values (e.g. 0.0) retain all ORFs, whereas higher values (e.g. 1.0) retain only the ORF with the highest bit score.
- -p: Exponent of the power mean (cost function) used to select markers (default is 0.0). We recommend not changing this value unless you are familiar with the method; the default value yields the optimal combination of markers.
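The -p flag sets the exponent of a power mean used as the cost function. Which quantities TMarSel averages is not described here, so the sketch below only illustrates the standard power (generalized) mean, M_p = ((1/n) Σ x_i^p)^(1/p), and its p = 0 limit, the geometric mean, which is what the default -p 0.0 corresponds to under the usual definition. The function name power_mean is hypothetical and not part of TMarSel.

```python
import math
from typing import Sequence


def power_mean(values: Sequence[float], p: float) -> float:
    """Standard power (generalized) mean M_p of strictly positive values.

    M_p = ((1/n) * sum(x_i ** p)) ** (1/p) for p != 0;
    the p -> 0 limit is the geometric mean exp(mean(log x_i)).
    """
    if not values:
        raise ValueError("values must be non-empty")
    if any(v <= 0 for v in values):
        raise ValueError("values must be strictly positive")
    n = len(values)
    if p == 0:
        # Limit p -> 0: geometric mean, computed in log space for stability.
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


if __name__ == "__main__":
    x = [1.0, 4.0, 16.0]
    print(power_mean(x, 0.0))   # geometric mean
    print(power_mean(x, 1.0))   # arithmetic mean
    print(power_mean(x, -1.0))  # harmonic mean
```

For positive values the power mean never decreases as p grows, so lower exponents weight the smallest values more heavily.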
Outputs
- k files, one per marker, containing the ORFs together with their genome and file of origin (see below). Files are saved to ./output_dir/orfs.
| orf | genome | file |
|---|---|---|
| G000006605_1748 | G000006605 | kofamscan_wol2_example.tsv |
| G000006725_378 | G000006725 | kofamscan_wol2_example.tsv |
| ... | ... | ... |
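The ORF files list, for each marker, the ORFs kept after copy filtering together with their genome and source file. A minimal sketch of that step, assuming that -th keeps ORFs whose bit score is at least th times the best score for the same genome and gene family (consistent with 0.0 keeping every copy and 1.0 keeping only the top one), and that the genome ID is the ORF ID up to its last underscore, as in the example rows. Hit, genome_of, filter_copies and marker_rows are hypothetical names, not TMarSel's API.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    orf: str          # e.g. "G000006605_1748"
    bit_score: float
    gene_family: str  # e.g. "K01889"
    file: str         # annotation file the hit was read from


def genome_of(orf: str) -> str:
    # Assumption: genome ID = ORF ID up to the last underscore,
    # as in the example rows (G000006605_1748 -> G000006605).
    return orf.rsplit("_", 1)[0]


def filter_copies(hits: list[Hit], th: float = 1.0) -> list[Hit]:
    """Keep hits scoring >= th * best bit score of their (genome, gene family).

    Assumed reading of -th: 0.0 keeps every copy, 1.0 keeps only the
    top-scoring copy (and exact ties).
    """
    best: dict[tuple[str, str], float] = {}
    for h in hits:
        key = (genome_of(h.orf), h.gene_family)
        best[key] = max(best.get(key, h.bit_score), h.bit_score)
    return [h for h in hits
            if h.bit_score >= th * best[(genome_of(h.orf), h.gene_family)]]


def marker_rows(hits: list[Hit], markers: set[str]) -> dict[str, list[tuple[str, str, str]]]:
    """Group (orf, genome, file) rows by selected marker, one list per output file."""
    rows: dict[str, list[tuple[str, str, str]]] = defaultdict(list)
    for h in hits:
        if h.gene_family in markers:
            rows[h.gene_family].append((h.orf, genome_of(h.orf), h.file))
    return dict(rows)


if __name__ == "__main__":
    hits = [
        Hit("G1_1", 200.0, "K01889", "a.tsv"),
        Hit("G1_7", 150.0, "K01889", "a.tsv"),  # second, weaker copy in G1
        Hit("G2_3", 90.0, "K01889", "b.tsv"),
    ]
    kept = filter_copies(hits, th=0.9)  # 150 < 0.9 * 200, so G1_7 is dropped
    print(marker_rows(kept, {"K01889"}))
```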
- Statistics. Files are saved to ./output_dir/statistics
- Number of markers per genome (see below). A given marker can have more than one ORF per genome, so we report both the number of different markers (at most k) and the total number of markers. The details column is ;-separated; each item gives the marker name and the number of ORFs (i.e. copies) in the genome.
| genome | number_of_different_markers | total_number_of_markers | details |
|---|---|---|---|
| G000006605 | 10 | 10 | K01889:1;K01866:1; ... |
| G000006725 | 9 | 10 | K02358:2;K01872:1; ... |
| ... | ... | ... | ... |
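A sketch of how the per-genome table could be derived from (genome, marker) pairs, one pair per retained ORF, together with a -min_markers cut-off. It assumes the cut-off is compared with the number of different markers and that a percentage is written with a trailing % and taken relative to k; neither detail is documented here. genome_stats and keep_genomes are hypothetical names.

```python
from collections import Counter, defaultdict


def genome_stats(rows: list[tuple[str, str]]) -> dict[str, tuple[int, int, str]]:
    """Map genome -> (number_of_different_markers, total_number_of_markers, details).

    rows holds one (genome, marker) pair per retained ORF.
    """
    per_genome: dict[str, Counter] = defaultdict(Counter)
    for genome, marker in rows:
        per_genome[genome][marker] += 1
    stats = {}
    for genome, counts in per_genome.items():
        # details: "marker:copies" items joined by ";" in first-seen order
        details = ";".join(f"{m}:{n}" for m, n in counts.items())
        stats[genome] = (len(counts), sum(counts.values()), details)
    return stats


def keep_genomes(stats: dict[str, tuple[int, int, str]], min_markers: str, k: int) -> set[str]:
    """Genomes passing a hypothetical -min_markers cut-off.

    Assumptions: "20%" means 20% of k; the cut-off is compared with
    the number of different markers.
    """
    if min_markers.endswith("%"):
        cutoff = float(min_markers[:-1]) / 100 * k
    else:
        cutoff = float(min_markers)
    return {g for g, (n_diff, _total, _details) in stats.items() if n_diff >= cutoff}


if __name__ == "__main__":
    pairs = [("G1", "K01889"), ("G1", "K01866"),
             ("G2", "K02358"), ("G2", "K02358"), ("G2", "K01872")]
    stats = genome_stats(pairs)
    print(stats)
    print(keep_genomes(stats, "20%", k=10))
```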
- Number of genomes per marker (see below). We report the number of genomes containing each marker. The details column is also ;-separated; each item gives a genome name and the number of ORFs of the marker in that genome.
| marker | number_of_genomes | details |
|---|---|---|
| K01409 | 1509 | G000093065:2;G900097235:2;G002074035:2; ... |
| K01866 | 1508 | G900097235:2;G001941465:2;G000006605:1; ... |
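The per-marker table counts the same ORFs the other way round. In this sketch, markers are ordered by decreasing number of genomes and, within details, genomes by decreasing number of copies; this matches the example rows but is an assumption about TMarSel's output. marker_stats is a hypothetical name.

```python
from collections import Counter, defaultdict


def marker_stats(rows: list[tuple[str, str]]) -> list[tuple[str, int, str]]:
    """Return (marker, number_of_genomes, details) rows.

    rows holds one (genome, marker) pair per retained ORF. Markers found in
    more genomes come first; within details, genomes with more copies come first.
    """
    per_marker: dict[str, Counter] = defaultdict(Counter)
    for genome, marker in rows:
        per_marker[marker][genome] += 1
    table = []
    for marker, counts in per_marker.items():
        # most_common() orders by copy number; ties keep first-seen order
        details = ";".join(f"{g}:{n}" for g, n in counts.most_common())
        table.append((marker, len(counts), details))
    # sort is stable, so markers with equal genome counts keep first-seen order
    table.sort(key=lambda row: row[1], reverse=True)
    return table


if __name__ == "__main__":
    pairs = [("G1", "K01866"), ("G1", "K01409"), ("G1", "K01409"), ("G2", "K01409")]
    for row in marker_stats(pairs):
        print(row)
```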
Installation
- pip
pip install TMarSel
Basic usage
tmarsel -i input_file_or_dir -o output_dir
After installation, run tmarsel -h to see all the options.
Examples
We provide multiple examples to showcase the usage of TMarSel. Data can be downloaded as explained in files.
1. Annotations of 1,510 genomes from the Web of Life 2 database
- EggNOG annotations contained in a single file with three columns orf|bit_score|gene_family. See annotation for formatting the raw annotation files.
tmarsel \
-i data/wol2/emapper_wol2_example.tsv \
-o out/wol2
- KEGG annotations contained in a single file with three columns orf|bit_score|gene_family. See annotation for formatting the raw annotation files.
tmarsel \
-i data/wol2/kofamscan_wol2_example.tsv \
-o out/wol2
2. Annotations of 793 metagenome-assembled genomes (MAGs) from the Earth Microbiome Project
- EggNOG annotations contained in multiple files with three columns orf|bit_score|gene_family. See annotation for formatting the raw annotation files.
tmarsel \
-i data/emp/eggnog_format \
-o out/emp
- EggNOG annotations contained in multiple files with raw annotations.
tmarsel \
-i data/emp/eggnog \
-o out/emp \
-db eggnog \
-raw
- KEGG annotations contained in multiple files with three columns orf|bit_score|gene_family. See annotation for formatting the raw annotation files.
tmarsel \
-i data/emp/kegg_format \
-o out/emp
- KEGG annotations contained in multiple files with raw annotations.
tmarsel \
-i data/emp/kegg \
-o out/emp \
-db kegg \
-raw
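Every example passes -i either a single file or a directory of files. A minimal sketch of reading the three-column format from either, keeping the file name so the file of origin can be reported for each ORF. It assumes tab-separated columns with no header line and treats every regular file in a directory as an annotation file; TMarSel's actual parser may differ. load_annotations is a hypothetical name.

```python
from __future__ import annotations

import csv
from pathlib import Path


def load_annotations(path: str | Path) -> list[tuple[str, float, str, str]]:
    """Read orf / bit_score / gene_family rows from one file or a directory.

    Returns (orf, bit_score, gene_family, file_name) tuples so the file of
    origin of every ORF can be reported later.
    Assumptions: tab-separated columns, no header line, and every regular
    file inside a directory is an annotation file.
    """
    path = Path(path)
    if path.is_dir():
        files = sorted(p for p in path.iterdir() if p.is_file())
    else:
        files = [path]
    records = []
    for f in files:
        with f.open(newline="") as handle:
            for row in csv.reader(handle, delimiter="\t"):
                if not row:
                    continue  # skip blank lines
                orf, bit_score, gene_family = row[:3]
                records.append((orf, float(bit_score), gene_family, f.name))
    return records
```

Under these assumptions, load_annotations("data/emp/eggnog_format") would return the records of every file in that directory, while a single-file path such as data/wol2/kofamscan_wol2_example.tsv would be read on its own.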
Citation
The current version of TMarSel is described in
- x.x