A small Python package to trace orthology neighborhoods across feature files
Vicinator
What is Vicinator for?
Vicinator visualizes the microsynteny of grouped proteins (e.g. orthologs) across a large collection of genomes. As input, it requires a mapping of the genomes' proteins to the respective protein groups and a directory containing the genomes' feature files, i.e. files of the format *.gff or *_feature_table.txt.
What is Vicinator not for?
As stated above, Vicinator relies on a pre-computed grouping of proteins across genomes. It cannot find these groups of genes for you.
Installation
Vicinator is written for Python 3.6+
It is recommended to install Vicinator inside a virtual environment, e.g. with venv:
python3 -m venv myenv
source myenv/bin/activate
The first command creates a new environment called myenv and the second activates it (on Linux or macOS). While the environment is active, you can install Vicinator via pip. The following command installs the latest version and all unmet requirements automatically.
pip install --upgrade vicinator
Requirements:
- ansi2html>=1.5.2
- colorama>=0.4.4
- ete3>=3.1.2
- pandas>=1.1.3
- importlib-metadata>=3.1.1
- setuptools-scm>=5.0.1
Options
python3 vicinator/vicinator.py --help
usage: Vicinator [-h] --tabular-ortholog-groups <orthology_table>
--feat-tables-dir <dir_path> --reference <file_path>
--centerprotein-accession <str> --extension-size <int>
[--tree <newick_tree_file_path>] [--outdir <dir_path>]
[--prefix <str>] [--outputlabel-map <file_path>]
[--nprocs <int>] [--force] [--version]
Track Microsynteny of target proteins and its orthologs across genomes.
required arguments:
--tabular-ortholog-groups <orthology_table>
path to mapping file with format
ortholog_group_id<tab>genome_id<tab>protein_seq_id
--feat-tables-dir <dir_path>
path to directory of *.feature_tables.txt or *.gff3
files that shall be screen
required arguments (neighborhood):
--reference <file_path>
path to a ncbi style feature table file that acts as a
reference
--centerprotein-accession <str>
unique identifier of the central gene of the window
--extension-size <int>
defines the #features that are co-checked to the left
and right of the centerprotein
optional arguments (output):
--tree <newick_tree_file_path>
path to newick tree that includes all taxa to be
screened
--outdir <dir_path> path to desired output directory
--prefix <str> if option is set, shows intergenic distances of genes
surrounding the center gene
--outputlabel-map <file_path>
Attempts to replace genome accessions in the outputs
with a replacement string. Requires a two-column map
file formatted like so: 'genome file accession' <tab>
'replacement string'
optional arguments (run):
--nprocs <int> Number of CPUs for parallel processing of genomes.
Default: Number of CPUs-1
--force if option is set, existing ortholog databases in the
output dir are ignored and will be overwritten
Input: Required Arguments
--tabular-ortholog-groups <orthology_table>
Vicinator requires a tab-separated three-column mapping of orthologs that is formatted like so:
group_id<tab>genome_id<tab>protein_id
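A minimal sketch of reading such a mapping with Python's standard csv module, assuming a header-less file with exactly three tab-separated columns. The function name read_ortholog_map and the nested-dictionary layout are illustrative choices, not part of Vicinator's API.

```python
import csv
import io
from collections import defaultdict


def read_ortholog_map(handle):
    """Parse group_id<tab>genome_id<tab>protein_id lines.

    Returns {genome_id: {protein_id: group_id}}.
    Hypothetical helper, not Vicinator's own parser.
    """
    mapping = defaultdict(dict)
    for row in csv.reader(handle, delimiter="\t"):
        if not row:  # skip blank lines
            continue
        if len(row) != 3:
            raise ValueError(f"expected 3 columns, got {len(row)}: {row!r}")
        group_id, genome_id, protein_id = row
        mapping[genome_id][protein_id] = group_id
    return dict(mapping)


# Made-up example content in the expected format.
example = io.StringIO(
    "OG_1\tgenomeA\tprotein_A001\n"
    "OG_2\tgenomeB\tprotein_X011\n"
)
ortholog_map = read_ortholog_map(example)
print(ortholog_map["genomeB"]["protein_X011"])
```

Rejecting rows that do not have three columns catches a file saved with spaces instead of tabs before it is passed to Vicinator.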
--feat-tables-dir <dir_path>
Vicinator expects the path to a directory containing the .gff or _feature_table.txt files of all genomes in which you want to trace the microsynteny.
A recommended source for these files is NCBI RefSeq. In order for the mapping to work, the filenames should correspond to the genome_ids specified in the mapping file:
E.g. line 7: OG_2 genomeB protein_X011
triggers a search for a feature file named genomeB.gff, genomeB_genomic.gff or genomeB_feature_table.txt in the directory specified with --feat-tables-dir. Vicinator then tries to locate protein_X011 in this feature file.
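The filename convention can be checked before a run with a short script. This is a sketch only: the helper find_feature_file is hypothetical, and the order in which the three suffixes are tried is an assumption, not necessarily the order Vicinator uses internally.

```python
import tempfile
from pathlib import Path

# Suffixes taken from the naming patterns described above.
# The order of precedence is an assumption.
CANDIDATE_SUFFIXES = (".gff", "_genomic.gff", "_feature_table.txt")


def find_feature_file(feat_dir, genome_id):
    """Return the first existing candidate file for genome_id, or None."""
    for suffix in CANDIDATE_SUFFIXES:
        candidate = Path(feat_dir) / f"{genome_id}{suffix}"
        if candidate.is_file():
            return candidate
    return None


# Demonstration with a throwaway directory and a made-up genome_id.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "genomeB_genomic.gff").write_text("##gff-version 3\n")
    found = find_feature_file(tmp, "genomeB")
    found_name = found.name if found else None
    missing = find_feature_file(tmp, "genomeC")

print(found_name)
print(missing)
```

Running such a check over every genome_id in the mapping file shows which genomes lack a matching feature file.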
--reference <file_path>
The path to a reference genome feature file in which the center-protein accession must be found.
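--centerprotein-accession <str> & --extension-size <int>
Together these define the window of vicinity that is traced. The accession identifies the center protein in the reference genome, and the extension size sets how many features to the left and to the right of it are included.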
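The window idea can be illustrated on a plain list of protein accessions. The sketch assumes the features of one contig are already sorted by genomic position and that the window is simply truncated at the contig ends; both are assumptions for illustration, and neighborhood_window is a hypothetical helper rather than Vicinator code.

```python
def neighborhood_window(ordered_ids, center_id, extension_size):
    """Return up to extension_size features on each side of center_id.

    ordered_ids: protein accessions sorted by genomic position (assumed).
    The window is truncated at either end of the list.
    """
    if extension_size < 0:
        raise ValueError("extension_size must be non-negative")
    idx = ordered_ids.index(center_id)  # raises ValueError if absent
    start = max(0, idx - extension_size)
    stop = idx + extension_size + 1
    return ordered_ids[start:stop]


# Made-up accessions standing in for the features of one contig.
features = ["p1", "p2", "p3", "p4", "p5", "p6", "p7"]

window = neighborhood_window(features, "p4", 2)
edge = neighborhood_window(features, "p1", 2)

print(window)  # ['p2', 'p3', 'p4', 'p5', 'p6']
print(edge)    # ['p1', 'p2', 'p3']
```

With --extension-size 3, as in the usage examples, a full window under these assumptions would span seven features: the center protein plus three on each side.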
Example Basic Usage
vicinator --tabular-ortholog-groups orthogenome_map.tsv --feat-tables-dir ./gff_dir --outdir ./results --reference gff_dir/MUSMU@10090@1.gff --centerprotein-accession XP_006539605.1 --extension-size 3
Example Advanced Usage
When Vicinator receives a phylogenetic tree (with genome_ids as leaf labels), it traces the microsynteny in order of increasing phylogenetic distance to the specified reference genome.
vicinator --tabular-ortholog-groups orthogenome_map.tsv --feat-tables-dir ./gff_dir --outdir ./results --reference gff_dir/MUSMU@10090@1.gff --centerprotein-accession XP_006539605.1 --extension-size 3 --tree phylogeny.nwk