A tool to analyse gene family evolution from orthoxml
WARNING:
Family-Analyzer is outdated and has been replaced by pyHam, available at https://github.com/DessimozLab/pyham.
Family-Analyzer: summarize gene family evolution from orthoxml
Motivation
Family-Analyzer is a tool to further analyze the hierarchical orthologous groups (HOGs) stored in an orthoXML file. More information on the orthoXML schema, along with some examples, is available at http://orthoxml.org.
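To give an idea of the input, here is a minimal sketch that reads a tiny hierarchical orthoXML document with Python's standard library. It is not how Family-Analyzer parses its input (the package uses lxml); the embedded document, the database name and the gene identifiers are made up for illustration, and the namespace is assumed to be the one used by orthoXML version 0.3.

```python
import xml.etree.ElementTree as ET

# Namespace of orthoXML 0.3 documents (assumption: the input uses this version).
NS = {"o": "http://orthoXML.org/2011/"}

# Hand-written toy document, not real data.
EXAMPLE = """
<orthoXML xmlns="http://orthoXML.org/2011/" version="0.3" origin="example" originVersion="1">
  <species name="HUMAN" NCBITaxId="9606">
    <database name="exampleDB" version="1">
      <genes>
        <gene id="1" protId="HUMAN_A"/>
        <gene id="2" protId="HUMAN_B"/>
      </genes>
    </database>
  </species>
  <species name="MOUSE" NCBITaxId="10090">
    <database name="exampleDB" version="1">
      <genes>
        <gene id="3" protId="MOUSE_A"/>
      </genes>
    </database>
  </species>
  <groups>
    <orthologGroup id="1">
      <property name="TaxRange" value="Euarchontoglires"/>
      <geneRef id="3"/>
      <paralogGroup>
        <geneRef id="1"/>
        <geneRef id="2"/>
      </paralogGroup>
    </orthologGroup>
  </groups>
</orthoXML>
""".strip()


def summarize_orthoxml(xml_text):
    """Return (genes per species, taxonomic levels, members of top-level groups)."""
    root = ET.fromstring(xml_text)

    # Map each species name to the protId cross-references of its genes.
    genes = {}
    for sp in root.findall("o:species", NS):
        genes[sp.get("name")] = [g.get("protId") for g in sp.iterfind(".//o:gene", NS)]

    # Taxonomic levels are stored as TaxRange properties on orthologGroup elements.
    levels = sorted(
        {p.get("value") for p in root.iterfind(".//o:property[@name='TaxRange']", NS)}
    )

    # Collect the internal gene ids referenced anywhere below each top-level group.
    members = {
        grp.get("id"): [ref.get("id") for ref in grp.iterfind(".//o:geneRef", NS)]
        for grp in root.findall("o:groups/o:orthologGroup", NS)
    }
    return genes, levels, members


genes, levels, members = summarize_orthoxml(EXAMPLE)
```

Here the paralogGroup nested in the orthologGroup encodes a duplication on the human side, which is the kind of event the summary reports.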
Family-Analyzer reports a summary of the evolutionary history of the gene families. The summary is given with respect to one or two taxonomic levels: with one level, it reports what happened after that level; with two levels, what happened between them. In both cases it lists which genes were maintained, lost, duplicated, or gained in that period.
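The four outcomes can be illustrated with a toy classification. This is only a sketch of the categories, assuming each family is reduced to two hypothetical values (whether it is present at the older level, and how many copies a species carries afterwards); it is not the reconciliation that Family-Analyzer itself performs.

```python
from collections import Counter


def classify(present_at_older_level, copies_at_younger_level):
    """Assign one of the summary categories to a single family (toy rule)."""
    if not present_at_older_level:
        # The family did not exist yet: any copy seen later was gained.
        return "gained" if copies_at_younger_level > 0 else "absent"
    if copies_at_younger_level == 0:
        return "lost"
    if copies_at_younger_level == 1:
        return "maintained"
    return "duplicated"


def summarize(families):
    """Count how many families fall into each category."""
    return Counter(classify(present, copies) for present, copies in families.values())


# Hypothetical input: family name -> (present at older level, copies afterwards).
families = {
    "fam1": (True, 1),
    "fam2": (True, 0),
    "fam3": (True, 3),
    "fam4": (False, 1),
}
summary = summarize(families)
```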
Installation
Family-Analyzer is written in Python 3 and has few external dependencies, currently only the lxml library. The setup script should resolve these dependencies automatically. Consider using pip to install the package directly from a checked-out git repository:
pip install -e </path/to/family-analyzer-repo/>
Running Family-Analyzer
Running Family-Analyzer on a specific dataset is relatively easy. The main entry point is the ‘main’ section in familyanalyzer/familyanalyzer.py.
If this script is called with -h as argument, it prints a short description of the required and optional arguments and what they are used for. The usage output at the time of writing is shown below. Since this is still work in progress, check that the current usage has not changed.
    python familyanalyzer/familyanalyzer.py -h

    usage: familyanalyzer.py [-h] [--xreftag XREFTAG] [--show_levels] [-r]
                             [--taxonomy TAXONOMY] [--propagate_top]
                             [--show_taxonomy]
                             [--store_augmented_xml STORE_AUGMENTED_XML]
                             [--compare_second_level COMPARE_SECOND_LEVEL]
                             orthoxml level species [species ...]

    Analyze Hierarchical OrthoXML families.

    positional arguments:
      orthoxml              path to orthoxml file to be analyzed
      level                 taxonomic level at which analysis should be done
      species               (list of) species to be analyzed. Note that only genes
                            of the selected species are reported. In order for the
                            output to make sense, the selected species all must be
                            part of the linages specified in 'level' (and
                            --compare_second_level).

    optional arguments:
      -h, --help            show this help message and exit
      --xreftag XREFTAG     xref tag of genes to report. OrthoXML allows to store
                            multiple ids and xref annotations per gene as
                            attributes in the species section. If not set, the
                            internal (purely numerical) ids are reported.
      --show_levels         print the levels and species found in the orthoXML
                            file and quit
      -r, --use-recursion   DEPRECATED: Use recursion to sample families that are
                            a subset of the query
      --taxonomy TAXONOMY   Taxonomy used to reconstruct intermediate levels. Has
                            to be either 'implicit' (default) or a path to a file
                            in Newick format. The taxonomy might be
                            multifurcating. If set to 'implicit', the taxonomy is
                            extracted from the input OrthoXML file. The orthoXML
                            level do not have to cover all the levels for all
                            families. In order to infer gene losses Family-
                            Analyzer needs to infer these skipped levels and
                            reconcile each family with the complete taxonomy.
      --propagate_top       propagate taxonomy levels up to the toplevel. As an
                            illustration, consider a gene family in an eukaryotic
                            analysis that has only mammalian genes. Its topmost
                            taxonomic level will therefor be 'Mammalia' and an
                            ancestral gene was gained at that level. However, if
                            '--propagete-top' is set, the family is assumed to
                            have already be present in the topmost taxonomic
                            level, i.e. Eukaryota in this example, and non-
                            mammalian species have all lost this gene.
      --show_taxonomy       write the taxonomy used to standard out.
      --store_augmented_xml STORE_AUGMENTED_XML
                            filename to which the input orthoxml file with
                            augmented annotations is written. The augmented
                            annotations include for example the additional
                            taxonomic levels of orthologGroup and unique HOG IDs.
      --compare_second_level COMPARE_SECOND_LEVEL
                            Compare secondary level with primary one, i.e. report
                            what happend between the secondary and primary level
                            to the individual histories. Note that the Second
                            level needs to be younger than the primary.
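As a usage illustration, the following sketch assembles a command line from the options listed in the help text. The helper name `build_command`, the input file name, and the level and species names are hypothetical; only the flags themselves come from the usage output, and they may have changed in newer versions.

```python
import sys


def build_command(orthoxml, level, species, xreftag=None, taxonomy=None,
                  propagate_top=False, compare_second_level=None,
                  script="familyanalyzer/familyanalyzer.py"):
    """Return an argument list for running the script (hypothetical helper)."""
    cmd = [sys.executable, script]
    if xreftag is not None:
        cmd += ["--xreftag", xreftag]
    if taxonomy is not None:
        # Either 'implicit' or a path to a Newick file, per the help text.
        cmd += ["--taxonomy", taxonomy]
    if propagate_top:
        cmd.append("--propagate_top")
    if compare_second_level is not None:
        cmd += ["--compare_second_level", compare_second_level]
    # Positional arguments: input file, level, then one or more species.
    cmd += [orthoxml, level, *species]
    return cmd


# Hypothetical file, level and species names.
cmd = build_command(
    "families.orthoxml", "Mammalia", ["HUMAN", "MOUSE"],
    xreftag="protId", compare_second_level="Euarchontoglires",
)
# The command could then be executed with subprocess.run(cmd, check=True).
```

In this example the second level, Euarchontoglires, is younger than the primary level, Mammalia, as the help text requires.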
Code organisation
- OrthoXMLParser: class which holds the orthoxml file, gives access to its
data, and keeps internal mappings to speed up lookups.
- Taxonomy: class which provides basic navigation through the species taxonomy.
Objects are constructed using the TaxonomyFactory and can be based either on the orthoxml or on a Newick tree.
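To illustrate what navigating a Newick-based taxonomy involves, here is a minimal standalone sketch. It does not use or reproduce the package's Taxonomy or TaxonomyFactory classes; the function names and the example tree are assumptions, and the parser only handles plain labels with optional branch lengths (no quoting or comments).

```python
def parse_newick(text):
    """Parse a Newick string with labeled internal nodes.

    Returns (root, parent, children), where parent maps a node to its parent
    and children maps a node to the list of its child nodes.
    """
    text = text.strip().rstrip(";")
    parent, children = {}, {}
    pos = 0

    def read_node():
        nonlocal pos
        kids = []
        if text[pos] == "(":
            pos += 1  # consume '('
            while True:
                kids.append(read_node())
                if text[pos] == ",":
                    pos += 1  # another sibling follows (multifurcations allowed)
                    continue
                pos += 1  # consume ')'
                break
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        # Drop an optional ':branch_length' suffix from the label.
        name = text[start:pos].split(":")[0].strip()
        children[name] = kids
        for kid in kids:
            parent[kid] = name
        return name

    root = read_node()
    return root, parent, children


def lineage(node, parent):
    """Return the path from a node up to the root."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def leaves_below(node, children):
    """Return all species (leaves) below a taxonomic level."""
    if not children[node]:
        return [node]
    found = []
    for child in children[node]:
        found.extend(leaves_below(child, children))
    return found


# Hypothetical taxonomy for illustration.
root, parent, children = parse_newick("((HUMAN,MOUSE)Euarchontoglires,CANFA)Mammalia;")
human_lineage = lineage("HUMAN", parent)
mammals = leaves_below("Mammalia", children)
```

A check such as `"HUMAN" in leaves_below(level, children)` mirrors the requirement that the selected species belong to the lineage named by the level argument.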