Prediction of lineage-specific gain and loss of sequence elements using phylogenetic maximum parsimony.
mapGL
Prediction of lineage-specific gain and loss of genomic sequence elements based on phylogenetic maximum parsimony.
Label genomic regions as orthologous, gained in the query species, or lost in the target species, based on inferred presence/absence in the most-recent common ancestor (MRCA). Chained alignment files are used to map features from the query to the target and to one or more outgroup species. Features that map directly from query to target are labeled as orthologs, and their orthologous coordinates in the target species are given in the output. Non-mapping features are assigned as gains or losses by a maximum-parsimony algorithm that predicts presence or absence in the MRCA.
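To make the labeling logic concrete, here is a minimal sketch of Fitch-style maximum parsimony on presence/absence states. It is an illustration under stated assumptions, not mapGL's actual code: the nested-tuple tree encoding, the function names, and the species names are all hypothetical, and the tree is assumed to be strictly binary. The `priority` argument mirrors the behavior described for the `--priority` option under Options.

```python
# Illustrative sketch only; names and tree encoding are hypothetical.
# A tree is a nested 2-tuple of species names, e.g. (("human", "mouse"), "dog").
# leaf_states maps each species to 1 (element present) or 0 (absent).

def fitch_sets(node, leaf_states):
    """Bottom-up Fitch pass: set of equally parsimonious states at `node`."""
    if isinstance(node, str):                      # leaf: observed state
        return {leaf_states[node]}
    left, right = (fitch_sets(child, leaf_states) for child in node)
    common = left & right
    return common if common else left | right     # intersection, else union

def contains(node, name):
    """True if the subtree rooted at `node` includes the leaf `name`."""
    if isinstance(node, str):
        return node == name
    return any(contains(child, name) for child in node)

def mrca_state(node, leaf_states, query, target, priority="gain", parent_state=None):
    """Top-down pass from the root to the MRCA of query and target."""
    states = fitch_sets(node, leaf_states)
    if parent_state is not None and parent_state in states:
        state = parent_state                       # inherit from parent when allowed
    elif len(states) == 1:
        state = next(iter(states))
    else:                                          # ambiguous root: apply priority
        state = 0 if priority == "gain" else 1
    for child in node:
        if (not isinstance(child, str)
                and contains(child, query) and contains(child, target)):
            return mrca_state(child, leaf_states, query, target, priority, state)
    return state                                   # this node is the MRCA

def label(tree, leaf_states, query, target, priority="gain"):
    """Return 'ortholog', 'gain_<query>', or 'loss_<target>'."""
    if leaf_states[query] and leaf_states[target]:
        return "ortholog"
    ancestral = mrca_state(tree, leaf_states, query, target, priority)
    return f"loss_{target}" if ancestral else f"gain_{query}"

tree = (("human", "mouse"), "dog")
print(label(tree, {"human": 1, "mouse": 0, "dog": 1}, "human", "mouse"))  # loss_mouse
print(label(tree, {"human": 1, "mouse": 0, "dog": 0}, "human", "mouse"))  # gain_human
```

In the first call the outgroup carries the element, so the most parsimonious history places it in the MRCA and the absence in the target reads as a loss. In the second, the outgroup lacks it, so presence in the query reads as a gain.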
Based on bnMapper.py, by Ogert Denas (James Taylor lab):
- https://github.com/bxlab/bx-python/blob/master/scripts/bnMapper.py
- https://travis-ci.org/bxlab/bx-python
Dependencies
numpy bx-python cython six
Installation
We recommend installing with conda, into a new environment:
conda create -n mapGL --channel conda-forge --channel bioconda python=3.7 numpy bx-python cython six mapGL
conda activate mapGL
To install within an existing conda environment:
conda install -c bioconda mapgl
Install with pip:
pip install mapGL
Installation from the GitHub repository is not recommended. However, if you must, follow the steps below:
- Install all dependencies listed above
- Clone the repository
- Add the mapGL/map_GL directory to your PATH and to your Python module search path
Usage
mapGL.py [-h] [-o FILE] [-t FLOAT] [-g GAP] [-v {info,debug,silent}] [-k] input tree qname tname alignments [alignments ...]
Required Arguments
Argument | Description |
---|---|
input | Input regions to process. Should be in standard bed format. Only the first four bed fields will be used. |
tree | Phylogenetic tree describing relationships of query and target species to outgroups. Must be in standard Newick format. Branch lengths are optional, and will be ignored. |
qname | Name of the query species. Regions from this species will be mapped to target species coordinates. |
tname | Name of the target species. Regions from the query species will be mapped to coordinates from this species. |
alignments | Alignment files (.chain or .pkl): One for the target species and one per outgroup species. Files should be named according to the convention: qname.tname[...].chain.gz, where qname is the query species name and tname is the name of the target/outgroup species. Names used for qname and tname must match names used in the phylogenetic tree. |
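Because the species names in alignment file names must match the names in the tree, it can help to check them before a run. The following is a small sketch of such a check and is not part of mapGL: the helper names and the example file names are hypothetical, and the Newick parsing is a simple regular expression that assumes unquoted leaf labels without whitespace.

```python
import os
import re

def species_in_newick(newick):
    """Collect leaf labels from a Newick string; branch lengths are skipped."""
    # A leaf label directly follows '(' or ',' and ends at ':', ',', ')' or ';'.
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", newick))

def parse_alignment_name(path):
    """Split a 'qname.tname[...].chain.gz' style file name into (qname, tname)."""
    parts = os.path.basename(path).split(".")
    if len(parts) < 3:
        raise ValueError(f"unexpected alignment file name: {path}")
    return parts[0], parts[1]

def check_alignments(newick, qname, paths):
    """Return a list of human-readable problems; an empty list means all names agree."""
    leaves = species_in_newick(newick)
    problems = []
    if qname not in leaves:
        problems.append(f"query '{qname}' is not a leaf in the tree")
    for path in paths:
        file_q, file_t = parse_alignment_name(path)
        if file_q != qname:
            problems.append(f"{path}: query name '{file_q}' != '{qname}'")
        if file_t not in leaves:
            problems.append(f"{path}: species '{file_t}' is not a leaf in the tree")
    return problems

# Hypothetical tree and file names, for illustration only.
newick = "((hg19:0.1,mm9:0.2):0.3,canFam2:0.5);"
files = ["hg19.mm9.all.chain.gz", "hg19.canFam2.all.chain.gz"]
print(check_alignments(newick, "hg19", files))  # []
```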
Options
Option | Description |
---|---|
-h, --help | Show help message and exit. |
-o FILE, --output FILE | Output file. (default: stdout) |
-t FLOAT, --threshold FLOAT | Mapping threshold i.e., |elem| * threshold <= |mapped_elem|. Default = 0.0 -- equivalent to accepting a single-base overlap. On the other end of the spectrum, setting this value to 1 is equivalent to only accepting full-length overlaps. (default: 0.0) |
-g GAP, --gap GAP | Ignore elements with an insertion/deletion of this size or larger. Using the default value (-1) will allow gaps of any size. (default: -1) |
-v {info,debug,silent}, --verbose {info,debug,silent} | Verbosity level (default: info) |
-d, --drop_split | Follow the bnMapper convention of silently dropping elements that span multiple chains, rather than the liftOver convention for split alignments, which keeps elements that span multiple chains and reports the longest aligned segment. This option is not recommended, as it may lead to spurious gain/loss predictions for orthologous elements that happen to be split across chains, for example because of chromosomal rearrangements. (default: False) |
-i {BED,narrowPeak}, --in_format {BED,narrowPeak} | Input file format. (default: BED) |
-f, --full_labels | Predict gain/loss events on the whole tree, not just branches leading to query and target species. (default: False) |
-n, --no_prune | Do not attempt to disambiguate the root state to resolve ambiguous gain/loss predictions. Instead, label affected features as 'ambiguous'. (default: False) |
-p, --priority {gain,loss} | When resolving ambiguous trees, prioritize sequence gain or sequence loss. This can be thought of as assigning a lower cost to sequence insertions relative to deletions, or vice versa. When priority='gain', ambiguity is resolved by assigning state 0 to the root node, such that sequence presence on a descendant branch is interpreted as a gain. When priority='loss', ambiguity is resolved by assigning state 1 to the root node, such that sequence absence in a descendant node is interpreted as a loss. (default: gain) |
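The `-t` and `-g` filters above amount to simple arithmetic on element lengths. The sketch below restates them as Python predicates. The function names are hypothetical and this is not mapGL's internal code. Requiring at least one mapped base when the threshold is 0 is an assumption drawn from the "single-base overlap" wording.

```python
def passes_threshold(elem_len, mapped_len, threshold=0.0):
    """True if |elem| * threshold <= |mapped_elem| and something mapped at all."""
    # Assumption: a threshold of 0.0 still requires at least one mapped base.
    return mapped_len > 0 and elem_len * threshold <= mapped_len

def exceeds_gap(largest_indel, gap=-1):
    """True if the element should be ignored because of a large insertion/deletion."""
    # The default of -1 disables the filter, so gaps of any size are allowed.
    return gap >= 0 and largest_indel >= gap

# A 200 bp element of which 150 bp map to the target:
print(passes_threshold(200, 150, threshold=0.0))  # True
print(passes_threshold(200, 150, threshold=0.5))  # True  (100 <= 150)
print(passes_threshold(200, 150, threshold=1.0))  # False (200 > 150)
print(exceeds_gap(40))                            # False (filter disabled)
print(exceeds_gap(40, gap=20))                    # True
```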
Output
Predictions are reported in tab-delimited format, with the first four columns following the BED4 convention. The predicted evolutionary history (i.e., ortholog, gain in query, or loss in target) is reported in the "status" column. The final four columns contain the mapped location and peak position, in target coordinates, of mapped (ortholog) elements.
Column | Description |
---|---|
chrom | Chromosome on which the query element is located. |
start | Start position on query chromosome. |
end | End position on query chromosome. |
name | Element name or ID. |
peak | Peak location (narrowPeak input) or element midpoint (BED input) |
status | Predicted phylogenetic history: ortholog, gain_qname, loss_tname, or (only if --no_prune is used) ambiguous. If --full_labels is used, this may include additional gain/loss events on other branches, as a comma-delimited list. |
mapped chrom | For mapped (ortholog) elements, the chromosome on which the mapped element is located, in target coordinates. |
mapped start | For mapped (ortholog) elements, the start position of the mapped element on the target chromosome. |
mapped end | For mapped (ortholog) elements, the end position of the mapped element on the target chromosome. |
mapped_peak | For mapped (ortholog) elements, the mapped peak position (narrowPeak input) or mapped element midpoint (BED input). |
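As a quick way to summarize a result file, the columns above can be read with Python's `csv` module. This is a sketch under assumptions: the column keys are made-up identifiers following the table order, the sample rows are invented for illustration, and the sketch does not rely on what the mapped columns contain for non-ortholog elements. It skips a header row if one is present, since whether the output carries one is not stated here.

```python
import csv
import io
from collections import Counter

# Hypothetical keys, in the order of the output table above.
COLUMNS = ["chrom", "start", "end", "name", "peak", "status",
           "mapped_chrom", "mapped_start", "mapped_end", "mapped_peak"]

def tally_status(handle):
    """Count predicted events per status label in tab-delimited output."""
    counts = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#") or row[0] == "chrom":
            continue                               # blank, comment, or header line
        record = dict(zip(COLUMNS, row))
        # With --full_labels the status may be a comma-delimited list of events.
        for event in record["status"].split(","):
            counts[event] += 1
    return counts

# Invented example rows, not real mapGL output.
sample = (
    "chr1\t100\t200\tpeak1\t150\tortholog\tchr4\t500\t600\t550\n"
    "chr1\t300\t400\tpeak2\t350\tgain_hg19\n"
    "chr2\t700\t900\tpeak3\t800\tloss_mm9\n"
)
print(dict(tally_status(io.StringIO(sample))))
```

For a real run, pass an open file handle for the file given to `-o` instead of the `io.StringIO` sample.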
Citation
Adam G Diehl, Alan P Boyle. MapGL: Inferring evolutionary gain and loss of short genomic sequence features by phylogenetic maximum parsimony. bioRxiv 827907; doi: https://doi.org/10.1101/827907 (https://www.biorxiv.org/content/10.1101/827907v1)
Copyright 2018, Adam Diehl (adadiehl@umich.edu), Boyle Lab, University of Michigan