Prediction of lineage-specific gain and loss of sequence elements using phylogenetic maximum parsimony.

mapGL

Prediction of lineage-specific gain and loss of genomic sequence elements based on phylogenetic maximum parsimony.

Label genomic regions as orthologous, gained in the query species, or lost in the target species, based on inferred presence/absence in the most-recent common ancestor (MRCA). Chained alignment files are used to map features from the query to the target and to one or more outgroup species. Features that map directly from query to target are labeled as orthologs, and their orthologous coordinates in the target species are given in the output. Non-mapping features are labeled as gains or losses by a maximum-parsimony algorithm that predicts presence or absence in the MRCA.
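The labeling logic described above can be sketched with a small Fitch-style parsimony pass. This is a minimal illustration, not mapGL's actual implementation: the nested-tuple tree representation, the function names (`leaves`, `up`, `mrca_state`, `label`), and the species names are all assumptions made for the example. Presence (1) or absence (0) at each leaf stands for whether the query element mapped to that species.

```python
# Hypothetical sketch of parsimony-based gain/loss labeling (not mapGL code).
# A tree is a nested tuple of species names, e.g. ((("hg19", "mm10"), "canFam3"), "monDom5").

def leaves(node):
    """Return the set of species names under a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))

def up(node, states):
    """Fitch bottom-up pass: the set of most-parsimonious states for a node."""
    if isinstance(node, str):
        return {states[node]}
    sets = [up(child, states) for child in node]
    shared = set.intersection(*sets)
    return shared or set.union(*sets)

def mrca_state(node, states, query, target, parent_state=None):
    """Top-down pass: resolve states along the path to the query/target MRCA."""
    candidates = up(node, states)
    if parent_state in candidates:
        state = parent_state
    elif len(candidates) == 1:
        state = next(iter(candidates))
    else:
        state = None  # both states are equally parsimonious
    children = () if isinstance(node, str) else node
    for child in children:
        if {query, target} <= leaves(child):
            return mrca_state(child, states, query, target, state)
    return state  # this node is the MRCA of query and target

def label(tree, states, query, target):
    """Label one element as ortholog, gain, loss, or ambiguous."""
    if states[target]:
        return "ortholog"
    ancestral = mrca_state(tree, states, query, target)
    if ancestral is None:
        return "ambiguous"
    return f"loss_{target}" if ancestral else f"gain_{query}"

tree = ((("hg19", "mm10"), "canFam3"), "monDom5")
present_in_outgroups = {"hg19": 1, "mm10": 0, "canFam3": 1, "monDom5": 1}
absent_in_outgroups = {"hg19": 1, "mm10": 0, "canFam3": 0, "monDom5": 0}
print(label(tree, present_in_outgroups, "hg19", "mm10"))  # loss_mm10
print(label(tree, absent_in_outgroups, "hg19", "mm10"))   # gain_hg19
```

When the element is present in the outgroups, the MRCA is inferred to have carried it, so its absence in the target is read as a loss; when it is absent from the outgroups, the query-specific presence is read as a gain.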

Based on bnMapper.py, by Ogert Denas (James Taylor lab):

Dependencies

numpy, bx-python, cython, six

Usage

mapGL.py [-h] [-o FILE] [-t FLOAT] [-g GAP] [-v {info,debug,silent}] [-k] input tree qname tname alignments [alignments ...]

Required Arguments

| Argument | Description |
| --- | --- |
| input | Input regions to process. Should be in standard BED format. Only the first four BED fields will be used. |
| tree | Phylogenetic tree describing relationships of query and target species to outgroups. Must be in standard Newick format. Branch lengths are optional and will be ignored. |
| qname | Name of the query species. Regions from this species will be mapped to target species coordinates. |
| tname | Name of the target species. Regions from the query species will be mapped to coordinates from this species. |
| alignments | Alignment files (.chain or .pkl): one for the target species and one per outgroup species. Files should be named according to the convention qname.tname[...].chain.gz, where qname is the query species name and tname is the name of the target/outgroup species. Names used for qname and tname must match the names used in the phylogenetic tree. |
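As an illustration of the file-naming convention above, the following sketch recovers the query and target/outgroup names from an alignment file name. The helper `species_from_alignment` and the example file names are hypothetical and not part of mapGL; the sketch also assumes that species names contain no dots.

```python
import os

def species_from_alignment(path):
    """Split 'qname.tname[...].chain.gz' into (qname, tname).

    Hypothetical helper for illustration; assumes names contain no '.'.
    """
    fields = os.path.basename(path).split(".")
    if len(fields) < 3:
        raise ValueError(f"unexpected alignment file name: {path}")
    return fields[0], fields[1]

# Example file names are made up for this sketch.
for name in ["hg19.mm10.rbest.chain.gz", "/data/chains/hg19.canFam3.chain.gz"]:
    print(species_from_alignment(name))
# ('hg19', 'mm10')
# ('hg19', 'canFam3')
```

A check like this can be run before invoking mapGL to confirm that every name extracted from the alignment files also appears as a leaf in the Newick tree.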

Options

| Option | Description |
| --- | --- |
| -h, --help | Show help message and exit. |
| -o FILE, --output FILE | Output file. (default: stdout) |
| -t FLOAT, --threshold FLOAT | Mapping threshold i.e., |
| -g GAP, --gap GAP | Ignore elements with an insertion/deletion of this size or larger. Using the default value (-1) will allow gaps of any size. (default: -1) |
| -v {info,debug,silent}, --verbose {info,debug,silent} | Verbosity level. (default: info) |
| -d, --drop_split | Follow the bnMapper convention of silently dropping elements that span multiple chains, rather than the liftOver convention for split alignments (keep elements that span multiple chains and report the longest aligned segment). This is not recommended, as it may lead to spurious gain/loss predictions for orthologous elements that happen to be split across chains due to chromosomal rearrangements, etc. (default: False) |
| -i {BED,narrowPeak}, --in_format {BED,narrowPeak} | Input file format. (default: BED) |
| -f, --full_labels | Predict gain/loss events on the whole tree, not just on the branches leading to the query and target species. (default: False) |
| -n, --no_prune | Do not use the pruned tree to resolve ambiguous gain/loss predictions. Instead, these will be labeled 'ambiguous'. (default: False) |
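The difference between the two split-alignment conventions described for --drop_split can be sketched as follows. This is an assumption-laden illustration rather than mapGL's code: the function `resolve_split` and the `(chrom, start, end)` segment tuples are hypothetical.

```python
def resolve_split(segments, drop_split=False):
    """Pick the mapped location for one element from its aligned segments.

    segments: list of (chrom, start, end) tuples, one per chain the
    element aligned to (hypothetical representation for this sketch).
    Returns one segment, or None if the element is treated as unmapped.
    """
    if not segments:
        return None  # element did not map at all
    if len(segments) > 1 and drop_split:
        return None  # bnMapper-style: drop elements split across chains
    # liftOver-style: keep the element and report the longest aligned segment
    return max(segments, key=lambda seg: seg[2] - seg[1])

split = [("chr5", 100, 180), ("chr9", 4000, 4300)]
print(resolve_split(split))                   # ('chr9', 4000, 4300)
print(resolve_split(split, drop_split=True))  # None
```

With `drop_split=True`, the split element is treated as unmapped and would then fall through to gain/loss prediction, which is why the option is described as not recommended.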

Output

Predictions are reported in tab-delimited format, with the first four columns following the BED4 convention. The predicted evolutionary history (i.e., ortholog, gain in query, or loss in target) is reported in the "status" column. The final four columns contain the mapped location and peak position, in target coordinates, of mapped (ortholog) elements.

| Column | Description |
| --- | --- |
| chrom | Chromosome on which the query element is located. |
| start | Start position on the query chromosome. |
| end | End position on the query chromosome. |
| name | Element name or ID. |
| peak | Peak location (narrowPeak input) or element midpoint (BED input). |
| status | Predicted phylogenetic history: ortholog, gain_qname, loss_tname, or, if --no_prune is used, ambiguous. If --full_labels is used, this may include additional gain/loss events on other branches, as a comma-delimited list. |
| mapped chrom | For mapped (ortholog) elements, the chromosome on which the mapped element is located, in target coordinates. |
| mapped start | For mapped (ortholog) elements, the start position of the mapped element on the target chromosome. |
| mapped end | For mapped (ortholog) elements, the end position of the mapped element on the target chromosome. |
| mapped_peak | For mapped (ortholog) elements, the mapped peak position (narrowPeak input) or mapped element midpoint (BED input). |
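The output described above can be read with Python's standard csv module. The sketch below is illustrative only: the dictionary keys are names chosen here to mirror the column table, the sample row is made up, and it assumes the output has no header line, which should be checked against real output.

```python
import csv
import io

# Keys chosen for this sketch to mirror the column table above.
FIELDS = ["chrom", "start", "end", "name", "peak", "status",
          "mapped_chrom", "mapped_start", "mapped_end", "mapped_peak"]

def read_predictions(handle):
    """Yield one dict per output row (assumes tab-delimited, no header)."""
    for row in csv.reader(handle, delimiter="\t"):
        if row:
            # zip stops at the shorter sequence, so short rows still parse
            yield dict(zip(FIELDS, row))

# Made-up example row, for illustration only.
sample = "chr1\t100\t200\tpeak1\t150\tortholog\tchr4\t900\t1000\t950\n"
records = list(read_predictions(io.StringIO(sample)))
orthologs = [r for r in records if r["status"] == "ortholog"]
print(orthologs[0]["mapped_chrom"], orthologs[0]["mapped_start"])  # chr4 900
```

Values are kept as strings; convert coordinates with `int()` where arithmetic is needed.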

Citation

Adam G Diehl, Alan P Boyle. MapGL: Inferring evolutionary gain and loss of short genomic sequence features by phylogenetic maximum parsimony. bioRxiv 827907; doi: https://doi.org/10.1101/827907
https://www.biorxiv.org/content/10.1101/827907v1

Copyright 2018, Adam Diehl (adadiehl@umich.edu), Boyle Lab, University of Michigan
