
Dynamic core ortholog compilation tool

Project description

dcc2

PyPI version License: GPL v3 Build Status Github Build

dcc2 is a tool for compiling core set data for h1s using predicted orthologs from OMA, both OMA-browser and OMA-standalone. It produces the 3 (optionally 4) folders required for a HaMStR run: (1) core_orthologs, which contains the OMA orthologous groups (OG) or OMA pairs (OP), where each OG/OP has its own directory holding a multiple FASTA file and the corresponding profile HMM; (2) genome_dir, which contains the gene sets of the taxa from which the orthologs originate; (3) blast_dir, which holds the BLAST databases of the gene sets in genome_dir; and, optionally, (4) weight_dir, which contains the feature architecture annotations of all gene sets.
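The folder layout described above can be checked after a run with a few lines of Python. This is only a minimal sketch: the folder names are taken from the description, but the function name `check_output_layout` is hypothetical, and it assumes the folders sit directly under the directory you pass in, which may not match where dcc2 actually places them for a given job.

```python
from pathlib import Path

# Folder names as listed in the description; weight_dir is optional.
REQUIRED_DIRS = ("core_orthologs", "genome_dir", "blast_dir")
OPTIONAL_DIRS = ("weight_dir",)


def check_output_layout(out_dir):
    """Return (missing_required, missing_optional) folder names under out_dir.

    Assumption: the folders are direct children of out_dir.
    """
    root = Path(out_dir)
    missing_required = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing_optional = [d for d in OPTIONAL_DIRS if not (root / d).is_dir()]
    return missing_required, missing_optional
```

An empty first list means the three mandatory folders are present; a non-empty second list only means that no annotation folder was found.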

How to install

dcc2 is distributed as a Python package called dcc2. It requires Python 3.7 or later.

Install the dcc2 package

You can install dcc2 using pip:

python3 -m pip install dcc2

or, if you do not have admin rights and do not use a package manager such as Anaconda to manage environments, use the --user option:

python3 -m pip install --user dcc2

Then add the following line to the end of your ~/.bashrc or ~/.bash_profile file, and restart the terminal (or run source ~/.bashrc) to apply the change:

export PATH=$HOME/.local/bin:$PATH
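To confirm that the PATH change took effect, you can check whether the dcc2 commands used in this README are discoverable. The sketch below relies on the standard-library `shutil.which`; the helper name `find_missing_commands` is hypothetical, and the command list is simply the set of entry points that appear in this document.

```python
import shutil

# Entry points named in this README.
DCC2_COMMANDS = (
    "dcc2.prepare",
    "dcc2.parseOmaById",
    "dcc2.parseOmaBySpec",
    "dcc2.parseOrthoxml",
)


def find_missing_commands(commands=DCC2_COMMANDS, which=shutil.which):
    """Return the commands that cannot be found on the current PATH."""
    return [cmd for cmd in commands if which(cmd) is None]


if __name__ == "__main__":
    missing = find_missing_commands()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All dcc2 commands found.")
```

If any command is reported as missing after a --user install, the most likely cause is that $HOME/.local/bin is not yet on PATH in the current shell.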

Setup dcc2

After installing dcc2, you need to run the prepare script to download and parse required OMA browser data.

To do so, run:

dcc2.prepare -o /output/path/for/oma/data

Usage

For parsing OMA orthologs by using an OMA group ID:

dcc2.parseOmaById -g 1 -n HUMAN,THEAM,DESM0 -o /output/path/ -j jobName --cpus 8

Or using a list of OMA taxa:

dcc2.parseOmaBySpec -n HUMAN,ECOLI,YEAST -o /output/path/ -j jobName --annoFas --cpus 8

If only 2 OMA taxa are given, you can choose to use OMA pairs instead of OMA groups:

dcc2.parseOmaBySpec -n HUMAN,ECOLI -t pair -o /output/path/ -j jobName --annoFas --cpus 8
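When several jobs have to be launched from a script, the command line can be assembled programmatically. The following is a sketch, not part of dcc2: the function `build_parse_by_spec_cmd` and its parameter names are hypothetical, and it uses only the options that appear in the parseOmaBySpec examples above. It also enforces the rule that OMA pairs are only available for exactly two taxa.

```python
def build_parse_by_spec_cmd(taxa, out_dir, job_name,
                            use_pairs=False, annotate=False, cpus=1):
    """Build the argument list for a dcc2.parseOmaBySpec call.

    Only options shown in this README are used; nothing is executed here.
    """
    if not taxa:
        raise ValueError("at least one OMA taxon code is required")
    if use_pairs and len(taxa) != 2:
        raise ValueError("OMA pairs require exactly two taxa")

    cmd = ["dcc2.parseOmaBySpec", "-n", ",".join(taxa)]
    if use_pairs:
        cmd += ["-t", "pair"]
    cmd += ["-o", str(out_dir), "-j", job_name]
    if annotate:
        # Flag spelled as in the first parseOmaBySpec example above.
        cmd.append("--annoFas")
    cmd += ["--cpus", str(cpus)]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.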

For parsing an output from OMA-standalone, dcc2 requires:

  • the output orthoXML file from OMA,
  • a taxon mapping file in tab-delimited format with 3 columns, <NCBI taxon ID> <Original taxon name> <Abbr. taxon name>, where the original taxon name is the name written in the orthoXML input file and the abbreviated taxon name is its species code (for example, HOMSA for Homo sapiens),
  • the protein sets of the included taxa, which can either be given as a folder or be downloaded automatically from the OMA database.

dcc2.parseOrthoxml -i input.orthoxml -m mapping_file.txt -g /path/to/gene/set -o /output/path/ -j jobName --annoFas --cpus 8
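Because the mapping file is written by hand, it can be worth validating it before starting a long run. The sketch below reads a three-column, tab-delimited file as described above and reports malformed lines. The function name `read_taxon_mapping` is hypothetical, and the checks (no header line, no empty fields) are assumptions about the format rather than rules taken from dcc2 itself.

```python
import csv


def read_taxon_mapping(path):
    """Parse a 3-column tab-delimited mapping file.

    Returns a list of (ncbi_taxon_id, original_name, abbr_name) string tuples.
    Assumption: the file has no header line; blank lines are skipped.
    """
    rows = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        for line_no, fields in enumerate(reader, start=1):
            if all(not field.strip() for field in fields):
                continue  # blank line
            if len(fields) != 3:
                raise ValueError(
                    f"line {line_no}: expected 3 columns, got {len(fields)}"
                )
            fields = [field.strip() for field in fields]
            if not all(fields):
                raise ValueError(f"line {line_no}: empty field")
            rows.append(tuple(fields))
    return rows
```

For example, a file whose single line contains 9606, Homo sapiens and HOMSA separated by tabs yields one tuple with those three strings; a line whose columns are separated by spaces instead of tabs is rejected with a message naming the line number.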

Bugs

Bug reports, comments, and suggestions are highly appreciated. Please open an issue on GitHub or get in touch via email.

Contributors

Contact

For further support or bug reports please contact: tran@bio.uni-frankfurt.de
