Dynamic core ortholog compilation tool

dcc2

dcc2 is a tool for compiling core set data for fDOG from predicted orthologs. The orthologs can come from OMA (either the OMA browser or OMA-standalone) or from an OrthoXML file produced by any ortholog search tool.

The tool outputs the 3 (optionally 4) folders required for a fDOG run: (1) core_orthologs, which contains a multiple fasta file and a corresponding profile HMM for each ortholog group, (2) genome_dir, which contains the gene sets of the taxa from which the orthologs originate, (3) blast_dir, which holds the blast databases of the gene sets in genome_dir, and, optionally, (4) weight_dir, which contains the feature architecture annotations of all gene sets.
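As a quick sanity check after a run, the folder layout described above can be verified with a few lines of Python. This is a minimal sketch, not part of dcc2: it assumes the folders sit directly under the output path and carry exactly the names listed above, and the helper name check_output_layout is hypothetical.

```python
from pathlib import Path

# Folder names as listed in the description above (assumed to be used literally).
REQUIRED = ("core_orthologs", "genome_dir", "blast_dir")
OPTIONAL = ("weight_dir",)


def check_output_layout(output_path):
    """Return (missing_required, missing_optional) folder names under output_path."""
    root = Path(output_path)
    missing_required = [name for name in REQUIRED if not (root / name).is_dir()]
    missing_optional = [name for name in OPTIONAL if not (root / name).is_dir()]
    return missing_required, missing_optional
```

Calling check_output_layout("/output/path/") returns two lists. An empty first list means that all three required folders were found.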

How to install

dcc2 is distributed as a Python package called dcc2. It is compatible with Python ≥ 3.7.

Install the dcc2 package

You can install dcc2 using pip:

python3 -m pip install dcc2

or, if you do not have admin rights and do not use a package system such as Anaconda to manage environments, use the --user option:

python3 -m pip install --user dcc2

Then add the following line to the end of your ~/.bashrc or ~/.bash_profile file, and restart the current terminal (or type source ~/.bashrc) to apply the change:

export PATH=$HOME/.local/bin:$PATH

Setup dcc2

After installing dcc2, run the prepare script to download and parse the required OMA browser data:

dcc.prepare -o /output/path/for/oma/data

Usage

To parse OMA orthologs using an OMA group ID:

dcc.parseOmaById -g 1 -n HUMAN,THEAM,DESM0 -o /output/path/ -j jobName --cpus 8

Or using a list of OMA taxa:

dcc.parseOmaBySpec -n HUMAN,ECOLI,YEAST -o /output/path/ -j jobName --annoFas --cpus 8 [--missingTaxa 1]

--missingTaxa defines the number of taxa that are allowed to be absent from each ortholog group.
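The effect of --missingTaxa can be illustrated with a small filter. This is only a sketch of the rule as described above, not dcc2's implementation; the function keep_group, its default of 0, and the example groups are hypothetical.

```python
def keep_group(group_taxa, requested_taxa, missing_taxa=0):
    """Return True if at most `missing_taxa` requested taxa are absent from the group."""
    absent = set(requested_taxa) - set(group_taxa)
    return len(absent) <= missing_taxa


requested = ["HUMAN", "ECOLI", "YEAST"]

# Hypothetical ortholog groups, each listing the taxa it has a member for.
groups = {
    "group_a": ["HUMAN", "ECOLI", "YEAST"],  # nothing missing
    "group_b": ["HUMAN", "YEAST"],           # ECOLI missing
    "group_c": ["HUMAN"],                    # ECOLI and YEAST missing
}

kept = [name for name, taxa in groups.items()
        if keep_group(taxa, requested, missing_taxa=1)]
print(kept)  # ['group_a', 'group_b']
```

With one missing taxon allowed, group_c is dropped because two of the three requested taxa are absent from it.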

If only 2 OMA taxa are given, you can choose to use OMA pairs instead of OMA groups:

dcc.parseOmaBySpec -n HUMAN,ECOLI -t pair -o /output/path/ -j jobName --annoFas --cpus 8

To parse the output of OMA-standalone or an OrthoXML file, dcc2 requires:

  • the orthoXML output file containing the ortholog groups,
  • a taxon mapping file in tab-delimited format with 3 columns, <NCBI taxon ID> <Original taxon name> <Abbr. taxon name>, where the original taxon name is the name written in the orthoXML input file and the abbr. taxon name is its abbreviated species code (for example, HOMSA for Homo sapiens),
  • the protein sets of the included taxa. For OMA, these can either be given as a folder or be downloaded automatically from the OMA database.

dcc.parseOrthoxml -i input.orthoxml -m mapping_file.txt -g /path/to/gene/set -o /output/path/ -j jobName --annoFas --cpus 8
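The taxon mapping file is easy to get wrong (spaces instead of tabs, a missing column), so it can help to write and validate it with a short script before running dcc2. The following is a minimal sketch under stated assumptions: the file has no header line, the helper names write_mapping and read_mapping are hypothetical, and the example rows are illustrative (only HOMSA for Homo sapiens is taken from the description above; the SACCE code is made up).

```python
import csv
import os
import tempfile


def write_mapping(path, rows):
    """Write (taxon_id, original_name, abbr_name) rows as tab-delimited text."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)


def read_mapping(path):
    """Read a 3-column, tab-delimited mapping file and check its basic shape."""
    rows = []
    with open(path, newline="") as handle:
        for line_no, fields in enumerate(csv.reader(handle, delimiter="\t"), start=1):
            if not fields:
                continue  # skip blank lines
            if len(fields) != 3:
                raise ValueError(f"line {line_no}: expected 3 columns, got {len(fields)}")
            taxon_id, original_name, abbr_name = (field.strip() for field in fields)
            if not taxon_id.isdigit():
                raise ValueError(f"line {line_no}: NCBI taxon ID must be numeric, got {taxon_id!r}")
            rows.append((taxon_id, original_name, abbr_name))
    return rows


# Illustrative rows: <NCBI taxon ID>, <original taxon name>, <abbr. taxon name>.
sample_rows = [
    ("9606", "Homo sapiens", "HOMSA"),
    ("4932", "Saccharomyces cerevisiae", "SACCE"),  # hypothetical abbreviation
]

mapping_path = os.path.join(tempfile.mkdtemp(), "mapping_file.txt")
write_mapping(mapping_path, sample_rows)
print(read_mapping(mapping_path))
```

The original taxon names must match the names used in your orthoXML file exactly, so replace the sample rows with the values from your own data.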

If not all taxa in the orthoXML file are needed, you can limit the taxa included in the ortholog groups. To do so, reduce the taxon mapping file so that it contains only the required taxa and use this function:

dcc.parseOrthoxmlCustom -i input.orthoxml -m mapping_file_reduced.txt -g /path/to/gene/set -o /output/path/ -j jobName --annoFas --cpus 8 [--minTaxa 10]

--minTaxa defines the minimum number of taxa that must be included in each ortholog group.
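Reducing the taxon mapping file can be scripted. The sketch below copies only the rows whose abbreviated taxon name (third column) is in a given set. It assumes a tab-delimited file with 3 columns and no header line; the function name reduce_mapping is hypothetical and not part of dcc2.

```python
def reduce_mapping(in_path, out_path, keep_abbr):
    """Copy rows whose 3rd column is in keep_abbr; return the number of rows kept."""
    keep = set(keep_abbr)
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.rstrip("\n").split("\t")
            # Rows that do not have exactly 3 columns are skipped.
            if len(fields) == 3 and fields[2] in keep:
                dst.write("\t".join(fields) + "\n")
                kept += 1
    return kept
```

For example, reduce_mapping("mapping_file.txt", "mapping_file_reduced.txt", {"HOMSA"}) would write a reduced file containing only the Homo sapiens row. Comparing the returned count with the number of taxa you asked for is a cheap way to spot a misspelled abbreviation.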

Bugs

Bug reports, comments, and suggestions are highly appreciated. Please open an issue on GitHub or get in touch via email.

Contact

For further support or bug reports please contact: tran@bio.uni-frankfurt.de
