
Phylogenetic Profiling with OMA and minhashing

HogProf

  • HogProf is an extensible and tunable approach to phylogenetic profiling using orthology data. It is powered by MinHash-based data structures and is computationally efficient.
  • HogProf is still under major development and may change.

Features

  • Uses OrthoXML files and a taxonomy to calculate enhanced phylogenies for each family
  • These are transformed into MinHash signatures and a locality-sensitive hashing (LSH) forest object for search and comparison of profiles
  • Taxonomic levels and evolutionary event types (presence, loss, duplication) can have custom weights in profile construction
  • Optimization of weights using machine learning
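
To make the MinHash idea above concrete, here is a minimal, standard-library-only sketch. It is not HogProf's implementation: the `taxon:event` string encoding, the function names, the hash function and the signature length are all assumptions chosen for illustration, and custom weights are left out. It shows how a set of evolutionary events can be reduced to a fixed-length signature whose agreement with another signature estimates the Jaccard similarity of the two profiles.

```python
import hashlib

NUM_PERM = 128  # signature length; an illustrative choice, not a HogProf default


def event_set(presence, losses, duplications):
    """Encode one family's events as a set of 'taxon:event' strings (hypothetical encoding)."""
    events = {f"{taxon}:presence" for taxon in presence}
    events |= {f"{taxon}:loss" for taxon in losses}
    events |= {f"{taxon}:duplication" for taxon in duplications}
    return events


def _hash(item, seed):
    """Deterministic 64-bit hash of an item under a given seed."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_perm=NUM_PERM):
    """For each seeded hash function, keep the minimum hash value over the set."""
    if not items:
        raise ValueError("cannot build a signature from an empty set")
    return [min(_hash(item, seed) for item in items) for seed in range(num_perm)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)


# Two made-up families sharing part of their evolutionary history.
family_a = event_set(
    presence=["Metazoa", "Fungi", "Viridiplantae"],
    losses=["Nematoda"],
    duplications=["Vertebrata"],
)
family_b = event_set(
    presence=["Metazoa", "Fungi"],
    losses=["Nematoda"],
    duplications=["Vertebrata", "Teleostei"],
)

sig_a = minhash_signature(family_a)
sig_b = minhash_signature(family_b)
print("estimated Jaccard:", estimate_jaccard(sig_a, sig_b))
print("exact Jaccard:    ", exact_jaccard(family_a, family_b))
```

Because signatures have a fixed length regardless of how many events a family has, they can be stored compactly and indexed, which is what makes an LSH forest practical for searching many profiles at once.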

If you run into any problems, feel free to contact me at dmoi@unil.ch.

Quickstart

To install from GitHub:

$ git clone https://github.com/DessimozLab/HogProf.git
$ pip install -r pipreqs.txt .

Or, to install from PyPI:

$ pip install hogprof

Let's get a current version of the OMA HDF5 file and GAF. This will allow us to use the HOGs and study the functional enrichment of our search results.

$ cd ../..
$ mkdir YourOmaDirectory
$ cd YourOmaDirectory
$ wget https://omabrowser.org/All/OmaServer.h5
$ wget https://omabrowser.org/All/oma-go.txt.gz
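
Large downloads can be interrupted, so it may be worth a quick sanity check before building a database. The sketch below is an optional helper, not part of HogProf: the function names are hypothetical, and it only checks that each file exists and starts with the expected format signature (the 8-byte HDF5 signature at offset 0, and the 2-byte gzip magic number). It does not verify that the files are complete.

```python
from pathlib import Path

HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # HDF5 format signature at offset 0
GZIP_MAGIC = b"\x1f\x8b"           # gzip magic number


def starts_with(path, magic):
    """Return True if the file at `path` begins with the bytes in `magic`."""
    with open(path, "rb") as handle:
        return handle.read(len(magic)) == magic


def check_oma_downloads(directory):
    """Map each expected file name to True if it exists and has the right signature."""
    directory = Path(directory)
    expected = {"OmaServer.h5": HDF5_MAGIC, "oma-go.txt.gz": GZIP_MAGIC}
    report = {}
    for name, magic in expected.items():
        path = directory / name
        report[name] = path.is_file() and starts_with(path, magic)
    return report


# Reports False for any file that is missing or does not look like the expected format.
print(check_oma_downloads("YourOmaDirectory"))
```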

We also need a location to store our pyprofiler databases:

$ cd ..
$ mkdir YourHogProfDirectory

OK, we're ready! Now let's compile a database containing all HOGs and our desired taxonomic levels using default settings by launching lshbuilder. The dbtypes available on the command line are: all, plants, archaea, bacteria, eukarya, protists, fungi, metazoa and vertebrates. These use the NCBI taxonomy as a tree to annotate events in the histories of the different gene families.

$ python lshbuilder.py --outpath YourHogProfDirectory --dbtype all --OMA YourOmaDirectory/OmaServer.h5 --nthreads numberOfCPUcores

This should build a taxonomic tree for the genomes contained in the release and then calculate enhanced phylogenies for all HOGs in OMA.
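
If you would rather launch the build from a Python script, the command above can be assembled programmatically. The following is a small sketch under stated assumptions: `build_lshbuilder_command` is a hypothetical helper, not part of HogProf, and it uses only the flags and dbtype values listed above. It validates the dbtype and fills in the number of CPU cores when none is given.

```python
import os
import shlex

# dbtype values as listed above
DBTYPES = {
    "all", "plants", "archaea", "bacteria", "eukarya",
    "protists", "fungi", "metazoa", "vertebrates",
}


def build_lshbuilder_command(outpath, oma_h5, dbtype="all", nthreads=None,
                             script="lshbuilder.py"):
    """Return the lshbuilder invocation as an argument list (hypothetical helper)."""
    if dbtype not in DBTYPES:
        raise ValueError(f"unknown dbtype {dbtype!r}; expected one of {sorted(DBTYPES)}")
    if nthreads is None:
        # os.cpu_count() can return None, so fall back to a single thread
        nthreads = os.cpu_count() or 1
    return [
        "python", script,
        "--outpath", str(outpath),
        "--dbtype", dbtype,
        "--OMA", str(oma_h5),
        "--nthreads", str(nthreads),
    ]


cmd = build_lshbuilder_command(
    "YourHogProfDirectory", "YourOmaDirectory/OmaServer.h5", nthreads=4
)
print(shlex.join(cmd))
# To actually start the build: import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems if your directory names contain spaces.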

Once the database is complete, it can be interrogated using a profiler object. Construction and usage of this object should be done in a Python script or notebook. This is shown in the example notebook searchenrich.ipynb, found in the examples. Please feel free to modify it to suit the needs of your own research.
