EmbedPVP: Integration of Genomics and Phenotypes for Variant Prioritization using Deep Learning

EmbedPVP: Embedding-based Phenotype Variant Predictor

Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning.

Annotation data sources (integrated in the candidate SNP prediction workflow)

We integrate annotations from the following ontologies:

  • Gene Ontology (GO)
  • Mammalian Phenotype Ontology (MP)
  • Human Phenotype Ontology (HPO)
  • Uber-anatomy Ontology (UBERON)

Dependencies

  • The code was developed and tested using Python 3.9.

  • We used the mOWL library to process the input dataset and to generate embedding representations with different embedding-based methods.

    You need to have Java (a JDK) installed on your machine.
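
Before installing, it can help to confirm that the prerequisites above are visible from Python. The following is a minimal sketch, not part of EmbedPVP: the function name `check_environment` is hypothetical, treating Python 3.9 as a minimum (rather than just the tested version) is an assumption, and looking for `javac` on the PATH is only a rough proxy for "a JDK is installed".

```python
import shutil
import sys


def check_environment(min_python=(3, 9)):
    """Report whether the interpreter and Java tools look usable.

    Assumption: Python 3.9 is treated as a lower bound; the README only
    says the code was developed and tested with 3.9.
    """
    return {
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        # `java` on the PATH suggests a Java runtime is available.
        "java_found": shutil.which("java") is not None,
        # `javac` on the PATH is used here as a rough hint that a JDK
        # (not only a runtime) is installed.
        "javac_found": shutil.which("javac") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

If `java_found` or `javac_found` is reported as `no`, install a JDK and make sure its `bin` directory is on your PATH before running the tool.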

Get the data

  1. Download all the files from data and place the uncompressed files in a folder named /data.
  2. Download the required database from CADD and follow its instructions to generate the TSV file with CADD scores for the input VCF file.
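
To sanity-check the CADD file from step 2 before passing it to the tool, you could load it into a dictionary. This is a sketch under stated assumptions, not how EmbedPVP reads the file internally: it assumes a gzip-compressed TSV with a `##` banner line followed by a `#Chrom`, `Pos`, `Ref`, `Alt`, `RawScore`, `PHRED` header, and the helper names `load_cadd_scores` and `demo` are hypothetical. The demo values are made up.

```python
import csv
import gzip
import os
import tempfile


def load_cadd_scores(path):
    """Map (chrom, pos, ref, alt) to the PHRED-scaled score.

    Assumed layout: '##' banner lines, then a '#Chrom ... PHRED' header.
    """
    scores = {}
    with gzip.open(path, "rt", newline="") as handle:
        # Drop the '##' banner; the '#Chrom' header line is kept for DictReader.
        rows = (line for line in handle if not line.startswith("##"))
        for row in csv.DictReader(rows, delimiter="\t"):
            key = (row["#Chrom"], int(row["Pos"]), row["Ref"], row["Alt"])
            scores[key] = float(row["PHRED"])
    return scores


def demo():
    """Write a tiny made-up file and read it back."""
    path = os.path.join(tempfile.mkdtemp(), "demo_cadd.tsv.gz")
    with gzip.open(path, "wt") as handle:
        handle.write("## demo banner\n")
        handle.write("#Chrom\tPos\tRef\tAlt\tRawScore\tPHRED\n")
        handle.write("1\t12345\tA\tG\t0.5\t7.25\n")
    return load_cadd_scores(path)


if __name__ == "__main__":
    print(demo())
```

A `KeyError` from this loader would indicate that the file's header differs from the assumed column names.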

Use the tool

You can install the tool either from source or from PyPI, as follows:

☑ Install from source

git clone https://github.com/bio-ontology-research-group/EmbedPVP.git
cd EmbedPVP/
pip install -r requirements.txt
mkdir output
cd embedpvp
python main.py [args]
  • Run the command python main.py --help to display help and parameters:
Usage: main.py [OPTIONS]

Options:
  -d, --data-root TEXT      Data root folder  [required]
  -i, --in_file TEXT        Annotated Input VCF file  [required]
  -p, --pathogenicity TEXT  Path to the pathogenicity prediction file (CADD) [required]
  -hpo, --hpo TEXT          List of phenotype codes separated by commas [required]
  -m, --model_type TEXT     Ontology model, one of the following (go , mp , hp, uberon, union)
  -e, --embedding TEXT      Preferred embedding model (e.g. TransD, TransE, TranR, ConvE ,DistMult, DL2vec, OWL2vc, EL, ELBox)
  -dir, --outdir TEXT       Path to the output directory
  -o, --outfile TEXT        Path to the results output file
  --help                    Show this message and exit.
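
The `--hpo` option takes phenotype codes separated by commas. A small pre-check like the one below can catch malformed identifiers before a long run starts. This is an illustrative sketch, not EmbedPVP's own validation: `parse_hpo_list` is a hypothetical helper, and it relies only on the standard HPO identifier shape (`HP:` followed by seven digits).

```python
import re

# HPO identifiers have the form "HP:" followed by seven digits.
HPO_ID = re.compile(r"HP:\d{7}")


def parse_hpo_list(value):
    """Split a comma-separated string into validated, de-duplicated HPO IDs."""
    terms = [term.strip() for term in value.split(",") if term.strip()]
    malformed = [term for term in terms if not HPO_ID.fullmatch(term)]
    if malformed:
        raise ValueError("Malformed HPO identifiers: " + ", ".join(malformed))
    # dict.fromkeys removes duplicates while preserving input order.
    return list(dict.fromkeys(terms))


if __name__ == "__main__":
    print(parse_hpo_list("HP:0004791, HP:0002020,HP:0004791"))
```

Joining the returned list with `","` gives a string in the form the `--hpo` option describes.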

  • Run the example:
python main.py -d ../data/ -i example_annotation.vcf.hg38_multianno.txt  -p example_cadd.tsv.gz -hpo HP:0004791,HP:0002020,HP:0100580,HP:0001428,HP:0011459 -m hp -e TransE -dir ../output/ -o example_output1.tsv

 Annotate VCF file (example.vcf) with the phenotypes (HP:0003701,HP:0001324,HP:0010628,HP:0003388,HP:0000774,HP:0002093,HP:0000508,HP:0000218,HP:0000007)...
 |========                        | 25% Annotated files generated successfully.
 |================                | 50% Phenotype prediction...
 |========================        | 75% Variants prediction...
 |================================| 100%
The analysis is Done. You can find the priortize list in the output file: ../output/example_output.txt 
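
If you want to launch the example from a script rather than a shell, the command line above can be assembled programmatically. The sketch below only uses the flags listed in the help text; the function name `build_command` and its default values (taken from the example) are assumptions for illustration.

```python
import shlex


def build_command(data_root, in_file, cadd_file, hpo_terms,
                  model="hp", embedding="TransE",
                  outdir="../output/", outfile="example_output1.tsv"):
    """Assemble the main.py argument list from the documented options."""
    return [
        "python", "main.py",
        "-d", data_root,
        "-i", in_file,
        "-p", cadd_file,
        "-hpo", ",".join(hpo_terms),  # phenotype codes separated by commas
        "-m", model,
        "-e", embedding,
        "-dir", outdir,
        "-o", outfile,
    ]


if __name__ == "__main__":
    cmd = build_command(
        "../data/",
        "example_annotation.vcf.hg38_multianno.txt",
        "example_cadd.tsv.gz",
        ["HP:0004791", "HP:0002020"],
    )
    # Print a shell-safe rendering of the command for inspection.
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from inside the `embedpvp` directory, which is where the instructions above run `main.py`.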

☑ Install from PyPI

pip install embedpvp

Output:

The script outputs a ranking and a prediction score for each variant in the list of candidate causative variants.

Reference

For further details, or if you use EmbedPVP in your work, please refer to this article:

@article{althagafi2023prioritizing,
  title={Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning},
  author={Althagafi, Azza and Zhapa-Camacho, Fernando and Hoehndorf, Robert},
  journal={bioRxiv},
  pages={2023--11},
  year={2023},
  publisher={Cold Spring Harbor Laboratory}
}

Note

For any questions or comments, please contact azza.althagafi@kaust.edu.sa
