
EmbedPVP: Integration of Genomics and Phenotypes for Variant Prioritization using Deep Learning


EmbedPVP: Embedding-based Phenotype Variant Predictor

Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning.

Annotation data sources (integrated in the candidate SNP prediction workflow)

We integrated annotations from the following ontologies:

  • Gene Ontology (GO)
  • Mammalian Phenotype Ontology (MP)
  • Human Phenotype Ontology (HPO)
  • Uber-anatomy Ontology (UBERON)

Dependencies

mOWL library
  • The code was developed and tested using Python 3.9.

  • We used the mOWL library to process the input dataset and to generate embedding representations with different embedding methods.

    Java and the Java Development Kit (JDK) must be installed on your machine.
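Before running the tool, you may want a quick check of these prerequisites. The sketch below is a hypothetical helper (check_prerequisites is not part of EmbedPVP or mOWL). It only tests whether the interpreter is at least Python 3.9, the version the code was tested with (whether newer versions work is an assumption), and whether the java and javac executables from a JDK are on the PATH. It does not verify which Java version mOWL needs.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 9)):
    """Hypothetical helper: report whether Python and the JDK tools look usable.

    Assumes a JDK exposes `java` and `javac` on the PATH; it does not check
    the Java version required by mOWL.
    """
    report = {
        # README reports testing with Python 3.9; accepting newer versions is an assumption.
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        # A JDK ships both the runtime (`java`) and the compiler (`javac`).
        "java": shutil.which("java"),
        "javac": shutil.which("javac"),
    }
    report["jdk_ok"] = report["java"] is not None and report["javac"] is not None
    return report


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key}: {value}")
```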

Get the data

  1. Download all the files from data, uncompress them, and place them in a folder named data.
  2. Download the required CADD databases and follow the CADD instructions to generate a TSV file with CADD scores for the input VCF file.
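To sanity-check the CADD file before passing it to the tool, you can load it into a lookup table. The sketch below assumes the default CADD output layout: a gzipped TSV whose comment lines start with ## and whose header line is #Chrom, Pos, Ref, Alt, RawScore, PHRED. The helper name read_cadd_scores is hypothetical and is not how EmbedPVP itself reads the file.

```python
import gzip


def read_cadd_scores(path):
    """Map (chrom, pos, ref, alt) -> PHRED score from a CADD .tsv.gz file.

    Assumes the default CADD output layout: '##' comment lines, then a
    '#Chrom  Pos  Ref  Alt  RawScore  PHRED' header line.
    """
    scores = {}
    header = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##") or not line.strip():
                continue  # skip version/copyright comments and blank lines
            fields = line.rstrip("\r\n").split("\t")
            if header is None:
                # First non-comment line is the column header; drop the leading '#'.
                header = [name.lstrip("#") for name in fields]
                continue
            row = dict(zip(header, fields))
            key = (row["Chrom"], int(row["Pos"]), row["Ref"], row["Alt"])
            scores[key] = float(row["PHRED"])
    return scores
```

For example, read_cadd_scores("example_cadd.tsv.gz") returns a dictionary keyed by variant, which makes it easy to confirm that the variants in your VCF file have CADD scores.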

Use the tool

You can install the tool either from source or from PyPI as follows:

Install from source

git clone https://github.com/bio-ontology-research-group/EmbedPVP.git
cd EmbedPVP/
pip install -r requirements.txt
mkdir output
cd embedpvp
python main.py [args]
  • Run python main.py --help to display the help message and available options:
Usage: main.py [OPTIONS]

Options:
  -d, --data-root TEXT      Data root folder  [required]
  -i, --in_file TEXT        Annotated Input VCF file  [required]
  -p, --pathogenicity TEXT  Path to the pathogenicity prediction file (CADD) [required]
  -hpo, --hpo TEXT          List of phenotype codes separated by commas [required]
  -m, --model_type TEXT     Ontology model, one of the following (go , mp , hp, uberon, union)
  -e, --embedding TEXT      Preferred embedding model (e.g. TransD, TransE, TranR, ConvE ,DistMult, DL2vec, OWL2vc, EL, ELBox)
  -dir, --outdir TEXT       Path to the output directory
  -o, --outfile TEXT        Path to the results output file
  --help                    Show this message and exit.

  • Run the example:
python main.py -d ../data/ -i example_annotation.vcf.hg38_multianno.txt  -p example_cadd.tsv.gz -hpo HP:0004791,HP:0002020,HP:0100580,HP:0001428,HP:0011459 -m hp -e TransE -dir ../output/ -o example_output1.tsv

 Annotate VCF file (example.vcf) with the phenotypes (HP:0003701,HP:0001324,HP:0010628,HP:0003388,HP:0000774,HP:0002093,HP:0000508,HP:0000218,HP:0000007)...
 |========                        | 25% Annotated files generated successfully.
 |================                | 50% Phenotype prediction...
 |========================        | 75% Variants prediction...
 |================================| 100%
The analysis is Done. You can find the priortize list in the output file: ../output/example_output.txt 
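For scripted runs, the argument list for a command like the example above can be assembled in Python. The sketch below is a hypothetical wrapper (build_embedpvp_args is not part of EmbedPVP). It checks that each phenotype looks like an HPO term ID (HP: followed by seven digits, an assumption based on the IDs used above) and that the model type is one of the values listed by --help. It then joins the phenotypes with commas, as -hpo expects.

```python
import re
import sys

# Values listed for -m/--model_type in `python main.py --help`.
MODEL_TYPES = {"go", "mp", "hp", "uberon", "union"}
# Assumed HPO ID shape: "HP:" followed by seven digits.
HPO_ID = re.compile(r"^HP:\d{7}$")


def build_embedpvp_args(data_root, in_file, cadd_file, hpo_terms,
                        model_type, embedding, outdir, outfile):
    """Hypothetical wrapper: validate inputs and build the main.py argument list."""
    bad = [term for term in hpo_terms if not HPO_ID.match(term)]
    if bad:
        raise ValueError(f"Not HPO term IDs: {bad}")
    if model_type not in MODEL_TYPES:
        raise ValueError(f"Unknown model type: {model_type}")
    return [
        sys.executable, "main.py",  # current interpreter, i.e. `python` in the README
        "-d", data_root,
        "-i", in_file,
        "-p", cadd_file,
        "-hpo", ",".join(hpo_terms),  # comma-separated, no spaces
        "-m", model_type,
        "-e", embedding,
        "-dir", outdir,
        "-o", outfile,
    ]
```

The returned list can be passed to subprocess.run(args, cwd="EmbedPVP/embedpvp", check=True), assuming the repository was cloned into the current directory as in the installation steps.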

Install from PyPI

pip install embedpvp

Output:

The script outputs a ranked list of candidate causative variants together with their scores.
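The exact columns of the results file are not documented here, so the sketch below takes the name of the score column as a parameter. It is a hypothetical helper (top_variants), and it assumes that the file is tab-separated with a header row and that a higher score means a more likely causative variant. Check the header of your own output file before relying on it.

```python
import csv


def top_variants(path, score_column, k=10):
    """Return the k highest-scoring rows of a tab-separated results file.

    `score_column` must name the score column in the file's header; the
    column names written by EmbedPVP are an assumption to verify.
    """
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    # Assumes higher scores are better: sort descending so top candidates come first.
    rows.sort(key=lambda row: float(row[score_column]), reverse=True)
    return rows[:k]
```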

Reference

For further details, or if you use EmbedPVP in your work, please cite the following article:

@article{althagafi2023prioritizing,
  title={Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning},
  author={Althagafi, Azza and Zhapa-Camacho, Fernando and Hoehndorf, Robert},
  journal={bioRxiv},
  pages={2023--11},
  year={2023},
  publisher={Cold Spring Harbor Laboratory}
}

Note

For questions or comments, please contact azza.althagafi@kaust.edu.sa



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distribution

embedpvp-1.0.5-py3-none-any.whl (26.6 kB)

Uploaded Python 3
