
DeepSVP: Integration of Genomics and Phenotypes for Structural Variant Prioritization using Deep Learning


DeepSVP

DeepSVP is a computational method for prioritizing structural variants involved in genetic diseases by combining genomic information with information about gene functions. It incorporates phenotypes linked to genes, functions of gene products, gene expression in individual cell types, and anatomical sites of expression, and systematically relates them to their phenotypic consequences through ontologies and machine learning.

Dataset

We train and evaluate our method using human genomic structural variants collected from the dbVar database.

Workflow for predicting candidate CNVs

We integrate annotations from the Gene Ontology (GO), the Uber-anatomy ontology (UBERON), the Mammalian Phenotype ontology (MP), and the Human Phenotype Ontology (HPO) using DL2vec. DL2vec converts different types of Description Logic axioms into a graph representation and then generates an embedding for each node and edge type. We collect genomic features using the public tool AnnotSV (v2.3 or 2.2).
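To make the axiom-to-graph step more concrete, here is a minimal sketch in Python. It is not the DL2vec implementation: the axioms, the edge labels (`has_function`, `has_phenotype`) and the `GENE:1` identifier are invented for illustration, and the use of random walks to produce token sequences is an assumption about how graph embeddings of this kind are commonly trained. The walks keep both nodes and edge labels as tokens, so a word-embedding model trained on them would learn a vector for each node and each edge type.

```python
import random
from collections import defaultdict

# Toy axioms, invented for illustration (not taken from the DeepSVP data files).
# Each triple is (subject, edge_label, object).
AXIOMS = [
    ("GO:0006915", "subClassOf", "GO:0008219"),
    ("GENE:1", "has_function", "GO:0006915"),
    ("GENE:1", "has_phenotype", "HP:0001324"),
    ("HP:0001324", "subClassOf", "HP:0003011"),
]


def build_graph(axioms):
    """Turn axiom triples into an adjacency list of (edge_label, target) pairs."""
    graph = defaultdict(list)
    for subject, label, obj in axioms:
        graph[subject].append((label, obj))
    return graph


def random_walk(graph, start, max_steps, rng):
    """Walk from `start`, emitting nodes and edge labels as one token sequence."""
    walk = [start]
    node = start
    for _ in range(max_steps):
        edges = graph.get(node)
        if not edges:  # dead end: the node has no outgoing edges
            break
        label, node = rng.choice(edges)
        walk.extend([label, node])
    return walk


graph = build_graph(AXIOMS)
rng = random.Random(0)  # fixed seed so the sketch is reproducible
walks = [random_walk(graph, "GENE:1", 4, rng) for _ in range(5)]
for walk in walks:
    print(" ".join(walk))
```

Each printed line is a "sentence" of alternating node and edge-label tokens. Feeding many such sentences to a skip-gram style model is one plausible way to obtain the embeddings described above.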

Installation

pip install deepsvp

Running the prediction model

  • Download all the files in data and place them into the data folder.

  • Download and install the required tool AnnotSV (v2.3 or 2.2), and then run:

    bash scripts/annotation.sh -i input.vcf -o annotated_file
    

    Then place the annotated file into the data folder.

  • Run the command deepsvp --help to display help and parameters:

    Usage: main.py [OPTIONS]
    
    DeepSVP: A phenotype-based tool to prioritize caustive CNV using WGS data
    and Phenotype/Gene Functional Similarity
    
    Options:
    -d, --data-root TEXT      Data root folder  [required]
    -i, --in-file TEXT        Annotated Input file  [required]
    -p, --hpo TEXT            List of phenotype ids separated by commas
                              [required]
    -maf, --maf_filter FLOAT  Allele frequency filter using gnomAD and 1000G
                              default<=0.01
    -m, --model_type TEXT     Ontology model, one of the following (go , mp ,
                              hp, cl, uberon, union), default=mp
    -ag, --aggregation TEXT   Aggregation method for the genes within CNV (max
                              or mean) default=max
    -o, --outfile TEXT        Output result file
    --help                    Show this message and exit.
    
    

Example:

deepsvp -d data/ -i example_annotsv.tsv -p HP:0003701,HP:0001324,HP:0010628,HP:0003388,HP:0000774,HP:0002093,HP:0000508,HP:0000218 -m cl -maf 0.01 -ag max -o example_output.txt
|========                        | 25% Reading the input phenotypes...
|================                | 50% Phenotype prediction... 
|========================        | 75% CNV Prediction... 
|================================| 100% DONE! You can find the prediction results in the output file: example_output.txt
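
When many samples need to be processed, the command above can be assembled from a script. The following sketch only builds the argument list from the options shown in the help text; the function name `build_deepsvp_command` and its keyword arguments are hypothetical helpers, not part of the deepsvp package.

```python
import shlex


def build_deepsvp_command(data_root, in_file, hpo_terms, model_type="mp",
                          maf=0.01, aggregation="max", outfile=None):
    """Assemble the argument list for the deepsvp CLI from its documented options."""
    allowed_models = {"go", "mp", "hp", "cl", "uberon", "union"}
    if model_type not in allowed_models:
        raise ValueError(f"model_type must be one of {sorted(allowed_models)}")
    if aggregation not in {"max", "mean"}:
        raise ValueError("aggregation must be 'max' or 'mean'")
    if not hpo_terms:
        raise ValueError("at least one HPO term is required")

    cmd = [
        "deepsvp",
        "-d", data_root,
        "-i", in_file,
        "-p", ",".join(hpo_terms),  # the CLI expects comma-separated phenotype ids
        "-m", model_type,
        "-maf", str(maf),
        "-ag", aggregation,
    ]
    if outfile is not None:
        cmd += ["-o", outfile]
    return cmd


cmd = build_deepsvp_command(
    "data/",
    "example_annotsv.tsv",
    ["HP:0003701", "HP:0001324"],
    model_type="cl",
    outfile="example_output.txt",
)
print(shlex.join(cmd))  # shell-safe rendering of the command for logging
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once deepsvp is installed and the data folder is in place.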

Output:

The script outputs a ranked list of the candidate causative CNVs, with a score for each.
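
The effect of the aggregation option (max or mean over the genes within a CNV) on the ranking can be illustrated with a small sketch. This is not DeepSVP's scoring code: the per-gene scores and CNV names are made up, the function names are hypothetical, and treating a higher score as more likely causative is an assumption made for the example.

```python
from statistics import mean


def aggregate_cnv_score(gene_scores, method="max"):
    """Collapse the per-gene scores of one CNV into a single score."""
    if not gene_scores:
        raise ValueError("a CNV needs at least one gene score")
    if method == "max":
        return max(gene_scores)
    if method == "mean":
        return mean(gene_scores)
    raise ValueError("method must be 'max' or 'mean'")


def rank_cnvs(cnv_gene_scores, method="max"):
    """Return (cnv, score) pairs sorted from highest to lowest aggregated score."""
    scored = {
        cnv: aggregate_cnv_score(scores, method)
        for cnv, scores in cnv_gene_scores.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Made-up per-gene scores for three hypothetical CNVs.
example = {
    "cnv_A": [0.25, 0.75],
    "cnv_B": [0.5, 0.625],
    "cnv_C": [0.125],
}

print(rank_cnvs(example, method="max"))
print(rank_cnvs(example, method="mean"))
```

With these numbers, max places cnv_A first because of its single high-scoring gene, while mean places cnv_B first because its genes score consistently. That difference is worth keeping in mind when choosing between the two options for large CNVs that span many genes.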

Scripts

  • Details on predicting pathogenic variants and on the comparison with other methods can be found in the experiment folder.
  • annotations.sh: script used to annotate the variants.
  • data_preprocessing.py: script for preprocessing the annotations and features.
  • pheno_model.py: script to get the DL2vec score using the trained model.
  • deepsvp_training.py: script to train and test the model, with hyperparameter optimization.
  • BWA_GATK.sh: script to run the GATK workflow on the input FASTQ files of the real samples; run using KAUST Supercomputing IBEX.
  • run_Manta.sh: script to generate a VCF file with the structural variants (SVs); we used Manta to identify the candidate SVs. Run using KAUST Supercomputing IBEX.

Final notes

For any questions or comments please contact: azza.althagafi@kaust.edu.sa


Built Distribution

deepsvp-1.0.2-py3-none-any.whl (9.9 kB)
