PDIVAS: Pathogenicity predictor of Deep-Intronic Variants causing Aberrant Splicing

License: MIT

UPDATE info

to v.1.6 (2024/11/13)

  • The PDIVAS subcommand vcf2tsv can now handle and output the sample columns of VCF files.
  • The SpliceAI annotation file (grch38.txt) was updated to GENCODE V47.
  • Fixed bugs in the exceptional PDIVAS outputs 'wo_annots' and 'out_of_scope'.

Summary

  • PDIVAS is a pathogenicity predictor for deep-intronic variants causing aberrant splicing.
  • Deep-intronic variants can create pathogenic pseudoexons or extend exons, which disturbs normal gene expression and can cause Mendelian diseases.
  • PDIVAS efficiently prioritizes the causal candidates from a vast number of deep-intronic variants detected by whole-genome sequencing.
  • The scope of PDIVAS prediction is variants in protein-coding genes on autosomes and the X chromosome.
  • This command-line interface is compatible with variant files in VCF format.

PDIVAS is a random forest model that classifies variants as pathogenic or benign using features from

  1. The splicing predictors SpliceAI (Jaganathan et al., Cell 2019) and MaxEntScan (Yeo and Burge, J. Comput. Biol. 2004)
    (*) The output module of SpliceAI was customized for PDIVAS features (see Option2 for details).

  2. The human splicing constraint score ConSplice (Cormier et al., BMC Bioinformatics 2022).

Reference & contact

Kurosawa et al. BMC Genomics 2023
a0160561@yahoo.co.jp (Ryo Kurosawa at Kyoto University)

<Option1>
Prediction with the PDIVAS-precomputed files (SNVs and short indels (1-4 nt))

For a quick implementation of PDIVAS, please use the score-precomputed file. Possible rare SNVs and short indels (1-4 nt) in Mendelian disease genes (n=4,512) are comprehensively annotated in the file. To annotate your VCF file, run, for example, the commands below.

0. Installation

conda install -c bioconda vcfanno
git clone https://github.com/brentp/vcfanno.git

1. Setting score-precomputed files
(Download the score-precomputed file above and create a configuration file, following https://github.com/brentp/vcfanno)

vi ./conf.toml

Write as below

[[annotation]]
file="./PDIVAS_precomputed/GRCh38/PDIVAS_precomputed_short_GRCh38.vcf.gz"
# ID and FILTER are special fields that pull the ID and FILTER columns from the VCF
fields = ["PDIVAS"]
ops=["self"]
names=["PDIVAS"]

2. Perform PDIVAS annotation

# Move to your working directory. (The case below is the directory in this repository.)
cd examples

# Perform annotation
vcfanno -lua ./vcfanno/example/custom.lua ./conf.toml ./ex.vcf > output_precomp.vcf
# Compare output_precomp.vcf with output_precomp_expect.vcf.gz to confirm that the annotation succeeded.
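
To inspect the annotated file programmatically, the INFO column can be read directly. The following is a minimal Python sketch and not part of PDIVAS or vcfanno. It assumes the score was written under the INFO key PDIVAS (as set by names=["PDIVAS"] in conf.toml); the helper names info_value and pdivas_from_vcf_lines and the example records are hypothetical.

```python
def info_value(info_column, key):
    """Return the raw value of `key` in a VCF INFO column, or None if absent."""
    for entry in info_column.split(";"):
        name, sep, value = entry.partition("=")
        if sep and name == key:
            return value
    return None


def pdivas_from_vcf_lines(lines, key="PDIVAS"):
    """Yield (chrom, pos, ref, alt, raw_value) for records that carry `key`."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        chrom, pos, _id, ref, alt = cols[:5]
        value = info_value(cols[7], key)
        if value is not None:
            yield chrom, int(pos), ref, alt, value


# Hypothetical records for illustration only (not real PDIVAS output).
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t1000\t.\tA\tG\t.\t.\tDP=30;PDIVAS=0.512\n",
    "chr1\t2000\t.\tC\tT\t.\t.\tDP=12\n",
]
records = list(pdivas_from_vcf_lines(example))
```

For a real file, the lines of output_precomp.vcf can be passed in place of the example list; the value is returned as a raw string because its exact layout in the precomputed file is not assumed here.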

<Option2>
Perform annotation of individual features and calculation of PDIVAS scores

For more comprehensive annotation than pre-computed files, run PDIVAS by following the description below.

0-1. Installation

# It is recommended to create new conda environments for the PDIVAS installation.
# Solving the environments can take some time.
conda create -n PDIVAS -c bioconda -c conda-forge spliceai tensorflow==2.6.2 pdivas bcftools vcfanno
conda create -n VEP -c conda-forge -c bioconda perl==5.26.2 ensembl-vep==105

Successful installation was verified with Anaconda version 23.3.1.

0-2. Setting up customized usage

-For the output-customized SpliceAI in the PDIVAS conda environment

git clone https://github.com/shiro-kur/PDIVAS.git
cd PDIVAS/Customed_SpliceAI
cp ./__main__for_customed_SpliceAI.py installed_path/__main__.py
cp ./utils_for_customed-SpliceAI.py installed_path/utils.py
cp -rf ./annotations_for_customed_SpliceAI installed_path/annotations

# Example of installed_path: ~/miniconda3/envs/ex/lib/python3.9/site-packages/spliceai
# Files and directories included in the spliceai directory by default:
# __init__.py  __main__.py  __pycache__  annotations  models  utils.py
# The result of a successful customization is shown in examples/~~.vcf

-For VEP custom usage

1. Preprocessing VCF format (resolve the multi-allelic site to biallelic sites)

conda activate PDIVAS
bcftools norm -m - multi.vcf > bi.vcf
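
To illustrate what this preprocessing step does, the following minimal Python sketch splits one multi-allelic record into one record per ALT allele. It is a simplification for illustration only and not a replacement for bcftools: only the ALT column is split, while per-allele INFO/FORMAT values and genotypes are copied unchanged (bcftools norm rewrites them). The function name split_multiallelic and the example record are hypothetical.

```python
def split_multiallelic(record):
    """Split one tab-separated VCF record into one record per ALT allele.

    Simplification: only the ALT column (5th column) is split; all other
    columns are copied as-is into every output record.
    """
    cols = record.rstrip("\n").split("\t")
    alts = cols[4].split(",")
    return ["\t".join(cols[:4] + [alt] + cols[5:]) for alt in alts]


# Hypothetical multi-allelic site with two ALT alleles.
multi = "chr2\t500\t.\tG\tA,T\t.\t.\t."
bi = split_multiallelic(multi)
```

Each resulting record carries exactly one ALT allele, which is the form the downstream annotation steps expect.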

2. Add gene annotations, MaxEntScan scores, and ConSplice scores with VEP.

conda activate VEP
vep \
--cache --offline --cache_version 107 --assembly GRCh38 --hgvs --pick_allele_gene \
--fasta ./references/hg38.fa.gz --vcf --force \
--custom ./references/ConSplice.50bp_region.inverse_proportion_refo_hg38.bed.gz,ConSplice,bed,overlap,0 \
--plugin MaxEntScan,./references/MaxEntScan/fordownload,SWA,NCSS \
--fields "Consequence,SYMBOL,Gene,INTRON,HGVSc,STRAND,ConSplice,MES-SWA_acceptor_diff,MES-SWA_acceptor_alt,MES-SWA_donor_diff,MES-SWA_donor_alt" \
--compress_output bgzip \
-i ./examples/ex.vcf.gz -o ./examples/ex_vep.vcf.gz

3. Add output-customized SpliceAI scores

conda activate PDIVAS
spliceai -I examples/ex_vep.vcf.gz -O examples/ex_vep_AI.vcf -R hg38.fa -A grch38 -D 300 -M 1

4. Perform the detection of deep-intronic variants and PDIVAS prediction

pdivas predict -I examples/ex_vep_AI.vcf -O examples/ex_vep_AI_PD.vcf.gz -F off

5. (Optional) Convert the VCF file with PDIVAS annotation to a TSV file (one gene annotation per line)

pdivas vcf2tsv -I examples/ex_vep_AI_PD.vcf.gz -O examples/ex_vep_AI_PD.tsv
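
Steps 3 to 5 can be chained per sample from a script. The sketch below is one possible way to do so in Python, assuming the PDIVAS conda environment is active and the VEP-annotated file from step 2 already exists. It uses only the flags shown in the commands above; the function names build_commands and run_pipeline and the file-naming scheme derived from a prefix are hypothetical.

```python
import subprocess


def build_commands(prefix, fasta="hg38.fa", filtering="off"):
    """Return argv lists for steps 3-5, starting from <prefix>_vep.vcf.gz."""
    vep = f"{prefix}_vep.vcf.gz"
    spliceai_out = f"{prefix}_vep_AI.vcf"
    predicted = f"{prefix}_vep_AI_PD.vcf.gz"
    tsv = f"{prefix}_vep_AI_PD.tsv"
    return [
        ["spliceai", "-I", vep, "-O", spliceai_out, "-R", fasta,
         "-A", "grch38", "-D", "300", "-M", "1"],
        ["pdivas", "predict", "-I", spliceai_out, "-O", predicted, "-F", filtering],
        ["pdivas", "vcf2tsv", "-I", predicted, "-O", tsv],
    ]


def run_pipeline(prefix, **kwargs):
    """Run the steps in order and stop at the first failing command."""
    for argv in build_commands(prefix, **kwargs):
        subprocess.run(argv, check=True)


# Building the commands does not execute anything.
commands = build_commands("examples/ex")
```

Calling run_pipeline("examples/ex") would then execute the three commands in sequence. The VEP step is left out because it runs in a separate conda environment.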

Usage of PDIVAS command line

1. $ pdivas predict
Required parameters:

  • -I: Input VCF(.vcf/.vcf.gz) with variants of interest.
  • -O: Output VCF (.vcf/.vcf.gz) with PDIVAS predictions in the format GENE_ID|PDIVAS_score. Variants in multiple genes have separate predictions for each gene.

Optional parameters:

  • -F: Filtering function (off/on): output all variants (-F off; default) or only deep-intronic variants with PDIVAS scores (-F on).

Details of PDIVAS INFO field:

ID        Description
GENE_ID   Ensembl gene ID based on GENCODE V41 (GRCh38) or V19 (GRCh37)
PDIVAS    <Predicted result>
          Pattern 1: float value from 0.000 to 1.000 (the higher, the more deleterious)
          <Exceptions> (output with '-F off'; filtered out with '-F on')
          Pattern 2: 'wo_annots', variants without VEP or SpliceAI annotations
          Pattern 3: 'out_of_scope', variants outside the PDIVAS annotation scope (chrY, non-coding genes, or non-deep-intronic variants)
          Pattern 4: 'no_gene_match', variants whose gene annotations do not match between VEP and SpliceAI
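
When post-processing the output, numeric scores and exceptional strings need to be told apart. The following Python sketch shows one way to parse a PDIVAS value of the form GENE_ID|PDIVAS_score. It rests on an assumption not stated in this document: that predictions for multiple genes are separated by commas. The function name parse_pdivas and the example gene IDs are hypothetical.

```python
def parse_pdivas(value):
    """Parse 'GENE_ID|PDIVAS_score' entries into (gene_id, score, label) tuples.

    Assumption: entries for multiple genes are comma-separated.
    For a numeric score (Pattern 1), label is None. For an exceptional
    output such as 'wo_annots', 'out_of_scope', or 'no_gene_match',
    score is None and label holds the string.
    """
    results = []
    for entry in value.split(","):
        gene_id, _, raw = entry.partition("|")
        try:
            results.append((gene_id, float(raw), None))
        except ValueError:
            results.append((gene_id, None, raw))
    return results


# Hypothetical value for illustration only.
parsed = parse_pdivas("ENSG00000000001|0.763,ENSG00000000002|out_of_scope")
```

Keeping the exceptional label alongside the gene ID makes it possible to report why a variant received no score instead of silently dropping it.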

2. $ pdivas vcf2tsv
Required parameters:

  • -I: *Input VCF (.vcf/.vcf.gz) with VEP, SpliceAI, and PDIVAS annotations.
  • -O: Path and file name of the output TSV file.
    *The input VCF is valid only when it was generated through this pipeline.

Interpretation of PDIVAS scores

More details are given in Kurosawa et al., medRxiv 2023.

Threshold   Sensitivity (*1)   Candidates/individual (*2)
>=0.082     95%                26.8
>=0.151     90%                14.5
>=0.340     85%                6.7
>=0.501     80%                4.1
>=0.575     75%                3.0
>=0.763     70%                1.2

(*1) Sensitivities were calculated on curated pathogenic deep-intronic variants in a test dataset.
(*2) Candidates of pathogenic deep-intronic variants were obtained through the process described below. (WGS: Whole-genome sequencing)

[Figure: process for obtaining candidates of pathogenic deep-intronic variants from WGS data]
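
The threshold table can be applied to a score with a small lookup. The Python sketch below copies the values from the table and returns the strictest threshold that a given score passes. The names THRESHOLDS and strictest_threshold_passed are hypothetical and not part of the PDIVAS command line.

```python
# (threshold, sensitivity, candidates per individual), copied from the table.
THRESHOLDS = [
    (0.082, 0.95, 26.8),
    (0.151, 0.90, 14.5),
    (0.340, 0.85, 6.7),
    (0.501, 0.80, 4.1),
    (0.575, 0.75, 3.0),
    (0.763, 0.70, 1.2),
]


def strictest_threshold_passed(score):
    """Return the row with the highest threshold <= score, or None if none passes."""
    passed = [row for row in THRESHOLDS if score >= row[0]]
    if not passed:
        return None
    return max(passed, key=lambda row: row[0])


row = strictest_threshold_passed(0.6)
```

For a score of 0.6, the lookup returns the >=0.575 row, meaning the variant would be retained at a cutoff with 75% sensitivity and about 3.0 candidates per individual.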
