PDIVAS: Pathogenicity predictor of Deep-Intronic Variants causing Aberrant Splicing
Update info
Changes in v1.6 (2024/11/13)
- The PDIVAS subcommand vcf2tsv can now handle and output the sample columns of VCF files.
- The SpliceAI annotation file (grch38.txt) was updated to GENCODE V47.
- Fixed bugs in the PDIVAS exception outputs ('wo_annots' and 'out_of_scope').
Summary
- PDIVAS is a pathogenicity predictor for deep-intronic variants that cause aberrant splicing.
- Deep-intronic variants can create pathogenic pseudoexons or extended exons that disrupt normal gene expression and can cause Mendelian diseases.
- PDIVAS efficiently prioritizes causal candidates among the vast number of deep-intronic variants detected by whole-genome sequencing.
- PDIVAS predictions cover variants in protein-coding genes on the autosomes and the X chromosome.
- The command-line interface works with variant files in VCF format.
PDIVAS uses a random forest algorithm to classify pathogenic and benign variants, referring to features from:
- the splicing predictors SpliceAI (Jaganathan et al., Cell 2019) and MaxEntScan (Yeo and Burge, J. Comput. Biol. 2004)
(*) The output module of SpliceAI was customized for the PDIVAS features (see Option 2 for details).
- the human splicing constraint score ConSplice (Cormier et al., BMC Bioinformatics 2022).
Reference & contact
Kurosawa et al. BMC Genomics 2023
a0160561@yahoo.co.jp (Ryo Kurosawa at Kyoto University)
<Option1>
Prediction with the PDIVAS precomputed files (SNVs + short indels of 1-4 nt)
For a quick start with PDIVAS, use the precomputed score file here. The file comprehensively annotates possible rare SNVs and short indels (1-4 nt) in 4,512 genes associated with Mendelian diseases. To annotate your VCF file, run commands such as the following.
0. Installation
conda install -c bioconda vcfanno
git clone https://github.com/brentp/vcfanno.git
1. Set up the precomputed score file
(Download the precomputed score file above and create a configuration file following https://github.com/brentp/vcfanno.)
vi ./conf.toml
Write the following:
[[annotation]]
file="./PDIVAS_precomputed/GRCh38/PDIVAS_precomputed_short_GRCh38.vcf.gz"
# 'fields' names the INFO field to pull from the precomputed VCF; ops=["self"] copies its value as-is
fields = ["PDIVAS"]
ops=["self"]
names=["PDIVAS"]
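Conceptually, this [[annotation]] block tells vcfanno to copy the PDIVAS INFO value from the precomputed VCF onto matching variants in your VCF. The Python sketch below illustrates that join under simplifying assumptions (a variant matches when CHROM, POS, REF and ALT are identical, and everything fits in memory). It is not vcfanno's implementation, which streams sorted, indexed files and also writes the ##INFO header line; the function names are hypothetical.

```python
# Hypothetical illustration of the [[annotation]] join: copy INFO/PDIVAS from
# the precomputed VCF onto query records sharing CHROM, POS, REF and ALT.
# Not vcfanno's implementation (no indexing, no ##INFO header line added).


def info_to_dict(info):
    """Parse a VCF INFO string such as 'DP=3;DB' into a dict."""
    if info == ".":
        return {}
    return dict(item.partition("=")[::2] for item in info.split(";"))


def load_precomputed(lines):
    """Map (CHROM, POS, REF, ALT) -> PDIVAS value from precomputed VCF lines."""
    table = {}
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        score = info_to_dict(cols[7]).get("PDIVAS")
        if score is not None:
            table[(cols[0], cols[1], cols[3], cols[4])] = score
    return table


def annotate(query_lines, table):
    """Yield query lines, appending PDIVAS=<value> to INFO when a key matches."""
    for line in query_lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            yield line
            continue
        cols = line.split("\t")
        key = (cols[0], cols[1], cols[3], cols[4])
        if key in table:
            extra = f"PDIVAS={table[key]}"
            cols[7] = extra if cols[7] == "." else f"{cols[7]};{extra}"
        yield "\t".join(cols)
```

Variants whose alleles differ from every precomputed record are passed through without a PDIVAS value, which mirrors why only the listed SNVs and short indels can be annotated this way.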
2. Perform PDIVAS annotation
# Move to your working directory (the example below uses the examples directory in this repository).
cd examples
# Run the annotation
vcfanno -lua ./vcfanno/example/custom.lua ./conf.toml ./ex.vcf > output_precomp.vcf
# Compare output_precomp.vcf with output_precomp_expect.vcf.gz to confirm that the annotation succeeded.
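One way to do this comparison is to check only the PDIVAS value of each variant and ignore header differences. The following is a minimal Python sketch under that assumption; `diff_pdivas` and the other helper names are hypothetical, and compressed input is read with Python's standard gzip module (which also reads bgzip files).

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def pdivas_by_variant(path):
    """Return {(CHROM, POS, REF, ALT): PDIVAS value, or None if absent}."""
    values = {}
    with open_vcf(path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            cols = line.rstrip("\n").split("\t")
            value = None
            for item in cols[7].split(";"):
                key, _, val = item.partition("=")
                if key == "PDIVAS":
                    value = val
            values[(cols[0], cols[1], cols[3], cols[4])] = value
    return values


def diff_pdivas(observed_path, expected_path):
    """List variants whose PDIVAS value differs or that occur in only one file."""
    observed = pdivas_by_variant(observed_path)
    expected = pdivas_by_variant(expected_path)
    keys = sorted(set(observed) | set(expected))
    return [k for k in keys
            if observed.get(k, "<missing>") != expected.get(k, "<missing>")]
```

For example, an empty list from `diff_pdivas("output_precomp.vcf", "output_precomp_expect.vcf.gz")` would indicate that every variant received the expected PDIVAS value.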
<Option2>
Perform annotation of individual features and calculation of PDIVAS scores
For more comprehensive annotation than the precomputed files provide, run PDIVAS as described below.
0-1. Installation
# We recommend creating new conda environments for the PDIVAS installation.
# Solving these environments can take some time.
conda create -n PDIVAS -c bioconda -c conda-forge spliceai tensorflow==2.6.2 pdivas bcftools vcfanno
conda create -n VEP -c conda-forge -c bioconda perl==5.26.2 ensembl-vep==105
Successful installation was verified with Anaconda version 23.3.1.
0-2. Custom setup
- Output-customized SpliceAI (in the PDIVAS conda environment)
git clone https://github.com/shiro-kur/PDIVAS.git
cd PDIVAS/Customed_SpliceAI
cp ./__main__for_customed_SpliceAI.py installed_path/__main__.py
cp ./utils_for_customed-SpliceAI.py installed_path/utils.py
cp -rf ./annotations_for_customed_SpliceAI installed_path/annotations
# Example installed_path: ~/miniconda3/envs/ex/lib/python3.9/site-packages/spliceai
# Files and directories in the spliceai directory by default:
# __init__.py __main__.py __pycache__ annotations models utils.py
# An example of successfully customized output is given in examples/~~.vcf
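installed_path depends on your conda setup. As a hedged helper, the Python sketch below asks the import system where the spliceai package lives without importing it (so TensorFlow is not loaded) and reports which of the default entries listed above are missing. Run it inside the PDIVAS environment; the helper names are hypothetical.

```python
import importlib.util
import os

# Default entries listed above (__pycache__ omitted because it may not exist yet)
DEFAULT_ENTRIES = {"__init__.py", "__main__.py", "annotations", "models", "utils.py"}


def package_dir(name):
    """Return the directory of an installed package, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    # For a regular package, spec.origin points to its __init__.py
    return os.path.dirname(spec.origin)


def missing_entries(path, expected=DEFAULT_ENTRIES):
    """Return the expected file/directory names that are absent from path."""
    return sorted(expected - set(os.listdir(path)))
```

For example, `print(package_dir("spliceai"))` should print a path similar to the example installed_path, which can then be used as the destination of the three `cp` commands.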
- VEP custom setup
- Download the VEP cache file (version >= 107; it should match your installed VEP version). Follow the instructions in the "Manually downloading caches" section:
(https://asia.ensembl.org/info/docs/tools/vep/script/vep_cache.html)
- To set up the MaxEntScan plugin, follow the instructions here:
(https://asia.ensembl.org/info/docs/tools/vep/script/vep_plugins.html#maxentscan)
- Download the ConSplice score file from here. The file was derived from the original score file of Cormier et al. (BMC Bioinformatics 2022).
1. Preprocess the VCF (split multi-allelic sites into biallelic sites)
conda activate PDIVAS
bcftools norm -m - multi.vcf > bi.vcf
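`bcftools norm -m -` splits each multi-allelic record into one biallelic record per ALT allele. The Python sketch below is a deliberately simplified illustration of that idea, not a replacement for bcftools: it copies INFO unchanged and drops sample columns, whereas bcftools also splits per-allele INFO/FORMAT fields and genotypes. The function name is hypothetical.

```python
def split_multiallelic(line):
    """Split one VCF line into biallelic records (simplified, samples dropped)."""
    line = line.rstrip("\n")
    if line.startswith("##"):
        return [line]                       # meta-information lines unchanged
    cols = line.split("\t")[:8]             # keep CHROM..INFO only in this sketch
    if line.startswith("#"):
        return ["\t".join(cols)]            # #CHROM header trimmed to match
    records = []
    for alt in cols[4].split(","):          # one record per ALT allele
        rec = list(cols)
        rec[4] = alt
        records.append("\t".join(rec))
    return records
```

Because PDIVAS, VEP and SpliceAI then see exactly one ALT allele per record, each prediction can be tied to a single variant.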
2. Add gene annotations, MaxEntScan scores, and ConSplice scores with VEP.
conda activate VEP
vep \
--cache --offline --cache_version 107 --assembly GRCh38 --hgvs --pick_allele_gene \
--fasta ./references/hg38.fa.gz --vcf --force \
--custom ./references/ConSplice.50bp_region.inverse_proportion_refo_hg38.bed.gz,ConSplice,bed,overlap,0 \
--plugin MaxEntScan,./references/MaxEntScan/fordownload,SWA,NCSS \
--fields "Consequence,SYMBOL,Gene,INTRON,HGVSc,STRAND,ConSplice,MES-SWA_acceptor_diff,MES-SWA_acceptor_alt,MES-SWA_donor_diff,MES-SWA_donor_alt" \
--compress_output bgzip \
-i ./examples/ex.vcf.gz -o ./examples/ex_vep.vcf.gz
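With `--vcf`, VEP stores its annotations in the CSQ INFO field: comma-separated entries, each a pipe-separated list whose field order is declared in the CSQ ##INFO header ("... Format: Consequence|SYMBOL|..."), matching the `--fields` list above. A minimal Python sketch for reading them back, for example to spot-check ConSplice or MaxEntScan values in ex_vep.vcf.gz; the function names are hypothetical.

```python
import re


def csq_fields(header_line):
    """Extract field names from the ##INFO=<ID=CSQ,...> header line."""
    match = re.search(r'Format: ([^"]+)"', header_line)
    if match is None:
        raise ValueError("not a VEP CSQ header line")
    return match.group(1).split("|")


def parse_csq(info, fields):
    """Return one dict per CSQ entry in a record's INFO string ([] if none)."""
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key == "CSQ":
            return [dict(zip(fields, entry.split("|"))) for entry in value.split(",")]
    return []
```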
3. Add output-customized SpliceAI scores
conda activate PDIVAS
spliceai -I examples/ex_vep.vcf.gz -O examples/ex_vep_AI.vcf -R hg38.fa -A grch38 -D 300 -M 1
4. Detect deep-intronic variants and run the PDIVAS prediction
pdivas predict -I examples/ex_vep_AI.vcf -O examples/ex_vep_AI_PD.vcf.gz -F off
5. (Optional) Convert the VCF file with PDIVAS annotations to a TSV file (one gene annotation per line)
pdivas vcf2tsv -I examples/ex_vep_AI_PD.vcf.gz -O examples/ex_vep_AI_PD.tsv
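The idea of one gene annotation per line can be sketched in Python as below. This is a hypothetical illustration, not the vcf2tsv implementation: it assumes that INFO/PDIVAS holds comma-separated GENE_ID|PDIVAS_score entries and that an exception value without a gene ID contains no '|', and it writes only six columns, while the real subcommand handles more fields, including sample columns.

```python
import csv
import io


def vcf_to_gene_rows(lines):
    """Return one [CHROM, POS, REF, ALT, GENE_ID, PDIVAS] row per gene entry."""
    rows = []
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        info = dict(item.partition("=")[::2] for item in cols[7].split(";"))
        # Assumed layout: PDIVAS=GENE1|score1,GENE2|score2
        for entry in info.get("PDIVAS", "").split(","):
            if not entry:
                continue
            gene, sep, score = entry.partition("|")
            if not sep:  # assumed: exception label (e.g. 'out_of_scope') without a gene ID
                gene, score = "", entry
            rows.append([cols[0], cols[1], cols[3], cols[4], gene, score])
    return rows


def rows_to_tsv(rows):
    """Serialize rows as TSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["CHROM", "POS", "REF", "ALT", "GENE_ID", "PDIVAS"])
    writer.writerows(rows)
    return out.getvalue()
```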
Usage of PDIVAS command line
1. $ pdivas predict
Required parameters:
-I : Input VCF (.vcf/.vcf.gz) with the variants of interest.
-O : Output VCF (.vcf/.vcf.gz) with PDIVAS predictions in the format GENE_ID|PDIVAS_score. Variants in multiple genes have a separate prediction for each gene.
Optional parameters:
-F : Filtering (off/on). Output all variants (-F off; default) or only deep-intronic variants with PDIVAS scores (-F on).
Details of PDIVAS INFO field:
ID | Description |
---|---|
GENE_ID | Ensembl gene ID based on GENCODE V41(GRCh38) or V19(GRCh37) |
PDIVAS | <Predicted result> Pattern 1: float value from 0.000 to 1.000 (the higher, the more deleterious). <Exceptions> Output with '-F off'; filtered out with '-F on'. Pattern 2: 'wo_annots', variants without VEP or SpliceAI annotations. Pattern 3: 'out_of_scope', variants outside the PDIVAS prediction scope (chrY, non-coding genes, or non-deep-intronic variants). Pattern 4: 'no_gene_match', variants without a matching gene annotation between VEP and SpliceAI. |
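A hedged Python sketch of how a downstream script might interpret a single PDIVAS value according to this table: a numeric score in 0-1 versus one of the three exception labels. `interpret_pdivas` and `would_pass_filter_on` are hypothetical names, and treating only numeric values as surviving '-F on' is an interpretation of the table, not PDIVAS code.

```python
# Exception labels from the table above, with short explanations
EXCEPTIONS = {
    "wo_annots": "no VEP or SpliceAI annotation",
    "out_of_scope": "outside the PDIVAS prediction scope",
    "no_gene_match": "no matching gene between VEP and SpliceAI",
}


def interpret_pdivas(value):
    """Return ("score", float) or ("exception", reason) for one PDIVAS value."""
    if value in EXCEPTIONS:
        return ("exception", EXCEPTIONS[value])
    score = float(value)  # raises ValueError for unexpected strings
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"PDIVAS score out of range: {value}")
    return ("score", score)


def would_pass_filter_on(value):
    """Exception values appear only with -F off; -F on keeps scored variants."""
    return interpret_pdivas(value)[0] == "score"
```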
2. $ pdivas vcf2tsv
Required parameters:
-I : Input VCF (.vcf/.vcf.gz) with VEP, SpliceAI, and PDIVAS annotations (*).
-O : Path to the output TSV file.
(*) The input VCF is valid only if it was generated through this pipeline.
Interpretation of PDIVAS scores
More details are in Kurosawa et al. medRxiv 2023.
Threshold | Sensitivity (*1) | Candidates per individual (*2) |
---|---|---|
>=0.082 | 95% | 26.8 |
>=0.151 | 90% | 14.5 |
>=0.340 | 85% | 6.7 |
>=0.501 | 80% | 4.1 |
>=0.575 | 75% | 3.0 |
>=0.763 | 70% | 1.2 |
(*1) Sensitivities were calculated on curated pathogenic deep-intronic variants in a test dataset.
(*2) Candidate pathogenic deep-intronic variants were obtained through the process described below (WGS: whole-genome sequencing).
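To see which row of the threshold table a given score satisfies, a small Python lookup can return the strictest threshold reached together with its reported sensitivity and candidates per individual. This is an illustrative sketch; the values are copied from the table above and the function name is hypothetical.

```python
import bisect

# (threshold, sensitivity %, candidates per individual), copied from the table
THRESHOLDS = [
    (0.082, 95, 26.8),
    (0.151, 90, 14.5),
    (0.340, 85, 6.7),
    (0.501, 80, 4.1),
    (0.575, 75, 3.0),
    (0.763, 70, 1.2),
]


def strictest_threshold(score):
    """Return the highest table row with score >= threshold, or None."""
    cutoffs = [t for t, _, _ in THRESHOLDS]
    index = bisect.bisect_right(cutoffs, score)  # number of cutoffs <= score
    return THRESHOLDS[index - 1] if index else None
```

For example, a score of 0.6 meets the 0.575 threshold (75% sensitivity) but not 0.763, so `strictest_threshold(0.6)` returns `(0.575, 75, 3.0)`.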
File details
Details for the file pdivas-1.2.0.tar.gz
File metadata
- Download URL: pdivas-1.2.0.tar.gz
- Upload date:
- Size: 433.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.13.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 4a2234564557622c5bd2213478956c4a4858ba6768251d24ee43a4fc7550f721 |
MD5 | 98d467592f34932ca541ce319700ad49 |
BLAKE2b-256 | fc7e281e2053ea00eae24162211af598c247d3c5e4fb2c0d043e24c2e19897e6 |
File details
Details for the file pdivas-1.2.0-py3-none-any.whl
File metadata
- Download URL: pdivas-1.2.0-py3-none-any.whl
- Upload date:
- Size: 446.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.13.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 793aa2a225b97e63c9c842f84f9a4f55859b1328194360b8b0ce2bf068be8840 |
MD5 | 905825c88eeba05a6d8a3ab2829c5114 |
BLAKE2b-256 | 259c8f3445b2c670656b625ad3d24bd8456047b4406472e07787b5cc9749362c |
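To check a downloaded distribution file against the SHA256 digests listed for pdivas 1.2.0, a short Python sketch using the standard hashlib module is enough (command-line tools such as `sha256sum` work equally well). The helper names are hypothetical.

```python
import hashlib

# SHA256 digests listed above for the pdivas 1.2.0 distribution files
EXPECTED_SHA256 = {
    "pdivas-1.2.0.tar.gz": "4a2234564557622c5bd2213478956c4a4858ba6768251d24ee43a4fc7550f721",
    "pdivas-1.2.0-py3-none-any.whl": "793aa2a225b97e63c9c842f84f9a4f55859b1328194360b8b0ce2bf068be8840",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("pdivas-1.2.0.tar.gz", EXPECTED_SHA256["pdivas-1.2.0.tar.gz"])` should return True for an intact download.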