MPA: MoBiDiC Prioritization Algorithm

Overview

The MPA is a prioritization algorithm for Next Generation Sequencing molecular diagnosis. We propose an open-source workflow that is free for academic users.

Variants are ranked with a unique score that takes into account curated databases, biological assumptions, splicing predictions and the sum of various predictors for missense alterations. Annotations are made for exonic and splicing variants up to +300 nt.

We show the relevance of our clinical diagnosis approach with an updated evaluation of in silico prediction tools, using DYSF, DMD, LMNA, NEB and TTN variants. Pathogenic variants come from the Universal Mutation Database [1], which is fed by human experts, courtesy of its curators; the dataset of neutral variants is defined from the ExAC database [2].

MPA requires a VCF annotated by ANNOVAR and outputs an annotated VCF with MPA scores and ranks.

Figure: MPA diagram

* PTC: Premature Truncation Codon (nonsense or frameshift)

** Intronic positions between -20 and +5

Input

The MPA takes as input a VCF file annotated with ANNOVAR [3] using the following databases:

  • Curated database: ClinVar [4]
  • Biological assumption: refGene [5]
  • Splicing prediction: SpliceAI [6], dbscSNV [7]
  • Missense prediction: dbNSFP [8]

Note: a short tutorial on annotating your VCF with ANNOVAR is given in the section Quick guide for Annovar.

Update April 2019: spliceAI annotations now replace spidex. Until spliceAI is included in ANNOVAR, files for this dataset in the proper format are available upon request (hg19 or hg38).

Multi-allelic variants in the VCF should be split into biallelic variants with bcftools norm.

bcftools norm -m - file.vcf > file_breakmulti.vcf
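Before running the command above, it can be useful to check whether a VCF contains multi-allelic records at all. The following is a minimal sketch, not part of MPA: the function name count_multiallelic and the example records are hypothetical, and it only assumes the standard tab-separated VCF layout where ALT is the fifth column and alternate alleles are comma-separated.

```python
import io


def count_multiallelic(vcf_lines):
    """Count records whose ALT column lists more than one allele."""
    count = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        if "," in fields[4]:  # ALT column, alleles separated by commas
            count += 1
    return count


# Hypothetical two-record VCF used only for illustration.
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\t.\tA\tG\t.\t.\t.\n"
    "1\t200\t.\tC\tT,G\t.\t.\t.\n"
)
print(count_multiallelic(example))  # 1
```

A non-zero count suggests the file should go through bcftools norm before annotation.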

Output

In a VCF format

The VCF is annotated with several items: MPA_impact (Clinvar_pathogenicity, splice_impact, stop and frameshift_impact, missense_impact or unknown_impact), MPA_ranking (1 to 8), MPA_final_score (from 0 to 10), and details of the scoring: MPA_available (number of missense tools, from 0 to 10, that provide an annotation), MPA_deleterious (number of missense tools that annotate the variant as pathogenic) and MPA_adjusted (missense score normalized from 0 to 10).
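To read these annotations back from the output, a small parser is usually enough. The sketch below assumes the MPA items are written as key=value entries in the VCF INFO column under the names listed above; that layout is an assumption to check against your own output. The function name parse_mpa_info and the example INFO string are hypothetical.

```python
def parse_mpa_info(info_field):
    """Return the MPA_* entries of a VCF INFO string as a dict of strings."""
    result = {}
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        # Keep only key=value entries whose key starts with "MPA_".
        if sep and key.startswith("MPA_"):
            result[key] = value
    return result


# Hypothetical INFO string, for illustration only.
info = "DP=42;MPA_impact=missense_impact;MPA_ranking=5;MPA_final_score=7.5"
print(parse_mpa_info(info))
```

Values are kept as strings so that the caller decides how to convert ranks and scores.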

Ranking : from 1 to 10 and score

    1. clinvar_pathogenicity: pathogenic variants reported in ClinVar (score: 10)
    2. stop or frameshift_impact: Premature Truncation Codon, nonsense or frameshift (score: 10)
    3. splicing_impact (ADA, RF): splice-affecting variant predictions ranked by algorithm performance robustness and strength (score: 10)
    4. splicing_impact (spliceAI high): splice-affecting variant predictions ranked by algorithm performance robustness and strength (score: 10)
    5. missense impact, moderate to high: missense variants with moderate to high impact (score: 6-10)
    6. moderate splicing_impact (spliceAI moderate) (score: 6)
    7. missense_impact moderate: missense variants with moderate impact (score: 2-6)
    8. low splicing impact (spliceAI low) (indel) (score: 2)
    9. missense_impact low: missense variants with low impact (score: 0-2)
    10. unknown impact: exonic variants without clearly annotated ORFs and splicing variants not predicted pathogenic; or NULL (no annotation on genes, splicing, etc.) (score: 0-10)
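The missense tiers in the list above can be illustrated with a short sketch. This is not the MPA implementation. It assumes that the normalized missense score is the fraction of annotating tools that call the variant deleterious, scaled to 0-10, and that the tier boundaries at 2 and 6 are inclusive on the upper tier; the source only gives the ranges. Both function names are hypothetical.

```python
def adjusted_missense_score(deleterious, available):
    """Scale the share of deleterious calls to a 0-10 score (assumed formula)."""
    if available <= 0:
        return None  # no missense tool annotated the variant
    if not 0 <= deleterious <= available:
        raise ValueError("deleterious must be between 0 and available")
    return 10.0 * deleterious / available


def missense_tier(score):
    """Map a 0-10 missense score to the tiers of the ranking list.

    Boundary handling (>= 6, >= 2) is an assumption of this sketch.
    """
    if score >= 6:
        return "moderate to high"
    if score >= 2:
        return "moderate"
    return "low"


score = adjusted_missense_score(deleterious=7, available=10)
print(score, missense_tier(score))  # 7.0 moderate to high
```

Returning None when no tool annotates the variant keeps "no information" distinct from a score of 0.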

With a simple interface (Captain ACHAB)

MPA is part of the MobiDL captainAchab workflow. It is the core of the ranking in Captain ACHAB, our simple interface for interpreting NGS variants at a glance. Find more information at Captain ACHAB.


Installation

Requirements

  • Python 3

pip

python3 -m pip install mobidic-mpa

Quick start

To run MPA, use the following command:

mpa -i path/to/input.vcf -o path/to/output.vcf
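When several annotated VCFs need to be scored, the same command can be assembled from Python. The sketch below only uses the -i and -o options shown above. The helper name mpa_command, the ".mpa.vcf" output suffix and the example paths are hypothetical choices, not MPA conventions.

```python
from pathlib import Path


def mpa_command(input_vcf, output_dir):
    """Build the argument list for one mpa run (output naming is hypothetical)."""
    input_vcf = Path(input_vcf)
    output_vcf = Path(output_dir) / (input_vcf.stem + ".mpa.vcf")
    return ["mpa", "-i", str(input_vcf), "-o", str(output_vcf)]


for vcf in ["data/sample1.vcf", "data/sample2.vcf"]:
    cmd = mpa_command(vcf, "results")
    print(" ".join(cmd))
    # To actually launch MPA (requires mobidic-mpa to be installed):
    # import subprocess
    # subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces.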

Quick guide for Annovar

The algorithm introduced here needs some basic annotations. This quick guide shows how to annotate your VCF files with Annovar.

Install Annovar

Follow the instructions to download Annovar at:

http://www.openbioinformatics.org/annovar/annovar_download_form.php

Unpack the package with this command:

tar xvfz annovar.latest.tar.gz

Download all databases

In the Annovar folder, download all the required databases with annotate_variation.pl:

perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar refGene humandb/
perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar clinvar_20180603 humandb/
perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar dbnsfp33a humandb/
perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar dbscsnv11 humandb/

Deprecated: for the Spidex database, follow the instructions here:

http://www.openbioinformatics.org/annovar/spidex_download_form.php

Update April 2019: spliceAI annotations now replace spidex. Until spliceAI is included in ANNOVAR, files for this dataset in the proper format are available upon request (hg19 or hg38).

Annotate a VCF

The following command line annotates a VCF file:

perl path/to/table_annovar.pl path/to/example.vcf humandb/ -buildver hg19 -out path/to/output/name -remove -protocol refGene,refGene,clinvar_20180603,dbnsfp33a,spliceai_filtered,dbscsnv11 -operation g,g,f,f,f,f -nastring . -vcfinput -otherinfo -arg '-splicing 20','-hgvs',,,,
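The -protocol, -operation and -arg options in the command above are parallel comma-separated lists, with one entry per database, and empty -arg entries for protocols that need no extra argument. A quick consistency check before launching a long annotation run could look like the sketch below. The function name annovar_lists_consistent is hypothetical, and the strings are written as the shell would pass them, with the quotes removed.

```python
def annovar_lists_consistent(protocol, operation, arg):
    """Check that the three comma-separated lists have the same length."""
    n_protocol = len(protocol.split(","))
    n_operation = len(operation.split(","))
    n_arg = len(arg.split(","))  # empty entries still count as positions
    return n_protocol == n_operation == n_arg


protocol = "refGene,refGene,clinvar_20180603,dbnsfp33a,spliceai_filtered,dbscsnv11"
operation = "g,g,f,f,f,f"
arg = "-splicing 20,-hgvs,,,,"  # shell form: '-splicing 20','-hgvs',,,,

print(annovar_lists_consistent(protocol, operation, arg))  # True
```

Here refGene appears twice so that one pass can use -splicing 20 and the other -hgvs.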

Citing MPA

Yauy et al. MPA, a free, accessible and efficient pipeline for SNV annotation and prioritization for NGS routine molecular diagnosis. The Journal of Molecular Diagnostics (2018) https://doi.org/10.1016/j.jmoldx.2018.03.009


Montpellier Bioinformatique pour le Diagnostique Clinique (MoBiDiC)

CHU de Montpellier

France

MoBiDiC

Visit our website


  1. Béroud, C. et al. UMD (Universal Mutation Database): 2005 update. Hum. Mutat. 26, 184–191 (2005).
  2. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
  3. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164–e164 (2010).
  4. Landrum, M. J. et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 44, D862–D868 (2015).
  5. O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–45 (2016).
  6. Jaganathan, K. et al. Predicting Splicing from Primary Sequence with Deep Learning. Cell 176, 535–548 (2019).
  7. Jian, X., Boerwinkle, E. & Liu, X. In silico prediction of splice-altering single nucleotide variants in the human genome. Nucleic Acids Res. 42, 13534–13544 (2014).
  8. Liu, X., Wu, C., Li, C. & Boerwinkle, E. dbNSFP v3.0: A One-Stop Database of Functional Predictions and Annotations for Human Nonsynonymous and Splice-Site SNVs. Hum. Mutat. 37, 235–241 (2016).
