
MPA: MoBiDiC Prioritization Algorithm


Overview

MPA is a prioritization algorithm for Next Generation Sequencing (NGS) molecular diagnosis. We propose an open-source workflow that is free for academic users.

Variants are ranked with a single score that takes into account a curated database, biological assumptions, splicing predictions, and the sum of several predictors for missense alterations. Exonic variants and splicing variants up to +300 nt are annotated.
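To make the idea of summing missense predictors concrete, the sketch below counts the tools that call a variant deleterious among those that return a prediction and rescales that fraction to a 0 to 10 range. The tool names, the per-tool True/False verdicts and the formula 10 × deleterious / available are assumptions for illustration only; MPA's actual predictor set and thresholds come from dbNSFP and may differ.

```python
from typing import Mapping, Optional, Tuple


def adjusted_missense_score(predictions: Mapping[str, Optional[bool]]) -> Tuple[int, int, float]:
    """Return (available, deleterious, adjusted) for one missense variant.

    `predictions` maps a tool name to True (deleterious), False (tolerated)
    or None (no prediction). The scaling 10 * deleterious / available is an
    assumed formula, not taken from MPA's source code.
    """
    # Keep only the tools that actually returned a prediction.
    available = [verdict for verdict in predictions.values() if verdict is not None]
    deleterious = sum(1 for verdict in available if verdict)
    if not available:
        # No predictor annotated the variant: nothing to normalize.
        return 0, 0, 0.0
    adjusted = 10 * deleterious / len(available)
    return len(available), deleterious, round(adjusted, 2)


# Hypothetical verdicts from four predictors, one of which is missing.
example = {"SIFT": True, "Polyphen2_HDIV": True, "MutationTaster": False, "FATHMM": None}
print(adjusted_missense_score(example))  # (3, 2, 6.67)
```

Dividing by the number of available tools, rather than by a fixed total, keeps variants with sparse annotation comparable to fully annotated ones.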

We demonstrate the relevance of our clinical diagnosis approach with an updated evaluation of in silico prediction tools. Pathogenic variants in DYSF, DMD, LMNA, NEB and TTN were taken from the expert-curated Universal Mutation Database [1], courtesy of its curators, and neutral variants were taken from the ExAC database [2].

MPA takes as input a VCF annotated with ANNOVAR and outputs a VCF annotated with MPA scores and ranks.

Figure: MPA diagram

*PTC: Premature Truncation Codon (nonsense or frameshift)

**: intronic positions between -20 and +5

Input

MPA takes as input a VCF file annotated with ANNOVAR [3] using the following databases:

  • Curated database: ClinVar [4]
  • Biological assumptions: refGene [5]
  • Splicing prediction: SpliceAI [6], dbscSNV [7]
  • Missense prediction: dbNSFP [8]

Note: a short tutorial on annotating your VCF with ANNOVAR is given in Quick guide for Annovar.

Update April 2019: SpliceAI annotations now replace SPIDEX. Until SpliceAI is included in ANNOVAR, files for this dataset in the proper format (hg19 or hg38) are available upon request.

Multi-allelic variants in the VCF should be split into biallelic records with bcftools norm:

bcftools norm -m - file.vcf > file_breakmulti.vcf
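Before running MPA, it can help to confirm that no multi-allelic records remain. The following is a minimal sketch, assuming an uncompressed, tab-delimited VCF; the function name is hypothetical. It flags every data line whose ALT column still lists several comma-separated alleles.

```python
from typing import Iterable, List


def multiallelic_records(lines: Iterable[str]) -> List[str]:
    """Return 'CHROM:POS' for each VCF data line whose ALT column lists
    more than one allele, i.e. records that still need `bcftools norm -m -`."""
    hits = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt = fields[0], fields[1], fields[4]
        if "," in alt:
            hits.append(f"{chrom}:{pos}")
    return hits


# Hypothetical VCF content: the record at 1:100 is still multi-allelic.
sample = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG,T\t.\tPASS\t.\n",
    "1\t200\t.\tC\tT\t.\tPASS\t.\n",
]
print(multiallelic_records(sample))  # ['1:100']
```

On a real file, pass an open file handle, e.g. `multiallelic_records(open("file_breakmulti.vcf"))`; an empty list means the splitting step worked.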

Output

The output is a VCF file.

The VCF is annotated with several items:

  • MPA_impact: Clinvar_pathogenicity, splice_impact, stop and frameshift_impact, missense_impact or unknown_impact
  • MPA_ranking: rank from 1 to 10
  • MPA_final_score: score from 0 to 10
  • MPA_available: number of missense tools (0 to 10) that provide an annotation
  • MPA_deleterious: number of missense tools that predict the variant as pathogenic
  • MPA_ajusted: missense score normalized from 0 to 10
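For downstream filtering, the MPA annotations can be read back from each output line. This sketch assumes the items listed above are written as `key=value` pairs in the INFO column (8th column) of the VCF; the example line and its values are hypothetical.

```python
from typing import Dict

# Item names as listed above; storing them as INFO key=value pairs is an assumption.
MPA_FIELDS = ("MPA_impact", "MPA_ranking", "MPA_final_score",
              "MPA_available", "MPA_deleterious", "MPA_ajusted")


def parse_mpa_info(vcf_line: str) -> Dict[str, str]:
    """Extract the MPA annotations from the INFO column of one VCF data line."""
    info = vcf_line.rstrip("\n").split("\t")[7]
    values: Dict[str, str] = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if key in MPA_FIELDS:
            # Flag-style entries without "=" are stored as an empty string.
            values[key] = value if sep else ""
    return values


# Hypothetical annotated record.
line = ("1\t100\t.\tA\tG\t.\tPASS\tFunc.refGene=exonic;MPA_impact=missense_impact;"
        "MPA_ranking=8;MPA_final_score=6.67;MPA_available=3;MPA_deleterious=2;"
        "MPA_ajusted=6.67\n")
print(parse_mpa_info(line)["MPA_ranking"])  # 8
```

Values are kept as strings so that missing annotations (often written as `.`) do not break parsing; convert to `int` or `float` only where needed.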

Ranking: from 1 to 10, with score

  • 1 - score 10 - clinvar_pathogenicity: pathogenic variants reported in ClinVar
  • 2 - score 10 - stop or frameshift_impact: Premature Truncation Codon (nonsense or frameshift)
  • 3, 4, 5 - score 10 - splicing_impact (ADA, RF, spliceAI): variants predicted to affect splicing, ranked by the robustness and strength of each algorithm's performance
  • 6 - score 8 - moderate splicing_impact (spliceAI)
  • 7 - splicing_impact (indel): indels in splicing regions (no splicing predictions are available for this case)
  • 8 - score 10 to 0 - missense_impact: missense variant scores
  • 9 - score 6 - low splicing_impact (spliceAI)
  • 10 - unknown_impact: exonic variants without clearly annotated ORFs, and splicing variants not predicted to be pathogenic
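One plausible way to triage a variant list with these values is to sort by rank (1 first) and, within a rank, by decreasing final score, which matters mostly for the missense rank where scores range from 10 to 0. The sketch below is an illustration under that assumption, not MPA's own ordering code; the variant names are hypothetical, and variants without a listed score are given 0.0.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    name: str     # hypothetical identifier
    rank: int     # MPA_ranking, 1 (most relevant) to 10
    score: float  # MPA_final_score, 0 to 10


def prioritize(variants: List[Variant]) -> List[Variant]:
    """Order variants by MPA rank (ascending), then by final score (descending)."""
    return sorted(variants, key=lambda v: (v.rank, -v.score))


variants = [
    Variant("varA", 8, 4.0),   # missense, low score
    Variant("varB", 2, 10.0),  # premature truncation codon
    Variant("varC", 8, 9.0),   # missense, high score
    Variant("varD", 10, 0.0),  # unknown impact
]
print([v.name for v in prioritize(variants)])  # ['varB', 'varC', 'varA', 'varD']
```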

With a simple interface (Captain ACHAB)

MPA is part of the MobiDL captainAchab workflow. It is the ranking core of Captain ACHAB, our simple interface for interpreting NGS variants at a glance. Find more information at Captain ACHAB.


Installation

Requirements

  • Python 3

pip

python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple mobidic-mpa

Quick start

To run MPA, use the following command:

mpa -i path/to/input.vcf -o path/to/output.vcf
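When MPA is one step of a larger Python pipeline, the command above can be launched with the standard `subprocess` module. This is a minimal sketch; the helper names are hypothetical and only the `-i` and `-o` options shown above are used.

```python
import shutil
import subprocess
from typing import List


def mpa_command(input_vcf: str, output_vcf: str) -> List[str]:
    """Build the `mpa -i ... -o ...` command as an argument list."""
    return ["mpa", "-i", input_vcf, "-o", output_vcf]


def run_mpa(input_vcf: str, output_vcf: str) -> None:
    """Run MPA; raises CalledProcessError if it exits with a non-zero status."""
    if shutil.which("mpa") is None:
        raise FileNotFoundError("`mpa` is not on PATH; install mobidic-mpa first")
    subprocess.run(mpa_command(input_vcf, output_vcf), check=True)


print(" ".join(mpa_command("path/to/input.vcf", "path/to/output.vcf")))
```

Passing a list instead of a shell string avoids quoting problems with paths that contain spaces.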

Quick guide for Annovar

MPA requires some basic annotations. This quick guide explains how to annotate your VCF files with ANNOVAR.

Install Annovar

Follow the instructions to download ANNOVAR at:

http://www.openbioinformatics.org/annovar/annovar_download_form.php

Unpack the package with this command:

tar xvfz annovar.latest.tar.gz

Download all databases

In the ANNOVAR folder, download all the required databases with annotate_variation.pl:

perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar refGene humandb/
perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar clinvar_20180603 humandb/
perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar dbnsfp33a  humandb/
perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar dbscsnv11 humandb/
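The four download commands differ only by database name, so they can be generated from one list, which keeps the genome build defined in a single place. This is a sketch; switching `BUILD` to another build assumes the same database names are published by ANNOVAR for that build.

```python
from typing import List

# Build version and database names as used in the commands above.
BUILD = "hg19"
DATABASES = ["refGene", "clinvar_20180603", "dbnsfp33a", "dbscsnv11"]


def download_commands(build: str, databases: List[str], target: str = "humandb/") -> List[str]:
    """Return one annotate_variation.pl download command per database."""
    return [
        " ".join(["perl", "annotate_variation.pl", "-buildver", build,
                  "-downdb", "-webfrom", "annovar", db, target])
        for db in databases
    ]


for command in download_commands(BUILD, DATABASES):
    print(command)
```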

Deprecated: for the SPIDEX database, follow the instructions here:

http://www.openbioinformatics.org/annovar/spidex_download_form.php

Update April 2019: SpliceAI annotations now replace SPIDEX. Until SpliceAI is included in ANNOVAR, files for this dataset in the proper format (hg19 or hg38) are available upon request.

Annotate a VCF

The following command annotates a VCF file:

perl path/to/table_annovar.pl path/to/example.vcf humandb/ -buildver hg19 -out path/to/output/name -remove -protocol refGene,refGene,clinvar_20180603,dbnsfp33a,spliceai_filtered,dbscsnv11 -operation g,g,f,f,f,f -nastring . -vcfinput -otherinfo -arg '-splicing 20','-hgvs',,,,
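In this command, `-protocol`, `-operation` and `-arg` are parallel comma-separated lists that must have the same number of entries (six here). The sketch below, with hypothetical helper names, keeps each protocol together with its operation and extra argument so the three lists cannot drift apart, and returns an argument list equivalent to the command above. The shell quotes around `'-splicing 20'` only keep that option as one word; in an argument list they are not needed.

```python
from typing import List, Sequence, Tuple

# One (protocol, operation, extra argument) triple per annotation source,
# copied from the command above.
ANNOTATIONS: Sequence[Tuple[str, str, str]] = (
    ("refGene", "g", "-splicing 20"),
    ("refGene", "g", "-hgvs"),
    ("clinvar_20180603", "f", ""),
    ("dbnsfp33a", "f", ""),
    ("spliceai_filtered", "f", ""),
    ("dbscsnv11", "f", ""),
)


def table_annovar_args(script: str, vcf: str, humandb: str, out_prefix: str,
                       build: str = "hg19") -> List[str]:
    """Return an argv list equivalent to the table_annovar.pl command above."""
    protocols = ",".join(p for p, _, _ in ANNOTATIONS)
    operations = ",".join(o for _, o, _ in ANNOTATIONS)
    extra_args = ",".join(a for _, _, a in ANNOTATIONS)
    return ["perl", script, vcf, humandb, "-buildver", build,
            "-out", out_prefix, "-remove",
            "-protocol", protocols, "-operation", operations,
            "-nastring", ".", "-vcfinput", "-otherinfo",
            "-arg", extra_args]


args = table_annovar_args("path/to/table_annovar.pl", "path/to/example.vcf",
                          "humandb/", "path/to/output/name")
print(args[-1])  # -splicing 20,-hgvs,,,,
```

The resulting list can be passed directly to `subprocess.run(args, check=True)`.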

Citing MPA

Yauy et al. MPA, a free, accessible and efficient pipeline for SNV annotation and prioritization for NGS routine molecular diagnosis. The Journal of Molecular Diagnostics (2018) https://doi.org/10.1016/j.jmoldx.2018.03.009


Montpellier Bioinformatique pour le Diagnostique Clinique (MoBiDiC)

CHU de Montpellier

France



  1. Béroud, C. et al. UMD (Universal Mutation Database): 2005 update. Hum. Mutat. 26, 184–191 (2005).
  2. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
  3. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164–e164 (2010).
  4. Landrum, M. J. et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 44, D862–D868 (2015).
  5. O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
  6. Jaganathan, K. et al. Predicting Splicing from Primary Sequence with Deep Learning. Cell 176, 535–548 (2019).
  7. Jian, X., Boerwinkle, E. & Liu, X. In silico prediction of splice-altering single nucleotide variants in the human genome. Nucleic Acids Res. 42, 13534–13544 (2014).
  8. Liu, X., Wu, C., Li, C. & Boerwinkle, E. dbNSFP v3.0: A One-Stop Database of Functional Predictions and Annotations for Human Nonsynonymous and Splice-Site SNVs. Hum. Mutat. 37, 235–241 (2016).

