Predict splicing variant effect from VCF

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Natural Language
- English

Project description

MMSplice & MTSplice

Predict (tissue-specific) splicing variant effect from VCF. MTSplice is integrated into MMSplice with the same API.

Paper: Cheng et al. https://doi.org/10.1101/438986, https://www.biorxiv.org/content/10.1101/2020.06.07.138453v1

Installation

External dependencies:

pip install cyvcf2 cython

Conda installation is recommended:

conda install cyvcf2 cython -y

pip install mmsplice

Run MMSplice Online

You can run MMSplice online with the following Google Colab notebooks:

run on vcf file

Preparation

1. Prepare annotation (gtf) file

A standard human gene annotation file in GTF format can be downloaded from Ensembl or GENCODE. MMSplice can work directly with those files; however, some filtering is highly recommended.

Filter for protein coding genes.
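The filtering step can be sketched in plain Python. This is only an illustration, not part of MMSplice: the function name `filter_protein_coding` is hypothetical, and it assumes the biotype is stored in the GTF attribute column as `gene_type` (GENCODE convention) or `gene_biotype` (Ensembl convention).

```python
import re

# Assumption: GENCODE GTFs use gene_type, Ensembl GTFs use gene_biotype.
_BIOTYPE = re.compile(r'gene_(?:bio)?type "([^"]+)"')


def filter_protein_coding(lines):
    """Yield header lines and records whose gene biotype is protein_coding."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep header/comment lines untouched
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed records
        match = _BIOTYPE.search(fields[8])
        if match and match.group(1) == "protein_coding":
            yield line


# Tiny in-memory example; real use would iterate over an open GTF file.
example = [
    '#!genome-build GRCh37\n',
    '17\thavana\texon\t100\t200\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";\n',
    '17\thavana\texon\t300\t400\t.\t+\t.\tgene_id "G2"; gene_biotype "lincRNA";\n',
]
kept = list(filter_protein_coding(example))
```

For a real file, the same generator can be fed an open file handle and its output written to a new GTF.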

2. Prepare variant (VCF) file

A correctly formatted VCF file will work with MMSplice; however, the following steps will make it less prone to false positives:

Quality filtering. Low-quality variants lead to unreliable predictions.
Avoid presenting multiple variants in one line by splitting them into multiple lines. Example code to do it:
```
bcftools norm -m-both -o out.vcf in.vcf.gz
```
Left-normalization. For instance, GGCA-->GG is not left-normalized while GCA-->G is. For details on the unified representation of genetic variants, see Tan et al.
```
bcftools norm -f reference.fasta -o out.vcf in.vcf
```
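To make the two normalization steps above concrete, here is a minimal Python sketch. It is not a replacement for `bcftools norm`: the helper names `split_multiallelic` and `trim_alleles` are hypothetical, the split ignores per-sample genotype fields, and the trimming only removes bases shared by REF and ALT without consulting the reference genome, so it does not shift indels through repeats.

```python
def split_multiallelic(chrom, pos, ref, alts):
    """Return one (chrom, pos, ref, alt) tuple per comma-separated ALT allele."""
    return [(chrom, pos, ref, alt) for alt in alts.split(",")]


def trim_alleles(pos, ref, alt):
    """Trim bases shared by REF and ALT, keeping at least one base in each."""
    # Drop a shared suffix first; the position is unchanged.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then drop a shared prefix, moving the position right by one per base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# The GGCA-->GG example from the text, plus a second allele on the same line.
records = split_multiallelic("17", 100, "GGCA", "GG,GGCT")
normalized = [(c, *trim_alleles(p, r, a)) for c, p, r, a in records]
```

Here `GGCA-->GG` at position 100 becomes `GCA-->G` at position 101, which matches the left-normalized form given above.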

3. Prepare reference genome (fasta) file

A human reference FASTA file can be downloaded from Ensembl or GENCODE. Make sure the chromosome names match those in the GTF annotation file you use.
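A quick way to catch a naming mismatch (for example `chr17` versus `17`) before running predictions is to compare the sequence names in both files. The sketch below is illustrative only; the helper names are hypothetical, and it assumes the chromosome name is the first whitespace-delimited token of each FASTA header and the first column of each GTF record.

```python
def fasta_chroms(lines):
    """Collect sequence names from FASTA header lines (text after '>')."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def gtf_chroms(lines):
    """Collect the first column of every non-comment, non-empty GTF line."""
    return {
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def missing_chroms(gtf_lines, fasta_lines):
    """Return chromosome names used in the GTF but absent from the FASTA."""
    return sorted(gtf_chroms(gtf_lines) - fasta_chroms(fasta_lines))


# In-memory example with a 'chr' prefix mismatch.
fasta_example = [">17 dna:chromosome\n", "ACGTACGT\n"]
gtf_example = ["#!genome-build GRCh37\n", "chr17\thavana\texon\t1\t8\t.\t+\t.\tgene_id \"G1\";\n"]
mismatches = missing_chroms(gtf_example, fasta_example)
```

An empty result means every chromosome referenced by the annotation is present in the reference.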

Example code

Check notebooks/example.ipynb

To score variants (including indels), we suggest primarily using the deltaLogitPSI predictions, which are the default output. The differential splicing efficiency (dse) model was trained from MMSplice modules and exonic variants from MaPSy, so only its predictions for exonic variants are calibrated.

MTSplice

To predict tissue-specific variant effects with MTSplice, specify tissue_specific=True in SplicingVCFDataloader.

# Import
from mmsplice.vcf_dataloader import SplicingVCFDataloader
from mmsplice import MMSplice, predict_save, predict_all_table
from mmsplice.utils import max_varEff

# example files
gtf = 'tests/data/test.gtf'
vcf = 'tests/data/test.vcf.gz'
fasta = 'tests/data/hg19.nochr.chr17.fa'
csv = 'pred.csv'

Create a dataloader to load variants from the VCF file

dl = SplicingVCFDataloader(gtf, fasta, vcf, tissue_specific=False)

To predict tissue-specific effects, use tissue_specific=True in the dataloader instead

dl = SplicingVCFDataloader(gtf, fasta, vcf, tissue_specific=True)

Run prediction with default MMSplice parameters

# Specify model
model = MMSplice()

# Predict and return as df
predictions = predict_all_table(model, dl, pathogenicity=True, splicing_efficiency=True)

To predict the variant effect on the natural ΔΨ scale instead of the Δlogit(Ψ) scale, pass natural_scale=True. This option only works with tissue-specific predictions, i.e. dl = SplicingVCFDataloader(..., tissue_specific=True):

# Predict on the natural scale and return as df
predictions = predict_all_table(model, dl, natural_scale=True)

One variant might map to multiple exons. In the end, we summarize the effect of a variant as the maximum across all exons.

# Summarize with maximum effect size
predictionsMax = max_varEff(predictions)
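To illustrate the idea of this summary step, the sketch below keeps, for each variant, the exon with the largest effect. It is not the implementation of `max_varEff`: the function name `max_effect_per_variant` is hypothetical, and it assumes "maximum" means the largest absolute `delta_logit_psi` per variant `ID`.

```python
def max_effect_per_variant(rows, score="delta_logit_psi"):
    """Keep one row per variant ID: the one with the largest absolute score."""
    best = {}
    for row in rows:
        current = best.get(row["ID"])
        # Assumption: effect size is compared by absolute value.
        if current is None or abs(row[score]) > abs(current[score]):
            best[row["ID"]] = row
    return list(best.values())


# Made-up rows for illustration; v1 overlaps two exons.
rows = [
    {"ID": "v1", "exon_id": "exonA", "delta_logit_psi": 0.5},
    {"ID": "v1", "exon_id": "exonB", "delta_logit_psi": -2.5},
    {"ID": "v2", "exon_id": "exonC", "delta_logit_psi": 1.0},
]
summary = max_effect_per_variant(rows)
```

In this example the variant `v1` is represented by `exonB`, because its score has the larger magnitude even though it is negative.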

Output

The output of MMSplice is a table containing the following columns:

ID: ID string of the variant.
delta_logit_psi: The main score predicted by MMSplice. It shows the effect of the variant on the inclusion level (PSI, percent spliced in) of the exon, on a logit scale. A positive score means the variant leads to a higher inclusion rate for the exon; a negative score means it leads to a higher exclusion rate. If delta_logit_psi is greater than 2 or smaller than -2, the effect of the variant can be considered strong.
exons: Genomic location of the exon whose inclusion rate is affected by the variant.
exon_id: ID of the exon whose inclusion rate is affected by the variant.
gene_id: ID of the gene to which the exon belongs.
gene_name: Name of the gene to which the exon belongs.
transcript_id: ID of the transcript to which the exon belongs.
ref_acceptorIntron: Acceptor intron score of the reference sequence.
ref_acceptor: Acceptor score of the reference sequence.
ref_exon: Exon score of the reference sequence.
ref_donor: Donor score of the reference sequence.
ref_donorIntron: Donor intron score of the reference sequence.
alt_acceptorIntron: Acceptor intron score of the sequence with the variant.
alt_acceptor: Acceptor score of the sequence with the variant.
alt_exon: Exon score of the sequence with the variant.
alt_donor: Donor score of the sequence with the variant.
alt_donorIntron: Donor intron score of the sequence with the variant.
pathogenicity: Potential pathogenic effect of the variant.
efficiency: The effect of the variant on the splicing efficiency of the exon.
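Because delta_logit_psi is on a logit scale, the same score corresponds to different changes in PSI depending on how often the exon is included to begin with. The sketch below shows that relationship and the strong-effect threshold described above. It is illustrative only: the function names are hypothetical, and `ref_psi` is an assumed reference inclusion level that you supply yourself (it is not one of the output columns listed here).

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def delta_psi(delta_logit_psi, ref_psi):
    """Change in PSI implied by a logit-scale effect at a given reference PSI."""
    return sigmoid(logit(ref_psi) + delta_logit_psi) - ref_psi


def is_strong(delta_logit_psi, threshold=2.0):
    """Flag scores beyond +/- threshold, following the rule of thumb above."""
    return abs(delta_logit_psi) > threshold


# The same score has a smaller absolute effect on an exon that is almost always included.
change_mid = delta_psi(-2.5, ref_psi=0.5)
change_high = delta_psi(2.5, ref_psi=0.95)
```

A score of 0 leaves PSI unchanged for any reference level, and the sign of the PSI change always matches the sign of the score.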

VEP Plugin

The VEP plugin wraps the prediction function from the mmsplice Python package. Please check the documentation of the VEP plugin under VEP_plugin/README.md.

History

1.0.0 (2019-07-23)

Dependencies fixed #16
Validate gtf, fasta, vcf chrom annotation #15
Ship mmsplice with prebuilt exon set. #12
Faster variant overlapping with pyranges #11
Batch prediction with masking update in exon module

0.1.0 (2018-07-17)

First release on PyPI.

Release history

Version	Release date
2.4.0	May 23, 2023
2.3.0	Dec 25, 2021
2.2.0	Feb 20, 2021
2.1.1	Sep 7, 2020
2.1.0	Sep 6, 2020
2.0.0	Jun 6, 2020
1.0.3	Jan 16, 2020
1.0.2	Nov 4, 2019
1.0.1	Jul 23, 2019
1.0.0	Jul 23, 2019
0.2.7	Nov 23, 2018
0.2.6	Oct 31, 2018
0.2.5	Oct 17, 2018
0.2.4	Oct 10, 2018
0.2.2	Oct 7, 2018
0.2.1	Oct 5, 2018
0.2.0	Sep 19, 2018

Download files

Source Distribution

mmsplice-2.4.0.tar.gz (62.4 MB)

Uploaded May 23, 2023 Source

Built Distribution

mmsplice-2.4.0-py2.py3-none-any.whl (62.4 MB)

Uploaded May 23, 2023 Python 2, Python 3

File details

Details for the file mmsplice-2.4.0.tar.gz.

File metadata

Download URL: mmsplice-2.4.0.tar.gz
Upload date: May 23, 2023
Size: 62.4 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 pkginfo/1.9.6 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/1.0.0 urllib3/1.26.15 tqdm/4.64.1 importlib-metadata/4.8.3 keyring/23.4.1 rfc3986/1.5.0 colorama/0.4.5 CPython/3.6.13

File hashes

Hashes for mmsplice-2.4.0.tar.gz
Algorithm	Hash digest
SHA256	`e467f5a96485afe1fbd01ab91dc22261df9c256fb67c4bc0d8fb38017b99c0ed`
MD5	`d6cf4da2f6a3f9a93fa2c584455dedb8`
BLAKE2b-256	`32bd3154f9e2979b2e7a35a2f3bcf071c12c9acced59b6d25415dcb9a9c6efc3`
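A downloaded file can be checked against the SHA256 digest above with Python's standard `hashlib` module. This is a minimal sketch; the helper names are hypothetical, and the local path in the commented usage line is an assumption about where the file was saved.

```python
import hashlib


def sha256_of_chunks(chunks):
    """Hex SHA256 digest of an iterable of bytes objects."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path, chunk_size=1 << 20):
    """Hex SHA256 digest of a file, read in chunks to limit memory use."""
    with open(path, "rb") as handle:
        return sha256_of_chunks(iter(lambda: handle.read(chunk_size), b""))


# Digest listed above for mmsplice-2.4.0.tar.gz.
EXPECTED_SHA256 = "e467f5a96485afe1fbd01ab91dc22261df9c256fb67c4bc0d8fb38017b99c0ed"

# Usage, assuming the archive sits in the current directory:
# assert sha256_of_file("mmsplice-2.4.0.tar.gz") == EXPECTED_SHA256
```

Reading in chunks matters here because the archive is about 62 MB.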

File details

Details for the file mmsplice-2.4.0-py2.py3-none-any.whl.

File metadata

Download URL: mmsplice-2.4.0-py2.py3-none-any.whl
Upload date: May 23, 2023
Size: 62.4 MB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 pkginfo/1.9.6 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/1.0.0 urllib3/1.26.15 tqdm/4.64.1 importlib-metadata/4.8.3 keyring/23.4.1 rfc3986/1.5.0 colorama/0.4.5 CPython/3.6.13

File hashes

Hashes for mmsplice-2.4.0-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`d70a3a281b9433b89e6a8612cc2ca2611f54c35b90f46e16772b2868ec94b60a`
MD5	`d2d23cbe10c30a54d8ed2446a9b1042e`
BLAKE2b-256	`14aa2bd61394b483d56b0d9011921e785d019848216336581b3144a21faff124`
