A splice site prediction toolkit
Project description
The SpliceAI-toolkit is a flexible framework designed for easy retraining of the SpliceAI model with new datasets. It comes with models pre-trained on various species, including humans (MANE database), mice, thale cress (Arabidopsis), honey bees, and zebrafish. Additionally, the SpliceAI-toolkit is capable of processing genetic variants in VCF format to predict their impact on splicing.
Why SpliceAI-toolkit❓
Easy-to-retrain framework: The original SpliceAI depends on the outdated Python 2.7 and on older versions of TensorFlow and Keras. The SpliceAI-toolkit is instead built on Python 3.7 and PyTorch, which avoids those compatibility issues and simplifies retraining considerably: you can retrain a model with just two commands.
Pretrained on new dataset: SpliceAI is great, but SpliceAI-toolkit makes it even better! Pretrained with the latest MANE annotations (released in 2022), it ensures your research is powered by the most accurate and up-to-date genomic information available.
Pretrained on various species: Concerned that the SpliceAI model does not generalize to your study species because you are not studying humans? No problem! The SpliceAI-toolkit is released with models pretrained on various species, including human MANE, mouse, thale cress, honey bee, and zebrafish.
Predict the impact of genetic variants on splicing: Similar to SpliceAI, the SpliceAI-toolkit can take genetic variants in VCF format and predict the impact of these variants on splicing with any of the pretrained models.
SpliceAI-toolkit is open-source, free, and combines the ease of Python with the power of PyTorch for accurate splicing predictions.
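The variant feature described above starts from records in VCF format. The following is a minimal sketch, not the toolkit's own code, of how a single-nucleotide or short variant from one VCF line could be applied to a reference sequence before scoring. The names `Variant`, `parse_vcf_line`, and `apply_variant`, the example record, and the example sequence are all hypothetical; only the column order (CHROM, POS, ID, REF, ALT) and the 1-based POS convention come from the VCF format itself.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str
    alt: str


def parse_vcf_line(line: str) -> Variant:
    """Parse the first five tab-separated VCF columns of one data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError("a VCF data line needs at least CHROM, POS, ID, REF, ALT")
    chrom, pos, _id, ref, alt = fields[:5]
    # Simplification for this sketch: keep only the first ALT allele.
    return Variant(chrom, int(pos), ref, alt.split(",")[0])


def apply_variant(sequence: str, seq_start: int, variant: Variant) -> str:
    """Return `sequence` with the variant applied.

    `seq_start` is the 1-based genomic coordinate of sequence[0].
    """
    offset = variant.pos - seq_start
    observed = sequence[offset:offset + len(variant.ref)] if offset >= 0 else ""
    if observed.upper() != variant.ref.upper():
        raise ValueError("REF allele does not match the reference sequence")
    return sequence[:offset] + variant.alt + sequence[offset + len(variant.ref):]


# Hypothetical record and sequence, for illustration only.
record = "chr1\t105\t.\tG\tA\t.\t.\t."
variant = parse_vcf_line(record)
reference = "ACCAGGTAAGT"  # made-up sequence whose first base is position 100
mutated = apply_variant(reference, 100, variant)
print(mutated)
```

Scoring the reference and the mutated sequence with the same model and comparing the two sets of scores is one way to express a variant's predicted impact; how the toolkit does this internally is not specified here.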
Who is it for❓
If you want to study splicing in humans, just use the newly pretrained human SpliceAI-MANE! Better annotation, better results!
If you want to do splicing research in other species, the SpliceAI-toolkit has you covered! It comes with models pretrained on various species! And you can easily train your own SpliceAI with your own genome & annotation data.
If you are interested in predicting the impact of genetic variants on splicing, SpliceAI-toolkit is the perfect tool for you!
What does SpliceAI-toolkit do❓
The spliceai-toolkit create-data command takes a genome and an annotation file as input and generates a dataset for training and testing your SpliceAI model.
The spliceai-toolkit train command uses the created dataset to train your own SpliceAI model.
The spliceai-toolkit predict command takes an arbitrary gene sequence and predicts a score for each position, indicating whether it is a donor site, an acceptor site, or neither.
The spliceai-toolkit variant command takes a VCF file and predicts the impact of genetic variants on splicing.
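To make the per-position output of prediction more concrete, here is a minimal sketch of how three scores per position could be turned into donor, acceptor, or neither labels. It is an illustration only and does not use the toolkit's API: the function `label_positions`, the class order in `LABELS`, the 0.5 threshold, and the example scores are all assumptions made for this sketch.

```python
from typing import List, Sequence

# Assumed class order for this sketch; the toolkit's actual order may differ.
LABELS = ("neither", "acceptor", "donor")


def label_positions(scores: Sequence[Sequence[float]], threshold: float = 0.5) -> List[str]:
    """Label each position from its three per-class scores.

    A position is called acceptor or donor only when that class has the
    highest score and the score reaches `threshold`; otherwise it is
    labeled "neither".
    """
    labels = []
    for row in scores:
        if len(row) != len(LABELS):
            raise ValueError("each position needs exactly three scores")
        best = max(range(len(LABELS)), key=lambda i: row[i])
        if LABELS[best] != "neither" and row[best] >= threshold:
            labels.append(LABELS[best])
        else:
            labels.append("neither")
    return labels


# Made-up scores for four consecutive nucleotides.
example_scores = [
    [0.98, 0.01, 0.01],
    [0.10, 0.85, 0.05],
    [0.55, 0.05, 0.40],
    [0.02, 0.03, 0.95],
]
print(label_positions(example_scores))
```

The threshold is a design choice: a higher value yields fewer but more confident splice site calls.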
Cite us
Chao, Kuan-Hao, Alan Mao, Anqi Liu, Mihaela Pertea, and Steven L. Salzberg. "SpliceAI-toolkit" bioRxiv.
Jaganathan, K., Panagiotopoulou, S.K., McRae, J.F., Darbandi, S.F., Knowles, D., Li, Y.I., Kosmicki, J.A., Arbelaez, J., Cui, W., Schwartz, G.B. and Chow, E.D. "Predicting splicing from primary sequence with deep learning" Cell.
User support
Please go through the documentation below first. If you have questions about using the package, a bug report, or a feature request, please use the GitHub issue tracker here:
https://github.com/Kuanhao-Chao/spliceAI-toolkit/issues
Key contributors
SpliceAI-toolkit was designed and developed by Kuan-Hao Chao. This documentation was written by Kuan-Hao Chao. The LiftOn logo was designed by Kuan-Hao Chao.
Table of contents
Examples
Download files
Built Distribution
Hashes for spliceai_toolkit-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 2608d0b17e19588c8a52fa99259cacdc4edddb753bd5a99fa6496df41b0dcf55
MD5 | 63308d2bb3a60bc2c4f1e9e5efeb63b0
BLAKE2b-256 | a984aa5aecc4f2b874e20b18a8d0bbfe4622fa87384f9a366e9ba4da5b324cbd
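The published digests can be used to check a downloaded wheel before installing it. Below is a minimal sketch using Python's standard `hashlib`; the helper names `file_digests` and `matches_published_digests` are hypothetical, and the wheel path is whatever location you downloaded the file to. BLAKE2b-256 is computed here as BLAKE2b with a 32-byte digest.

```python
import hashlib

# Digests copied from the table above for spliceai_toolkit-0.0.1-py3-none-any.whl.
PUBLISHED = {
    "sha256": "2608d0b17e19588c8a52fa99259cacdc4edddb753bd5a99fa6496df41b0dcf55",
    "blake2b-256": "a984aa5aecc4f2b874e20b18a8d0bbfe4622fa87384f9a366e9ba4da5b324cbd",
}


def file_digests(path: str, chunk_size: int = 1 << 20) -> dict:
    """Compute SHA256 and BLAKE2b-256 hex digests of a file, reading in chunks."""
    hashers = {
        "sha256": hashlib.sha256(),
        "blake2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_published_digests(path: str) -> bool:
    """Return True only if every computed digest equals the published one."""
    return file_digests(path) == PUBLISHED
```

For example, `matches_published_digests("spliceai_toolkit-0.0.1-py3-none-any.whl")` should return `True` for an intact download of that file.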