
Deep learning framework that decodes splicing across species

Project description

License: GPLv3 | Version: v0.0.4 | Platform: macOS / Linux


OpenSpliceAI is an open-source, efficient, and modular framework for splice site prediction. It is a reimplementation and extension of SpliceAI (Jaganathan et al., 2019) built on PyTorch. OpenSpliceAI provides researchers with a user-friendly suite of tools for studying transcript splicing, from creating training datasets and training models to predicting splice sites and assessing the impact of genetic variants.


Key Features

  • Modern, Retrainable Framework: Built on Python 3 and PyTorch, OpenSpliceAI addresses the limitations of older TensorFlow/Keras implementations. Its modular design enables fast and efficient prediction, as well as easy retraining on species-specific data with just a few commands.

  • Updated and Cross-Species Models: OpenSpliceAI includes a pre-trained human model, OSAIMANE-10000nt, updated from GRCh37 to GRCh38 using the latest MANE annotations, along with models for mouse, thale cress (Arabidopsis), honey bee, and zebrafish. This versatility empowers researchers to study splicing across diverse species.

  • Variant Impact Prediction: OpenSpliceAI not only predicts splice sites but also assesses the impact of genetic variants (SNPs and INDELs) on splicing. Its variant subcommand calculates “delta” scores that quantify changes in splice site strength and predicts cryptic splice sites.

  • Efficiency and Scalability: Optimized for improved processing speeds, lower memory usage, and efficient GPU utilization, OpenSpliceAI can handle large genomic regions and whole-genome predictions on a single GPU.


Who Should Use OpenSpliceAI?

  • Human Genomics Researchers: Use the newly retrained OpenSpliceAI model, OSAIMANE-10000nt, for highly accurate splice site predictions based on the latest human annotations.

  • Comparative and Non-Human Genomics: Whether you’re studying mouse, zebrafish, honey bee, or thale cress, OpenSpliceAI offers pre-trained models for each of these species, and you can train your own models for others.

  • Variant Analysts: If you need to predict how genetic variants affect splicing, OpenSpliceAI’s variant subcommand provides detailed delta scores and positional information to assess functional impacts.
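
To make the idea of a delta score concrete, here is a minimal sketch in the spirit of the SpliceAI definition: compare per-position splice site scores for the reference and alternate sequences, then report the largest gain and the largest loss together with their positions. The function name `max_delta` and the toy score tracks are hypothetical, the sketch assumes tracks of equal length (as for a SNP), and it is not OpenSpliceAI's actual implementation.

```python
def max_delta(ref_probs, alt_probs):
    """Return (gain, gain_pos, loss, loss_pos) for one splice-site channel.

    ref_probs / alt_probs are per-position scores (e.g. donor probability)
    for the reference and alternate sequence. Hypothetical helper for
    illustration only; equal-length tracks are assumed.
    """
    if len(ref_probs) != len(alt_probs):
        raise ValueError("ref and alt score tracks must have equal length")
    # Positive difference = site strengthened by the variant, negative = weakened.
    diffs = [alt - ref for ref, alt in zip(ref_probs, alt_probs)]
    gain_pos = max(range(len(diffs)), key=lambda i: diffs[i])
    loss_pos = min(range(len(diffs)), key=lambda i: diffs[i])
    return diffs[gain_pos], gain_pos, -diffs[loss_pos], loss_pos


# Toy example: the variant weakens the site at index 1 and
# creates a possible cryptic site at index 3.
ref_donor = [0.01, 0.90, 0.02, 0.05]
alt_donor = [0.01, 0.20, 0.02, 0.65]
gain, gain_pos, loss, loss_pos = max_delta(ref_donor, alt_donor)
print(f"donor gain {gain:.2f} at {gain_pos}, donor loss {loss:.2f} at {loss_pos}")
```

In this toy case the loss at the annotated site and the gain a few bases away together suggest a shifted donor site, which is the kind of pattern the positional information is meant to expose.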


What OpenSpliceAI Does

  • Data Preprocessing (create-data): Converts genome FASTA and annotation (GFF/GTF) files into one-hot encoded datasets (HDF5 format) for training and testing.

  • Model Training (train): Trains deep residual convolutional neural networks on the preprocessed datasets. OpenSpliceAI supports training from scratch and employs adaptive learning rate schedulers and early stopping.

  • Transfer Learning (transfer): Fine-tunes a pre-trained human model for other species, reducing training time and improving performance on species with limited data.

  • Model Calibration (calibrate): Adjusts model output probabilities to better reflect true splice site likelihoods, making predicted scores more reliable to interpret as probabilities.

  • Prediction (predict): Uses trained models to generate splice site predictions from FASTA sequences, outputting BED files with donor and acceptor site coordinates.

  • Variant Analysis (variant): Annotates VCF files with delta scores and positions to evaluate the impact of genetic variants on splicing.
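
As an illustration of the one-hot encoding mentioned under create-data, the sketch below maps each nucleotide to a 4-element vector. The channel order (A, C, G, T) and the all-zero encoding for N or other ambiguous bases are assumptions made for this example, and `one_hot_encode` is a hypothetical helper, not part of the OpenSpliceAI API.

```python
# Assumed channel order: A, C, G, T. Unknown bases (e.g. N) become all zeros.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}
UNKNOWN = (0, 0, 0, 0)


def one_hot_encode(seq):
    """Encode a DNA string as a list of 4-tuples, one per base."""
    return [ONE_HOT.get(base, UNKNOWN) for base in seq.upper()]


encoded = one_hot_encode("acgtN")
for base, vector in zip("ACGTN", encoded):
    print(base, vector)
```

A real pipeline would typically store such vectors as a numeric array (for example in an HDF5 dataset) rather than as Python tuples.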

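One widely used way to calibrate a classifier's probabilities is temperature scaling, in which the logits are divided by a single scalar T before the softmax. The sketch below shows only that arithmetic. Whether the calibrate subcommand works exactly this way is an assumption here, and the function name, the class order, and the example logits are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature (numerically stabilised)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one position: (neither, acceptor, donor).
logits = [4.0, 1.0, 0.5]
raw = softmax_with_temperature(logits, temperature=1.0)
calibrated = softmax_with_temperature(logits, temperature=2.0)
print("raw:       ", [round(p, 3) for p in raw])
print("calibrated:", [round(p, 3) for p in calibrated])
```

A temperature above 1 softens over-confident predictions without changing which class ranks highest. In practice T would be fitted on held-out data rather than chosen by hand.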

Cite Us

If you use OpenSpliceAI in your research, please cite our work as well as the original SpliceAI paper:

Kuan-Hao Chao, Alan Mao, Anqi Liu, Mihaela Pertea, and Steven L. Salzberg. "OpenSpliceAI provides an efficient modular implementation of SpliceAI enabling easy retraining across nonhuman species" eLife 14:RP107454.

Kishore Jaganathan, Sofia Kyriazopoulou Panagiotopoulou, Jeremy F. McRae, Siavash Fazel Darbandi, David Knowles, Yang I. Li, Jack A. Kosmicki, Juan Arbelaez, Wenwu Cui, Grace B. Schwartz, Eric D. Chow, Efstathios Kanterakis, Hong Gao, Amirali Kia, Serafim Batzoglou, Stephan J. Sanders, and Kyle Kai-How Farh. "Predicting splicing from primary sequence with deep learning" Cell 176(3):535-548 (2019).


User Support & Contributors

If you have questions, encounter issues, or would like to request a new feature, please use our GitHub issue tracker at: https://github.com/Kuanhao-Chao/OpenSpliceAI/issues

OpenSpliceAI was developed by Kuan-Hao Chao, Alan Mao, and collaborators at Johns Hopkins University. For further details on usage, methods, and performance, please refer to the full documentation and online methods sections.


Next Steps

Check out the Installation Guide to get started with OpenSpliceAI. For a quick overview of the main commands and subcommands, see the Quick Start Guide.


Download files

Download the file for your platform.

Source Distribution

openspliceai-0.0.5.tar.gz (735.8 kB, Source)

Built Distribution

openspliceai-0.0.5-py3-none-any.whl (83.5 kB, Python 3)

File details

Details for the file openspliceai-0.0.5.tar.gz.

File metadata

  • Download URL: openspliceai-0.0.5.tar.gz
  • Size: 735.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.18

File hashes

Hashes for openspliceai-0.0.5.tar.gz

  • SHA256: eea7a6d08160a9e119f1239c45aa69f0448d5498614634567737e61e69534c65
  • MD5: 8f73f1fe5b207500f56a33296eef8fe7
  • BLAKE2b-256: 807bfa0a41f96f4e7c8dd51793b74d4227b54a0540ab03b3908dca143c7bd796

File details

Details for the file openspliceai-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: openspliceai-0.0.5-py3-none-any.whl
  • Size: 83.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.18

File hashes

Hashes for openspliceai-0.0.5-py3-none-any.whl

  • SHA256: 8715e07f31be0d4b94ca47292f509d7796bb388fa090c975502c00ea74337f46
  • MD5: f8e2d04f38847d8d6a0bcce14fffc307
  • BLAKE2b-256: 9124dd3e46b2c8dc63c874d06ad50f2ac4452c6c48b49dea261d28a2e4028c87
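
A downloaded file can be checked against the published SHA256 digest with Python's standard `hashlib` module. The sketch below is one way to do it. The helper names `sha256_of_file` and `verify_sha256` are hypothetical, and the local path in the usage comment is an assumption about where the file was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """True if the file's SHA256 matches the expected digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 listed above for openspliceai-0.0.5-py3-none-any.whl.
WHEEL_SHA256 = "8715e07f31be0d4b94ca47292f509d7796bb388fa090c975502c00ea74337f46"

# Usage (assumes the wheel was saved in the current directory):
#   ok = verify_sha256("openspliceai-0.0.5-py3-none-any.whl", WHEEL_SHA256)
#   print("digest matches" if ok else "digest MISMATCH")
```

Reading in chunks keeps memory use constant regardless of file size, which matters little for an 83.5 kB wheel but makes the helper reusable for larger downloads.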

