Deep learning framework that decodes splicing across species

OpenSpliceAI

License: GPLv3 | Version: 0.0.6 | Platform: macOS / Linux


OpenSpliceAI is an open‐source, efficient, and modular framework for splice site prediction. It is a reimplementation and extension of SpliceAI (Jaganathan et al., 2019) built on the modern PyTorch framework. OpenSpliceAI provides researchers with a user‐friendly suite of tools for studying transcript splicing - from creating training datasets and training models to predicting splice sites and assessing the impact of genetic variants.


Key Features

  • Modern, Retrainable Framework: Built on Python 3 and PyTorch, OpenSpliceAI addresses the limitations of older TensorFlow/Keras implementations. Its modular design enables fast, efficient prediction and easy retraining on species-specific data with just a few commands.

  • Updated and Cross-Species Models: OpenSpliceAI includes a pre-trained human model, OSAIMANE-10000nt, updated from GRCh37 to GRCh38 using the latest MANE annotations, along with models for mouse, thale cress (Arabidopsis), honey bee, and zebrafish. This versatility empowers researchers to study splicing across diverse species.

  • Variant Impact Prediction: OpenSpliceAI not only predicts splice sites but also assesses the impact of genetic variants (SNPs and INDELs) on splicing. Its variant subcommand calculates “delta” scores that quantify changes in splice site strength and predicts cryptic splice sites.

  • Efficiency and Scalability: Optimized for improved processing speeds, lower memory usage, and efficient GPU utilization, OpenSpliceAI can handle large genomic regions and whole-genome predictions on a single GPU.
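
One common way to run a fixed-context model over a large region is to split the sequence into chunks and give each chunk the same amount of flanking sequence on both sides. The sketch below illustrates that idea only; it is not taken from the OpenSpliceAI code base, and the function name iter_windows, the parameters chunk and flank, and their default values are assumptions chosen for the example.

```python
def iter_windows(seq, chunk=5000, flank=5000, pad="N"):
    """Yield (start, window) pairs covering seq.

    Each window holds one chunk of seq plus `flank` bases of context on
    each side, so every window has length chunk + 2 * flank.
    All names and defaults here are hypothetical.
    """
    # Pad both ends so the first and last chunks still get full context.
    padded = pad * flank + seq + pad * flank
    for start in range(0, len(seq), chunk):
        core_end = min(start + chunk, len(seq))
        window = padded[start : core_end + 2 * flank]
        # The final chunk may be short; right-pad it to the common length.
        yield start, window.ljust(chunk + 2 * flank, pad)


# Small values make the layout easy to inspect by eye.
windows = list(iter_windows("ACGTACGTACGT", chunk=5, flank=2))
for start, window in windows:
    print(start, window)
```

Because every window has the same length, windows can be stacked into batches, which is what makes this layout convenient for GPU inference.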


Who Should Use OpenSpliceAI?

  • Human Genomics Researchers: Use the newly retrained OpenSpliceAI model, OSAIMANE-10000nt, for highly accurate splice site predictions based on the latest human annotations.

  • Comparative and Non-Human Genomics: Whether you’re studying mouse, zebrafish, honey bee, or thale cress, OpenSpliceAI offers models pre-trained on multiple species — and the ability to train your own models — ensuring broad applicability.

  • Variant Analysts: If you need to predict how genetic variants affect splicing, OpenSpliceAI’s variant subcommand provides detailed delta scores and positional information to assess functional impacts.
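
To make the idea of delta scores and positions concrete, here is a minimal sketch in the style of the SpliceAI definition: for each site type, take the largest increase (gain) and largest decrease (loss) in predicted probability between the reference and alternate sequences, and report where it occurs relative to the variant. It assumes a substitution, so both score arrays have the same length. The function name delta_scores, the (L, 2) array layout, and the output keys are assumptions for illustration, not the variant subcommand's actual implementation.

```python
import numpy as np


def delta_scores(ref, alt, variant_idx):
    """Return {event: (score, offset_from_variant)} for four events.

    ref, alt: float arrays of shape (L, 2) with per-position
    [acceptor, donor] probabilities (hypothetical layout).
    variant_idx: index of the variant within those arrays.
    """
    diff = alt - ref
    out = {}
    for col, site in enumerate(("acceptor", "donor")):
        gain_pos = int(np.argmax(diff[:, col]))
        loss_pos = int(np.argmin(diff[:, col]))
        # Clamp at 0 so "no change" is reported as a score of 0.
        gain = max(0.0, float(diff[gain_pos, col]))
        loss = max(0.0, float(-diff[loss_pos, col]))
        out[f"{site}_gain"] = (gain, gain_pos - variant_idx)
        out[f"{site}_loss"] = (loss, loss_pos - variant_idx)
    return out


# Toy example: the variant weakens an acceptor one base upstream and
# creates a cryptic acceptor one base downstream.
ref = np.zeros((5, 2))
ref[1, 0] = 0.9
alt = ref.copy()
alt[1, 0] = 0.1
alt[3, 0] = 0.7
scores = delta_scores(ref, alt, variant_idx=2)
print(scores["acceptor_gain"], scores["acceptor_loss"])
```

Insertions and deletions change the sequence length, so a real implementation has to realign the two score arrays before comparing them; that step is omitted here.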


What OpenSpliceAI Does

  • Data Preprocessing (create-data): Converts genome FASTA and annotation (GFF/GTF) files into one-hot encoded datasets (HDF5 format) for training and testing.

  • Model Training (train): Trains deep residual convolutional neural networks on the preprocessed datasets. OpenSpliceAI supports training from scratch and employs adaptive learning rate schedulers and early stopping.

  • Transfer Learning (transfer): Fine-tunes a pre-trained human model for other species, reducing training time and improving performance on species with limited data.

  • Model Calibration (calibrate): Adjusts model output probabilities to better reflect true splice site likelihoods, making predicted scores more reliable to interpret.

  • Prediction (predict): Uses trained models to generate splice site predictions from FASTA sequences, outputting BED files with donor and acceptor site coordinates.

  • Variant Analysis (variant): Annotates VCF files with delta scores and positions to evaluate the impact of genetic variants on splicing.
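
As an illustration of the one-hot encoding mentioned under create-data, the sketch below maps each nucleotide to a four-element indicator vector and encodes any other character (such as N) as all zeros. The column order A, C, G, T and the function name one_hot_encode are assumptions for this example; the layout OpenSpliceAI actually writes to HDF5 may differ.

```python
import numpy as np

BASES = "ACGT"  # assumed column order for the encoding


def one_hot_encode(seq):
    """Encode a DNA string as a (len(seq), 4) uint8 array.

    A, C, G, T (either case) map to one-hot rows; every other
    character, including N, maps to a row of zeros.
    """
    # Lookup table indexed by ASCII code, one row per possible byte.
    lookup = np.zeros((256, 4), dtype=np.uint8)
    for i, base in enumerate(BASES):
        lookup[ord(base), i] = 1
        lookup[ord(base.lower()), i] = 1
    codes = np.frombuffer(seq.encode("ascii"), dtype=np.uint8)
    return lookup[codes]


encoded = one_hot_encode("ACGTN")
print(encoded)
```

The lookup-table approach encodes a whole sequence in one vectorized indexing step, which matters when the input is an entire chromosome.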


Cite Us

If you use OpenSpliceAI in your research, please cite our work as well as the original SpliceAI paper:

Kuan-Hao Chao, Alan Mao, Anqi Liu, Mihaela Pertea, and Steven L. Salzberg. "OpenSpliceAI provides an efficient modular implementation of SpliceAI enabling easy retraining across nonhuman species" eLife 14:RP107454.

Kishore Jaganathan, Sofia Kyriazopoulou Panagiotopoulou, Jeremy F. McRae, Siavash Fazel Darbandi, David Knowles, Yang I. Li, Jack A. Kosmicki, Juan Arbelaez, Wenwu Cui, Grace B. Schwartz, Eric D. Chow, Efstathios Kanterakis, Hong Gao, Amirali Kia, Serafim Batzoglou, Stephan J. Sanders, and Kyle Kai-How Farh. "Predicting splicing from primary sequence with deep learning" Cell.


User Support & Contributors

If you have questions, encounter issues, or would like to request a new feature, please use our GitHub issue tracker at: https://github.com/Kuanhao-Chao/OpenSpliceAI/issues

OpenSpliceAI was developed by Kuan-Hao Chao, Alan Mao, and collaborators at Johns Hopkins University. For further details on usage, methods, and performance, please refer to the full documentation and online methods sections.


Next Steps

Check out the Installation Guide to get started with OpenSpliceAI. For a quick overview of the main commands and subcommands, see the Quick Start Guide.


Development & Testing

Install the package with its development dependencies (pytest, pytest-cov, ruff, pre-commit) from a clone of the repository:

pip install -e '.[dev]'

Tests live under tests/ (unit / integration / regression) and run on CPU only:

pytest -m "not slow and not integration"   # fast inner loop
pytest                                      # full suite

Tests are tagged with markers (declared in pytest.ini) — integration, slow, gpu, and keras (auto-skipped when TensorFlow is absent) — selectable with -m, e.g. pytest -m integration.

Lint and pre-commit hooks (configured in ruff.toml and .pre-commit-config.yaml):

ruff check .            # lint (add --fix to auto-fix)
pre-commit install      # run ruff + checks on every commit

See the full documentation for the complete development and testing guide.

