
Deep learning framework that decodes splicing across species


OpenSpliceAI

License: GPLv3 | Version: v0.0.7 | Platforms: macOS, Linux


OpenSpliceAI is an open-source, efficient, and modular framework for splice site prediction. It is a reimplementation and extension of SpliceAI (Jaganathan et al., 2019) built on PyTorch. OpenSpliceAI gives researchers a user-friendly suite of tools for studying transcript splicing: creating training datasets, training models, predicting splice sites, and assessing the impact of genetic variants.


Key Features

  • Modern, Retrainable Framework: Built on Python 3 and PyTorch, OpenSpliceAI overcomes the limitations of the older TensorFlow/Keras implementation. Its modular design enables fast, efficient prediction and easy retraining on species-specific data with just a few commands.

  • Updated and Cross-Species Models: OpenSpliceAI includes a pre-trained human model, OSAIMANE-10000nt, retrained on GRCh38 with the latest MANE annotations (the original SpliceAI model was trained on GRCh37), along with models for mouse, thale cress (Arabidopsis thaliana), honey bee, and zebrafish. This lets researchers study splicing across diverse species.

  • Variant Impact Prediction: OpenSpliceAI not only predicts splice sites but also assesses the impact of genetic variants (SNPs and INDELs) on splicing. Its variant subcommand calculates “delta” scores that quantify changes in splice site strength and predicts cryptic splice sites.

  • Efficiency and Scalability: Optimized for faster processing, lower memory usage, and efficient GPU utilization, OpenSpliceAI can handle large genomic regions and whole-genome predictions on a single GPU.
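How can a model with a fixed input size score an arbitrarily large region? One common approach, used by the original SpliceAI data pipeline, is to cut the sequence into fixed-size blocks and give each block flanking context on both sides, padding with N where the sequence runs out. The sketch below assumes 5,000-nt blocks and 5,000 nt of flanking sequence per side (matching the 10,000-nt total context of OSAIMANE-10000nt); the function name is hypothetical, and this is not claimed to be OpenSpliceAI's exact implementation.

```python
from typing import List


def split_into_windows(seq: str, block: int = 5000, flank: int = 5000) -> List[str]:
    """Split `seq` into fixed-size blocks, each carrying `flank` bases of context per side.

    Hypothetical helper. The sequence is padded with N on both ends (and at the
    tail up to a multiple of `block`), so every window has length
    block + 2 * flank, and window i covers seq[i * block:(i + 1) * block]
    plus its flanking context.
    """
    if block <= 0 or flank < 0:
        raise ValueError("block must be positive and flank non-negative")
    n_blocks = max(1, -(-len(seq) // block))  # ceiling division; at least one window
    tail_pad = n_blocks * block - len(seq)
    padded = "N" * flank + seq + "N" * (tail_pad + flank)
    return [padded[i * block : i * block + block + 2 * flank] for i in range(n_blocks)]


# Usage with toy sizes: 3-nt blocks, 2 nt of context per side.
windows = split_into_windows("ACGTACG", block=3, flank=2)
```

Because every window has the same length, the windows can be stacked into one batch and scored together on the GPU, and the per-block predictions concatenated back in order.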


Who Should Use OpenSpliceAI?

  • Human Genomics Researchers: Use the newly retrained OpenSpliceAI model, OSAIMANE-10000nt, for highly accurate splice site predictions based on the latest human annotations.

  • Comparative and Non-Human Genomics: Whether you're studying mouse, zebrafish, honey bee, or thale cress, OpenSpliceAI offers models pre-trained on multiple species, plus the ability to train your own, ensuring broad applicability.

  • Variant Analysts: If you need to predict how genetic variants affect splicing, OpenSpliceAI’s variant subcommand provides detailed delta scores and positional information to assess functional impacts.
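The delta scores mentioned above follow the SpliceAI convention: the model scores the reference and alternate sequences around a variant, and four delta scores (acceptor gain, acceptor loss, donor gain, donor loss) record the largest increase or decrease in predicted probability within the window, together with that position's offset from the variant. Below is a minimal sketch, assuming per-position acceptor and donor probabilities for both alleles have already been computed over the same window with the variant at the center index (true for SNPs; INDELs need extra alignment). The function names are hypothetical; only the DS_AG/DS_AL/DS_DG/DS_DL labels mirror SpliceAI's VCF output fields.

```python
from typing import Dict, Sequence, Tuple


def _max_with_offset(diffs: Sequence[float], center: int) -> Tuple[float, int]:
    """Return the largest value in `diffs` and its index relative to `center`."""
    best = max(range(len(diffs)), key=lambda i: diffs[i])  # first index wins ties
    return diffs[best], best - center


def delta_scores(
    ref_acceptor: Sequence[float],
    alt_acceptor: Sequence[float],
    ref_donor: Sequence[float],
    alt_donor: Sequence[float],
) -> Dict[str, Tuple[float, int]]:
    """SpliceAI-style delta scores and positions for a single variant (hypothetical helper).

    Assumes all four tracks cover the same window, are aligned base by base,
    and that the variant sits at the window's center index.
    """
    n = len(ref_acceptor)
    if n == 0 or not (len(alt_acceptor) == len(ref_donor) == len(alt_donor) == n):
        raise ValueError("score tracks must be non-empty and of equal length")
    center = n // 2
    acc = [alt - ref for ref, alt in zip(ref_acceptor, alt_acceptor)]
    don = [alt - ref for ref, alt in zip(ref_donor, alt_donor)]
    return {
        "DS_AG": _max_with_offset(acc, center),                # acceptor gain
        "DS_AL": _max_with_offset([-d for d in acc], center),  # acceptor loss
        "DS_DG": _max_with_offset(don, center),                # donor gain
        "DS_DL": _max_with_offset([-d for d in don], center),  # donor loss
    }


scores = delta_scores(
    ref_acceptor=[0.0, 0.1, 0.0, 0.9, 0.0],
    alt_acceptor=[0.0, 0.1, 0.0, 0.2, 0.6],
    ref_donor=[0.0] * 5,
    alt_donor=[0.0, 0.0, 0.3, 0.0, 0.0],
)
```

In this toy input the strong acceptor one position after the variant drops from 0.9 to 0.2 (acceptor loss of about 0.7 at offset 1), while a new acceptor two positions after it rises to 0.6 (acceptor gain at offset 2).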


What OpenSpliceAI Does

  • Data Preprocessing (create-data): Converts genome FASTA and annotation (GFF/GTF) files into one-hot encoded datasets (HDF5 format) for training and testing.

  • Model Training (train): Trains deep residual convolutional neural networks on the preprocessed datasets. OpenSpliceAI supports training from scratch and employs adaptive learning rate schedulers and early stopping.

  • Transfer Learning (transfer): Fine-tunes a pre-trained human model for other species, reducing training time and improving performance on species with limited data.

  • Model Calibration (calibrate): Adjusts model output probabilities to better reflect true splice site likelihoods, making the predicted scores more reliable as probability estimates.

  • Prediction (predict): Uses trained models to generate splice site predictions from FASTA sequences, outputting BED files with donor and acceptor site coordinates.

  • Variant Analysis (variant): Annotates VCF files with delta scores and positions to evaluate the impact of genetic variants on splicing.
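The create-data step turns nucleotide sequences into one-hot encoded arrays. As a minimal sketch, assuming the SpliceAI convention (A, C, G, T map to four channels in that order and N, or any other character, becomes an all-zero row) and optional N padding for flanking context: the function name and the `flank` parameter are hypothetical, and writing the result to HDF5 is left out.

```python
from typing import List

# Assumed channel order (SpliceAI convention): A, C, G, T.
# Any other character (e.g. N) is encoded as an all-zero row.
_BASE_TO_CHANNEL = {"A": 0, "C": 1, "G": 2, "T": 3}


def one_hot_encode(seq: str, flank: int = 0) -> List[List[int]]:
    """One-hot encode a DNA sequence, padding each side with `flank` Ns (hypothetical helper)."""
    padded = "N" * flank + seq.upper() + "N" * flank
    rows = []
    for base in padded:
        row = [0, 0, 0, 0]
        channel = _BASE_TO_CHANNEL.get(base)
        if channel is not None:
            row[channel] = 1
        rows.append(row)
    return rows


# Usage: a 4-nt sequence with 2 nt of N context on each side gives 8 rows.
encoded = one_hot_encode("acgt", flank=2)
```

A real pipeline would more likely build a NumPy array of shape (length, 4) and store it in HDF5; plain lists are used here only to keep the sketch dependency-free.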
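The predict step reports donor and acceptor sites as BED records. The sketch below shows only that final conversion, assuming you already have per-position donor and acceptor probabilities for a region whose first base lies at a known 0-based coordinate. The threshold default, the column layout (BED6, with the probability scaled to BED's 0-1000 integer score), and the function name are illustrative assumptions, not OpenSpliceAI's documented output format.

```python
from typing import List, Sequence


def scores_to_bed(
    chrom: str,
    region_start: int,
    donor: Sequence[float],
    acceptor: Sequence[float],
    threshold: float = 0.5,
    strand: str = "+",
) -> List[str]:
    """Turn per-position splice-site probabilities into BED6 lines (hypothetical helper).

    `region_start` is the 0-based genomic coordinate of the first scored base.
    Every position at or above `threshold` becomes a 1-bp BED interval.
    """
    calls = []
    for label, track in (("donor", donor), ("acceptor", acceptor)):
        for offset, prob in enumerate(track):
            if prob >= threshold:
                calls.append((region_start + offset, label, prob))
    calls.sort()  # order by position, then label, so the output is sorted BED
    return [
        f"{chrom}\t{pos}\t{pos + 1}\t{label}\t{round(prob * 1000)}\t{strand}"
        for pos, label, prob in calls
    ]


lines = scores_to_bed("chr1", 100, donor=[0.0, 0.9, 0.1], acceptor=[0.7, 0.0, 0.0])
```

BED intervals are half-open and 0-based, so a single predicted site at 0-based position p is written as the interval [p, p + 1).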


Installation

Install the latest release from PyPI:

pip install openspliceai

or from Bioconda (make sure the conda-forge and bioconda channels are enabled):

conda install -c conda-forge -c bioconda openspliceai

See the Installation Guide for GPU/CUDA setup and other options.


Cite Us

If you use OpenSpliceAI in your research, please cite our work as well as the original SpliceAI paper:

Kuan-Hao Chao, Alan Mao, Anqi Liu, Mihaela Pertea, and Steven L. Salzberg. "OpenSpliceAI provides an efficient modular implementation of SpliceAI enabling easy retraining across nonhuman species" eLife 14:RP107454.

Kishore Jaganathan, Sofia Kyriazopoulou Panagiotopoulou, Jeremy F. McRae, Siavash Fazel Darbandi, David Knowles, Yang I. Li, Jack A. Kosmicki, Juan Arbelaez, Wenwu Cui, Grace B. Schwartz, Eric D. Chow, Efstathios Kanterakis, Hong Gao, Amirali Kia, Serafim Batzoglou, Stephan J. Sanders, and Kyle Kai-How Farh. "Predicting splicing from primary sequence with deep learning" Cell.


User Support & Contributors

If you have questions, encounter issues, or would like to request a new feature, please use our GitHub issue tracker at: https://github.com/Kuanhao-Chao/OpenSpliceAI/issues

OpenSpliceAI was developed by Kuan-Hao Chao, Alan Mao, and collaborators at Johns Hopkins University. For further details on usage, methods, and performance, please refer to the full documentation and online methods sections.


Next Steps

Check out the Installation Guide to get started with OpenSpliceAI. For a quick overview of the main commands and subcommands, see the Quick Start Guide.


Development & Testing

Install the package with its development dependencies (pytest, pytest-cov, ruff, pre-commit) from a clone of the repository:

pip install -e '.[dev]'

Tests live under tests/ (unit / integration / regression) and run on CPU only:

pytest -m "not slow and not integration"   # fast inner loop
pytest                                      # full suite

Tests are tagged with markers declared in pytest.ini: integration, slow, gpu, and keras (auto-skipped when TensorFlow is absent). Select them with -m, e.g. pytest -m integration.

Lint and pre-commit hooks (configured in ruff.toml and .pre-commit-config.yaml):

ruff check .            # lint (add --fix to auto-fix)
pre-commit install      # run ruff + checks on every commit

See the full documentation for the complete development and testing guide.





