Splice junction scoring tool
Splam is a splice junction recognition model based on a deep residual convolutional neural network that offers fast and precise assessment of splice junctions. It was trained on combined donor-acceptor pairs and focuses on a narrow window of 400 base pairs surrounding each splice site, motivated by the understanding that splicing depends primarily on signals within this region.
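The windowing idea can be illustrated with a minimal Python sketch that cuts a fixed window around the donor and acceptor of one junction and concatenates them into a single input string. It is not Splam's own preprocessing code. It assumes the 400 bp are split evenly (200 bp on each side of each site), that junctions use 0-based, half-open BED-style intron coordinates on the plus strand, and that positions beyond the sequence ends are padded with `N`. The function names are hypothetical.

```python
def site_window(seq: str, site: int, flank: int = 200) -> str:
    """Return the 2*flank bases centered on `site`, padding with 'N' past either end."""
    if not 0 <= site <= len(seq):
        raise ValueError("site must lie within the sequence")
    start, end = site - flank, site + flank
    left_pad = "N" * max(0, -start)             # bases missing before position 0
    right_pad = "N" * max(0, end - len(seq))    # bases missing after the last position
    return left_pad + seq[max(0, start):min(len(seq), end)] + right_pad


def junction_input(seq: str, intron_start: int, intron_end: int, flank: int = 200) -> str:
    """Concatenate the donor and acceptor windows of one plus-strand junction.

    Hypothetical helper: with 0-based, half-open intron coordinates, the donor
    boundary is at intron_start and the acceptor boundary is at intron_end.
    Minus-strand junctions would additionally need reverse-complementing.
    """
    donor = site_window(seq, intron_start, flank)
    acceptor = site_window(seq, intron_end, flank)
    return donor + acceptor


if __name__ == "__main__":
    # Toy example: exon AAAA, intron GTAAGCCCCCAG, exon TTTT, with a 3 bp flank.
    toy = "AAAA" + "GTAAGCCCCCAG" + "TTTT"
    print(junction_input(toy, 4, 16, flank=3))  # AAAGTACAGTTT
```

With the default `flank=200`, this sketch produces an 800-character string per junction (400 bp per site).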
Why Splam❓
- We need a tool to evaluate splice junctions and spliced alignments. Thousands of RNA-Seq datasets are generated every day, but there are no tools available for cleaning up spurious spliced alignments in these data. Splam addresses this problem!
- Splam-cleaned alignments lead to improved transcript assembly, which, in turn, may enhance all downstream RNA-Seq analyses, including transcript quantification, differential gene expression analysis, and more.
Who is it for❓
If you are (1) doing RNA-Seq data analysis or (2) seeking a trustworthy way to evaluate splice junctions (introns), then Splam is the tool that you are looking for!
What does Splam do❓
There are two main use cases:
- Improving your alignment file. Splam evaluates the quality of spliced alignments and removes those containing spurious splice junctions. This significantly enhances the quality of downstream transcriptome assemblies.
- Evaluating the quality of introns in your annotation file or assembled transcripts.
Documentation
📒 The full user manual is available here
Installation
Splam is available on PyPI, which is the easiest way to install it. Check out all the releases here.
$ pip install splam
You can also install Splam from source:
$ git clone https://github.com/Kuanhao-Chao/splam --recursive
$ cd splam/src/
$ python setup.py install
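After either installation route, one way to confirm from Python that the package is visible is to query its installed metadata. This is a minimal sketch using only the standard library's `importlib.metadata` (Python 3.8+). It assumes the distribution name is `splam`, matching the `pip install splam` command above, and the helper name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "splam") -> Optional[str]:
    """Return the installed version string of `dist_name`, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"splam {found}" if found else "splam is not installed in this environment")
```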
Quick Start
Running Splam is simple: it only takes three commands!
See these examples on Google Colab:
Example 1: clean up alignment files (BAM)
$ cd test
# Step 1: extract splice junctions in the alignment file
$ splam extract -P SRR1352129_chr9_sub.bam -o tmp_out_alignment
# Step 2: score all the extracted splice junctions
$ splam score -G chr9_subset.fa -m ../model/splam_script.pt -o tmp_out_alignment tmp_out_alignment/junction.bed
# Step 3: output a cleaned and sorted alignment file
$ splam clean -o tmp_out_alignment
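Steps 1 and 3 rest on a simple idea: in a spliced alignment, each `N` operation in the CIGAR string marks a skipped region of the reference, i.e. an intron. The minimal Python sketch below (not Splam's implementation) recovers those intron spans from an alignment's 0-based reference start and CIGAR string, then keeps an alignment only if none of its junctions appears in a set of junctions flagged as spurious. The `spurious` set, its `(chrom, start, end)` key format, and the function names are hypothetical, and strand is ignored for brevity.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")


def cigar_junctions(ref_start: int, cigar: str) -> list:
    """Return 0-based, half-open (start, end) intron spans from the 'N' operations.

    ref_start is the 0-based leftmost aligned reference position (SAM POS - 1).
    """
    junctions = []
    pos = ref_start
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "N":
            junctions.append((pos, pos + length))
        if op in REF_CONSUMING:
            pos += length
    return junctions


def keep_alignment(chrom: str, ref_start: int, cigar: str, spurious: set) -> bool:
    """Keep the alignment only if none of its junctions is in `spurious`.

    `spurious` holds hypothetical (chrom, start, end) keys for junctions that
    scored poorly; unspliced alignments (no 'N') are always kept.
    """
    return all((chrom, s, e) not in spurious for s, e in cigar_junctions(ref_start, cigar))


if __name__ == "__main__":
    print(cigar_junctions(99, "10S20M5I30M200N40M"))  # [(149, 349)]
    flagged = {("chr9", 149, 349)}
    print(keep_alignment("chr9", 99, "10S20M5I30M200N40M", flagged))  # False
    print(keep_alignment("chr9", 99, "90M", flagged))                 # True
```

In practice, the start position and CIGAR string would come from a BAM reader such as pysam (`AlignedSegment.reference_start` and `AlignedSegment.cigarstring`).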
Example 2: evaluate annotation files / assembled transcripts (GFF)
$ cd test
# Step 1: extract introns in the annotation
$ splam extract refseq_40_GRCh38.p14_chr_fixed.gff -o tmp_out_annotation
# Step 2: score introns in the annotation
$ splam score -G chr9_subset.fa -m ../model/splam_script.pt -o tmp_out_annotation tmp_out_annotation/junction.bed
# Step 3: output statistics of each transcript
$ splam clean -o tmp_out_annotation
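For annotations, the per-transcript statistics of step 3 can be pictured as a roll-up of intron scores by transcript. The minimal sketch below assumes each scored intron arrives as a `(transcript_id, donor_score, acceptor_score)` tuple and counts an intron as low-quality when either score falls below a cutoff. Both the input format and the 0.8 cutoff are illustrative assumptions, not Splam's documented output or defaults, and the function name is hypothetical.

```python
from collections import defaultdict


def transcript_intron_stats(introns, threshold: float = 0.8) -> dict:
    """Summarize scored introns per transcript (hypothetical input format).

    introns: iterable of (transcript_id, donor_score, acceptor_score) tuples.
    Returns {transcript_id: (n_introns, n_low_quality, fraction_low_quality)}.
    """
    counts = defaultdict(lambda: [0, 0])  # transcript_id -> [total, low-quality]
    for tid, donor, acceptor in introns:
        counts[tid][0] += 1
        # An intron is flagged if either of its splice sites scores below the cutoff.
        if min(donor, acceptor) < threshold:
            counts[tid][1] += 1
    return {tid: (n, low, low / n) for tid, (n, low) in counts.items()}


if __name__ == "__main__":
    scored = [("tx1", 0.95, 0.91), ("tx1", 0.30, 0.88), ("tx2", 0.99, 0.97)]
    for tid, stats in transcript_intron_stats(scored).items():
        print(tid, stats)
    # tx1 (2, 1, 0.5)
    # tx2 (1, 0, 0.0)
```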
Example 3: evaluate mouse annotation files (GFF)
$ cd test
# Step 1: extract introns in the annotation
$ splam extract mouse_chr19.gff -o tmp_out_generalization
# Step 2: score introns in the annotation
$ splam score -A GRCm39_assembly_report.txt -G mouse_chr19.fa -m ../model/splam_script.pt -o tmp_out_generalization tmp_out_generalization/junction.bed
# Step 3: output statistics of each transcript
$ splam clean -o tmp_out_generalization
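Example 3 adds `-A GRCm39_assembly_report.txt`, presumably so that chromosome names in the mouse annotation and genome can be reconciled (for example, RefSeq accessions versus UCSC-style names such as `chr19`); that reading is an assumption. NCBI assembly reports are tab-separated text with `#`-prefixed header lines, where the 7th column is `RefSeq-Accn` and the 10th is `UCSC-style-name`. The minimal Python sketch below builds such a name map. It is not Splam's parser, and the function name and sample rows are made up for illustration.

```python
def refseq_to_ucsc(report_text: str) -> dict:
    """Map RefSeq accessions to UCSC-style names from NCBI assembly-report text.

    Skips '#' header lines, blank lines, short lines, and rows where either
    name is 'na' (NCBI's placeholder for a missing value).
    """
    mapping = {}
    for line in report_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 10:
            continue
        refseq_accn, ucsc_name = fields[6], fields[9].strip()
        if refseq_accn != "na" and ucsc_name != "na":
            mapping[refseq_accn] = ucsc_name
    return mapping


if __name__ == "__main__":
    # Made-up excerpt in the assembly-report layout (accessions are placeholders).
    sample = (
        "# Sequence-Name\tSequence-Role\tAssigned-Molecule\tAssigned-Molecule-Location/Type\t"
        "GenBank-Accn\tRelationship\tRefSeq-Accn\tAssembly-Unit\tSequence-Length\tUCSC-style-name\n"
        "19\tassembled-molecule\t19\tChromosome\tCM000000.1\t=\tNC_000000.1\tPrimary\t1000\tchr19\n"
        "scaf1\tunplaced-scaffold\tna\tna\tXX000001.1\t<>\tna\tPrimary\t500\tna\n"
    )
    names = refseq_to_ucsc(sample)
    print(names)                                    # {'NC_000000.1': 'chr19'}
    print(names.get("NC_000000.1", "NC_000000.1"))  # chr19
```

Looking names up with `dict.get(name, name)` leaves any sequence name that is absent from the report unchanged.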
Scripts for Splam model training & analysis
All the scripts for Splam training and data analysis are in this GitHub repository.
Citation
Kuan-Hao Chao*, Alan Mao, Steven L Salzberg, Mihaela Pertea*, "Splam: a deep-learning-based splice site predictor that improves spliced alignments", bioRxiv 2023.07.27.550754, doi: https://doi.org/10.1101/2023.07.27.550754, 2023