TENNIS 🎾: Transcript EvolutioN for New Isoform Splicing
TENNIS is an evolution-based model to predict unannotated isoforms and refine existing transcriptome annotations without requiring additional data.
Installation
Prerequisites
- Python >= 3.7
- PySAT
Installation
The only dependency of TENNIS is PySAT, which can be installed with pip. TENNIS can be installed by directly cloning this repository.
# install PySAT
pip install "python-sat[aiger,approxmc,cryptosat,pblib]"
# install TENNIS
git clone https://github.com/Shao-Group/TENNIS
cd TENNIS
chmod +x src/tennis
This repository also redistributes a modified copy of GTF.py (retrieved from here), originally developed by Kamil Slowikowski. Users do not need to download it separately.
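As a quick sanity check after installation, the two prerequisites listed above can be verified from Python. This is a minimal sketch and not part of TENNIS: the function name `check_prerequisites` is hypothetical. It relies only on the fact that the python-sat package is imported under the module name `pysat`. The check is coarse, since it only tests that a module with that name can be found.

```python
import importlib.util
import sys


def check_prerequisites(min_version=(3, 7), module_name="pysat"):
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical helper for illustration only, not shipped with TENNIS.
    """
    problems = []
    # TENNIS lists Python >= 3.7 as a prerequisite.
    if sys.version_info < min_version:
        problems.append("Python >= %d.%d is required" % min_version)
    # python-sat installs a top-level module named "pysat".
    if importlib.util.find_spec(module_name) is None:
        problems.append("PySAT not found; try: pip install python-sat")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```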
Test and Example
# display help message
./src/tennis -h
# run TENNIS on an example dataset
mkdir test
cd test
../src/tennis -o tennis_example ../example/example.gtf
Usage
If installed with conda or pip, the tennis executable should be ready to use in $PATH.
If installed manually, the tennis executable is in the ./src/ directory.
tennis [options] -o <output_prefix> <gtf_file>
The program outputs two files: output_prefix.stats and output_prefix.pred.gtf.
More about the output format is available here.
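For scripts that call TENNIS and then pick up its results, the two output names can be derived from the prefix. The sketch below is illustrative only: `expected_outputs` is a hypothetical helper, and it assumes the suffixes are appended to the prefix exactly as written above.

```python
from pathlib import Path


def expected_outputs(output_prefix="tennis"):
    """Return the (stats, predictions) paths for a given output prefix.

    Hypothetical helper; assumes ".stats" and ".pred.gtf" are appended
    to the prefix unchanged, as the usage text describes.
    """
    return Path(output_prefix + ".stats"), Path(output_prefix + ".pred.gtf")


stats_path, pred_path = expected_outputs("tennis_example")
print(stats_path)  # tennis_example.stats
print(pred_path)   # tennis_example.pred.gtf
```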
Required positional arguments:
gtf_file: str
Input GTF file in standard format containing transcript annotations.
Optional arguments:
-h, --help
Show the help message and exit.
-o, --output_prefix: str
Prefix for the two output files.
Default: "tennis"
-p, --PctIn_threshold: float
A threshold in the range [0, 1]. Predicted isoforms with a PctIn value lower than this threshold are filtered out. With -p 0.0, all isoforms are retained.
Default: 0.5
-x, --exclude_group_size: int
Skip analysis of transcript groups that have more isoforms than this threshold.
Default: 100
-m, --max_novel_isoform: int
Maximum number of novel isoforms to predict per transcript group.
Default: 4
--time_out: int
Time limit in seconds for each SAT solver instance.
Default: 900 (15 minutes)
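When TENNIS is driven from a pipeline, the options above can be assembled into an argument list programmatically. The following is a minimal sketch, not part of TENNIS: `build_tennis_command` is a hypothetical wrapper, and its defaults simply mirror the documented ones. It validates only the documented [0, 1] range for the PctIn threshold.

```python
import shlex


def build_tennis_command(gtf_file, output_prefix="tennis", pctin_threshold=0.5,
                         exclude_group_size=100, max_novel_isoform=4,
                         time_out=900, executable="tennis"):
    """Build an argv list for tennis using the documented flags.

    Hypothetical wrapper for illustration; defaults copy the documented ones.
    """
    # The documentation states that the PctIn threshold lies in [0, 1].
    if not 0.0 <= pctin_threshold <= 1.0:
        raise ValueError("PctIn threshold must be in the range [0, 1]")
    return [
        executable,
        "-o", output_prefix,
        "-p", str(pctin_threshold),
        "-x", str(exclude_group_size),
        "-m", str(max_novel_isoform),
        "--time_out", str(time_out),
        gtf_file,
    ]


cmd = build_tennis_command("example/example.gtf",
                           output_prefix="tennis_example",
                           executable="./src/tennis")
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell quoting issues with unusual file names.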
Output Files
output_prefix.stats: Statistical summary. T1 (T2, T3, ...) is the collection of transcript groups that need 1 (2, 3, ...) novel isoforms to satisfy the evolution model.
output_prefix.pred.gtf: GTF format file with predicted novel isoforms.
More about the output format is available here.
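Since the predictions are written in GTF format, they can be summarized with a few lines of Python. The sketch below counts distinct transcript IDs per gene. It assumes the file follows the standard nine-column, tab-separated GTF layout with `gene_id` and `transcript_id` attributes; that is an assumption about the TENNIS output, not something verified here. The records in `example` are made up for illustration, and `transcripts_per_gene` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def transcripts_per_gene(gtf_lines):
    """Count distinct transcript_id values per gene_id in GTF lines."""
    genes = defaultdict(set)
    for line in gtf_lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Standard GTF records have exactly nine tab-separated columns.
        if len(fields) != 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "transcript_id" in attrs:
            genes[attrs["gene_id"]].add(attrs["transcript_id"])
    return {gene: len(ids) for gene, ids in genes.items()}


# Made-up records for illustration; not real TENNIS output.
example = [
    '# comment line',
    'chr1\tsrc\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "G1.a";',
    'chr1\tsrc\texon\t100\t300\t.\t+\t.\tgene_id "G1"; transcript_id "G1.a";',
    'chr1\tsrc\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "G1.b";',
    'chr2\tsrc\ttranscript\t50\t400\t.\t-\t.\tgene_id "G2"; transcript_id "G2.a";',
]
print(transcripts_per_gene(example))
```

To apply it to a real run, pass an open file handle, for example `transcripts_per_gene(open("tennis_example.pred.gtf"))`.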
Contributing
For bug reports or feature requests, please open an issue on the GitHub repository here.
License & Citation
TENNIS is freely available under the BSD 3-Clause License.
Copyright (c) 2024, Xiaofei Carl Zang, Ke Chen, Mingfu Shao, and The Pennsylvania State University.
The preprint of TENNIS is available on bioRxiv here.
@article {TENNIS,
author = {Zang, Xiaofei Carl and Chen, Ke and Khan, Irtesam Mahmud and Shao, Mingfu},
title = {Augmenting Transcriptome Annotations through the Lens of Splicing Evolution},
year = {2024},
doi = {10.1101/2024.11.04.621892},
journal = {bioRxiv}
}