tincheck
Project description
Tincheck
Tincheck is a python package to calculate transcript Integrity Number (TIN) and Transcription overlap.
Transcripts with transcription overlap or uneven coverage can result in false positives in differential expression analysis. Transcript Integrity Number is a metric that calculates coverage evenness and can be used as a filtering criteria in RNA-Seq studies to improve the accuracy of results.
Transcript Integrity Number (TIN) calculation
TIN denotes how evenly a feature is covered by reads. By default, the script calculate the coverage evenness across a gene by considering the coverage across all exons. However, the script can calculate coverage evenness across a transcript or any other feature that is in the annotation file.
Inputs
- Alignment file in bam format
- Annotation file in GTFor GFF3 format
Output
Tab delimited textfile with TIN score calculated for each gene/transcript/any other feature specified in the input. An example is given below.
target_id eff_length S1_count S1_exp_tin S1_obs_tin
PF3D7_0102700 1683 670 100.0 70.9
PF3D7_0103700 1624 135 100.0 72.8
PF3D7_0107300 1581 4508 100.0 70.4
PF3D7_0107600 5702 4979 100.0 74.9
PF3D7_0107800 4424 924 100.0 78.8
How to install the script?
python setup.py install
It would be better to create a conda environment and install the script as shown below
conda create -n tincheck -y python=3.8
conda activate tincheck
Install additional requirements
conda install --file conda-requirements.txt
Finally, install the script in the conda environment as
python setup.py install
How to run the script?
The required inputs to the script is bam file(s) and annotation file in GTF or GFF format.
The annotation file should have a gene feature row present in it.
Given a bam file and an annotation file, the simplest way to run the script is
tincheck tin --ann data/ann.gtf data/sample.bam
How to calculate TIN across coding regions of a gene?
tincheck tin --ann data/ann.gtf --feat CDS data/sample.bam
How is tin calculated
Overlapping features specified by --feat
are merged together and coverage evenness is captured using Shannon's entropy formula (H).
This is then convered into TIN Score as
TIN = (100*exp(H))⁄length where H= Shannon's entropy formula.
Transcription overlaps
Transcription overlaps are flagged using overlap.py
script.
tincheck overlap --ann data/ann.gtf data/sample.bam >sample_overlap.txt
By default, genes are checked for overlap only if
gene-tin < tin-cutoff and gene-count > count-cutoff
ie, genes with enough counts but low tins are checked for transcription overlap
A gene is flagged to have a transcription overlap if either of its neighboring genes and the corresponding inter-genic region have a higher tin-score and read count than the gene of interest.
In stranded mode, if the count and tin values of neighboring genes and the intergenic region in the gene sense strand is comparable to the gene count and tin, then that gene is considered to have an overlap from neighboring transcripts.
More specifically in stranded mode, a gene is considered to have a neighboring transcript overlap, if either of the following conditions are satisfied.
1. left gene tin in gene sense strand >= gene tin and left intergenic tin in gene sense strand >=gene-tin
2. right gene tin in gene sense strand >= gene tin and right intergenic tin in gene sense strand >=gene-tin
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file tincheck-0.1.0.tar.gz
.
File metadata
- Download URL: tincheck-0.1.0.tar.gz
- Upload date:
- Size: 15.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.17
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 5aa89e13d5804a63f337f5eed4c8556103e8bf1815104f7aa378129723f521af |
|
MD5 | df83972a366d717db09a65ab90977ed6 |
|
BLAKE2b-256 | a32a40c795d590df75d781a8ff4d67ccac2f2c636138f34f32eff604208c8bca |
File details
Details for the file tincheck-0.1.0-py3-none-any.whl
.
File metadata
- Download URL: tincheck-0.1.0-py3-none-any.whl
- Upload date:
- Size: 18.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.17
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 51b4244f668397b49eaf08edcd157d2f3e8b1918ee13bdd1f009095a2780bcc3 |
|
MD5 | b2669c547468bd80aa1d6bfc20de207a |
|
BLAKE2b-256 | 4c9c9157dd388f4eec95ec31dad3735f550c89ae9398310642a31689c1e0e04d |