ScanExitronLR: a lightweight tool for the characterization and quantification of exitrons in long read RNA-seq data

ScanExitronLR

A computational workflow for exitron splicing identification in long-read RNA-seq data.

Installation

The recommended way to install ScanExitronLR is using pip:

pip install scanexitronlr

This will pull and install the latest stable release from PyPI. ScanExitronLR requires Python 3.7+, so make sure that your pip belongs to a Python 3 installation (check with e.g. which pip), or use:

pip3 install scanexitronlr

To test your installation, run:

selr

You should see the version number, usage instructions and commands. (If you prefer a more descriptive command, scanexitronlr also works.)
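If you want to script this check, for instance at the start of a pipeline, the following is a minimal sketch. It is not part of ScanExitronLR: the helper name check_install is made up for illustration, and it only looks for the two command names on your PATH and compares the running interpreter against the 3.7 minimum stated above.

```python
import shutil
import sys


def check_install(commands=("selr", "scanexitronlr"), min_version=(3, 7)):
    """Return a dict describing whether the interpreter and commands look usable."""
    report = {"python_ok": sys.version_info[:2] >= min_version}
    for name in commands:
        # shutil.which returns the full path of the executable, or None if absent
        report[name] = shutil.which(name)
    return report


if __name__ == "__main__":
    for key, value in check_install().items():
        print(f"{key}: {value}")
```

A value of None for selr suggests that pip installed the package for a different Python than the one on your PATH.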

Usage

ScanExitronLR has two modes, extract and annotate. Use extract when calling exitrons in an alignment and annotate when annotating exitrons called using extract.

Extract

extract requires three inputs: (1) a BAM alignment file of long reads containing the ts:A tag (provided by default by Minimap2), (2) a reference genome and (3) a sorted and bgzip'd gene annotation file. Currently only GTF files are supported.
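To check whether an alignment carries the ts:A tag before running extract, you can inspect its SAM text (for example the output of samtools view). The sketch below is only an illustration with made-up function names; it assumes the tag appears as ts:A:+ or ts:A:- among the optional fields, and it does not reproduce any check that ScanExitronLR performs itself.

```python
def has_ts_tag(sam_line):
    """True if a SAM alignment line carries a ts:A:+ or ts:A:- optional field."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    return any(field in ("ts:A:+", "ts:A:-") for field in fields[11:])


def count_ts_tags(sam_lines):
    """Return (alignments_seen, alignments_with_ts), skipping '@' header lines."""
    seen = tagged = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        seen += 1
        tagged += has_ts_tag(line)
    return seen, tagged
```

Passing an open SAM file or any iterable of lines to count_ts_tags gives both counts. Not every alignment necessarily carries the tag, so the case to watch for is a tagged count of zero.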

To sort your gtf file, use the command:

awk '$1 ~ /^#/ {print $0;next} {print $0 | "sort -k1,1 -k4,4n -k5,5n"}' in.gtf > out_sorted.gtf
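If awk and sort are not available, roughly the same ordering can be produced in Python. This is a sketch rather than an exact replacement: the function name is made up, blank lines are dropped, ties keep their input order, and sequence names are compared by code point, which may differ from the locale-dependent order used by sort.

```python
def sort_gtf_lines(lines):
    """Return header lines first, then records sorted by seqname, start, end."""
    headers, records = [], []
    for line in lines:
        if not line.strip():
            continue
        (headers if line.startswith("#") else records).append(line)

    def key(line):
        cols = line.split("\t")
        # GTF columns: 1 = seqname, 4 = start, 5 = end
        return (cols[0], int(cols[3]), int(cols[4]))

    return headers + sorted(records, key=key)
```

Reading in.gtf with readlines(), passing the result through sort_gtf_lines and writing it out with writelines() would give a file comparable to out_sorted.gtf.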

To bgzip your gene annotation file, use:

bgzip in.gtf

bgzip is part of HTSlib, which you most likely already have installed if you work with BAM files. Otherwise, you can get it here. Note that the latest GENCODE release is distributed in gzip form, not bgzip, so you will need to run gzip -d and then bgzip.
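Because gzip and bgzip files share the .gz extension, it can help to check which one you have. The sketch below inspects the first bytes of the file for the BGZF block header. The function name is made up, and it assumes the 'BC' extra subfield comes first in the header, as in files written by bgzip.

```python
import struct


def is_bgzf(path):
    """Return True if the file starts with a BGZF block header, False otherwise."""
    with open(path, "rb") as handle:
        header = handle.read(18)
    if len(header) < 18:
        return False
    # gzip magic (1f 8b), deflate method (08), FEXTRA flag set (04)
    if header[:4] != b"\x1f\x8b\x08\x04":
        return False
    xlen = struct.unpack("<H", header[10:12])[0]
    # BGZF stores a 'BC' extra subfield with a 2-byte payload
    return xlen >= 6 and header[12:14] == b"BC" and header[14:16] == b"\x02\x00"
```

A plain gzip file, such as a freshly downloaded annotation, should return False here and would need the gzip -d and bgzip steps described above.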

ScanExitronLR uses the gffutils package, which requires an SQLite database of the annotation file. You do not need to provide one: ScanExitronLR will create it if none is found, though it may take ~20 minutes to build. It is saved as your_annotation.gtf.gz.db in the same location as your annotation and does not need to be built again. A tabix index is also required and is created if none is found; this should take only seconds. It is saved as your_annotation.gtf.gz.tbi.

Thus, if you are running ScanExitronLR on a shared server and using a shared annotation database, you may not have write privileges in the shared space. In that case, copy the annotation file to a directory of your own.
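Before launching a long run, you can check whether the .db and .tbi files already exist next to your annotation and whether the directory is writable. The following is a minimal sketch with a made-up function name; it assumes only the file naming described above (your_annotation.gtf.gz.db and your_annotation.gtf.gz.tbi).

```python
import os


def annotation_status(annotation_path):
    """Report which helper files exist next to a bgzip'd GTF and whether
    new files could be written to its directory."""
    directory = os.path.dirname(os.path.abspath(annotation_path))
    return {
        "db_exists": os.path.exists(annotation_path + ".db"),
        "tbi_exists": os.path.exists(annotation_path + ".tbi"),
        # os.access reflects the permissions of the current user
        "dir_writable": os.access(directory, os.W_OK),
    }
```

If either file is missing and dir_writable is False, copying the annotation to your own directory first avoids a failed build.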

We have provided fully processed GTF files for GENCODE V39 and TAIR10 for your convenience.

To run ScanExitronLR in extract mode, simply run

selr extract ...

with the following parameters:

parameters
-i STR	REQUIRED: Input BAM file
-g STR	REQUIRED: Input genome reference (e.g. hg38.fa)
-r STR	REQUIRED: Input sorted and bgzip'd annotation reference (e.g. gencode_v38_sorted.gtf.gz).
-o STR	Output filename (e.g. bam_filename.exitron <- this is default)
-a/--ao INT	Reports only exitrons with AO of INT or above (default: 2).
-p/--pso FLOAT
-c/--cores INT	Use INT cores (default: 1). Use as many as you can spare. Even large BAM files only use 4GB total memory on 10 cores.
-cp/--cluster-purity FLOAT	Reports only exitrons with cluster purity of FLOAT or above (default: 0).
-m/--mapq INT	Only considers reads with mapq score >= INT (default: 50)
-j/--jitter INT	Treat splice sites with a fuzzy boundary of +/- INT (default: 10).
-sr	Use this flag to skip the realignment step.
-sa	Use this flag to save isoform abundance files for downstream differential isoform usage analysis with LIQA. Files are of the form: input.isoform.exitrons, input.isoform.normals
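When running extract over many samples, it can be convenient to assemble the command line in a script. The sketch below builds an argument list from the options in the table above. The helper name and the file name sample.bam are hypothetical, and only a subset of the options is covered.

```python
import shlex


def build_extract_command(bam, genome, annotation, output=None,
                          ao=None, cores=None, mapq=None, jitter=None):
    """Build an argument list for 'selr extract' from the documented options."""
    cmd = ["selr", "extract", "-i", bam, "-g", genome, "-r", annotation]
    optional = [("-o", output), ("-a", ao), ("-c", cores),
                ("-m", mapq), ("-j", jitter)]
    for flag, value in optional:
        # Options left as None are omitted so the tool's defaults apply
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


# sample.bam is a placeholder file name used only for illustration
cmd = build_extract_command("sample.bam", "hg38.fa",
                            "gencode_v38_sorted.gtf.gz", ao=2, cores=10)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to launch the run.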

Annotate

To run ScanExitronLR in annotate mode, simply run

selr annotate ...

with the following parameters:

parameters
-i STR	REQUIRED: Input exitron file, generated from selr extract
-g STR	REQUIRED: Input genome reference (e.g. hg38.fa)
-r STR	REQUIRED: Input sorted and bgzip'd annotation reference (e.g. gencode_v38_sorted.gtf.gz).
-o STR	Output filename (e.g. bam_filename.exitron.annotation <- this is default)
-b/--bam-file STR	If specified, annotation includes read-supported NMD status directly from alignments.
-arabidopsis	Use this flag if using alignments from Arabidopsis. See the GitHub page for annotation file/genome assumptions.

The output is a tab-separated file.
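A tab-separated file can be loaded with the Python standard library for quick inspection. The sketch below assumes that the first line is a header row, which is not stated here, and the function name is made up; check your own output before relying on any column names.

```python
import csv


def read_annotation_table(path):
    """Read a tab-separated file with a header row into a list of dicts."""
    with open(path, newline="") as handle:
        # QUOTE_NONE keeps any quote characters in the fields untouched
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)
```

Each row is returned as a dict keyed by the header names, so rows can be filtered or counted with ordinary list comprehensions.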

Example

See here for an example.

Contact

Please feel free to post any issues here on GitHub.

Citation

TBD

Release history

1.1.9: Sep 9, 2022
1.1.8: Mar 28, 2022
1.1.7: Mar 28, 2022
1.1.6: Mar 27, 2022 (this version)
1.1.5: Mar 25, 2022
1.0: Mar 22, 2022 (yanked; reason: broken version)

Download files

Source Distribution

scanexitronlr-1.1.6.tar.gz (1.8 MB), uploaded Mar 27, 2022 (source)

Hashes for scanexitronlr-1.1.6.tar.gz
Algorithm	Hash digest
SHA256	`c3dc770b417db9303f5931c84debaf02df0f74856ea4c858c531e01d22e4eeee`
MD5	`92dfee6eb4f2fa3c5eedb4d6731ce46c`
BLAKE2b-256	`beb08ce87675091d874cbff38aae5b8a620abfa43f15114fe8a8b02341d9d1e2`