
Cryptic Exon finder and splicing quantification

CryEx

(This repository is private for now) Paper Link:

CryEx is a Python-based pipeline designed to identify and quantify cryptic exons from RNA-Seq datasets. The pipeline leverages StringTie for transcript assembly and detects cryptic exons.

Highlights:

● A user-friendly Python package that implements the CryEx protocol to construct a comprehensive splicing landscape

● Identification and quantification of cryptic exons and annotated exons using RNA-Seq data

● Customizable filtering parameters to enable differential splicing analysis at different resolutions

● A framework for implementing data preprocessing, analysis, customization and visualization

Installation 🔧

Installation from the GitHub repository:

git clone https://github.com/giovanniquinones/CryEx
cd CryEx

# create a virtual environment from yaml file
conda env create -f cryex_env.yaml
conda activate CryEx_env

# install the package
pip install .
export PATH=/path/to/bin:$PATH 

Installation from PyPI:

pip install CryEx.v2

It is recommended to create and activate a virtual environment before running pip install, so that the dependencies are installed in isolation.

Dependencies 📦

  • Python 3.8+
  • stringtie
  • multiprocess
  • numpy
  • pandas
  • pysam
  • subprocess (part of the Python standard library)
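
As a quick sanity check before running the pipeline, the sketch below reports which of the dependencies listed above can be found in the current environment. The function name `check_dependencies` is hypothetical and not part of CryEx; the check only confirms that modules are importable and that `stringtie` is on PATH, not that versions are compatible.

```python
import importlib.util
import shutil

# Python packages named in the dependency list above. subprocess ships with
# the standard library, so it is not checked here.
PYTHON_PACKAGES = ["multiprocess", "numpy", "pandas", "pysam"]
# External command-line tools that must be reachable on PATH.
EXTERNAL_TOOLS = ["stringtie"]


def check_dependencies(packages=PYTHON_PACKAGES, tools=EXTERNAL_TOOLS):
    """Return a dict mapping each dependency name to True if it was found."""
    status = {}
    for name in packages:
        # find_spec returns None when a top-level module cannot be located.
        status[name] = importlib.util.find_spec(name) is not None
    for name in tools:
        # shutil.which returns None when the executable is not on PATH.
        status[name] = shutil.which(name) is not None
    return status


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```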

Usage 🚀

After installation, you can use the CryEx command line tools. Below is an example of how to run the pipeline:

# Check if installed successfully
CryEx_stringtie --help 

# Identify cryptic and annotated exons
CryEx_stringtie -f ${FOFN.tsv} -o ${EXONS.GTF}

# Calculate splice junction usage
CryEx_junctions -f ${FOFN.tsv} -o ${JXN.BED}

# Calculate PSI
CryEx_psi_calculator -f ${FOFN.tsv} -e ${EXONS.GTF} -j ${JXN.BED.GZ} -o ${PSI.TSV}

# Calculate differential splicing
CryEx_diff -f ${FOFN.tsv} -p ${PSI.TSV} -o ${DIFF.tsv}

An example of this pipeline can be found in the test_data directory of this repository.
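
The four steps above can also be chained from Python. The following is a minimal sketch, not part of CryEx: the function names `build_commands` and `run_pipeline` are hypothetical, and only the command names and the `-f`, `-e`, `-j`, `-p`, `-o` flags shown in the usage example are used. Because the usage example passes a `.GZ` junction file to the PSI step, and it is not stated whether CryEx_junctions compresses its own output, the junction path handed to the PSI step is a separate, optional argument.

```python
import subprocess


def build_commands(fofn, exons_gtf, jxn_bed, psi_tsv, diff_tsv, jxn_for_psi=None):
    """Return the four CryEx commands as argument lists, in run order."""
    # Assumption: if no separate path is given, the PSI step reads the same
    # junction file that CryEx_junctions was asked to write.
    jxn_for_psi = jxn_for_psi or jxn_bed
    return [
        ["CryEx_stringtie", "-f", fofn, "-o", exons_gtf],
        ["CryEx_junctions", "-f", fofn, "-o", jxn_bed],
        ["CryEx_psi_calculator", "-f", fofn, "-e", exons_gtf,
         "-j", jxn_for_psi, "-o", psi_tsv],
        ["CryEx_diff", "-f", fofn, "-p", psi_tsv, "-o", diff_tsv],
    ]


def run_pipeline(commands, dry_run=False):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing step,
            # so later steps never run on incomplete inputs.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    steps = build_commands("fofn.tsv", "exons.gtf", "jxn.bed",
                           "psi.tsv", "diff.tsv")
    run_pipeline(steps, dry_run=True)  # set dry_run=False to execute
```

Running with `dry_run=True` first makes it easy to confirm the file names before any long-running step starts.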

Input 📥

FOFN should be tab separated and have the following columns:

SAMPLE     BAM                      STAR_SJ_OUT                     GROUP
sample1    /path/to/sample1.bam     /path/to/sample1.SJ.out.tab     CTRL
sample2    /path/to/sample2.bam     /path/to/sample2.SJ.out.tab     KD
sample3    /path/to/sample3.bam     /path/to/sample3.SJ.out.tab     CTRL
sample4    /path/to/sample4.bam     /path/to/sample4.SJ.out.tab     KD

If no STAR SJ.out.tab file is available for a sample, enter na in the STAR_SJ_OUT column. For differential splicing, CryEx uses the GROUP column to assign samples to groups.
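
Because the FOFN must be tab separated, a file edited by hand with spaces is an easy mistake to make. The sketch below, which is not part of CryEx, reads a FOFN with the standard library, checks for the four columns described above, and lists the samples per group. The function names `read_fofn` and `group_samples` are hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = ["SAMPLE", "BAM", "STAR_SJ_OUT", "GROUP"]


def read_fofn(handle):
    """Parse a tab-separated FOFN and return its rows as a list of dicts."""
    reader = csv.reader(handle, delimiter="\t")
    header = [name.strip() for name in next(reader, [])]
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"FOFN is missing columns: {missing}")
    rows = []
    for line_no, fields in enumerate(reader, start=2):
        if not any(field.strip() for field in fields):
            continue  # skip blank lines
        if len(fields) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} tab-separated "
                f"fields, found {len(fields)}"
            )
        rows.append({key: value.strip() for key, value in zip(header, fields)})
    return rows


def group_samples(rows):
    """Map each GROUP label to the list of SAMPLE names that carry it."""
    groups = {}
    for row in rows:
        groups.setdefault(row["GROUP"], []).append(row["SAMPLE"])
    return groups


if __name__ == "__main__":
    example = (
        "SAMPLE\tBAM\tSTAR_SJ_OUT\tGROUP\n"
        "sample1\t/path/to/sample1.bam\t/path/to/sample1.SJ.out.tab\tCTRL\n"
        "sample2\t/path/to/sample2.bam\tna\tKD\n"
    )
    print(group_samples(read_fofn(io.StringIO(example))))
```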

Output 📤

  • CryEx_stringtie will output a standard GTF file with the identified cryptic and annotated exons.
  • CryEx_junctions will output a BED file with the splice junctions and their usage.
chr21   9826943   9826984   hte1,hte2,hte3,hte4   10,1,7,11     +
chr21   9827330   9874067   hte1,hte2,hte3,hte4   1,0,0,0       +
chr21   9907492   9908277   hte1,hte2,hte3,hte4   63,47,35,49   -
chr21   9907462   9909046   hte1,hte2,hte3,hte4   1,0,0,0       -
chr21   9908432   9909046   hte1,hte2,hte3,hte4   12,19,11,8    -
  • CryEx_psi_calculator will output a TSV file with the PSI values for each cryptic exon.
exon_type       chrom   exon_3ss    exon_5ss    strand  inclusion_n     exc_5ss exc_3ss exclusion_n SAMPLE  PSI
first_exon      chr21   9907191     9907492     -       97              9896772 9966321 1           r2      0.96
first_exon      chr21   9907191     9907492     -       67              9896772 9966321 0           r3      1.0
first_exon      chr21   9907191     9907492     -       99              9896772 9966321 0           r4      1.0
first_exon      chr21   9907191     9907492     -       97              9896772 9966321 3           r2      0.92
first_exon      chr21   9907191     9907492     -       67              9896772 9966321 0           r3      1.0
first_exon      chr21   9907191     9907492     -       99              9896772 9966321 0           r4      1.0
  • CryEx_diff will output a TSV file with the differential splicing results.
exon_type    exon_coords                flanking_jxns      LLR    Pvalue    Sig   DeltaPSI  DIFF     PSIGroup1  PSIGroup2  CovGroup1  CovGroup2  Pvalue_Adj
last_exon    chr21:45458057-45462429:+  45457808,45472160  2.49   2.57e-02  True  -0.54     ctrl-kd  0.01,0.02  0.56,0.54  238,219    285,323    1.00e+00
last_exon    chr21:45458057-45462454:+  45457808,45472160  2.49   2.57e-02  True  -0.54     ctrl-kd  0.01,0.02  0.56,0.54  238,219    285,323    1.00e+00
first_exon   chr21:45514756-45516594:+  45514114,45518237  2.332  3.08e-02  True  -0.26     ctrl-kd  0.02,0.02  0.2,0.36   206,210    74,50      1.00e+00
first_exon   chr21:45515067-45516594:+  45514114,45518237  2.332  3.08e-02  True  -0.28     ctrl-kd  0.02,0.02  0.23,0.38  206,210    74,50      1.00e+00
middle_exon  chr21:45516486-45516594:+  45514114,45518237  2.332  3.08e-02  True  -0.48     ctrl-kd  0.07,0.07  0.48,0.62  206,210    74,50      1.00e+00
middle_exon  chr21:47914042-47914146:+  47910632,47916900  2.408  2.82e-02  True  -0.71     ctrl-kd  0.02,0.09  0.71,0.82  160,121    168,173    1.00e+00
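
The output formats above can be read with a few lines of Python. The sketch below is not part of CryEx and its function names are hypothetical. It assumes that the fourth and fifth columns of the junction file are parallel comma-separated lists of sample names and read counts, and that DeltaPSI is the mean of PSIGroup1 minus the mean of PSIGroup2. The second assumption is inferred from the example rows, where it matches the reported values to within rounding, and is not a documented definition.

```python
from statistics import mean


def parse_junction_line(line):
    """Split one CryEx_junctions row into coordinates and per-sample counts."""
    chrom, start, end, samples, counts, strand = line.split()
    return {
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        # Pair each sample name with its count for this junction.
        "counts": dict(zip(samples.split(","), map(int, counts.split(",")))),
    }


def delta_psi(psi_group1, psi_group2):
    """Difference of group means, given comma-separated PSI strings."""
    group1 = [float(value) for value in psi_group1.split(",")]
    group2 = [float(value) for value in psi_group2.split(",")]
    return mean(group1) - mean(group2)


if __name__ == "__main__":
    jxn = parse_junction_line(
        "chr21\t9826943\t9826984\thte1,hte2,hte3,hte4\t10,1,7,11\t+"
    )
    print(jxn["counts"])
    # First row of the CryEx_diff example: the reported DeltaPSI is -0.54.
    print(delta_psi("0.01,0.02", "0.56,0.54"))
```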

Troubleshooting 🛠️

Please submit your issues to the GitHub repository directly (under the Issues tab).

  • Problem 1:

‘pip install’ fails with the error message “module ‘pip._vendor.platformdirs’ has no attribute ‘user_cache_dir’”.

Potential solution:

• Try ‘pip3 install’ instead of ‘pip install’.

  • Problem 2:

‘CryEx_stringtie -f fofn.txt -o exons.gtf’ fails with the error message “Segmentation fault”.

Potential solution:

• Run stringtie on each BAM file separately instead of on multiple files in one run, then concatenate the output GTF files.
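
One possible reading of this workaround is sketched below. It is an assumption, not a documented CryEx workflow: the FOFN is split into single-sample files, CryEx_stringtie is called once per file with the `-f` and `-o` flags from the usage example, and the resulting GTF files are concatenated as plain text. The function names are hypothetical, and a plain concatenation may repeat exon records that occur in more than one sample.

```python
import csv
import subprocess
from pathlib import Path


def split_fofn(fofn_path, out_dir):
    """Write one single-sample FOFN per row and return the new file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(fofn_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        rows = [row for row in reader if row]  # drop empty lines
    paths = []
    for row in rows:
        path = out_dir / f"{row[0]}.fofn.tsv"  # row[0] is the SAMPLE column
        with open(path, "w", newline="") as out:
            writer = csv.writer(out, delimiter="\t", lineterminator="\n")
            writer.writerow(header)
            writer.writerow(row)
        paths.append(path)
    return paths


def concat_gtfs(gtf_paths, merged_path):
    """Concatenate GTF files, in the given order, into merged_path."""
    with open(merged_path, "w") as merged:
        for gtf in gtf_paths:
            with open(gtf) as handle:
                for line in handle:
                    # Guard against a missing newline at the end of a file.
                    merged.write(line if line.endswith("\n") else line + "\n")


def run_per_sample(fofn_path, work_dir, merged_gtf):
    """Run CryEx_stringtie once per sample, then merge the outputs."""
    gtfs = []
    for single in split_fofn(fofn_path, work_dir):
        gtf = single.with_suffix(".gtf")
        subprocess.run(
            ["CryEx_stringtie", "-f", str(single), "-o", str(gtf)],
            check=True,
        )
        gtfs.append(gtf)
    concat_gtfs(gtfs, merged_gtf)
```

A call such as `run_per_sample("fofn.txt", "per_sample", "exons.gtf")` would then stand in for the single failing command.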
