A package for estimating alternative polyadenylation events from scRNA-seq data.
SCAPE-APA: a package for estimating alternative polyadenylation events from scRNA-seq data
Installation
Method 1
conda config --append channels bioconda
conda config --append channels conda-forge
conda config --append channels anaconda
conda create -n scape_env python=3.11
conda activate scape_env
conda install anaconda::numpy
conda install anaconda::scipy
conda install anaconda::pandas
conda install anaconda::matplotlib
conda install anaconda::click
conda install anaconda::tomli-w
conda install anaconda::requests
conda install conda-forge::psutil
conda install conda-forge::tomli-w
conda install bioconda::bedtools
conda install bioconda::pybedtools
conda install bioconda::pysam
conda install bioconda::gffutils
pip install taichi
pip install scape-apa
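After Method 1 finishes, it can help to confirm that the installed packages resolve inside scape_env. The following is a minimal sketch and not part of SCAPE-APA: the helper name missing_modules is hypothetical, and the list uses the usual import names of the packages above (for example tomli_w for tomli-w).

```python
import importlib.util

# Import names of the Python packages installed above. bedtools is a
# command-line tool rather than a Python module, so it is not listed.
REQUIRED = [
    "numpy", "scipy", "pandas", "matplotlib", "click", "tomli_w",
    "requests", "psutil", "pybedtools", "pysam", "gffutils", "taichi",
]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
print("missing:", ", ".join(missing) if missing else "none")
```

Run it with the Python interpreter of the activated environment; any name it reports can be reinstalled with the matching conda or pip command.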
Method 2
# Mac
conda env create -f mac_environment.yml
conda activate scape_env
Method 3
# Linux
conda env create -f linux_environment.yml
conda activate scape_env
pip install -r linux_requirements.txt
Method 4
# install locally
git clone https://github.com/chengl7-lab/scape.git
cd scape
pip install .
Commands
Command | Description |
---|---|
scape gen_utr_annotation | Generate UTR annotation. |
scape prepare_input | Prepare data per UTR. |
scape infer_pa | Parameter inference. |
scape merge_pa | Merge PA within junction per gene or UTR. |
scape cal_exp_pa_len | Calculate the expected length of PA. |
scape ex_pa_cnt_mat | Extract read count matrix. |
Get help information for scape or for individual scape commands:
scape --help
scape gen_utr_annotation --help
Usage
gen_utr_annotation
Input Argument | Type | Required | Default | Description |
---|---|---|---|---|
--gff_file | TEXT | Yes | NA | The gff3 or gff3.gz file containing the gene body annotation. |
--output_dir | TEXT | Yes | NA | Directory in which to save the dataframe of selected UTRs. |
--res_file_name | TEXT | Yes | NA | File name of the UTR annotation dataframe. The .csv suffix is added automatically. |
--gff_merge_strategy | TEXT | No | merge | Method for processing overlapping regions. It follows merge_strategy in package gffutils. |
OUTPUT: A csv file containing the annotated 3' UTRs, stored at {output_dir}/{res_file_name}.csv.
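The flags in the table above can be assembled into a command line as in the sketch below. It only builds and prints the command, it does not run scape. The helper name gen_utr_annotation_cmd and all paths are hypothetical and chosen for illustration.

```python
import shlex


def gen_utr_annotation_cmd(gff_file, output_dir, res_file_name,
                           gff_merge_strategy="merge"):
    """Build the argument list for `scape gen_utr_annotation` from the documented flags."""
    return [
        "scape", "gen_utr_annotation",
        "--gff_file", gff_file,
        "--output_dir", output_dir,
        "--res_file_name", res_file_name,
        "--gff_merge_strategy", gff_merge_strategy,
    ]


# Hypothetical input and output locations.
output_dir = "scape_out"
res_file_name = "utr_annotation"
cmd = gen_utr_annotation_cmd("genes.gff3.gz", output_dir, res_file_name)
print(shlex.join(cmd))

# The annotation is documented to land at {output_dir}/{res_file_name}.csv.
expected_csv = f"{output_dir}/{res_file_name}.csv"
print(expected_csv)
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run once scape is installed.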
prepare_input
Input Argument | Type | Required | Default | Description |
---|---|---|---|---|
--utr_file | TEXT | Yes | NA | UTR annotation file (dataframe produced by gen_utr_annotation). |
--cb_file | TEXT | Yes | NA | A tsv.gz file listing all validated barcodes (by CellRanger). It has a single column of cell barcodes, which must be consistent with the values of the CB tag in bam_file. |
--bam_file | TEXT | Yes | NA | Bam file that is searched for reads over the annotated UTRs. |
--output_dir | TEXT | Yes | NA | Output directory in which to save pickle files of selected reads over the annotated UTRs. |
--chunksize | INTEGER | No | 1000 | Number of UTR regions included in each small pickle file, which contains the preprocessed input for APA analysis. |
OUTPUT: Pickle files that include tuples (gene info, dataframe of parameter).
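Because the barcodes in cb_file must match the CB tag values in the bam file, a quick consistency check before running prepare_input can save time. The sketch below is an illustration under assumptions: the helper names are hypothetical, the barcodes are made up, and the observed CB values are given as a plain list (in practice they would be collected from the bam file, for example with pysam).

```python
import gzip
import os
import tempfile


def read_barcodes(cb_file):
    """Read a gzipped one-column barcode file into a set, skipping blank lines."""
    with gzip.open(cb_file, "rt") as handle:
        return {line.strip() for line in handle if line.strip()}


def fraction_listed(cb_tags, barcodes):
    """Return the fraction of observed CB tag values found in the barcode set."""
    cb_tags = list(cb_tags)
    if not cb_tags:
        return 0.0
    return sum(tag in barcodes for tag in cb_tags) / len(cb_tags)


# Demo with made-up barcodes written to a temporary tsv.gz file.
with tempfile.TemporaryDirectory() as tmp:
    cb_file = os.path.join(tmp, "barcodes.tsv.gz")
    with gzip.open(cb_file, "wt") as handle:
        handle.write("AAACCTGAGAAACCAT-1\nAAACCTGAGAAACCGC-1\n")
    barcodes = read_barcodes(cb_file)

# The second value lacks the "-1" suffix, so it does not match the file.
observed = ["AAACCTGAGAAACCAT-1", "AAACCTGAGAAACCAT", "AAACCTGAGAAACCGC-1"]
share = fraction_listed(observed, barcodes)
print(f"{share:.2f} of observed CB tags are listed in cb_file")
```

A low fraction usually points to a formatting mismatch, such as a missing or extra suffix on the barcodes.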
infer_pa
Input Argument | Type | Required | Default | Description |
---|---|---|---|---|
--input_pickle_file | TEXT | Yes | NA | Input pickle file (result of prepare_input) |
--output_dir | TEXT | Yes | NA | Directory to save output pickle files including PAS information over annotated UTR. |
--toml_para_file | TEXT | No | None | A TOML file (example) that specifies user-defined parameters. |
--pre_para_pkl_file | TEXT | No | None | A pickle file with pre-specified pA sites and UTR lengths, produced by a previous scape analysis. |
OUTPUT: Pickle file containing the inferred parameters for each UTR region.
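Since prepare_input writes several pickle files and infer_pa takes a single --input_pickle_file, one invocation per file is needed. The sketch below builds those commands without running them. It assumes the pickle files sit in one directory and end in .pkl. The helper name infer_pa_cmds and the file names are hypothetical.

```python
import shlex
import tempfile
from pathlib import Path


def infer_pa_cmds(pickle_dir, output_dir, toml_para_file=None):
    """Build one `scape infer_pa` argument list per pickle file in pickle_dir."""
    cmds = []
    for pkl in sorted(Path(pickle_dir).glob("*.pkl")):
        cmd = [
            "scape", "infer_pa",
            "--input_pickle_file", str(pkl),
            "--output_dir", str(output_dir),
        ]
        # The TOML parameter file is optional, so add it only when given.
        if toml_para_file is not None:
            cmd += ["--toml_para_file", str(toml_para_file)]
        cmds.append(cmd)
    return cmds


# Demo with empty placeholder files standing in for prepare_input output.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("chunk_0.pkl", "chunk_1.pkl"):
        (Path(tmp) / name).touch()
    cmds = infer_pa_cmds(tmp, "scape_out")

for cmd in cmds:
    print(shlex.join(cmd))
```

Keeping the commands as lists makes it straightforward to hand them to subprocess.run or to a job scheduler, one UTR chunk per job.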
merge_pa
Input Argument | Type | Required | Default | Description |
---|---|---|---|---|
--output_dir | TEXT | Yes | NA | Directory that was used in previous steps to save the output of prepare_input and infer_pa. |
--utr_merge | BOOLEAN | No | True | If True, PA sites from the same gene are merged. If False, PA sites from the same UTR are merged. |
OUTPUT: A single pickle file containing all UTRs of all genes, stored in output_dir/. Its name is res.gene.pkl if utr_merge=True; otherwise, its name is res.utr.pkl.
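The naming rule for the merged result can be written down as a small helper, which is handy when later steps need to locate the file. This is a sketch of the rule stated in the OUTPUT note. The function name merged_result_path and the directory name are hypothetical.

```python
from pathlib import PurePosixPath


def merged_result_path(output_dir, utr_merge=True):
    """Return the documented merge_pa result path for the given utr_merge setting."""
    name = "res.gene.pkl" if utr_merge else "res.utr.pkl"
    return PurePosixPath(output_dir) / name


print(merged_result_path("scape_out"))
print(merged_result_path("scape_out", utr_merge=False))
```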
cal_exp_pa_len
Input Argument | Type | Required | Default | Description |
---|---|---|---|---|
--output_dir | TEXT | Yes | NA | Directory that was used in previous steps to save the output of prepare_input and infer_pa. |
--cell_cluster_file | TEXT | No | - | A csv file containing two columns, in order: cell barcode (CB) and its respective group. Its name will be included in the file name of the final result. |
--res_pkl_file | TEXT | No | - | Name of res pickle file that contains PASs for calculating expected PA length. Its name will be included in the file name of final result. |
OUTPUT: exp_pa_len.csv, a dataframe with 2 columns.
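A cell_cluster_file with the two documented columns (cell barcode, then group) could be produced as in the sketch below. Whether scape expects a header row is not stated here, so the header is left as an optional argument and omitted by default. That default is an assumption, as are the helper name write_cluster_csv and the made-up barcodes and cluster labels.

```python
import csv
import io


def write_cluster_csv(handle, cb_to_cluster, header=None):
    """Write rows of (cell barcode, group) in that column order."""
    writer = csv.writer(handle, lineterminator="\n")
    # Assumption: no header row unless the caller supplies one.
    if header is not None:
        writer.writerow(header)
    for barcode, cluster in cb_to_cluster.items():
        writer.writerow([barcode, cluster])


# Demo written to an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_cluster_csv(buffer, {
    "AAACCTGAGAAACCAT-1": "T_cell",
    "AAACCTGAGAAACCGC-1": "B_cell",
})
print(buffer.getvalue(), end="")
```

To write a real file, open it with open(path, "w", newline="") and pass the handle in place of the buffer.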
ex_pa_cnt_mat
Input Argument | Type | Required | Default | Description |
---|---|---|---|---|
--output_dir | TEXT | Yes | NA | Directory that was used in previous steps to save the output of prepare_input and infer_pa. |
--res_pkl_file | TEXT | No | - | Name of res pickle file that contains PASs for calculating expected PA length. Its name will be included in the file name of final result. |
OUTPUT: A tsv.gz file named {res_pkl_file}.cnt.tsv.gz, stored in output_dir/.
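To take a first look at the resulting tsv.gz file without loading it fully, the first few rows can be read with the standard library. The sketch below makes no assumption about the layout of the count matrix. The helper name preview_tsv_gz is hypothetical, and the demo file is made up.

```python
import csv
import gzip
import itertools
import os
import tempfile


def preview_tsv_gz(path, n_rows=5):
    """Return the first n_rows of a gzipped tab-separated file as lists of strings."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        return list(itertools.islice(reader, n_rows))


# Demo with a small made-up file standing in for the scape output.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.cnt.tsv.gz")
    with gzip.open(path, "wt", newline="") as handle:
        handle.write("a\tb\n1\t2\n3\t4\n")
    rows = preview_tsv_gz(path, n_rows=2)

print(rows)
```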
Demo & Tutorials
The data used can be downloaded from examples.
Citation
Guangzhao Cheng, Tien Le, Ran Zhou, and Lu Cheng. SCAPE-APA: a package for estimating alternative polyadenylation events from scRNA-seq data. bioRxiv 2024.03.12.584547; doi: https://doi.org/10.1101/2024.03.12.584547