Skip to main content

A package for estimating alternative polyadenylation events from scRNA-seq data.

Project description

SCAPE-APA: a package for estimating alternative polyadenylation events from scRNA-seq data

Installation

Method 1

conda config --append channels bioconda 
conda config --append channels conda-forge 
conda config --append channels anaconda 
conda create -n scape_env python=3.11
conda activate scape_env

conda install anaconda::numpy
conda install anaconda::scipy
conda install anaconda::pandas
conda install anaconda::matplotlib
conda install anaconda::click
conda install anaconda::tomli-w
conda install anaconda::requests
conda install conda-forge::psutil
conda install conda-forge::tomli-w
conda install bioconda::bedtools
conda install bioconda::pybedtools
conda install bioconda::pysam
conda install bioconda::gffutils

pip install taichi
pip install scape-apa

Method 2

# Mac
conda env create -f mac_environment.yml
conda activate scape_env

Method 3

# Linux
conda env create -f linux_environment.yml
conda activate scape_env
pip install -r linux_requirements.txt

Method 4

# install locally
git clone https://github.com/chengl7-lab/scape.git
cd scape
pip install .

Commands

Command Description
scape gen_utr_annotation Generate UTR annotation.
scape prepare_input Prepare data per UTR.
scape infer_pa Parameters inference.
scape merge_pa Merge PA within junction per gene or UTR.
scape cal_exp_pa_len Calculate the expected length of PA.
scape ex_pa_cnt_mat Extract read count matrix.

Get help information of scape or scape commands.

scape --help
scape gen_utr_annotation --help

Usage

gen_utr_annotation

Input Argument Type Required Default Description
--gff_file TEXT Yes NA The gff3 or gff3.gz file including annotation of gene body.
--output_dir TEXT Yes NA Directory to save dataframe of selected UTR.
--res_file_name TEXT Yes NA File Name of dataframe of the UTR annotation. The suffix .csv is automatically generated.
--gff_merge_strategy TEXT No merge Method for processing overlapping regions. It follows merge_strategy in package gffutils.

OUTPUT: An csv file including information of annotated 3UTR which is stored at {output_dir}/{res_file_name}.csv.

prepare_input

Input Argument Type Required Default Description
--utr_file TEXT Yes NA UTR annotation file (dataframe, resulted from gen_utr_annotation).
--cb_file TEXT Yes NA File of tsv.gz including all validated barcodes (by CellRanger). This file has one column of cell barcode which must be consistent with value of CB tag in bam_file file.
--bam_file TEXT Yes NA Bam file that is used for searching reads over annotated UTR.
--output_dir TEXT Yes NA Output directory to save pickle files of selected reads over annotated UTR.
--chunksize INTERGER No 1000 Number of UTR regions included in each small pickle file, which contains the preprocessed input file for APA analysis.

OUTPUT: Pickle files that include tuples (gene info, dataframe of parameter).

infer_pa

Input Argument Type Required Default Description
--input_pickle_file TEXT Yes NA Input pickle file (result of prepare_input)
--output_dir TEXT Yes NA Directory to save output pickle files including PAS information over annotated UTR.
--toml_para_file TEXT No None A TOML file (example) specifies user-defined parameters.
--pre_para_pkl_file TEXT No None A pickle file with pre-specified pA sites and utr length, result file of scape analysis.

OUTPUT: Pickle file including Parameters for each UTR region.

merge_pa

Input Argument Type Required Default Description
--output_dir TEXT Yes NA Directory which was used in previous steps to save output by prepare_input and infer_pa.
--utr_merge BOOLEAN No True If True, PA sites from the same gene are merge. Otherwise, if False, PA sites from the same UTR are merged.

OUTPUT: A single pickle file containing all UTRs of all genes is stored in output_dir/. Its name is res.gene.pkl if utr_merge=True, otherwise, its name is res.utr.pkl.

cal_exp_pa_len

Input Argument Type Required Default Description
--output_dir TEXT Yes NA Directory which was used in previous steps to save output by prepare_input and infer_pa.
--cell_cluster_file TEXT No - An csv file containing two columns in order: cell barcode (CB) and respective group (cell_cluster_file). Its name will be included in the file name of final result.
--res_pkl_file TEXT No - Name of res pickle file that contains PASs for calculating expected PA length. Its name will be included in the file name of final result.

OUTPUT: exp_pa_len.csv. It is a dataframe with 2 columns.

ex_pa_cnt_mat

Input Argument Type Required Default Description
--output_dir TEXT Yes NA Directory which was used in previous steps to save output by prepare_input and infer_pa.
--res_pkl_file TEXT No - Name of res pickle file that contains PASs for calculating expected PA length. Its name will be included in the file name of final result.

OUTPUT: An tsv.gz file named {res_pkl_file.cnt.tsv.gz} is stored in output_dir/.

Demo & Tutorials

The data used can be downloaded from examples.

Citation

Guangzhao Cheng, Tien Le, Ran Zhou, and Lu Cheng. SCAPE-APA: a package for estimating alternative polyadenylation events from scRNA-seq data. bioRxiv 2024.03.12.584547; doi: https://doi.org/10.1101/2024.03.12.584547

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scape_apa-1.0.3.tar.gz (43.3 kB view details)

Uploaded Source

Built Distribution

scape_apa-1.0.3-py3-none-any.whl (45.3 kB view details)

Uploaded Python 3

File details

Details for the file scape_apa-1.0.3.tar.gz.

File metadata

  • Download URL: scape_apa-1.0.3.tar.gz
  • Upload date:
  • Size: 43.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.0

File hashes

Hashes for scape_apa-1.0.3.tar.gz
Algorithm Hash digest
SHA256 b57696b3a8f2fbd42d16945eed873a26096fd8e6552ee713ef1ad7423a265e81
MD5 45bf6a25c1a2a2b8644694df4fd78b28
BLAKE2b-256 7257f94d8b90aab647acfd049c2bcf8a153848baf9ba35af9d2383f5d54b706f

See more details on using hashes here.

File details

Details for the file scape_apa-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: scape_apa-1.0.3-py3-none-any.whl
  • Upload date:
  • Size: 45.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.0

File hashes

Hashes for scape_apa-1.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 dab384a17dd684394367c307da400270c68f44a7e9abb1359e42920ef4bc6124
MD5 81a04d6e1f6096df0fb4e5dd1a934d68
BLAKE2b-256 ddb1dbe9a954ff80d4fc3a68dee5382b26431a979c5abaf34543488f69dc273e

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page