XuLab package for single-cell RNA-seq data upstream processing.
Project description
hellobc
This is a toolkit for upstream analysis of scRNA-seq data developed by Youchun Xu's group, Tsinghua University.
The toolkit contains the following functions:
- Parsing user-defined barcode patterns
- Quantifying RNA
- Cell calling
- Generating count matrix
Preperation
Before using hibc, make sure you have installed STAR, featureCounts.
# install STAR
# install featureCounts
Installation
We recommend installing hibc in a conda environment.
conda create -n hellobc python=3.8
conda activate hellobc
conda install bioconda::seqkit
conda install bioconda::samtools
pip install hellobc
Get started
In order to run the whole pipeline of upstream analysis, you can create a new .py file with the following code:
from hibc.pipline.runpip import hellobc_pipeline
workst = "/mnt/sda/xxt/workst/sevci/20240517" # output directory
sampid = "ZT-136-ZT-136-1" # sample name
fq1 = "/mnt/sda/xxt/data/seqdata/biozt/ZT-136-ZT-136-1/ZT-136-ZT-136-1_R1.fastq.gz"
fq2 = "/mnt/sda/xxt/data/seqdata/biozt/ZT-136-ZT-136-1/ZT-136-ZT-136-1_R2.fastq.gz"
bcpat = '36B1[1:12; 19:30; 37:48] 8U1[49:56]' # barcode pattern of biocapital
# bcpat = '16B1[1:16] 12U1[17:28]' # barcode pattern of 10x chem_v2
# path to STAR, featureCounts. If not setted, the STAR and featureCounts in the environmental variables will be used.
star_path = '/mnt/sda/xxt/soft/STAR-2.6.1a/bin/Linux_x86_64_static/STAR'
fc_path = '/mnt/sda/xxt/soft/subread-2.0.6-Linux-x86_64/bin/featureCounts'
# human genome
gtf_dir = "/mnt/sda/xxt/data/seqdata/ref/refdata-gex-GRCh38-2020-A/genes/genes.gtf"
star_index_dir = "/mnt/sda/xxt/data/seqdata/ref/refdata-gex-GRCh38-2020-A/star"
# mouse genome
# mouse_gtf_dir = "/mnt/sda/xxt/data/seqdata/ref/refdata-gex-mm10-2020-A/genes/genes.gtf"
# mouse_star_index_dir = "/mnt/sda/xxt/data/seqdata/ref/refdata-gex-mm10-2020-A/star"
xu_pipeline(workst, sampid, fq1, fq2, bcpat, gtf_dir, star_index_dir, init_method='ordmag', min_umis=500, max_adj_pvalue=0.01, star_path=star_path, fc_path=fc_path, corenum=-1)
More details
Input format of barcode pattern
The composition of the barcode format string:
- Letters: 'B' means cell barcode, 'U' means umi.
- The number in front of the letter: total length (number of bases) of barcode or umi.
- The number after the letter: can only choose between 1 or 2. 1 means barcode/umi is in read1, 2 means read2.
- Contents in [ ]: position of each barcode/umi segment in the sequence, separated by ';'.
- For example:
# If you have 3 discrete barcode segments (each segment is 12nt) and 1 segment of 8nt umi, both barcode and umi are in read1:
bcpat = '36B1[1:12; 19:30; 37:48] 8U1[49:56]'
# If you have 1 segment of 16nt barcode and 1 segment of 12nt umi, both barcode and umi are in read1:
bcpat = '16B1[1:16] 12U1[17:28]'
Cell calling parameters setting
There are two parameters that can significantly affect the final number of cells identified:
- min_umis (default value: 500): The larger of the value, the fewer cells will be called.
- max_adj_pvalue (default value: 0.01): The smaller of the value, the fewer cells will be called.
- These 2 parameters can be setted when calling the 'hellobc_pipeline' function:
# For example, here we set the min_umis=100, max_adj_pvalue=0.01
hellobc_pipeline(workst, sampid, fq1, fq2, bcpat, gtf_dir, star_index_dir, min_umis=100, max_adj_pvalue=0.01)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
File details
Details for the file hellobc-0.1.0-py3-none-any.whl
.
File metadata
- Download URL: hellobc-0.1.0-py3-none-any.whl
- Upload date:
- Size: 29.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.8.19
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 |
562c4cf0001f21b2ca0e562e183108b1ccd7b3c808f20f9b9a92f1eb2f1cd918
|
|
MD5 |
ecf162c27c96d821f6ffaccaf1a64e26
|
|
BLAKE2b-256 |
3dd7617c79a0f194f6e9b0d81d84788c424cb467d0065be597485a49a60bfebe
|