GEXSCOPE Single cell analysis
CeleScope
CeleScope is a collection of bioinformatics analysis pipelines for processing SCOPE single cell data. It can currently analyze:
- Single Cell RNA-Seq data
- Single Cell Immune Profiling (VDJ) data
Detailed docs can be found in the wiki.
Hardware/Software Requirements
- Minimum 32 GB RAM (to run the STAR aligner)
- conda
- git
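The requirements above can be checked before installation with a small pre-flight script. This is only a sketch: the function names (has_cmd, total_ram_gb, check_requirements) are made up for illustration, and the memory check assumes a Linux host that exposes /proc/meminfo.

```shell
# Pre-flight check sketch; not part of CeleScope.
# MIN_RAM_GB mirrors the 32 GB requirement listed above.
MIN_RAM_GB=32

# Return 0 if the named command is on PATH.
has_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print total RAM in GB, rounded up because the kernel reports slightly
# less than the installed amount. Prints 0 when /proc/meminfo is unavailable.
total_ram_gb() {
    if [ -r /proc/meminfo ]; then
        awk '/^MemTotal:/ {printf "%d\n", ($2 / 1048576) + 0.999}' /proc/meminfo
    else
        echo 0
    fi
}

# Report every unmet requirement; return 1 if any check fails.
check_requirements() {
    local status=0 tool ram
    for tool in conda git; do
        if ! has_cmd "$tool"; then
            echo "missing: $tool"
            status=1
        fi
    done
    ram=$(total_ram_gb)
    ram=${ram:-0}
    if [ "$ram" -lt "$MIN_RAM_GB" ]; then
        echo "warning: detected ${ram} GB RAM, STAR needs about ${MIN_RAM_GB} GB"
        status=1
    fi
    return "$status"
}

check_requirements || echo "requirements not fully met"
```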
Installation
- Clone repo
git clone https://gitee.com/singleron-rd/celescope.git
# or
git clone https://github.com/singleron-RD/CeleScope.git
- Install conda packages
cd CeleScope
conda create -n celescope
conda activate celescope
conda install --file conda_pkgs.txt --channel conda-forge --channel bioconda --channel r --channel imperial-college-research-computing
- Install celescope
pip install celescope
# Use a PyPI mirror to speed up downloading if you are in China
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple celescope
- Install the Beta version (optional)
# Install the Beta version of celescope from the cloned repository
python setup.py install
Reference genome
Homo sapiens
mkdir -p hs/ensembl_99
cd hs/ensembl_99
wget ftp://ftp.ensembl.org/pub/release-99/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
wget ftp://ftp.ensembl.org/pub/release-99/gtf/homo_sapiens/Homo_sapiens.GRCh38.99.gtf.gz
gunzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
gunzip Homo_sapiens.GRCh38.99.gtf.gz
conda activate celescope
gtfToGenePred -genePredExt -geneNameAsName2 Homo_sapiens.GRCh38.99.gtf /dev/stdout | \
awk '{print $12"\t"$1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10}' > Homo_sapiens.GRCh38.99.refFlat
STAR \
--runMode genomeGenerate \
--runThreadN 6 \
--genomeDir ./ \
--genomeFastaFiles Homo_sapiens.GRCh38.dna.primary_assembly.fa \
--sjdbGTFfile Homo_sapiens.GRCh38.99.gtf \
--sjdbOverhang 100
Mus musculus
mkdir -p mmu/ensembl_99
cd mmu/ensembl_99
wget ftp://ftp.ensembl.org/pub/release-99/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
wget ftp://ftp.ensembl.org/pub/release-99/gtf/mus_musculus/Mus_musculus.GRCm38.99.gtf.gz
gunzip Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
gunzip Mus_musculus.GRCm38.99.gtf.gz
conda activate celescope
gtfToGenePred -genePredExt -geneNameAsName2 Mus_musculus.GRCm38.99.gtf /dev/stdout | \
awk '{print $12"\t"$1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10}' > Mus_musculus.GRCm38.99.refFlat
STAR \
--runMode genomeGenerate \
--runThreadN 6 \
--genomeDir ./ \
--genomeFastaFiles Mus_musculus.GRCm38.dna.primary_assembly.fa \
--sjdbGTFfile Mus_musculus.GRCm38.99.gtf \
--sjdbOverhang 100
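The awk step in the commands above reorders the genePredExt output into refFlat layout: column 12 (name2, which holds the gene name because of -geneNameAsName2) moves to the front, followed by the first ten genePred columns. A minimal sketch of that reordering on a single made-up record (the transcript and gene IDs are hypothetical):

```shell
# Same awk program as in the commands above, wrapped in a function.
genepred_to_refflat() {
    awk '{print $12"\t"$1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10}'
}

# One made-up genePredExt record with 15 tab-separated columns:
# name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds
# score name2 cdsStartStat cdsEndStat exonFrames
printf 'TX1\t1\t+\t100\t900\t150\t850\t2\t100,500,\t300,900,\t0\tGENE1\tcmpl\tcmpl\t0,1,\n' |
    genepred_to_refflat
# Prints an 11-column line: the gene name first, then the first ten genePred fields.
```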
Quick start
Single cell RNA-Seq
- Prepare mapfile
Mapfile is a tab-delimited text file (.tsv) containing at least three columns. Each line of the mapfile represents a pair of fastq files (Read 1 and Read 2).
First column: Fastq file prefix. Fastq files must be gzipped.
Second column: Fastq directory.
Third column: Sample name, which is the prefix of all generated files. One sample can have multiple fastq files.
Fourth column: Optional, force cell number (scRNA-Seq) or match_dir (scVDJ).
Sample mapfile:
$ cat ./my.mapfile
R2007197	/SGRNJ/DATA_PROJ/dir1	sample1
R2007199	/SGRNJ/DATA_PROJ/dir2	sample1
R2007198	/SGRNJ/DATA_PROJ/dir1	sample2
$ ls /SGRNJ/DATA_PROJ/dir1
R2007198_L2_2.fq.gz
R2007198_L2_1.fq.gz
R2007197_L2_2.fq.gz
R2007197_L2_1.fq.gz
$ ls /SGRNJ/DATA_PROJ/dir2
R2007199_L2_2.fq.gz
R2007199_L2_1.fq.gz
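Before generating scripts, it may help to sanity-check the mapfile against the rules above. The following is a sketch only: check_mapfile is a hypothetical helper, not a CeleScope command, and it assumes fastq files are named {prefix}*_1.fq.gz and {prefix}*_2.fq.gz, as in the directory listing above.

```shell
# check_mapfile MAPFILE: return 0 if every row is usable, 1 otherwise.
check_mapfile() {
    local map_path="$1" status=0 prefix dir sample rest read_no
    # Split each row on tabs; the "|| [ -n ... ]" keeps a last line without a newline.
    while IFS="$(printf '\t')" read -r prefix dir sample rest || [ -n "$prefix" ]; do
        # Skip blank lines.
        [ -z "$prefix" ] && continue
        if [ -z "$dir" ] || [ -z "$sample" ]; then
            echo "too few columns: $prefix"
            status=1
            continue
        fi
        # Expect at least one gzipped Read 1 and Read 2 file for the prefix.
        for read_no in 1 2; do
            if ! ls "$dir/$prefix"*"_${read_no}.fq.gz" >/dev/null 2>&1; then
                echo "no Read ${read_no} fastq for $prefix in $dir"
                status=1
            fi
        done
    done < "$map_path"
    return "$status"
}

# Usage: check_mapfile ./my.mapfile && echo "mapfile looks fine"
```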
- Run multi_rna to create shell scripts
conda activate celescope
multi_rna \
--mapfile ./my.mapfile \
--genomeDir {some path}/hs/ensembl_99 \
--thread 8 \
--mod shell
--mapfile Required. Path to the mapfile.
--genomeDir Required. Path to the genome directory.
--thread Maximum number of threads to use. Default: 4.
--mod Create "sjm" (Simple Job Manager, https://github.com/StanfordBioinformatics/SJM) or "shell" scripts.
Shell scripts will be created in the ./shell directory, one script per sample. Each shell script contains all the steps that need to be run.
- Run shell scripts under current directory
sh ./shell/{sample}.sh
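With several samples, the per-sample scripts can be run one after another from the current directory. A minimal sketch, assuming the scripts sit in ./shell as described above; run_all_samples is a hypothetical helper name, not a CeleScope command.

```shell
# run_all_samples [DIR]: run every *.sh in DIR (default ./shell), one at a
# time, without changing directory, stopping at the first failure.
run_all_samples() {
    local dir="${1:-./shell}" script
    for script in "$dir"/*.sh; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$script" ] || continue
        echo "running $script"
        sh "$script" || { echo "failed: $script"; return 1; }
    done
}

# Usage: run_all_samples ./shell
```

Running the samples sequentially keeps peak memory at the level of a single STAR run; running them in parallel would multiply the memory needed.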
Single Cell VDJ
Running Single Cell VDJ is almost the same as running Single Cell RNA-Seq, except that the arguments of multi_vdj are somewhat different.
- Prepare mapfile
If you have paired single cell RNA-Seq and VDJ samples, the single cell RNA-Seq output directory produced by CeleScope is called matched_dir. You can optionally write the matched_dir path as the fourth column of the mapfile.
R2007197 /SGRNJ/DATA_PROJ/dir sample1 /SGRNJ/Projects/sample1
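When many samples are paired, the fourth column can be filled in with a short script. This is a sketch under one assumption that may not hold for your layout: each sample's RNA-Seq output lives at {RNA root}/{sample name}, as in the example row above. add_matched_dir is a hypothetical helper, not a CeleScope command.

```shell
# add_matched_dir MAPFILE RNA_ROOT: print MAPFILE with RNA_ROOT/<sample name>
# as the fourth column. Only the first three input columns are kept, and
# rows with fewer than three columns are dropped.
add_matched_dir() {
    awk -v root="$2" 'BEGIN { FS = OFS = "\t" }
        NF >= 3 { print $1, $2, $3, root "/" $3 }' "$1"
}

# Usage: add_matched_dir ./rna.mapfile /SGRNJ/Projects > ./my.mapfile
```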
- Run multi_vdj to create shell scripts
conda activate celescope
multi_vdj \
--mapfile ./my.mapfile \
--type TCR \
--thread 8 \
--mod shell
--type Required. TCR or BCR.
- Run shell scripts under current directory
sh ./shell/{sample}.sh