GEXSCOPE Single cell analysis

CeleScope

CeleScope is a collection of bioinformatics analysis pipelines for processing SCOPE single cell data. It can currently analyze:

  • Single Cell RNA-Seq data
  • Single Cell Immune Profiling (VDJ) data

Detailed documentation can be found in the wiki.

Hardware/Software Requirements

  • Minimum 32 GB RAM (to run the STAR aligner)
  • conda
  • git
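
The requirements above can be checked before installation. The following is a minimal sketch, not part of CeleScope: the function names `total_ram_gb` and `check_requirements` are hypothetical, and the RAM lookup assumes a Unix-like system where `os.sysconf` exposes `SC_PHYS_PAGES` and `SC_PAGE_SIZE`. The operating system usually reports slightly less memory than the nominal size, so treat the result as a rough guide.

```python
import os
import shutil

MIN_RAM_GB = 32  # minimum stated in the requirements list


def total_ram_gb():
    """Return total physical memory in GiB, or None if it cannot be determined."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on some platforms, or the names are unsupported
        return None
    return pages * page_size / 1024 ** 3


def check_requirements(ram_gb=None, which=shutil.which):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if ram_gb is None:
        ram_gb = total_ram_gb()
    if ram_gb is None:
        problems.append("could not determine total RAM")
    elif ram_gb < MIN_RAM_GB:
        problems.append(f"only {ram_gb:.1f} GB RAM, need at least {MIN_RAM_GB} GB")
    for tool in ("conda", "git"):
        # shutil.which returns None when the executable is not on PATH
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

The `ram_gb` and `which` parameters exist only so the check can be exercised without touching the real machine.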

Installation

  1. Clone repo
git clone https://gitee.com/singleron-rd/celescope.git
# or
git clone https://github.com/singleron-RD/CeleScope.git
  2. Install conda packages
# The gitee clone creates a directory named celescope; use `cd celescope` in that case
cd CeleScope
conda create -n celescope
conda activate celescope
conda install --file conda_pkgs.txt --channel conda-forge --channel bioconda --channel r --channel imperial-college-research-computing
  3. Install celescope
pip install celescope
# Alternatively, use a PyPI mirror to speed up the download if you are in China
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple celescope
  4. Install the beta version (optional)
# Run from the cloned repository if you want to use the beta version of celescope
python setup.py install

Reference genome

Homo sapiens

mkdir -p hs/ensembl_99
cd hs/ensembl_99

wget ftp://ftp.ensembl.org/pub/release-99/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
wget ftp://ftp.ensembl.org/pub/release-99/gtf/homo_sapiens/Homo_sapiens.GRCh38.99.gtf.gz

gunzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
gunzip Homo_sapiens.GRCh38.99.gtf.gz

conda activate celescope
gtfToGenePred -genePredExt -geneNameAsName2 Homo_sapiens.GRCh38.99.gtf /dev/stdout | \
    awk '{print $12"\t"$1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10}' > Homo_sapiens.GRCh38.99.refFlat

STAR \
    --runMode genomeGenerate \
    --runThreadN 6 \
    --genomeDir ./ \
    --genomeFastaFiles Homo_sapiens.GRCh38.dna.primary_assembly.fa \
    --sjdbGTFfile Homo_sapiens.GRCh38.99.gtf \
    --sjdbOverhang 100

Mus musculus

mkdir -p mmu/ensembl_99
cd mmu/ensembl_99

wget ftp://ftp.ensembl.org/pub/release-99/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
wget ftp://ftp.ensembl.org/pub/release-99/gtf/mus_musculus/Mus_musculus.GRCm38.99.gtf.gz

gunzip Mus_musculus.GRCm38.dna.primary_assembly.fa.gz 
gunzip Mus_musculus.GRCm38.99.gtf.gz

conda activate celescope
gtfToGenePred -genePredExt -geneNameAsName2 Mus_musculus.GRCm38.99.gtf /dev/stdout | \
    awk '{print $12"\t"$1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10}' > Mus_musculus.GRCm38.99.refFlat

STAR \
    --runMode genomeGenerate \
    --runThreadN 6 \
    --genomeDir ./ \
    --genomeFastaFiles Mus_musculus.GRCm38.dna.primary_assembly.fa \
    --sjdbGTFfile Mus_musculus.GRCm38.99.gtf \
    --sjdbOverhang 100
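
In both genome recipes, the `awk` step reorders the genePredExt columns into refFlat layout: column 12 (the gene name, because of `-geneNameAsName2`) moves to the front, followed by columns 1 to 10. The sketch below mirrors that reordering in Python for a single line. The function name `genepred_ext_to_refflat` is hypothetical, the example record is made up, and the code splits on tabs only, whereas `awk` splits on any whitespace; the two agree as long as no field contains spaces.

```python
def genepred_ext_to_refflat(line):
    """Convert one tab-separated genePredExt line into a refFlat line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 columns, got {len(fields)}")
    # Index 11 is column 12 (name2, the gene name with -geneNameAsName2).
    # Indices 0-9 are columns 1-10: name, chrom, strand, txStart, txEnd,
    # cdsStart, cdsEnd, exonCount, exonStarts, exonEnds.
    return "\t".join([fields[11]] + fields[:10])


# Made-up record for illustration only
example = "TX1\t1\t+\t100\t900\t200\t800\t2\t100,500,\t300,900,\t0\tGENE1\tcmpl\tcmpl\t0,1,"
print(genepred_ext_to_refflat(example))
```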

Quick start

Single Cell RNA-Seq

  1. Prepare mapfile

The mapfile is a tab-delimited text file (.tsv) containing at least three columns. Each line of the mapfile represents a pair of fastq files (Read 1 and Read 2).

First column: Fastq file prefix. Fastq files must be gzipped.

Second column: Fastq directory.

Third column: Sample name, which is the prefix of all generated files. One sample can have multiple fastq files.

Fourth column: Optional, force cell number (scRNA-Seq) or match_dir (scVDJ).
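
The column layout described above can be sketched as a small parser that groups rows by sample name. This is an illustration, not CeleScope's own parser: `parse_mapfile` and the returned dictionary layout are hypothetical, and the code accepts any run of whitespace as a separator (an assumption made for leniency), so it would not handle paths that contain spaces.

```python
from collections import defaultdict


def parse_mapfile(text):
    """Group mapfile rows by sample name.

    Returns {sample: {"fastq": [(prefix, directory), ...], "col4": value or None}}.
    """
    samples = defaultdict(lambda: {"fastq": [], "col4": None})
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()  # assumption: any whitespace separates columns
        if len(fields) < 3:
            raise ValueError(
                f"line {lineno}: expected at least 3 columns, got {len(fields)}"
            )
        prefix, directory, sample = fields[:3]
        entry = samples[sample]
        # One sample can have multiple fastq files, so collect every row
        entry["fastq"].append((prefix, directory))
        if len(fields) > 3:
            entry["col4"] = fields[3]  # optional fourth column
    return dict(samples)
```

Because one sample can span several rows, the parser collects every (prefix, directory) pair under the same sample key rather than overwriting earlier rows.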

Sample mapfile:

$ cat ./my.mapfile
R2007197    /SGRNJ/DATA_PROJ/dir1	sample1
R2007199    /SGRNJ/DATA_PROJ/dir2	sample1
R2007198    /SGRNJ/DATA_PROJ/dir1   sample2

$ ls /SGRNJ/DATA_PROJ/dir1
R2007198_L2_2.fq.gz
R2007198_L2_1.fq.gz
R2007197_L2_2.fq.gz
R2007197_L2_1.fq.gz

$ ls /SGRNJ/DATA_PROJ/dir2
R2007199_L2_2.fq.gz
R2007199_L2_1.fq.gz
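
The listings above suggest how a mapfile prefix relates to the files on disk: each prefix matches one Read 1 file and one Read 2 file. A minimal sketch of that lookup follows. The helper `find_fastq_pairs` is hypothetical, and the `_1.fq.gz` / `_2.fq.gz` suffix rule is an assumption inferred from the example file names, not a documented CeleScope matching rule.

```python
from pathlib import Path


def find_fastq_pairs(directory, prefix):
    """Return sorted (read1, read2) path pairs for files starting with `prefix`.

    Assumption: Read 1 ends in `_1.fq.gz` and Read 2 in `_2.fq.gz`.
    """
    suffix1, suffix2 = "_1.fq.gz", "_2.fq.gz"
    pairs = []
    for read1 in sorted(Path(directory).glob(f"{prefix}*{suffix1}")):
        # Derive the Read 2 name by swapping the suffix on the Read 1 name
        read2 = read1.with_name(read1.name[: -len(suffix1)] + suffix2)
        if not read2.is_file():
            raise FileNotFoundError(f"missing Read 2 for {read1}")
        pairs.append((read1, read2))
    if not pairs:
        raise FileNotFoundError(
            f"no fastq files with prefix {prefix!r} in {directory}"
        )
    return pairs
```

Running such a check on every mapfile row before generating scripts catches a mistyped prefix or directory early.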
  2. Run multi_rna to create shell scripts
conda activate celescope
multi_rna \
 --mapfile ./my.mapfile \
 --genomeDir {some path}/hs/ensembl_99 \
 --thread 8 \
 --mod shell

--mapfile Required, mapfile path.

--genomeDir Required, genomeDir directory.

--thread Maximum number of threads to use, default=4.

--mod Create "sjm" (simple job manager, https://github.com/StanfordBioinformatics/SJM) or "shell" scripts.

Shell scripts are created in the ./shell directory, one script per sample. Each script contains all the steps that need to be run.
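
When multi_rna is launched from a Python driver script, the options described above can be assembled as an argument list. The sketch below uses only the four flags documented here. The function name `build_multi_rna_command` and the `/path/to` genome location are placeholders, and the function only builds the command; it does not run it.

```python
import shlex


def build_multi_rna_command(mapfile, genome_dir, thread=4, mod="shell"):
    """Return the multi_rna argument list using the options documented above."""
    if mod not in ("sjm", "shell"):
        raise ValueError(f"mod must be 'sjm' or 'shell', got {mod!r}")
    if thread < 1:
        raise ValueError("thread must be a positive integer")
    return [
        "multi_rna",
        "--mapfile", str(mapfile),
        "--genomeDir", str(genome_dir),
        "--thread", str(thread),  # the documented default is 4
        "--mod", mod,
    ]


# Placeholder paths for illustration only
command = build_multi_rna_command("./my.mapfile", "/path/to/hs/ensembl_99", thread=8)
print(" ".join(shlex.quote(arg) for arg in command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` inside the activated celescope environment.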

  3. Run the shell scripts from the current directory

sh ./shell/{sample}.sh
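
With several samples, the per-sample scripts can be run one after another. The following sketch assumes the generated scripts all sit directly in ./shell and end in `.sh`; the function names are hypothetical, and the scripts run sequentially rather than in parallel.

```python
import subprocess
from pathlib import Path


def list_sample_scripts(shell_dir="./shell"):
    """Return the per-sample scripts in `shell_dir`, sorted by file name."""
    return sorted(Path(shell_dir).glob("*.sh"))


def run_sample_scripts(shell_dir="./shell"):
    """Run each script with `sh`, one at a time, stopping at the first failure."""
    for script in list_sample_scripts(shell_dir):
        # check=True raises CalledProcessError on a non-zero exit status
        subprocess.run(["sh", str(script)], check=True)
```

Call `run_sample_scripts()` from the directory where multi_rna was run, so that the relative ./shell path resolves to the generated scripts.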

Single Cell VDJ

Running Single Cell VDJ is almost the same as running Single Cell RNA-Seq, except that multi_vdj takes somewhat different arguments.

  1. Prepare mapfile

If you have paired Single Cell RNA-Seq and VDJ samples, the output directory produced by running CeleScope on the Single Cell RNA-Seq sample is called matched_dir. You can optionally write the path to matched_dir in the fourth column of the mapfile.

R2007197    /SGRNJ/DATA_PROJ/dir    sample1 /SGRNJ/Projects/sample1
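
A row like the one above can also be generated programmatically, which avoids accidentally mixing spaces and tabs. This is a minimal sketch: `mapfile_row` is a hypothetical helper, and the only validation it performs is rejecting empty fields and fields that contain a tab or newline.

```python
def mapfile_row(prefix, fastq_dir, sample, fourth=None):
    """Return one tab-delimited mapfile line; `fourth` is the optional fourth column."""
    fields = [str(prefix), str(fastq_dir), str(sample)]
    if fourth is not None:
        fields.append(str(fourth))
    for field in fields:
        # A tab or newline inside a field would break the column layout
        if field == "" or "\t" in field or "\n" in field:
            raise ValueError(f"invalid mapfile field: {field!r}")
    return "\t".join(fields)


# Same values as the example row, with the matched directory as the fourth column
print(mapfile_row("R2007197", "/SGRNJ/DATA_PROJ/dir", "sample1", "/SGRNJ/Projects/sample1"))
```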
  2. Run multi_vdj to create shell scripts
conda activate celescope
multi_vdj \
 --mapfile ./my.mapfile \
 --type TCR \
 --thread 8 \
 --mod shell

--type Required. TCR or BCR.

  3. Run the shell scripts from the current directory

sh ./shell/{sample}.sh

Version: 1.1.9
