GEXSCOPE Single cell analysis

CeleScope

GEXSCOPE Single Cell Analysis Pipelines
Chinese docs (中文文档): https://gitee.com/singleron-rd/celescope/wikis/

Requirements

  • conda
  • git
  • minimum 32 GB RAM (to run the STAR aligner)
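
The requirements above can be checked before installing. The following is a minimal sketch, not part of CeleScope: `have_tool` and `enough_ram_kb` are hypothetical helper names, and the memory check assumes a Linux system where `/proc/meminfo` reports `MemTotal` in kB.

```shell
#!/usr/bin/env bash
# Sketch: check the requirements listed above before installing.

# Succeed if the named command is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Succeed if a memory size given in kB is at least 32 GB (32 * 1024 * 1024 kB).
enough_ram_kb() {
    [ "$1" -ge $((32 * 1024 * 1024)) ]
}

for tool in conda git; do
    if have_tool "$tool"; then
        echo "found: $tool"
    else
        echo "missing: $tool"
    fi
done

# Assumption: Linux, where /proc/meminfo reports MemTotal in kB.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    if enough_ram_kb "$total_kb"; then
        echo "RAM: OK (${total_kb} kB)"
    else
        echo "RAM: below 32 GB (${total_kb} kB)"
    fi
fi
```

`MemTotal` excludes memory reserved by the kernel and firmware, so a machine with 32 GB installed usually reports slightly less. Treat a near miss as a hint to verify, not as a hard failure.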

Installation

  1. Clone repo
git clone https://github.com/singleron-RD/CeleScope.git
# If GitHub is blocked, clone from Gitee instead
git clone https://gitee.com/singleron-rd/celescope.git
  2. Install conda packages
cd CeleScope
conda create -n celescope
conda activate celescope
conda install --file conda_pkgs.txt --channel conda-forge --channel bioconda --channel r --channel imperial-college-research-computing
  3. Install celescope
pip install celescope
# Alternatively, use a PyPI mirror to speed up the download if you are in China
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple celescope
  4. Install Beta version (optional)
# To use the Beta version of celescope, install from the cloned repository
python setup.py install

Reference genome

Homo sapiens

mkdir -p hs/ensembl_99
cd hs/ensembl_99

wget ftp://ftp.ensembl.org/pub/release-99/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
wget ftp://ftp.ensembl.org/pub/release-99/gtf/homo_sapiens/Homo_sapiens.GRCh38.99.gtf.gz

gunzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
gunzip Homo_sapiens.GRCh38.99.gtf.gz

conda activate celescope
gtfToGenePred -genePredExt -geneNameAsName2 Homo_sapiens.GRCh38.99.gtf /dev/stdout | \
    awk '{print $12"\t"$1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10}' > Homo_sapiens.GRCh38.99.refFlat

STAR \
    --runMode genomeGenerate \
    --runThreadN 6 \
    --genomeDir ./ \
    --genomeFastaFiles Homo_sapiens.GRCh38.dna.primary_assembly.fa \
    --sjdbGTFfile Homo_sapiens.GRCh38.99.gtf \
    --sjdbOverhang 100
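
The `awk` step in the commands above reorders the genePredExt output into the refFlat column layout: field 12 (the gene name, because of `-geneNameAsName2`) comes first, followed by fields 1 to 10. A small sketch of that reordering on a single made-up record; the record values and the `to_refflat` wrapper are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: apply the same awk reordering to one made-up genePredExt record.
# The 15 tab-separated fields below are illustrative values, not real annotation.

to_refflat() {
    awk '{print $12"\t"$1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10}'
}

record=$(printf 'TX1\t1\t+\t100\t900\t150\t850\t2\t100,600,\t300,900,\t0\tGENE1\tcmpl\tcmpl\t0,1,')

# Prints 11 columns: gene name, transcript name, chrom, strand, txStart, txEnd,
# cdsStart, cdsEnd, exonCount, exonStarts, exonEnds.
printf '%s\n' "$record" | to_refflat
```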

Mus musculus

mkdir -p mmu/ensembl_99
cd mmu/ensembl_99

wget ftp://ftp.ensembl.org/pub/release-99/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
wget ftp://ftp.ensembl.org/pub/release-99/gtf/mus_musculus/Mus_musculus.GRCm38.99.gtf.gz

gunzip Mus_musculus.GRCm38.dna.primary_assembly.fa.gz 
gunzip Mus_musculus.GRCm38.99.gtf.gz

conda activate celescope
gtfToGenePred -genePredExt -geneNameAsName2 Mus_musculus.GRCm38.99.gtf /dev/stdout | \
    awk '{print $12"\t"$1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10}' > Mus_musculus.GRCm38.99.refFlat

STAR \
    --runMode genomeGenerate \
    --runThreadN 6 \
    --genomeDir ./ \
    --genomeFastaFiles Mus_musculus.GRCm38.dna.primary_assembly.fa \
    --sjdbGTFfile Mus_musculus.GRCm38.99.gtf \
    --sjdbOverhang 100
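
After `STAR --runMode genomeGenerate` finishes, it can be worth confirming that the index files were actually written before starting an analysis. A minimal sketch, assuming STAR writes `Genome`, `SA`, `SAindex` and `genomeParameters.txt` into the genome directory (check the file list against your STAR version); `missing_files` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: report which expected files are missing from a genome directory.
# Assumption: STAR genomeGenerate writes Genome, SA, SAindex and
# genomeParameters.txt into --genomeDir.

# Print the name of every listed file that does not exist in the directory.
missing_files() {
    dir=$1
    shift
    for f in "$@"; do
        [ -e "$dir/$f" ] || echo "$f"
    done
}

# Genome directory to check; defaults to the current directory.
genome_dir=${1:-.}
missing=$(missing_files "$genome_dir" Genome SA SAindex genomeParameters.txt)
if [ -z "$missing" ]; then
    echo "STAR index files present in $genome_dir"
else
    echo "missing from $genome_dir:"
    echo "$missing"
fi
```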

Usage

Single cell RNA-Seq

  1. Prepare mapfile

The mapfile is a tab-delimited text file (.tsv) with at least three columns. Each line of the mapfile represents a pair of fastq files (Read 1 and Read 2).

First column: Fastq file prefix. Fastq files must be gzipped.

Second column: Fastq directory.

Third column: Sample name, which is used as the prefix of all generated files. One sample can have multiple fastq files.

Fourth column: Optional. Forces the cell number (scRNA-Seq) or gives the match_dir (scVDJ).

Sample mapfile:

$ cat ./my.mapfile
R2007197	/SGRNJ/DATA_PROJ/dir1	sample1
R2007199	/SGRNJ/DATA_PROJ/dir2	sample1
R2007198	/SGRNJ/DATA_PROJ/dir1	sample2

$ ls /SGRNJ/DATA_PROJ/dir1
R2007198_L2_2.fq.gz
R2007198_L2_1.fq.gz
R2007197_L2_2.fq.gz
R2007197_L2_1.fq.gz

$ ls /SGRNJ/DATA_PROJ/dir2
R2007199_L2_2.fq.gz
R2007199_L2_1.fq.gz
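
Because the mapfile must be tab-delimited with at least three columns, lines separated by spaces are an easy mistake to make. The sketch below checks only that rule; `check_mapfile` is a hypothetical helper, not a CeleScope command, and the demo file is a throwaway.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a mapfile against the column rule described above.
# check_mapfile is a hypothetical helper, not part of CeleScope.

# Report every non-blank line with fewer than 3 tab-separated columns;
# the exit status is non-zero if any such line was found.
check_mapfile() {
    awk -F'\t' '
        /^[ \t]*$/ { next }
        NF < 3 {
            printf "line %d: expected at least 3 tab-separated columns, found %d\n", NR, NF
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Demo on a throwaway file: the second line uses spaces instead of tabs.
demo=$(mktemp)
printf 'R2007197\t/SGRNJ/DATA_PROJ/dir1\tsample1\n' > "$demo"
printf 'R2007199    /SGRNJ/DATA_PROJ/dir2    sample1\n' >> "$demo"
check_mapfile "$demo" || echo "mapfile needs fixing"
rm -f "$demo"
```
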
  2. Run multi_rna to create shell scripts
conda activate celescope
multi_rna \
 --mapfile ./my.mapfile \
 --genomeDir {some path}/hs/ensembl_99 \
 --thread 8 \
 --mod shell

--mapfile Required, mapfile path.

--genomeDir Required, path to the genome directory.

--thread Maximum number of threads to use, default=4.

--mod Create "sjm" (simple job manager, https://github.com/StanfordBioinformatics/SJM) or "shell" scripts.

Shell scripts are created in the ./shell directory, one script per sample. Each script contains all the steps that need to be run.

  3. Run the shell scripts from the current directory

sh ./shell/{sample}.sh
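
With several samples, the per-sample scripts can be run one after another. A minimal sketch, assuming the scripts live in `./shell` as described above; `run_all` and the per-sample log file names are illustrative choices, not CeleScope features.

```shell
#!/usr/bin/env bash
# Sketch: run every generated script in turn, one log file per sample.
# run_all and the log naming are illustrative choices, not CeleScope features.

run_all() {
    script_dir=$1
    status=0
    for script in "$script_dir"/*.sh; do
        [ -e "$script" ] || continue          # skip when no scripts are found
        sample=$(basename "$script" .sh)
        echo "running $sample"
        # Logs are written to the current directory, next to the results.
        if ! sh "$script" > "$sample.log" 2>&1; then
            echo "failed: $sample (see $sample.log)"
            status=1
        fi
    done
    return $status
}

# Usage, from the directory where multi_rna was run:
if [ -d ./shell ]; then
    run_all ./shell
fi
```

The loop continues after a failed sample so that one bad sample does not block the others; the function's return status is non-zero if any script failed.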

Single Cell VDJ

Running single cell VDJ is almost the same as running single cell RNA-Seq, except that the arguments of multi_vdj are somewhat different.

  1. Prepare mapfile

If you have paired single cell RNA-Seq and VDJ samples, the output directory produced by running CeleScope on the single cell RNA-Seq sample is called matched_dir. You can optionally write the path of matched_dir as the fourth column of the mapfile.

R2007197	/SGRNJ/DATA_PROJ/dir	sample1	/SGRNJ/Projects/sample1
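
If a three-column mapfile already exists, the fourth column can be appended with `awk`. This is only a sketch under a loud assumption: that each RNA-Seq result directory is named `<rna_root>/<sample name>`, as in the example line above. `add_matched_dir` is a hypothetical helper; adjust the path rule to your own project layout.

```shell
#!/usr/bin/env bash
# Sketch: append a fourth column to a three-column mapfile.
# Assumption (hypothetical layout): each RNA-Seq result directory is
# <rna_root>/<sample name>, as in the example line above.

# $1 = mapfile path, $2 = root directory of the RNA-Seq results.
add_matched_dir() {
    awk -F'\t' -v OFS='\t' -v root="$2" 'NF >= 3 { print $1, $2, $3, root "/" $3 }' "$1"
}

# Demo on a throwaway mapfile.
demo=$(mktemp)
printf 'R2007197\t/SGRNJ/DATA_PROJ/dir\tsample1\n' > "$demo"
add_matched_dir "$demo" /SGRNJ/Projects
rm -f "$demo"
```
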
  2. Run multi_vdj to create shell scripts
conda activate celescope
multi_vdj \
 --mapfile ./my.mapfile \
 --type TCR \
 --thread 8 \
 --mod shell

--type Required. TCR or BCR.

  3. Run the shell scripts from the current directory
