GEXSCOPE Single cell analysis
CeleScope
GEXSCOPE Single Cell Analysis Tool Kit
Chinese documentation (中文文档)
Requirements
- conda
- git
- at least 32 GB RAM (to run the STAR aligner)
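The requirements above can be checked before installing. The following is a minimal sketch, not part of CeleScope: the helper names `missing_tools` and `total_ram_gb` are hypothetical, and the memory query relies on `os.sysconf`, which is only available on Unix-like systems.

```python
import os
import shutil

REQUIRED_TOOLS = ("conda", "git")  # taken from the requirements list


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def total_ram_gb():
    """Return total physical memory in GiB, or None if it cannot be queried."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None  # e.g. on Windows, where os.sysconf does not exist
    return page_size * page_count / 1024 ** 3


if __name__ == "__main__":
    print("missing tools:", missing_tools() or "none")
    ram = total_ram_gb()
    # A machine sold as "32 GB" usually reports slightly less than 32 GiB,
    # so the value is printed for inspection rather than compared strictly.
    print("total RAM (GiB):", "unknown" if ram is None else round(ram, 1))
```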
Installation
- clone the repository
git clone https://github.com/zhouyiqi91/CeleScope.git
- add channels to ~/.condarc
channels:
- conda-forge
- bioconda
- r
- defaults
- imperial-college-research-computing
- install conda packages
cd CeleScope
conda create -n celescope
conda activate celescope
conda install --file conda_pkgs.txt
- install celescope
pip install celescope
# if you are in China, you can use a PyPI mirror to speed up the download
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple celescope
- install the beta version (optional)
# to use the beta version of celescope, install it from the cloned repository
python setup.py install
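After either installation route, it can be useful to confirm which version ended up in the active environment. A minimal sketch using the standard library (Python 3.8 or later); the helper name `installed_version` is hypothetical, and the only assumption about CeleScope is the distribution name `celescope` used in the `pip install` command above.

```python
from importlib import metadata


def installed_version(package="celescope"):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print("celescope is not installed" if version is None else f"celescope {version}")
```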
Reference genome
Homo sapiens
wget ftp://ftp.ensembl.org/pub/release-99/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
wget ftp://ftp.ensembl.org/pub/release-99/gtf/homo_sapiens/Homo_sapiens.GRCh38.99.gtf.gz
mkdir -p references/Homo_sapiens/Ensembl/GRCh38
gzip -c -d Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz > references/Homo_sapiens/Ensembl/GRCh38/Homo_sapiens.GRCh38.fa
gzip -c -d Homo_sapiens.GRCh38.99.gtf.gz > references/Homo_sapiens/Ensembl/GRCh38/Homo_sapiens.GRCh38.99.gtf
conda activate celescope
gtfToGenePred -genePredExt -geneNameAsName2 references/Homo_sapiens/Ensembl/GRCh38/Homo_sapiens.GRCh38.99.gtf /dev/stdout | \
awk '{print $12"\t"$1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10}' > references/Homo_sapiens/Ensembl/GRCh38/Homo_sapiens.GRCh38.99.refFlat
STAR \
--runMode genomeGenerate \
--runThreadN 6 \
--genomeDir references/Homo_sapiens/Ensembl/GRCh38 \
--genomeFastaFiles references/Homo_sapiens/Ensembl/GRCh38/Homo_sapiens.GRCh38.fa \
--sjdbGTFfile references/Homo_sapiens/Ensembl/GRCh38/Homo_sapiens.GRCh38.99.gtf \
--sjdbOverhang 100
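The `awk` step above turns the `genePredExt` output into refFlat by moving column 12 (the gene name, filled in by `-geneNameAsName2`) to the front and keeping columns 1 to 10. The sketch below shows the same reordering on a single line. The function name and the sample record are made up for illustration, and unlike `awk` (which splits on any whitespace) it splits on tabs only.

```python
def genepred_ext_to_refflat(line):
    """Reorder one genePredExt line into refFlat column order.

    genePredExt: name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd,
                 exonCount, exonStarts, exonEnds, score, name2, ...
    refFlat:     geneName, name, chrom, strand, txStart, txEnd, cdsStart,
                 cdsEnd, exonCount, exonStarts, exonEnds
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 columns, got {len(fields)}")
    # awk's $12 is fields[11]; $1..$10 are fields[0:10]
    return "\t".join([fields[11]] + fields[:10])


# Illustrative record, not taken from a real annotation
example = "\t".join([
    "TX1", "1", "+", "100", "900", "200", "800", "2",
    "100,500,", "300,900,", "0", "GENE1", "cmpl", "cmpl", "0,1,",
])
print(genepred_ext_to_refflat(example))
```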
Mus musculus
wget ftp://ftp.ensembl.org/pub/release-99/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
wget ftp://ftp.ensembl.org/pub/release-99/gtf/mus_musculus/Mus_musculus.GRCm38.99.gtf.gz
mkdir -p references/Mus_musculus/Ensembl/GRCm38
gzip -c -d Mus_musculus.GRCm38.dna.primary_assembly.fa.gz > references/Mus_musculus/Ensembl/GRCm38/Mus_musculus.GRCm38.fa
gzip -c -d Mus_musculus.GRCm38.99.gtf.gz > references/Mus_musculus/Ensembl/GRCm38/Mus_musculus.GRCm38.99.gtf
conda activate celescope
gtfToGenePred -genePredExt -geneNameAsName2 references/Mus_musculus/Ensembl/GRCm38/Mus_musculus.GRCm38.99.gtf /dev/stdout | \
awk '{print $12"\t"$1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10}' > references/Mus_musculus/Ensembl/GRCm38/Mus_musculus.GRCm38.99.refFlat
STAR \
--runMode genomeGenerate \
--runThreadN 6 \
--genomeDir references/Mus_musculus/Ensembl/GRCm38 \
--genomeFastaFiles references/Mus_musculus/Ensembl/GRCm38/Mus_musculus.GRCm38.fa \
--sjdbGTFfile references/Mus_musculus/Ensembl/GRCm38/Mus_musculus.GRCm38.99.gtf \
--sjdbOverhang 100
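The human and mouse recipes differ only in species name and assembly, so the URLs and output paths can be derived from three values. A minimal sketch follows; the function name `ensembl_reference` and the dictionary keys are hypothetical. It only reproduces the naming pattern used in the two recipes above; other species or releases may not follow it (for example, not every species provides a `primary_assembly` FASTA).

```python
def ensembl_reference(species, assembly, release):
    """Derive download URLs and local paths following the recipes above."""
    base = f"ftp://ftp.ensembl.org/pub/release-{release}"
    lower = species.lower()
    genome_dir = f"references/{species}/Ensembl/{assembly}"
    prefix = f"{genome_dir}/{species}.{assembly}"
    return {
        "fasta_url": f"{base}/fasta/{lower}/dna/{species}.{assembly}.dna.primary_assembly.fa.gz",
        "gtf_url": f"{base}/gtf/{lower}/{species}.{assembly}.{release}.gtf.gz",
        "genome_dir": genome_dir,
        "fasta": f"{prefix}.fa",
        "gtf": f"{prefix}.{release}.gtf",
        "refflat": f"{prefix}.{release}.refFlat",
    }


for name, build in (("Homo_sapiens", "GRCh38"), ("Mus_musculus", "GRCm38")):
    ref = ensembl_reference(name, build, 99)
    print(ref["fasta_url"])
    print(ref["genome_dir"])
```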
Usage
Single cell RNA-Seq
conda activate celescope
celescope rna run \
--fq1 ./data/R2005212_L1_1.fq.gz \
--fq2 ./data/R2005212_L1_2.fq.gz \
--chemistry auto \
--genomeDir /SGR/references/Homo_sapiens/Ensembl/GRCh38 \
--sample R2005212 \
--thread 4
--fq1
Required. gzipped FASTQ read 1 file path
--fq2
Required. gzipped FASTQ read 2 file path
--chemistry
Required. Use auto (the default) to detect the chemistry automatically
--genomeDir
Required. reference genome directory path
--sample
Required. sample name
--thread
Required. number of threads
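When the pipeline is launched from a script, the same call can be assembled as an argument list. This is a sketch only: it uses just the flags documented above, and the wrapper name `rna_run_command` is hypothetical rather than part of CeleScope.

```python
import shlex


def rna_run_command(fq1, fq2, genome_dir, sample, thread=4, chemistry="auto"):
    """Build the argument list for `celescope rna run` from the options above."""
    return [
        "celescope", "rna", "run",
        "--fq1", str(fq1),
        "--fq2", str(fq2),
        "--chemistry", chemistry,
        "--genomeDir", str(genome_dir),
        "--sample", sample,
        "--thread", str(thread),
    ]


cmd = rna_run_command(
    "./data/R2005212_L1_1.fq.gz",
    "./data/R2005212_L1_2.fq.gz",
    "/SGR/references/Homo_sapiens/Ensembl/GRCh38",
    "R2005212",
)
# Print a copy-pasteable shell command; to execute it from Python instead,
# pass the list to subprocess.run(cmd, check=True) inside the celescope env.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing a list avoids shell quoting and line-continuation issues, since each option and value is a separate argument.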
Single Cell VDJ
conda activate celescope
celescope vdj run \
--fq1 {vdj fq1.gz} \
--fq2 {vdj fq2.gz} \
--sample {sample name} \
--chemistry auto \
--thread 4 \
--type {TCR or BCR} \
--match_dir {match_dir}
--type
Required. TCR or BCR
--match_dir
Optional. Output directory of the matched scRNA-Seq sample, produced by running CeleScope
Single Cell Multiplexing
conda activate celescope
celescope smk run \
--fq1 {smk fq1.gz} \
--fq2 {smk fq2.gz} \
--sample {sample name} \
--chemistry auto \
--SMK_pattern L25C45 \
--SMK_barcode {SMK barcode fasta} \
--SMK_linker {SMK linker fasta} \
--match_dir {match_dir} \
--dim 2 \
--combine_cluster {combine_cluster.tsv}
--SMK_pattern
Required. L25C45 means a 25 bp linker followed by a 45 bp cell barcode
Abbreviations:
C: cell barcode
U: UMI
T: polyT
L: linker
--SMK_barcode
Required. SMK tag fasta file
--SMK_linker
Required. SMK linker fasta file
--match_dir
Required. Output directory of the matched scRNA-Seq sample, produced by running CeleScope
--dim
Required. SMK dimension
--combine_cluster
Optional. TSV file for combining clusters.
first column: original cluster number
second column: combined cluster number
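The `--SMK_pattern` notation described above can be read as a sequence of (letter, length) pairs laid out left to right along the read. The sketch below turns a pattern into 0-based, half-open intervals. It illustrates the notation as described here and is not CeleScope's own parser; `parse_pattern` is a hypothetical name.

```python
import re

# Letters from the abbreviation list above
SEGMENT_NAMES = {"C": "cell barcode", "U": "UMI", "T": "polyT", "L": "linker"}


def parse_pattern(pattern):
    """Split a pattern such as 'L25C45' into (letter, start, end) tuples."""
    tokens = re.findall(r"([CUTL])(\d+)", pattern)
    # Reject anything that is not fully covered by letter+length tokens
    if not tokens or "".join(code + size for code, size in tokens) != pattern:
        raise ValueError(f"invalid pattern: {pattern!r}")
    segments = []
    start = 0
    for code, size in tokens:
        end = start + int(size)
        segments.append((code, start, end))
        start = end
    return segments


for code, start, end in parse_pattern("L25C45"):
    print(f"{SEGMENT_NAMES[code]}: bases {start}..{end}")
```

With these intervals, a read sequence can be sliced as `seq[start:end]` to extract each segment.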
$ cat SMK_barcode.fasta
>SMK1
ATTCAAGGGCAGCCGCGTCACGATTGGATACGACTGTTGGACCGG
>SMK2
TGGATGGGATAAGTGCGTGATGGACCGAAGGGACCTCGTGGCCGG
>SMK3
CGGCTCGTGCTGCGTCGTCTCAAGTCCAGAAACTCCGTGTATCCT
>SMK4
ATTGGGAGGCTTTCGTACCGCTGCCGCCACCAGGTGATACCCGCT
>SMK5
CTCCCTGGTGTTCAATACCCGATGTGGTGGGCAGAATGTGGCTGG
>SMK6
TTACCCGCAGGAAGACGTATACCCCTCGTGCCAGGCGACCAATGC
$ cat SMK_linker.fasta
>smk_linker
GTTGTCAAGATGCTACCGTTCAGAG
$ cat combine_cluster.tsv
1 1
2 2
3 2
4 2
5 3
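The two-column cluster file above maps each original cluster to a combined cluster. A minimal sketch of how such a file could be read and inverted; the function names are hypothetical, and splitting on any whitespace is an assumption made so that both tab- and space-separated files are accepted.

```python
def read_combine_cluster(lines):
    """Map original cluster number -> combined cluster number."""
    mapping = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 2:
            raise ValueError(f"expected two columns, got: {line!r}")
        mapping[int(parts[0])] = int(parts[1])
    return mapping


def members_by_combined(mapping):
    """Invert the mapping: combined cluster -> sorted original clusters."""
    groups = {}
    for original, combined in mapping.items():
        groups.setdefault(combined, []).append(original)
    return {combined: sorted(members) for combined, members in groups.items()}


example_lines = ["1\t1", "2\t2", "3\t2", "4\t2", "5\t3"]
mapping = read_combine_cluster(example_lines)
print(members_by_combined(mapping))
```

For a file on disk, the same function can be called as `read_combine_cluster(open("combine_cluster.tsv"))`.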