Pipeline for Processing RNA-Seq datasets

# DropRNA

## Install baseqdrops
The pipeline requires Python 3 and the `baseqdrops` package, which can be installed with:

    pip install baseqdrops

After installation, the `baseq-Drop` command is available.

## Config file

The pipeline needs the following software and resources:

+ `star`: the STAR aligner, for fast alignment of RNA-Seq data;
+ `samtools`: used for sorting BAM files;
+ `whitelistDir`: the directory holding the barcode whitelist files for indrop and 10X.
  These files can be downloaded from XXX.
+ `cellranger_<genome>`: the directory of cellranger references for a given genome (for example `cellranger_hg38`).
  The key steps of read alignment and gene tagging are inspired by and borrowed from the open source cellranger pipeline
  (https://github.com/10XGenomics/cellranger).
  The genome index and transcriptome references can be downloaded
  from https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest.

These settings are stored in a configuration file, `config_drops.ini`, which is passed to the command at run time:

    [Drops]
    samtools = /path/to/samtools
    star = /path/to/STAR
    whitelistDir = /path/to/whitelist_file_directory
    cellranger_hg38 = /path/to/reference/refdata-cellranger-GRCh38-1.2.0/
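Before a long run it can help to check that the config file contains every entry the pipeline expects. The following is a minimal sketch using Python's standard `configparser`; the helper name `load_drops_config` and the list of required keys are assumptions made for illustration, not part of the package.

```python
import configparser

# Keys listed in the "Config file" section; the reference key depends on the genome.
REQUIRED_KEYS = ("samtools", "star", "whitelistDir")


def load_drops_config(path, genome):
    """Read the [Drops] section and return the settings needed for `genome`.

    Hypothetical helper: it only checks that the entries exist, not that
    the paths they point to are valid.
    """
    # Disable interpolation so that a '%' in a path is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    # Keep key case as written (the default would lowercase 'whitelistDir').
    parser.optionxform = str
    if not parser.read(path):
        raise FileNotFoundError(f"cannot read config file: {path}")
    if "Drops" not in parser:
        raise KeyError("missing [Drops] section")
    section = parser["Drops"]
    wanted = REQUIRED_KEYS + (f"cellranger_{genome}",)
    missing = [key for key in wanted if key not in section]
    if missing:
        raise KeyError(f"missing config entries: {', '.join(missing)}")
    return {key: section[key] for key in wanted}
```

Calling `load_drops_config("./config_drops.ini", "hg38")` on the file above would return a dictionary with the four paths, and would raise `KeyError` for a genome that has no `cellranger_<genome>` entry.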

## Process Steps
1. `Extract the Cell Barcode` Count the occurrences of each barcode; this generates `barcode_count.<sample>.csv`;
2. `Cell Barcode correction and filtering` Correct cell barcodes that have a 1bp mismatch, then discard barcodes with fewer than the minimum number of reads;
3. `Split the reads of valid Cell Barcodes` The raw paired-end reads are split into 16 single-end files, one per 2bp barcode prefix, so that they can be processed in parallel; for example: `split.<sample>.<AA|AT|AC|AG...|GG>.fq`
4. `Star Alignment` The split FASTQ files are aligned at the same time; a BAM file sorted by sequence header is generated;
5. `Reads tagging` Tag each read's alignment position with the corresponding gene name;
6. `Generating UMI table`
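Steps 1 to 3 can be illustrated with a small, self-contained sketch. This is not the package's implementation: the function names are hypothetical, and the choice to drop a barcode that matches more than one whitelist entry is an assumption made here to keep the correction unambiguous.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def count_barcodes(barcodes):
    """Step 1: count how many reads carry each barcode."""
    return Counter(barcodes)


def one_mismatch_neighbours(barcode):
    """Yield every sequence that differs from `barcode` at exactly one position."""
    for i, original in enumerate(barcode):
        for base in BASES:
            if base != original:
                yield barcode[:i] + base + barcode[i + 1:]


def correct_and_filter(counts, whitelist, min_reads):
    """Step 2: merge 1bp-mismatch barcodes into whitelisted ones, then filter."""
    corrected = Counter()
    for barcode, n in counts.items():
        if barcode in whitelist:
            corrected[barcode] += n
            continue
        hits = [nb for nb in one_mismatch_neighbours(barcode) if nb in whitelist]
        # Assumption: only correct when exactly one whitelist entry matches.
        if len(hits) == 1:
            corrected[hits[0]] += n
    return {bc: n for bc, n in corrected.items() if n >= min_reads}


def split_prefixes():
    """Step 3: the 16 two-base prefixes, one per split file."""
    return ["".join(pair) for pair in product(BASES, repeat=2)]


def bucket_for(barcode):
    """Step 3: the prefix (and therefore the split file) a barcode belongs to."""
    return barcode[:2]
```

Because there are four bases, a 2bp prefix gives 4 × 4 = 16 buckets, which matches the 16 split files described in step 3.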

## Run Command

The main options are:

+ `--config`: config file;
+ `--genome/-g`: genome version;
+ `--protocol/-p`: one of [10X|indrop|dropseq];
+ `--minreads`: minimum number of reads for a barcode to be kept;
+ `--name/-n`: sample name;
+ `--fq1/-1`: read 1 FASTQ file;
+ `--fq2/-2`: read 2 FASTQ file;
+ `--top_million_reads`: how many million reads to use, mainly for testing the pipeline on a fraction of the reads (default 1000);
+ `--dir/-d`: output path.

If `cellranger_hg38` is set in the config file, you can run:

    baseqdrops run_pipe --config ./config_drops.ini -g hg38 -p 10X --minreads 10000 -n 10X_test -1 10x_1.1.fq.gz -2 10x.2.fq.gz -d ./
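When several samples need to be processed, the same command line can be assembled from Python. The sketch below only builds and prints the argument list using the flags shown in the command above; the helper name `build_run_pipe_command` is hypothetical, and the demo values simply repeat those from the example.

```python
import shlex


def build_run_pipe_command(config, genome, protocol, minreads, name, fq1, fq2, outdir):
    """Assemble the `baseqdrops run_pipe` argument list from the documented options."""
    return [
        "baseqdrops", "run_pipe",
        "--config", config,
        "-g", genome,
        "-p", protocol,
        "--minreads", str(minreads),
        "-n", name,
        "-1", fq1,
        "-2", fq2,
        "-d", outdir,
    ]


if __name__ == "__main__":
    cmd = build_run_pipe_command(
        "./config_drops.ini", "hg38", "10X", 10000,
        "10X_test", "10x_1.1.fq.gz", "10x.2.fq.gz", "./",
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
```

To actually launch the pipeline, the list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with unusual file names.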

### For older versions of 10X data
For older 10X data, the cell barcode is 15bp long and the UMI is 5bp long.

    baseqdrops run_pipe --config ./config_drops.ini -g hg38 -p 10X --minreads 10000 -n 10X_test -1 10x_1.1.fq.gz -2 10x.2.fq.gz -d ./
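To make the two lengths concrete, here is a minimal sketch of cutting a barcode and a UMI out of a read sequence. It assumes the 15bp cell barcode comes first and is immediately followed by the 5bp UMI in the same read. That layout is an assumption for illustration only and is not confirmed by this document; the function name is hypothetical.

```python
CELL_BARCODE_LEN = 15  # stated above for older 10X data
UMI_LEN = 5            # stated above for older 10X data


def split_barcode_umi(read_seq, barcode_len=CELL_BARCODE_LEN, umi_len=UMI_LEN):
    """Return (cell_barcode, umi), or None if the read is too short.

    Assumed layout (not confirmed by the source): barcode first, UMI next.
    """
    needed = barcode_len + umi_len
    if len(read_seq) < needed:
        return None
    return read_seq[:barcode_len], read_seq[barcode_len:needed]
```

Reads shorter than 20bp return `None` so that the caller can skip them instead of producing a truncated barcode or UMI.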


