baseqRNA

Pipeline for Processing RNA-Seq datasets

These details have not been verified by PyPI

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3.6
Topic
- Software Development :: Build Tools

Project description

# DropRNA

## Install baseq_drops
Python 3 and the `baseq_drops` package are required. The package can be installed with:

    pip install baseqdrops

After installation, the `baseq-Drop` command is available.

## Config file

The pipeline needs the following software and resources:

+ `star`: the STAR aligner, for fast alignment of RNA-Seq data;
+ `samtools`: used for sorting BAM files;
+ `whitelistDir`: the directory that holds the barcode whitelist files for indrop and 10X.
  These files can be downloaded from XXX.
+ `cellranger_ref_<genome>`: the key steps of read alignment and gene tagging
  are inspired by and borrowed from the open source cellranger pipeline
  (https://github.com/10XGenomics/cellranger).
  The genome index and transcriptome references can be downloaded
  from https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest.
  In the config file, the cellranger reference directory is named `cellranger_<genome>`.

The settings used when running a command are stored in a file called `config_drops.ini`:

    [Drops]
    samtools = /path/to/samtools
    star = /path/to/STAR
    whitelistDir = /path/to/whitelist_file_directory
    cellranger_hg38 = /path/to/reference/refdata-cellranger-GRCh38-1.2.0/
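
Before launching a run, it can help to check that the config file parses and contains the expected entries. The following is a minimal sketch using Python's standard `configparser` module; it is not part of the package. The function name `load_drops_config` and the list of required keys are assumptions drawn from the example above. Note that `configparser` lowercases option names by default, so `optionxform` is overridden here to preserve `whitelistDir`.

```python
import configparser

# Keys taken from the example config above; treating them as required is an assumption.
REQUIRED_KEYS = ("samtools", "star", "whitelistDir")


def load_drops_config(text, genome="hg38"):
    """Parse the [Drops] section and check that the expected keys are present."""
    parser = configparser.ConfigParser()
    # Keep option names case-sensitive (the default would turn
    # whitelistDir into whitelistdir).
    parser.optionxform = str
    parser.read_string(text)
    if not parser.has_section("Drops"):
        raise KeyError("missing [Drops] section")
    section = dict(parser["Drops"])
    expected = REQUIRED_KEYS + ("cellranger_" + genome,)
    missing = [key for key in expected if key not in section]
    if missing:
        raise KeyError("missing config keys: " + ", ".join(missing))
    return section


EXAMPLE = """
[Drops]
samtools = /path/to/samtools
star = /path/to/STAR
whitelistDir = /path/to/whitelist_file_directory
cellranger_hg38 = /path/to/reference/refdata-cellranger-GRCh38-1.2.0/
"""

config = load_drops_config(EXAMPLE)
print(config["star"])
```

To check a file on disk instead of a string, `parser.read(path)` can replace `parser.read_string(text)`.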

## Process Steps
1. `Extract the Cell Barcode` Count the number of reads for each barcode; this generates barcode_count.<sample>.csv;
2. `Cell Barcode correction and filtering` Correct cell barcodes that have a 1bp mismatch, then filter out barcodes with fewer than the minimum number of reads;
3. `Split the reads of valid Cell Barcodes` The raw paired-end reads are split into 16 single-end files for multiprocessing, according to the 2bp prefix of the barcode; for example, we will get: split.<sample>.<AA|AT|AC|AG...|GG>.fq
4. `Star Alignment` The split fastq files are aligned at the same time; bam files sorted by sequence header (read name) are generated;
5. `Reads tagging` Tag each read's alignment position with the corresponding gene name;
6. `Generating UMI table`
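
Steps 1 to 3 can be sketched in a few lines of Python. This is an illustrative sketch only and not the package's implementation: all function names are hypothetical, the toy barcodes are 4 bases long, and correcting a barcode only when exactly one whitelisted barcode lies within one mismatch is a choice made for this example.

```python
from collections import Counter, defaultdict

BASES = "ACGT"


def count_barcodes(barcodes):
    """Step 1: count how many reads carry each observed barcode."""
    return Counter(barcodes)


def one_mismatch_neighbors(barcode):
    """Yield every sequence that differs from `barcode` at exactly one position."""
    for i, original in enumerate(barcode):
        for base in BASES:
            if base != original:
                yield barcode[:i] + base + barcode[i + 1:]


def correct_and_filter(counts, whitelist, min_reads):
    """Step 2: merge 1-mismatch barcodes into whitelisted ones, then filter."""
    corrected = Counter()
    for barcode, n in counts.items():
        if barcode in whitelist:
            corrected[barcode] += n
            continue
        hits = [c for c in one_mismatch_neighbors(barcode) if c in whitelist]
        # Only correct when the match is unambiguous (an assumption of this sketch).
        if len(hits) == 1:
            corrected[hits[0]] += n
    return {bc: n for bc, n in corrected.items() if n >= min_reads}


def group_by_prefix(barcodes, prefix_len=2):
    """Step 3: bucket barcodes by their leading bases (4**2 = 16 buckets for 2 bp)."""
    groups = defaultdict(list)
    for bc in barcodes:
        groups[bc[:prefix_len]].append(bc)
    return dict(groups)


# Toy data: "AACT" is one mismatch away from the whitelisted "AACG".
observed = ["AACG", "AACG", "AACT", "GGTA", "GGTA", "GGTA", "TTTT"]
whitelist = {"AACG", "GGTA"}

counts = count_barcodes(observed)
valid = correct_and_filter(counts, whitelist, min_reads=3)
print(valid)
print(group_by_prefix(sorted(valid)))
```

Grouping by the 2bp prefix gives at most 16 buckets, which matches the 16 split files described in step 3.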

## Run Command

The main options are:

+ `--config`: config file;
+ `--genome/-g`: genome version;
+ `--protocol`: [10X|indrop|dropseq];
+ `--minreads`: minimum number of reads for a barcode;
+ `--name/-n`: sample name;
+ `--fq1/-1`: read 1;
+ `--fq2/-2`: read 2;
+ `--top_million_reads`: how many million reads to use, mainly for testing the pipeline with a fraction of the reads (default 1000);
+ `--dir/-d`: output path.

If you have configured `cellranger_ref_hg38`, you can run the following:

    baseqdrops run_pipe --config ./config_drops.ini -g hg38 -p 10X --minreads 10000 -n 10X_test -1 10x_1.1.fq.gz -2 10x.2.fq.gz -d ./
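
When many samples need to be processed, the same command can be assembled from a script. The sketch below builds the argument list with the flags exactly as they appear in the command above; the helper name `build_run_pipe_command` is hypothetical, and the code only prints the command rather than running it.

```python
import shlex


def build_run_pipe_command(config, genome, protocol, name, fq1, fq2,
                           outdir="./", minreads=10000):
    """Assemble the argument list for `baseqdrops run_pipe`."""
    return [
        "baseqdrops", "run_pipe",
        "--config", config,
        "-g", genome,
        "-p", protocol,
        "--minreads", str(minreads),
        "-n", name,
        "-1", fq1,
        "-2", fq2,
        "-d", outdir,
    ]


cmd = build_run_pipe_command(
    config="./config_drops.ini", genome="hg38", protocol="10X",
    name="10X_test", fq1="10x_1.1.fq.gz", fq2="10x.2.fq.gz",
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
# To actually launch the pipeline, pass the list to subprocess.run(cmd, check=True).
```

Passing the arguments as a list avoids shell quoting problems with sample names or paths that contain spaces.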

### For results from older 10X versions
The cell barcode length is 15 and the UMI length is 5.

    baseqdrops run_pipe --config ./config_drops.ini -g hg38 -p 10X --minreads 10000 -n 10X_test -1 10x_1.1.fq.gz -2 10x.2.fq.gz -d ./
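
To make the barcode and UMI lengths concrete, here is a small sketch of splitting a read sequence into the two fields. It assumes the cell barcode comes first and is immediately followed by the UMI; that layout is an assumption for illustration and is not stated above. The function name `split_barcode_umi` is hypothetical.

```python
def split_barcode_umi(read_seq, barcode_len=15, umi_len=5):
    """Split a read into (barcode, umi), assuming the barcode precedes the UMI."""
    needed = barcode_len + umi_len
    if len(read_seq) < needed:
        raise ValueError(
            "read has %d bases, need at least %d" % (len(read_seq), needed)
        )
    return read_seq[:barcode_len], read_seq[barcode_len:needed]


# 15 barcode bases followed by 5 UMI bases (made-up sequence).
barcode, umi = split_barcode_umi("ACGTACGTACGTACGTTTGA")
print(barcode, umi)
```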

Release history

Version 1.5, released Jan 17, 2019

Download files

Source Distribution

baseqRNA-1.5.tar.gz (16.7 kB view details)

Uploaded Jan 17, 2019 Source

Built Distribution

baseqRNA-1.5-py2.py3-none-any.whl (24.0 kB view details)

Uploaded Jan 17, 2019 Python 2, Python 3

File details

Details for the file baseqRNA-1.5.tar.gz.

File metadata

Download URL: baseqRNA-1.5.tar.gz
Upload date: Jan 17, 2019
Size: 16.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.18.4 setuptools/38.4.0 requests-toolbelt/0.8.0 tqdm/4.21.0 CPython/3.6.4

File hashes

Hashes for baseqRNA-1.5.tar.gz
Algorithm	Hash digest
SHA256	`152adebd57aa184ada3c9ad033a27324e52fbce384d31ecf007985c68b138f0f`
MD5	`08ab0b486ff2fcd51f9c277f883adab0`
BLAKE2b-256	`fd7cbcaee938ae69df7feb988b631477b6c85cdc20a217b56dae18fdd3efe5af`

File details

Details for the file baseqRNA-1.5-py2.py3-none-any.whl.

File metadata

Download URL: baseqRNA-1.5-py2.py3-none-any.whl
Upload date: Jan 17, 2019
Size: 24.0 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.18.4 setuptools/38.4.0 requests-toolbelt/0.8.0 tqdm/4.21.0 CPython/3.6.4

File hashes

Hashes for baseqRNA-1.5-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`56530794332cabb3527ee98372967f11cfd7ff228f429a3276cf48a01f3a2f0c`
MD5	`f039d9a489a4dd7259bd89b2c7c13322`
BLAKE2b-256	`56da152269c62bcd76786056408cdb51027daf54c4ae1305140eaf9a6f2a4e4a`
