pyrpipe
pyrpipe: python rna-seq pipeliner
Introduction
pyrpipe (pronounced "pyre-pipe") is a Python package for developing bioinformatics or other computational pipelines in pure Python. It provides an easy-to-use framework for importing any UNIX command into Python and comes with specialized classes and functions for coding RNA-Seq processing workflows. Pipelines in pyrpipe can be created and extended by integrating third-party tools, executable scripts, or Python libraries in an object-oriented manner.
Preprint is available here
Read the docs here
NOTE: Due to changes in the API design, pyrpipe versions 0.0.5 and above are not compatible with earlier versions. All the tutorials and documentation have been updated to reflect v0.0.5.
What it does
Allows fast and easy development of bioinformatics pipelines in Python by providing
- a high-level API to popular RNA-Seq processing tools for downloading, trimming, alignment, quantification and assembly
- optimization of program parameters based on the data
- a general framework to execute any Linux command from Python
- comprehensive logging of all commands, their output and their return status
- report generation for easy sharing, reproducing, benchmarking and debugging
Key Features (version 0.0.5)
- Import any UNIX executable command/tool in python
- Dry-run feature to check dependencies and commands before execution
- Flexible and robust handling of options and arguments (both Linux and Java style options)
- Auto load command options from .yaml files
- Easily override threads and memory options using global values
- Extensive logging for all the commands
- Automatically verify the integrity of output targets
- Resume feature to restart pipelines/jobs from where interrupted
- Create reports, MultiQC reports for bioinformatic pipelines
- Easily integrated into workflow managers like Snakemake and NextFlow (to schedule jobs, scale jobs, identify parallel steps in pipelines)
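The option handling and global override features listed above can be pictured with a small standalone sketch. It does not use pyrpipe and is not how pyrpipe implements these features: `build_args` and `with_overrides` are hypothetical helper names, and the option dictionaries are made-up examples. In practice such a dictionary could equally be loaded from a .yaml file.

```python
def build_args(options, style="linux"):
    """Flatten an options dict into a list of command-line tokens.

    style="linux": {"--threads": 8, "--quiet": True} -> ["--threads", "8", "--quiet"]
    style="java":  {"in": "a.fq", "out": "b.fq"}     -> ["in=a.fq", "out=b.fq"]
    """
    args = []
    for key, value in options.items():
        if value is None or value is False:
            continue  # option is switched off, emit nothing
        if style == "java":
            args.append(f"{key}={value}")  # key=value as a single token
        elif value is True:
            args.append(key)  # bare flag without a value
        else:
            args.extend([key, str(value)])  # flag followed by its value
    return args


def with_overrides(options, overrides):
    """Return a copy of options in which global values replace per-tool values."""
    merged = dict(options)
    merged.update(overrides)
    return merged


if __name__ == "__main__":
    # Hypothetical per-tool options, overridden by a global thread count.
    tool_options = {"--threads": 2, "--quiet": True}
    print(build_args(with_overrides(tool_options, {"--threads": 8})))
    print(build_args({"in": "reads.fq", "out": "clean.fq"}, style="java"))
```

Keeping options as plain dictionaries until the last moment is one way to make a global override a single `update` call rather than string surgery on an already assembled command.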
What it CAN NOT do by itself
- Schedule jobs
- Scale jobs on HPC/cloud
- Identify parallel steps in pipelines
Prerequisites
- python 3.6 or higher
- OS: Linux, Mac
The API covers the following RNA-Seq tools:
Tool | Purpose |
---|---|
SRA Tools (v. 2.9.6 or higher) | SRA access |
Trimgalore | Trimming |
BBDuk | Trimming |
Hisat2 | Alignment |
STAR | Alignment |
Bowtie2 | Alignment |
Kallisto | Quantification |
Salmon | Quantification |
Stringtie | Transcript Assembly |
Cufflinks | Transcript Assembly |
Samtools | Tools |
Examples
Get started with the basic tutorial. Read the documentation here. Several examples are provided here.
Download, trim and align RNA-Seq data
The following Python code downloads data from SRA, trims the fastq files with Trim Galore and aligns the reads with STAR. More detailed examples are provided here.
from pyrpipe.sra import SRA
from pyrpipe.qc import Trimgalore
from pyrpipe.mapping import Star
# Tool objects are created once and reused for every run
trimgalore = Trimgalore(threads=8)
star = Star(index='data/index',threads=4)
# Download each run from SRA, trim it, then align the trimmed reads
for srr in ['SRR976159','SRR978411','SRR971778']:
    SRA(srr).trim(trimgalore).align(star)
Import a Unix command
This simple example imports and runs the Unix grep command. See this for more examples.
>>> from pyrpipe.runnable import Runnable
>>> grep=Runnable(command='grep')
>>> grep.run('query1','file1.txt',verbose=True)
>>> grep.run('query2','file2.txt',verbose=True)
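To make the idea of wrapping a Unix command concrete, here is a minimal standalone sketch built on the standard library's `subprocess` module. It is not pyrpipe's `Runnable` and makes no claim about how `Runnable` works internally: the class name `SimpleCommand`, its `dry_run` flag and its `log` attribute are all hypothetical names chosen for illustration.

```python
import shlex
import subprocess


class SimpleCommand:
    """Hypothetical stand-in for a wrapped command; not pyrpipe's Runnable."""

    def __init__(self, command, dry_run=False):
        self.command = command
        self.dry_run = dry_run
        self.log = []  # (command string, return code) pairs, in call order

    def run(self, *args):
        """Run the command with the given arguments, or only record it."""
        cmd = [self.command] + [str(arg) for arg in args]
        printable = " ".join(shlex.quote(token) for token in cmd)
        if self.dry_run:
            # Dry run: record what would be executed, but execute nothing.
            self.log.append((printable, None))
            return None
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # decode output as text (Python 3.6 compatible)
        )
        self.log.append((printable, result.returncode))
        return result


if __name__ == "__main__":
    grep = SimpleCommand("grep", dry_run=True)
    grep.run("query1", "file1.txt")
    print(grep.log)
```

Passing the command as a list of tokens rather than a single shell string avoids quoting problems when arguments contain spaces.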
Installation
Please follow these instructions:
To create a new Conda environment (recommended):
NOTE: You need to install the third-party tools to work with pyrpipe. We recommend installing these through bioconda where possible. An example of setting up the environment using conda is provided below. It is best to share your conda environment files along with your pyrpipe scripts to ensure reproducibility.
- Download and install Conda
conda create -n pyrpipe python=3.8
conda activate pyrpipe
conda install -c bioconda pyrpipe star=2.7.7a sra-tools=2.10.9 stringtie=2.1.4 trim-galore=0.6.6
The above command will install pyrpipe and the required tools inside a conda environment. Alternatively, use the conda environment.yaml file provided in this repository and build the conda environment by running
conda env create -f environment.yaml
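After activating the environment, a quick way to confirm that the tools are visible is to look them up on PATH. The sketch below uses only the standard library; `missing_tools` is a hypothetical helper name, and the executable names in the list are assumptions about what the packages installed above provide, so adjust them to your setup.

```python
import shutil


def missing_tools(executables):
    """Return the executables that cannot be found on PATH."""
    return [name for name in executables if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names for the tools installed in the conda example.
    wanted = ["STAR", "prefetch", "fasterq-dump", "stringtie", "trim_galore"]
    missing = missing_tools(wanted)
    print("missing:", ", ".join(missing) if missing else "none")
```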
Install latest stable version
Through conda
conda install -c bioconda pyrpipe
Through PIP
pip install pyrpipe --upgrade
If the above command fails due to dependency issues, try:
- Download the requirements.txt
pip install -r requirements.txt
pip install pyrpipe
To run tests:
- Download the test set (direct link)
pip install pytest
- Build the test environment (please READ THIS first)
- From pyrpipe root directory, run
pytest tests/test_*
Install dev version
git clone https://github.com/urmi-21/pyrpipe.git
pip install -r pyrpipe/requirements.txt
pip install -e path_to/pyrpipe
# Running tests: from the pyrpipe root directory
# Build the test environment (this will download tools):
cd tests ; . ./build_test_env.sh
# Then, in the same terminal:
py.test tests/test_*
Setting up NCBI SRA-Tools
If you face problems with downloading data from SRA, try configuring the SRA-Tools.
Use vdb-config -i to configure SRA Toolkit. Make sure that:
- Under the TOOLS tab, prefetch downloads to is set to public user-repository
- Under the CACHE tab, location of public user-repository is not empty
Use the following pyrpipe_diagnostic command to test whether SRA-Tools is set up properly:
pyrpipe_diagnostic test
Funding
This work is funded in part by the National Science Foundation award IOS 1546858, "Orphan Genes: An Untapped Genetic Reservoir of Novel Traits".