cdpipelines

Various bioinformatics pipelines.

Development Status
- 2 - Pre-Alpha
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Natural Language
- English
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

This repository holds various bioinformatics pipelines.

Dependencies

Many of the pipeline dependencies can be obtained using the prepare submodule (see below). You also need a working Python environment with the common scientific Python packages. I recommend Anaconda Python, since it includes most of the needed packages, and I’d recommend making a separate Anaconda environment for each project. Besides the default Anaconda packages, you will need the following (available through conda or pip):

HTSeq
pandas
pysam (available through conda, but the conda version is currently old, so install it with pip)
PyVCF
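
As a quick sanity check before running a pipeline, the sketch below reports which of the packages listed above cannot be found in the current environment. It is not part of cdpipelines: the helper name missing_packages and the REQUIRED mapping are hypothetical, and the only facts it relies on are the import names (PyVCF is imported as vcf).

```python
import importlib.util

# Distribution name -> import name. PyVCF installs a module named "vcf".
# This mapping is an illustration, not something cdpipelines ships.
REQUIRED = {"HTSeq": "HTSeq", "pandas": "pandas", "pysam": "pysam", "PyVCF": "vcf"}


def missing_packages(required=REQUIRED):
    """Return the distribution names whose module cannot be located."""
    return sorted(
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

Running this inside the project’s Anaconda environment shows what still needs to be installed with conda or pip.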

rpy2

Installing rpy2 can be tricky. Different versions of R and rpy2 don’t work well together, so it’s recommended to make a local installation of R using the prepare submodule and compile rpy2 against that installation. You can install R using prepare.download_r and install rpy2 using prepare.download_install_rpy2. prepare.download_install_rpy2 will prompt you to set your PATH, LDFLAGS, and LD_LIBRARY_PATH so that rpy2 installs correctly. After installing rpy2, you need to set PATH and LD_LIBRARY_PATH with the same commands in every bash session where you want to use this rpy2. I’d recommend putting the commands in a file that you source every time you load the project’s Anaconda environment.
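
Because a forgotten PATH or LD_LIBRARY_PATH setting is an easy mistake, a small check like the one below can be run at the start of a session. This is a sketch only: the function missing_path_entries is hypothetical, and the directories in the usage example are placeholders for wherever your local R installation actually lives.

```python
import os


def missing_path_entries(var_name, required_dirs, environ=None):
    """Return the directories from required_dirs that are absent from the
    path-style environment variable var_name (for example PATH)."""
    environ = os.environ if environ is None else environ
    entries = [e for e in environ.get(var_name, "").split(os.pathsep) if e]
    present = {os.path.normpath(e) for e in entries}
    return [d for d in required_dirs if os.path.normpath(d) not in present]


if __name__ == "__main__":
    # Placeholder locations; substitute the directories that
    # prepare.download_install_rpy2 asked you to add.
    checks = {
        "PATH": ["/path/to/local/R/bin"],
        "LD_LIBRARY_PATH": ["/path/to/local/R/lib"],
    }
    for var_name, dirs in checks.items():
        for directory in missing_path_entries(var_name, dirs):
            print(f"{var_name} does not contain {directory}")
```

If nothing is printed, both variables contain the expected directories for the current session.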

Submodules

general

general contains methods used in multiple pipelines. Some pipelines need similar but slightly different versions of a method, so those pipelines keep their own versions. When several pipelines each carry a slightly different version of the same method, it may make sense to add options to one version that cover the differences and move that method into general.

prepare

The prepare module contains functions for downloading various software and reference files needed for the different pipelines.

rnaseq

This pipeline currently starts from fastq files and has two steps. For detailed information on each step, see the docstrings for each method. The first step is align_and_sort, which (optionally) removes duplicates, aligns the reads, and makes coverage bigwig files for use with the UCSC genome browser or IGV. The read alignments are output in both genomic and transcriptomic coordinates. The second step is get_counts, which counts reads overlapping genes for gene-level differential expression and reads overlapping exonic bins for use with DEXSeq.
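
To make the idea of "counting reads overlapping genes" concrete, here is a toy illustration in plain Python. It is not how get_counts is implemented and does not use its interface; the function names, the tuple layout, and the rule that a read overlapping more than one gene is tallied as ambiguous are all assumptions chosen for the example.

```python
from collections import Counter


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def count_reads_per_gene(reads, genes):
    """Count reads per gene.

    reads: iterable of (chrom, start, end) tuples.
    genes: dict mapping gene name -> (chrom, start, end).
    Reads hitting several genes or no gene are tallied separately.
    """
    counts = Counter({name: 0 for name in genes})
    for chrom, start, end in reads:
        hits = [
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and overlaps(start, end, g_start, g_end)
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif hits:
            counts["__ambiguous"] += 1
        else:
            counts["__no_feature"] += 1
    return counts


# Made-up coordinates, purely for illustration.
genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 180, 300)}
reads = [("chr1", 90, 120), ("chr1", 150, 190), ("chr1", 250, 280), ("chr2", 100, 150)]
counts = count_reads_per_gene(reads, genes)
print(dict(counts))
```

Real counting works on alignment files and annotation files rather than tuples, and the linear scan over genes here would be far too slow for a genome; the sketch only shows the overlap logic.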

Release history

- 0.0.7: Dec 13, 2016
- 0.0.6: Dec 13, 2016
- 0.0.5: Jun 9, 2016 (this version)
- 0.0.4: Apr 29, 2016
- 0.0.3: Apr 27, 2016
- 0.0.2: Apr 18, 2016
- 0.0.1: Apr 5, 2016

Download files

Source Distribution

cdpipelines-0.0.5.tar.gz (57.9 kB)

Uploaded Jun 9, 2016 Source

File details

Details for the file cdpipelines-0.0.5.tar.gz.

File metadata

Download URL: cdpipelines-0.0.5.tar.gz
Upload date: Jun 9, 2016
Size: 57.9 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for cdpipelines-0.0.5.tar.gz
Algorithm	Hash digest
SHA256	`4414177817d1d25ad6516ac3c1a61300b40e1434ef219987cfa8c944b1c0fa25`
MD5	`be3b4c48166b6751604aa4656d0764d1`
BLAKE2b-256	`969e9d71933e6d5499e4a805d431ec4496bc46779ce63a6aa9ed6cbe381cd502`
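
After downloading the source distribution, you can compare its SHA256 digest with the value listed above. The snippet below is a minimal sketch using only the standard library; the helper names are hypothetical, and it assumes the archive sits in the current directory under its original file name.

```python
import hashlib
import os

# SHA256 digest listed above for cdpipelines-0.0.5.tar.gz.
EXPECTED_SHA256 = "4414177817d1d25ad6516ac3c1a61300b40e1434ef219987cfa8c944b1c0fa25"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = "cdpipelines-0.0.5.tar.gz"  # assumed download location
    if os.path.exists(archive):
        print("digest matches" if matches_expected(archive) else "digest MISMATCH")
    else:
        print(f"{archive} not found in the current directory")
```

A mismatch means the file is corrupted or is not the file described on this page, and it should not be installed.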