Sequencing pipeline
# RNAseq Analysis Pipeline
## Usage
Execute `main.py` as follows:
```
$ python main.py --help
Usage: seqpipe [OPTIONS]

  Create read-genome matrix and compute all read alignments. Subsequently,
  apply various scripts and aggregate results.

Options:
  -r, --read PATH             Path to read file/directory.  [required]
  -g, --genome PATH           Path to genome file/directory.  [required]
  -o, --output DIRECTORY      Directory to save results to.
  --scripts / --no-scripts    Whether to execute scripts or not.
  -m, --min-read-len INTEGER  Minimal read length.
  -M, --max-read-len INTEGER  Maximal read length.
  -b, --bowtie-args TEXT      Extra arguments for bowtie.
  -t, --threads INTEGER       How many threads to run in.
  --help                      Show this message and exit.
```
Running the pipeline creates a `mapping_results_*` directory containing two subdirectories:
* `runs` stores all data related to each individual read file
* `results` contains data generated by scripts from the `scripts` folder
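
For scripted runs, the documented options can be assembled programmatically. The following is a minimal sketch and not part of the package: `build_command` and `find_result_dirs` are hypothetical helpers, `main.py` is assumed to be in the working directory, and the `mapping_results_*` directory is assumed to appear under whatever parent directory you pass in (this README does not state where it is created).

```python
import sys
from pathlib import Path


def build_command(read, genome, output=None, threads=None,
                  min_read_len=None, max_read_len=None, scripts=True):
    """Assemble an argument list for main.py from the documented options."""
    cmd = [sys.executable, "main.py",
           "--read", str(read), "--genome", str(genome)]
    if output is not None:
        cmd += ["--output", str(output)]
    if min_read_len is not None:
        cmd += ["--min-read-len", str(min_read_len)]
    if max_read_len is not None:
        cmd += ["--max-read-len", str(max_read_len)]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if not scripts:
        # Skip the post-processing scripts (hypothetical use of the documented flag).
        cmd.append("--no-scripts")
    return cmd


def find_result_dirs(parent):
    """Return (runs, results) path pairs for each mapping_results_* directory."""
    pairs = []
    for base in sorted(Path(parent).glob("mapping_results_*")):
        if base.is_dir():
            pairs.append((base / "runs", base / "results"))
    return pairs
```

Passing the returned list to `subprocess.run(cmd, check=True)` would start the pipeline. Building the command as a list rather than a single string avoids shell-quoting problems with paths that contain spaces.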
## Extras
Additional scripts live in the `extra` directory.
Their entry point is `extra/main.py` (run `python ./extra/main.py --help` for help).
The individual files are:
* `sequential_pipeline.sh`
    * map length-filtered reads against multiple genomes in succession
* `plot_sequential_data.py`
    * visualize data obtained from the sequential pipeline
* `plot_expression_differences.py`
    * visualize differences in RNAseq expression levels over pairs of samples
* `utils.py`
    * various helper methods
* `mapping_overview.py`
    * plot various statistics
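
To illustrate what "length-filtered" means for the sequential pipeline, here is a minimal sketch of a read-length filter over FASTQ records. It is an illustration under stated assumptions, not the code in `sequential_pipeline.sh`: the function name `filter_fastq_by_length` is hypothetical, records are assumed to span exactly four lines, and the actual script may perform this step differently.

```python
import io


def filter_fastq_by_length(handle, min_len=None, max_len=None):
    """Yield (header, sequence, separator, quality) for reads within the bounds."""
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:
            break  # end of input (an empty header line also stops the loop)
        length = len(record[1])
        if min_len is not None and length < min_len:
            continue  # read too short
        if max_len is not None and length > max_len:
            continue  # read too long
        yield tuple(record)


# Two toy reads: 4 nt and 10 nt long.
example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGTACGTAC\n+\nIIIIIIIIII\n"
)
kept = [header for header, *_ in
        filter_fastq_by_length(example, min_len=5, max_len=20)]
print(kept)  # ['@read2']
```

The bounds mirror the `--min-read-len` and `--max-read-len` options of the main pipeline. Leaving either bound as `None` disables that side of the filter.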
## Dependencies
Tools:
* cutadapt
* fastqc
    * [more info](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/)
* bowtie2
* samtools
    * [specifications](https://samtools.github.io/hts-specs/SAMv1.pdf)
* bedtools
* moreutils

Languages:
* bash
* python
    * numpy
    * pandas
    * seaborn
    * matplotlib
    * tqdm
    * biopython
    * pysam
    * joblib
    * click
    * sh
    * colorama
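
Before a first run it can help to confirm that the external tools are installed. The sketch below is not part of the package: `missing_tools` is a hypothetical helper, and it assumes each executable carries the same name as the tool listed above. moreutils is left out because it is a collection of utilities and this README does not say which of them the pipeline calls.

```python
import shutil

# Executable names are assumed to match the tool names listed above.
REQUIRED_TOOLS = ["cutadapt", "fastqc", "bowtie2", "samtools", "bedtools"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

`shutil.which` only checks that an executable is discoverable, so it says nothing about whether the installed version is compatible with the pipeline.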
## Development notes
Install a development build (editable install) with:
```bash
$ pip install --user -e .
```
Run tests using:
```bash
$ tox
```
Build the source and wheel distributions for a new release with:
```bash
$ python setup.py sdist bdist_wheel
```