A CLI for running multiple PBS/qsub jobs with HTSeq's `htseq-count` script on a cluster.
[![Build Status](https://travis-ci.org/datasnakes/htseq-count-cluster.svg?branch=master)](https://travis-ci.org/datasnakes/htseq-count-cluster)
# htseq-count-cluster
A CLI wrapper for running [HTSeq](https://github.com/simon-anders/htseq)'s `htseq-count` on a cluster.
View [documentation](https://tinyurl.com/yb7kz7zz).
## Install
`pip install HTSeqCountCluster`
## Features
- For use with large datasets (we've previously used a dataset of 120 different human samples)
- For use with SGE/SGI cluster systems
- Submits multiple jobs
- Command line interface/script
- Merges counts files into one counts table/csv file
- Uses the `accepted_hits.bam` files output by `tophat`
### Examples
#### Run htseq-count-cluster
After generating BAM output files with `tophat`, you can use our `htseq-count-cluster` script
instead of running HTSeq's `htseq-count` directly. This script is intended for clusters that
use PBS (`qsub`) to submit and monitor jobs.
```bash
htseq-count-cluster -p path/to/bam-files/ -f samples.csv -g genes.gtf -o path/to/cluster-output/
```
| Argument | Description | Required |
|:--------:|:-----------:|:--------:|
| `-p` | Path to your `.bam` files. The script currently expects one folder per sample, named after the sample, containing an `accepted_hits.bam` file (`tophat` output). | Yes |
| `-f` | A CSV file listing your sample (folder) names, with no header. | Yes |
| `-g` | Path to your `genes.gtf` (GTF/GFF) file. | Yes |
| `-o` | An existing directory for your output counts files. | Yes |
| `-e` | Email address to notify when the script completes. | No |
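If your `tophat` output already follows the one-folder-per-sample layout described for `-p`, the sample list for `-f` can be generated from the folders themselves. This is a minimal sketch, not part of the package: the function name `make_sample_list` is hypothetical, and writing one folder name per line is an assumption about the CSV layout the script accepts.

```bash
# Hypothetical helper (not part of htseq-count-cluster): list every sample
# folder under INPATH that contains an accepted_hits.bam file, writing one
# folder name per line with no header.
# Assumption: one name per line is an acceptable CSV layout for -f.
make_sample_list() {
    local inpath="$1"    # the same path you would pass to -p
    local outfile="$2"   # the CSV file you would pass to -f
    local dir
    : > "$outfile"       # create or truncate the output file
    for dir in "$inpath"/*/; do
        # Only keep folders that actually hold tophat's BAM output.
        if [ -f "${dir}accepted_hits.bam" ]; then
            basename "$dir" >> "$outfile"
        fi
    done
}
```

For example, `make_sample_list path/to/bam-files samples.csv` produces a `samples.csv` that can be passed to `-f` in the command shown above.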
This script uses `logzero`, so it prints color-coded logging information to your shell.
A common Linux practice is to start such a program inside a `screen` session: the program
keeps writing its output to that session, and you can detach from it and use another shell
without the program ending.
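The work being distributed amounts to one `htseq-count` run per sample. The sketch below is not the package's implementation; it writes one hypothetical PBS job script per sample so you can see, or submit by hand, what an equivalent per-sample job might look like. The function name, the `.pbs`/`.out` file names, and the `htseq-count` options (`-f bam` for BAM input, `-s no` for unstranded data) are assumptions; set the strandedness to match your library preparation.

```bash
# Hypothetical sketch: write one PBS job script per sample. This is NOT the
# htseq-count-cluster implementation, only a hand-written equivalent.
# Assumptions: BAM input (-f bam), unstranded reads (-s no), a sample list
# with one name per line, and absolute paths (PBS jobs may not start in the
# directory you submitted from).
write_job_scripts() {
    local inpath="$1"    # folder with one sub-folder per sample
    local samples="$2"   # sample list, one name per line, no header
    local gtf="$3"       # GTF annotation, e.g. genes.gtf
    local outpath="$4"   # existing directory for job scripts and counts
    local sample
    while IFS= read -r sample || [ -n "$sample" ]; do
        [ -z "$sample" ] && continue   # skip blank lines
        # The unquoted EOF expands the variables when the script is written.
        cat > "$outpath/$sample.pbs" <<EOF
#!/bin/bash
#PBS -N htseq_$sample
htseq-count -f bam -s no "$inpath/$sample/accepted_hits.bam" "$gtf" > "$outpath/$sample.out"
EOF
    done < "$samples"
}
```

Each generated script could then be submitted individually, e.g. `qsub /abs/path/to/cluster-output/SAMPLE.pbs`.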
##### Help message output for `htseq-count-cluster`
```
usage: htseq-count-cluster [-h] -p INPATH -f INFILE -g GTF -o OUTPATH
                           [-e EMAIL]

This is a command line wrapper around htseq-count.

optional arguments:
  -h, --help            show this help message and exit
  -p INPATH, --inpath INPATH
                        Path of your samples/sample folders.
  -f INFILE, --infile INFILE
                        Name or path to your input csv file.
  -g GTF, --gtf GTF     Name or path to your gtf/gff file.
  -o OUTPATH, --outpath OUTPATH
                        Directory of your output counts file. The counts file
                        will be named.
  -e EMAIL, --email EMAIL
                        Email address to send script completion to.

*Ensure that htseq-count is in your path.
```
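The help text notes that `htseq-count` must be on your `PATH`. A small pre-flight check built on the POSIX `command -v` builtin can confirm this, and that `qsub` is available, before you launch a large run. The function name `check_tools` is hypothetical and not part of the package.

```bash
# Hypothetical pre-flight check (not part of htseq-count-cluster):
# print each named tool that is missing from PATH and return non-zero
# if any of them is missing.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "missing from PATH: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tools htseq-count qsub && htseq-count-cluster -p ...` only starts the run when both tools are found.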
#### Merge output counts files
To prepare your data for `DESeq2`, `limma`, or `edgeR`, it's best to have one merged
counts file rather than the multiple files produced by the `htseq-count-cluster` script. We offer
merging as a standalone script because it may be useful to keep those files separate.
```bash
merge-counts -d path/to/cluster-output/
```
##### Help message for `merge-counts`
```
usage: merge-counts [-h] -d DIRECTORY

Merge multiple counts tables into 1 counts .csv file.
Your output file will be named: merged_counts_table.csv

optional arguments:
  -h, --help            show this help message and exit
  -d DIRECTORY, --directory DIRECTORY
                        Path to folder of counts files.
```
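To illustrate what merging involves, the sketch below joins several `htseq-count` output files (a tab-separated gene ID and count per line, followed by summary counters such as `__no_feature`) into one CSV table with a gene column and one column per sample. It is a hypothetical `awk`-based sketch, not the `merge-counts` implementation: the function name, the `gene_id` header, naming columns after the input file names, and dropping the `__` lines are all assumptions.

```bash
# Hypothetical sketch (NOT the merge-counts implementation): merge
# htseq-count outputs into one CSV with a gene column plus one column per
# input file. Input lines look like "gene_id<TAB>count"; lines starting
# with "__" (htseq-count's summary counters) are dropped.
merge_counts_sketch() {
    local outfile="$1"; shift   # remaining arguments are counts files
    awk -F '\t' '
        FNR == 1 {
            # Column name = file name without directory or extension.
            name = FILENAME
            sub(/^.*\//, "", name)
            sub(/\.[^.]*$/, "", name)
            samples[++ns] = name
        }
        $1 ~ /^__/ { next }
        {
            # Remember genes in first-seen order, then store the count.
            if (!($1 in seen)) { seen[$1] = 1; genes[++ng] = $1 }
            count[$1, ns] = $2
        }
        END {
            header = "gene_id"
            for (s = 1; s <= ns; s++) header = header "," samples[s]
            print header
            for (g = 1; g <= ng; g++) {
                line = genes[g]
                for (s = 1; s <= ns; s++) {
                    key = genes[g] SUBSEP s
                    # Genes missing from a file are reported as 0.
                    line = line "," ((key in count) ? count[key] : 0)
                }
                print line
            }
        }
    ' "$@" > "$outfile"
}
```

For example, `merge_counts_sketch merged_sketch.csv path/to/cluster-output/*` would merge every file in that directory, assuming it contains only counts files.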
## ToDo
- [ ] Monitor jobs.
- [ ] Enhance wrapper input for other use cases.
- [ ] Add example data.
## Maintainers
Shaurita Hutchins | [@sdhutchins](https://github.com/sdhutchins) | [✉](mailto:sdhutchins@outlook.com)
Rob Gilmore | [@grabear](https://github.com/grabear) | [✉](mailto:robgilmore127@gmail.com)
## Help
Please feel free to [open an issue](https://github.com/datasnakes/htseq-count-cluster/issues/new) if you have a question/feedback/problem
or [submit a pull request](https://github.com/datasnakes/htseq-count-cluster/compare) to add a feature/refactor/etc. to this project.