
htseq-count-cluster

A CLI wrapper for running HTSeq's htseq-count script as multiple PBS (qsub) jobs on a cluster.

View documentation.

Install

Requires Python 3.9 or higher.

pip install HTSeqCountCluster

Features

  • Designed for large datasets (we have previously used it on a dataset of 120 different human samples)
  • For use with PBS/SGE cluster systems
  • Submits multiple jobs at once
  • Command-line interface/script
  • Merges the per-sample counts files into a single counts table (CSV)
  • Uses the accepted_hits.bam output of TopHat

Examples

Run htseq-count-cluster

After generating BAM output files with TopHat, you can use our htseq-count-cluster script instead of calling HTSeq's htseq-count directly. This script is intended for clusters that use PBS (qsub) for job scheduling.
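
Once the jobs are submitted, you can monitor them with standard PBS tooling; a generic sketch (the job ID shown is hypothetical):

    qstat -u "$USER"    # list your queued and running jobs
    qstat -f 123456     # inspect one job in detail by its ID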

Our default htseq-count command is htseq-count -f bam -s no file.bam file.gtf -o htseq.out. This command treats the input BAM files (-f bam) as unstranded (-s no) and uses the default union mode; in union mode, only the aligned read determines how a read pair is counted.
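
Concretely, for a hypothetical sample folder named sample_01, the submitted job runs something along these lines (the exact script generated by htseq-count-cluster may differ):

    # illustrative per-sample invocation; sample_01 and the paths are placeholders
    htseq-count -f bam -s no path/to/bam-files/sample_01/accepted_hits.bam genes.gtf -o htseq.out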

Legacy mode (still supported):

htseq-count-cluster -p path/to/bam-files/ -f samples.csv -g genes.gtf -o path/to/cluster-output/

New subcommand mode:

htseq-count-cluster run -p path/to/bam-files/ -f samples.csv -g genes.gtf -o path/to/cluster-output/

Argument   Required   Description
-p         Yes        Path to your .bam files. The script expects one folder per sample and looks inside it for an accepted_hits.bam file (TopHat output).
-f         Yes        A CSV file listing your sample/folder names (no header).
-g         Yes        Path to your genes.gtf file.
-o         Yes        An existing directory for the output counts files.
-e         No         Email address for job-completion notifications.
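
As an illustration (the sample names are hypothetical), the input CSV and the matching BAM directory layout might look like this:

    # samples.csv — one sample/folder name per line, no header
    sample_01
    sample_02
    sample_03

    # expected layout under path/to/bam-files/
    path/to/bam-files/sample_01/accepted_hits.bam
    path/to/bam-files/sample_02/accepted_hits.bam
    path/to/bam-files/sample_03/accepted_hits.bam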

This script uses logzero, so color-coded logging information is written to your shell.

A common Linux practice is to run a long job inside screen: you start a new shell with screen, launch the program there, and can then detach from that shell without ending the program, even while it is still writing to stdout, and continue working in another shell.
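
For example (the session name htseq is arbitrary):

    screen -S htseq                # start a named screen session
    htseq-count-cluster run ...    # launch the jobs inside it
    # detach with Ctrl-a d; the jobs keep running. Reattach later with:
    screen -r htseq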

Help message output for htseq-count-cluster
usage: htseq-count-cluster [-h] COMMAND ...

This is a command line wrapper around htseq-count.

positional arguments:
  COMMAND
    run                 Run htseq-count jobs on a cluster
    merge               Merge multiple counts tables into one CSV file

optional arguments:
  -h, --help            show this help message and exit

*Ensure that htseq-count is in your PATH.
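
A quick way to check this on the submission host:

    # verify that htseq-count is discoverable before submitting jobs
    command -v htseq-count || echo "htseq-count not found in PATH"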

For the run subcommand:

usage: htseq-count-cluster run [-h] -p INPATH -f INFILE -g GTF -o OUTPATH [-e EMAIL]

Submit multiple htseq-count jobs to a cluster.

optional arguments:
  -h, --help            show this help message and exit
  -p INPATH, --inpath INPATH
                        Path of your samples/sample folders.
  -f INFILE, --infile INFILE
                        Name or path to your input csv file.
  -g GTF, --gtf GTF     Name or path to your gtf/gff file.
  -o OUTPATH, --outpath OUTPATH
                        Directory for your output counts files.
  -e EMAIL, --email EMAIL
                        Email address to send script completion to.

Merge output counts files

To prep your data for DESeq2, limma, or edgeR, it's best to have one merged counts file rather than the multiple per-sample files produced by the htseq-count-cluster script.

Using the merge subcommand:

htseq-count-cluster merge -d path/to/cluster-output/

Or using the standalone command (still available):

merge-counts -d path/to/cluster-output/
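
The merged table has one row per gene and one column per sample. The shape below is illustrative only; the gene IDs, sample names, and header are hypothetical:

    # merged_counts_table.csv (illustrative shape)
    gene,sample_01,sample_02,sample_03
    ENSG00000000003,712,655,801
    ENSG00000000005,0,2,1
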
Help message for merge subcommand
usage: htseq-count-cluster merge [-h] -d DIRECTORY

Merge multiple counts tables into 1 counts .csv file.

Your output file will be named:  merged_counts_table.csv

optional arguments:
  -h, --help            show this help message and exit
  -d DIRECTORY, --directory DIRECTORY
                        Path to folder of counts files.

ToDo

  • Monitor jobs.
  • Enhance wrapper input for other use cases.
  • Add example data.

Maintainers

  • Shaurita Hutchins (@sdhutchins)
  • Rob Gilmore (@grabear)

Help

Please feel free to open an issue if you have a question, feedback, or a problem, or submit a pull request to add a feature, refactor code, or otherwise improve this project.

Citation

Simon Anders, Paul Theodor Pyl, Wolfgang Huber; HTSeq—a Python framework to work with high-throughput sequencing data, Bioinformatics, Volume 31, Issue 2, 15 January 2015, Pages 166–169, https://doi.org/10.1093/bioinformatics/btu638
