Runs a variety of bioinformatics tools on O2 (HMS' HPC), such as long read alignment and QC

o2-processing-utils

Tool for running and logging various bioinformatics tools on Harvard Medical School's high performance computing cluster O2.

Installation

Conda environment

To install the package, first install conda, then create a conda environment from the provided .yml file with the command conda env create -f o2_processing_utils.yml. Activate the environment with the command conda activate o2_processing_utils.

Manual installation

You can also set up the environment manually. Run pip install o2_processing_utils to install the package. You also need samtools, pbmm2 1.13.*, and Python 3.8 or later.
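After a manual installation it can be useful to confirm that the external tools are actually reachable. The following is a minimal sketch, not part of o2-processing-utils: the function name missing_prerequisites is hypothetical, and it only checks that samtools and pbmm2 are on PATH and that the interpreter is Python 3.8 or later. It does not verify the pbmm2 version.

```python
import shutil
import sys

# External tools and minimum Python version named in the installation notes.
REQUIRED_TOOLS = ("samtools", "pbmm2")
MIN_PYTHON = (3, 8)


def missing_prerequisites(which=shutil.which, version=sys.version_info):
    """Return a list of problems found; an empty list means nothing is missing.

    `which` and `version` are parameters only so the check can be exercised
    without touching the real system (hypothetical helper, for illustration).
    """
    problems = []
    if tuple(version[:2]) < MIN_PYTHON:
        problems.append("Python 3.8 or later is required")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

If the script prints nothing, both tools were found and the Python version is sufficient.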

Usage

After installation, first make sure the environment variable O2_PROCESSING_CONFIG is set to the absolute path of a .json configuration file. This file contains the path to the reference file, the path to the log file, and the resources to allocate to the Slurm job that runs pbmm2. You can set the environment variable with the command:

export O2_PROCESSING_CONFIG=/PATH_TO_CONFIG/config.json

An example configuration file (config.example.json) is provided.
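A quick way to catch a misconfigured variable before starting a workflow is to check it from Python. This is a minimal sketch under stated assumptions: load_o2_config is a hypothetical helper, not a function of the package, and it only verifies that O2_PROCESSING_CONFIG is set, is an absolute path ending in .json, and contains valid JSON. It makes no assumption about the keys inside the file; see config.example.json for those.

```python
import json
import os
from pathlib import Path


def load_o2_config(environ=os.environ):
    """Read the JSON file that O2_PROCESSING_CONFIG points to.

    Hypothetical helper for illustration; the package's own loader may differ.
    """
    value = environ.get("O2_PROCESSING_CONFIG")
    if not value:
        raise RuntimeError("O2_PROCESSING_CONFIG is not set")
    path = Path(value)
    if not path.is_absolute():
        raise RuntimeError("O2_PROCESSING_CONFIG must be an absolute path")
    if path.suffix != ".json":
        raise RuntimeError("O2_PROCESSING_CONFIG must point to a .json file")
    # A missing file raises FileNotFoundError; invalid JSON raises
    # json.JSONDecodeError.
    with path.open() as handle:
        return json.load(handle)
```

The package's own o2p-print-config command serves a similar purpose from the command line.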

To analyze a PacBio HiFi/Fiber-Seq unaligned BAM, repeatedly run the following command from the command line:

o2p-run-pbmm2-workflow -b <input.bam>

Each time the command is run, a single step is performed on the file or files of interest, so you will have to run the same command several times on the same file. A repeat run automatically performs the next step if the previous step completed successfully. The workflow for a file is finished once you receive the message "The workflow is complete for file {file_name}. Nothing else is done for this file".

The steps are alignment, basic checks, gathering QC metrics, and parsing QC metrics. The tool automatically submits Slurm jobs for the steps that need them. To analyze multiple samples simultaneously, pass the path of the folder containing the unaligned BAM files to the command with the -f flag.
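The run-again-until-complete pattern described above can also be driven from a script. The sketch below is an illustration only: run_workflow_step and workflow_is_complete are hypothetical helpers, and it assumes the completion message is written to standard output or standard error. Because the text that replaces {file_name} is not specified here, the check matches only the fixed beginning of the message.

```python
import subprocess

# Fixed beginning of the completion message quoted in the usage notes.
COMPLETE_PREFIX = "The workflow is complete for file"


def run_workflow_step(bam_path, runner=subprocess.run):
    """Run one workflow step for `bam_path` and return everything it printed.

    `runner` is a parameter only so the function can be tried without the
    real command installed (hypothetical helper, for illustration).
    """
    result = runner(
        ["o2p-run-pbmm2-workflow", "-b", bam_path],
        capture_output=True,
        text=True,
    )
    return result.stdout + result.stderr


def workflow_is_complete(output):
    """Return True if the output contains the completion message."""
    return COMPLETE_PREFIX in output
```

Since some steps submit Slurm jobs, a script built on this should wait for those jobs to finish before calling run_workflow_step again rather than looping immediately.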

The following commands are provided for additional functionality:

Command                       Description
o2p-print-config              Print out O2_PROCESSING_CONFIG.
o2p-reset-pbmm2-workflow      Reset a given workflow step for a given BAM file. This command only works for workflow runs that are incomplete.
o2p-print-qc-file             Print out a specified QC file in a more human-readable format.
o2p-create-summary-qc-file    Generate a summary QC file from a set of individual .qc files.
o2p-search-log                Search the log for a given string.

For additional information, type any of the above commands into the command line followed by the flag --help. If you forget the available commands, type o2p- into the command line and press TAB twice to display all of them.

The currently supported alignment tools are:

  • pbmm2 v1.13.* (employs minimap2 v2.26)

The currently supported QC tools are:

  • samtools (samtools stats)
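For readers who want to inspect samtools stats output directly, the summary numbers are on the tab-separated lines that start with SN. The following is a minimal sketch of reading them into a dictionary. It is not the parser used by o2-processing-utils, whose .qc file format is not described here, and parse_samtools_stats_summary is a hypothetical name.

```python
def parse_samtools_stats_summary(text):
    """Collect the SN (summary numbers) lines of `samtools stats` output.

    Each SN line has the form "SN<TAB>name:<TAB>value", optionally followed
    by a tab and a "# comment". Values are converted to int or float when
    possible and kept as strings otherwise.
    """
    metrics = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3 or fields[0] != "SN":
            continue  # skip comments and the other sections of the report
        name = fields[1].rstrip(":")
        raw = fields[2]
        try:
            metrics[name] = int(raw)
        except ValueError:
            try:
                metrics[name] = float(raw)
            except ValueError:
                metrics[name] = raw
    return metrics
```

The input would typically be the text captured from running samtools stats on the aligned BAM.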

Development

To develop this package, clone this repo, make sure poetry is installed on your system, and run make install.
