Convert each read group in a BAM file to the json/tsv object needed to create a GDC Read Group node.


This package extracts the Read Group header lines from a BAM file and converts their metadata to a json or tsv file, with the values needed to create a Read Group node in the NCI's Genomic Data Commons (GDC). Alternatively, it can run with no input and output a template, which may be edited to create a submission to the GDC.
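To make the extraction step concrete, here is a minimal sketch of how `@RG` header lines can be turned into dictionaries. It is not the package's own implementation: the function name `parse_read_groups` is hypothetical, and the sketch assumes the header has already been decoded to SAM text (for example with `samtools view -H`).

```python
def parse_read_groups(header_text):
    """Return one dict per @RG line, mapping each tag to its value."""
    read_groups = []
    for line in header_text.splitlines():
        # Only read group records are of interest; skip @HD, @SQ, @PG, ...
        if not line.startswith("@RG\t"):
            continue
        fields = {}
        for item in line.split("\t")[1:]:
            # Each field is TAG:VALUE; the value itself may contain colons.
            tag, sep, value = item.partition(":")
            if sep:
                fields[tag] = value
        read_groups.append(fields)
    return read_groups


if __name__ == "__main__":
    header = "@HD\tVN:1.6\tSO:coordinate\n@RG\tID:rg1\tSM:sample1\tPL:ILLUMINA\n"
    print(parse_read_groups(header))
```

Each resulting dictionary holds the raw tags of one read group; mapping those tags to GDC field names is left to the tool.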

The generated file may contain some fields marked REQUIRED<type>, which indicates that these fields could not be generated from the supplied BAM file. In this case, the user must fill in their own values in the generated file, and each value must be of the <type> indicated there. For details, see the column Acceptable Types or Values at the GDC Data Dictionary Viewer.

Other fields are optional, and are marked OPTIONAL<type>. If these fields could not be generated from the supplied BAM file, they may be filled in as appropriate or removed.
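Before submitting, it can help to list the placeholders that are still unfilled. The following is a minimal sketch, assuming only that placeholder values are strings beginning with `REQUIRED<` or `OPTIONAL<`; the helper `find_placeholders` and the field names in the sample are hypothetical, and the real layout of the generated json may differ.

```python
import json

# Assumption: unfilled values are strings that start with one of these prefixes.
PLACEHOLDER_PREFIXES = ("REQUIRED<", "OPTIONAL<")


def find_placeholders(node, path=""):
    """Yield (path, value) for every string that still holds a placeholder."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            yield from find_placeholders(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_placeholders(value, f"{path}[{index}]")
    elif isinstance(node, str) and node.startswith(PLACEHOLDER_PREFIXES):
        yield path, node


if __name__ == "__main__":
    # Hypothetical record, used only to exercise the helper.
    sample = json.loads('[{"hypothetical_field": "REQUIRED<string>", "other_field": "filled"}]')
    for location, value in find_placeholders(sample):
        print(location, value)
```

Because the walk is recursive, it does not depend on how deeply the records are nested.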


The tool will only run on complete (non-truncated) BAM files whose names end with the suffix .bam.

If the BAM is truncated, the error

    OSError: no BGZF EOF marker; file may be truncated

will be generated, and no json will be produced.
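Truncation of this kind can be detected before running the tool. The following is a minimal sketch, assuming the standard 28-byte BGZF end-of-file block defined in the SAM/BAM specification; the function name `has_bgzf_eof` is hypothetical and is not part of gdc-readgroups.

```python
import os

# The empty BGZF block that terminates every complete BAM file (28 bytes).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600"
    "4243" "0200" "1b00" "0300"
    "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF end-of-file marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Read only the final 28 bytes instead of the whole file.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

A file that fails this check is likely to trigger the error above. Passing the check does not prove that the rest of the file is valid.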


There are two ways to install gdc-readgroups:

pip install from pypi

gdc-readgroups may be used as a pip installed python package.

If you would like to install the package as root, for all users, run

    sudo pip install gdc-readgroups

If you would like to install the package only for a local user, run

    pip install gdc-readgroups --user

Build a Docker Image

The github repository for this package contains a Dockerfile, which may be used to build an image containing the package and all prerequisites. There are two ways to build the image.

  1. Using docker directly.

    docker build -t gdc-readgroups .

  2. Using cwltool to build an image, and then run it, in one command.

    In this case the CWL tool expects a BAM input and produces a json output. To install the reference CWL engine, run

    pip install cwltool --user

    Then to build the gdc-readgroups Docker Image and run the Container, run

    cwltool gdc-readgroups.cwl --INPUT <your bam file>

    The above command will only build the Docker Image if it does not exist on the system. After the build is performed once, the image will remain on your system, and the next cwltool run will skip the build step.


gdc-readgroups has two main modes: bam-mode and template-mode.


In bam-mode, a path to a BAM file must be supplied as input. By default, bam-mode will output a json file, but optionally may output a tsv file.

The command to run the pip installed package is

    gdc-readgroups bam-mode --bam_path <your bam file>

The generated json will be placed in the current working directory and have a filename of <bam basename>.json. Any error messages will be written to stdout.

To output a tsv file, run

    gdc-readgroups bam-mode --bam_path <your bam file> --output-format tsv

The generated tsv file will be placed in the current working directory and have a filename of <bam basename>.tsv.


In template-mode, no input is supplied, and two empty records are output within one file, either in json or tsv format.

To generate a json template, run

    gdc-readgroups template-mode

The output will be placed in the current working directory and have a filename of gdc_readgroups.json.

To generate a tsv template, run

    gdc-readgroups template-mode --output-format tsv

The output will be placed in the current working directory and have a filename of gdc_readgroups.tsv.
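When the tool is driven from a script, the invocations for both modes can be assembled programmatically. The following is a minimal sketch that uses only the options shown in this document; the helper `build_command` is hypothetical, and the sketch assumes that json is selected by omitting `--output-format`, as in the commands documented here.

```python
def build_command(mode, bam_path=None, output_format="json"):
    """Return the argument list for one gdc-readgroups invocation."""
    if mode not in ("bam-mode", "template-mode"):
        raise ValueError(f"unknown mode: {mode}")
    if output_format not in ("json", "tsv"):
        raise ValueError(f"unknown output format: {output_format}")

    command = ["gdc-readgroups", mode]
    if mode == "bam-mode":
        # bam-mode requires an input file; template-mode takes none.
        if bam_path is None:
            raise ValueError("bam-mode requires bam_path")
        command += ["--bam_path", bam_path]
    if output_format == "tsv":
        # json is the documented default, so the flag is added only for tsv.
        command += ["--output-format", "tsv"]
    return command


if __name__ == "__main__":
    print(" ".join(build_command("bam-mode", "sample.bam", "tsv")))
    print(" ".join(build_command("template-mode")))
```

The returned list can be passed to `subprocess.run(command, check=True)`; a list avoids shell quoting problems with BAM paths that contain spaces.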


