Pelops

Dedicated caller for DUX4 rearrangements from whole genome sequencing data.

Citing

Pelops is based on a method first described in:

Ryan, S.L., Peden, J.F., Kingsbury, Z. et al. Whole genome sequencing provides comprehensive genetic testing in childhood B-cell acute lymphoblastic leukaemia. Leukemia 37, 518–528 (2023). https://doi.org/10.1038/s41375-022-01806-8

Pelops itself is described and validated in:

Grobecker, P., Mijuskovic, M., et al. Pelops: A dedicated caller for DUX4 rearrangements from short-read whole genome sequencing data. In preparation (2024)

Prerequisites

  • Python 3.7 or above.

Installation

You can install the latest stable release of Pelops using pip:

pip install ilmn-pelops --upgrade

Note: pip/PyPI will prefer stable versions (e.g. 0.7.0) over subsequent beta releases (e.g. 0.8.0b1). If you need to install a beta version for testing, please first uninstall your current version of Pelops:

pip uninstall ilmn-pelops

Usage

Pelops is a command-line interface (CLI) tool. Discover its usage with:

pelops --help

Calling DUX4-rearrangements

To call DUX4-rearrangements from a BAM/CRAM file, use the dux4r subcommand. To see all available options run

pelops dux4r --help

Inputs

The input to Pelops is a short-read whole-genome sequencing BAM or CRAM file from a tumour sample, aligned to the GRCh38 reference genome. The BAM/CRAM file needs to be indexed. Pelops was tested on alignments by DRAGEN (version 4.0.3), bwa (version 0.7.17), and Isaac (version SAAC01325.18.01.29).

Systematic noise BED file

To increase specificity when calling non-IGH DUX4-rearrangements, we recommend using a systematic noise BED file. This file contains genomic regions that will be ignored by Pelops when identifying candidate regions involved in a DUX4-rearrangement. Since such regions can be specific to the read alignment tool, reference genome, sequencing protocol, and cancer type analysed, we recommend creating a separate systematic noise BED file for each project. One way to obtain these genomic regions would be to run Pelops on a panel of normal samples, which are guaranteed to have no DUX4-rearrangements, and generate a list of false-positive calls.
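The panel-of-normals idea above could be sketched roughly as follows. This is only an illustration, not part of Pelops: the helper names `noise_regions` and `to_bed` are hypothetical, every UNNAMED partner region found in a normal sample is treated as noise without any evidence threshold, and it is assumed that UNNAMED region sets carry the same "regions" list as the pre-defined ones in the JSON output described under Outputs. Whether the JSON coordinates can be written to BED unchanged (BED is 0-based, half-open) is also an assumption that should be checked for your project.

```python
def noise_regions(pelops_results):
    """Collect UNNAMED partner regions from parsed Pelops JSON results.

    `pelops_results` is a list of dicts, one per normal sample.
    Assumption: region set "B" of an UNNAMED call has a "regions" list
    with "chrom", "start" and "end" keys, like the pre-defined sets.
    """
    regions = set()
    for result in pelops_results:
        for rearrangement in result.get("rearrangements", []):
            partner = rearrangement["B"]
            if partner["name"] != "UNNAMED":
                continue  # IGH calls are pre-defined, not candidate noise
            for region in partner["regions"]:
                regions.add((region["chrom"], region["start"], region["end"]))
    return sorted(regions)


def to_bed(regions):
    """Format (chrom, start, end) tuples as tab-separated BED lines."""
    return "".join(f"{chrom}\t{start}\t{end}\n" for chrom, start, end in regions)


# Made-up result from one normal sample, for illustration only.
normal_sample = {
    "rearrangements": [
        {"id": "01", "A": {"name": "CoreDUX4"}, "B": {"name": "IGH"}},
        {
            "id": "03",
            "A": {"name": "CoreDUX4"},
            "B": {
                "name": "UNNAMED",
                "regions": [{"chrom": "chr1", "start": 100, "end": 200}],
            },
        },
    ]
}

bed_text = to_bed(noise_regions([normal_sample]))
print(bed_text, end="")
```

In practice the results would be loaded with `json.load` from the JSON file of each normal sample, and overlapping regions might need to be merged before use.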

Outputs

Pelops outputs results in a JSON file, and optionally exports supporting reads in SAM files.

Description of JSON output

The top level of the JSON contains information about Pelops (assumed reference genome, program name, version, and CLI command) and about the input file (the number of unique mapped reads, which can be supplied by the user). It also contains the list of rearrangements investigated by Pelops.

{
    "reference": "GRCh38",
    "unique_mapped_reads": 1000000000,
    "rearrangements": [...],
    "program_name": "pelops",
    "version": "0.5.0",
    "cli_command": "pelops dux4r --total-number-reads 1000000000 --export . test.bam"
}

The rearrangements consist of a unique ID, genomic region sets "A" and "B", and the evidence for the rearrangement between these two regions. For the command pelops dux4r, ID 01 always corresponds to rearrangements between the core DUX4 regions and IGH, while ID 02 corresponds to rearrangements of the extended DUX4 regions with IGH. IDs 03 and beyond are potential rearrangements of the core DUX4 region with other genomic regions (marked as UNNAMED); there can be a variable number of them.

{
  "rearrangements": [
    {
      "id": "01",
      "A": {"name": "CoreDUX4"...},
      "B": {"name": "IGH"...},
      "evidence": {...}
    },
    {
      "id": "02",
      "A": {"name": "ExtendedDUX4"...},
      "B": {"name": "IGH"...},
      "evidence": {...}
    },
    {
      "id": "03",
      "A": {"name": "CoreDUX4"...},
      "B": {"name": "UNNAMED"...},
      "evidence": {...}
    }
  ]
}

A and B document the exact set of genomic regions used for each rearrangement. For example, the core DUX4 region is shown below. While IGH, CoreDUX4 and ExtendedDUX4 are pre-defined, each UNNAMED region will be different.

{
  "name": "CoreDUX4",
  "regions": [
    {
      "chrom": "chr4",
      "start": 190020407,
      "end": 190023665
    },
    {
      "chrom": "chr4",
      "start": 190066935,
      "end": 190093279
    },
    {
      "chrom": "chr4",
      "start": 190172774,
      "end": 190176845
    },
    {
      "chrom": "chr10",
      "start": 133663429,
      "end": 133685936
    },
    {
      "chrom": "chr10",
      "start": 133739606,
      "end": 133762125
    }
  ]
}
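As a small illustration of how a region set could be read programmatically, the sketch below parses a shortened copy of the CoreDUX4 example and prints each region as a `chrom:start-end` string. The helper name `format_regions` is hypothetical and not part of Pelops.

```python
import json

# Shortened copy of the CoreDUX4 region set shown above (two of five regions).
core_dux4 = json.loads("""
{
  "name": "CoreDUX4",
  "regions": [
    {"chrom": "chr4", "start": 190020407, "end": 190023665},
    {"chrom": "chr10", "start": 133663429, "end": 133685936}
  ]
}
""")


def format_regions(region_set):
    """Return the regions of a region set as 'chrom:start-end' strings."""
    return [f"{r['chrom']}:{r['start']}-{r['end']}" for r in region_set["regions"]]


for text in format_regions(core_dux4):
    print(core_dux4["name"], text)
```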

The evidence for each rearrangement consists of the number of split and paired reads between region sets A and B, and the spanning read pairs per billion (SRPB). SRPB normalises the supporting read count by sequencing depth and is calculated as $$\text{SRPB} = 10^9 \frac{\text{paired reads} + \text{split reads}}{\text{total unique and mapped reads}}.$$

{
  "paired_reads": 15,
  "split_reads": 4,
  "SRPB": 19.0
}
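The SRPB formula can be checked by hand with a few lines of Python. The function name `srpb` is hypothetical and this is not the Pelops implementation. The call uses the read counts from the evidence example together with the 1000000000 unique mapped reads from the top-level example.

```python
def srpb(paired_reads, split_reads, total_unique_mapped_reads):
    """Spanning read pairs per billion, following the formula above."""
    if total_unique_mapped_reads <= 0:
        raise ValueError("total_unique_mapped_reads must be positive")
    return 1e9 * (paired_reads + split_reads) / total_unique_mapped_reads


value = srpb(paired_reads=15, split_reads=4, total_unique_mapped_reads=1_000_000_000)
print(value)  # 19.0
```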

SAM files

Optionally, for each rearrangement a SAM file can be exported which contains all paired and split reads with their mates. The naming convention is <id>_<name_A>-<name_B>.sam, where <id>, <name_A>, <name_B> correspond to the ID and names of genomic region sets A and B, respectively, as documented in the JSON.
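To locate the exported SAM file for a given rearrangement, the naming convention can be reproduced from the JSON fields. The sketch below is an illustration only, and the helper name `sam_file_name` is hypothetical.

```python
def sam_file_name(rearrangement):
    """Build '<id>_<name_A>-<name_B>.sam' from one rearrangement entry."""
    return (
        f"{rearrangement['id']}_"
        f"{rearrangement['A']['name']}-{rearrangement['B']['name']}.sam"
    )


# Entry with the ID and names of the first rearrangement in the JSON example.
example = {"id": "01", "A": {"name": "CoreDUX4"}, "B": {"name": "IGH"}}
print(sam_file_name(example))  # 01_CoreDUX4-IGH.sam
```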

Contributing

We are not accepting pull requests into this repository at this time, as the licence currently does not allow modifications by third parties. For bug reports, recommendations, or feature requests, please open an issue.

Credits

See Authors.
