Pelops

Dedicated caller for DUX4 rearrangements from whole genome sequencing data.

Citing

Pelops is based on a method first described in:

Ryan, S.L., Peden, J.F., Kingsbury, Z. et al. Whole genome sequencing provides comprehensive genetic testing in childhood B-cell acute lymphoblastic leukaemia. Leukemia 37, 518–528 (2023). https://doi.org/10.1038/s41375-022-01806-8

Pelops itself is described and validated in:

Grobecker, P., Mijuskovic, M., et al. Pelops: A dedicated caller for DUX4 rearrangements from short-read whole genome sequencing data. In preparation (2024)

Prerequisites

  • Python 3.7 or above.

Installation

You can install the latest stable release of Pelops using pip:

pip install ilmn-pelops --upgrade

Note: pip/PyPI will prefer stable versions (e.g. 0.7.0) over subsequent beta releases (e.g. 0.8.0b1). If you need to install a beta version for testing, please first uninstall your current version of Pelops:

pip uninstall ilmn-pelops

Usage

Pelops is a tool with a command line interface (CLI). Discover its usage with:

pelops --help

Calling DUX4-rearrangements

To call DUX4-rearrangements from a BAM/CRAM file, use the dux4r subcommand. To see all available options, run:

pelops dux4r --help

Inputs

The input to Pelops is a short-read whole-genome sequencing BAM or CRAM file from a tumour sample, aligned to the GRCh38 reference genome. The BAM/CRAM file needs to be indexed. Pelops was tested on alignments by DRAGEN (version 4.0.3), bwa (version 0.7.17), and Isaac (version SAAC01325.18.01.29).

Systematic noise BED file

To increase specificity when calling non-IGH DUX4-rearrangements, we recommend using a systematic noise BED file. This file contains genomic regions that will be ignored by Pelops when identifying candidate regions involved in a DUX4-rearrangement. Since such regions can be specific to the read alignment tool, reference genome, sequencing protocol, and cancer type analysed, we recommend creating a separate systematic noise BED file for each project. One way to obtain these genomic regions would be to run Pelops on a panel of normal samples, which are guaranteed to have no DUX4-rearrangements, and generate a list of false-positive calls.
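As a rough illustration of the panel-of-normals idea, the sketch below collects the UNNAMED regions from Pelops JSON reports and formats them as BED lines. It relies only on the JSON layout documented under Outputs; the helper names `unnamed_regions` and `to_bed_lines` are hypothetical and not part of Pelops. Coordinates are copied unchanged, on the assumption that they already follow the convention Pelops expects in the BED file, and overlapping regions are not merged.

```python
import json
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]


def unnamed_regions(report: dict) -> List[Region]:
    """Collect the regions of every region set named UNNAMED in one report."""
    regions: List[Region] = []
    for rearrangement in report.get("rearrangements", []):
        for side in ("A", "B"):
            region_set = rearrangement.get(side, {})
            if region_set.get("name") != "UNNAMED":
                continue
            for r in region_set.get("regions", []):
                regions.append((r["chrom"], int(r["start"]), int(r["end"])))
    return regions


def to_bed_lines(regions: Iterable[Region]) -> List[str]:
    """De-duplicate, sort, and format regions as tab-separated BED lines."""
    return [f"{chrom}\t{start}\t{end}" for chrom, start, end in sorted(set(regions))]


def load_report(path: str) -> dict:
    """Read one Pelops JSON report from disk."""
    with open(path) as handle:
        return json.load(handle)
```

Running `unnamed_regions(load_report(path))` over the report of each normal sample and passing the combined list to `to_bed_lines` would give candidate lines for the noise file. Whether every such call should be treated as noise is a project-level decision.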

Outputs

Pelops outputs results in a JSON file, and optionally exports supporting reads in SAM files.

Description of JSON output

The top level of the JSON contains information about Pelops (assumed reference genome, version, program name, and CLI command) and about the input file (the number of unique mapped reads, which can be supplied by the user). It also contains the list of rearrangements investigated by Pelops.

{
    "reference": "GRCh38",
    "unique_mapped_reads": 1000000000,
    "rearrangements": [...],
    "program_name": "pelops",
    "version": "0.5.0",
    "cli_command": "pelops dux4r --total-number-reads 1000000000 --export . test.bam"
}
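A minimal sketch of reading these top-level fields in Python, using only the standard library. The function name `summarise_report` is hypothetical, and the example report mirrors the values shown above with an empty rearrangement list.

```python
import json


def summarise_report(report: dict) -> str:
    """Build a one-line summary from the top-level fields of a Pelops report."""
    return (
        f"{report['program_name']} {report['version']} "
        f"({report['reference']}): "
        f"{report['unique_mapped_reads']} unique mapped reads, "
        f"{len(report['rearrangements'])} rearrangements"
    )


# Example report with the same top-level keys as shown above.
example = json.loads(
    """
    {
        "reference": "GRCh38",
        "unique_mapped_reads": 1000000000,
        "rearrangements": [],
        "program_name": "pelops",
        "version": "0.5.0",
        "cli_command": "pelops dux4r --total-number-reads 1000000000 --export . test.bam"
    }
    """
)
print(summarise_report(example))
```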

The rearrangements consist of a unique ID, genomic region sets "A" and "B", and the evidence for the rearrangement between these two regions. For the command pelops dux4r, ID 01 always corresponds to rearrangements between the core DUX4 regions and IGH, while ID 02 corresponds to rearrangements of the extended DUX4 regions with IGH. IDs 03 and beyond are potential rearrangements of the core DUX4 region with other genomic regions (marked as UNNAMED); there can be a variable number of them.

{
  "rearrangements": [
    {
      "id": "01",
      "A": {"name": "CoreDUX4"...},
      "B": {"name": "IGH"...},
      "evidence": {...}
    },
    {
      "id": "02",
      "A": {"name": "ExtendedDUX4"...},
      "B": {"name": "IGH"...},
      "evidence": {...}
    },
    {
      "id": "03",
      "A": {"name": "CoreDUX4"...},
      "B": {"name": "UNNAMED"...},
      "evidence": {...}
    }
  ]
}
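To separate the IGH rearrangements from the candidate non-IGH ones, the rearrangement list can be partitioned by the name of region set B. This is only a sketch: `split_by_partner` is a hypothetical helper, and it assumes that the partner region is always reported as B, as in the example above.

```python
from typing import List, Tuple


def split_by_partner(rearrangements: List[dict]) -> Tuple[List[str], List[str]]:
    """Return (IDs with IGH as region set B, IDs with any other region set B)."""
    igh_ids: List[str] = []
    other_ids: List[str] = []
    for rearrangement in rearrangements:
        if rearrangement["B"]["name"] == "IGH":
            igh_ids.append(rearrangement["id"])
        else:
            other_ids.append(rearrangement["id"])
    return igh_ids, other_ids


# Region details and evidence are omitted here; only the names matter.
rearrangements = [
    {"id": "01", "A": {"name": "CoreDUX4"}, "B": {"name": "IGH"}},
    {"id": "02", "A": {"name": "ExtendedDUX4"}, "B": {"name": "IGH"}},
    {"id": "03", "A": {"name": "CoreDUX4"}, "B": {"name": "UNNAMED"}},
]
igh_ids, other_ids = split_by_partner(rearrangements)
```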

A and B document the exact set of genomic regions used for each rearrangement. For example, the core DUX4 region is shown below. While IGH, CoreDUX4 and ExtendedDUX4 are pre-defined, each UNNAMED region will be different.

{
  "name": "CoreDUX4",
  "regions": [
    {
      "chrom": "chr4",
      "start": 190020407,
      "end": 190023665
    },
    {
      "chrom": "chr4",
      "start": 190066935,
      "end": 190093279
    },
    {
      "chrom": "chr4",
      "start": 190172774,
      "end": 190176845
    },
    {
      "chrom": "chr10",
      "start": 133663429,
      "end": 133685936
    },
    {
      "chrom": "chr10",
      "start": 133739606,
      "end": 133762125
    }
  ]
}

The evidence for each rearrangement consists of the number of split reads and paired reads between region sets A and B, and the spanning read pairs per billion (SRPB). SRPB is calculated as $$\text{SRPB} = 10^9 \frac{\text{paired reads} + \text{split reads}}{\text{total unique and mapped reads}}.$$

{
  "paired_reads": 15,
  "split_reads": 4,
  "SRPB": 19.0
}
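The SRPB formula can be checked by hand against the example values. The sketch below is an illustrative reimplementation of the formula, not the Pelops source; the function name `srpb` is an assumption.

```python
def srpb(paired_reads: int, split_reads: int, total_reads: int) -> float:
    """Spanning read pairs per billion unique mapped reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 1e9 * (paired_reads + split_reads) / total_reads


# 15 paired and 4 split reads in a sample with 1,000,000,000 unique mapped reads.
value = srpb(15, 4, 1_000_000_000)
print(value)
```

With one billion unique mapped reads, SRPB equals the raw count of supporting reads (19.0 here). The same evidence in a sample with half as many reads would give 38.0.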

SAM files

Optionally, for each rearrangement a SAM file can be exported which contains all paired and split reads with their mates. The naming convention is <id>_<name_A>-<name_B>.sam, where <id>, <name_A>, <name_B> correspond to the ID and names of genomic region sets A and B, respectively, as documented in the JSON.
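The naming convention makes it possible to predict the SAM file name for a given JSON entry. A small sketch, with `sam_file_name` as a hypothetical helper that simply applies the pattern above:

```python
def sam_file_name(rearrangement: dict) -> str:
    """Apply the <id>_<name_A>-<name_B>.sam pattern to one rearrangement entry."""
    return (
        f"{rearrangement['id']}_"
        f"{rearrangement['A']['name']}-{rearrangement['B']['name']}.sam"
    )


entry = {"id": "01", "A": {"name": "CoreDUX4"}, "B": {"name": "IGH"}}
print(sam_file_name(entry))
```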

Contributing

We are not accepting pull requests into this repository at this time, as the licence currently does not allow modifications by third parties. For any bug report / recommendation / feature request, please open an issue.

Credits

See Authors.
