Dirichlet allocation of mutations in cancer genomes

Dirichlet Allocation of MUTAtions in cancer

Damage and Misrepair Signatures: Compact Representations of Pan-cancer Mutational Processes

Documentation Status Python 3.8+ License: CC BY-NC-SA 4.0 PyPI version


DAMUTA signature definitions

N.B. Internally, these signatures are referred to by their symbols in the graphical model: eta and phi, respectively.

Features

  • Separately model damage and misrepair processes
  • Estimate activities of DAMUTA signatures
  • Fit new damage and misrepair signatures de novo

Model

[Image: DAMUTA model]

Installation

DAMUTA is built on PyMC3, which depends on Theano. To use Theano with a GPU, you will need to install pygpu. The simplest way to do so is via conda.

conda env create -f damuta_env.yml

From PyPI

pip install damuta

From Environment File

Clone this repo:

git clone https://github.com/morrislab/damuta
cd damuta

conda env create -f damuta_env.yml
conda activate damuta
pip install -e .
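
After either install route, a quick check that the packages are visible to the active environment can save debugging later. The following is a minimal sketch using only the standard library (Python 3.8+). The helper name `installed_version` is hypothetical, and the distribution names `damuta` and `pymc3` are assumed to match what pip or conda registered in your environment.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names here are assumptions; adjust them to your environment.
    for name in ("damuta", "pymc3"):
        print(name, installed_version(name) or "not installed")
```

If `damuta` is reported as not installed, the most likely cause is that the `damuta` conda environment is not the active one.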

theanorc

To use the GPU, ~/.theanorc should contain the following:

[global]
floatX = float64
device = cuda

Otherwise, device will default to CPU.
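
To confirm what the config file actually requests before launching a long run, the file can be parsed directly. This is a minimal sketch using the standard-library `configparser`. The helper name `read_theano_settings` is hypothetical, and it only reads the file: it does not check that Theano can really reach the GPU.

```python
import configparser
from pathlib import Path


def read_theano_settings(path):
    """Return (device, floatX) from the [global] section of a theanorc file.

    A missing file, section, or key yields None for the affected value.
    """
    parser = configparser.ConfigParser()
    parser.read(path)  # files that do not exist are silently skipped
    device = parser.get("global", "device", fallback=None)
    floatx = parser.get("global", "floatX", fallback=None)
    return device, floatx


if __name__ == "__main__":
    device, floatx = read_theano_settings(Path.home() / ".theanorc")
    print("device:", device or "not set (Theano will default to CPU)")
    print("floatX:", floatx or "not set")
```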

Data

Some files are omitted from this repository due to access restrictions. Access can be requested from the corresponding sources:

Data for reproducing manuscript figures

Unrestricted-access data and certain useful intermediate files can also be downloaded from Zenodo.

To download and organize these data:

# in top-level directory
wget https://zenodo.org/records/15685052/files/damuta_zenodo.zip
unzip damuta_zenodo.zip

mv damuta_zenodo/data/* manuscript/data
mv damuta_zenodo/figure_data/* manuscript/results/figure_data

# clean up now-empty directories
rmdir damuta_zenodo/data damuta_zenodo/figure_data damuta_zenodo
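
On systems without `unzip` or `mv`, the same unpack-and-organize steps can be sketched in Python with the standard library. The function name `organize_zenodo` and the `MOVES` mapping are hypothetical. The sketch assumes the archive has already been downloaded and unpacks into `damuta_zenodo/data` and `damuta_zenodo/figure_data`, as the shell commands above imply.

```python
import shutil
import zipfile
from pathlib import Path

# Folders inside the archive and where their contents go in the repo,
# mirroring the two `mv` commands above.
MOVES = {
    "damuta_zenodo/data": "manuscript/data",
    "damuta_zenodo/figure_data": "manuscript/results/figure_data",
}


def organize_zenodo(zip_path, repo_root="."):
    """Extract the archive into repo_root and move its contents into place."""
    repo_root = Path(repo_root)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(repo_root)
    for src_rel, dst_rel in MOVES.items():
        src, dst = repo_root / src_rel, repo_root / dst_rel
        # Unlike plain `mv`, create the destination if it is missing.
        dst.mkdir(parents=True, exist_ok=True)
        for item in src.iterdir():
            shutil.move(str(item), str(dst / item.name))
        src.rmdir()  # the source folder is empty once everything has moved
    # Raises OSError if the archive contained anything besides the two folders.
    (repo_root / "damuta_zenodo").rmdir()


if __name__ == "__main__":
    organize_zenodo("damuta_zenodo.zip")
```

Run it from the top-level directory of the repository, next to the downloaded `damuta_zenodo.zip`.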

Some useful public data

file name | info | source
COSMIC_v3.2_SBS_GRCh37.csv | | COSMIC database
icgc_sample_annotations_summary_table.txt | sample annotations used by PCAWG heterogeneity & evolution working group | ICGC data portal
PCAWG_sigProfiler_SBS_signatures_in_samples | counts of mutations attributed to each signature for PCAWG samples | syn11738669.7
pcawg_counts.csv | mutation type counts in PCAWG samples | Derived from syn7357330
pcawg_cancer_types.csv | sample annotations used in Jiao et al. | Adapted from z-scores file
gel_clinical_ann.csv | tumour type annotations for 18640 samples (ICGC, HMF, GEL) | Adapted from Degasperi et al. table S6
gel_counts.csv | mutation type counts for 18640 samples (ICGC, HMF, GEL) | Adapted from Degasperi et al. table S7

Citation


@misc{harrigan_damage_2025,
	title = {Damage and {Misrepair} {Signatures}: {Compact} {Representations} of {Pan}-cancer {Mutational} {Processes}},
	copyright = {© 2025, Posted by Cold Spring Harbor Laboratory. This pre-print is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), CC BY-NC 4.0, as described at http://creativecommons.org/licenses/by-nc/4.0/},
	shorttitle = {Damage and {Misrepair} {Signatures}},
	url = {https://www.biorxiv.org/content/10.1101/2025.05.29.656360v1},
	doi = {10.1101/2025.05.29.656360},
	language = {en},
	urldate = {2025-06-02},
	publisher = {bioRxiv},
	author = {Harrigan, Caitlin F. and Campbell, Kieran and Morris, Quaid and Funnell, Tyler},
	month = jun,
	year = {2025},
}
