Dirichlet allocation of mutations in cancer genomes
Dirichlet Allocation of MUTAtions in cancer
Damage and Misrepair Signatures: Compact Representations of Pan-cancer Mutational Processes
DAMUTA signature definitions
N.B. Internally, these signatures are referred to by their symbols in the graphical model: eta and phi, respectively.
Features
- Separately model damage and misrepair processes
- Estimate activities of DAMUTA signatures
- Fit new damage and misrepair signatures de novo
Model
Installation
DAMUTA is built on PyMC3, which depends on Theano. To use Theano with a GPU, you will need to install pygpu. The simplest way to do so is via conda.
conda env create -f damuta_env.yml
From PyPI
pip install damuta
From Environment File
Clone this repo:
git clone https://github.com/morrislab/damuta
conda env create -f damuta_env.yml
conda activate damuta
pip install -e .
theanorc
To use the GPU, ~/.theanorc should contain the following:
[global]
floatX = float64
device = cuda
Otherwise, device will default to CPU.
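To confirm which device a given configuration file requests before starting a long fit, the file can be read with Python's standard `configparser`. This is a minimal sketch, not part of DAMUTA: the helper name `theano_device` is hypothetical, and it only inspects the file, so it ignores any override set through the `THEANO_FLAGS` environment variable.

```python
import configparser


def theano_device(rc_path):
    """Return the device named in the [global] section of a theanorc file.

    Hypothetical helper for illustration only. Falls back to "cpu" when the
    file, the section, or the option is missing, mirroring the default
    described above.
    """
    parser = configparser.ConfigParser()
    parser.read(rc_path)  # a missing file is skipped without raising
    return parser.get("global", "device", fallback="cpu")
```

For example, `theano_device(pathlib.Path.home() / ".theanorc")` reports the device requested by the file in your home directory.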
Data
Some files are omitted from this repository due to access restrictions. Access can be requested from the corresponding sources:
Data for reproducing manuscript figures
Unrestricted-access data and certain useful intermediate files can also be downloaded from Zenodo.
To download and organize these data:
# in top-level directory
wget https://zenodo.org/records/15685052/files/damuta_zenodo.zip
unzip damuta_zenodo.zip
mv damuta_zenodo/data/* manuscript/data
mv damuta_zenodo/figure_data/* manuscript/results/figure_data
# clean up now-empty directories
rmdir damuta_zenodo/data damuta_zenodo/figure_data damuta_zenodo
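On systems without `wget` or `unzip`, the same steps can be approximated with the Python standard library. The sketch below is illustrative only: the function names `download_and_extract` and `organize` are hypothetical, and it assumes, as the shell commands do, that the archive unpacks into a `damuta_zenodo` folder containing `data` and `figure_data`. Unlike `mv`, it creates the destination folders if they do not exist and also moves hidden files.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path

ZENODO_URL = "https://zenodo.org/records/15685052/files/damuta_zenodo.zip"


def download_and_extract(url=ZENODO_URL, workdir="."):
    """Download the archive into workdir and unpack it there."""
    archive = Path(workdir) / "damuta_zenodo.zip"
    urllib.request.urlretrieve(url, str(archive))
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(workdir)


def organize(extracted, manuscript):
    """Move the unpacked folders into the manuscript tree, then clean up."""
    extracted, manuscript = Path(extracted), Path(manuscript)
    targets = {
        "data": manuscript / "data",
        "figure_data": manuscript / "results" / "figure_data",
    }
    for name, dest in targets.items():
        dest.mkdir(parents=True, exist_ok=True)
        # snapshot the listing before moving anything out of the folder
        for item in list((extracted / name).iterdir()):
            shutil.move(str(item), str(dest / item.name))
        (extracted / name).rmdir()  # raises if anything was left behind
    extracted.rmdir()
```

From the top-level directory, calling `download_and_extract()` and then `organize("damuta_zenodo", "manuscript")` mirrors the shell commands above.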
Some useful public data
| file name | info | source |
|---|---|---|
| COSMIC_v3.2_SBS_GRCh37.csv | COSMIC database | |
| icgc_sample_annotations_summary_table.txt | sample annotations used by PCAWG heterogeneity & evolution working group | ICGC data portal |
| PCAWG_sigProfiler_SBS_signatures_in_samples | counts of mutations attributed to each signature for PCAWG samples | syn11738669.7 |
| pcawg_counts.csv | mutation type counts in PCAWG samples | Derived from syn7357330 |
| pcawg_cancer_types.csv | sample annotations used in Jiao et al. | Adapted from z-scores file |
| gel_clinical_ann.csv | tumour type annotations for 18640 samples (ICGC, HMF, GEL) | Adapted from Degasperi et al. table S6 |
| gel_counts.csv | mutation type counts for 18640 samples (ICGC, HMF, GEL) | Adapted from Degasperi et al. table S7 |
Citation
@misc{harrigan_damage_2025,
title = {Damage and {Misrepair} {Signatures}: {Compact} {Representations} of {Pan}-cancer {Mutational} {Processes}},
copyright = {© 2025, Posted by Cold Spring Harbor Laboratory. This pre-print is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), CC BY-NC 4.0, as described at http://creativecommons.org/licenses/by-nc/4.0/},
shorttitle = {Damage and {Misrepair} {Signatures}},
url = {https://www.biorxiv.org/content/10.1101/2025.05.29.656360v1},
doi = {10.1101/2025.05.29.656360},
language = {en},
urldate = {2025-06-02},
publisher = {bioRxiv},
author = {Harrigan, Caitlin F. and Campbell, Kieran and Morris, Quaid and Funnell, Tyler},
month = jun,
year = {2025},
}