Mutational Signature Simulation
SomaticSiMu
SomaticSiMu simulates single and double base pair substitutions, as well as single base pair insertions and deletions, using biologically representative mutational signature probabilities and signature combinations. SomaticSiMu_GUI is the GUI version of SomaticSiMu.
Description
Simulated genomes with known, imposed mutational signatures associated with cancer are useful for benchmarking machine learning-based classifiers of genomic sequences and for fine-tuning model hyperparameters. SomaticSiMu extracts known signatures from reference signature data, generates novel mutations on an input sequence according to a set of user-specified parameters, and outputs the simulated mutated sequence as a machine-readable FASTA file, together with metadata on the position, frequency, and local sequence context of each mutation. The simulation can also model temporal directed evolution across early and late stages of 37 cancer types. SomaticSiMu is a lightweight, standalone, and massively parallel software tool with a graphical user interface, built-in documentation, and functions for visualizing mutational signature plots. Its range of input parameters and its graphical user interface make it both easy to use and suitable for a wide range of experimental scenarios.
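To make the idea of imposing a signature on a sequence concrete, here is a minimal sketch of context-dependent single base substitution. It is not SomaticSiMu's implementation: the function name impose_substitutions, the TOY_SIGNATURE table, and its probabilities are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy "signature": probability that the middle base of a
# trinucleotide context is replaced, keyed by (context, new_base).
# The values are illustrative only, not real signature data.
TOY_SIGNATURE = {
    ("ACA", "T"): 0.30,  # A[C>T]A
    ("TCG", "A"): 0.10,  # T[C>A]G
}


def impose_substitutions(sequence, signature, rng):
    """Make one pass over the sequence and return (mutated_sequence, metadata)."""
    bases = list(sequence)
    metadata = []
    # The first and last bases have no full trinucleotide context, so skip them.
    for i in range(1, len(sequence) - 1):
        # Contexts are read from the original, unmutated sequence.
        context = sequence[i - 1:i + 2]
        for (ctx, new_base), prob in signature.items():
            if ctx == context and rng.random() < prob:
                bases[i] = new_base
                metadata.append(
                    {"position": i, "context": context,
                     "ref": sequence[i], "alt": new_base}
                )
                break  # at most one substitution per position
    return "".join(bases), metadata


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so a run is repeatable
    mutated, meta = impose_substitutions("GACATCGACATTCGA", TOY_SIGNATURE, rng)
    print(mutated)
    print(meta)
```

The metadata records mirror the kind of information described above (position and local sequence context of each mutation), but the field names here are assumptions.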
Installation
SomaticSiMu is implemented in Python and should run on any system with Python installed.
$ git clone https://github.com/HillLab/SomaticSiMu
File Structure
├── DBS_Expected_Frequency
├── Documentation
├── Frequency_Table
├── ID_Expected_Frequency
├── Mutation_Metadata
├── Reference
├── Reference_genome
├── Sample
├── Signature_Combinations
├── kmer_ref_count
│ ├── 1-mer
│ ├── 2-mer
│ ├── 3-mer
│ ├── 4-mer
│ ├── 5-mer
│ ├── 6-mer
├── SomaticSiMu.py
├── SomaticSiMu_CC.py
Quick Start
Simulate 100 sequences by imposing the known mutational signatures associated with Biliary-AdenoCA onto the entire length of the reference human chromosome 22.
cd SomaticSiMu
python SomaticSiMu_GUI.py
Input Simulation Parameters:
cancer_type = Biliary-AdenoCA
reading_frame = 1
std_outlier = 3
number_of_lineages = 100
simulation_type = end
sequence_abs_path = Homo_sapiens.GRCh38.dna.chromosome.22.fasta
slice_start = 0
slice_end = 50818467
power = 1
syn_rate = 1
non_syn_rate = 1
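The std_outlier setting excludes signature data that lies more than n standard deviations from the mean. A minimal sketch of that kind of filter follows. It assumes the sample standard deviation and a plain list of numbers; the function name filter_outliers and the example values are hypothetical, and the tool's own filtering may differ.

```python
import statistics


def filter_outliers(values, n_std=3):
    """Keep only values within n_std sample standard deviations of the mean."""
    if len(values) < 2:
        # A standard deviation needs at least two data points.
        return list(values)
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # assumption: sample standard deviation
    return [v for v in values if abs(v - mean) <= n_std * std]


if __name__ == "__main__":
    # Hypothetical per-sample mutation counts with one extreme value.
    burdens = [10, 12, 11, 13, 9, 100]
    print(filter_outliers(burdens, n_std=2))
    print(filter_outliers(burdens, n_std=3))
```

A larger n_std keeps more of the extreme values, so the cutoff trades robustness against retaining rare but real signal.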
Parameter List
"--generation", "-g", help="number of simulated sequences", default=10
"--cancer", "-c", help="cancer type"
"--reading_frame", "-f", help="index start of reading frame", default=1
"--std", "-s", help="exclude signature data outside of n std from the mean", default=3
"--simulation_type", "-v", help="simulation type", default="end"
"--slice_start", "-a", help="start of the slice of the input sequence, default=None (start at first base)"
"--slice_end", "-b", help="end of the slice of the input sequence, default=None (end at last base)"
"--power", "-p", help="multiplier of mutation burden from burden observed in in vivo samples", default=1
"--syn_rate", "-x", help="proportion of synonymous mutations out of all simulated mutations kept in the output simulated sequence", default=1
"--non_syn_rate", "-y", help="proportion of non-synonymous mutations out of all simulated mutations kept in the output simulated sequence", default=1
"--reference", "-r", help="full file path of reference sequence used as input for the simulation"
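The parameter list above reads like a set of argparse definitions. The sketch below rebuilds it as a standalone parser to show how the flags and defaults fit together. The flag names and defaults come from the list; the type= converters, the shortened help strings, and the build_parser name are assumptions, and the actual script may define its arguments differently.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented flags (types are assumed)."""
    p = argparse.ArgumentParser(description="Sketch of SomaticSiMu-style parameters")
    p.add_argument("--generation", "-g", type=int, default=10,
                   help="number of simulated sequences")
    p.add_argument("--cancer", "-c", help="cancer type")
    p.add_argument("--reading_frame", "-f", type=int, default=1,
                   help="index start of reading frame")
    p.add_argument("--std", "-s", type=int, default=3,
                   help="exclude signature data outside of n std from the mean")
    p.add_argument("--simulation_type", "-v", default="end",
                   help="simulation type")
    p.add_argument("--slice_start", "-a", type=int, default=None,
                   help="start of the slice (None: first base)")
    p.add_argument("--slice_end", "-b", type=int, default=None,
                   help="end of the slice (None: last base)")
    p.add_argument("--power", "-p", type=float, default=1,
                   help="multiplier of mutation burden")
    p.add_argument("--syn_rate", "-x", type=float, default=1,
                   help="proportion of synonymous mutations kept")
    p.add_argument("--non_syn_rate", "-y", type=float, default=1,
                   help="proportion of non-synonymous mutations kept")
    p.add_argument("--reference", "-r",
                   help="full file path of the reference sequence")
    return p


# Parse an explicit argument list that matches the Quick Start settings.
args = build_parser().parse_args([
    "-g", "100",
    "-c", "Biliary-AdenoCA",
    "-r", "Homo_sapiens.GRCh38.dna.chromosome.22.fasta",
])
print(args)
```

Flags that are omitted fall back to the documented defaults, so only the cancer type, the reference path, and any non-default values need to be supplied.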
Output
Sample: Simulated sequences, written to a directory named after the simulated cancer type.
Mutation_Metadata: CSV file listing each simulated mutation, with its mutation type and index location on the reference input sequence. One file for each simulated sequence.
Frequency_Table: CSV file summarizing the counts of each mutation type and local context. One file for each simulated sequence.
Signature_Combinations: CSV file of the signature combinations used in each iteration of the simulation. Different combinations of signatures can be operative in the same cancer type, and the simulation incorporates them. One file for each cancer type simulated.
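The relationship between the per-mutation metadata and the summarized frequency table can be sketched as a simple tally. The record fields (position, type, context), the CSV column names, and both function names below are hypothetical; the real output files may use a different layout.

```python
import csv
import io
from collections import Counter


def frequency_table(mutations):
    """Count mutations by (mutation type, local context)."""
    return Counter((m["type"], m["context"]) for m in mutations)


def to_csv(counts):
    """Render the counts as CSV text with an assumed three-column layout."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["mutation_type", "context", "count"])
    for (mutation_type, context), count in sorted(counts.items()):
        writer.writerow([mutation_type, context, count])
    return buffer.getvalue()


# Hypothetical metadata records for one simulated sequence.
records = [
    {"position": 12, "type": "C>T", "context": "ACA"},
    {"position": 40, "type": "C>T", "context": "ACA"},
    {"position": 77, "type": "C>A", "context": "TCG"},
]
table = frequency_table(records)
print(to_csv(table))
```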
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
License
Creative Commons Attribution 4.0 International license
PyPi Hosting
Source Distribution
Built Distribution
File details
Details for the file SomaticSiMu-3.0.0.tar.gz.
File metadata
- Download URL: SomaticSiMu-3.0.0.tar.gz
- Upload date:
- Size: 9.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/51.1.2 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 29937c64c57a27bc43af35e1dab9f0dce6948988f109d5116010d00b481702ca
MD5 | 10fb9bab82868de9240dbf34ad058bff
BLAKE2b-256 | 104010eb20d643adf1bd966f6683d654611d8eba591532b5c1f79adfa73cdfd2
File details
Details for the file SomaticSiMu-3.0.0-py3-none-any.whl.
File metadata
- Download URL: SomaticSiMu-3.0.0-py3-none-any.whl
- Upload date:
- Size: 9.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/51.1.2 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6ec65f3c9ac1a8660d311620320cc6cc70a02d857c68e055c2d7eeaad6ac3f12
MD5 | 7aa9f41f335b5e212aa7153154b1c5e1
BLAKE2b-256 | ec8154e6b28f148fbb14a617bb8c3f729f6656c7a596185edc77cf19adf8e2de