Mutational Signature Simulation

Project description

SomaticSiMu

SomaticSiMu simulates single and double base pair substitutions, and single base pair insertions and deletions, according to biologically representative mutational signature probabilities and combinations. SomaticSiMu_GUI is the GUI version of SomaticSiMu.

Description

Simulated genomes with known cancer-associated mutational signatures imposed on them can be useful for benchmarking machine learning-based classifiers of genomic sequences and for fine-tuning model hyperparameters. SomaticSiMu extracts known mutational signatures from reference signature data, generates novel mutations on an input sequence according to a series of user-specified parameters, and outputs the simulated mutated sequence as a machine-readable FASTA file, along with metadata about the position, frequency and local sequence context of each mutation. The simulation can also model temporal directed evolution across early and late stages of 37 cancer types. SomaticSiMu is developed as a lightweight, standalone, and massively parallel software tool with a graphical user interface, built-in documentation, and visualization functions for mutational signature plots. The rich selection of input parameters and the graphical user interface make SomaticSiMu both an easy-to-use application and an effective component in a wide range of experimental scenarios.
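To make the idea of imposing a signature on a sequence concrete, here is a minimal toy sketch. It is not SomaticSiMu's algorithm: the `TOY_SIGNATURE` table, its probabilities, and the `impose_signature` function are hypothetical and only illustrate context-dependent substitution with per-mutation metadata.

```python
import random

# Hypothetical toy signature: probability that the middle base of a
# trinucleotide context mutates to a given alternative base. Real reference
# signatures cover many more substitution classes; these numbers are made up.
TOY_SIGNATURE = {
    ("ACG", "T"): 0.30,  # C>T in an A[C]G context
    ("TCA", "G"): 0.10,  # C>G in a T[C]A context
    ("CTG", "A"): 0.05,  # T>A in a C[T]G context
}


def impose_signature(sequence, signature, rng):
    """Return (mutated_sequence, metadata) for a toy substitution simulation."""
    bases = list(sequence)
    metadata = []
    # Visit every interior position and look up its trinucleotide context
    # in the original (unmutated) sequence.
    for i in range(1, len(sequence) - 1):
        context = sequence[i - 1:i + 2]
        for (ctx, alt), prob in signature.items():
            if ctx == context and rng.random() < prob:
                bases[i] = alt
                # Record 0-based position, local context, and the change.
                metadata.append({"position": i, "context": context,
                                 "ref": sequence[i], "alt": alt})
                break  # at most one substitution per position
    return "".join(bases), metadata


rng = random.Random(0)  # fixed seed so repeated runs give the same result
reference = "ACGTCAACGCTGACGTCA" * 20
mutated, metadata = impose_signature(reference, TOY_SIGNATURE, rng)
print(len(metadata), "mutations imposed on", len(reference), "bases")
```

The sketch keeps the output sequence the same length as the input and records one metadata entry per mutated position, mirroring the kind of output described above.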

Installation

SomaticSiMu is implemented in Python and should run directly on any system with Python installed.

$ git clone https://github.com/HillLab/SomaticSiMu

File Structure

├── DBS_Expected_Frequency
├── Documentation
├── Frequency_Table
├── ID_Expected_Frequency
├── Mutation_Metadata
├── Reference
├── Reference_genome
├── Sample
├── Signature_Combinations
├── kmer_ref_count
│   ├── 1-mer
│   ├── 2-mer
│   ├── 3-mer
│   ├── 4-mer
│   ├── 5-mer
│   └── 6-mer
├── SomaticSiMu.py
└── SomaticSiMu_CC.py

Quick Start

Simulate 100 sequences by imposing known mutational signatures associated with Biliary-AdenoCA onto the entire length of the reference human chromosome 22.

cd SomaticSiMu

python SomaticSiMu_GUI.py

Input Simulation Parameters: 
cancer_type = Biliary-AdenoCA
reading_frame = 1
std_outlier = 3
number_of_lineages = 100
simulation_type = end
sequence_abs_path = Homo_sapiens.GRCh38.dna.chromosome.22.fasta
slice_start = 0
slice_end = 50818467
power = 1
syn_rate = 1
non_syn_rate = 1

Parameter List

"--generation", "-g", help="number of simulated sequences", default=10
"--cancer", "-c", help="cancer type"
"--reading_frame", "-f", help="index start of reading frame", default=1
"--std", "-s", help="exclude signature data outside of n std from the mean", default=3
"--simulation_type", "-v", help="simulation type", default="end"
"--slice_start", "-a", help="start of the slice of the input sequence, default=None (start at first base)"
"--slice_end", "-b", help="end of the slice of the input sequence, default=None (end at last base)"
"--power", "-p", help="multiplier of mutation burden from burden observed in in vivo samples", default=1
"--syn_rate", "-x", help="proportion of synonymous mutations out of all simulated mutations kept in the output simulated sequence", default=1
"--non_syn_rate", "-y", help="proportion of non-synonymous mutations out of all simulated mutations kept in the output simulated sequence", default=1
"--reference", "-r", help="full file path of reference sequence used as input for the simulation"
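The entries above read like `argparse` argument definitions. The following sketch rebuilds a parser from them to show how the flags and defaults combine. It is a reconstruction for illustration, not the project's source: the `build_parser` name and the `type=` conversions are assumptions, and the list does not say which script accepts these flags.

```python
import argparse


def build_parser():
    """Rebuild the documented flags; argument types are assumptions."""
    parser = argparse.ArgumentParser(description="SomaticSiMu parameter sketch")
    parser.add_argument("--generation", "-g", type=int, default=10,
                        help="number of simulated sequences")
    parser.add_argument("--cancer", "-c", help="cancer type")
    parser.add_argument("--reading_frame", "-f", type=int, default=1,
                        help="index start of reading frame")
    parser.add_argument("--std", "-s", type=int, default=3,
                        help="exclude signature data outside of n std from the mean")
    parser.add_argument("--simulation_type", "-v", default="end",
                        help="simulation type")
    parser.add_argument("--slice_start", "-a", type=int, default=None,
                        help="start of the slice of the input sequence")
    parser.add_argument("--slice_end", "-b", type=int, default=None,
                        help="end of the slice of the input sequence")
    parser.add_argument("--power", "-p", type=float, default=1,
                        help="multiplier of mutation burden")
    parser.add_argument("--syn_rate", "-x", type=float, default=1,
                        help="proportion of synonymous mutations kept")
    parser.add_argument("--non_syn_rate", "-y", type=float, default=1,
                        help="proportion of non-synonymous mutations kept")
    parser.add_argument("--reference", "-r",
                        help="full file path of reference sequence")
    return parser


# Parse the Quick Start settings from an explicit list instead of sys.argv.
args = build_parser().parse_args([
    "-g", "100",
    "-c", "Biliary-AdenoCA",
    "-r", "Homo_sapiens.GRCh38.dna.chromosome.22.fasta",
])
print(args.generation, args.cancer, args.std, args.simulation_type)
```

Flags that are not supplied fall back to the documented defaults, so only the number of sequences, the cancer type, and the reference path need to be given here.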

Output

Sample: Simulated sequences, written to a directory named after the type of cancer simulated.

Mutation_Metadata: CSV file listing each simulated mutation, with its mutation type and index location on the reference input sequence. One file for each simulated sequence.

Frequency_Table: CSV file output of summarized counts of each mutation type and local context. One file for each simulated sequence.

Signature_Combinations: CSV file output of the signature combinations used for each iteration of the simulation. Different combinations of signatures are found operative in the same cancer type and are incorporated into the simulation. One file for each cancer type simulated.
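As a rough sketch of how these outputs could be inspected downstream, the snippet below reads a FASTA file and tallies a metadata CSV by mutation type using only the standard library. The file names, the `mutation_type` column name, and both helper functions are hypothetical. The actual column headers and file naming used by SomaticSiMu are not documented here, so check a real output file first.

```python
import csv
import os
import tempfile
from collections import Counter


def read_fasta(path):
    """Return {header: sequence} for a plain-text FASTA file."""
    records = {}
    header = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:]
                records[header] = []
            elif header is not None:
                records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def count_mutation_types(path, type_column="mutation_type"):
    """Tally CSV rows by a mutation-type column (column name is assumed)."""
    with open(path, newline="") as handle:
        return Counter(row[type_column] for row in csv.DictReader(handle))


# Demonstrate on small made-up files so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    fasta_path = os.path.join(tmp, "example.fasta")
    with open(fasta_path, "w") as handle:
        handle.write(">simulated_1\nACGT\nTTGA\n")

    csv_path = os.path.join(tmp, "example.csv")
    with open(csv_path, "w", newline="") as handle:
        handle.write("mutation_type,position\nC>T,12\nC>T,40\nT>A,7\n")

    sequences = read_fasta(fasta_path)
    counts = count_mutation_types(csv_path)

print(sequences)
print(counts.most_common())
```

For chromosome-scale sequences, a dedicated parser or streaming approach may be preferable to holding every record in memory.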

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

Creative Commons Attribution 4.0 International license

PyPI Hosting

https://pypi.org/project/SomaticSiMu/3.0.0/
