SigProfilerHotSpots tool
SigProfilerClusters
Tool for analyzing the inter-mutational distances (IMDs) between SNV-SNV and INDEL-INDEL mutations. The tool separates mutations into clustered and non-clustered groups on a sample-dependent basis and subclassifies all clustered SNVs into one of four categories of clustered events: i) DBS; ii) MBS; iii) omikli; and iv) kataegis. Indels are not subclassified.
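As a rough illustration of the IMD idea, the sketch below computes, for each mutation, the distance to its nearest neighbour on the same chromosome and splits mutations by a cutoff. The `(chrom, pos)` tuple input, the nearest-neighbour definition, and the fixed `threshold` argument are assumptions for illustration only; the tool itself reads VCF files and derives a sample-dependent threshold from simulated background data.

```python
from collections import defaultdict


def nearest_neighbor_imds(mutations):
    """Map each (chrom, pos) to the distance to its closest neighbour on the
    same chromosome, or None if it is the only mutation there.

    Duplicate (chrom, pos) entries are collapsed into one key.
    """
    by_chrom = defaultdict(set)
    for chrom, pos in mutations:
        by_chrom[chrom].add(pos)

    imds = {}
    for chrom, pos_set in by_chrom.items():
        positions = sorted(pos_set)
        for i, pos in enumerate(positions):
            gaps = []
            if i > 0:
                gaps.append(pos - positions[i - 1])  # gap to previous mutation
            if i < len(positions) - 1:
                gaps.append(positions[i + 1] - pos)  # gap to next mutation
            imds[(chrom, pos)] = min(gaps) if gaps else None
    return imds


def split_clustered(mutations, threshold):
    """Partition mutations into clustered (IMD <= threshold) and non-clustered lists."""
    clustered, non_clustered = [], []
    for key, imd in sorted(nearest_neighbor_imds(mutations).items()):
        if imd is not None and imd <= threshold:
            clustered.append(key)
        else:
            non_clustered.append(key)
    return clustered, non_clustered


# Hypothetical toy input: (chromosome, 1-based position)
muts = [("1", 100), ("1", 150), ("1", 5000), ("2", 42)]
clustered, non_clustered = split_clustered(muts, threshold=100)
print(clustered)      # [('1', 100), ('1', 150)]
print(non_clustered)  # [('1', 5000), ('2', 42)]
```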
INTRODUCTION
The purpose of this document is to provide a guide for using the SigProfilerHotSpots framework. An extensive Wiki page detailing the usage of this tool can be found at https://osf.io/qpmzw/wiki/home/.
PREREQUISITES
The framework is written in Python and uses additional SigProfiler packages:
- Python version 3.4 or newer
- SigProfilerMatrixGenerator (https://github.com/AlexandrovLab/SigProfilerMatrixGenerator)
- SigProfilerSimulator (https://github.com/AlexandrovLab/SigProfilerSimulator)
Please visit their respective GitHub pages for detailed installation and usage instructions.
QUICK START GUIDE
This section will guide you through the minimum steps required to perform clustered analysis:
1a. Install the Python package using pip (current package): pip install SigProfilerClusters
1b. Install the Python package using pip (deprecated version): pip install SigProfilerHotSpots
Install your desired reference genome from the command line/terminal as follows (available reference genomes are: GRCh37, GRCh38, mm9, and mm10):
$ python
>>> from SigProfilerMatrixGenerator import install as genInstall
>>> genInstall.install('GRCh37', rsync=False, bash=True)
This will install the GRCh37 human assembly as a reference genome. You may install as many genomes as you wish. If your server is behind a firewall, you may need to install rsync and use the rsync=True parameter. Similarly, if you do not have bash, use bash=False.
2. Place your vcf files in your desired output folder. It is recommended that you name this folder after your project. Before you can analyze clustered mutations, you need to generate a background model for each of your samples. To do this, generate a minimum of 100 simulations for your project (see SigProfilerSimulator for a detailed list of parameters):
>>> from SigProfilerSimulator import SigProfilerSimulator as sigSim
>>> sigSim.SigProfilerSimulator(project, project_path, genome, contexts=["96"], simulations=100, chrom_based=True)
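Because the clustered/non-clustered split is sample-dependent, the simulations act as a background for how small IMDs become by chance alone. The sketch below shows one simple way such a background could be turned into a per-sample cutoff: pick the largest candidate IMD at which the expected number of mutations below it (averaged over simulations) is at most 10% of the observed number. The 10% criterion, the function names, and the toy data are assumptions for illustration; this is not necessarily the statistical rule SigProfilerHotSpots applies.

```python
from statistics import mean


def count_below(imds, cutoff):
    """Count IMD values at or below `cutoff`, ignoring None entries."""
    return sum(1 for d in imds if d is not None and d <= cutoff)


def pick_threshold(observed_imds, simulated_imds, candidates, max_chance_fraction=0.1):
    """Return the largest candidate cutoff at which the mean number of simulated
    mutations below it is at most `max_chance_fraction` of the observed number.

    `simulated_imds` is a list with one list of IMDs per simulation.
    Returns None if no candidate qualifies.
    """
    best = None
    for cutoff in sorted(candidates):
        observed = count_below(observed_imds, cutoff)
        if observed == 0:
            continue  # nothing observed below this cutoff
        expected = mean(count_below(sim, cutoff) for sim in simulated_imds)
        if expected <= max_chance_fraction * observed:
            best = cutoff
    return best


# Toy data (hypothetical): IMDs from one sample and from two simulations of it
observed = [10, 20, 30, 5000, 8000, 12000]
simulations = [
    [4000, 9000, 15000, 20000, 30000, 50000],
    [500, 7000, 11000, 25000, 40000, 60000],
]
print(pick_threshold(observed, simulations, [100, 1000, 10000]))  # 100
```

More simulations make the averaged background less noisy, which is one reason to generate at least the recommended 100.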
3. Now the original mutations can be partitioned into clustered and non-clustered sets using the required parameters below:
>>> from SigProfilerHotSpots import SigProfilerHotSpots as hp
>>> hp.analysis(project, genome, contexts, simContext, input_path)
See below for a detailed list of available parameters.
4. The partitioned vcf files are placed under [project_path]/output/vcf_files/[project]_clustered/ and [project_path]/output/vcf_files/[project]_nonClustered/. You can visualize the results by looking at the IMD plots available under [project_path]/output/simulations/[project]_simulations_[genome]_[context]_intradistance_plots/.
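To locate the results from a script, the sketch below assembles the result folders for a run and lists the VCF files in a folder. It assumes the top-level folder is `output` and that folder names are joined with underscores (e.g. `[project]_nonClustered`); check both against an actual run, since the layout may differ between versions. `output_dirs` and `list_vcfs` are hypothetical helpers, not part of the package.

```python
from pathlib import Path


def output_dirs(project_path, project, genome, context):
    """Return the result folders for a finished run.

    Folder names are assumed from the README text (underscore-separated);
    verify them against your own output.
    """
    root = Path(project_path) / "output"
    return {
        "clustered": root / "vcf_files" / f"{project}_clustered",
        "non_clustered": root / "vcf_files" / f"{project}_nonClustered",
        "imd_plots": root / "simulations"
        / f"{project}_simulations_{genome}_{context}_intradistance_plots",
    }


def list_vcfs(folder):
    """List VCF files in `folder`, or return [] if the folder does not exist."""
    folder = Path(folder)
    return sorted(folder.glob("*.vcf")) if folder.is_dir() else []


# Hypothetical project name and path
dirs = output_dirs("/data/BRCA", "BRCA", "GRCh37", "96")
for name, path in dirs.items():
    print(name, path, len(list_vcfs(path)))
```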
AVAILABLE PARAMETERS
Required:
project: [string] Unique name for the given project
genome: [string] Reference genome to use. Must be installed using SigProfilerMatrixGenerator
contexts: [string] Mutation context for measuring IMD (e.g. "6", "96", "1536", etc.)
simContext: [list of strings] Mutation context that was used for generating the background model (e.g. ["6144"] or ["96"])
input_path: [string] Path to the given project
Optional:
analysis: [string] Desired analysis pipeline. By default analysis='all'. Other options include "subClassify" and "hotspot".
sortSims: [boolean] Option to sort the simulated files. The files must be sorted for accurate results, so by default sortSims=True; set it to False only if the simulations have already been sorted.
interdistance: [string] The mutation types between which IMDs are calculated. Use only when performing analysis of indels (default='ID').
calculateIMD: [boolean] Parameter to calculate the IMDs. Setting it to False saves time if you only need to rerun the subclassification step (default=True).
chrom_based: [boolean] Option to generate chromosome-dependent IMDs per sample. By default chrom_based=False.
max_cpu: [integer] Number of CPUs to allocate. By default all CPUs are used.
subClassify: [boolean] Subclassify the clustered mutations. Requires that VAF scores are available in TCGA or Sanger format. By default subClassify=False.
plotIMDfigure: [boolean] Parameter that generates IMD and mutational spectra plots for each sample (default=True).
plotRainfall: [boolean] Parameter that generates rainfall plots for each sample using the subclassification of clustered events (default=True).
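To make the four subclassification categories more concrete, here is a simplified, VAF-free sketch of how clustered SNVs on one chromosome could be grouped into events and labelled. The rules are assumptions for illustration: consecutive mutations whose gap is at most a threshold form one event; an event of exactly two adjacent bases is labelled DBS and one of three or more adjacent bases MBS; otherwise events of two to three mutations are labelled omikli and events of four or more kataegis. The actual subclassification also uses VAFs and may apply further criteria; `group_events` and `classify_event` are hypothetical names.

```python
def group_events(positions, threshold):
    """Group positions on one chromosome into events: runs of consecutive
    mutations whose gaps are all <= threshold. Singletons are dropped."""
    events, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] <= threshold:
            current.append(pos)
        else:
            if len(current) > 1:
                events.append(current)
            current = [pos]
    if len(current) > 1:
        events.append(current)
    return events


def classify_event(event):
    """Label an event with simplified, VAF-free rules (assumed, not the tool's)."""
    all_adjacent = all(b - a == 1 for a, b in zip(event, event[1:]))
    if all_adjacent:
        return "DBS" if len(event) == 2 else "MBS"
    return "omikli" if len(event) <= 3 else "kataegis"


# Hypothetical positions of SNVs on one chromosome
snvs = [100, 101, 500, 502, 503, 504, 1000, 1001, 1002, 2000, 2100, 90000]
for event in group_events(snvs, threshold=200):
    print(event, classify_event(event))
# [100, 101] DBS
# [500, 502, 503, 504] kataegis
# [1000, 1001, 1002] MBS
# [2000, 2100] omikli
```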
The following parameters are used if the subClassify argument is True:
includedVAFs: [boolean] Indicates whether VAFs are included in the dataset (default=True).
sanger: [boolean] The input files are from Sanger. By default sanger=True.
TCGA: [boolean] The input files are from TCGA. By default TCGA=False.
windowSize: [integer] Window size for calculating mutation density in the rainfall plots. By default windowSize=10000000.
correction: [boolean] Optional parameter to perform a genome-wide mutational density correction (default=False).
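The windowSize parameter sets the bin width for the mutation density shown in the rainfall plots. A minimal sketch of such binning is shown below, assuming non-overlapping windows whose starts are multiples of windowSize; the tool's own density calculation and the optional correction may differ. `window_counts` and `density_per_mb` are hypothetical helpers.

```python
from collections import Counter


def window_counts(positions, window_size=10000000):
    """Count mutations per non-overlapping window on one chromosome.

    Keys are window starts (multiples of window_size); empty windows are omitted.
    """
    counts = Counter((pos // window_size) * window_size for pos in positions)
    return dict(sorted(counts.items()))


def density_per_mb(counts, window_size=10000000):
    """Convert per-window counts into mutations per megabase."""
    megabases = window_size / 1000000
    return {start: n / megabases for start, n in counts.items()}


# Hypothetical positions on one chromosome, using the default 10 Mb window
counts = window_counts([5, 9999999, 10000000, 25000000])
print(counts)                  # {0: 2, 10000000: 1, 20000000: 1}
print(density_per_mb(counts))  # {0: 0.2, 10000000: 0.1, 20000000: 0.1}
```

A smaller window gives a finer but noisier density track; a larger one smooths over local hotspots.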
LOG FILES
All errors and progress checkpoints are saved into SigProfilerHotSpots_[project]_[genome].err and SigProfilerHotSpots_[project]_[genome].out, respectively. If you encounter an error, please email both the error and progress log files to the primary contact under CONTACT INFORMATION.
CITATION
Bergstrom EN, Luebeck J, Petljak M, Bafna V, Mischel PS, Harris RS, and Alexandrov LB (2021) Comprehensive analysis of clustered mutations in cancer reveals recurrent APOBEC3 mutagenesis of ecDNA. bioRxiv 2021.05.27.445689; doi: https://doi.org/10.1101/2021.05.27.445689
COPYRIGHT
Copyright (c) 2021, Erik Bergstrom [Alexandrov Lab] All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
CONTACT INFORMATION
Please address any queries or bug reports to Erik Bergstrom at ebergstr@eng.ucsd.edu