Mutational signatures attribution and decomposition tool
SigProfilerAssignment
SigProfilerAssignment is a new mutational attribution and decomposition tool that performs the following functions:
- Attributing a known set of mutational signatures to an individual sample or multiple samples.
- Decomposing de novo signatures into signatures from the COSMIC signature database.
- Attributing COSMIC database or a custom signature database to given samples.
The tool identifies the activity of each signature in the sample and assigns the probability for each signature to cause a specific mutation type in the sample. The tool makes use of SigProfilerMatrixGenerator, SigProfilerExtractor and SigProfilerPlotting.
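As a rough illustration of the probability assignment described above, the probability that a signature caused a given mutation type can be pictured as that signature's expected contribution to the mutation type divided by the total expected count across all signatures. The sketch below uses invented signature names, activities and a three-channel toy context. It is an assumption about the general idea, not the package's implementation.

```python
# Toy illustration only: names and numbers are invented for this sketch.
# Each signature is a probability distribution over mutation types.
signatures = {
    "SigA": {"C>A": 0.7, "C>T": 0.2, "T>C": 0.1},
    "SigB": {"C>A": 0.1, "C>T": 0.6, "T>C": 0.3},
}
# Activity = number of mutations attributed to each signature in one sample.
activities = {"SigA": 100.0, "SigB": 300.0}


def signature_probabilities(signatures, activities, mutation_type):
    """Return P(signature | mutation type) for one sample."""
    contributions = {
        name: activities[name] * profile[mutation_type]
        for name, profile in signatures.items()
    }
    total = sum(contributions.values())
    if total == 0:
        return {name: 0.0 for name in contributions}
    return {name: value / total for name, value in contributions.items()}


probs = signature_probabilities(signatures, activities, "C>A")
print(probs)
```

The probabilities for one mutation type sum to 1 across signatures, so a signature with low activity can still dominate a mutation type that the other signatures rarely produce.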
Installation
To install from PyPI in a new conda environment:
$ pip install SigProfilerAssignment
To install from source, git clone this repo or download the zip file. Unzip the contents of SigProfilerAssignment-master.zip or the zip file of the corresponding branch.
$ cd SigProfilerAssignment-master
$ pip install .
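A quick way to confirm that the installation succeeded is to check that the interpreter can locate the package. The helper name `is_installed` below is hypothetical and uses only the standard library.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be located."""
    return importlib.util.find_spec(package_name) is not None


# The import name matches the one used throughout this document.
print(is_installed("SigProfilerAssignment"))
```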
Signature Subgroups
exclude_signature_subgroups = ['remove_MMR_deficiency_signatures',
'remove_POL_deficiency_signatures',
'remove_HR_deficiency_signatures',
'remove_BER_deficiency_signatures',
'remove_Chemotherapy_signatures',
'remove_Immunosuppressants_signatures',
'remove_Treatment_signatures',
'remove_APOBEC_signatures',
'remove_Tobacco_signatures',
'remove_UV_signatures',
'remove_AA_signatures',
'remove_Colibactin_signatures',
'remove_Artifact_signatures',
'remove_Lymphoid_signatures']
Signature subgroup | SBS signatures excluded | DBS signatures excluded | ID signatures excluded |
---|---|---|---|
MMR_deficiency_signatures | 6, 14, 15, 20, 21, 26, 44 | 7, 10 | 7 |
POL_deficiency_signatures | 10a, 10b, 10c, 10d, 28 | 3 | - |
HR_deficiency_signatures | 3 | - | 6 |
BER_deficiency_signatures | 30, 36 | - | - |
Chemotherapy_signatures | 11, 25, 31, 35, 86, 87, 90 | 5 | - |
Immunosuppressants_signatures | 32 | - | - |
Treatment_signatures | 11, 25, 31, 32, 35, 86, 87, 90 | 5 | - |
APOBEC_signatures | 2, 13 | - | - |
Tobacco_signatures | 4, 29, 92 | 2 | 3 |
UV_signatures | 7a, 7b, 7c, 7d, 38 | 1 | 13 |
AA_signatures | 22 | - | - |
Colibactin_signatures | 88 | - | 18 |
Artifact_signatures | 27, 43, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 95 | - | - |
Lymphoid_signatures | 9, 84, 85 | - | - |
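The table can also be read as a lookup from subgroup name to the signatures it removes. The sketch below transcribes a few SBS rows from the table into a plain dictionary, keyed by the `remove_...` names used in the list above. The dictionary and the `excluded_sbs` helper are illustrative and not part of the package. It also shows that the Treatment subgroup covers the union of the Chemotherapy and Immunosuppressants subgroups.

```python
# SBS signatures excluded per subgroup, copied from the table above.
SBS_EXCLUDED = {
    "remove_Chemotherapy_signatures": ["11", "25", "31", "35", "86", "87", "90"],
    "remove_Immunosuppressants_signatures": ["32"],
    "remove_Treatment_signatures": ["11", "25", "31", "32", "35", "86", "87", "90"],
    "remove_APOBEC_signatures": ["2", "13"],
    "remove_UV_signatures": ["7a", "7b", "7c", "7d", "38"],
}


def excluded_sbs(subgroups):
    """Return the set of SBS labels removed by the selected subgroups."""
    excluded = set()
    for name in subgroups:
        excluded.update("SBS" + label for label in SBS_EXCLUDED[name])
    return excluded


chemo_and_immuno = excluded_sbs(["remove_Chemotherapy_signatures",
                                 "remove_Immunosuppressants_signatures"])
treatment = excluded_sbs(["remove_Treatment_signatures"])
print(chemo_and_immuno == treatment)
```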
Decompose Fit
Decomposes the de novo signatures into COSMIC signatures and assigns the COSMIC signatures to samples.
from SigProfilerAssignment import Analyzer as Analyze
Analyze.decompose_fit(samples,
output,
signatures=signatures,
signature_database=sigs,
genome_build="GRCh37",
verbose=False,
new_signature_thresh_hold=0.8,
exclude_signature_subgroups=exclude_signature_subgroups,
exome=False)
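The new_signature_thresh_hold argument is expressed as a cosine similarity. As a reminder of what that measure computes, the sketch below compares two toy profiles in plain Python. The vectors are invented, and how the package applies the threshold internally is not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length nonnegative vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented four-channel profiles, for illustration only.
de_novo = [0.40, 0.30, 0.20, 0.10]
reconstructed = [0.35, 0.35, 0.20, 0.10]

similarity = cosine_similarity(de_novo, reconstructed)
threshold = 0.8  # default value of new_signature_thresh_hold
print(round(similarity, 3), similarity >= threshold)
```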
Analysis
De Novo Fit
Attributes the mutations of the given samples to the input de novo signatures.
from SigProfilerAssignment import Analyzer as Analyze
Analyze.denovo_fit( samples,
output,
signatures=signatures,
signature_database=sigs,
genome_build="GRCh37",
verbose=False)
COSMIC Fit
Attributes the mutations of the given samples to the input COSMIC signatures. Note that the penalties associated with the de novo fit and the COSMIC fit are different.
from SigProfilerAssignment import Analyzer as Analyze
Analyze.cosmic_fit( samples,
output,
signatures=None,
signature_database=sigs,
genome_build="GRCh37",
verbose=False,
collapse_to_SBS96=False,
make_plots=True,
exclude_signature_subgroups=exclude_signature_subgroups,
exome=False
)
Main Parameters
Parameter | Variable Type | Parameter Description |
---|---|---|
samples | String | Path to the input file or folder, depending on input_type. |
output | String | Path to the output folder. |
input_type | String | The type of input, for example "vcf" or "seg:ASCAT_NGS" as used in the examples below. |
context_type | String | Required if input_type is "vcf". Specifies which mutational context of the input data is used for assignment. Valid options include "96", "288", "1536", "DINUC", and "ID". The default value is "96". |
signatures | String | Path to a tab delimited file that contains the signature table, where the rows are mutation types and the columns are signature IDs. |
genome_build | String | The reference genome build. List of supported genomes: "GRCh37", "GRCh38", "mm9", "mm10" and "rn6". The default value is "GRCh37". If the selected genome is not in the supported list, the default genome will be used. |
cosmic_version | Float | The version of the COSMIC reference signatures. Valid options are 1, 2, 3, 3.1, 3.2 and 3.3. The default value is 3.3. |
new_signature_thresh_hold | Float | Cosine similarity threshold used to declare a new signature. Applicable to decompose_fit only. The default value is 0.8. |
exclude_signature_subgroups | List | Removes the signatures corresponding to specific subgroups for better fitting. The usage is given in the Signature Subgroups section above. The default value is None. |
exome | Boolean | Defines if the exome renormalized signatures will be used. The default value is False. |
export_probabilities | Boolean | Defines if the probability matrix per mutational context for all samples is created. The default value is True. |
export_probabilities_per_mutation | Boolean | Defines if the probability matrices per mutation for all samples are created. Only available when input_type is "vcf". The default value is False. |
make_plots | Boolean | Toggle on and off for making and saving all plots. The default value is True. |
verbose | Boolean | Defines if status statements are printed during the analysis. The default value is False. |
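The fallback described for genome_build and the fixed set of cosmic_version values can be pictured with a small validation helper. Everything below, including the function name `normalize_options` and its behavior on an unsupported COSMIC version, is a hypothetical sketch based on the table and not code from the package.

```python
SUPPORTED_GENOMES = ("GRCh37", "GRCh38", "mm9", "mm10", "rn6")
SUPPORTED_COSMIC_VERSIONS = (1, 2, 3, 3.1, 3.2, 3.3)


def normalize_options(genome_build="GRCh37", cosmic_version=3.3):
    """Mimic the documented defaults; hypothetical helper for illustration."""
    if genome_build not in SUPPORTED_GENOMES:
        # The table states that unsupported genomes fall back to the default.
        genome_build = "GRCh37"
    if cosmic_version not in SUPPORTED_COSMIC_VERSIONS:
        # Assumption of this sketch: reject unknown versions rather than guess.
        raise ValueError(f"Unsupported COSMIC version: {cosmic_version}")
    return genome_build, cosmic_version


print(normalize_options("hg19", 3.3))
```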
Examples
SPA analysis - Example for a matrix
#import modules
import SigProfilerAssignment as spa
from SigProfilerAssignment import Analyzer as Analyze
#set directories and paths to signatures and samples
dir_inp = spa.__path__[0]+'/data/Examples/'
samples = dir_inp+"Input_scenario_8/Samples.txt"
output = "output_example/"
signatures = dir_inp+"Results_scenario_8/SBS96/All_Solutions/SBS96_3_Signatures/Signatures/SBS96_S3_Signatures.txt"
sigs = "COSMIC_v3_SBS_GRCh37_noSBS84-85.txt" #Custom Signature Database
#Analysis of SP Assignment
Analyze.cosmic_fit( samples,
output,
signatures=None,
signature_database=sigs,
genome_build="GRCh37",
cosmic_version=3.3,
verbose=False,
collapse_to_SBS96=False,
make_plots=True,
exclude_signature_subgroups=None,
exome=False)
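The signatures argument expects a tab delimited table with mutation types as rows and signature IDs as columns. A minimal sketch of reading such a table with the standard library follows. The header label MutationType, the signature IDs and the values are invented for illustration, and a real file contains one row per mutation type of the chosen context.

```python
import csv
import io

# Invented three-row excerpt; a real SBS96 table has 96 rows.
table_text = (
    "MutationType\tSBS96A\tSBS96B\n"
    "A[C>A]A\t0.02\t0.10\n"
    "A[C>A]C\t0.03\t0.05\n"
    "A[C>A]G\t0.01\t0.00\n"
)


def read_signature_table(handle):
    """Return {signature_id: {mutation_type: value}} from a TSV handle."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    signature_ids = header[1:]
    table = {sig: {} for sig in signature_ids}
    for row in reader:
        mutation_type, values = row[0], row[1:]
        for sig, value in zip(signature_ids, values):
            table[sig][mutation_type] = float(value)
    return table


signatures_table = read_signature_table(io.StringIO(table_text))
print(sorted(signatures_table))
```

To read a file on disk, pass `open(path, newline="")` instead of the in-memory handle.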
SPA analysis - Example for input vcf files
#import modules
import SigProfilerAssignment as spa
from SigProfilerAssignment import Analyzer as Analyze
import os
#set directories and paths to signatures and samples
dir_inp = os.path.join(spa.__path__[0], 'data/Examples/')
# directory of vcf files
samples = os.path.join(spa.__path__[0], 'data/tests/vcf_input/')
output = "output_example/"
signatures = os.path.join(dir_inp,
"Results_scenario_8/SBS96/All_Solutions/SBS96_3_Signatures/Signatures/",
"SBS96_S3_Signatures.txt")
sigs = "COSMIC_v3_SBS_GRCh37_noSBS84-85.txt" #Custom Signature Database
#Analysis of SP Assignment
Analyze.cosmic_fit( samples,
output,
input_type="vcf",
context_type="96",
signatures=None,
signature_database=sigs,
genome_build="GRCh37",
cosmic_version=3.3,
verbose=False,
collapse_to_SBS96=False,
make_plots=True,
exclude_signature_subgroups=None,
exome=False)
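When building paths with os.path.join, later components should be relative: a component that starts with a path separator is treated as absolute and discards everything before it. The sketch below uses posixpath, the implementation behind os.path on Linux and macOS, so the result does not depend on the platform. The package directory shown is a made-up placeholder.

```python
import posixpath

package_dir = "/opt/env/site-packages/SigProfilerAssignment"  # placeholder

# Relative component: appended to the package directory.
relative = posixpath.join(package_dir, "data/tests/vcf_input/")

# Component with a leading slash: the package directory is dropped.
absolute = posixpath.join(package_dir, "/data/tests/vcf_input/")

print(relative)
print(absolute)
```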
SPA analysis - Example for an input multi-sample segmentation file
#import modules
import SigProfilerAssignment as spa
from SigProfilerAssignment import Analyzer as Analyze
import os
#set directories and paths to signatures and samples
dir_inp = os.path.join(spa.__path__[0], 'data/Examples/')
# segmentation file
samples = os.path.join(spa.__path__[0],
'data/tests/cnv_input/all.breast.ascat.summary.sample.tsv')
output = "output_example/"
#Analysis of SP Assignment
Analyze.cosmic_fit( samples,
output,
input_type="seg:ASCAT_NGS",
context_type="CNV48",
signatures=None,
signature_database=None,
genome_build="GRCh37",
cosmic_version=3.3,
verbose=False,
collapse_to_SBS96=False,
make_plots=True,
exclude_signature_subgroups=None,
exome=False)
Copyright
This software and its documentation are copyright 2022 as a part of the SigProfiler project. The SigProfilerAssignment framework is free software and is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
Contact Information
Please address any queries or bug reports to Raviteja Vangara at rvangara@health.ucsd.edu or Marcos Díaz-Gay at mdiazgay@health.ucsd.edu.