Mutational signatures attribution and decomposition tool

SigProfilerAssignment

SigProfilerAssignment is a new mutational attribution and decomposition tool that performs the following functions:

  • Attributing a known set of mutational signatures to an individual sample or multiple samples.
  • Decomposing de novo signatures into COSMIC reference signatures.
  • Attributing the COSMIC signature database or a custom signature database to given samples.

The tool identifies the activity of each signature in the sample and assigns the probability for each signature to cause a specific mutation type in the sample. The tool makes use of SigProfilerMatrixGenerator, SigProfilerExtractor and SigProfilerPlotting.
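
To make the idea of a per-mutation probability concrete, the sketch below shows one common way such probabilities can be derived from a signature table and a sample's activities: the expected contribution of each signature to a mutation type, normalized across signatures. This is an illustrative assumption, not necessarily the exact computation the tool performs; the function name and the toy numbers are hypothetical.

```python
def signature_probabilities(signatures, activities):
    """Estimate, per mutation type, the probability that each signature caused it.

    signatures: dict of signature name -> dict of mutation type -> probability
    activities: dict of signature name -> mutations attributed in one sample
    Returns a dict of mutation type -> dict of signature name -> probability.
    """
    mutation_types = next(iter(signatures.values())).keys()
    result = {}
    for mtype in mutation_types:
        # Expected number of mutations of this type contributed by each signature.
        contrib = {
            sig: signatures[sig][mtype] * activities.get(sig, 0.0)
            for sig in signatures
        }
        total = sum(contrib.values())
        result[mtype] = {
            sig: (value / total if total > 0 else 0.0)
            for sig, value in contrib.items()
        }
    return result


# Toy data (hypothetical): two signatures over two mutation types.
toy_signatures = {
    "SigA": {"A[C>A]A": 0.75, "A[C>T]A": 0.25},
    "SigB": {"A[C>A]A": 0.25, "A[C>T]A": 0.75},
}
toy_activities = {"SigA": 100, "SigB": 100}

probs = signature_probabilities(toy_signatures, toy_activities)
print(probs["A[C>A]A"])
```

With equal activities, the signature that places more weight on a mutation type receives the larger share of the probability for that type.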

Installation

To install from PyPI in a new conda environment:

$ pip install SigProfilerAssignment

To install from source: git clone this repo or download the zip file. Unzip the contents of SigProfilerAssignment-master.zip or the zip file of the corresponding branch.

$ cd SigProfilerAssignment-master
$ pip install .
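
After either installation route, a quick check from Python can confirm that the package is importable in the active environment. The helper name `is_installed` is hypothetical; it relies only on the standard library.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be located by the import system."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("SigProfilerAssignment"):
    print("SigProfilerAssignment is available")
else:
    print("SigProfilerAssignment not found; run the pip install step first")
```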

Signature Subgroups

exclude_signature_subgroups = ['remove_MMR_deficiency_signatures',
                               'remove_POL_deficiency_signatures',
                               'remove_HR_deficiency_signatures' ,
                               'remove_BER_deficiency_signatures',
                               'remove_Chemotherapy_signatures',
                               'remove_Immunosuppressants_signatures',
                               'remove_Treatment_signatures',
                               'remove_APOBEC_signatures',
                               'remove_Tobacco_signatures',
                               'remove_UV_signatures',
                               'remove_AA_signatures',
                               'remove_Colibactin_signatures',
                               'remove_Artifact_signatures',
                               'remove_Lymphoid_signatures']

Signature subgroup | SBS signatures excluded | DBS signatures excluded | ID signatures excluded
MMR_deficiency_signatures | 6, 14, 15, 20, 21, 26, 44 | 7, 10 | 7
POL_deficiency_signatures | 10a, 10b, 10c, 10d, 28 | 3 | -
HR_deficiency_signatures | 3 | - | 6
BER_deficiency_signatures | 30, 36 | - | -
Chemotherapy_signatures | 11, 25, 31, 35, 86, 87, 90 | 5 | -
Immunosuppressants_signatures | 32 | - | -
Treatment_signatures | 11, 25, 31, 32, 35, 86, 87, 90 | 5 | -
APOBEC_signatures | 2, 13 | - | -
Tobacco_signatures | 4, 29, 92 | 2 | 3
UV_signatures | 7a, 7b, 7c, 7d, 38 | 1 | 13
AA_signatures | 22 | - | -
Colibactin_signatures | 88 | - | 18
Artifact_signatures | 27, 43, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 95 | - | -
Lymphoid_signatures | 9, 84, 85 | - | -
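
As a rough illustration of how the subgroup options relate to the table above, the sketch below looks up which SBS signatures a given exclude_signature_subgroups list would remove. The dictionary is a hand-transcribed subset of the table and the helper `excluded_sbs` is hypothetical; it is not part of SigProfilerAssignment. It also rejects unknown option names, which catches two names fused together by a missing comma in the list.

```python
# SBS signatures per subgroup, transcribed from the table above (subset only).
SBS_BY_SUBGROUP = {
    "MMR_deficiency_signatures": ["6", "14", "15", "20", "21", "26", "44"],
    "HR_deficiency_signatures": ["3"],
    "APOBEC_signatures": ["2", "13"],
    "Tobacco_signatures": ["4", "29", "92"],
    "UV_signatures": ["7a", "7b", "7c", "7d", "38"],
}


def excluded_sbs(exclude_signature_subgroups):
    """Return the SBS signature names removed by the given subgroup options."""
    excluded = []
    for option in exclude_signature_subgroups:
        if not option.startswith("remove_"):
            raise ValueError(f"Option must start with 'remove_': {option!r}")
        subgroup = option[len("remove_"):]
        if subgroup not in SBS_BY_SUBGROUP:
            # Also triggered by names concatenated through a missing comma.
            raise ValueError(f"Unknown signature subgroup option: {option!r}")
        for number in SBS_BY_SUBGROUP[subgroup]:
            name = "SBS" + number
            if name not in excluded:
                excluded.append(name)
    return excluded


print(excluded_sbs(["remove_APOBEC_signatures", "remove_UV_signatures"]))
```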

Decompose Fit

Decomposes the de novo signatures into COSMIC signatures and assigns the COSMIC signatures to samples.

from SigProfilerAssignment import Analyzer as Analyze
Analyze.decompose_fit(samples, 
                       output, 
                       signatures=signatures,
                       signature_database=sigs,
                       genome_build="GRCh37", 
                       verbose=False,
                       new_signature_thresh_hold=0.8,
                       exclude_signature_subgroups=exclude_signature_subgroups,
                       exome=False)

Analysis

De Novo Fit

Attributes mutations of given samples to input de novo signatures.

from SigProfilerAssignment import Analyzer as Analyze
Analyze.denovo_fit( samples,
                    output, 
                    signatures=signatures,
                    signature_database=sigs,
                    genome_build="GRCh37", 
                    verbose=False)

COSMIC Fit

Attributes mutations of given Samples to input COSMIC signatures. Note that penalties associated with denovo fit and COSMIC fits are different.

from SigProfilerAssignment import Analyzer as Analyze
Analyze.cosmic_fit( samples, 
                    output, 
                    signatures=None,
                    signature_database=sigs,
                    genome_build="GRCh37", 
                    verbose=False,
                    collapse_to_SBS96=False,
                    make_plots=True,
                    exclude_signature_subgroups=exclude_signature_subgroups,
                    exome=False
)

Main Parameters

Parameter | Variable Type | Parameter Description
samples | String | Path to input file for input_type:
  • "matrix"
  • "seg:TYPE"
Path to input folder for input_type:
  • "vcf"
output | String | Path to the output folder.
input_type | String | The type of input:
  • "matrix": used for table format inputs using a tab-separated file where the rows are mutation types and the columns are sample IDs.
  • "vcf": used for mutation calling file inputs (VCFs, MAFs or simple text files).
  • "seg:TYPE": used for a multi-sample segmentation file for copy number analysis. The accepted callers for TYPE are the following {"ASCAT", "ASCAT_NGS", "SEQUENZA", "ABSOLUTE", "BATTENBERG", "FACETS", "PURPLE", "TCGA"}. For example, when using a segmentation file from BATTENBERG, set input_type to "seg:BATTENBERG".
The default value is "matrix".
context_type | String | Required if input_type is "vcf". Specifies which mutational context of the input data is used for assignment. Valid options include "96", "288", "1536", "DINUC", and "INDEL". The default value is "96".
signatures | String | Path to a tab-delimited file that contains the signature table, where the rows are mutation types and the columns are signature IDs.
genome_build | String | The reference genome build. List of supported genomes: "GRCh37", "GRCh38", "mm9", "mm10" and "rn6". The default value is "GRCh37". If the selected genome is not in the supported list, the default genome will be used.
cosmic_version | Float | Takes a positive float among 1, 2, 3, 3.1, 3.2 and 3.3. Defines the version of the COSMIC reference signatures. The default value is 3.3.
new_signature_thresh_hold | Float | Parameter in cosine similarity to declare a new signature. Applicable for decompose_fit only. The default value is 0.8.
exclude_signature_subgroups | List | Removes the signatures corresponding to specific subtypes for better fitting. The usage is given above. The default value is None.
exome | Boolean | Defines if the exome renormalized signatures will be used. The default value is False.
export_probabilities | Boolean | Defines if the probability matrix is created. The default value is True.
make_plots | Boolean | Toggle on and off for making and saving all plots. The default value is True.
verbose | Boolean | Prints statements. The default value is False.
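
The new_signature_thresh_hold parameter is expressed in terms of cosine similarity. As a reminder of what that measure computes, here is a minimal sketch that compares two toy mutation profiles against the default value of 0.8. How the tool applies the threshold internally is not described here, so the comparison at the end is only an assumption for illustration; the vectors are made up.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_u * norm_v)


# Toy profiles over three mutation types (hypothetical values).
original = [0.5, 0.3, 0.2]
reconstructed = [0.45, 0.35, 0.2]

similarity = cosine_similarity(original, reconstructed)
threshold = 0.8  # default value of new_signature_thresh_hold
print(round(similarity, 3), similarity >= threshold)
```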

Examples

SPA analysis - Example for a matrix

#import modules
import SigProfilerAssignment as spa
from SigProfilerAssignment import Analyzer as Analyze

#set directories and paths to signatures and samples
dir_inp     = spa.__path__[0]+'/data/Examples/'
samples     = dir_inp+"Input_scenario_8/Samples.txt"
output      = "output_example/"
signatures  = dir_inp+"Results_scenario_8/SBS96/All_Solutions/SBS96_3_Signatures/Signatures/SBS96_S3_Signatures.txt"
sigs        = "COSMIC_v3_SBS_GRCh37_noSBS84-85.txt" #Custom Signature Database

#Analysis of SP Assignment 
Analyze.cosmic_fit( samples, 
                    output, 
                    signatures=None,
                    signature_database=sigs,
                    genome_build="GRCh37",
                    cosmic_version=3.3,
                    verbose=False,
                    collapse_to_SBS96=False,
                    make_plots=True,
                    exclude_signature_subgroups=None,
                    exome=False)
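
For reference, the "matrix" input type expects a tab-separated file with mutation types as rows and sample IDs as columns. The sketch below parses a tiny in-memory table of that shape using only the standard library, which can help when sanity-checking an input file before running an assignment. The header label of the first column and the counts are assumptions made for illustration, and `read_matrix` is a hypothetical helper.

```python
import csv
import io

# Toy matrix: rows are mutation types, columns are sample IDs (made-up counts).
toy_matrix = (
    "MutationTypes\tSample1\tSample2\n"
    "A[C>A]A\t10\t3\n"
    "A[C>A]C\t7\t0\n"
)


def read_matrix(handle):
    """Parse a tab-separated mutation matrix into {sample: {mutation_type: count}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_ids = header[1:]  # first column holds the mutation type labels
    counts = {sample: {} for sample in sample_ids}
    for row in reader:
        if not row:
            continue  # skip blank lines
        mutation_type = row[0]
        for sample, value in zip(sample_ids, row[1:]):
            counts[sample][mutation_type] = int(value)
    return counts


counts = read_matrix(io.StringIO(toy_matrix))
totals = {sample: sum(by_type.values()) for sample, by_type in counts.items()}
print(totals)
```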

SPA analysis - Example for input vcf files

#import modules
import SigProfilerAssignment as spa
from SigProfilerAssignment import Analyzer as Analyze
import os

#set directories and paths to signatures and samples
# path components are kept relative; a leading '/' would discard the package path
dir_inp = os.path.join(spa.__path__[0], 'data/Examples/')
# directory of vcf files
samples = os.path.join(spa.__path__[0], 'data/tests/vcf_input/')
output = "output_example/"
signatures = os.path.join(dir_inp, \
    "Results_scenario_8/SBS96/All_Solutions/SBS96_3_Signatures/Signatures/" \
    + "SBS96_S3_Signatures.txt")
sigs = "COSMIC_v3_SBS_GRCh37_noSBS84-85.txt" #Custom Signature Database

#Analysis of SP Assignment 
Analyze.cosmic_fit( samples, 
                    output,
                    input_type="vcf",
                    context_type="96", 
                    signatures=None,
                    signature_database=sigs,
                    genome_build="GRCh37",
                    cosmic_version=3.3,
                    verbose=False,
                    collapse_to_SBS96=False,
                    make_plots=True,
                    exclude_signature_subgroups=None,
                    exome=False)
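
When building package-relative paths such as the ones in this example, the components passed to os.path.join should be relative: a component that starts with a slash is treated as absolute and causes everything before it to be discarded. The sketch below demonstrates this with posixpath (the POSIX implementation behind os.path on Linux and macOS) so that it behaves the same on every platform; the package directory is a hypothetical placeholder.

```python
import posixpath

# Hypothetical install location standing in for spa.__path__[0].
package_dir = "/site-packages/SigProfilerAssignment"

# A leading slash makes the second component absolute, so package_dir is dropped.
wrong = posixpath.join(package_dir, "/data/Examples/")

# A relative component is appended to package_dir as intended.
right = posixpath.join(package_dir, "data/Examples/")

print(wrong)
print(right)
```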

SPA analysis - Example for an input multi-sample segmentation file

#import modules
import SigProfilerAssignment as spa
from SigProfilerAssignment import Analyzer as Analyze
import os

#set directories and paths to signatures and samples
dir_inp = os.path.join(spa.__path__[0], 'data/Examples/')
# segmentation file
samples = os.path.join(spa.__path__[0], \
    'data/tests/cnv_input/all.breast.ascat.summary.sample.tsv')
output = "output_example/"

#Analysis of SP Assignment 
Analyze.cosmic_fit( samples, 
                    output,
                    input_type="seg:ASCAT_NGS",
                    context_type="CNV48", 
                    signatures=None,
                    signature_database=None,
                    genome_build="GRCh37",
                    cosmic_version=3.3,
                    verbose=False,
                    collapse_to_SBS96=False,
                    make_plots=True,
                    exclude_signature_subgroups=None,
                    exome=False)
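
The input_type value for segmentation files follows the pattern "seg:TYPE", where TYPE is one of the accepted callers listed under Main Parameters. A small helper like the one below can build that string and catch a misspelled caller before the analysis starts. `seg_input_type` is a hypothetical convenience function, not part of the package, and the case-insensitive handling is a choice made for this sketch.

```python
# Accepted callers for "seg:TYPE", as listed in the parameter description.
SUPPORTED_CALLERS = {
    "ASCAT", "ASCAT_NGS", "SEQUENZA", "ABSOLUTE",
    "BATTENBERG", "FACETS", "PURPLE", "TCGA",
}


def seg_input_type(caller):
    """Return the input_type string for a segmentation file from the given caller."""
    normalized = caller.strip().upper()
    if normalized not in SUPPORTED_CALLERS:
        raise ValueError(
            f"Unsupported caller {caller!r}; expected one of {sorted(SUPPORTED_CALLERS)}"
        )
    return "seg:" + normalized


print(seg_input_type("battenberg"))
```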

Copyright

This software and its documentation are copyright 2022 as a part of the SigProfiler project. The SigProfilerAssignment framework is free software and is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

Contact Information

Please address any queries or bug reports to Raviteja Vangara at rvangara@health.ucsd.edu
