
Package to manipulate mutational processes.


SigNet

SigNet is a package to study genetic mutational processes. Check out our theoretical background page for further information on this topic. As of now, it contains 3 solutions: SigNet Refitter, SigNet Detector, and SigNet Generator.

This is the official code implementation of the paper: Mutational signature decomposition with deep neural networks reveals origins of clock-like processes and hypoxia dependencies, by Claudia Serrano, Oleguer Canal, et al.

Readme contents

You can use SigNet in 3 different ways depending on your workflow:

  1. Python Package

    1. Python Package Installation
    2. Python Package Usage
  2. Command Line Interface (CLI)

  3. Source Code

    1. Downloading Source Code
    2. Code-Basics

Python Package

Recommended if you want to integrate SigNet into your Python workflow, or if you intend to re-train models on custom data with limited changes to the ANN architecture. You can install the Python package by running:

pip install signaturesnet

Once installed, you can run SigNet Refitter like so:

import pandas as pd
from signaturesnet.modules.signet_module import SigNet

# Read your mutational data
mutations = pd.read_csv("your_input", header=0, index_col=0)

# Load & Run signet
signet = SigNet(opportunities_name_or_path="your_normalization_file")
results = signet(mutation_dataset=mutations)

# Extract results
w, u, l, c, _ = results.get_output()

# Store results
results.save(path='Output', name="this_experiment_filename")

# Plot figures
results.plot_results(save=True)

For more usage examples, check out the examples folder.

NOTE: We recommend working in a dedicated Python virtual environment to avoid package version mismatches.
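
The `pd.read_csv("your_input", header=0, index_col=0)` call in the usage example reads a table whose first row holds column names and whose first column holds row labels. Here is a minimal sketch of building and reading back such a file. It assumes one row per sample and 96 mutation-type columns labelled in the common `A[C>A]A` style; the sample names, the random counts, and the column labels and order are illustrative assumptions, so check the Mutations Input documentation for what SigNet actually expects.

```python
import itertools
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Assumption: 96 single-base-substitution channels labelled like "A[C>A]A".
# The labels and their order are illustrative, not taken from SigNet.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
BASES = "ACGT"
channels = [
    f"{left}[{sub}]{right}"
    for sub in SUBSTITUTIONS
    for left, right in itertools.product(BASES, BASES)
]

# Two made-up samples with random counts (not real data).
rng = np.random.default_rng(seed=0)
counts = pd.DataFrame(
    rng.integers(low=0, high=50, size=(2, len(channels))),
    index=["sample_1", "sample_2"],
    columns=channels,
)

# Write with a header row and an index column, then read the file back
# the same way the usage example does.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "toy_counts.csv"
    counts.to_csv(csv_path)
    mutations = pd.read_csv(csv_path, header=0, index_col=0)

print(mutations.shape)  # (2, 96)
```

The resulting `mutations` DataFrame has the same shape as the object passed to `signet(mutation_dataset=mutations)` in the usage example.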

Command Line Interface

Recommended if you only want to run the SigNet modules independently and do not intend to retrain models or change the source code.
NOTE: This option is only tested on Debian-based Linux distributions. Steps:

  1. Download the signaturesnet executable
  2. Change directory to wherever you downloaded it: cd <wherever/you/downloaded/the/executable/>
  3. Make it executable by your user: sudo chmod u+x signaturesnet

Refitter:

The following example shows how to use SigNet Refitter.

cd <wherever/you/downloaded/the/executable/>
./signaturesnet refitter  [--input_format {counts, bed, vcf}]
                   [--input_data INPUTFILE]
                   [--reference_genome REFGENOME]
                   [--normalization {None, exome, genome, PATH_TO_ABUNDANCES}] 
                   [--only_nnls ONLYNNLS]
                   [--cutoff CUTOFF]
                   [--output OUTPUT]
                   [--plot_figs False]
  • --input_format: Name of the format of the input. The default is 'counts'. Please refer to Mutations Input for further details.

  • --input_data: Path to the file containing the mutational counts. Please refer to Mutations Input for further details.

  • --reference_genome: Name or path to the reference genome. Needed when input_format is bed or vcf.

  • --normalization: Since the INPUTFILE contains counts, they need to be normalized according to the abundance of each trinucleotide in the genomic region where the mutations are counted.

    • Choose None (default): If you don't want any normalization.
    • Choose exome: If the data that is being input comes from Whole Exome Sequencing. This will normalize the counts according to the trinucleotide abundances in the exome.
    • Choose genome: If the data comes from Whole Genome Sequencing.
    • Set a PATH_TO_ABUNDANCES to use a custom normalization file. Please refer to Normalization Input for further details on the input format.
  • --only_nnls: Whether to use NNLS mode only (the finetuner is not run). Default: False.

  • --cutoff: Cutoff to be applied to the final weights. Default: 0.01.

  • --output: Path to the folder where all the output files (weight guesses and figures) will be stored. By default, this folder is called "Output" and is created in the current directory. Please refer to SigNet Refitter Output for further details on the output format.

  • --plot_figs: Whether to generate output plots or not. Possible options are True or False.
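
To make the --normalization and --cutoff options more concrete, here is a rough sketch of the general idea behind each. It is not SigNet's actual implementation. It assumes that the abundances have already been expanded to one value per mutation channel, that normalized samples are rescaled to sum to 1, and that weights strictly below the cutoff are set to zero without renormalizing. The function names `normalize_counts` and `apply_cutoff` are hypothetical, and the toy example uses 3 channels instead of 96 for readability.

```python
import numpy as np


def normalize_counts(counts, abundances):
    """Divide counts by per-channel abundances, then rescale each sample to sum to 1.

    Hypothetical helper: SigNet's own normalization may differ in detail.
    """
    counts = np.asarray(counts, dtype=float)          # shape (n_samples, n_channels)
    abundances = np.asarray(abundances, dtype=float)  # shape (n_channels,)
    corrected = counts / abundances                   # broadcast over samples
    totals = corrected.sum(axis=1, keepdims=True)
    # Samples without any mutations stay at zero instead of dividing by zero.
    return np.divide(corrected, totals, out=np.zeros_like(corrected), where=totals > 0)


def apply_cutoff(weights, cutoff=0.01):
    """Set weights below the cutoff to zero (assumed behavior, no renormalization)."""
    weights = np.asarray(weights, dtype=float)
    return np.where(weights < cutoff, 0.0, weights)


# Toy example: the third channel is three times as abundant as the first,
# so its raw count of 30 carries the same weight as a count of 10 in the first.
normalized = normalize_counts([[10, 20, 30]], abundances=[1.0, 2.0, 3.0])
print(normalized)

filtered = apply_cutoff([0.005, 0.3, 0.695], cutoff=0.01)
print(filtered)
```

In the toy example all three channels end up with the same normalized value, because each raw count is proportional to its channel's abundance.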

Detector:

cd <wherever/you/downloaded/the/executable/>
./signaturesnet detector  [--input_data INPUTFILE]
                   [--normalization {None, exome, genome, PATH_TO_ABUNDANCES}] 
                   [--output OUTPUT]

(Same arguments as for the Refitter above.)

Generator:

cd <wherever/you/downloaded/the/executable/>
./signaturesnet generator  [--n_datapoints INT]
                    [--output OUTPUT]
  • --n_datapoints: Number of signature weight combinations to generate.
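
If you want to drive the executable from a Python script, for example to run several input files in a row, something along these lines may work. This is a sketch under assumptions: the helper `build_command` is hypothetical, the executable is assumed to sit in the current directory, and the file names are placeholders. Only flags listed above are used.

```python
import subprocess
from pathlib import Path


def build_command(executable, module, **options):
    """Return the argument list for `<executable> <module> --flag value ...`.

    Hypothetical helper; `module` is one of "refitter", "detector", "generator".
    """
    command = [str(executable), module]
    for flag, value in options.items():
        command += [f"--{flag}", str(value)]
    return command


# Placeholder location: resolve to an absolute path so that the file in the
# current directory is used rather than anything found on PATH.
executable = Path("signaturesnet").resolve()

command = build_command(
    executable,
    "refitter",
    input_data="counts.csv",   # placeholder input file
    normalization="genome",
    output="Output",
)
print(command[1:])

# Only launch the process if the executable is actually there.
if executable.exists():
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.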

Source Code

This option gives the most flexibility. Recommended if you want to play around with the code, re-train custom models, or contribute to the project.

Downloading Source Code

Clone the repo and install it as an editable pip package like so:

git clone git@github.com:weghornlab/SigNet.git
cd SigNet
pip install -e .
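
After installing, a quick way to check which copy of the package Python picks up is to look up its import location. This is a generic standard-library check rather than anything SigNet-specific; with an editable install, the reported location would normally point inside your cloned repository.

```python
import importlib.util

# Look up the package without importing it.
spec = importlib.util.find_spec("signaturesnet")

if spec is None:
    print("signaturesnet is not importable in this environment")
else:
    # For a regular package this is the path of its __init__.py.
    print("signaturesnet found at:", spec.origin)
```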

Refer here for the project code organization.

