Package to manipulate mutational processes.
SigNet
SigNet is a package to study genetic mutational processes. Check out our theoretical background page for further information on this topic. As of now, it contains 3 solutions:
- SigNet Refitter: Tool for signature decomposition.
- SigNet Generator: Tool for realistic mutational data generation.
- SigNet Detector: Tool for mutational vector out-of-distribution detection.
This is the official code implementation of the paper: Mutational signature decomposition with deep neural networks reveals origins of clock-like processes and hypoxia dependencies, by Claudia Serrano, Oleguer Canal, et al.
Readme contents
You can use SigNet in 3 different ways depending on your workflow:
- Python Package
  - Python Package Installation
  - Python Package Usage
- Command Line Interface (CLI)
- Source Code
  - Downloading Source Code
  - Code-Basics
Python Package
Recommended if you want to integrate SigNet into your Python workflow, or if you intend to re-train models on custom data with limited ANN architectural changes. You can install the Python package by running:
pip install signaturesnet
Once installed, you can run SigNet Refitter like so:
import pandas as pd
from signaturesnet.modules.signet_module import SigNet
# Read your mutational data
mutations = pd.read_csv("your_input", header=0, index_col=0)
# Load & Run signet
signet = SigNet(opportunities_name_or_path="your_normalization_file")
results = signet(mutation_dataset=mutations)
# Extract results
w, u, l, c, _ = results.get_output()
# Store results
results.save(path='Output', name="this_experiment_filename")
# Plot figures
results.plot_results(save=True)
For more usage examples, check out the examples folder:
- refitter_example.py for SigNet Refitter.
- generator_example.py for SigNet Generator.
- detector_example.py for SigNet Detector.
NOTE: It is recommended that you work in a dedicated Python virtual environment to avoid package version mismatches.
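The `pd.read_csv(..., header=0, index_col=0)` call in the usage snippet above treats the first row as column names and the first column as row labels. The following is a minimal sketch of what that implies for the input file, assuming one row per sample and one column per mutation type. The sample names and the three column labels are placeholders chosen for illustration, not SigNet's required labels; the Mutations Input documentation remains the authority on the expected format.

```python
import io

import pandas as pd

# Hypothetical input: one row per sample, one column per mutation type.
# A real input would have many more columns; three are shown for brevity.
csv_text = """sample,A[C>A]A,A[C>A]C,A[C>A]G
tumour_1,12,3,0
tumour_2,7,0,5
"""

# Same call as in the usage snippet, reading from memory instead of a file.
mutations = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0)

print(mutations.shape)        # (number of samples, number of mutation types)
print(list(mutations.index))  # sample identifiers taken from the first column
```

Inspecting the shape and index this way before calling SigNet is a cheap check that the header row and the identifier column were parsed as intended.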
Command Line Interface
Recommended if you are only interested in running SigNet modules independently and do not intend to retrain models or change the source code.
NOTE: This option is only tested on Debian-based Linux distributions. Steps:
- Download the signaturesnet executable
- Change directory to wherever you downloaded it:
cd <wherever/you/downloaded/the/executable/>
- Make it executable by your user:
sudo chmod u+x signaturesnet
Refitter:
The following example shows how to use SigNet Refitter.
cd <wherever/you/downloaded/the/executable/>
./signaturesnet refitter [--input_format {counts, bed, vcf}]
[--input_data INPUTFILE]
[--reference_genome REFGENOME]
[--normalization {None, exome, genome, PATH_TO_ABUNDANCES}]
[--only_nnls ONLYNNLS]
[--cutoff CUTOFF]
[--output OUTPUT]
[--plot_figs False]
- --input_format: Name of the input format. The default is 'counts'. Please refer to Mutations Input for further details.
- --input_data: Path to the file containing the mutational counts. Please refer to Mutations Input for further details.
- --reference_genome: Name of or path to the reference genome. Needed when input_format is bed or vcf.
- --normalization: Because the INPUTFILE contains counts, they need to be normalized by the abundance of each trinucleotide in the genomic region where the mutations were counted.
  - Choose None (default): If you don't want any normalization.
  - Choose exome: If the input data comes from Whole Exome Sequencing. This will normalize the counts according to the trinucleotide abundances in the exome.
  - Choose genome: If the data comes from Whole Genome Sequencing.
  - Set a PATH_TO_ABUNDANCES to use a custom normalization file. Please refer to Normalization Input for further details on the input format.
- --only_nnls: Whether to use NNLS mode only (the finetuner is not run). Default: False.
- --cutoff: Cutoff to be applied to the final weights. Default: 0.01.
- --output: Path to the folder where all the output files (weight guesses and figures) will be stored. By default, this folder will be called "Output" and will be created in the current directory. Please refer to SigNet Refitter Output for further details on the output format.
- --plot_figs: Whether to generate output plots or not. Possible options are True or False.
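To make the --normalization and --cutoff options more concrete, here is a minimal NumPy sketch of the two numerical steps they describe. It is an illustration under stated assumptions, not SigNet's actual implementation: it assumes normalization means dividing each count by the abundance of its trinucleotide and rescaling every sample to sum to 1, and that the cutoff sets weights strictly below the threshold to zero. The function names `normalize_counts` and `apply_cutoff` are hypothetical.

```python
import numpy as np


def normalize_counts(counts, abundances):
    """Divide counts by trinucleotide abundances, then rescale each sample to sum to 1.

    Assumed behaviour for illustration; SigNet's own normalization may differ.
    """
    counts = np.asarray(counts, dtype=float)          # shape: (samples, mutation types)
    abundances = np.asarray(abundances, dtype=float)  # shape: (mutation types,)
    rates = counts / abundances                       # mutations per available trinucleotide
    totals = rates.sum(axis=1, keepdims=True)
    # Samples with no mutations stay all-zero instead of triggering a division by zero.
    return np.divide(rates, totals, out=np.zeros_like(rates), where=totals > 0)


def apply_cutoff(weights, cutoff=0.01):
    """Zero out weights strictly below the cutoff (assumed semantics of --cutoff)."""
    weights = np.asarray(weights, dtype=float)
    return np.where(weights < cutoff, 0.0, weights)


# Toy data: one sample, three mutation types, made-up abundances.
profile = normalize_counts([[10, 20, 30]], [1, 2, 3])
print(profile)

# Toy weights: the middle entry falls below the default cutoff of 0.01.
print(apply_cutoff([0.5, 0.005, 0.495]))
```

In the toy sample, the raw counts differ but the abundance-corrected rates are equal, which is the effect the normalization is meant to capture: a trinucleotide that is more frequent in the sequenced region accumulates more mutations without the underlying process being more active there.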
Detector:
cd <wherever/you/downloaded/the/executable/>
./signaturesnet detector [--input_data INPUTFILE]
[--normalization {None, exome, genome, PATH_TO_ABUNDANCES}]
[--output OUTPUT]
(Same arguments as before)
Generator:
cd <wherever/you/downloaded/the/executable/>
./signaturesnet generator [--n_datapoints INT]
[--output OUTPUT]
- --n_datapoints: Number of signature weight combinations to generate.
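The three subcommands above can also be driven from a Python script, for example when batching several runs. The sketch below only assembles the argument list from the flags documented in this section and hands it to `subprocess.run`. It assumes the executable sits at `./signaturesnet` in the current directory; the helper names `build_command` and `run` and the file name `counts.csv` are hypothetical.

```python
import subprocess

EXECUTABLE = "./signaturesnet"  # assumed location of the downloaded executable
MODULES = {"refitter", "detector", "generator"}


def build_command(module, **options):
    """Build the argument list for one SigNet subcommand.

    Keyword names are passed through as --flags unchanged, so they should
    match the flags documented above (e.g. input_data, normalization).
    """
    if module not in MODULES:
        raise ValueError(f"unknown module: {module!r}")
    command = [EXECUTABLE, module]
    for flag, value in options.items():
        command += [f"--{flag}", str(value)]
    return command


def run(module, **options):
    """Run the subcommand; raises CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(build_command(module, **options), check=True)


# Build (but do not execute) a refitter call on a hypothetical counts file.
cmd = build_command(
    "refitter",
    input_format="counts",
    input_data="counts.csv",
    normalization="exome",
    cutoff=0.01,
    output="Output",
)
print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when paths contain spaces. Calling `run("generator", n_datapoints=100)` would execute the generator in the same way.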
Source Code
This is the option that gives the most flexibility. Recommended if you want to play around with the code, re-train custom models, or contribute.
Downloading Source Code
Clone the repo and install it as an editable pip package like so:
git clone git@github.com:weghornlab/SigNet.git
cd SigNet
pip install -e .
Refer here for the project code organization.