
Tensor framework for mutational signature analysis.



DISCLAIMER: TensorSignatures is still under development and not yet stable. Although the current version is in principle fully functional, you may run into problems using the software; if so, please don't hesitate to get in touch.

TensorSignatures is a tensor factorization framework for mutational signature analysis which, in contrast to other methods, deciphers mutational processes not only in terms of mutational spectra, but also assesses their properties with respect to various genomic variables.

Quick install

There are several ways to install TensorSignatures.

Via GitHub

To obtain the most recent version of TensorSignatures, we recommend creating a virtual environment and downloading the repository directly from GitHub. To get started, clone the repository by executing the following commands in your terminal

$ git clone https://github.com/gerstung-lab/tensorsignatures.git && cd tensorsignatures

Then, create a new virtual environment and install all dependencies.

$ python -m venv env
$ source env/bin/activate
$ pip install --upgrade pip setuptools wheel && pip install -r requirements.txt

Finally, install TensorSignatures.

$ pip install -e .

Via PyPI

To install tensorsignatures via PyPI, simply type

$ pip install tensorsignatures

into your shell. To get started with tensorsignatures, please refer to the documentation.

Via Docker (& Jupyter)

To run TensorSignatures within a Docker environment (with Jupyter), first clone the repository

$ git clone https://github.com/gerstung-lab/tensorsignatures.git
$ cd tensorsignatures

and then spin up the container using docker-compose

$ docker-compose up --build

Getting started

Step 1: Data preparation

To apply TensorSignatures to your data, single nucleotide variants (SNVs) need to be split according to their genomic context and represented in a high-dimensional count tensor. Similarly, multinucleotide variants (MNVs) and insertions and deletions (indels) have to be classified and represented in a count matrix (we do not yet provide an automated way of generating a structural variant table). Although TensorSignatures is written in Python, this part of the pipeline runs in R and depends on the Bioconductor packages VariantAnnotation and rhdf5. Make sure you have R 3.4.x installed together with the packages VariantAnnotation and rhdf5. You can install them, if necessary, by executing

$ Rscript -e "source('https://bioconductor.org/biocLite.R'); biocLite('VariantAnnotation')"

and

$ Rscript -e "source('https://bioconductor.org/biocLite.R'); biocLite('rhdf5')"

from your command line.

To get started, download the following files and place them in the same directory:

Constants.RData (contains GRanges objects that annotate transcription/replication orientation, nucleosomal and epigenetic states)

mutations.R (all required functions to partition SNVs, MNVs and indels)

processVcf.R (loads vcf files and creates the SNV count tensor as well as the MNV and indel count matrices; may need custom modification to run on your vcfs)

genome.zip (optional).

To obtain the SNV count tensor and the matrices containing all other mutation types, execute

$ Rscript processVcf.R yourVcfFile1.vcf.gz yourVcfFile2.vcf.gz ... yourVcfFileN.vcf.gz outputHdf5File.h5

which ideally outputs an hdf5 file that can be used as input for the TensorSignatures software. In case of errors, please check whether you have correctly specified the paths in lines 6-8. Also, take a look at the readVcfSave function and adjust it if necessary.
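
For orientation, the sketch below illustrates in Python how a single SNV is folded onto the pyrimidine strand and assigned to one of the 96 trinucleotide contexts that make up one dimension of the count tensor. This is an illustration only: the actual classification (including transcription/replication orientation, nucleosomal and epigenetic states) is performed by processVcf.R, and the toy axis layout used here is an assumption, not the format produced by the script.

# Illustration only: the real count tensor is produced by processVcf.R; the
# axis layout below (strand x 96 contexts x samples) is a simplified assumption.
import numpy as np
from itertools import product

BASES = "ACGT"
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
# the 96 standard trinucleotide contexts: 6 substitutions x 4 x 4 flanking bases
CONTEXTS = [f"{l}[{s}]{r}" for s, l, r in product(SUBSTITUTIONS, BASES, BASES)]
CONTEXT_IDX = {c: i for i, c in enumerate(CONTEXTS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def snv_context(trinucleotide, alt):
    """Map a reference trinucleotide and alternate base to one of the 96 contexts."""
    left, ref, right = trinucleotide
    if ref in "AG":  # fold purine-centred SNVs onto the pyrimidine strand
        left, ref, right = (right.translate(COMPLEMENT),
                            ref.translate(COMPLEMENT),
                            left.translate(COMPLEMENT))
        alt = alt.translate(COMPLEMENT)
    return f"{left}[{ref}>{alt}]{right}"

# toy count tensor: 2 strand orientations x 96 contexts x 3 samples
counts = np.zeros((2, 96, 3), dtype=int)
# one hypothetical SNV: ACA>ATA on the template strand of sample 0
counts[1, CONTEXT_IDX[snv_context("ACA", "T")], 0] += 1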

Before you can run TensorSignatures, a trinucleotide normalization constant needs to be added to the hdf5 data file. You can do this by calling the prep subroutine of the TensorSignatures command line program.

$ tensorsignatures prep outputHdf5File.h5 tsData.h5
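
To check that the prepared file is readable, you can list its contents with h5py. This is a generic sanity check that does not assume any particular dataset layout; the file name simply matches the call above.

# Generic sanity check of the prepared hdf5 file; no particular dataset
# layout is assumed, we simply list every dataset with its shape and dtype.
import h5py

with h5py.File("tsData.h5", "r") as f:
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")
    f.visititems(show)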

Step 2: Run TensorSignatures

Once you have obtained the prepared input file, there are two ways to run TensorSignatures: either via the refit option, which fits the exposures of a set of pre-defined signatures to a new dataset, or via the train subroutine, which performs a de novo extraction of tensor signatures. Both options have advantages and disadvantages: refitting tensor signatures is computationally fast but does not allow new signatures to be discovered, while extracting new signatures requires a large number of samples and is computationally intensive (GPU required). For most use cases with a small number of samples, we advise using the refit option:

$ tensorsignatures --verbose refit tsData.h5 refit.pkl -n

Here is an example call to run a de novo extraction of tensor signatures

$ tensorsignatures --verbose train tsData.h5 denovo.pkl <rank> -k <size> -n -ep <epochs>

Running TensorSignatures will yield a pickle dump which can subsequently be inspected using the tensorsignatures package (tutorials will follow soon).
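
A minimal sketch of inspecting such a dump, assuming it is a standard Python pickle; the exact structure of the stored object is defined by the tensorsignatures package, so this only loads the file and lists its public attributes.

# Minimal sketch, assuming the dump is a standard Python pickle; the
# tensorsignatures package must be importable so its classes can be unpickled.
import pickle

with open("refit.pkl", "rb") as fh:
    result = pickle.load(fh)

print(type(result))
print([attr for attr in dir(result) if not attr.startswith("_")])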

Features

  • Run tensorsignatures on your dataset using the TensorSignature class provided by the package or via the command line tool.

  • Compute percentile-based bootstrap confidence intervals for inferred parameters (a generic sketch of the technique follows this list).

  • Basic plotting tools to visualize tensor signatures and inferred parameters.
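
As a rough illustration of the percentile bootstrap mentioned above, the following generic NumPy sketch resamples a made-up vector of per-sample exposures with replacement and reports the 2.5th and 97.5th percentiles of the resampled means; it is not the package's implementation.

# Generic illustration of a percentile-based bootstrap confidence interval,
# not the tensorsignatures implementation; the exposure values are made up.
import numpy as np

rng = np.random.default_rng(0)
exposures = rng.gamma(2.0, 1.0, size=200)  # hypothetical per-sample exposures to one signature

boot_means = np.array([
    rng.choice(exposures, size=exposures.size, replace=True).mean()
    for _ in range(1000)
])
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"95% percentile bootstrap CI for the mean exposure: [{lower:.3f}, {upper:.3f}]")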

Credits

  • Harald Vöhringer and Moritz Gerstung

History

0.4.0 (2019-11-25)

  • added subroutine prep, which adds the normalization constant to an hdf5 input file for tensorsignatures

  • added subroutine refit, which refits a set of predefined signatures to a new dataset

  • updated README.rst

  • fixed issue with package data

0.3.0 (2019-10-03)

  • various fixes

  • design changes

  • fixed setup.py

0.1.0 (2019-08-21)

  • First release on PyPI.
