A package to train and work with Mutual Hazard Networks
mhn: A Python package to efficiently compute Mutual Hazard Networks
Mutual Hazard Networks (MHN) were first introduced by Schill et al. (2019)
and are used to model cancer progression.
This Python package can be used to work with MHNs. It includes functions that were part of the
original R implementation as well as functions that make use of state-space restriction
to make learning a new MHN from cancer data faster and more efficient. It also
contains functions for data in which the samples' ages are known, so that this
information can be taken into account while learning an MHN (see Rupp et al. (2021)).
Optimizer classes for data with and without known sample ages make it possible to learn
a new MHN in only a few lines of code.
Install the mhn package
You can install the mhn package using pip:
pip3 install mhn
After completing the installation of this package, you should be able to import it with
import mhn
If a new version of the mhn package is available, you can upgrade your installation with
pip3 install --upgrade mhn
A quick overview
The package contains the original MHN functions implemented in Python. You can import them from mhn.original:
from mhn.original import Likelihood, ModelConstruction, RegularizedOptimization, UtilityFunctions
You can train an MHN using state-space restriction. The corresponding functions can be imported with
from mhn.ssr import state_space_restriction, state_containers
The functions that make use of the known ages of samples can be imported via
from mhn.ssr import matrix_exponential
Using the CUDA implementation to accelerate score computations
If your device has an Nvidia GPU, you can use CUDA to accelerate the computation of the log-likelihood score and its gradient, both for the full and for the restricted state space. For that, CUDA and the CUDA compiler must be installed on your device. You can check this in the terminal with
nvcc --version
If this command is recognized, then CUDA should be installed on your device.
You can also use the following function of the state_space_restriction submodule:
from mhn.ssr import state_space_restriction

print(state_space_restriction.cuda_available())

# the three possible results are also available as constants:
# CUDA_AVAILABLE, CUDA_NOT_AVAILABLE, CUDA_NOT_FUNCTIONAL
if state_space_restriction.cuda_available() == state_space_restriction.CUDA_AVAILABLE:
    print('CUDA is available')
if state_space_restriction.cuda_available() == state_space_restriction.CUDA_NOT_AVAILABLE:
    print('CUDA compiler nvcc was not present during installation')
if state_space_restriction.cuda_available() == state_space_restriction.CUDA_NOT_FUNCTIONAL:
    print('CUDA compiler nvcc available, but CUDA functions not working; check CUDA installation')
Be especially aware of the CUDA_NOT_FUNCTIONAL case: it means that the CUDA compiler is installed on your device, but basic functionalities like allocating memory on the GPU are not working as expected. In this case something is probably wrong with your CUDA drivers and you should check your CUDA installation.
If you installed nvcc after installing the mhn package, you have to reinstall this package (for example with pip's --force-reinstall flag) to gain access to the CUDA functions.
How to train a new MHN
The simplest way to train a new MHN is to import the optimizers module and use the StateSpaceOptimizer class.
from mhn.optimizers import StateSpaceOptimizer
opt = StateSpaceOptimizer()
We can specify the data that we want our MHN to be trained on:
opt.load_data_matrix(data_matrix)
Make sure that the binary numpy matrix data_matrix has dtype=np.int32, otherwise you might get an error. Alternatively, if your training data is stored in a CSV file, you can call
opt.load_data_from_csv(filename, delimiter)
where delimiter is the delimiter separating the items in the CSV file (default: ',').
Internally, this method uses pandas' read_csv() function to extract the data from the CSV file.
All additional keyword arguments given to this method are passed on to that pandas function,
so parameters like usecols or skiprows of read_csv() can also be used as parameters for this method.
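Since keyword arguments are forwarded to pandas, the effect of a parameter like usecols can be previewed with pandas directly. Here is a standalone sketch with a made-up, in-memory CSV (the column names and values are invented for illustration):

```python
import io
import pandas as pd

# made-up CSV with an ID column that should not be part of the training data
csv_text = "id,geneA,geneB\n0,1,0\n1,0,1\n"

# usecols is a pandas read_csv parameter; load_data_from_csv would
# forward it in the same way to keep only the event columns
df = pd.read_csv(io.StringIO(csv_text), usecols=["geneA", "geneB"])
print(df.shape)  # (2, 2)
```

Passing usecols=["geneA", "geneB"] to load_data_from_csv would have the same filtering effect on the loaded training data.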
If you want to make sure that the matrix was loaded correctly, you can get
the loaded matrix with
loaded_matrix = opt.training_data
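To illustrate the dtype requirement mentioned above, here is a minimal made-up binary data matrix (rows correspond to samples, columns to events; the values are invented for illustration):

```python
import numpy as np

# made-up binary data: 3 samples, 4 events; dtype must be np.int32
data_matrix = np.array([
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
], dtype=np.int32)

# an existing matrix of another dtype can be converted beforehand
data_matrix = data_matrix.astype(np.int32)
```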
If you work with a CUDA-capable device, you can choose which device you want to use to train a new MHN:
# uses both CPU and GPU depending on the number of mutations in the individual sample
opt.set_device(StateSpaceOptimizer.Device.AUTO)
# use the CPU to compute log-likelihood score and gradient
opt.set_device(StateSpaceOptimizer.Device.CPU)
# use the GPU to compute log-likelihood score and gradient
opt.set_device(StateSpaceOptimizer.Device.GPU)
# you can also access the Device enum directly with an Optimizer object
opt.set_device(opt.Device.AUTO)
You can also change the initial theta that serves as the starting point for training (by default an independence model) with
opt.set_init_theta(init_theta)
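As a sketch, a zero matrix could serve as a custom starting point; theta is an n x n matrix of log-parameters, where zero off-diagonal entries correspond to no interactions between events (n = 4 is a made-up number of events here, and whether zeros are a sensible choice depends on your data):

```python
import numpy as np

n = 4  # made-up number of events
# n x n matrix of log-parameters; zero off-diagonal entries
# correspond to no interactions between events
init_theta = np.zeros((n, n))
```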
If you want to regularly save the progress during training you can use
opt.save_progress(steps=-1, always_new_file=False, filename='theta_backup.npy')
The parameters of this method are:
steps (default: -1): if positive, the number of iterations between two progress saves
always_new_file (default: False): if True, a new file is created for every progress save; otherwise the previous progress file is overwritten each time
filename (default: 'theta_backup.npy'): the file name of the progress file
Lastly, you can specify a callback function that is called after each training step:
import numpy as np

def some_callback_function(theta: np.ndarray):
    pass
opt.set_callback_func(some_callback_function)
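For example, a callback could keep a record of the intermediate theta matrices during training (logging_callback and history are made-up names for this sketch, not part of the package):

```python
import numpy as np

history = []  # intermediate thetas collected during training

def logging_callback(theta: np.ndarray):
    # called by the optimizer after each training step
    history.append(theta.copy())

# the callback can be tried out directly:
logging_callback(np.zeros((3, 3)))
print(len(history))  # 1
```

It would then be registered with opt.set_callback_func(logging_callback).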
Finally, you can train a new MHN with
from mhn.optimizers import StateSpaceOptimizer
opt = StateSpaceOptimizer()
opt.load_data_from_csv(filename, delimiter)
opt.train()
Some important parameters of the train method include:
lam (default: 0): the lambda tuning parameter that controls L1 regularization
maxit (default: 5000): the maximum number of training iterations
reltol (default: 1e-7): the gradient norm at which the training terminates
round_result (default: True): if set to True, the result is rounded to two decimal places
The resulting MHN is returned by the train() method, but can also be accessed afterwards via the result attribute:
new_mhn = opt.result