
Fast inference of MHN models


FastMHN - Fast inference of MHNs

FastMHN is a Python package for approximate learning of Mutual Hazard Networks (MHNs) and observation MHNs (oMHNs). It speeds up inference by using suitable rank-1 approximations of the time-marginalized probability distributions. This makes it practical to work with larger datasets for which exact methods would be computationally prohibitive.
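To make the term "rank-1 approximation" concrete: a distribution over d binary events has 2^d entries, whereas a rank-1 (product) approximation stores one number per event and treats the events as independent. One natural rank-1 choice is the product of the single-event marginals. The plain-Python sketch below builds that approximation for a small joint distribution. It only illustrates the general idea and is independent of fastmhn; how fastmhn constructs and combines its rank-1 factors is not reproduced here, and the function names are hypothetical.

```python
from itertools import product
from math import prod


def marginals(joint: dict[tuple[int, ...], float], d: int) -> list[float]:
    """Probability that each event i is active, P(x_i = 1)."""
    return [sum(p for x, p in joint.items() if x[i] == 1) for i in range(d)]


def rank1_approximation(joint: dict[tuple[int, ...], float], d: int) -> dict[tuple[int, ...], float]:
    """Product-of-marginals (rank-1) approximation of a distribution over d binary events."""
    m = marginals(joint, d)
    return {
        x: prod(m[i] if x[i] else 1.0 - m[i] for i in range(d))
        for x in product((0, 1), repeat=d)
    }


# Two perfectly correlated events: either both occur or neither does.
joint = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
approx = rank1_approximation(joint, d=2)
print(approx)  # every state gets 0.25: the correlation between the events is lost
```

The example shows the trade-off: the rank-1 form is exponentially smaller, but it cannot represent dependencies between the events it factorizes over.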

Overview

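MHNs are probabilistic models of how mutations accumulate in cancer and in other evolutionary processes. They treat the accumulation of binary events (e.g., mutations, copy-number alterations) as a continuous-time Markov process and capture the dependencies between events in a parameter matrix theta: each event has a baseline rate, and each event that has already occurred can multiplicatively promote or inhibit the rates of the others.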

This package provides:

  • Approximate learning of MHN and oMHN models using clustering-based approximations
  • Exact learning methods for smaller datasets (mostly for testing)
  • Cross-validation support for hyperparameter tuning (e.g., regularization strength)

The approximation methods make inference possible on datasets with higher mutational burdens, where exact computation over the full state space (2^d states for d events) would be infeasible.
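To see why exact computation is tied to the full state space, here is a small stand-alone sketch of the exact time-marginalized MHN distribution for a tiny number of events. It assumes the standard MHN formulation from the literature rather than fastmhn's own code: event i occurs in state x at rate exp(theta_ii + sum over active j != i of theta_ij), the observation time is Exp(1)-distributed, and the time-marginalized distribution is p = (I - Q)^(-1) p_empty, where Q is the transition-rate matrix and p_empty puts all mass on the state with no events. Because events are only ever gained, ordering states as bitmasks makes I - Q triangular, so forward substitution suffices, but the loop still has to visit all 2^d states. The function names are hypothetical.

```python
from math import exp


def rate(theta: list[list[float]], state: int, i: int) -> float:
    """Rate of event i in the state encoded as a bitmask (bit j set = event j active)."""
    log_rate = theta[i][i]
    for j in range(len(theta)):
        if j != i and (state >> j) & 1:
            log_rate += theta[i][j]
    return exp(log_rate)


def exact_distribution(theta: list[list[float]]) -> list[float]:
    """Time-marginalized distribution p = (I - Q)^(-1) p_empty over all 2^d states."""
    d = len(theta)
    p = [0.0] * (1 << d)
    # Removing an event yields a smaller bitmask, so its probability is already known.
    for y in range(1 << d):
        inflow = 1.0 if y == 0 else 0.0
        for i in range(d):
            if (y >> i) & 1:
                x = y & ~(1 << i)  # predecessor state without event i
                inflow += rate(theta, x, i) * p[x]
        outflow = sum(rate(theta, y, i) for i in range(d) if not (y >> i) & 1)
        p[y] = inflow / (1.0 + outflow)
    return p


theta = [[0.0, 1.0], [-1.0, -0.5]]  # hypothetical 2-event MHN
p = exact_distribution(theta)
print(len(p), sum(p))  # 4 states; the probabilities sum to 1 up to rounding
```

Each additional event doubles the length of p, which is exactly the growth the approximations are designed to avoid.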

Installation

The package can be installed directly from PyPI:

pip install fastmhn

Or clone the repository and install manually:

git clone https://phygit.ur.de/physics/mhn/fastmhn.git
cd fastmhn
pip install -e .

Dependencies

  • Python >= 3.11
  • NumPy
  • joblib
  • mhn
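Before running the examples, you can check that the environment meets these requirements. The sketch below uses only the standard library. It assumes that the import names of the dependencies match the listed package names (numpy, joblib, mhn); the helper names are hypothetical.

```python
import importlib.util
import sys

# Assumed import names of the dependencies listed above.
REQUIRED_MODULES = ("numpy", "joblib", "mhn")
MIN_PYTHON = (3, 11)


def missing_dependencies(modules=REQUIRED_MODULES) -> list[str]:
    """Return the top-level modules that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def python_is_supported(version=None, minimum=MIN_PYTHON) -> bool:
    """Compare a (major, minor) version, by default the running interpreter's, to the minimum."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version) >= minimum


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    print("Missing dependencies:", missing_dependencies() or "none")
```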

Example Usage

Learning an MHN model

import numpy as np
import fastmhn

# Generate synthetic data: N samples, d events
# Each row is a binary vector indicating which events occurred
d = 5
N = 100
data = np.random.randint(2, size=(N, d), dtype=np.int32)

# Learn MHN model with approximate gradient computation
theta = fastmhn.learn.learn_mhn(
    data,
    reg=1e-2,  # L1 regularization strength
    gradient_and_score_params={"max_cluster_size": 10},
    adam_params={
        "alpha": 0.1,
        "beta1": 0.7,
        "beta2": 0.9,
        "eps": 1e-8,
        "verbose": True,
    },
)

# theta is a d x d matrix representing the learned MHN
print(f"Learned theta matrix:\n{theta}")

Replace data with your own dataset; the random matrix in the snippet is only a placeholder.
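If your data is stored in a CSV file with one row per sample and one 0/1 column per event, a small standard-library helper like the one below can read it into rows of binary vectors. The file layout (a header row with event names) and the function name load_binary_csv are assumptions for illustration and are not part of fastmhn; convert the result with np.array(rows, dtype=np.int32) before passing it to the learners.

```python
import csv
import io


def load_binary_csv(stream) -> tuple[list[str], list[list[int]]]:
    """Read a CSV with a header of event names and 0/1 entries; return (names, rows)."""
    reader = csv.reader(stream)
    names = next(reader)
    rows = []
    for line_no, record in enumerate(reader, start=2):
        if not record:
            continue  # skip blank lines
        if len(record) != len(names):
            raise ValueError(f"line {line_no}: expected {len(names)} columns, got {len(record)}")
        values = [int(v) for v in record]
        if any(v not in (0, 1) for v in values):
            raise ValueError(f"line {line_no}: entries must be 0 or 1")
        rows.append(values)
    return names, rows


# In-memory stand-in for a file; the event names are made up.
example = io.StringIO("TP53,KRAS,CDKN2A\n1,0,1\n0,1,0\n")
names, rows = load_binary_csv(example)
print(names, rows)  # ['TP53', 'KRAS', 'CDKN2A'] [[1, 0, 1], [0, 1, 0]]
```

For a real file, open it with open("my_data.csv", newline="") (a hypothetical path) and pass the file object instead of the StringIO.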

Learning an oMHN model

The observation MHN (oMHN) extends the MHN by additionally modeling the rate at which a sample is observed, which the active events can influence:

import numpy as np
import fastmhn

# Generate data
d = 5
N = 100
data = np.random.randint(2, size=(N, d), dtype=np.int32)

# Learn oMHN model
theta = fastmhn.learn.learn_omhn(
    data,
    reg=1e-2,
    gradient_and_score_params={"max_cluster_size": 10},
    adam_params={"alpha": 0.1, "beta1": 0.7, "beta2": 0.9, "eps": 1e-8},
)

# theta is a (d+1) x d matrix
# First d rows: MHN parameters
# Last row: observation rates
print(f"Learned oMHN theta matrix:\n{theta}")
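The layout described in the comments above (d rows of MHN parameters followed by one row for the observation rates) can be unpacked and inspected with a few lines. This sketch works on a plain nested list; assuming theta is a NumPy array, call theta.tolist() first. Reading theta[i][j] as the effect of event j on event i, and ranking off-diagonal entries by magnitude to find the strongest interactions, are interpretation assumptions based on the usual MHN convention, and the function names are hypothetical.

```python
def split_omhn_theta(theta: list[list[float]]) -> tuple[list[list[float]], list[float]]:
    """Split a (d+1) x d oMHN matrix into the d x d MHN block and the observation row."""
    d = len(theta[0])
    if len(theta) != d + 1:
        raise ValueError(f"expected a {d + 1} x {d} matrix, got {len(theta)} rows")
    return [list(row) for row in theta[:d]], list(theta[d])


def strongest_interactions(mhn: list[list[float]], n: int = 3) -> list[tuple[int, int, float]]:
    """Return the n off-diagonal entries (i, j, value) with the largest absolute value."""
    entries = [
        (i, j, value)
        for i, row in enumerate(mhn)
        for j, value in enumerate(row)
        if i != j
    ]
    return sorted(entries, key=lambda e: abs(e[2]), reverse=True)[:n]


# Hypothetical 3-event oMHN matrix (4 rows x 3 columns), not a fitted result.
theta = [
    [-1.0, 0.8, 0.0],
    [0.0, -2.0, -1.5],
    [0.3, 0.0, -1.2],
    [0.5, -0.1, 0.9],
]
mhn, observation = split_omhn_theta(theta)
print(strongest_interactions(mhn, n=2))  # [(1, 2, -1.5), (0, 1, 0.8)]
print(observation)  # [0.5, -0.1, 0.9]
```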

Cross-validation for regularization strength

import numpy as np
import fastmhn

# Generate data
d = 5
N = 100
data = np.random.randint(2, size=(N, d), dtype=np.int32)

# Cross-validation parameters
k = 5  # number of folds
reg = 1e-2  # regularization strength to evaluate

# Shuffle data
rng = np.random.default_rng(42)
shuffled_indices = np.arange(N)
rng.shuffle(shuffled_indices)
data = data[shuffled_indices, :]

# Create folds
fold_sizes = (N // k) * np.ones(k, dtype=int)
fold_sizes[: N % k] += 1

# Get score offset for comparison
score_offset = fastmhn.utility.get_score_offset(data)
average_validation_score = 0

for k_index in range(k):
    # Split into training and validation
    val_start = np.sum(fold_sizes[:k_index])
    val_end = np.sum(fold_sizes[: k_index + 1])
    data_val = data[val_start:val_end]
    data_train = np.concatenate((data[:val_start], data[val_end:]))

    # Learn model on training data
    theta = fastmhn.learn.learn_omhn(
        data_train,
        reg=reg,
        gradient_and_score_params={"max_cluster_size": 10},
        adam_params={"verbose": False},
    )

    # Evaluate on validation data
    ctheta = fastmhn.utility.cmhn_from_omhn(theta)
    _, val_score = fastmhn.approx.approx_gradient_and_score(
        ctheta, data_val, max_cluster_size=10
    )
    average_validation_score += val_score

average_validation_score /= k
print(f"Average validation score: {average_validation_score} (offset: {score_offset})")
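The loop above scores a single regularization strength. To choose among several candidates, the same fold arithmetic can be wrapped in a small grid search. In this sketch, kfold_splits reproduces the contiguous folds computed above (so the data must be shuffled first, as in the example), and evaluate stands in for the loop body: fit on the training indices and return the validation score. Both names are hypothetical. Picking the largest average assumes the score behaves like a log-likelihood, where higher is better; check the sign convention of approx_gradient_and_score and set higher_is_better accordingly.

```python
from collections.abc import Callable, Iterator, Sequence


def kfold_splits(n: int, k: int) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_indices, val_indices) for k contiguous folds; the first n % k folds get one extra sample."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        val = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n))
        yield train, val
        start = stop


def select_reg(
    candidates: Sequence[float],
    n: int,
    k: int,
    evaluate: Callable[[float, list[int], list[int]], float],
    higher_is_better: bool = True,
) -> tuple[float, dict[float, float]]:
    """Average evaluate(reg, train, val) over the folds for each candidate and return the best one."""
    averages = {}
    for reg in candidates:
        scores = [evaluate(reg, train, val) for train, val in kfold_splits(n, k)]
        averages[reg] = sum(scores) / k
    pick = max if higher_is_better else min
    best = pick(averages, key=averages.get)
    return best, averages


def toy_evaluate(reg: float, train: list[int], val: list[int]) -> float:
    # Placeholder: replace with learn_omhn on data[train] and
    # approx_gradient_and_score on data[val], as in the loop above.
    return -(reg - 0.01) ** 2


best, averages = select_reg([1e-3, 1e-2, 1e-1], n=100, k=5, evaluate=toy_evaluate)
print(best)  # 0.01
```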

Using the command-line scripts

The repository includes convenience scripts for common tasks:

  • learn_approx_mhn.py - Learn an MHN model
  • learn_approx_omhn.py - Learn an oMHN model
  • learn_approx_omhn_crossvalidated.py - Learn oMHN with cross-validation

You can use these as templates or run them directly:

python learn_approx_omhn.py

API Reference

The main functions are accessible through the fastmhn package:

  • fastmhn.learn.learn_mhn() - Learn an MHN model
  • fastmhn.learn.learn_omhn() - Learn an oMHN model
  • fastmhn.approx.approx_gradient_and_score() - Approximate gradient and score computation
  • fastmhn.exact.gradient_and_score() - Exact gradient and score computation
  • fastmhn.utility.create_pD() - Create probability distribution
  • fastmhn.utility.generate_data() - Generate synthetic data
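For readers who want to see how synthetic MHN data can arise without relying on fastmhn.utility.generate_data (whose signature and sampling scheme are not documented here), the following stand-alone sketch samples from the standard MHN model by simulating the Markov process. It assumes the usual formulation: each missing event i fires at rate exp(theta_ii + sum over active j != i of theta_ij), and the sample is observed after an Exp(1)-distributed time, which is what makes the resulting distribution time-marginalized. The function name sample_mhn is hypothetical.

```python
import random
from math import exp
from typing import Optional


def sample_mhn(theta: list[list[float]], n_samples: int, seed: Optional[int] = None) -> list[list[int]]:
    """Draw binary samples by simulating event accumulation until an Exp(1) observation time."""
    rng = random.Random(seed)
    d = len(theta)
    samples = []
    for _ in range(n_samples):
        state = [0] * d
        t_obs = rng.expovariate(1.0)  # observation time
        t = 0.0
        while True:
            rates = []
            for i in range(d):
                if state[i]:
                    rates.append(0.0)  # an event that has occurred cannot occur again
                else:
                    log_rate = theta[i][i] + sum(theta[i][j] for j in range(d) if j != i and state[j])
                    rates.append(exp(log_rate))
            total = sum(rates)
            if total == 0.0:
                break  # every event has already occurred
            t += rng.expovariate(total)  # waiting time until the next event
            if t > t_obs:
                break  # the sample is observed before the next event happens
            i = rng.choices(range(d), weights=rates)[0]
            state[i] = 1
        samples.append(state)
    return samples


theta = [[-1.0, 1.0], [0.0, -0.5]]  # hypothetical 2-event MHN
data = sample_mhn(theta, n_samples=5, seed=0)
print(data)
```

The resulting rows can be converted with np.array(data, dtype=np.int32) to match the data layout used in the examples above.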

License

This project is licensed under the MIT License - see the LICENSE file for details.

Repository

