
Correlation-based feature selection for MD data


Molecular Systems Automated Identification of Cooperativity

MoSAIC is a new method for correlation analysis that automatically detects collective motion in MD simulation data, identifies uncorrelated features as noise, and hence provides a detailed picture of the key coordinates driving a conformational change in a biomolecular system. It is based on the Leiden community detection algorithm, which is used to bring a correlation matrix into block-diagonal form.
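To illustrate what "block-diagonal form" means, here is a minimal pure-Python sketch, not MoSAIC's implementation: the cluster labels are hard-coded and merely stand in for the output of the Leiden algorithm, and the helper names block_diagonal_order and permute_matrix are hypothetical. Rows and columns of the similarity matrix are permuted with the same ordering so that features of the same cluster become adjacent.

```python
def block_diagonal_order(labels):
    """Return a feature permutation that groups features by cluster label.

    Clusters are ordered by size (largest first), ties broken by label;
    within a cluster the original feature order is kept.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    ordered = sorted(clusters.values(), key=lambda c: (-len(c), labels[c[0]]))
    return [idx for cluster in ordered for idx in cluster]


def permute_matrix(matrix, order):
    """Apply the same permutation to the rows and columns of a square matrix."""
    return [[matrix[i][j] for j in order] for i in order]


# Toy similarity matrix of 4 features: features 0 and 2 are strongly
# correlated, as are features 1 and 3.
sim = [
    [1.0, 0.1, 0.9, 0.0],
    [0.1, 1.0, 0.2, 0.8],
    [0.9, 0.2, 1.0, 0.1],
    [0.0, 0.8, 0.1, 1.0],
]
labels = [0, 1, 0, 1]  # hypothetical cluster labels, e.g. from Leiden

order = block_diagonal_order(labels)
print(order)  # [0, 2, 1, 3]
for row in permute_matrix(sim, order):
    print(row)
```

After the permutation, the large similarities (0.9 and 0.8) sit in two blocks along the diagonal, while the small cross-cluster values end up off the diagonal.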

The method was published in:

G. Diez, D. Nagel, and G. Stock, Correlation-based feature selection to identify functional dynamics in proteins, in preparation

We kindly ask you to cite this article in case you use this software package for published works.

Features

  • Intuitive usage via the Python module and via the CLI
  • scikit-learn-style API for fast integration into your Python workflow
  • No magic, only a single parameter
  • Extensive documentation and detailed discussion in publication
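The single parameter is the resolution parameter of the clustering step. As a hedged illustration, assuming the standard Constant Potts Model (CPM) quality function Q = Σ_c [e_c - γ · n_c (n_c - 1) / 2], where e_c is the summed similarity of all feature pairs inside cluster c, n_c is its size and γ is the resolution parameter, the sketch below shows how γ acts as a similarity threshold: grouping two features pays off only if their similarity exceeds γ. The function cpm_quality is hypothetical and not part of MoSAIC's API.

```python
from itertools import combinations


def cpm_quality(matrix, labels, resolution):
    """CPM quality: sum over clusters of (intra-cluster similarity - gamma * pairs).

    Only distinct pairs i < j are counted, so the diagonal is ignored.
    """
    quality = 0.0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            # each intra-cluster pair contributes its similarity minus gamma
            quality += matrix[i][j] - resolution
    return quality


sim = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
merged = [0, 0, 1]    # features 0 and 1 together, feature 2 alone
separate = [0, 1, 2]  # every feature in its own cluster

for gamma in (0.5, 0.95):
    print(gamma, cpm_quality(sim, merged, gamma), cpm_quality(sim, separate, gamma))
```

With γ = 0.5 the merged partition scores higher, whereas with γ = 0.95 keeping the features separate wins, which is why the resolution parameter has to be tuned to each matrix.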

Installation

So far, the package is only published on PyPI. Soon, it will be added to conda-forge as well. To install it within a Python environment, simply call:

python3 -m pip install --upgrade moldyn-mosaic

or, for the latest development version:

# via ssh key
python3 -m pip install git+ssh://git@github.com/moldyn/MoSAIC.git

# or via password-based login
python3 -m pip install git+https://github.com/moldyn/MoSAIC.git

Shell Completion

For the bash, zsh and fish shells, click provides an easy way to enable shell completion; see the click documentation on shell completion for details. For bash, add the following line to your ~/.bashrc:

eval "$(_MOSAIC_COMPLETE=bash_source MoSAIC)"

Usage

In general, MoSAIC can be called either via its entry point, $ MoSAIC, or as a module, $ python -m mosaic. The latter is preferred because it ensures that the desired Python environment is used. Shell completion, however, only works with the entry point.

CLI - Usage Directly from the Command Line

The module provides a rich CLI built with click. Each command and subcommand comes with detailed help, which can be accessed by

$ python -m mosaic
Usage: python -m mosaic [OPTIONS] COMMAND [ARGS]...

  MoSAIC motion v0.1.0

  Molecular systems automated identification of collective motion, is
  a correlation based feature selection framework for MD data.
  Copyright (c) 2022, Georg Diez and Daniel Nagel

Options:
  --help  Show this message and exit.

Commands:
  clustering  Clustering similarity matrix of coordinates.
  similarity  Creating similarity matrix of coordinates.
  umap        Embedd similarity matrix with UMAP.

For more details on a submodule, specify one of the three commands, e.g. python -m mosaic similarity --help.

A simple workflow for clustering the file input_file, using the correlation as similarity metric and the Leiden algorithm with CPM and the default resolution parameter:

# creating correlation matrix
$ python -m mosaic similarity -i input_file -o output_similarity --metric correlation -v

MoSAIC SIMILARITY
~~~ Initialize similarity class
~~~ Load file input_file
~~~ Fit input
~~~ Store similarity matrix in output_similarity

# clustering with CPM and default resolution parameter
# the latter needs to be fine-tuned to each matrix
$ python -m mosaic clustering -i output_similarity -o output_clustering --plot -v

MoSAIC CLUSTERING
~~~ Initialize clustering class
~~~ Load file output_similarity
~~~ Fit input
~~~ Store output
~~~ Plot matrix

This generates the similarity matrix stored in output_similarity, the plotted result in output_clustering.matrix.pdf, the raw data of the clustered matrix in output_clustering.matrix, and a file in which each row contains the indices of one cluster.
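To post-process the cluster file in your own scripts, a small reader like the following may help. This is a sketch under stated assumptions: the row format (whitespace-separated integer indices, optional lines starting with # treated as comments) is assumed rather than documented, the function read_clusters is hypothetical, and an in-memory string stands in for the real file.

```python
import io


def read_clusters(stream):
    """Parse a cluster file where each non-empty row lists feature indices.

    Assumes whitespace-separated integer indices; lines starting with '#'
    are treated as comments and skipped.
    """
    clusters = []
    for line in stream:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        clusters.append([int(token) for token in line.split()])
    return clusters


# Stand-in for the real cluster file; use open(path) in practice.
example = io.StringIO("# cluster indices\n0 2 5\n1 3\n4\n")
clusters = read_clusters(example)
print(clusters)  # [[0, 2, 5], [1, 3], [4]]

# Map each feature index to the number of its cluster (row in the file).
feature_to_cluster = {idx: c for c, members in enumerate(clusters) for idx in members}
print(feature_to_cluster[3])  # 1
```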

Module - Inside a Python Script

import numpy as np

import mosaic

# Load file
# X is np.ndarray of shape (n_samples, n_features); np.loadtxt assumes a
# whitespace-separated text file, adapt this to your data format
X = np.loadtxt('input_file')

sim = mosaic.Similarity(
    metric='correlation',  # or 'NMI', 'GY', 'JSD'
)
sim.fit(X)


# Cluster matrix
clust = mosaic.Clustering(
    mode='CPM',  # or 'modularity'
)
clust.fit(sim.matrix_)

clusters = clust.clusters_
clustered_X = clust.matrix_
...
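For intuition about what metric='correlation' computes, here is a pure-Python sketch assuming the similarity is the absolute Pearson correlation between the feature columns of X. This is an assumption for illustration; check the MoSAIC documentation for the exact definition. The function abs_correlation_matrix and the synthetic data are hypothetical.

```python
import math
import random


def abs_correlation_matrix(X):
    """Absolute Pearson correlation between the columns of X.

    X is a list of samples, each a list of n_features values.
    Constant columns would lead to a division by zero.
    """
    n_samples, n_features = len(X), len(X[0])
    cols = [[row[k] for row in X] for k in range(n_features)]
    # center each feature column on its mean
    centered = []
    for col in cols:
        mean = sum(col) / n_samples
        centered.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    matrix = [[0.0] * n_features for _ in range(n_features)]
    for i in range(n_features):
        for j in range(n_features):
            dot = sum(a * b for a, b in zip(centered[i], centered[j]))
            matrix[i][j] = abs(dot / (norms[i] * norms[j]))
    return matrix


# Synthetic data: features 0 and 1 follow the same signal (feature 1 is
# anti-correlated), feature 2 is independent noise.
rng = random.Random(0)
X = []
for _ in range(500):
    signal = rng.gauss(0.0, 1.0)
    X.append([signal + 0.1 * rng.gauss(0.0, 1.0),
              -signal + 0.1 * rng.gauss(0.0, 1.0),
              rng.gauss(0.0, 1.0)])

sim = abs_correlation_matrix(X)
for row in sim:
    print([round(v, 2) for v in row])
```

Taking the absolute value treats correlated and anti-correlated features alike, so features 0 and 1 end up with a similarity close to 1, while the noise feature 2 stays close to 0 and would be identified as uncorrelated.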

