Lumping tool for state trajectories

Project description

The Most Probable Path Algorithm for Reducing the Number of States in a State Trajectory

This package implements the most probable path (MPP) algorithm, which coarse-grains a Markov process by reducing its number of discrete states. From a microstate trajectory, a Markov state model is estimated with msmhelper. Its transition probabilities, optionally combined with further descriptors, are used to merge states so that the resulting macrostates reach a given minimum population and metastability.
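The package itself relies on msmhelper for this estimation. As a rough, dependency-free illustration of the quantities involved, the NumPy sketch below counts transitions at a given lag time, row-normalises them into transition probabilities and reports each state's population (fraction of frames) and metastability (taken here as the self-transition probability). It is not msmhelper's API: `estimate_msm` is a hypothetical name, the optional `limits` argument mimics the config key of the same name (lengths of concatenated trajectories, so that no transition is counted across a boundary), and every state is assumed to have at least one outgoing transition.

```python
import numpy as np


def estimate_msm(traj, lagtime, limits=None):
    """Plain-NumPy MSM estimate (sketch, not msmhelper).

    Returns (states, T, populations, metastability).
    """
    traj = np.asarray(traj)
    # Relabel arbitrary state ids to 0..n-1.
    states, idx = np.unique(traj, return_inverse=True)
    n = len(states)
    counts = np.zeros((n, n))
    # Split concatenated trajectories so transitions never cross a boundary.
    bounds = np.cumsum(limits)[:-1] if limits is not None else []
    for seg in np.split(idx, bounds):
        if len(seg) > lagtime:
            # Count every transition seg[t] -> seg[t + lagtime].
            np.add.at(counts, (seg[:-lagtime], seg[lagtime:]), 1)
    # Row-normalise counts to transition probabilities (assumes no empty rows).
    T = counts / counts.sum(axis=1, keepdims=True)
    populations = np.bincount(idx, minlength=n) / len(idx)
    metastability = np.diag(T)
    return states, T, populations, metastability
```

A state would then satisfy the macrostate criteria from the config file if `populations >= pop_thr` and `metastability >= q_min`.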

In the most basic variant, a lumping tree is built by iteratively selecting the least stable state and merging it with the state to which it has the highest transition probability. In a second step, the tree is traversed in reverse order (starting at the root) to identify the macrostates.
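The first step can be sketched on a transition-count matrix (estimated, for example, from a microstate trajectory at the chosen lag time). This is a minimal illustration of the basic rule described above, not the package's implementation: `mpp_merge_sequence` is a hypothetical name, it returns a plain list of merges instead of the package's Z matrix, it ignores the KL/JS extensions, and it does not implement the second, root-to-leaves parsing step. Every row is assumed to contain at least one count.

```python
import numpy as np


def mpp_merge_sequence(counts):
    """Greedy merge order following the basic MPP rule (sketch).

    Returns a list of (absorbed, target) pairs of sorted microstate lists.
    """
    counts = np.array(counts, dtype=float)  # copy; the input is not modified
    groups = [{i} for i in range(len(counts))]  # microstates per current state
    merges = []
    while len(groups) > 1:
        T = counts / counts.sum(axis=1, keepdims=True)
        # Least stable state: lowest self-transition probability.
        i = int(np.argmin(np.diag(T)))
        # Target: the state with the highest transition probability from i.
        out = T[i].copy()
        out[i] = -np.inf
        j = int(np.argmax(out))
        merges.append((sorted(groups[i]), sorted(groups[j])))
        # Lump i into j by summing the corresponding rows and columns.
        counts[j, :] += counts[i, :]
        counts[:, j] += counts[:, i]
        counts = np.delete(np.delete(counts, i, axis=0), i, axis=1)
        groups[j] |= groups[i]
        del groups[i]
    return merges
```

For the count matrix `[[8, 2, 0], [2, 5, 3], [0, 3, 7]]`, state 1 has the lowest self-transition probability (0.5) and is merged into state 2, after which state 0 joins the combined state. Summing rows and columns keeps the count matrix of the lumped process consistent, so the transition probabilities can simply be renormalised in the next iteration.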

Full documentation: https://moldyn.github.io/MPP/

Features

- Perform the most probable path (MPP) algorithm on a given microstate trajectory
- Variety of analysis plots
- Multi-trajectory support
- Extensions to the basic algorithm
  - Similarity by Kullback-Leibler divergence of transition probabilities
  - Incorporation of the Jensen-Shannon divergence of a feature (e.g. contact distances)
  - Stochastic lumping
- Three levels of user interface
  - integrated into your Python code (use the MPP.Lumping or MPP.run.Data objects)
  - as a Python module (python -m MPP.run --help)
  - in your Snakemake workflow (see The Snakemake Workflow)
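The two divergences named in the extensions have standard textbook definitions, sketched below in NumPy (natural logarithm, so results are in nats). How the package turns them into a similarity between states (symmetrisation, binning of feature values, averaging over features) is not described here, so `kl_divergence`, `js_divergence` and `feature_histogram` are hypothetical helpers that only show the formulas.

```python
import numpy as np


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q); terms with p = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    # If q = 0 where p > 0 the divergence is infinite; silence the warning.
    with np.errstate(divide="ignore"):
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence to m = (p + q) / 2."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def feature_histogram(values, bins):
    """Normalised histogram of one feature (e.g. a contact distance) on shared bins."""
    hist, _ = np.histogram(values, bins=bins)
    return hist / hist.sum()
```

Unlike the KL divergence, the Jensen-Shannon divergence is symmetric and bounded by ln 2, which makes it convenient for comparing feature distributions of two states; two rows of a transition matrix can be passed to `kl_divergence` directly.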

Installation

The package is available in the Python Package Index (PyPI) and can be installed via pip:

python -m pip install mpp-lumping

Caveat:

At the time of development, the bezier package did not support Python 3.13, so the package has only been tested with Python 3.12.11.

Usage

Depending on your skills and needs, the package can be used through three different entry points:

- Integrate the central MPP.Lumping or MPP.run.Data object into your Python pipeline.
- Use the high-level Python interface (python -m MPP.run ...) from the command line.
- Use the Snakemake workflow, where you only need to provide the configuration of your system and you are ready to go.

Config File

Config files (YAML files) specify where the input files are located and set some of the lumping parameters. Below is a reference config file with all possible parameters. Note that only the following fields are mandatory: microstate_trajectory, multi_feature_trajectory, frame_length, lagtime, pop_thr, q_min. Please refer to the documentation for a detailed description of the parameters.

# source: /optional/path/to/input/  # root directory for input files; defaults to the config file's own directory

microstate_trajectory: microstate_trajectory # the microstate trajectory
multi_feature_trajectory: contact_distances_trajectory # the feature trajectory, each line contains the feature values of the respective feature
cluster_file: clusters # defines the contact clusters, each row contains the contact indices of a contact cluster
contact_index_file: contacts.ndx # residue indices for contacts
limits: null # contains the lengths of each trajectory when multiple trajectories are concatenated

topology_file: structure.pdb # topology file used with the xtc trajectory
xtc_file: trajectory.xtc # the xtc trajectory file
helices: helices # definition of secondary structure elements

contact_threshold: null # threshold in the feature space below which e.g. a contact is considered to be formed
frame_length: 0.2 # in ns / frame
lagtime: 50 # in frames
pop_thr: 0.005 # population threshold for macrostates
q_min: 0.5 # minimum metastability of macrostates

n_timescales: 3 # number of timescales to plot in the implied timescales plot. 3 is the default.

# For stochastic lumping
stochastic:
  method: n
  param: 2
  n: 10

# PyMol rendering
view: view # contains the view information for PyMol
width: 500 # width of the image in px
height: 500 # height of the image in px
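Before starting a long run, it can be useful to check a config for the mandatory fields and to see what the time parameters mean in physical units. The sketch below works on an already loaded config dict (for example from PyYAML's `yaml.safe_load`); `check_config` is a hypothetical helper, and resolving `source` against the config file's directory follows the comment in the reference file rather than the package's actual code.

```python
from pathlib import Path

# Mandatory keys according to the README; all other keys are optional.
MANDATORY_KEYS = ("microstate_trajectory", "multi_feature_trajectory",
                  "frame_length", "lagtime", "pop_thr", "q_min")


def check_config(cfg, config_path):
    """Validate a loaded config dict; return (trajectory path, lag time in ns)."""
    missing = [key for key in MANDATORY_KEYS if key not in cfg]
    if missing:
        raise KeyError(f"missing mandatory config keys: {missing}")
    # 'source' is optional and defaults to the config file's own directory.
    source = Path(cfg["source"]) if cfg.get("source") else Path(config_path).parent
    trajectory = source / cfg["microstate_trajectory"]
    lagtime_ns = cfg["lagtime"] * cfg["frame_length"]  # frames * ns/frame
    return trajectory, lagtime_ns
```

With the reference values, a lag time of 50 frames at 0.2 ns per frame corresponds to 50 × 0.2 ns = 10 ns.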

Python Module

The first positional argument is a configuration file as described above. The next two positional arguments define how the similarity between two states is measured: first the dynamic similarity (T, KL, or none), then the geometric similarity (JS or none). For comparison, G-PCCA can be run by passing gpcca as the first of these two arguments and the number of macrostates to create as the second. Pass ref (listed as reference_count in the help output below) to take the number of macrostates from the T none lumping, in which the similarity between states is their transition probability.

Provide the target file (where the plot is stored) with the -o option. The lumping tree, the result of the first and potentially computationally expensive step, is defined by a Z matrix, which can be stored and loaded with the -Z option: if the given path exists, the file is loaded as the Z matrix. The same holds for the --rmsd option, since the RMSD is also expensive to calculate. With the --rmsd-feature option, you can choose whether the RMSD is calculated for the C-alpha atoms in Cartesian coordinate space or in the space of your feature, e.g. contact distances. -r draws N random frames from each macrostate, and -p produces the desired plot. --get-least-moving-residues writes the indices (in the state trajectory) of the least moving residues per macrostate to a text file. This makes it possible to find the residues that participate in the most stable contact distances of each macrostate.
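When several lumpings are scripted from Python, the command line can be assembled programmatically. The sketch below only uses arguments that appear in the help output; `mpp_run_command` is a hypothetical helper, not part of the package, and it enforces the help text's rule that -p needs an output path via -o.

```python
import sys


def mpp_run_command(config, d, g, z_path=None, plot=None, out=None):
    """Build the argument list for `python -m MPP.run` (sketch)."""
    cmd = [sys.executable, "-m", "MPP.run", str(config), d, str(g)]
    if z_path is not None:
        # Loaded if the file exists, otherwise computed and saved there.
        cmd += ["-Z", str(z_path)]
    if plot is not None:
        if out is None:
            raise ValueError("-p requires an output path via -o")
        cmd += ["-p", plot, "-o", str(out)]
    return cmd
```

Passing the result to `subprocess.run(..., check=True)` with `config="sample_system/input/config.yml"`, `d="T"`, `g="none"`, `z_path="sample_system/results/t/Z.npy"`, `plot="dendrogram"` and `out="sample_system/results/t/dendrogram.pdf"` reproduces the example command shown further below with the current interpreter.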

~$ python -m MPP.run --help
usage: python -m MPP.run [-h] [-o PATH] [-Z PATH] [--rmsd PATH]
                         [--rmsd-feature CA|feature] [-r N] [-p PLOT]
                         [--scale FLOAT] [--n-timescales N]
                         [--get-least-moving-residues CONTACT_INDEX_FILE]
                         [--metrics]
                         config.yml d g

Run MPP (Most Probable Path) lumping on a Markov state model.
Reads a YAML configuration file, runs or loads a lumping, and optionally generates plots or exports results.

positional arguments:
  config.yml            YAML configuration file specifying input paths and
                        lumping parameters (source, microstate_trajectory,
                        multi_feature_trajectory, lagtime, pop_thr, q_min,
                        frame_length, and optional keys).
  d                     Dynamic similarity selector (lumping kernel). One of:
                        'T' (transition probability, recommended default),
                        'KL' (Kullback-Leibler divergence of transition rows),
                        'none' (feature-only mode, requires g=JS), or 'gpcca'
                        (GPCCA comparison run).
  g                     Feature similarity selector. One of: 'JS' (Jensen-
                        Shannon divergence of feature distributions) or 'none'
                        (no feature similarity). When d='gpcca': an integer
                        number of macrostates, or 'reference_count' to reuse
                        the macrostate count from the reference T lumping.

options:
  -h, --help            show this help message and exit
  -o PATH, --out PATH   Output file path for the plot or exported file
                        (required when -p or -r is used).
  -Z PATH               Path to the Z matrix file (.npy). If the file does not
                        exist, MPP is run and the result is saved here. If the
                        file already exists, it is loaded instead of
                        recomputed. Also writes macrostate_map.npy to the same
                        directory.
  --rmsd PATH           Compute per-macrostate C-alpha RMSD and write the
                        result to this .npy file.
  --rmsd-feature CA|feature
                        RMSD variant: 'CA' for C-alpha RMSD (default) or
                        'feature' for feature-based RMSD.
  -r N, --draw-random N
                        Write N random frame indices per macrostate as .ndx
                        files to the directory given by -o.
  -p PLOT, --plot PLOT  Plot type to generate and save to the path given by
                        -o. One of: dendrogram, timescales, sankey, contacts,
                        macrotraj, ck_test, rmsd, delta_rmsd, state_network,
                        macro_feature, stochastic_state_similarity,
                        relative_implied_timescales, transition_matrix,
                        transition_time, macrostate_trajectory. The
                        'macrostate_trajectory' type writes a text file of
                        macrostate assignments (one integer per line).
  --scale FLOAT         Scaling factor for plot size (default: 1).
  --n-timescales N      Number of implied timescales to compute (overrides the
                        n_timescales value in the config file).
  --get-least-moving-residues CONTACT_INDEX_FILE
                        Write the least-varying residues per macrostate to the
                        file given by -o, using CONTACT_INDEX_FILE as the
                        contact index.
  --metrics             Print all available quality metrics to stdout as
                        key=value pairs. Metrics reported: shannon_entropy,
                        davies_bouldin, gmrq, gmrq2, silhouette,
                        calinski_harabasz (one value per run, comma-separated
                        for stochastic lumpings).
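Because --metrics prints key=value pairs (with comma-separated values for stochastic lumpings), its output is easy to collect for comparing lumpings. The parser below is a sketch that assumes one pair per line and numeric values; the exact output layout beyond that is not documented here, and `parse_metrics` is a hypothetical helper.

```python
def parse_metrics(text):
    """Parse `key=value` lines into {key: [float, ...]} (sketch)."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # skip anything that is not a key=value pair
        key, _, value = line.partition("=")
        # One value per run; stochastic lumpings give a comma-separated list.
        metrics[key.strip()] = [float(v) for v in value.split(",") if v.strip()]
    return metrics
```

The text could come, for example, from `subprocess.run([sys.executable, "-m", "MPP.run", "config.yml", "T", "none", "--metrics"], capture_output=True, text=True, check=True).stdout`.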

You can try the example from the GitHub repository: download the example directory, navigate into it, and run a command such as

python -m MPP.run sample_system/input/config.yml T none -Z sample_system/results/t/Z.npy -p dendrogram -o sample_system/results/t/dendrogram.pdf

Please note that not all functions of the package work with this example, because the sample system is only a mock-up.

The Snakemake Workflow

Snakemake is a workflow management tool and is used here to provide a high-level user interface. To use it, create a conda environment with Snakemake installed:

~$ conda create -c conda-forge -c bioconda -c nodefaults -n snakemake snakemake

Make sure that GROMACS is callable (gmx command), and you are ready to go; there is no need to install MPP manually.

Then copy the workflow directory next to your data directory:

├── data/
│   ├── system1/
│   │   ├── input/
│   │   └── results/
│   ├── system2/
│   │   ├── input/
│   │   └── results/
│   └── ...
└── workflow/
    ├── mpp.yml
    ├── Snakefile
    └── ...

The Snakemake workflow requires this directory structure, but the name of the data directory can be chosen freely. Just make sure to set it correctly in the Snakefile (data_root = "your_data_directory") and in the config file of each system (source must be the full path to the input directory). To create files with Snakemake, you only need to specify which file(s) you would like to create, e.g.

snakemake --cores 'all' --sdm conda -p example/sample_system/results/{t,t_js,kl,kl_js}/dendrogram.p{df,ng} --cache

creates dendrograms for four different lumpings as .pdf and .png files.

Explanation of some flags:

--cores: number of cores to use; 'all' uses all available cores.
--software-deployment-method, --sdm: use conda to deploy the software environment.
--dry-run, --dryrun, -n: do not execute anything, just print the jobs that would be run.
--cache: store and reuse the outputs of rules that the workflow marks as eligible for caching.
--force, -f: force recreation of the given file(s).
--printshellcmds, -p: print the shell commands that are executed.

More information can be found in the Snakemake documentation.

Bash brace expansion (the use of { and }) can be used to create, e.g., several diagrams at once for multiple systems and/or setups.
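The shell expands the braces before Snakemake sees the arguments, so Snakemake simply receives a list of target files. The sketch below enumerates the same targets in Python, in the same order as Bash expands them, which can help when the target list is generated by a script; `dendrogram_targets` is a hypothetical helper.

```python
from itertools import product


def dendrogram_targets(data_root, systems, lumpings, formats):
    """Files a pattern like root/{sys1,sys2}/results/{t,kl}/dendrogram.{pdf,png} expands to."""
    return [f"{data_root}/{system}/results/{lumping}/dendrogram.{fmt}"
            for system, lumping, fmt in product(systems, lumpings, formats)]


# Same eight files as example/sample_system/results/{t,t_js,kl,kl_js}/dendrogram.p{df,ng}
targets = dendrogram_targets("example", ["sample_system"],
                             ["t", "t_js", "kl", "kl_js"], ["pdf", "png"])
```

The list can then be handed to Snakemake, e.g. `subprocess.run(["snakemake", "--cores", "all", "--sdm", "conda", "-p", *targets], check=True)`. Four lumpings times two formats give 4 × 2 = 8 targets; adding systems multiplies the count accordingly.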

Project details

Maintainers

biofe

Release history

- 1.1.0 (this version): May 31, 2026
- 1.0.0: May 26, 2026
- 0.9.1: Sep 30, 2025
- 0.9.0: Sep 17, 2025

Download files

Source Distribution

mpp_lumping-1.1.0.tar.gz (2.5 MB), uploaded May 31, 2026 (Source)

Built Distribution

mpp_lumping-1.1.0-py3-none-any.whl (67.7 kB), uploaded May 31, 2026 (Python 3)

File details

Details for the file mpp_lumping-1.1.0.tar.gz.

File metadata

Download URL: mpp_lumping-1.1.0.tar.gz
Upload date: May 31, 2026
Size: 2.5 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mpp_lumping-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`477c1b78a2f7095a0e2a06000249183898bf2d4aa3276cc937ab2af5bac7a2d3`
MD5	`135e9ff24f168e89da05a863475480a6`
BLAKE2b-256	`8ba9ad4c4f3270e71bf72153538446fe3d99df4301cc6ce7050a4ad085360731`

Provenance

The following attestation bundles were made for mpp_lumping-1.1.0.tar.gz:

Publisher: python-publish.yml on moldyn/MPP

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mpp_lumping-1.1.0.tar.gz
- Subject digest: 477c1b78a2f7095a0e2a06000249183898bf2d4aa3276cc937ab2af5bac7a2d3
- Sigstore transparency entry: 1682640292
- Sigstore integration time: May 31, 2026
Source repository:
- Permalink: moldyn/MPP@9e58f53fc7f55d4e5d79f9c0f02b00d3ad4c51b1
- Branch / Tag: refs/tags/v1.1.0
- Owner: https://github.com/moldyn
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@9e58f53fc7f55d4e5d79f9c0f02b00d3ad4c51b1
- Trigger Event: release

File details

Details for the file mpp_lumping-1.1.0-py3-none-any.whl.

File metadata

Download URL: mpp_lumping-1.1.0-py3-none-any.whl
Upload date: May 31, 2026
Size: 67.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mpp_lumping-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`164aeb627ac1540cfa0299f8be10687e2eb43da978dc2576b965e13a56b8bb20`
MD5	`26b13a9e4b25fd8f7b29f89a0e72251d`
BLAKE2b-256	`b8fecd1db4af48b240e03b8e28e9c36360d04f4ccdac690cd62efe6b230856b9`
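To check a downloaded distribution file against the SHA256 digests listed on this page, the standard library is sufficient. This is a minimal sketch; `sha256_file`, `verify_download` and `EXPECTED_SHA256` are hypothetical names, and the digests are copied from the hash tables of release 1.1.0.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed for release 1.1.0.
EXPECTED_SHA256 = {
    "mpp_lumping-1.1.0.tar.gz":
        "477c1b78a2f7095a0e2a06000249183898bf2d4aa3276cc937ab2af5bac7a2d3",
    "mpp_lumping-1.1.0-py3-none-any.whl":
        "164aeb627ac1540cfa0299f8be10687e2eb43da978dc2576b965e13a56b8bb20",
}


def sha256_file(path, chunk_size=1 << 20):
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """True if the file's digest matches the listed one for its file name."""
    return sha256_file(path) == EXPECTED_SHA256[Path(path).name]
```

For an unmodified download, `verify_download("mpp_lumping-1.1.0.tar.gz")` should return True.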

Provenance

The following attestation bundles were made for mpp_lumping-1.1.0-py3-none-any.whl:

Publisher: python-publish.yml on moldyn/MPP

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mpp_lumping-1.1.0-py3-none-any.whl
- Subject digest: 164aeb627ac1540cfa0299f8be10687e2eb43da978dc2576b965e13a56b8bb20
- Sigstore transparency entry: 1682640325
- Sigstore integration time: May 31, 2026
Source repository:
- Permalink: moldyn/MPP@9e58f53fc7f55d4e5d79f9c0f02b00d3ad4c51b1
- Branch / Tag: refs/tags/v1.1.0
- Owner: https://github.com/moldyn
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@9e58f53fc7f55d4e5d79f9c0f02b00d3ad4c51b1
- Trigger Event: release

mpp-lumping 1.1.0
