Skip to main content

Simulate and analyze trajectories of Block Markov Chains.

Project description

Detection and evaluation of clusters within sequential data

This contains public source code for the research article "Detection and evaluation of clusters within sequential data."

BMCToolkit

This Python package provides tools to simulate and analyze trajectories of Block Markov Chains (BMCs).

Contents

This Python module distributes a Dynamic-Link Library (DLL) written in C++. Among other functionalities, the DLL is able to calculate both projected and lifted variants of the equilibrium distribution, frequency matrix, and transition matrix of a BMC; to compute the difference between two clusters and the spectral norm; to estimate the parameters of a BMC from a sample path; to execute the spectral clustering algorithm and the cluster improvement algorithm; to generate sample paths and trimmed frequency matrices; and to relabel clusters according to the size or the equilibrium probability of a cluster. The package includes an easy-to-use Python interface to the DLL, and stubs for it.

Related literature

This module was introduced in the following scientific article:

  1. Alexander Van Werde, Albert Senen-Cerda, Gianluca Kosmella, Jaron Sanders (2022). Detection and Evaluation of Clusters within Sequential Data. Preprint. Preprint available at ArXiv 2210.01679.

This module was improved in the following scientific article:

  1. Thomas van Vuren, Thomas Cronk, Jaron Sanders (2024). Detecting the number of clusters of a Block Markov Chain. Preprint. Preprint available on Arxiv 2407.18287.

The module also relates to the following scientific articles:

  1. Jaron Sanders, Alexandre Proutiere, Se-Young Yun (2019). Clustering in Block Markov Chains. Annals of Statistics. Preprint available at ArXiv 1712.09232v3.
  2. Jaron Sanders, Albert Senen-Cerda (2021). Spectral norm bounds for block Markov chain random matrices. Stochastic Processes and their Applications. Preprint available at ArXiv 2111.06201.
  3. Jaron Sanders, Alexander Van Werde (2022). Singular value distribution of dense random matrices with block Markovian dependence. Stochastic Processes and their Applications. Preprint available at ArXiv 2204.13534.
  4. Alexander Van Werde, Jaron Sanders (2023). Matrix concentration inequalities with dependent summands and sharp leading-order terms. Preprint. Preprint available at ArXiv 2307.11632.

Python3 compatibility

As of version 0.8.0, BMCToolkit works with several versions of Python. However, please be warned that BMCToolkit has not been extensively tested for every different combination of Python version and/or numpy version.

Specifically, using vcpkg, BMCToolkit is compiled against the following Python3 ports:

vcpkg's builtin-baseline Python3 version
6db51d86a9c2796581d74c9a7eb46e52ee8cb7eb 3.11.8#4
f84c18b6a5fb31983853b0d481b9de0263d6fc6c 3.10.7#7
2ed5383f7b88b23975f9cfd325f6451fd8716fb2 3.9.7#2

Related libraries

The DLL utilizes the following C++ libraries / interfaces, in unmodified form:

The C++ library was originally tested using Microsoft's Unit Testing Framework. These C++ tests were later migrated to Google's Testing and Mocking Framework. The Python module is tested for a bit with Pytest.

Example

Here is an example that shows how to use BMCToolkit:

# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at https://mozilla.org/MPL/2.0/.

import random
import matplotlib.pyplot as plt
import numpy as np
import BMCToolkit as BMCToolkit

if __name__ == "__main__":

    print("Initializing test variables...")
    transition_matrix_of_MC = [[0.6, 0.4], [0.7,0.3]]
    transition_matrix_of_BMC = [[0.3, 0.7], [0.2,0.8]]
    rel_cluster_sizes = [[0.4], [0.6]]
    num_clusters = 2

    abs_size = 100
    trajectory_length = 150000

    print("Generating a trajectory of a BMC...")
    trajectory = BMCToolkit.generate_sample_path_of_BMC( transition_matrix_of_BMC, rel_cluster_sizes, abs_size, trajectory_length, 1234)
    print(trajectory)
    print("Generating a trajectory of a perturbed MC...")
    trajectory = BMCToolkit.generate_sample_path_of_perturbed_BMC( transition_matrix_of_BMC, rel_cluster_sizes, 2, 0.25, transition_matrix_of_MC, trajectory_length, 1234)
    print(trajectory)
    print("Generating a trajectory of a MC...")
    trajectory = BMCToolkit.generate_sample_path_of_MC(transition_matrix_of_MC, trajectory_length, 1234)
    print(trajectory)
    time.sleep(3) # Pause for a few seconds.

    print("Generating a (nontrimmed) frequency matrix from a BMC...")
    frequency_matrix = BMCToolkit.generate_trimmed_matrix( transition_matrix_of_BMC, rel_cluster_sizes, abs_size, trajectory_length, 0, 1234)
    print(frequency_matrix)
    print("Testing matrix trimming...")
    trimmed_matrix = BMCToolkit.trim_count_matrix(frequency_matrix,5)
    print(trimmed_matrix)
    time.sleep(3) # Pause for a few seconds.

    print("Testing compute_clusters_from_trajectory, by default via CPU...")
    clustering1 = BMCToolkit.compute_clusters_from_trajectory( trajectory, abs_size, num_clusters, preferred_hardware = BMCToolkit.PreferredHardware.CPU )
    print(clustering1)
    print("Testing compute_clusters_from_trajectory via GPU instead...")
    clustering1 = BMCToolkit.compute_clusters_from_trajectory( trajectory, abs_size, num_clusters, preferred_hardware = BMCToolkit.PreferredHardware.GPU )
    print(clustering1)
    time.sleep(6) # Pause for a few seconds.

    print("Testing compute_cluster_improvement...")
    clustering2 = BMCToolkit.compute_cluster_improvement(frequency_matrix, clustering1, 10)
    print(clustering2)
    time.sleep(3) # Pause for a few seconds.

    print("Testing compute_bmc_parameters...")
    print( BMCToolkit.compute_bmcs_parameters(frequency_matrix, clustering2))

    print("Testing compute_k_means...")
    clustering3 = BMCToolkit.compute_k_means(frequency_matrix, num_clusters, preferred_hardware = BMCToolkit.PreferredHardware.CPU)
    print(clustering3)
    time.sleep(3) # Pause for a few seconds.

    print("Testing compute_neighborhoods_from_rows_and_columns...")
    radius = np.sqrt( trajectory_length / ( abs_size * abs_size) ) * np.log( trajectory_length / abs_size)
    neighborhoods = BMCToolkit.compute_neighborhoods_from_rows_and_columns( frequency_matrix, radius, abs_size)
    print(neighborhoods)
    time.sleep(3) # Pause for a few seconds.

    print("Testing compute_spectral_clustering...")
    spectral_clustering = BMCToolkit.compute_spectral_clustering(frequency_matrix, num_clusters)
    print(spectral_clustering)

    print("Testing compute_cluster_improvement...")
    improved_clustering = BMCToolkit.compute_cluster_improvement(frequency_matrix, spectral_clustering, 10)
    print(improved_clustering)

    print("Testing compute_spectral_norm...")
    print(BMCToolkit.compute_spectral_norm(frequency_matrix, BMCToolkit.SingularValueCalculationMethod.VIA_BDCSVD))
    print(BMCToolkit.compute_spectral_norm(frequency_matrix, BMCToolkit.SingularValueCalculationMethod.VIA_HERMITIZATION))
    print(BMCToolkit.compute_spectral_norm(frequency_matrix, BMCToolkit.SingularValueCalculationMethod.VIA_MATRIX_PRODUCT))
    time.sleep(3) # Pause for a few seconds.

    print(BMCToolkit.compute_equilibrium_distribution_lift([[0.3, 0.7], [0.2,0.8]],[[0.4],[0.6]],10))
    time.sleep(3) # Pause for a few seconds.

    print("Testing compute_num_clusters...")
    ratio = trajectory_length / abs_size
    radius = ratio / np.sqrt( abs_size * np.log(ratio) )
    neighborhood_size_threshold = abs_size * np.log(ratio) / ratio
    singular_value_threshold = np.sqrt( ratio ) * np.log(ratio)
    num_indices = int(np.ceil( np.log(abs_size) ))
    estimated_num_clusters = BMCToolkit.compute_num_clusters(trimmed_matrix, radius, neighborhood_size_threshold, singular_value_threshold, num_indices)
    print(estimated_num_clusters)
    time.sleep(3) # Pause for a few seconds.

    print("Testing generate_random_probability_vector...")
    print(BMCToolkit.generate_random_probability_vector(3, 123))

    print("Testing generate_random_transition_matrix...")
    print(BMCToolkit.generate_random_transition_matrix(4, 456))
    time.sleep(3) # Pause for a few seconds.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

BMCToolkit-0.10.0-cp311-cp311-win_amd64.whl (576.7 kB view hashes)

Uploaded CPython 3.11 Windows x86-64

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page