Simulate and analyze trajectories of Block Markov Chains.
Project description
Detection and evaluation of clusters within sequential data
This contains public source code for the research article "Detection and evaluation of clusters within sequential data."
BMCToolkit
This Python package provides tools to simulate and analyze trajectories of Block Markov Chains (BMCs).
Contents
This Python module distributes a Dynamic-Link Library (DLL) written in C++. Among other functionalities, the DLL is able to calculate both projected and lifted variants of the equilibrium distribution, frequency matrix, and transition matrix of a BMC; to compute the difference between two clusters and the spectral norm; to estimate the parameters of a BMC from a sample path; to execute the spectral clustering algorithm and the cluster improvement algorithm; to generate sample paths and trimmed frequency matrices; and to relabel clusters according to the size or the equilibrium probability of a cluster. The package includes an easy-to-use Python interface to the DLL, and stubs for it.
Related literature
This module was introduced in the following scientific article:
- Alexander Van Werde, Albert Senen-Cerda, Gianluca Kosmella, Jaron Sanders (2022). Detection and Evaluation of Clusters within Sequential Data. Preprint. Preprint available at ArXiv 2210.01679.
This module was improved in the following scientific article:
- Thomas van Vuren, Thomas Cronk, Jaron Sanders (2024). Detecting the number of clusters of a Block Markov Chain. Preprint. Preprint available on Arxiv 2407.18287.
The module also relates to the following scientific articles:
- Jaron Sanders, Alexandre Proutiere, Se-Young Yun (2019). Clustering in Block Markov Chains. Annals of Statistics. Preprint available at ArXiv 1712.09232v3.
- Jaron Sanders, Albert Senen-Cerda (2021). Spectral norm bounds for block Markov chain random matrices. Stochastic Processes and their Applications. Preprint available at ArXiv 2111.06201.
- Jaron Sanders, Alexander Van Werde (2022). Singular value distribution of dense random matrices with block Markovian dependence. Stochastic Processes and their Applications. Preprint available at ArXiv 2204.13534.
- Alexander Van Werde, Jaron Sanders (2023). Matrix concentration inequalities with dependent summands and sharp leading-order terms. Preprint. Preprint available at ArXiv 2307.11632.
Python3 compatibility
As of version 0.8.0, BMCToolkit works with several versions of Python. However, please be warned that BMCToolkit has not been extensively tested for every different combination of Python version and/or numpy version.
Specifically, using vcpkg, BMCToolkit is compiled against the following Python3 ports:
vcpkg's builtin-baseline | Python3 version |
---|---|
6db51d86a9c2796581d74c9a7eb46e52ee8cb7eb | 3.11.8#4 |
f84c18b6a5fb31983853b0d481b9de0263d6fc6c | 3.10.7#7 |
2ed5383f7b88b23975f9cfd325f6451fd8716fb2 | 3.9.7#2 |
Related libraries
The DLL utilizes the following C++ libraries / interfaces, in unmodified form:
- NVIDIA CUDA Toolkit
- Eigen
- OpenMP
- Pybind11
- Sparse Eigenvalue Computation Toolkit as a Redesigned ARPACK (SPECTRA)
The C++ library was originally tested using Microsoft's Unit Testing Framework. These C++ tests were later migrated to Google's Testing and Mocking Framework. The Python module is tested for a bit with Pytest.
Example
Here is an example that shows how to use BMCToolkit:
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at https://mozilla.org/MPL/2.0/.
import random
import matplotlib.pyplot as plt
import numpy as np
import BMCToolkit as BMCToolkit
if __name__ == "__main__":
print("Initializing test variables...")
transition_matrix_of_MC = [[0.6, 0.4], [0.7,0.3]]
transition_matrix_of_BMC = [[0.3, 0.7], [0.2,0.8]]
rel_cluster_sizes = [[0.4], [0.6]]
num_clusters = 2
abs_size = 100
trajectory_length = 150000
print("Generating a trajectory of a BMC...")
trajectory = BMCToolkit.generate_sample_path_of_BMC( transition_matrix_of_BMC, rel_cluster_sizes, abs_size, trajectory_length, 1234)
print(trajectory)
print("Generating a trajectory of a perturbed MC...")
trajectory = BMCToolkit.generate_sample_path_of_perturbed_BMC( transition_matrix_of_BMC, rel_cluster_sizes, 2, 0.25, transition_matrix_of_MC, trajectory_length, 1234)
print(trajectory)
print("Generating a trajectory of a MC...")
trajectory = BMCToolkit.generate_sample_path_of_MC(transition_matrix_of_MC, trajectory_length, 1234)
print(trajectory)
time.sleep(3) # Pause for a few seconds.
print("Generating a (nontrimmed) frequency matrix from a BMC...")
frequency_matrix = BMCToolkit.generate_trimmed_matrix( transition_matrix_of_BMC, rel_cluster_sizes, abs_size, trajectory_length, 0, 1234)
print(frequency_matrix)
print("Testing matrix trimming...")
trimmed_matrix = BMCToolkit.trim_count_matrix(frequency_matrix,5)
print(trimmed_matrix)
time.sleep(3) # Pause for a few seconds.
print("Testing compute_clusters_from_trajectory, by default via CPU...")
clustering1 = BMCToolkit.compute_clusters_from_trajectory( trajectory, abs_size, num_clusters, preferred_hardware = BMCToolkit.PreferredHardware.CPU )
print(clustering1)
print("Testing compute_clusters_from_trajectory via GPU instead...")
clustering1 = BMCToolkit.compute_clusters_from_trajectory( trajectory, abs_size, num_clusters, preferred_hardware = BMCToolkit.PreferredHardware.GPU )
print(clustering1)
time.sleep(6) # Pause for a few seconds.
print("Testing compute_cluster_improvement...")
clustering2 = BMCToolkit.compute_cluster_improvement(frequency_matrix, clustering1, 10)
print(clustering2)
time.sleep(3) # Pause for a few seconds.
print("Testing compute_bmc_parameters...")
print( BMCToolkit.compute_bmcs_parameters(frequency_matrix, clustering2))
print("Testing compute_k_means...")
clustering3 = BMCToolkit.compute_k_means(frequency_matrix, num_clusters, preferred_hardware = BMCToolkit.PreferredHardware.CPU)
print(clustering3)
time.sleep(3) # Pause for a few seconds.
print("Testing compute_neighborhoods_from_rows_and_columns...")
radius = np.sqrt( trajectory_length / ( abs_size * abs_size) ) * np.log( trajectory_length / abs_size)
neighborhoods = BMCToolkit.compute_neighborhoods_from_rows_and_columns( frequency_matrix, radius, abs_size)
print(neighborhoods)
time.sleep(3) # Pause for a few seconds.
print("Testing compute_spectral_clustering...")
spectral_clustering = BMCToolkit.compute_spectral_clustering(frequency_matrix, num_clusters)
print(spectral_clustering)
print("Testing compute_cluster_improvement...")
improved_clustering = BMCToolkit.compute_cluster_improvement(frequency_matrix, spectral_clustering, 10)
print(improved_clustering)
print("Testing compute_spectral_norm...")
print(BMCToolkit.compute_spectral_norm(frequency_matrix, BMCToolkit.SingularValueCalculationMethod.VIA_BDCSVD))
print(BMCToolkit.compute_spectral_norm(frequency_matrix, BMCToolkit.SingularValueCalculationMethod.VIA_HERMITIZATION))
print(BMCToolkit.compute_spectral_norm(frequency_matrix, BMCToolkit.SingularValueCalculationMethod.VIA_MATRIX_PRODUCT))
time.sleep(3) # Pause for a few seconds.
print(BMCToolkit.compute_equilibrium_distribution_lift([[0.3, 0.7], [0.2,0.8]],[[0.4],[0.6]],10))
time.sleep(3) # Pause for a few seconds.
print("Testing compute_num_clusters...")
ratio = trajectory_length / abs_size
radius = ratio / np.sqrt( abs_size * np.log(ratio) )
neighborhood_size_threshold = abs_size * np.log(ratio) / ratio
singular_value_threshold = np.sqrt( ratio ) * np.log(ratio)
num_indices = int(np.ceil( np.log(abs_size) ))
estimated_num_clusters = BMCToolkit.compute_num_clusters(trimmed_matrix, radius, neighborhood_size_threshold, singular_value_threshold, num_indices)
print(estimated_num_clusters)
time.sleep(3) # Pause for a few seconds.
print("Testing generate_random_probability_vector...")
print(BMCToolkit.generate_random_probability_vector(3, 123))
print("Testing generate_random_transition_matrix...")
print(BMCToolkit.generate_random_transition_matrix(4, 456))
time.sleep(3) # Pause for a few seconds.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
File details
Details for the file BMCToolkit-0.10.0-cp311-cp311-win_amd64.whl
.
File metadata
- Download URL: BMCToolkit-0.10.0-cp311-cp311-win_amd64.whl
- Upload date:
- Size: 576.7 kB
- Tags: CPython 3.11, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.8
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | a07733ece8dff9078d2a897f26b5e8a7c320917bef47eb6ba5f907fd2e65ac39 |
|
MD5 | 9f1c1fa6409f79a95ea7b2f0f56ce5bf |
|
BLAKE2b-256 | eb90a342c27fa1a9d026745ffb453b8a0aed089c740b5fd3db407a138ff111c2 |