Simulate and analyze trajectories of Block Markov Chains.
Project description
This Python package provides tools to simulate and analyze trajectories of Block Markov Chains (BMCs).
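To make the notion of a BMC concrete, here is a minimal NumPy sketch of how a sample path of a BMC can be generated. It assumes the construction used in the related articles listed in this README: the chain first moves between clusters according to a cluster-level transition matrix `p`, and then picks a state uniformly at random within the target cluster. The helper `simulate_bmc_path` is hypothetical and is not part of BMCToolkit's API.

```python
import numpy as np


def simulate_bmc_path(p, sigma, path_length, rng=None, start_state=0):
    """Simulate a BMC sample path (hypothetical helper, not BMCToolkit's API).

    p           -- K x K cluster-level transition matrix (rows sum to 1)
    sigma       -- length-n sequence; sigma[x] is the cluster of state x
    path_length -- number of states in the returned path (>= 1)
    """
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(p, dtype=float)
    sigma = np.asarray(sigma)
    num_clusters = p.shape[0]
    # Group the states by cluster so that we can draw uniformly within a cluster.
    members = [np.flatnonzero(sigma == k) for k in range(num_clusters)]
    path = [start_state]
    for _ in range(path_length - 1):
        current_cluster = sigma[path[-1]]
        # Choose the next cluster according to p, then a state inside it uniformly.
        next_cluster = rng.choice(num_clusters, p=p[current_cluster])
        path.append(int(rng.choice(members[next_cluster])))
    return path
```

For example, `simulate_bmc_path([[0.1, 0.9], [0.8, 0.2]], [0] * 100 + [1] * 200, 1500, rng=np.random.default_rng(1987))` produces a path of 1500 states over 300 states split into two clusters.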
Contents
This Python module distributes a Dynamic-Link Library (DLL) written in C++. Among other functionalities, the DLL can:
- calculate both projected and lifted variants of the equilibrium distribution, frequency matrix, and transition matrix of a BMC;
- compute the difference between two clusterings, and the spectral norm of a matrix;
- estimate the parameters of a BMC from a sample path;
- execute the spectral clustering algorithm and the cluster improvement algorithm;
- generate sample paths and trimmed frequency matrices;
- relabel clusters according to their size or their equilibrium probability.
The package includes an easy-to-use Python interface to the DLL, as well as type stubs for it.
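The following NumPy sketch illustrates what lifting means, under the assumption that "projected" refers to cluster-level quantities and "lifted" to state-level ones; the README does not spell this out. A state-level transition matrix is obtained as P[x, y] = p[σ(x), σ(y)] / |V_σ(y)|, which is the same formula the usage example in this README applies when it builds `model_P` (there |V_k| is written as `num_states * alpha[k]`). If Π is the equilibrium distribution of `p`, then π[x] = Π[σ(x)] / |V_σ(x)| is stationary for P. All function names here are hypothetical and are not part of BMCToolkit.

```python
import numpy as np


def lift_transition_matrix(p, sigma):
    """State-level transition matrix P[x, y] = p[sigma[x], sigma[y]] / |V_sigma[y]|."""
    p = np.asarray(p, dtype=float)
    sigma = np.asarray(sigma)
    sizes = np.bincount(sigma, minlength=p.shape[0])  # |V_k| for every cluster k
    # Divide column y by the size of the cluster that state y belongs to.
    return p[np.ix_(sigma, sigma)] / sizes[sigma][np.newaxis, :]


def cluster_equilibrium(p):
    """Stationary distribution of the cluster-level chain (assumes p is irreducible)."""
    p = np.asarray(p, dtype=float)
    k = p.shape[0]
    # Solve Pi (p - I) = 0 together with sum(Pi) = 1 in the least-squares sense.
    a = np.vstack([p.T - np.eye(k), np.ones(k)])
    b = np.concatenate([np.zeros(k), [1.0]])
    return np.linalg.lstsq(a, b, rcond=None)[0]


def lift_equilibrium(p, sigma):
    """State-level equilibrium pi[x] = Pi[sigma[x]] / |V_sigma[x]|."""
    sigma = np.asarray(sigma)
    sizes = np.bincount(sigma, minlength=np.asarray(p).shape[0])
    return cluster_equilibrium(p)[sigma] / sizes[sigma]
```

With `p = [[0.3, 0.7], [0.2, 0.8]]` and clusters of sizes 4 and 6, `cluster_equilibrium(p)` is (2/9, 7/9), and the lifted distribution spreads each of these masses evenly over the states of its cluster.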
Related scientific articles
This module was introduced in and written for the following scientific article:
- Alexander Van Werde, Albert Senen-Cerda, Gianluca Kosmella, Jaron Sanders (2022). Detection and Evaluation of Clusters within Sequential Data. Preprint. ArXiv 2210.01679.
The module also relates to the following scientific articles:
- Jaron Sanders, Alexandre Proutiere, Se-Young Yun (2019). Clustering in Block Markov Chains. Annals of Statistics. ArXiv 1712.09232v3.
- Jaron Sanders, Albert Senen-Cerda (2021). Spectral norm bounds for block Markov chain random matrices. Preprint. ArXiv 2111.06201.
- Jaron Sanders, Alexander Van Werde (2022). Singular value distribution of dense random matrices with block Markovian dependence. Preprint. ArXiv 2204.13534.
Requirements
This module was compiled for Python 3.9.6 and tested only with that version. In practice, it has also been found to work with some other Python versions >= 3.9.6.
Similarly, the module was tested primarily with recent versions of NumPy (version 1.23.1 at the time of writing).
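A quick way to compare a local environment against the versions named above is sketched here. The constants and the helper `python_at_least` are illustrative only and are not part of BMCToolkit.

```python
import sys

import numpy as np

# Versions named in the Requirements section; other versions may also work.
TESTED_PYTHON = (3, 9, 6)
TESTED_NUMPY = "1.23.1"


def python_at_least(minimum=TESTED_PYTHON, version_info=None):
    """Return True if the running (or the given) Python version is >= minimum."""
    version = sys.version_info if version_info is None else version_info
    return tuple(version[:3]) >= tuple(minimum)


print("Python", ".".join(map(str, sys.version_info[:3])), "(tested: 3.9.6)")
print("NumPy", np.__version__, "(tested: " + TESTED_NUMPY + ")")
if not python_at_least():
    print("Warning: this Python is older than the version BMCToolkit was compiled for.")
```

Note that the built distribution listed with the hashes on this page is tagged `cp39-cp39-win_amd64`, that is, a CPython 3.9 wheel for 64-bit Windows, so pip will only select it on that interpreter and platform.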
Related libraries
The DLL uses the following C++ libraries and interfaces, in unmodified form:
- the Eigen library, available at https://eigen.tuxfamily.org/;
- the OpenMP API, available at https://www.openmp.org/;
- the pybind11 library, available at https://pybind11.readthedocs.io/;
- the Sparse Eigenvalue Computation Toolkit as a Redesigned ARPACK (SPECTRA) library, available at https://spectralib.org/.
Example
Here is an example of how to use BMCToolkit:
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at https://mozilla.org/MPL/2.0/.
import random

import matplotlib.pyplot as plt
import numpy as np

import BMCToolkit

if __name__ == "__main__":
    num_states = 10 * 30
    num_clusters = 2
    path_length = 1500

    # Generate a synthetic trajectory over two groups of states: the first third
    # (states 0 .. num_states // 3 - 1) and the remaining two thirds.
    current_state = 0
    frequency_matrix = np.zeros((num_states, num_states))
    trajectory = []
    for t in range(path_length):
        past_state = current_state
        if t % 2 == 0:
            # On even steps, jump to the other group. Integer division keeps the
            # bounds passed to random.randint integral.
            if current_state < num_states // 3:
                current_state = random.randint(num_states // 3, num_states - 1)
            else:
                current_state = random.randint(0, num_states // 3 - 1)
        else:
            # On odd steps, jump to a uniformly random state.
            current_state = random.randint(0, num_states - 1)
        trajectory.append(current_state)
        # Count the observed transition past_state -> current_state.
        frequency_matrix[past_state, current_state] += 1

    print("Testing compute_clusters_from_trajectory...")
    clustering1 = BMCToolkit.compute_clusters_from_trajectory(trajectory, num_states, num_clusters)
    print(clustering1)

    print("Testing compute_cluster_improvement...")
    clustering2 = BMCToolkit.compute_cluster_improvement(frequency_matrix, clustering1, 10)
    print(clustering2)

    print("Testing matrix trimming...")
    trimmed_matrix = BMCToolkit.trim_frequency_matrix(frequency_matrix, 5)
    print(trimmed_matrix)
    print(frequency_matrix)

    print("Testing compute_bmcs_parameters...")
    print(BMCToolkit.compute_bmcs_parameters(frequency_matrix, clustering2))

    print("Testing compute_k_means...")
    clustering3 = BMCToolkit.compute_k_means(frequency_matrix, num_clusters)
    print(clustering3)

    print("Testing compute_spectral_clustering...")
    spectral_clustering = BMCToolkit.compute_spectral_clustering(frequency_matrix, num_clusters)
    print(spectral_clustering)

    print("Testing compute_spectral_clustering with full arguments...")
    improved_clustering = BMCToolkit.compute_spectral_clustering(
        frequency_matrix, num_clusters, 1987, 10000, 1000, 0, 10, False)
    print(improved_clustering)

    print("Testing compute_spectral_clustering with a negative argument...")
    improved_clustering = BMCToolkit.compute_spectral_clustering(
        frequency_matrix, num_clusters, 1987, 10000, 1000, 0, -1, False)
    print(improved_clustering)

    print("Testing compute_cluster_improvement...")
    improved_clustering = BMCToolkit.compute_cluster_improvement(frequency_matrix, spectral_clustering, 10)
    print(improved_clustering)

    print("Spectral norm: ")
    print(BMCToolkit.compute_spectral_norm(frequency_matrix))
    print(BMCToolkit.get_equilibrium_distribution_lift([[0.3, 0.7], [0.2, 0.8]], [[0.4], [0.6]], 10))

    # Estimate the BMC parameters once per clustering. Following the slicing used
    # here, column 0 holds alpha, column 1 holds pi, and the remaining columns hold p.
    improv_params = BMCToolkit.compute_bmcs_parameters(frequency_matrix, improved_clustering)
    improv_alpha = improv_params[:, 0]
    improv_pi = improv_params[:, 1]
    improv_p = improv_params[:, 2:]
    spectral_params = BMCToolkit.compute_bmcs_parameters(frequency_matrix, spectral_clustering)
    spectral_alpha = spectral_params[:, 0]
    spectral_pi = spectral_params[:, 1]
    spectral_p = spectral_params[:, 2:]

    # Lift the cluster-level estimates to state-level transition matrices:
    # P[i, j] = p[c(i), c(j)] / (num_states * alpha[c(j)]).
    model_P = np.array([[improv_p[improved_clustering[i], improved_clustering[j]]
                         / (num_states * improv_alpha[improved_clustering[j]])
                         for j in range(num_states)] for i in range(num_states)])
    model_Q = np.array([[spectral_p[spectral_clustering[i], spectral_clustering[j]]
                         / (num_states * spectral_alpha[spectral_clustering[j]])
                         for j in range(num_states)] for i in range(num_states)])

    # Estimate the KL divergence rate difference between the two fitted models
    # using the observed trajectory, and plot it with error bars.
    plt.figure()
    output = BMCToolkit.KL_divergence_rate_difference_between_models(model_P, model_Q, trajectory, 0.1, 0.95, 50)
    x_data = list(output.keys())
    y_data = [value[0] for value in output.values()]
    y_err = [value[1] for value in output.values()]
    print(x_data)
    print(y_data)
    print(y_err)
    plt.errorbar(x_data, y_data, yerr=y_err)

    # Visualize both lifted transition matrices.
    pplot = plt.matshow(model_P)
    plt.colorbar(pplot)
    pplot = plt.matshow(model_Q)
    plt.colorbar(pplot)
    plt.show()
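The example prints several clusterings of the same trajectory. One common way to compare two of them, assumed here and not necessarily the measure the DLL computes, is the number of misclassified states: the smallest number of states on which two label vectors disagree after optimally relabeling one of them. The helper `misclassified_states` is hypothetical; it assumes labels are integers in `range(num_clusters)` and tries every relabeling, which is only practical for a small number of clusters.

```python
from itertools import permutations

import numpy as np


def misclassified_states(clustering_a, clustering_b, num_clusters):
    """Smallest number of states on which two clusterings disagree, over all relabelings."""
    a = np.asarray(clustering_a)
    b = np.asarray(clustering_b)
    best = a.size
    # Try every relabeling of b's cluster labels (fine for a handful of clusters).
    for perm in permutations(range(num_clusters)):
        relabeled = np.asarray(perm)[b]
        best = min(best, int(np.count_nonzero(a != relabeled)))
    return best
```

Inside the example's main block, `misclassified_states(spectral_clustering, improved_clustering, num_clusters)` would then report how many states the cluster improvement step effectively moved, assuming the toolkit returns integer label arrays.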
Built Distribution
Hashes for BMCToolkit-0.7.7-cp39-cp39-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 09ad8f1c8385801a88a8fa471643989478b8f6e7b71655cefded3747c96fa777
MD5 | 67229f24467cb5aabd18b07660736d96
BLAKE2b-256 | e1af11f1047bd149b390bb03ad189af6d5dca9a7faa43628a13a000c598f35e6
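To check a downloaded wheel against the SHA256 digest listed above, a short sketch using only the standard library is shown here. The helper name `file_sha256` is illustrative, and the wheel is assumed to have been saved under its original file name in the working directory.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed for BMCToolkit-0.7.7-cp39-cp39-win_amd64.whl.
EXPECTED_SHA256 = "09ad8f1c8385801a88a8fa471643989478b8f6e7b71655cefded3747c96fa777"
```

Then `file_sha256("BMCToolkit-0.7.7-cp39-cp39-win_amd64.whl") == EXPECTED_SHA256` should hold for an intact download. Alternatively, pip's hash-checking mode can enforce the digest at install time via a requirements line such as `BMCToolkit==0.7.7 --hash=sha256:<digest>`.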