
Simulate and analyze trajectories of Block Markov Chains.

Project description

This Python package provides tools to simulate and analyze trajectories of Block Markov Chains (BMCs).
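To make the object of study concrete, here is a minimal sketch of how a BMC trajectory could be simulated in plain Python, without the toolkit. It assumes the usual BMC structure: the chain first jumps between clusters according to a cluster transition matrix, and then picks a state uniformly at random inside the chosen cluster. The function name `simulate_bmc`, its parameters, the contiguous state numbering, and the uniform choice of the initial cluster are all assumptions made for illustration; they are not part of BMCToolkit.

```python
import random


def simulate_bmc(cluster_transition, cluster_sizes, path_length, seed=0):
    """Simulate a trajectory of a block Markov chain (illustrative sketch).

    cluster_transition[i][j] is the probability of jumping from cluster i
    to cluster j; cluster_sizes[k] is the number of states in cluster k.
    """
    rng = random.Random(seed)
    num_clusters = len(cluster_sizes)

    # Assumption: cluster k owns the contiguous block of states
    # offsets[k], ..., offsets[k + 1] - 1.
    offsets = [0]
    for size in cluster_sizes:
        offsets.append(offsets[-1] + size)

    # Assumption: the initial cluster is chosen uniformly at random.
    cluster = rng.randrange(num_clusters)
    trajectory = []
    for _ in range(path_length):
        # Jump between clusters, then pick a uniform state inside the new cluster.
        cluster = rng.choices(range(num_clusters), weights=cluster_transition[cluster])[0]
        state = offsets[cluster] + rng.randrange(cluster_sizes[cluster])
        trajectory.append(state)
    return trajectory


trajectory = simulate_bmc([[0.3, 0.7], [0.2, 0.8]], [4, 6], path_length=50, seed=1)
print(trajectory[:10])
```

With this parametrization, the probability of moving from a state in cluster i to a particular state in cluster j is the cluster transition probability divided by the size of cluster j.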

Contents

This Python module distributes a Dynamic-Link Library (DLL) written in C++. Among other functionalities, the DLL is able to calculate both projected and lifted variants of the equilibrium distribution, frequency matrix, and transition matrix of a BMC; to compute the difference between two clusters and the spectral norm; to estimate the parameters of a BMC from a sample path; to execute the spectral clustering algorithm and the cluster improvement algorithm; to generate sample paths and trimmed frequency matrices; and to relabel clusters according to the size or the equilibrium probability of a cluster. The package includes an easy-to-use Python interface to the DLL, and stubs for it.
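As a rough illustration of the difference between a cluster-level and a state-level equilibrium distribution, the sketch below computes the stationary distribution of a small cluster transition matrix by power iteration and then spreads each cluster's mass uniformly over its states. This assumes that "projected" refers to the cluster level and "lifted" to the state level. The helper names `equilibrium_distribution` and `lift_to_states` are hypothetical, and the code does not call or reproduce the DLL's implementation.

```python
def equilibrium_distribution(transition, iterations=200):
    """Approximate the stationary distribution pi = pi P by power iteration.

    Assumes the chain is irreducible and aperiodic, so the iteration converges.
    """
    k = len(transition)
    pi = [1.0 / k] * k
    for _ in range(iterations):
        pi = [sum(pi[i] * transition[i][j] for i in range(k)) for j in range(k)]
    return pi


def lift_to_states(cluster_distribution, cluster_sizes):
    """Spread each cluster's probability mass uniformly over its states."""
    lifted = []
    for mass, size in zip(cluster_distribution, cluster_sizes):
        lifted.extend([mass / size] * size)
    return lifted


cluster_pi = equilibrium_distribution([[0.3, 0.7], [0.2, 0.8]])
state_pi = lift_to_states(cluster_pi, [4, 6])
print(cluster_pi)
print(state_pi)
```

For this two-cluster matrix the balance equation 0.7 * pi_0 = 0.2 * pi_1 gives pi = (2/9, 7/9), which the iteration reproduces to numerical precision.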

Related scientific articles

This module was introduced in and written for the following scientific article:

  1. Alexander Van Werde, Albert Senen-Cerda, Gianluca Kosmella, Jaron Sanders (2022). Detection and Evaluation of Clusters within Sequential Data. Preprint. ArXiv 2210.01679.

The module also relates to the following scientific articles:

  1. Jaron Sanders, Alexandre Proutiere, Se-Young Yun (2019). Clustering in Block Markov Chains. Annals of Statistics. ArXiv 1712.09232v3.

  2. Jaron Sanders, Albert Senen-Cerda (2021). Spectral norm bounds for block Markov chain random matrices. Preprint. ArXiv 2111.06201.

  3. Jaron Sanders, Alexander Van Werde (2022). Singular value distribution of dense random matrices with block Markovian dependence. Preprint. ArXiv 2204.13534.

Requirements

This module was compiled for Python 3.9.6 and tested only with that version. In practice, it has also been found to work with certain later versions of Python (>= 3.9.6).

Similarly, this module was tested primarily with recent versions of numpy (version 1.23.1 at the time of writing).
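Before importing the module, it may help to check the local interpreter and numpy versions against the ones mentioned above. The following is a small sketch that uses only the standard library; the function name `check_environment` is an assumption for illustration, and the check is purely informational, since the requirements above do not state a hard minimum for numpy.

```python
import sys
from importlib import metadata


def check_environment(min_python=(3, 9, 6)):
    """Report the running Python version and the installed numpy version."""
    try:
        numpy_version = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        # numpy is not installed in this environment.
        numpy_version = None
    return {
        "python_version": sys.version.split()[0],
        "python_ok": sys.version_info[:3] >= min_python,
        "numpy_version": numpy_version,
    }


report = check_environment()
print(report)
```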

Related libraries

The DLL utilizes the following C++ libraries / interfaces, in unmodified form:

Example

Here is an example of how to use BMCToolkit:

# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at https://mozilla.org/MPL/2.0/.

import random
import matplotlib.pyplot as plt
import numpy as np
import BMCToolkit

if __name__ == "__main__":

    num_states = 10*30
    num_clusters = 2
    path_length = 1500

    current_state = 0
    frequency_matrix = np.zeros((num_states,num_states))
    trajectory = []
    for t in range(path_length):
        past_state = current_state
        if t % 2 == 0:
            # random.randint requires integer bounds, hence the integer division.
            if current_state <= num_states / 3:
                current_state = random.randint(num_states // 3, num_states - 1)
            else:
                current_state = random.randint(0, num_states // 3 - 1)
        else:
            current_state = random.randint(0,num_states-1)    
        trajectory.append(current_state)
        frequency_matrix[past_state,current_state] += 1
    
    print("Testing compute_clusters_from_trajectory...")
    clustering1 = BMCToolkit.compute_clusters_from_trajectory( trajectory, num_states, num_clusters )
    print(clustering1)

    print("Testing compute_cluster_improvement...")
    clustering2 = BMCToolkit.compute_cluster_improvement(frequency_matrix, clustering1, 10)
    print(clustering2)
    
    print("Testing matrix trimming...")
    trimmed_matrix = BMCToolkit.trim_frequency_matrix(frequency_matrix,5)
    print(trimmed_matrix)
    print(frequency_matrix)

    print("Testing compute_bmcs_parameters...")
    print(BMCToolkit.compute_bmcs_parameters(frequency_matrix, clustering2))

    print("Testing compute_k_means...")
    clustering3 = BMCToolkit.compute_k_means(frequency_matrix, num_clusters)
    print(clustering3)   

    print("Testing compute_spectral_clustering...")
    spectral_clustering = BMCToolkit.compute_spectral_clustering(frequency_matrix, num_clusters)
    print(spectral_clustering)

    print("Testing compute_spectral_clustering with full arguments...")
    improved_clustering = BMCToolkit.compute_spectral_clustering(frequency_matrix, num_clusters, 1987, 10000, 1000, 0, 10, False)
    print(improved_clustering)

    print("Testing compute_spectral_clustering with a negative argument...")
    improved_clustering = BMCToolkit.compute_spectral_clustering(frequency_matrix, num_clusters, 1987, 10000, 1000, 0, -1, False)
    print(improved_clustering)    

    print("Testing compute_cluster_improvement...")
    improved_clustering = BMCToolkit.compute_cluster_improvement(frequency_matrix, spectral_clustering, 10)
    print(improved_clustering)
    

    print("Spectral norm: ")
    print(BMCToolkit.compute_spectral_norm(frequency_matrix))
    print(BMCToolkit.get_equilibrium_distribution_lift([[0.3, 0.7], [0.2,0.8]],[[0.4],[0.6]],10))


    # Compute the parameters once per clustering, then slice the result:
    # column 0 is used as alpha, column 1 as pi, and the remaining columns as p.
    improv_params = BMCToolkit.compute_bmcs_parameters(frequency_matrix, improved_clustering)
    improv_alpha, improv_pi, improv_p = improv_params[:,0], improv_params[:,1], improv_params[:,2:]
    spectral_params = BMCToolkit.compute_bmcs_parameters(frequency_matrix, spectral_clustering)
    spectral_alpha, spectral_pi, spectral_p = spectral_params[:,0], spectral_params[:,1], spectral_params[:,2:]

    model_P = np.array( [ [ improv_p[improved_clustering[i],improved_clustering[j]] / (num_states * improv_alpha[improved_clustering[j]] ) for j in range(0,num_states) ] for i in range(0,num_states) ] )
    model_Q = np.array( [ [ spectral_p[spectral_clustering[i],spectral_clustering[j]] / (num_states * spectral_alpha[spectral_clustering[j]] ) for j in range(0,num_states) ] for i in range(0,num_states) ] )

    plt.figure()
    output = BMCToolkit.KL_divergence_rate_difference_between_models(model_P, model_Q, trajectory, 0.1, 0.95, 50)
    x_data = list(output.keys())
    y_data = [ value[0] for value in output.values() ]
    y_err = [ value[1] for value in output.values() ]
    print(x_data)
    print(y_data)
    print(y_err)
    plt.errorbar(x_data, y_data, y_err)

    pplot = plt.matshow(model_P)  
    plt.colorbar(pplot)
    pplot = plt.matshow(model_Q)  
    plt.colorbar(pplot)

    plt.show()
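The example prints several clusterings, and cluster labels produced by different runs or algorithms need not agree even when the underlying partitions do. A simple way to compare two clusterings is to count mismatched states after trying every relabeling of one of them. The sketch below does this in plain Python; the function name `clustering_difference` is hypothetical, and this is not necessarily the measure the toolkit computes.

```python
from itertools import permutations


def clustering_difference(labels_a, labels_b, num_clusters):
    """Return the minimum number of states labeled differently by the two
    clusterings, minimized over all relabelings of the clusters in labels_a."""
    best = len(labels_a)
    for perm in permutations(range(num_clusters)):
        # perm[a] is the candidate new name for cluster a of the first clustering.
        mismatches = sum(1 for a, b in zip(labels_a, labels_b) if perm[a] != b)
        best = min(best, mismatches)
    return best


# Same partition with swapped labels: no mismatches.
print(clustering_difference([0, 0, 1, 1], [1, 1, 0, 0], 2))
# One state assigned to a different cluster: one mismatch.
print(clustering_difference([0, 0, 1, 1], [0, 1, 1, 1], 2))
```

The loop over all permutations grows factorially in the number of clusters, so this sketch is only practical for a handful of clusters, such as the two used in the example.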


Source Distributions

No source distribution files are available for this release.

Built Distribution

BMCToolkit-0.7.7-cp39-cp39-win_amd64.whl (345.4 kB)

Uploaded CPython 3.9 Windows x86-64
