Variable Length Markov Chain

These details have not been verified by PyPI

Project links

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

Variable Length Markov Model (VLMC)

Implementation of Variable Length Markov Chains (VLMC) for Python. Suffix tree building is done top-down using the Peres-Shield order estimation method. It is written in Rust and ported to Python with PyO3.

Instalation

You can install using pip.

pip install vlmc

Basic Usage

import vlmc
tree = vlmc.VLMC(max_depth, alphabet_size, n_jobs=-1)

Parameters: -max_depth: Maximum depth of tree. Subsequences whose length exceed the max_depth will not be considered nor counted. -alphabet_size: Total number of symbols in the alphabet. This number has to be bigger than the highest integer encountered, else it will cause runtime errors. -n_jobs: Number of subprocesses to spawn when running the vlmc. Choose -1 for using all available processes.

Methods

`fit`

data = [
[1,2,3],
[2,3],
[1,0,1],
[2]
]

tree.fit(data)

Note fit method returns None and not self. This is by design as to not expose the rust object to python.

Parameters: -data: List of lists containing sequences of discrete values to fit on. Values are assumed to be integers form 0 to alphabet_size. List is expected to be two dimensional.

`get_suffix`

Given a sequence, returns the longest suffix that is present in the VLMC.

suffix = tree.get_suffix(sequence)

Arguments: -sequence: list of integers representing a sequence of discrete varaibles. Returns:

suffix : longest suffix of sequence that is present in the VLMC.

`get_counts`

Gets the total number of occurences of a given sequence of integers. Will throw a KeyError if the sequence is not a tree node. Consider using get_suffix to make sure to get a tree node.

counts = tree.get_counts(sequence)

Arguments: -sequence: list of integers representing a sequence of discrete varaibles. Returns:

counts : integer

`get_distribution`

Gets the vector of probabilities over the entire alphabet for the given sequence. Will throw a KeyError if the sequence is not a tree node. Consider using get_suffix to make sure to get a tree node.

probabilities = tree.get_distribution(sequence)

Arguments: -sequence: list of integers representing a sequence of discrete variables. Returns:

probabilities : list of floats representing the probability of observing a specific state (index) as the next symbol.

`get_contexts`

contexts = tree.get_contexts()

Returns: -contexts: list of relevant contexts according to the Peres-Shield tree prunning method. Contexts are ordered by length.

Parameters

`contexts`

contexts = tree.contexts

Returns:

contexts : list of sequences of each node in the tree, each relevant context.

`adjacency_matrix`

matrix, labels = tree.adjacency_matrix

Returns:

matrix : adjacency matrix representing the suffix tree.
labels : list of sequences associated with each node of the tree.

TODO

Paralelization

After experimentation the best possible idea for paralelization would be to create different hashmaps for each sunsequence length. Hashmaps are then joined from longest to smallest. The hashmap at max_depth + 1 can be discarded after. Could be very fast depending on merging algo.

Project details

These details have not been verified by PyPI

Project links

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Release history Release notifications | RSS feed

0.2.0

Jun 7, 2023

0.1.5

Jun 6, 2023

This version

0.1.2

Jun 6, 2023

0.1.1

Jun 6, 2023

0.1.0

Jun 6, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vlmc-0.1.2.tar.gz (7.9 kB view hashes)

Uploaded Jun 6, 2023 Source

Built Distribution

vlmc-0.1.2-cp37-abi3-macosx_11_0_arm64.whl (205.0 kB view hashes)

Uploaded Jun 6, 2023 CPython 3.7+ macOS 11.0+ ARM64

Hashes for vlmc-0.1.2.tar.gz

Hashes for vlmc-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`85fc919e13ebf16033a39ce58bd5f6a8f144bd8b6a792caef029f3f2b722b7f6`
MD5	`3bd2df4da81e334686473dd5c9f23fa0`
BLAKE2b-256	`94178c672c14c0f1102cc6b03208f348b7e45f49051b6830d5300d38385e67fb`

Hashes for vlmc-0.1.2-cp37-abi3-macosx_11_0_arm64.whl

Hashes for vlmc-0.1.2-cp37-abi3-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`d8d1cf3e6a3789ac6001b282def25ef5bfd66b30006ec45e2abbfb68e942dde4`
MD5	`e48ab460eca8e5eb53915cf35b788aad`
BLAKE2b-256	`7a3da366a8ec27ea5b5029b006801244dac0955d2054d04e229cf2410d1f2344`