Variable Length Markov Chain

Variable Length Markov Model (VLMC)

Implementation of Variable Length Markov Chains (VLMC) for Python. The suffix tree is built top-down using the Peres-Shields order estimation method. The package is written in Rust with Python bindings.

Installation

Pre-built packages for many Linux, Windows, and macOS systems are available on PyPI and can be installed with:

pip install vlmc

On uncommon architectures, you may need to first install Cargo before running pip install vlmc.

Compilation from source

To compile from source, you will need to install Rust/Cargo and maturin, which builds the Python bindings. Maturin is best used within a Python virtual environment:

# activate your desired virtual environment first, then:
pip install maturin
git clone https://github.com/antonio-leitao/vlmc.git
cd vlmc
# build and install the package:
maturin develop --release

Usage

import vlmc
tree = vlmc.VLMC(max_depth, alphabet_size, n_jobs=-1)

Parameters:

  • max_depth: Maximum depth of the tree. Subsequences longer than max_depth are neither considered nor counted.
  • alphabet_size: Total number of symbols in the alphabet. This number must be greater than the highest integer that appears in the data; otherwise runtime errors will occur.
  • n_jobs: Number of subprocesses to spawn when running the VLMC. Use -1 to use all that are available.
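Because alphabet_size must exceed the largest symbol in the data, it can help to derive it from the data before constructing the model. The following is a minimal sketch in plain Python; min_alphabet_size is a hypothetical helper written for illustration and is not part of the vlmc package.

```python
def min_alphabet_size(data):
    """Return the smallest alphabet_size that covers every symbol in data.

    Hypothetical helper for illustration, not part of the vlmc package.
    """
    symbols = [symbol for sequence in data for symbol in sequence]
    if not symbols:
        raise ValueError("data contains no symbols")
    if min(symbols) < 0:
        raise ValueError("symbols must be non-negative integers")
    # Symbols start at 0, so the alphabet needs max + 1 entries.
    return max(symbols) + 1


data = [[1, 2, 3], [2, 3], [1, 0, 1], [2]]
alphabet_size = min_alphabet_size(data)
print(alphabet_size)  # 4
```

The resulting value can then be passed as the alphabet_size argument shown above.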

fit

Note that the fit method returns None, not self. This is by design, so that the Rust object is not exposed to Python.

data = [
  [1,2,3],
  [2,3],
  [1,0,1],
  [2]
]

tree.fit(data)

Arguments:

  • data: List of lists containing the sequences of discrete values to fit on. Values are assumed to be integers from 0 to alphabet_size - 1. The list is expected to be two-dimensional (a list of sequences).
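If the raw data uses labels other than consecutive integers, it has to be encoded first. One possible way to do this is sketched below; encode_sequences is a hypothetical helper, not something the vlmc package provides.

```python
def encode_sequences(raw_sequences):
    """Map arbitrary hashable symbols to integers 0..n-1.

    Hypothetical helper for illustration, not part of the vlmc package.
    Returns the encoded sequences and the symbol -> integer mapping.
    """
    mapping = {}
    encoded = []
    for sequence in raw_sequences:
        row = []
        for symbol in sequence:
            # Assign the next free integer the first time a symbol is seen.
            if symbol not in mapping:
                mapping[symbol] = len(mapping)
            row.append(mapping[symbol])
        encoded.append(row)
    return encoded, mapping


raw = [["a", "b", "c"], ["b", "c"], ["a", "d", "a"]]
data, mapping = encode_sequences(raw)
print(data)          # [[0, 1, 2], [1, 2], [0, 3, 0]]
print(len(mapping))  # 4, usable as alphabet_size
```

Keeping the mapping around makes it possible to translate contexts and distributions back to the original labels.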

get_suffix

Given a sequence, returns the longest suffix that is present in the VLMC.

suffix = tree.get_suffix(sequence)

Arguments:

  • sequence: list of integers representing a sequence of discrete variables.

Returns:

  • suffix : longest suffix of sequence that is present in the VLMC.
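To illustrate what "longest suffix present in the VLMC" means, here is a plain-Python sketch that does not use the library. The set of nodes is made up for the example, and returning an empty list when nothing matches is an assumption of this sketch, not documented behaviour of get_suffix.

```python
def longest_known_suffix(sequence, nodes):
    """Return the longest suffix of sequence that appears in nodes.

    nodes is a set of tuples standing in for the tree's nodes
    (illustrative only, not the library's internal representation).
    """
    # Try the full sequence first, then drop symbols from the front.
    for start in range(len(sequence) + 1):
        candidate = tuple(sequence[start:])
        if candidate in nodes:
            return list(candidate)
    # Assumption of this sketch: no match yields the empty suffix.
    return []


nodes = {(3,), (2, 3), (1,)}
print(longest_known_suffix([1, 2, 3], nodes))  # [2, 3]
print(longest_known_suffix([0, 1], nodes))     # [1]
```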

get_counts

Gets the total number of occurrences of a given sequence of integers. Raises a KeyError if the sequence is not a tree node. Consider calling get_suffix first to make sure you have a tree node.

counts = tree.get_counts(sequence)

Arguments:

  • sequence: list of integers representing a sequence of discrete variables.

Returns:

  • counts : integer
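The counts can be thought of as the number of times a subsequence appears contiguously in the training data, limited to length max_depth. The sketch below reproduces that idea in plain Python under this assumption; it is not the library's implementation, and unlike get_counts it returns 0 rather than raising a KeyError for unseen sequences.

```python
from collections import Counter


def count_subsequences(data, max_depth):
    """Count every contiguous subsequence of length 1..max_depth."""
    counts = Counter()
    for sequence in data:
        for start in range(len(sequence)):
            # Windows longer than max_depth are never counted.
            stop = min(len(sequence), start + max_depth)
            for end in range(start + 1, stop + 1):
                counts[tuple(sequence[start:end])] += 1
    return counts


data = [[1, 2, 3], [2, 3], [1, 0, 1], [2]]
counts = count_subsequences(data, max_depth=2)
print(counts[(2, 3)])     # 2
print(counts[(1,)])       # 3
print(counts[(1, 2, 3)])  # 0, longer than max_depth
```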

get_distribution

Gets the vector of probabilities over the entire alphabet for the given sequence. Raises a KeyError if the sequence is not a tree node. Consider calling get_suffix first to make sure you have a tree node.

probabilities = tree.get_distribution(sequence)

Arguments:

  • sequence: list of integers representing a sequence of discrete variables.

Returns:

  • probabilities : list of floats representing the probability of observing a specific state (index) as the next symbol.
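As an illustration of how such a vector can arise, the sketch below estimates the next-symbol distribution for a context as plain relative frequencies. This is an assumption made for the example: the library may compute or smooth its probabilities differently, and next_symbol_distribution is a hypothetical name.

```python
from collections import Counter


def next_symbol_distribution(data, context, alphabet_size):
    """Relative frequency of each symbol directly following context."""
    context = list(context)
    k = len(context)
    followers = Counter()
    for sequence in data:
        # Slide over every position where context could be followed by a symbol.
        for i in range(len(sequence) - k):
            if sequence[i:i + k] == context:
                followers[sequence[i + k]] += 1
    total = sum(followers.values())
    if total == 0:
        # Mirrors the documented KeyError for sequences that are not nodes.
        raise KeyError(tuple(context))
    # Index s of the result is the estimated probability of symbol s.
    return [followers[s] / total for s in range(alphabet_size)]


data = [[1, 2, 3], [2, 3], [1, 0, 1], [2]]
print(next_symbol_distribution(data, [1], alphabet_size=4))
# [0.5, 0.0, 0.5, 0.0]
```

In this toy data, symbol 1 is followed once by 2 and once by 0, so each receives probability 0.5.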

get_contexts

contexts = tree.get_contexts()

Returns:

  • contexts: list of relevant contexts according to the Peres-Shields tree pruning method. Contexts are ordered by length.

TODO

Parallelization

After some experimentation, the most promising approach to parallelization is to build a separate hashmap for each subsequence length. The hashmaps are then joined from longest to shortest, after which the hashmap at max_depth + 1 can be discarded. This could be very fast, depending on the merging algorithm.
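One possible reading of this idea is sketched below in plain Python. It is an interpretation, not the planned Rust implementation: each subsequence length gets its own counter (built here on a thread pool purely to show the structure, since CPython threads will not speed up this work), and the maps are then folded from longest to shortest into per-context next-symbol counts. The names count_length and build_nodes are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_length(data, length):
    """Count every contiguous subsequence of exactly `length` symbols."""
    counts = Counter()
    for sequence in data:
        for start in range(len(sequence) - length + 1):
            counts[tuple(sequence[start:start + length])] += 1
    return counts


def build_nodes(data, max_depth):
    """Build context -> next-symbol counts from one hashmap per length."""
    lengths = list(range(1, max_depth + 2))  # includes max_depth + 1
    with ThreadPoolExecutor() as pool:
        maps = list(pool.map(lambda n: count_length(data, n), lengths))
    per_length = dict(zip(lengths, maps))

    nodes = {}
    # Join from longest to shortest: the map of length n + 1 supplies the
    # next-symbol counts for contexts of length n. The max_depth + 1 map is
    # only needed for this step and is not kept as a set of contexts.
    for length in range(max_depth + 1, 0, -1):
        for subsequence, count in per_length[length].items():
            context, symbol = subsequence[:-1], subsequence[-1]
            nodes.setdefault(context, Counter())[symbol] += count
    return nodes


data = [[1, 2, 3], [2, 3], [1, 0, 1], [2]]
nodes = build_nodes(data, max_depth=1)
print(nodes[(2,)])  # Counter({3: 2})
```

Because each length is counted independently, the per-length maps share no state until the final join, which is what makes the scheme a candidate for parallel execution.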
