Variable Length Markov Chain
Variable Length Markov Model (VLMC)
Implementation of Variable Length Markov Chains (VLMC) for Python. The suffix tree is built top-down using the Peres-Shields order estimation method. The library is written in Rust with Python bindings.
Installation
Pre-built packages for many Linux, Windows, and macOS systems are available on PyPI and can be installed with:
pip install vlmc
On uncommon architectures, you may need to install Cargo before running pip install vlmc.
Compilation from source
To compile from source, you will need Rust/Cargo and maturin for the Python bindings. Maturin is best used within a Python virtual environment:
# activate your desired virtual environment first, then:
pip install maturin
git clone https://github.com/antonio-leitao/vlmc.git
cd vlmc
# build and install the package:
maturin develop --release
Usage
import vlmc
tree = vlmc.VLMC(max_depth, alphabet_size, n_jobs=-1)
Parameters:
max_depth: Maximum depth of the tree. Subsequences whose length exceeds max_depth are neither considered nor counted.
alphabet_size: Total number of symbols in the alphabet. This number has to be bigger than the highest integer encountered, otherwise it will cause runtime errors.
n_jobs: Number of subprocesses to spawn when running the VLMC. Choose -1 to use all available processes.
fit
Note: the fit method returns None and not self. This is by design, so as not to expose the Rust object to Python.
data = [
[1,2,3],
[2,3],
[1,0,1],
[2]
]
tree.fit(data)
Arguments:
data: List of lists containing sequences of discrete values to fit on. Values are assumed to be integers from 0 up to, but not including, alphabet_size. The list is expected to be two-dimensional.
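Since symbols outside the alphabet cause runtime errors, it may help to check the input before calling fit. The following is a minimal sketch; check_sequences is a hypothetical helper written for illustration and is not part of the vlmc package.

```python
def check_sequences(data, alphabet_size):
    """Hypothetical pre-flight check; not part of the vlmc package."""
    for i, sequence in enumerate(data):
        if not isinstance(sequence, (list, tuple)):
            raise TypeError(f"sequence {i} is not a list: {sequence!r}")
        for symbol in sequence:
            # bool is a subclass of int, so reject it explicitly
            if isinstance(symbol, bool) or not isinstance(symbol, int):
                raise TypeError(f"sequence {i} holds a non-integer: {symbol!r}")
            if not 0 <= symbol < alphabet_size:
                raise ValueError(
                    f"sequence {i} holds {symbol}, outside 0..{alphabet_size - 1}"
                )
    return data


data = [[1, 2, 3], [2, 3], [1, 0, 1], [2]]
checked = check_sequences(data, alphabet_size=4)
```

The checked list can then be passed to tree.fit unchanged.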
get_suffix
Given a sequence, returns the longest suffix that is present in the VLMC.
suffix = tree.get_suffix(sequence)
Arguments:
sequence: list of integers representing a sequence of discrete variables.
Returns:
suffix: longest suffix of sequence that is present in the VLMC.
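The idea behind a longest-suffix lookup can be sketched in pure Python. This is only an illustration, not the library's Rust implementation: known_contexts is a hypothetical set of tuples standing in for the tree nodes, and returning an empty list when nothing matches is an assumption, since the behaviour of get_suffix in that case is not documented here.

```python
def longest_known_suffix(sequence, known_contexts):
    """Return the longest suffix of `sequence` found in `known_contexts`.

    `known_contexts` is a set of tuples standing in for the tree nodes.
    """
    # Try the full sequence first, then drop symbols from the front.
    for start in range(len(sequence) + 1):
        candidate = tuple(sequence[start:])
        if candidate in known_contexts:
            return list(candidate)
    return []  # assumption: fall back to the empty (root) context


known = {(), (3,), (2, 3), (1,)}
first = longest_known_suffix([0, 1, 2, 3], known)  # [2, 3]
second = longest_known_suffix([3, 0], known)  # []
```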
get_counts
Gets the total number of occurrences of a given sequence of integers.
Raises a KeyError if the sequence is not a tree node. Consider calling get_suffix first to obtain a sequence that is a tree node.
counts = tree.get_counts(sequence)
Arguments:
sequence: list of integers representing a sequence of discrete variables.
Returns:
counts: integer
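To make the notion of a count concrete, here is a pure-Python sketch that tallies every contiguous subsequence of length at most max_depth. It assumes that an occurrence means a contiguous run inside any training sequence; whether the library counts in exactly this way is not confirmed by this document, and count_subsequences is a hypothetical name.

```python
from collections import Counter


def count_subsequences(data, max_depth):
    """Count contiguous subsequences of length 1..max_depth (illustrative only)."""
    counts = Counter()
    for sequence in data:
        n = len(sequence)
        for start in range(n):
            # Longer runs than max_depth are skipped entirely.
            for end in range(start + 1, min(start + max_depth, n) + 1):
                counts[tuple(sequence[start:end])] += 1
    return counts


data = [[1, 2, 3], [2, 3], [1, 0, 1], [2]]
counts = count_subsequences(data, max_depth=2)
pair_count = counts[(2, 3)]  # 2: once in [1, 2, 3] and once in [2, 3]
```

Unlike get_counts, a Counter returns 0 for unseen keys instead of raising a KeyError.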
get_distribution
Gets the vector of probabilities over the entire alphabet for the given sequence.
Raises a KeyError if the sequence is not a tree node. Consider calling get_suffix first to obtain a sequence that is a tree node.
probabilities = tree.get_distribution(sequence)
Arguments:
sequence: list of integers representing a sequence of discrete variables.
Returns:
probabilities: list of floats representing the probability of observing a specific state (index) as the next symbol.
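The advice to call get_suffix before querying a node can be wrapped in a small helper. The sketch below uses only the get_suffix and get_distribution methods described above; next_symbol_distribution is a hypothetical helper, and StandInTree is a toy stand-in (not the real vlmc.VLMC class) included so the example runs without the package installed.

```python
def next_symbol_distribution(tree, sequence):
    """Back off to the longest known suffix, then query its distribution."""
    suffix = tree.get_suffix(sequence)
    return tree.get_distribution(suffix)


class StandInTree:
    """Toy stand-in so the sketch runs without vlmc installed; not the real class."""

    def __init__(self, table):
        self.table = table  # maps context tuples to probability lists

    def get_suffix(self, sequence):
        for start in range(len(sequence) + 1):
            if tuple(sequence[start:]) in self.table:
                return list(sequence[start:])
        return []

    def get_distribution(self, sequence):
        return self.table[tuple(sequence)]  # KeyError for unknown contexts


tree = StandInTree({(): [0.25, 0.25, 0.25, 0.25], (2,): [0.0, 0.0, 0.0, 1.0]})
probabilities = next_symbol_distribution(tree, [0, 1, 2])
# Index of the largest probability, i.e. the most likely next symbol.
most_likely = max(range(len(probabilities)), key=probabilities.__getitem__)
```

With a fitted vlmc.VLMC object in place of the stand-in, the helper should work the same way, provided get_suffix always returns a tree node.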
get_contexts
contexts = tree.get_contexts()
Returns:
contexts: list of relevant contexts according to the Peres-Shields tree pruning method. Contexts are ordered by length.
TODO
Parallelization
After experimentation, the most promising approach to parallelization is to build a separate hashmap for each subsequence length.
The hashmaps are then joined from longest to shortest.
The hashmap at max_depth + 1 can be discarded afterwards.
This could be very fast, depending on the merging algorithm.
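One possible reading of this plan is sketched below in pure Python. It assumes that the hashmap for length k + 1 supplies the next-symbol counts for contexts of length k, which would explain why the extra level at max_depth + 1 is needed only during the join. The names count_length and build_tables are hypothetical. Python threads are used only to show the structure; because of the GIL they give no real speed-up for this workload, which a Rust implementation with native threads would not suffer from.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_length(data, length):
    """Count every contiguous subsequence of exactly `length` symbols."""
    counts = Counter()
    for sequence in data:
        for start in range(len(sequence) - length + 1):
            counts[tuple(sequence[start:start + length])] += 1
    return counts


def build_tables(data, max_depth, alphabet_size):
    lengths = range(1, max_depth + 2)  # one extra level for next-symbol counts
    with ThreadPoolExecutor() as pool:
        maps = pool.map(lambda k: count_length(data, k), lengths)
        per_length = dict(zip(lengths, maps))

    # Join from longest to shortest: the map of length k + 1 supplies the
    # next-symbol counts for the contexts of length k.
    transitions = {}
    for k in range(max_depth, 0, -1):
        longer = per_length[k + 1]
        for context in per_length[k]:
            transitions[context] = [
                longer[context + (symbol,)] for symbol in range(alphabet_size)
            ]
    del per_length[max_depth + 1]  # no longer needed after the join
    return per_length, transitions


data = [[1, 2, 3], [2, 3], [1, 0, 1], [2]]
per_length, transitions = build_tables(data, max_depth=2, alphabet_size=4)
```

Each per-length map is independent of the others, so the counting step needs no locking; only the join touches more than one map at a time.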