Succinct BWT-Based Sequence Prediction (Subseq)
What is it?
This project is a C++ implementation, with a Python wrapper, of the Succinct BWT-Based Sequence Prediction model.
Subseq is a sequence prediction model over a finite alphabet. It is a lossless model (it does not discard information during training) and uses the succinct Wavelet Tree data structure and the Burrows-Wheeler Transform to compactly store and efficiently access training sequences for prediction.
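To make the Burrows-Wheeler Transform mentioned above concrete, here is a minimal sketch in Python. It is not taken from the Subseq code base: the function names `bwt` and `inverse_bwt` and the `$` sentinel are assumptions chosen for illustration, and it works on plain strings rather than on item sequences. It shows that the transform is reversible, which is the property a lossless model relies on.

```python
def bwt(text, sentinel="$"):
    """Return the Burrows-Wheeler Transform of text (naive rotation sort)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel  # '$' sorts before letters in ASCII
    # Sort every rotation of s and keep the last character of each one.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last_column, sentinel="$"):
    """Rebuild the original text from its BWT by repeated prepend-and-sort."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        # Prepend the BWT column to the partial rows, then re-sort them.
        table = sorted(c + row for c, row in zip(last_column, table))
    # The row that ends with the sentinel is the original text.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


transformed = bwt("banana")
restored = inverse_bwt(transformed)
print(transformed, restored)  # annb$aa banana
```

This naive version materializes and sorts all rotations, so it is only suitable for short inputs; it is meant to show what the transform computes, not how to compute it efficiently.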
This implementation is based on the following research paper:
Installation
Installation from the sources.
On Linux and macOS, you can easily install it with `make build`.
For Windows users, the installation is trickier.
Simple example
You can test the model with the following code:
from subseq.subseq import Subseq
model = Subseq(1)
model.fit([['hello', 'world']])
model.predict(['hello'])
# Output: ['world']
Features
Train
The model can be trained with the `fit` method.
Tuning
Subseq has only one hyperparameter that needs to be tuned: `threshold_query`, the number of similar queries that must be retrieved to make a confident prediction.
A `threshold_query` of 0 does not limit the number of queries.
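The effect of such a threshold can be illustrated with a toy predictor. This is not Subseq's algorithm, which relies on the BWT and a wavelet tree: the function `predict_next`, its strategy of trying shorter and shorter suffixes of the query, and its majority vote are all assumptions made up for this sketch. Only the parameter name `threshold_query` and the convention that 0 means "no limit" come from the description above.

```python
from collections import Counter


def predict_next(training, query, threshold_query=0):
    """Toy predictor: vote on the item that follows matches of the query.

    Suffixes of the query are tried from longest to shortest. Collection
    stops once threshold_query matches were seen; 0 means no limit.
    """
    votes = Counter()
    matches = 0
    for start in range(len(query)):
        suffix = query[start:]
        width = len(suffix)
        for seq in training:
            # Visit every position where the suffix is followed by an item.
            for i in range(len(seq) - width):
                if seq[i:i + width] == suffix:
                    votes[seq[i + width]] += 1
                    matches += 1
                    if threshold_query and matches >= threshold_query:
                        return votes.most_common(1)[0][0]
    return votes.most_common(1)[0][0] if votes else None


training = [["a", "b", "c"], ["x", "b", "d"], ["y", "b", "d"], ["z", "b", "d"]]
print(predict_next(training, ["a", "b"], threshold_query=1))  # c
print(predict_next(training, ["a", "b"], threshold_query=0))  # d
```

In this toy setting, a small threshold trusts the first, most specific match, while a threshold of 0 keeps collecting less specific matches and lets them outvote it.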
Benchmark
The benchmark was run on the FIFA dataset; the data can be found on the SPMF website.
Details on the benchmark can be found here.