Succinct BWT-Based Sequence Prediction

Project description

Succinct BWT-Based Sequence Prediction (Subseq)

What is it?

This project is a C++ implementation, with a Python wrapper, of the Succinct BWT-Based Sequence Prediction model.

Subseq is a sequence prediction model over a finite alphabet. It is lossless (it discards no information during training) and uses the succinct wavelet tree data structure and the Burrows-Wheeler Transform to store training sequences compactly and access them efficiently for prediction.
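To give a feel for the idea, here is a minimal sketch of the Burrows-Wheeler Transform and of counting pattern occurrences directly on the transformed text (backward search). It is not the code used by Subseq: it works on plain character strings rather than item sequences, builds the transform naively from sorted rotations, and answers rank queries with a linear scan where a wavelet tree would be used. The names bwt, count_occurrences and SENTINEL are chosen for this illustration only.

```python
from collections import Counter

# Assumed end marker: it must not occur in the input and must sort before every other character.
SENTINEL = "$"


def bwt(text):
    """Naive Burrows-Wheeler Transform: last column of the sorted rotations."""
    s = text + SENTINEL
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_text, pattern):
    """Count occurrences of pattern in the original text using only its BWT."""
    # first[c] = number of characters in the text that sort strictly before c.
    counts = Counter(bwt_text)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]

    def rank(c, i):
        # Occurrences of c in bwt_text[:i]; a wavelet tree answers this without a scan.
        return bwt_text[:i].count(c)

    # Narrow the range of sorted rotations that start with a growing suffix of the pattern.
    lo, hi = 0, len(bwt_text)
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + rank(c, lo)
        hi = first[c] + rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("banana")
print(transformed)                            # annb$aa
print(count_occurrences(transformed, "ana"))  # 2
```

The search never rebuilds the original text: each step needs only two rank queries, which is why a structure with fast rank support pairs well with the transform.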

This implementation is based on the following research paper:

Installation

Install from source. On Linux and macOS, you can easily install it with make build.

For Windows users, the installation is trickier.

Simple example

You can test the model with the following code:

from subseq.subseq import Subseq
model = Subseq(1)

model.fit([['hello', 'world']])

model.predict(['hello'])
# Output: ['world']

Features

Train

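The model is trained with the fit method, which takes a list of training sequences, each itself a list of items (for example, model.fit([['hello', 'world']])).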

Tuning

Subseq has only one hyperparameter to tune: threshold_query, the number of similar queries that must be retrieved to make a confident prediction.

A threshold_query of 0 places no limit on the number of queries.
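The effect of such a threshold can be illustrated with a toy predictor written in plain Python. This is only one plausible reading of the description above, not the Subseq algorithm: the function toy_predict, its matching rule (try the full query, then shorter suffixes) and its voting scheme are all assumptions made for this sketch.

```python
from collections import Counter


def toy_predict(training, query, threshold_query=1):
    """Toy successor vote; an illustration only, NOT the Subseq algorithm."""
    votes = Counter()
    matches = 0
    # Try the full query first, then shorter and shorter suffixes of it.
    for start in range(len(query)):
        suffix = query[start:]
        n = len(suffix)
        for seq in training:
            # Visit every position where the suffix occurs and has a following item.
            for i in range(len(seq) - n):
                if seq[i:i + n] == suffix:
                    votes[seq[i + n]] += 1
                    matches += 1
        # Stop once enough matches back the vote; 0 means never stop early.
        if threshold_query and matches >= threshold_query:
            break
    return [votes.most_common(1)[0][0]] if votes else []


training = [['a', 'b', 'c'], ['x', 'b', 'd'], ['y', 'b', 'd'], ['z', 'b', 'd']]
print(toy_predict(training, ['a', 'b'], threshold_query=1))  # ['c']
print(toy_predict(training, ['a', 'b'], threshold_query=0))  # ['d']
```

With a threshold of 1, the single exact match decides the answer. With 0, the toy keeps relaxing the query and the more frequent successor of 'b' wins, which shows why the value is worth tuning on held-out data.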

Benchmark

The benchmark was run on the FIFA dataset, which is available on the SPMF website.

Details on the benchmark can be found here.
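Sequence datasets from SPMF are usually plain text files with one sequence per line, where -1 closes an itemset and -2 closes the sequence. A minimal sketch of turning such lines into the list-of-lists shape that fit accepts could look like the following. The helper name parse_spmf_sequences is hypothetical, the handling of comment lines is an assumption about the file format, and items are kept as strings.

```python
def parse_spmf_sequences(lines):
    """Turn SPMF-style sequence lines into a list of item lists."""
    sequences = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and lines assumed to be comments or metadata.
        if not line or line[0] in "#%@":
            continue
        # -1 closes an itemset and -2 closes the sequence; keep only the items.
        items = [tok for tok in line.split() if tok not in ("-1", "-2")]
        if items:
            sequences.append(items)
    return sequences


sample = ["# toy data", "1 -1 2 -1 3 -1 -2", "", "4 -1 5 -1 -2"]
print(parse_spmf_sequences(sample))  # [['1', '2', '3'], ['4', '5']]
```

The same function can be applied to an open file object, since it only iterates over lines. Note that it flattens itemsets, which is harmless when every itemset holds a single item but loses grouping otherwise.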

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

subseq-0.0.0-cp38-cp38-manylinux2010_x86_64.whl (974.1 kB)

Uploaded CPython 3.8 manylinux: glibc 2.12+ x86-64

subseq-0.0.0-cp38-cp38-manylinux2010_i686.whl (938.3 kB)

Uploaded CPython 3.8 manylinux: glibc 2.12+ i686

subseq-0.0.0-cp38-cp38-macosx_10_9_x86_64.whl (72.2 kB)

Uploaded CPython 3.8 macOS 10.9+ x86-64

subseq-0.0.0-cp37-cp37m-manylinux2010_x86_64.whl (958.2 kB)

Uploaded CPython 3.7m manylinux: glibc 2.12+ x86-64

subseq-0.0.0-cp37-cp37m-manylinux2010_i686.whl (922.7 kB)

Uploaded CPython 3.7m manylinux: glibc 2.12+ i686

subseq-0.0.0-cp37-cp37m-macosx_10_9_x86_64.whl (72.1 kB)

Uploaded CPython 3.7m macOS 10.9+ x86-64

subseq-0.0.0-cp36-cp36m-manylinux2010_x86_64.whl (953.5 kB)

Uploaded CPython 3.6m manylinux: glibc 2.12+ x86-64

subseq-0.0.0-cp36-cp36m-manylinux2010_i686.whl (917.3 kB)

Uploaded CPython 3.6m manylinux: glibc 2.12+ i686

subseq-0.0.0-cp36-cp36m-macosx_10_9_x86_64.whl (72.9 kB)

Uploaded CPython 3.6m macOS 10.9+ x86-64

subseq-0.0.0-cp35-cp35m-manylinux2010_x86_64.whl (952.4 kB)

Uploaded CPython 3.5m manylinux: glibc 2.12+ x86-64

subseq-0.0.0-cp35-cp35m-manylinux2010_i686.whl (916.6 kB)

Uploaded CPython 3.5m manylinux: glibc 2.12+ i686

subseq-0.0.0-cp35-cp35m-macosx_10_9_x86_64.whl (71.7 kB)

Uploaded CPython 3.5m macOS 10.9+ x86-64
