Succinct BWT-Based Sequence Prediction

Succinct BWT-Based Sequence Prediction (Subseq)

What is it?

This project is a C++ implementation, with a Python wrapper, of the Succinct BWT-Based Sequence Prediction model.

Subseq is a sequence prediction model over a finite alphabet. It is a lossless model (it does not discard information during training) and uses the succinct Wavelet Tree data structure together with the Burrows-Wheeler Transform to store training sequences compactly and access them efficiently at prediction time.
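To make the "lossless" property concrete, here is a minimal sketch of the Burrows-Wheeler Transform and its inverse in pure Python. It is a naive, quadratic illustration on character strings, not the succinct C++ implementation used by Subseq; the function names bwt and inverse_bwt and the "$" sentinel are assumptions made for this example.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler Transform: sort all rotations, keep the last column."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Rebuild the original text by repeatedly prepending the last column and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(char + row for char, row in zip(last_column, table))
    # The row that ends with the sentinel is the original text plus the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


transformed = bwt("banana")
print(transformed)               # → annb$aa
print(inverse_bwt(transformed))  # → banana
```

The round trip recovers the input exactly, which is the sense in which a BWT-based store keeps all the training information.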

This implementation is based on the following research paper:

Installation

Subseq is published on PyPI. Running pip install subseq should be enough.

Simple example

You can test the model with the following code:

from subseq.subseq import Subseq
model = Subseq(1)

model.fit([['hello', 'world']])

model.predict(['hello'])
# Output: ['world']

Features

Train

The model can be trained with the fit method.
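Judging from the simple example, fit takes a list of sequences, each sequence being a list of items. Real data often arrives as a flat event log instead, so the sketch below groups (session_id, item) pairs into that shape. The helper name group_events and the event format are hypothetical and are not part of the subseq package.

```python
from collections import defaultdict


def group_events(events):
    """Group (session_id, item) pairs into one ordered item list per session."""
    sessions = defaultdict(list)
    for session_id, item in events:
        sessions[session_id].append(item)
    # Dicts keep insertion order, so sessions appear in order of first occurrence.
    return list(sessions.values())


events = [
    ("u1", "home"),
    ("u2", "home"),
    ("u1", "search"),
    ("u1", "cart"),
    ("u2", "about"),
]
sequences = group_events(events)
print(sequences)  # → [['home', 'search', 'cart'], ['home', 'about']]
```

The resulting list has the same nesting as the argument in the simple example, so it should be usable as model.fit(sequences), assuming fit accepts several sequences at once.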

Tuning

Subseq has only one meta-parameter that needs to be tuned: threshold_query, the number of similar queries that need to be retrieved to make a confident prediction.

A threshold_query of 0 does not limit the number of queries.
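The effect of such a threshold can be illustrated with a toy predictor. This is only a sketch and not the Subseq algorithm: it assumes that the "similar queries" are the suffixes of the query, tried longest first, and that matching is done by a plain scan rather than through a BWT index. The name toy_predict is hypothetical.

```python
from collections import Counter


def toy_predict(training, query, threshold_query=0):
    """Toy next-item predictor; stops after threshold_query matched sub-queries (0 = no limit)."""
    votes = Counter()
    matched_queries = 0
    # Assumption: similar queries are the suffixes of the query, longest first.
    for start in range(len(query)):
        sub = query[start:]
        found = False
        for seq in training:
            # Only positions that leave room for a following item are considered.
            for i in range(len(seq) - len(sub)):
                if seq[i:i + len(sub)] == sub:
                    votes[seq[i + len(sub)]] += 1
                    found = True
        if found:
            matched_queries += 1
            if threshold_query and matched_queries >= threshold_query:
                break
    return [item for item, _ in votes.most_common(1)]


training = [["a", "b", "c"], ["x", "b", "d"], ["y", "b", "d"], ["z", "b", "d"]]
print(toy_predict(training, ["a", "b"], threshold_query=1))  # → ['c']
print(toy_predict(training, ["a", "b"], threshold_query=0))  # → ['d']
```

With a threshold of 1 the toy model trusts the single exact match; with 0 it also counts the shorter, more frequent context, which changes the vote. In this sketch, a larger threshold trades specificity for more evidence.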

Benchmark

The benchmark was run on the FIFA dataset, which can be found on the SPMF website.

Details on the benchmark can be found here.
