Compact Prediction Tree: A Lossless Model for Accurate Sequence Prediction

Project description

CPT

What is it?

This project is an open-source Cython implementation of the Compact Prediction Tree (CPT) algorithm, with multithreaded prediction.

CPT is a sequence prediction model. It is highly explainable and specializes in predicting the next element of a sequence over a finite alphabet.
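To make "predicting the next element" concrete, here is a deliberately simplified pure-Python toy. It is not the algorithm implemented by this package: it only votes for the items that follow the query in training sequences containing every query item. The function name toy_predict and the first-wins tie-breaking rule are assumptions made for illustration.

```python
from collections import Counter


def toy_predict(training, query):
    """Return the item that most often follows `query` in similar sequences.

    A training sequence counts as "similar" when it contains every item of
    the query. Its "consequent" is everything after the last position at
    which a query item occurs.
    """
    if not query:
        return None
    wanted = set(query)
    votes = Counter()
    for seq in training:
        if not wanted.issubset(seq):
            continue
        # Position of the last query item inside this training sequence.
        last = max(i for i, item in enumerate(seq) if item in wanted)
        votes.update(seq[last + 1:])
    if not votes:
        return None
    # max() keeps the first item seen among ties (an arbitrary toy choice).
    return max(votes, key=votes.get)


training = [
    ["hello", "world"],
    ["hello", "this", "is", "me"],
    ["hello", "me"],
]
predictions = [toy_predict(training, q) for q in (["hello"], ["hello", "this"])]
print(predictions)  # ['me', 'is']
```

The real model stores sequences in a compact tree and scores candidates differently, so agreement on larger datasets should not be expected.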

This implementation is based on the following research papers:

Installation

Install the package with pip install cpt.

Simple example

You can test the model with the following code:

from cpt.cpt import Cpt
model = Cpt()

model.fit([['hello', 'world'],
           ['hello', 'this', 'is', 'me'],
           ['hello', 'me']
          ])

model.predict([['hello'], ['hello', 'this']])
# Output: ['me', 'is']

For an example of sklearn compatibility, see the documentation.

Features

Train

The model can be trained with the fit method.

If needed, the model can be retrained with the same method: it adds the new sequences to the model and does not remove the old ones.

Multithreading

By default, predictions run in multiple threads using OpenMP.

To run predictions in a single thread, pass multithread=False to the predict method.

You can control the number of threads with the OMP_NUM_THREADS environment variable.
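The variable can be set in the shell or, as sketched below, from Python itself. This assumes the OpenMP runtime reads OMP_NUM_THREADS when it initializes, so the assignment is placed before the compiled extension is first imported; the value "2" is only an example.

```python
import os

# Assumption: the OpenMP runtime reads OMP_NUM_THREADS at start-up,
# so set it before the first import of the compiled extension.
os.environ["OMP_NUM_THREADS"] = "2"

# from cpt.cpt import Cpt  # import only after the variable is set

print(os.environ["OMP_NUM_THREADS"])
```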

Pickling

You can pickle the model to save it and load it later with the pickle library.

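import pickle

from cpt.cpt import Cpt

model = Cpt()
model.fit([['hello', 'world']])

# Serialize the trained model to bytes
dumped = pickle.dumps(model)

# Restore a model from the serialized bytes
unpickled_model = pickle.loads(dumped)

# Compare the original and the restored model
print(model == unpickled_model)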
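To keep a model between sessions, the same mechanism can write to a file. The helpers below are a minimal sketch: save_model and load_model are hypothetical names, not part of this package, and a plain dictionary stands in for a trained model so the snippet runs without cpt installed.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Write any picklable object to `path`."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Read back an object written by save_model."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for a trained model; replace it with a fitted Cpt instance.
stand_in = {"sequences": [["hello", "world"]]}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "model.pkl"
    save_model(stand_in, target)
    restored = load_model(target)

print(restored == stand_in)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.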

Explainability

The Cpt class has several methods to explain its predictions.

You can see which elements are considered noise (those present in few sequences) with model.compute_noisy_items(noise_ratio).
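As an illustration of what "low presence in sequences" can mean, the sketch below flags items that appear in at most a given fraction of the sequences. The function toy_noisy_items is hypothetical, and the exact threshold rule used by compute_noisy_items (for example whether the comparison is strict) is an assumption here.

```python
from collections import Counter


def toy_noisy_items(sequences, noise_ratio):
    """Items present in at most `noise_ratio` of the sequences."""
    if not sequences:
        return set()
    presence = Counter()
    for seq in sequences:
        presence.update(set(seq))  # count each item once per sequence
    threshold = noise_ratio * len(sequences)
    return {item for item, count in presence.items() if count <= threshold}


sequences = [
    ["hello", "world"],
    ["hello", "this", "is", "me"],
    ["hello", "me"],
]
noisy = toy_noisy_items(sequences, noise_ratio=0.4)
print(sorted(noisy))  # ['is', 'this', 'world']
```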

You can retrieve trained sequences with model.retrieve_sequence(id).

You can find similar sequences with model.find_similar_sequences(sequence).

Automatically retrieving all similar sequences with the noise reduction technique is not yet supported.

Tuning

CPT has three hyperparameters that need to be tuned; the documentation explains how to tune them. You can use the model_selection module from sklearn for this, and an example can be found here.
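Tuning needs a score to compare parameter settings. One simple choice, sketched below, is next-item accuracy: hide the last element of each held-out sequence and check whether it is predicted from the preceding elements. The helper next_item_accuracy is hypothetical, and it assumes a predict callable that takes a list of sequences and returns one prediction per sequence, as in the simple example above. A dummy predictor is used so the snippet runs on its own.

```python
def next_item_accuracy(predict, sequences):
    """Share of sequences whose last item is predicted from the items before it."""
    usable = [seq for seq in sequences if len(seq) >= 2]
    if not usable:
        return 0.0
    prefixes = [seq[:-1] for seq in usable]
    targets = [seq[-1] for seq in usable]
    predictions = predict(prefixes)
    hits = sum(p == t for p, t in zip(predictions, targets))
    return hits / len(usable)


def always_me(prefixes):
    """Dummy predictor standing in for model.predict."""
    return ["me" for _ in prefixes]


held_out = [
    ["hello", "this", "is", "me"],
    ["hello", "world"],
    ["hello", "me"],
]
score = next_item_accuracy(always_me, held_out)
print(round(score, 3))  # 0.667
```

With a trained model, model.predict could be passed in place of always_me, once per candidate parameter setting.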

Benchmark

The benchmark was run on the FIFA dataset, which is available on the SPMF website.

With multithreading, CPT performed around 5000 predictions per second.

Without multithreading, CPT performed around 1650 predictions per second.

Details on the benchmark can be found here.
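To measure throughput on your own data, a small timing helper is enough. The sketch below is not the benchmark script used for the figures above: predictions_per_second is a hypothetical helper, and dummy_predict stands in for model.predict so the snippet runs without the package. The last line only restates the quoted figures as a ratio.

```python
import time


def predictions_per_second(predict, sequences, repeats=3):
    """Best observed throughput of `predict` over a few repeated runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        predict(sequences)
        best = min(best, time.perf_counter() - start)
    return len(sequences) / best if best > 0 else float("inf")


def dummy_predict(batch):
    """Stand-in for model.predict: echoes the last item of each sequence."""
    return [seq[-1] for seq in batch]


rate = predictions_per_second(dummy_predict, [["a", "b"]] * 10_000)
print(f"{rate:.0f} predictions per second")

# Ratio of the two figures quoted above: roughly a 3x gain from multithreading.
speedup = 5000 / 1650
print(round(speedup, 2))  # 3.03
```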

Further reading

A study has examined how to reduce dataset size, and therefore training and testing time, by applying PageRank to the dataset.

The study was published in IJIKM here. It reports an overall improvement of 10-40% in prediction time with this technique, without any accuracy loss.

One of the co-authors of CPT has also published subseq, another algorithm for sequence prediction. An implementation can be found here.

Download files

Download the file for your platform.

Source Distribution

cpt-1.3.1.tar.gz (115.9 kB), Source

Built Distributions

cpt-1.3.1-cp310-cp310m-win_amd64.whl (93.2 kB), CPython 3.10m, Windows x86-64
cpt-1.3.1-cp310-cp310m-win32.whl (80.0 kB), CPython 3.10m, Windows x86
cpt-1.3.1-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (844.1 kB), CPython 3.10, manylinux: glibc 2.12+ x86-64
cpt-1.3.1-cp310-cp310-manylinux_2_12_i686.manylinux2010_i686.whl (819.2 kB), CPython 3.10, manylinux: glibc 2.12+ i686
cpt-1.3.1-cp310-cp310-macosx_10_9_x86_64.whl (103.6 kB), CPython 3.10, macOS 10.9+ x86-64
cpt-1.3.1-cp39-cp39m-win_amd64.whl (95.4 kB), CPython 3.9m, Windows x86-64
cpt-1.3.1-cp39-cp39m-win32.whl (81.5 kB), CPython 3.9m, Windows x86
cpt-1.3.1-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (860.0 kB), CPython 3.9, manylinux: glibc 2.12+ x86-64
cpt-1.3.1-cp39-cp39-manylinux_2_12_i686.manylinux2010_i686.whl (829.2 kB), CPython 3.9, manylinux: glibc 2.12+ i686
cpt-1.3.1-cp39-cp39-macosx_10_9_x86_64.whl (103.6 kB), CPython 3.9, macOS 10.9+ x86-64
cpt-1.3.1-cp38-cp38m-win_amd64.whl (93.4 kB), CPython 3.8m, Windows x86-64
cpt-1.3.1-cp38-cp38m-win32.whl (77.2 kB), CPython 3.8m, Windows x86
cpt-1.3.1-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (875.2 kB), CPython 3.8, manylinux: glibc 2.12+ x86-64
cpt-1.3.1-cp38-cp38-manylinux_2_12_i686.manylinux2010_i686.whl (848.2 kB), CPython 3.8, manylinux: glibc 2.12+ i686
cpt-1.3.1-cp38-cp38-macosx_10_9_x86_64.whl (103.1 kB), CPython 3.8, macOS 10.9+ x86-64
cpt-1.3.1-cp37-cp37m-win_amd64.whl (91.3 kB), CPython 3.7m, Windows x86-64
cpt-1.3.1-cp37-cp37m-win32.whl (75.1 kB), CPython 3.7m, Windows x86
cpt-1.3.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (832.4 kB), CPython 3.7m, manylinux: glibc 2.12+ x86-64
cpt-1.3.1-cp37-cp37m-manylinux_2_12_i686.manylinux2010_i686.whl (808.0 kB), CPython 3.7m, manylinux: glibc 2.12+ i686
cpt-1.3.1-cp37-cp37m-macosx_10_9_x86_64.whl (103.6 kB), CPython 3.7m, macOS 10.9+ x86-64
