Compact Prediction Tree: A Lossless Model for Accurate Sequence Prediction

CPT

What is it?

This project is an open-source Cython implementation of the Compact Prediction Tree (CPT) algorithm, with multithreaded prediction.

CPT is a sequence prediction model. It is a highly explainable model specialized in predicting the next element of a sequence over a finite alphabet.
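To make the idea concrete, here is a minimal pure-Python sketch of next-item prediction in the spirit of CPT. It is a simplified illustration, not this package's code: the function name predict_next is hypothetical, similar sequences are found by a plain scan, and every following item gets the same weight.

```python
from collections import Counter


def predict_next(training_sequences, query):
    """Toy next-item prediction in the spirit of CPT (not the library's code)."""
    query_items = set(query)
    if not query_items:
        return None
    scores = Counter()
    for sequence in training_sequences:
        # A training sequence counts as similar if it contains every query item.
        if not query_items.issubset(sequence):
            continue
        # Position of the last query item found in this training sequence.
        last_match = max(i for i, item in enumerate(sequence) if item in query_items)
        # Every item after that position is a candidate for the next element.
        for item in sequence[last_match + 1:]:
            scores[item] += 1
    if not scores:
        return None  # no similar sequence had anything after the query items
    # Ties go to the candidate seen first, because Counter keeps insertion order.
    return max(scores, key=scores.get)


training = [
    ['hello', 'world'],
    ['hello', 'this', 'is', 'me'],
    ['hello', 'me'],
]

print([predict_next(training, q) for q in (['hello'], ['hello', 'this'])])
# ['me', 'is']
```

Because each prediction comes from counting items in identifiable training sequences, it is easy to trace back why an item was chosen.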

This implementation is based on the following research papers:

Installation

You can simply use pip install cpt.

Simple example

You can test the model with the following code:

from cpt.cpt import Cpt
model = Cpt()

model.fit([['hello', 'world'],
           ['hello', 'this', 'is', 'me'],
           ['hello', 'me']
          ])

model.predict([['hello'], ['hello', 'this']])
# Output: ['me', 'is']

For an example of sklearn compatibility, see the documentation.

Features

Train

The model can be trained with the fit method.

If needed, the model can be retrained with the same method. Retraining adds the new sequences to the model and does not remove the old ones.

Multithreading

By default, predictions run in multiple threads using OpenMP.

To run predictions in a single thread, pass multithread=False to the predict method.

You can control the number of threads by setting the OMP_NUM_THREADS environment variable.
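A minimal sketch of setting the variable from Python rather than from the shell. It assumes that the OpenMP runtime reads OMP_NUM_THREADS when it starts, so the variable has to be set before the compiled extension is imported. The helper name limit_openmp_threads and the value 2 are illustrative only.

```python
import os


def limit_openmp_threads(count):
    """Set OMP_NUM_THREADS for OpenMP libraries loaded later in this process."""
    if count < 1:
        raise ValueError("thread count must be at least 1")
    os.environ["OMP_NUM_THREADS"] = str(count)


# Assumption: call this before importing cpt so the runtime picks the value up.
limit_openmp_threads(2)
print(os.environ["OMP_NUM_THREADS"])
```

The same effect can be obtained from a POSIX shell by prefixing the command, for example OMP_NUM_THREADS=2 python script.py, where script.py stands for your own script.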

Pickling

You can save the model by pickling it and load it later with the pickle library.

from cpt.cpt import Cpt
import pickle


model = Cpt()
model.fit([['hello', 'world']])

dumped = pickle.dumps(model)

unpickled_model = pickle.loads(dumped)

print(model == unpickled_model)

Explainability

The Cpt class has several methods to explain the predictions.

You can see which elements are considered as noise (with a low presence in sequences) with model.compute_noisy_items(noise_ratio).
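As an illustration of what "low presence in sequences" can mean, the sketch below flags items that appear in fewer than noise_ratio of the training sequences. This is an assumption about the criterion, written in plain Python; the function name low_presence_items is hypothetical and the exact threshold rule used by compute_noisy_items may differ.

```python
from collections import Counter


def low_presence_items(sequences, noise_ratio):
    """Items present in fewer than noise_ratio of the sequences (assumed rule)."""
    presence = Counter()
    for sequence in sequences:
        # Count each item at most once per sequence.
        presence.update(set(sequence))
    threshold = noise_ratio * len(sequences)
    return sorted(item for item, count in presence.items() if count < threshold)


training = [
    ['hello', 'world'],
    ['hello', 'this', 'is', 'me'],
    ['hello', 'me'],
]

print(low_presence_items(training, 0.5))
# ['is', 'this', 'world']
```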

You can retrieve trained sequences with model.retrieve_sequence(id).

You can find similar sequences with model.find_similar_sequences(sequence).

It is not yet possible to automatically retrieve all the similar sequences found with the noise reduction technique.
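The notions of sequence id and similar sequence can be sketched with an inverted index, a structure that maps each item to the ids of the sequences containing it. This is a conceptual sketch under two assumptions: ids are positions in the training list, and a sequence is similar when it contains every item of the query. The helper names are hypothetical and do not belong to the package.

```python
from collections import defaultdict


def build_inverted_index(sequences):
    """Map each item to the ids (list positions) of the sequences containing it."""
    index = defaultdict(set)
    for sequence_id, sequence in enumerate(sequences):
        for item in sequence:
            index[item].add(sequence_id)
    return index


def similar_sequence_ids(index, query):
    """Ids of the sequences that contain every item of the query."""
    if not query:
        return []
    # index.get avoids creating empty entries for unknown items.
    id_sets = [index.get(item, set()) for item in query]
    return sorted(set.intersection(*id_sets))


training = [
    ['hello', 'world'],
    ['hello', 'this', 'is', 'me'],
    ['hello', 'me'],
]
index = build_inverted_index(training)

ids = similar_sequence_ids(index, ['hello', 'me'])
print(ids)                          # [1, 2]
print([training[i] for i in ids])   # the matching training sequences
```

Intersecting small id sets avoids scanning every training sequence for each query.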

Tuning

CPT has three hyperparameters that need to be tuned. The documentation explains how to tune them. You can use the model_selection module from sklearn for the search; you can find an example here.
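Whatever search strategy is used, tuning needs a score to compare parameter settings. A minimal sketch of one possible score, assuming the last item of each validation sequence is held out and predicted from the preceding items. The names next_item_accuracy and always_me are hypothetical; the stand-in predictor only makes the sketch runnable, and in practice a fitted model's predict method would be passed instead.

```python
def next_item_accuracy(predict, sequences):
    """Share of sequences whose held-out last item is predicted correctly."""
    usable = [s for s in sequences if len(s) >= 2]
    if not usable:
        raise ValueError("need at least one sequence with two or more items")
    prefixes = [s[:-1] for s in usable]   # what the model sees
    targets = [s[-1] for s in usable]     # what it should predict
    predictions = predict(prefixes)
    hits = sum(p == t for p, t in zip(predictions, targets))
    return hits / len(usable)


def always_me(prefixes):
    """Stand-in predictor used only to make the sketch runnable."""
    return ['me' for _ in prefixes]


validation = [
    ['hello', 'this', 'is', 'me'],
    ['hello', 'me'],
    ['hello', 'world'],
]

print(next_item_accuracy(always_me, validation))
```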

Benchmark

The benchmark was run on the FIFA dataset; the data can be found on the SPMF website.

Using multithreading, CPT was able to perform around 5000 predictions per second.

Without multithreading, CPT performed around 1650 predictions per second.

Details on the benchmark can be found here.
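A throughput figure of this kind can be measured with a small timing helper. The sketch below is not the benchmark script used for the numbers above. It assumes a predict callable that takes a list of sequences, as the predict method does in the simple example. The names predictions_per_second and echo_last_item are hypothetical.

```python
import time


def predictions_per_second(predict, sequences, repeats=3):
    """Best observed throughput of predict over several timed runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        predict(sequences)
        best = min(best, time.perf_counter() - start)
    if best == 0:
        return float("inf")  # the timer could not resolve such a short run
    return len(sequences) / best


def echo_last_item(sequences):
    """Stand-in predictor used only to make the sketch runnable."""
    return [sequence[-1] for sequence in sequences]


queries = [['hello', 'this']] * 1000
rate = predictions_per_second(echo_last_item, queries)
print(f"{rate:.0f} predictions per second")
```

With the figures reported above, multithreading corresponds to roughly a threefold speedup (5000 / 1650 ≈ 3.03).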

Further reading

A study examined how to reduce the dataset size, and therefore the training and testing time, by applying PageRank to the dataset.

The study was published in the IJIKM journal here. With this technique, prediction time improved by 10-40% overall without any loss of accuracy.

One of the co-authors of CPT has also published subseq, another algorithm for sequence prediction. An implementation can be found here.

Support

If you enjoy the project and wish to support me, a buymeacoffee link is available.
