
Compact Prediction Tree: A Lossless Model for Accurate Sequence Prediction

Project description

CPT

This project is an open-source Cython implementation of the Compact Prediction Tree algorithm that uses multithreading.

CPT is a sequence prediction algorithm. It is highly explainable and good at predicting the next value of a sequence over a finite alphabet. However, given a sequence, CPT cannot predict an element that is already present in that sequence. CPT also needs tuning.
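To make the idea concrete, here is a minimal pure-Python sketch of similarity-based next-item prediction. It is a simplified illustration only, not this library's implementation: the function name predict_next, the plain counting score, and the alphabetical tie-break are all assumptions made for the example.

```python
from collections import Counter


def predict_next(training_sequences, query):
    """Toy next-item prediction (hypothetical helper, not part of cpt)."""
    query_items = set(query)
    if not query_items:
        return None
    scores = Counter()
    for sequence in training_sequences:
        # A training sequence counts as "similar" if it contains every query item.
        if not query_items.issubset(sequence):
            continue
        # Only look at what follows the last query item in that sequence.
        last_match = max(i for i, item in enumerate(sequence) if item in query_items)
        for item in sequence[last_match + 1:]:
            # Items already present in the query are never candidates.
            if item not in query_items:
                scores[item] += 1
    if not scores:
        return None
    # Highest count wins; ties are broken alphabetically (an assumption).
    return min(scores, key=lambda item: (-scores[item], item))


training = [
    ["hello", "world"],
    ["hello", "this", "is", "me"],
    ["hello", "me"],
]
print(predict_next(training, ["hello"]))          # me
print(predict_next(training, ["hello", "this"]))  # is
```

The sketch skips items that already appear in the query, which mirrors the limitation described above.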

This implementation is based on the following research papers:

http://www.philippe-fournier-viger.com/ADMA2013_Compact_Prediction_trees.pdf

http://www.philippe-fournier-viger.com/spmf/PAKDD2015_Compact_Prediction_tree+.pdf

Installation

You can simply use pip install cpt.

For Unix users, no wheels are published (auditwheel cannot repair the wheels to "manylinux"), so you should install Cython along with cpt: pip install cython cpt.

Unix users can also install from source: pip install cython && python setup.py install.

For macOS users, do not forget to install Homebrew's llvm and libomp. You can follow the instructions in this issue: https://github.com/bluesheeptoken/CPT/issues/68

Simple example

You can test the model with the following code:

from cpt.cpt import Cpt
model = Cpt()

model.fit([['hello', 'world'],
           ['hello', 'this', 'is', 'me'],
           ['hello', 'me']
          ])

model.predict([['hello'], ['hello', 'this']])
# Output: ['me', 'is']

For an example of the compatibility with sklearn, check the documentation.

Features

Train

The model can be trained with the fit method.

If needed, the model can be retrained with the same method. Retraining adds the new sequences to the model and does not remove the old ones.

Multithreading

By default, predictions run in multiple threads using OpenMP.

Predictions can also run in a single thread by passing multithread=False to the predict method.

You can control the number of threads by setting the OMP_NUM_THREADS environment variable.
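One way to set the variable from Python is to launch the prediction job as a child process with a modified environment, so that the value is in place before the OpenMP runtime starts. The sketch below assumes a hypothetical helper named run_with_thread_limit; the child process here only echoes the variable, whereas a real job would import cpt and call predict.

```python
import os
import subprocess
import sys


def run_with_thread_limit(script_args, num_threads):
    """Run a Python child process with OMP_NUM_THREADS set (hypothetical helper)."""
    # Copy the current environment and override only the thread count.
    env = dict(os.environ, OMP_NUM_THREADS=str(num_threads))
    return subprocess.run(
        [sys.executable, *script_args],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )


# Stand-in child: it only prints the variable it received.
result = run_with_thread_limit(
    ["-c", "import os; print(os.environ['OMP_NUM_THREADS'])"],
    num_threads=2,
)
print(result.stdout.strip())
```

From a shell, the same effect is usually obtained by exporting OMP_NUM_THREADS before starting Python.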

Pickling

You can pickle the model to save it and load it later with the pickle library.

from cpt.cpt import Cpt
import pickle


model = Cpt()
model.fit([['hello', 'world']])

dumped = pickle.dumps(model)

unpickled_model = pickle.loads(dumped)

print(model == unpickled_model)
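To keep a model between sessions, the same mechanism can write to a file instead of a bytes object. This is a minimal sketch using only the standard library; save_model and load_model are hypothetical helper names, and a plain dictionary stands in for a trained Cpt instance so that the example runs without the package installed.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Write any picklable object to disk (hypothetical helper)."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Read back an object written by save_model (hypothetical helper)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for a trained model; replace it with a fitted Cpt instance.
stand_in = {"sequences": [["hello", "world"]]}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.pkl")
    save_model(stand_in, model_path)
    restored = load_model(model_path)

print(restored == stand_in)
```

As with any pickle file, only load files that come from a source you trust.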

Explainability

The CPT class has several methods to explain the predictions.

You can see which elements are considered noise (items with a low presence in the sequences) with model.compute_noisy_items(noise_ratio).
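As an illustration of what "low presence" can mean, the sketch below flags items that appear in fewer than a given fraction of the sequences. It is an assumption about the general idea, not the library's exact rule: find_noisy_items is a hypothetical function, and the strict "fewer than" threshold is chosen only for the example.

```python
from collections import Counter


def find_noisy_items(sequences, noise_ratio):
    """Return items found in fewer than noise_ratio of the sequences (toy rule)."""
    if not sequences:
        return set()
    presence = Counter()
    for sequence in sequences:
        # Count each item once per sequence, however often it repeats.
        presence.update(set(sequence))
    threshold = noise_ratio * len(sequences)
    return {item for item, count in presence.items() if count < threshold}


sequences = [
    ["hello", "world"],
    ["hello", "this", "is", "me"],
    ["hello", "me"],
]
print(sorted(find_noisy_items(sequences, 0.4)))  # ['is', 'this', 'world']
```

With a ratio of 0 no item is flagged, since no item can appear in fewer than zero sequences.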

You can retrieve trained sequences with model.retrieve_sequence(id).

You can find similar sequences with model.find_similar_sequences(sequence).

You cannot yet automatically retrieve all similar sequences with the noise reduction technique.

Tuning

CPT has three meta parameters that need to be tuned. The documentation explains how to tune them. You can tune them with the model_selection module from sklearn; the project provides an example of how to do so.

Download files

Download the file for your platform.

Source Distribution

cpt-1.2.0.tar.gz (105.5 kB)

Built Distributions

cpt-1.2.0-cp37-cp37m-win_amd64.whl (86.8 kB): CPython 3.7m, Windows x86-64

cpt-1.2.0-cp37-cp37m-win32.whl (70.5 kB): CPython 3.7m, Windows x86

cpt-1.2.0-cp37-cp37m-macosx_10_13_x86_64.whl (94.0 kB): CPython 3.7m, macOS 10.13+ x86-64

cpt-1.2.0-cp36-cp36m-win_amd64.whl (86.9 kB): CPython 3.6m, Windows x86-64

cpt-1.2.0-cp36-cp36m-win32.whl (70.5 kB): CPython 3.6m, Windows x86

cpt-1.2.0-cp36-cp36m-macosx_10_13_x86_64.whl (96.9 kB): CPython 3.6m, macOS 10.13+ x86-64

cpt-1.2.0-cp35-cp35m-win_amd64.whl (86.0 kB): CPython 3.5m, Windows x86-64

cpt-1.2.0-cp35-cp35m-win32.whl (69.5 kB): CPython 3.5m, Windows x86

cpt-1.2.0-cp35-cp35m-macosx_10_13_x86_64.whl (94.8 kB): CPython 3.5m, macOS 10.13+ x86-64

