Compact Prediction Tree: A Lossless Model for Accurate Sequence Prediction

CPT

This project is an open-source Cython implementation of the Compact Prediction Tree algorithm that uses multithreading.

CPT is a sequence prediction algorithm. It is a highly explainable model that is good at predicting the next value of a sequence over a finite alphabet. However, given a sequence, CPT cannot predict an element already present in that sequence. CPT also needs tuning.
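The restriction on already-present elements can be seen in a toy sketch of the prediction idea. This is a deliberately simplified pure-Python illustration, not the code of this package: the function name predict_next is hypothetical, it assumes a plain count-based score, and it ignores the tree structure, the noise reduction and the tunable parameters.

```python
from collections import Counter


def predict_next(training, query):
    """Toy CPT-style prediction; a simplification, not the library's code."""
    if not query:
        return None
    query_items = set(query)
    counts = Counter()
    for seq in training:
        # Keep only "similar" sequences: those containing every query item.
        if not query_items.issubset(seq):
            continue
        # The consequent is whatever follows the last query item.
        last = max(i for i, item in enumerate(seq) if item in query_items)
        # Nothing after `last` belongs to the query, so an item already
        # present in the query can never be counted as a candidate.
        counts.update(seq[last + 1:])
    # Ties go to the candidate that was counted first.
    return counts.most_common(1)[0][0] if counts else None


training = [['hello', 'world'],
            ['hello', 'this', 'is', 'me'],
            ['hello', 'me']]

print(predict_next(training, ['hello']))          # me
print(predict_next(training, ['hello', 'this']))  # is
print(predict_next(training, ['hello', 'me']))    # None
```

The last call returns None because no training sequence has anything left after the final query item, which is the toy equivalent of CPT never predicting an element already in the sequence.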

This implementation is based on the following research papers:

http://www.philippe-fournier-viger.com/ADMA2013_Compact_Prediction_trees.pdf

http://www.philippe-fournier-viger.com/spmf/PAKDD2015_Compact_Prediction_tree+.pdf

Installation

You can simply run pip install cpt.

For Windows users, the sources are precompiled. On other platforms, you will need to install Cython.

Simple example

You can test the model with the following code:

from cpt.cpt import Cpt
model = Cpt()

model.fit([['hello', 'world'],
           ['hello', 'this', 'is', 'me'],
           ['hello', 'me']
          ])

model.predict([['hello'], ['hello', 'this']])
# Output: ['me', 'is']

For an example of compatibility with sklearn, check the documentation.

Features

Train

The model can be trained with the fit method.

If needed, the model can be retrained with the same method. Retraining adds the new sequences to the model and does not remove the old ones.

Multithreading

By default, predictions run with multithreading through OpenMP.

Predictions can also run in a single thread by passing the option multithread=False to the predict method.

You can control the number of threads by setting the OMP_NUM_THREADS environment variable.
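OpenMP reads OMP_NUM_THREADS when its runtime starts, so the variable is safest to set before the Python process that imports the model is launched. The sketch below shows one way to do that from Python by starting a child interpreter with a modified environment. The helper name run_with_threads is hypothetical, and the child command here only prints the variable; in practice it would be your own prediction script.

```python
import os
import subprocess
import sys


def run_with_threads(python_args, num_threads):
    """Run a child Python process with OMP_NUM_THREADS set (hypothetical helper)."""
    # Copy the current environment and override only the OpenMP setting.
    env = dict(os.environ, OMP_NUM_THREADS=str(num_threads))
    return subprocess.run(
        [sys.executable, *python_args],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )


# Stand-in for a real prediction script: the child just echoes the variable.
result = run_with_threads(
    ["-c", "import os; print(os.environ['OMP_NUM_THREADS'])"], 2
)
print(result.stdout.strip())
```

From a shell, the same effect is usually obtained by exporting the variable before starting Python.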

Pickling

You can pickle the model to save it and load it later with the pickle library.

from cpt.cpt import Cpt
import pickle


model = Cpt()
model.fit([['hello', 'world']])

dumped = pickle.dumps(model)

unpickled_model = pickle.loads(dumped)

print(model == unpickled_model)

Explainability

The Cpt class has several methods to explain the predictions.

You can see which elements are considered noise (elements with a low presence in sequences) with model.compute_noisy_items(noise_ratio).

You can retrieve trained sequences with model.retrieve_sequence(id).

You can find similar sequences with model.find_similar_sequences(sequence).

You cannot yet automatically retrieve all similar sequences found with the noise reduction technique.
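To make the notion of noise more concrete, here is a minimal pure-Python sketch. It assumes that an element counts as noisy when the fraction of training sequences containing it is at most noise_ratio; the exact rule used by model.compute_noisy_items may differ (for example in whether the bound is inclusive), and the function noisy_items below is a hypothetical illustration rather than part of this package.

```python
from collections import Counter


def noisy_items(sequences, noise_ratio):
    """Items present in at most `noise_ratio` of the sequences (assumed rule)."""
    if not sequences:
        return set()
    support = Counter()
    for seq in sequences:
        # Count each item once per sequence, however often it repeats.
        support.update(set(seq))
    total = len(sequences)
    return {item for item, count in support.items()
            if count / total <= noise_ratio}


sequences = [['hello', 'world'],
             ['hello', 'this', 'is', 'me'],
             ['hello', 'me']]

print(sorted(noisy_items(sequences, 0.4)))  # ['is', 'this', 'world']
```

Here 'hello' appears in every sequence and 'me' in two out of three, so only the three elements seen in a single sequence fall under the 0.4 threshold.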

Tuning

CPT has 3 hyperparameters that need to be tuned. The documentation explains how to tune them. To tune them, you can use the model_selection module from sklearn; an example of how to do so is available.

Download files

Download the file for your platform.

Source Distribution

cpt-1.0.2.tar.gz (103.8 kB), source

Built Distributions

cpt-1.0.2-cp37-cp37m-win_amd64.whl (86.6 kB), CPython 3.7m, Windows x86-64

cpt-1.0.2-cp37-cp37m-win32.whl (70.3 kB), CPython 3.7m, Windows x86

cpt-1.0.2-cp36-cp36m-win_amd64.whl (86.8 kB), CPython 3.6m, Windows x86-64

cpt-1.0.2-cp36-cp36m-win32.whl (70.4 kB), CPython 3.6m, Windows x86

cpt-1.0.2-cp35-cp35m-win_amd64.whl (85.9 kB), CPython 3.5m, Windows x86-64

cpt-1.0.2-cp35-cp35m-win32.whl (69.4 kB), CPython 3.5m, Windows x86
