Compact Prediction Tree: A Lossless Model for Accurate Sequence Prediction

CPT

This project is an open-source Cython implementation of the Compact Prediction Tree algorithm, with multithreading support.

CPT is a sequence prediction algorithm. It is a highly explainable model that is good at predicting the next value of a sequence over a finite alphabet. However, given a sequence, CPT cannot predict an element that is already present in that sequence (cf. how to tune the CPT model).

This implementation is based on the following research papers:

http://www.philippe-fournier-viger.com/ADMA2013_Compact_Prediction_trees.pdf

http://www.philippe-fournier-viger.com/spmf/PAKDD2015_Compact_Prediction_tree+.pdf

Installation

You can simply use pip install cpt.

For Windows users, the package is precompiled. On other platforms, installing from the tarball also installs all dependencies of cpt.

Simple example

You can test the model with the following code:

from cpt.cpt import Cpt
model = Cpt()

model.fit([['hello', 'world'],
           ['hello', 'this', 'is', 'me'],
           ['hello', 'me']
          ])

model.predict([['hello'], ['hello', 'this']])
# Output: ['me', 'is']

For an example of sklearn compatibility, see the documentation.
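To give some intuition for the output of the simple example, here is a minimal pure-Python sketch of a CPT-style prediction. It is not the code of this package: the function name predict_next and the scoring rule (a plain count, with ties going to the first candidate seen) are assumptions made for illustration, and the real implementation relies on more compact data structures and its own scoring. The sketch also shows why an item already present in the query is never predicted.

```python
def predict_next(training, query):
    """Simplified CPT-style prediction for one query (illustrative only)."""
    query_items = set(query)
    scores = {}
    for seq in training:
        # A training sequence is "similar" if it contains every query item.
        if not query_items.issubset(seq):
            continue
        # The consequent is whatever follows the last query item in the
        # sequence, so it can never contain an item of the query itself.
        last = max((i for i, item in enumerate(seq) if item in query_items),
                   default=-1)
        for item in seq[last + 1:]:
            scores[item] = scores.get(item, 0) + 1
    if not scores:
        return None
    # Highest count wins; ties go to the candidate seen first
    # (dicts keep insertion order in Python 3.7+).
    return max(scores, key=scores.get)


training = [['hello', 'world'],
            ['hello', 'this', 'is', 'me'],
            ['hello', 'me']]

print(predict_next(training, ['hello']))          # me
print(predict_next(training, ['hello', 'this']))  # is
```

On this small data set the sketch gives the same answers as the example above, but it should not be expected to match the library in general.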

Features

Train

The model can be trained with the fit method.

If needed, the model can be retrained with the same method. Retraining adds new sequences to the model and does not remove the old ones.

Multithreading

By default, predictions run in multiple threads using OpenMP.

Predictions can also run in a single thread by passing multithread=False to the predict method.

You can control the number of threads by setting the environment variable OMP_NUM_THREADS.
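As a hedged sketch of the two options above, the snippet below sets OMP_NUM_THREADS from Python and then calls predict with and without multithreading. Setting the variable before importing cpt is an assumption on our side: OpenMP runtimes typically read it once at startup, so this is the conservative order. The value "2" is arbitrary, and the import is guarded so the snippet still runs where cpt is not installed.

```python
import os

# Assumption: the OpenMP runtime reads OMP_NUM_THREADS when it starts,
# so the variable is set before cpt is imported.
os.environ["OMP_NUM_THREADS"] = "2"

try:
    from cpt.cpt import Cpt
except ImportError:
    Cpt = None  # cpt is not installed; the rest of the example is skipped

if Cpt is not None:
    model = Cpt()
    model.fit([['hello', 'world'],
               ['hello', 'this', 'is', 'me'],
               ['hello', 'me']])

    queries = [['hello'], ['hello', 'this']]
    # Default behaviour: multithreaded prediction.
    print(model.predict(queries))
    # Single-threaded prediction.
    print(model.predict(queries, multithread=False))
```

The same effect can be obtained by exporting the variable in the shell before starting Python.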

Pickling

You can pickle the model to save it and load it later with the pickle library.

from cpt.cpt import Cpt
import pickle


model = Cpt()
model.fit([['hello', 'world']])

dumped = pickle.dumps(model)

unpickled_model = pickle.loads(dumped)

print(model == unpickled_model)

Explainability

The CPT class has several methods to explain the predictions.

You can see which elements are considered as noise (with a low presence in sequences) with model.compute_noisy_items(noise_ratio).

You can retrieve trained sequences with model.retrieve_sequence(id).

You can find similar sequences with model.find_similar_sequences(sequence).

You cannot yet automatically retrieve all similar sequences with the noise reduction technique.
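To make the notion of noisy items more concrete, here is a small pure-Python sketch of what "a low presence in sequences" could mean. It is only an illustration: the helper name noisy_items and the exact threshold rule (an item is noisy when it appears in a fraction of the sequences strictly below noise_ratio) are assumptions, not the definition used by model.compute_noisy_items.

```python
from collections import Counter


def noisy_items(sequences, noise_ratio):
    """Return items whose presence ratio is below noise_ratio (illustrative)."""
    presence = Counter()
    for seq in sequences:
        # Count each item at most once per sequence.
        presence.update(set(seq))
    total = len(sequences)
    return {item for item, count in presence.items()
            if count / total < noise_ratio}


sequences = [['hello', 'world'],
             ['hello', 'this', 'is', 'me'],
             ['hello', 'me']]

# 'hello' is in 3 of 3 sequences, 'me' in 2 of 3, the others in 1 of 3.
print(sorted(noisy_items(sequences, 0.5)))  # ['is', 'this', 'world']
```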

Tuning

CPT has three meta-parameters that need to be tuned. See the documentation for how to tune them.
