GSP Python implementation

Project description

GSP-python

A Python implementation of the Generalized Sequential Patterns (GSP) algorithm for sequential pattern mining

This project implements the Generalized Sequential Patterns (GSP) algorithm to find frequent sequences within a given dataset. This implementation includes parameters for the mingap, maxgap, and maxspan time constraints.

The project also features a simple dataset generator.
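To make "frequent sequence" concrete, the sketch below counts the support of a pattern, i.e. the number of data-sequences that contain it. It is an illustration only, not the package's implementation: it ignores the mingap, maxgap, and maxspan constraints, and the helper names contains and support are hypothetical.

```python
def contains(data_sequence, pattern):
    """Return True if every element of `pattern` is a subset of a distinct
    element of `data_sequence`, in the same order (no time constraints)."""
    pos = 0
    for element in pattern:
        needed = set(element)
        # Advance to the earliest remaining element that covers `needed`.
        while pos < len(data_sequence) and not needed <= set(data_sequence[pos]):
            pos += 1
        if pos == len(data_sequence):
            return False
        pos += 1  # the next pattern element must match a later element
    return True


def support(dataset, pattern):
    """Count the data-sequences in `dataset` that contain `pattern`."""
    return sum(contains(sequence, pattern) for sequence in dataset)


# Hypothetical toy dataset: three data-sequences of integer events.
toy_dataset = [
    [[1, 2], [3]],
    [[1], [2], [3]],
    [[3], [1]],
]
print(support(toy_dataset, [[1], [3]]))  # event 1 followed later by event 3
print(support(toy_dataset, [[1, 2]]))    # events 1 and 2 in the same element
```

A pattern would then count as frequent when its support meets the minimum support threshold. If the threshold is given as a fraction, the comparison would presumably use support divided by the number of data-sequences.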


Installation

The package can be installed via pip:

python3 -m pip install gsp_python

Usage

The GSP algorithm and the dataset generator can be executed either from the command line or by importing the package modules in a script.


From the command line

To run the GSP algorithm:

python3 -m gsp_python GSP infile outfile minsup -t maxgap mingap maxspan

Where:

  • infile: specifies the path of the file containing the dataset from which sequences are mined. The file must be a text file in which each data-sequence is terminated by ' -2', each element is terminated by ' -1', and events are separated by spaces.
  • outfile: specifies the path of the output file to which the result is written. It will contain all frequent sequences found, each paired with its support count.
  • minsup: specifies the minimum support threshold used during execution.
  • -t maxgap mingap maxspan (optional): specifies the maxgap, mingap, and maxspan values used during execution. If not specified, the default values of inf, 0, and inf will be used instead.
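The input format described above can be illustrated with a small standalone parser. This is a sketch, not the package's own loader: the function name parse_dataset is hypothetical, events are kept as strings, and placing one data-sequence per line is an assumption (the parser itself only relies on whitespace and the ' -1' / ' -2' terminators).

```python
def parse_dataset(text):
    """Parse text in the ' -1' / ' -2' terminated format into nested lists:
    dataset -> data-sequences -> elements -> events (kept as strings)."""
    dataset, sequence, element = [], [], []
    for token in text.split():
        if token == "-1":      # end of the current element
            sequence.append(element)
            element = []
        elif token == "-2":    # end of the current data-sequence
            dataset.append(sequence)
            sequence = []
        else:                  # an ordinary event
            element.append(token)
    return dataset


# Two data-sequences: <(a b)(c)> and <(d)>.
example_text = "a b -1 c -1 -2\nd -1 -2\n"
print(parse_dataset(example_text))
```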

For more information about additional optional arguments, type:

python3 -m gsp_python GSP -h

To generate a random dataset:

python3 -m gsp_python DatasetGen outfile size nevents maxevents avgelems

Where:

  • outfile: specifies the path of the output file to which the dataset is written. The format is the same as the input format accepted by the algorithm above.
  • size: specifies the number of data-sequences.
  • nevents: specifies the number of unique events.
  • maxevents: specifies the maximum number of events per element.
  • avgelems: specifies the average number of elements per data-sequence.

For more information about additional optional arguments, type:

python3 -m gsp_python DatasetGen -h

From within a script

To run the GSP algorithm, create a GSP object with gsp_python.gsp.GSP(), providing the required arguments, then call its run_gsp() method. The result is returned as a list of tuples, each pairing a sequence with its support count.

An example is given below:

from gsp_python.gsp import load_ds
from gsp_python.gsp import GSP

dataset, dict1, dict2 = load_ds("path/to/file.txt")

algo_gsp = GSP(dataset, minsup=0.3, mingap=1, maxgap=2, maxspan=5)
output = algo_gsp.run_gsp()

The load_ds() function loads the dataset from the file at the specified path (provided that it follows the format described above) and converts all events to integers. It also returns a dictionary (here assigned to dict1) that can be used to convert each integer back to the corresponding event.
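Since the events are converted to integers, the mined sequences usually need to be translated back before they are readable. A minimal sketch of that step is given below. It assumes that each sequence in the result is a nested collection of elements holding integer events and that the dictionary is a plain integer-to-event mapping; the helper decode_patterns and the sample values are hypothetical, so check them against the actual output of run_gsp().

```python
def decode_patterns(patterns, int_to_event):
    """Replace integer events with their original labels.

    `patterns` is a list of (sequence, support_count) tuples and
    `int_to_event` maps each integer back to its event."""
    decoded = []
    for sequence, support_count in patterns:
        labelled = [[int_to_event[event] for event in element]
                    for element in sequence]
        decoded.append((labelled, support_count))
    return decoded


# Hypothetical stand-ins for the output of run_gsp() and for dict1.
sample_output = [([[0], [1, 2]], 7), ([[2]], 12)]
sample_dict1 = {0: "login", 1: "search", 2: "purchase"}

for sequence, support_count in decode_patterns(sample_output, sample_dict1):
    print(sequence, support_count)
```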


To generate a random dataset, create a DatasetGenerator object with gsp_python.dataset_gen.DatasetGenerator(), providing the required arguments, then call its generate_sequence_dataset() method. The dataset is returned as a list[list[list[int]]].

An example is given below:

from gsp_python.dataset_gen import DatasetGenerator

algo_dsgen = DatasetGenerator(size=100, nevents=8, maxevents=4, avgelems=16)
# Keep the returned dataset (a list[list[list[int]]]) for later use
dataset = algo_dsgen.generate_sequence_dataset()
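A dataset held in memory as a list[list[list[int]]] can be saved in the text format accepted by the algorithm with a few lines of plain Python. The sketch below is not part of the package: the helper write_dataset is hypothetical, the sample dataset is hand-written, and writing one data-sequence per line is an assumption.

```python
import os
import tempfile


def write_dataset(dataset, path):
    """Write a nested-list dataset using ' -1' to terminate each element
    and ' -2' to terminate each data-sequence (one sequence per line)."""
    with open(path, "w", encoding="utf-8") as handle:
        for sequence in dataset:
            tokens = []
            for element in sequence:
                tokens.extend(str(event) for event in element)
                tokens.append("-1")
            tokens.append("-2")
            handle.write(" ".join(tokens) + "\n")


# Hand-written example with two data-sequences.
small_dataset = [[[1, 2], [3]], [[4]]]
out_path = os.path.join(tempfile.mkdtemp(), "dataset.txt")
write_dataset(small_dataset, out_path)

with open(out_path, encoding="utf-8") as handle:
    print(handle.read())
```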

Download files

Source Distribution

gsp_python-0.0.11.tar.gz (11.2 kB)

Built Distribution

gsp_python-0.0.11-py3-none-any.whl (11.0 kB)
