GSP Python implementation

GSP-python

A Python implementation of the Generalized Sequential Patterns (GSP) algorithm for sequential pattern mining

This project implements the Generalized Sequential Patterns (GSP) algorithm to find frequent sequences within a given dataset. This implementation includes parameters for the mingap, maxgap, and maxspan time constraints.

The project also features a simple dataset generator.
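
To make the three time constraints mentioned above concrete, the sketch below checks whether a pattern occurs in a single data-sequence and counts its support across a toy dataset. It is not this package's code: the helper names `occurs` and `support_count` are hypothetical, the position of an element inside its data-sequence is assumed to act as its timestamp, and the inequalities follow the usual GSP formulation (gap strictly greater than mingap, gap at most maxgap, total span at most maxspan), which this implementation may or may not match exactly.

```python
from math import inf


def occurs(data_seq, pattern, mingap=0, maxgap=inf, maxspan=inf):
    """Return True if `pattern` occurs in `data_seq` under the constraints.

    Both arguments are lists of elements, each element a collection of events.
    Assumption: the index of an element in `data_seq` is its timestamp.
    """
    wanted = [set(element) for element in pattern]
    present = [set(element) for element in data_seq]

    def search(k, prev_t, first_t):
        # k is the index of the next pattern element that must be matched.
        if k == len(wanted):
            return True
        start = 0 if prev_t is None else prev_t + 1
        for t in range(start, len(present)):
            if prev_t is not None:
                gap = t - prev_t
                if gap > maxgap or t - first_t > maxspan:
                    break  # later positions only make the gap and span larger
                if gap <= mingap:
                    continue  # too close to the previous matched element
            if wanted[k] <= present[t] and search(
                k + 1, t, t if first_t is None else first_t
            ):
                return True
        return False

    return search(0, None, None)


def support_count(dataset, pattern, **constraints):
    """Count the data-sequences in which `pattern` occurs."""
    return sum(occurs(seq, pattern, **constraints) for seq in dataset)


toy_dataset = [
    [["a"], ["b"], ["c"], ["a", "d"]],
    [["a", "c"], ["b"]],
    [["a"], ["c"]],
]
pattern = [["a"], ["c"]]

print(support_count(toy_dataset, pattern))            # 2
print(support_count(toy_dataset, pattern, maxgap=1))  # 1
print(support_count(toy_dataset, pattern, mingap=1))  # 1
```

With a relative threshold such as minsup=0.3, a pattern would count as frequent when its support count divided by the number of data-sequences reaches that value; whether the package interprets minsup this way is an assumption.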


Installation

The package can be installed via pip:

python3 -m pip install gsp_python

Usage

The GSP algorithm and the dataset generator can be executed either from the command line or by importing the package modules in a script.


From the command line

To run the GSP algorithm:

python3 -m gsp_python GSP infile outfile minsup -t maxgap mingap maxspan

Where:

  • infile: specifies the path of the file containing the dataset from which sequences are mined. The file must be a text file in which each data-sequence is terminated by ' -2', each element is terminated by ' -1', and events are separated by a space.
  • outfile: specifies the path of the output file to which the result is written. It will contain all frequent sequences found, each paired with its support count.
  • minsup: specifies the minimum support threshold used during execution.
  • -t maxgap mingap maxspan (optional): specifies the maxgap, mingap, and maxspan values used during execution, in that order. If omitted, the defaults are inf, 0, and inf, respectively.
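
The input format described for infile can be illustrated with a small reader. This is only a sketch: `parse_sequences` is a hypothetical helper, not part of the package (which provides its own `load_ds()`), it keeps events as strings, and it assumes tokens are separated by any whitespace.

```python
def parse_sequences(text):
    """Parse text where ' -1' ends an element and ' -2' ends a data-sequence."""
    dataset, sequence, element = [], [], []
    for token in text.split():
        if token == "-1":
            # End of the current element.
            sequence.append(element)
            element = []
        elif token == "-2":
            # End of the current data-sequence.
            dataset.append(sequence)
            sequence = []
        else:
            element.append(token)
    return dataset


sample = "a b -1 c -1 -2\nd -1 a -1 -2\n"
print(parse_sequences(sample))
# [[['a', 'b'], ['c']], [['d'], ['a']]]
```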

For more information about additional optional arguments, type:

python3 -m gsp_python GSP -h

To generate a random dataset:

python3 -m gsp_python DatasetGen outfile size nevents maxevents avgelems

Where:

  • outfile: specifies the path of the output file to which the dataset is written. The format is the same as the input format accepted by the algorithm above.
  • size: specifies the number of data-sequences.
  • nevents: specifies the number of unique events.
  • maxevents: specifies the maximum number of events per element.
  • avgelems: specifies the average number of elements per data-sequence.

For more information about additional optional arguments, type:

python3 -m gsp_python DatasetGen -h

From within a script

To run the GSP algorithm, create a GSP object with gsp_python.gsp.GSP(), passing the required arguments, then call its run_gsp() method. The result is returned as a list of tuples, each pairing a sequence with its support count.

An example is given below:

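from gsp_python.gsp import GSP, load_ds

# Load the dataset; events are converted to integers
dataset, dict1, dict2 = load_ds("path/to/file.txt")

# Run GSP with the chosen support threshold and time constraints
algo_gsp = GSP(dataset, minsup=0.3, mingap=1, maxgap=2, maxspan=5)
output = algo_gsp.run_gsp()  # list of (sequence, support count) tuples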

The function load_ds() loads the dataset from the file at the specified path (provided that it follows the format explained above), converting all events to integers. It also returns a dictionary (here assigned to dict1) that can be used to convert each integer back to the corresponding event.
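
As a rough illustration of how the result and the integer-to-event dictionary might be combined, the sketch below decodes mined sequences and sorts them by support. The helper `decode_patterns` is hypothetical, and the data shapes are assumptions: each sequence is taken to be a list of elements holding integer events, and the dictionary is taken to map each integer to its original event. The sample values are made up for the example.

```python
def decode_patterns(patterns, int_to_event):
    """Map integer events back to their labels and sort by support, highest first."""
    decoded = [
        ([[int_to_event[e] for e in element] for element in sequence], support)
        for sequence, support in patterns
    ]
    return sorted(decoded, key=lambda pair: pair[1], reverse=True)


# Made-up stand-ins for `output` and `dict1`
sample_output = [([[1], [2, 3]], 3), ([[2]], 7)]
sample_dict = {1: "a", 2: "b", 3: "c"}

for sequence, support in decode_patterns(sample_output, sample_dict):
    print(sequence, support)
```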


To generate a random dataset, create a DatasetGenerator object with gsp_python.dataset_gen.DatasetGenerator(), passing the required arguments, then call its generate_sequence_dataset() method. The dataset is returned as a list[list[list[int]]].

An example is given below:

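from gsp_python.dataset_gen import DatasetGenerator

# 100 data-sequences, 8 unique events, at most 4 events per element,
# and 16 elements per data-sequence on average
algo_dsgen = DatasetGenerator(size=100, nevents=8, maxevents=4, avgelems=16)
dataset = algo_dsgen.generate_sequence_dataset()  # list[list[list[int]]]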
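
Since the generated dataset is a nested list, it may be useful to write it out in the text format accepted by the algorithm. The sketch below shows one way to do that; `format_dataset` is a hypothetical helper, not a function of the package, and placing one data-sequence per line is an assumption.

```python
def format_dataset(dataset):
    """Serialize a list[list[list[int]]] using ' -1' and ' -2' terminators."""
    lines = []
    for sequence in dataset:
        tokens = []
        for element in sequence:
            tokens.extend(str(event) for event in element)
            tokens.append("-1")  # end of element
        tokens.append("-2")  # end of data-sequence
        lines.append(" ".join(tokens))
    return "".join(line + "\n" for line in lines)


small_dataset = [[[1, 2], [3]], [[4]]]
print(format_dataset(small_dataset), end="")
# 1 2 -1 3 -1 -2
# 4 -1 -2
```

The returned string can then be written to a file with the standard `open(path, "w")` and `write()` calls.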

