
Python Wrapper for SPMF


SPMF

Python Wrapper for SPMF Java library.

Information

This module contains Python wrappers for pattern mining algorithms implemented in the SPMF Java library. Each algorithm is exposed as a standalone Python class with documented and tested APIs. It also provides native support for pandas DataFrames.

Why? In a Python pipeline, calling Java as an intermediate step can be cumbersome. With spmf-wrapper you can stay in your pipeline as though Java were not involved at all.
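To give a sense of the Java-side step the wrapper hides: SPMF's episode miners read a plain-text file rather than a DataFrame. The sketch below assumes the `items|timestamp` layout used by SPMF's EMMA example input (integer item codes separated by spaces, then `|` and the time point, one time point per line). Treat that format as an assumption and check the SPMF documentation; the helper `to_spmf_episode_lines` is hypothetical and is not part of spmf-wrapper.

```python
from collections import defaultdict


def to_spmf_episode_lines(events):
    """Hypothetical helper: encode (time point, item) events as SPMF-style lines.

    Assumed format: "<item codes separated by spaces>|<time point>",
    one line per time point. Items are mapped to the integers 1, 2, 3, ...
    in order of first appearance.
    """
    codes = {}
    by_time = defaultdict(set)
    for t, item in events:
        if item not in codes:
            codes[item] = len(codes) + 1
        by_time[t].add(codes[item])
    lines = []
    for t in sorted(by_time):
        items = " ".join(str(c) for c in sorted(by_time[t]))
        lines.append(f"{items}|{t}")
    return lines, codes


# Same events as the Input table in the Usage section.
events = [(1, "a"), (2, "a"), (3, "a"), (3, "b"), (6, "a"),
          (7, "a"), (7, "b"), (8, "c"), (9, "b"), (11, "d")]
lines, codes = to_spmf_episode_lines(events)
print("\n".join(lines))
```

Events that share a time point (for example `a` and `b` at time 3) end up on one line, which is why the Input table's two rows for time 3 become a single itemset.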

Installation

pip install spmf-wrapper

A Java Runtime Environment is required to run this wrapper. If no existing installation is detected, JRE 21 is installed automatically at $HOME/.jre/jdk-21.0.2+13-jre using the install-jdk Python module. If you prefer to install the Java Runtime manually, follow the instructions here. Verify the installation by running the following command in a terminal (the reported version depends on your installation; the sample output below comes from Java 8):

> java -version
java version "1.8.0_391"
Java(TM) SE Runtime Environment (build 1.8.0_391-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.391-b13, mixed mode)
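If you want to run the same check from Python before calling the wrapper, here is a minimal sketch. It only inspects the `java` found on `PATH`; it is not the wrapper's own detection logic, and the helper names `parse_java_major` and `installed_java_major` are hypothetical. Two details matter: `java -version` prints its banner to standard error, and Java 8 and earlier report versions in the legacy `1.x` form (so `1.8.0_391` above means Java 8).

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Legacy scheme: "1.8.0_391" is Java 8. Modern scheme: "21.0.2" is Java 21.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major() -> Optional[int]:
    """Return the major version of the `java` on PATH, or None if absent."""
    if shutil.which("java") is None:
        return None
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr)


print(installed_java_major())
```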

Usage

Example:

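import pandas as pd
from spmf import EMMA
# Same data as the Input table below: one event (time point, item) per row
input_df = pd.DataFrame({"Time points": [1, 2, 3, 3, 6, 7, 7, 8, 9, 11],
                         "Itemset": list("aaabaabcbd")})
emma = EMMA(min_support=2, max_window=2, timestamp_present=True, transform=True)
output = emma.run_pandas(input_df)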

Input:

Time points Itemset
0 1 a
1 2 a
2 3 a
3 3 b
4 6 a
5 7 a
6 7 b
7 8 c
8 9 b
9 11 d

Output:

Frequent episode Support
0 a 5
1 b 3
2 a b 2
3 a -> a 3
4 a -> b 2
5 a -> a b 2
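One way to read the Support column: EMMA is generally described as counting head frequency, roughly the number of time points at which an occurrence of the episode can start. The brute-force sketch below recomputes the six values in the Output table under two assumptions: an occurrence must satisfy `last_time - first_time < max_window`, and `a -> a b` denotes `a` followed by the itemset `{a, b}` at a strictly later time point. It is an illustrative check, not SPMF's EMMA algorithm, and `head_support` is a hypothetical helper.

```python
from collections import defaultdict

# Event sequence from the Input table: (time point, item)
EVENTS = [(1, "a"), (2, "a"), (3, "a"), (3, "b"), (6, "a"),
          (7, "a"), (7, "b"), (8, "c"), (9, "b"), (11, "d")]


def to_itemsets(events):
    """Group events that share a time point into one itemset."""
    grouped = defaultdict(set)
    for t, item in events:
        grouped[t].add(item)
    return sorted(grouped.items())  # [(t, {items}), ...] in time order


def head_support(episode, events, max_window):
    """Count time points where an occurrence of a serial episode starts.

    episode: list of itemsets, e.g. [{"a"}, {"a", "b"}] for "a -> a b".
    Assumption: an occurrence must satisfy last_t - first_t < max_window.
    """
    seq = to_itemsets(events)
    count = 0
    for i, (t0, items0) in enumerate(seq):
        if not episode[0] <= items0:
            continue
        # Greedily match the remaining itemsets at strictly later time points.
        k = 1
        for t, items in seq[i + 1:]:
            if k == len(episode) or t - t0 >= max_window:
                break
            if episode[k] <= items:
                k += 1
        if k == len(episode):
            count += 1
    return count


rows = [[{"a"}], [{"b"}], [{"a", "b"}],
        [{"a"}, {"a"}], [{"a"}, {"b"}], [{"a"}, {"a", "b"}]]
for episode in rows:
    # Prints 5, 3, 2, 3, 2, 2: the six support values of the Output table.
    print(head_support(episode, EVENTS, max_window=2))
```

Greedy earliest matching is enough here because taking the first qualifying itemset never makes a later match harder within a window anchored at `t0`.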

See the examples for more details.

For a detailed explanation of each algorithm and its parameters, refer to the corresponding page in the SPMF documentation.

Implementation Checklist

Sequential Pattern Mining

Algorithm Type Implemented
PrefixSpan Frequent Sequential Pattern
GSP Frequent Sequential Pattern
SPADE Frequent Sequential Pattern
CM-SPADE Frequent Sequential Pattern
SPAM Frequent Sequential Pattern
CM-SPAM Frequent Sequential Pattern
FAST Frequent Sequential Pattern
LAPIN Frequent Sequential Pattern
ClaSP Frequent Closed Sequential Pattern
CM-ClaSP Frequent Closed Sequential Pattern
CloFAST Frequent Closed Sequential Pattern
CloSpan Frequent Closed Sequential Pattern
BIDE+ Frequent Closed Sequential Pattern
Post Processing SPAM or PrefixSpan Frequent Closed Sequential Pattern
MaxSP Frequent Maximal Sequential Pattern
VMSP Frequent Maximal Sequential Pattern
FEAT Frequent Sequential Generator Pattern
FSGP Frequent Sequential Generator Pattern
VGEN Frequent Sequential Generator Pattern
NOSEP Non-overlapping Sequential Pattern
GoKrimp Compressing Sequential Pattern
TKS Top-k Frequent Sequential Pattern
TSP Top-k Frequent Sequential Pattern
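The algorithms in this table differ in search strategy and in which patterns they report (all, closed, maximal, generators, top-k), but most of the frequent sequential pattern miners share one notion of support: the number of input sequences that contain the pattern as an ordered subsequence of itemsets. A minimal sketch of that definition follows, using a small made-up database. It enumerates no patterns and is not how PrefixSpan or any other listed miner is implemented; `contains` and `support` are hypothetical helpers. Note that some entries (for example NOSEP, which counts non-overlapping occurrences, or GoKrimp, which is compression-based) use different criteria.

```python
def contains(sequence, pattern):
    """True if pattern (a list of itemsets) occurs in sequence in order.

    Each pattern itemset must be a subset of a distinct, strictly later
    itemset of the sequence; greedy earliest matching is sufficient.
    """
    k = 0
    for itemset in sequence:
        if k < len(pattern) and pattern[k] <= itemset:
            k += 1
    return k == len(pattern)


def support(database, pattern):
    """Number of sequences in the database that contain the pattern."""
    return sum(1 for seq in database if contains(seq, pattern))


# Illustrative database of three sequences (made up for this example).
database = [
    [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}],
    [{"a", "d"}, {"c"}, {"b", "c"}],
    [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}],
]
print(support(database, [{"a"}, {"c"}]))       # 3
print(support(database, [{"a", "b"}, {"c"}]))  # 2
```

With a minimum support of 2, both patterns would be frequent; a closed or maximal miner would then additionally discard patterns subsumed by a larger pattern of equal support (closed) or by any larger frequent pattern (maximal).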

Episode Mining

Algorithm Type Implemented
EMMA Frequent Episode
AFEM Frequent Episode
MINEPI Frequent Episode
MINEPI+ Frequent Episode
TKE Top-k Frequent Episodes
MaxFEM Maximal Frequent Episodes
POERM Episode Rules
POERM-ALL Episode Rules
POERMH Episode Rules
NONEPI Episode Rules
TKE-Rules Episode Rules
AFEM-Rules Episode Rules
EMMA-Rules Episode Rules
MINEPI+-Rules Episode Rules
HUE-SPAN High Utility Episodes
US-SPAN High Utility Episodes
TUP Top-k High Utility Episodes

Bibliography

Fournier-Viger, P., Lin, C.W., Gomariz, A., Gueniche, T., Soltani, A., Deng, Z., Lam, H. T. (2016).
The SPMF Open-Source Data Mining Library Version 2.
Proc. 19th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2016) Part III, Springer LNCS 9853, pp. 36-40.



Download files

Source distribution: spmf-wrapper-0.5.0.tar.gz (12.1 MB)

Built distribution: spmf_wrapper-0.5.0-py3-none-any.whl (12.1 MB, Python 3)
