Python Wrapper for SPMF

spmf-py

Python Wrapper for SPMF 🐍 🎁

Information

spmf-py makes the SPMF [1] Java data mining library usable from Python.

Essentially, this module calls the Java command line tool of SPMF, passes the user arguments to it, and parses the output.
In addition, the parsed results can be converted to a pandas DataFrame or written to CSV.

In theory, all algorithms featured in SPMF are callable. Nothing is hardcoded; look up the desired algorithm and its parameters in the SPMF documentation.
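To make the "calls the Java command line tool" step concrete, here is a minimal sketch of how such a command could be assembled. It assumes SPMF's documented `java -jar spmf.jar run <Algorithm> <input> <output> <parameters...>` syntax; the helper name `build_spmf_command` and its `jar_path` default are hypothetical and not part of spmf-py.

```python
from typing import List, Sequence


def build_spmf_command(algorithm: str, input_file: str, output_file: str,
                       arguments: Sequence[object] = (),
                       jar_path: str = "spmf.jar") -> List[str]:
    """Assemble the argument list for SPMF's `run` command line mode.

    Hypothetical helper for illustration; spmf-py's internals may differ.
    """
    command = ["java", "-jar", jar_path, "run",
               algorithm, input_file, output_file]
    # SPMF reads algorithm parameters positionally, so their order is kept.
    command.extend(str(argument) for argument in arguments)
    return command


command = build_spmf_command("PrefixSpan", "contextPrefixSpan.txt",
                             "output.txt", [0.7, 5])
print(" ".join(command))
# java -jar spmf.jar run PrefixSpan contextPrefixSpan.txt output.txt 0.7 5
```

Running the resulting list, for example with `subprocess.run(command, check=True)`, requires Java and spmf.jar to be available.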

Installation

pip install spmf

Usage

Example:

from spmf import Spmf

spmf = Spmf("PrefixSpan", input_filename="contextPrefixSpan.txt",
            output_filename="output.txt", arguments=[0.7, 5])
spmf.run()
print(spmf.to_pandas_dataframe(pickle=True))
spmf.to_csv("output.csv")

Output:

=============  PREFIXSPAN 0.99-2016 - STATISTICS =============
 Total time ~ 2 ms
 Frequent sequences count : 14
 Max memory (mb) : 6.487663269042969
 minsup = 3 sequences.
 Pattern count : 14
===================================================

      pattern sup
0         [1]   4
1      [1, 2]   4
2      [1, 3]   4
3   [1, 3, 2]   3
4   [1, 3, 3]   3
5         [2]   4
6      [2, 3]   3
7         [3]   4
8      [3, 2]   3
9      [3, 3]   3
10        [4]   3
11     [4, 3]   3
12        [5]   3
13        [6]   3

The usage is similar to the one described in the SPMF documentation.
For all Python parameters, see the Spmf class.

SPMF Arguments

The arguments parameter holds the arguments that are passed to SPMF; they depend on the chosen algorithm. SPMF handles optional parameters as an ordered list. Because the algorithms have no named parameters, if, for example, only the first and the last parameter of an algorithm are to be used, the ones in between must be filled with "" (empty strings).
For advanced usage examples, see examples.
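The padding rule for skipped parameters can be sketched with a small helper. The function `positional_arguments` is hypothetical (it is not part of spmf-py), and the four-parameter algorithm in the example is made up for illustration.

```python
from typing import Dict, List


def positional_arguments(values: Dict[int, object], total: int) -> List[object]:
    """Place values at 0-based positions and fill every gap with ""."""
    arguments: List[object] = [""] * total
    for position, value in values.items():
        if not 0 <= position < total:
            raise IndexError(f"position {position} is outside 0..{total - 1}")
        arguments[position] = value
    return arguments


# Hypothetical algorithm with four parameters; only the first and last are set.
arguments = positional_arguments({0: 0.5, 3: True}, total=4)
print(arguments)  # [0.5, '', '', True]
```

The resulting list could then be passed as the arguments parameter of Spmf.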

SPMF Executable

Download it from the SPMF Website.
It is assumed that the SPMF binary spmf.jar is located in the same directory as spmf-py. If it is not, either symlink it, or use the spmf_bin_location_dir parameter.
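If spmf.jar lives elsewhere, it can help to verify the directory before handing it to spmf_bin_location_dir. The following is a sketch only: the helper `spmf_jar_directory` is hypothetical, and a temporary directory stands in for the real install location.

```python
import tempfile
from pathlib import Path


def spmf_jar_directory(directory: str) -> str:
    """Return the resolved directory if it contains spmf.jar, else raise."""
    folder = Path(directory).expanduser().resolve()
    if not (folder / "spmf.jar").is_file():
        raise FileNotFoundError(f"spmf.jar not found in {folder}")
    return str(folder)


# Demo: a temporary directory with an empty placeholder file named spmf.jar.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "spmf.jar").touch()
    checked = spmf_jar_directory(tmp)
    print(checked)

# Usage sketch (the path is a placeholder):
# spmf = Spmf("PrefixSpan", input_filename="contextPrefixSpan.txt",
#             output_filename="output.txt", arguments=[0.7, 5],
#             spmf_bin_location_dir=spmf_jar_directory("~/tools/spmf"))
```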

Input Formats

Either use an input file as specified by SPMF, or use one of the in-line formats as seen in examples.
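When the data already lives in Python, an input file can also be generated before calling Spmf. The sketch below assumes SPMF's sequence database text format (items separated by spaces, -1 closing each itemset, -2 closing each sequence); the helper `format_sequence_database` is hypothetical and is not one of the module's in-line formats.

```python
from typing import Iterable, List


def format_sequence_database(
        sequences: Iterable[Iterable[Iterable[int]]]) -> str:
    """Render sequences of itemsets as SPMF sequence database text."""
    lines: List[str] = []
    for sequence in sequences:
        tokens: List[str] = []
        for itemset in sequence:
            tokens.extend(str(item) for item in itemset)
            tokens.append("-1")  # marks the end of an itemset
        tokens.append("-2")      # marks the end of the sequence
        lines.append(" ".join(tokens))
    return "\n".join(lines) + "\n"


# Illustrative data: two sequences, each a list of itemsets.
database = [
    [[1], [1, 2, 3], [1, 3], [4], [3, 6]],
    [[1, 4], [3], [2, 3], [1, 5]],
]
text = format_sequence_database(database)
print(text, end="")
# 1 -1 1 2 3 -1 1 3 -1 4 -1 3 6 -1 -2
# 1 4 -1 3 -1 2 3 -1 1 5 -1 -2
```

Writing `text` to a file yields something that can be passed as input_filename.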

Memory

The maximum memory can be increased in the constructor via Spmf(memory=n), where n is the amount in megabytes; see SPMF's FAQ.
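On the Java side, a memory limit in megabytes is normally expressed with the JVM's `-Xmx` option. The sketch below shows that mapping; it is an assumption about how such a value would be forwarded, and the helper `java_invocation` is hypothetical rather than spmf-py code.

```python
from typing import List, Optional


def java_invocation(memory_mb: Optional[int] = None) -> List[str]:
    """Return the leading `java` arguments, adding -Xmx when a limit is given."""
    command = ["java"]
    if memory_mb is not None:
        if memory_mb <= 0:
            raise ValueError("memory_mb must be a positive number of megabytes")
        command.append(f"-Xmx{memory_mb}m")  # JVM maximum heap size
    return command


print(java_invocation(2048))  # ['java', '-Xmx2048m']
```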

Background

Why? If you're in a Python pipeline, like a Jupyter Notebook, it might be cumbersome to use Java as an intermediate step. Using spmf-py you can stay in your pipeline as though Java were never involved.

Bibliography

[1] Fournier-Viger, P., Lin, C.W., Gomariz, A., Gueniche, T., Soltani, A., Deng, Z., Lam, H. T. (2016).
The SPMF Open-Source Data Mining Library Version 2.
Proc. 19th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2016) Part III, Springer LNCS 9853, pp. 36-40.

Disclaimer

This module has not been tested with all 184 algorithms offered in SPMF. Calling them and writing to the output file should be possible for all of them. Output parsing, however, is only expected to work for algorithms whose output resembles that of the sequential pattern mining algorithms. It has not been tested with other output types, so some adaptation of the output parsing might be necessary. If something is not working, submit an issue or create a PR yourself!
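If the built-in parsing does not fit an algorithm's output, the output file can be read directly. As a starting point, here is a sketch for the sequential pattern style, assuming lines of the form `1 -1 3 -1 2 -1 #SUP: 3` as SPMF writes them for PrefixSpan. The function `parse_pattern_line` is hypothetical, and it flattens itemsets into a single list of items.

```python
from typing import List, Tuple


def parse_pattern_line(line: str) -> Tuple[List[str], int]:
    """Split one output line into (items, support)."""
    pattern_part, support_part = line.rsplit("#SUP:", 1)
    # -1 only separates itemsets, so it is dropped from the item list.
    items = [token for token in pattern_part.split() if token != "-1"]
    return items, int(support_part.strip())


print(parse_pattern_line("1 -1 3 -1 2 -1 #SUP: 3"))  # (['1', '3', '2'], 3)
```

Other algorithm families use different suffixes and separators, so the split logic would need to be adapted per output type.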
